# Assignment 1

## Assignment 1

1. Identify the level of measurement for each of the following variables:

(A) Region of birth

(B) Favorite academic subject (rank 1, 2, 3, etc.)

(C) Weight in pounds

(D) Opinion of President Bush (Favorable, Neutral, Unfavorable)

(E) Verbal SAT Score

(F) Year in College

(G) College Major

2. For each of the following research questions, identify the unit of analysis, the independent variable, and the dependent variable:

(A) Do poorer neighborhoods have higher rates of former prisoners living in them?

(B) Are children of fathers who have not attended college less likely to attend college?

(C) Are individuals who are high school dropouts less likely to vote than those with more education?

3. Assume that the answers to the research questions in Problem 2 are all yes. Consider the following additional variables that go with the above questions:

(A) percent African-American

(B) family income

(C) age

For each research question, decide whether the additional variable is a possible source of spuriousness, a possible mechanism, or a possible modifier of the implied relationships in Question 2. Explain each answer in one or two sentences and draw a diagram showing how each additional variable is related to the independent and dependent variables from each part of Question 2. (You should end up with 3 explanations and 3 diagrams.)

[Note that mechanism and mediator are synonyms and that interaction, specification, and modification are synonyms.]

4. Suppose that a professor is interested in determining how much time her students spend studying. On a Monday morning, she conducts a survey of her class in which she asks students to list the number of hours they spent on their coursework during the past two days. The numbers reported by the students were as follows:

7, 5, 12, 10, 8, 6, 33, 13, 8, 7, 11, 10, 6, 9, 43, 7, 9, 8, 11, 9, 10, 38, 12, 9, 11.

Show your work for each of the following (for repetitive calculations, such as relative frequency, you only have to show work for the first few repetitions):

(A) Determine absolute and relative frequencies (i.e., proportions) of each value as well as the cumulative relative frequencies at each point. Do this by filling out a table in the following format:

| Hours doing coursework (X) | Frequency (f) | Relative Frequency (p) | Relative Cumulative Frequency (p) |
|---|---|---|---|

(Note: you should include as many rows as you need to display frequencies for each data value.) Do not use “grouped frequencies” (as discussed on pp. 37-38) in the rows of this table; in other words, do not group the data into class intervals. Instead, use every possible value in the entire range as a row.

(B) Plot your data on a frequency histogram. This time you will need to group your data into class intervals. Follow the example provided in Figure 2.2, p.42. Explain how you picked your intervals.

(C) Describe the shape of your frequency histogram.

## Lecture One

- Statistics:
  - Descriptive:
    - What? Distribution of one variable (i.e., its center and dispersion); joint distribution of two variables (i.e., their relationship).
    - How? Tabular, graphic, and numerical.
  - Inferential:
    - (Probability) sample → population (of interest): a sample statistic is used to learn about a population parameter.
    - How? Confidence intervals and hypothesis testing.

- Variable: a characteristic that varies; that is, different cases have different values. [draw data structure]

- Unit of analysis: what the statement/variable is about. To see why this matters, consider the following hypothetical example:
  - Are female applicants less likely to get admitted than male applicants at the department level?
  - Are female applicants less likely to get admitted than male applicants at the college level?

- Scale of measurement:
  - In a dataset, variable values are usually stored as numbers, but those numbers serve different functions:
    - Nominal: values are used for differentiation only (e.g., race, gender, marital status).
    - Ordinal: the rank order of values is meaningful (e.g., life satisfaction, level of support).
    - Interval: the distance between values is meaningful (e.g., HDI, IQ, temperature in °F or °C).
    - Ratio: there is a true zero, i.e., 0 means “none” (e.g., income in dollars, number of children).
  - Usually, we call both interval and ratio variables continuous variables.
  - Why important? It determines which techniques are appropriate.

- [optional] Reliability and validity issues:
  - Reliability: consistency across repeated measurements.
  - Validity:
    - Construct validity: are we measuring what we want to measure?
    - Internal validity: causality = association + temporal order + lack of spuriousness.
    - External validity: generalizability.
  - Reliability and construct validity are criteria for evaluating measurement.
  - Internal and external validity are criteria for evaluating research designs.
  - There are more kinds of validity, and the terms mean different things to different researchers.

- More on variables:
  - Why variables? Population heterogeneity == the fundamental truth of social science.
  - Essentialism vs. population thinking: is variation REAL?
  - Data structure: row == case, column == variable.
  - What method to use: depends on the scale of measurement/type of the variable(s).
  - What it is about: unit of analysis.
  - Variables in relationship:
    - independent variable (a.k.a. predictor, explanatory variable, exogenous variable)
    - dependent variable (a.k.a. response, outcome, endogenous variable)

- Univariate description:
  - The choice of methods (mainly) depends on the level of measurement.
  - Frequency Table: title, frequency count (f), relative frequency (p), relative cumulative frequency.
  - Bar Chart: title, axes, labels, bars, (spacing b/w bars).
  - Histogram: title, axes, labels, bars, (no spacing b/w bins), (class interval/bin width), (skewness).
  - Contingency Table: cross-classification, cell counts, marginal totals, row/column percentages.
  - To be continued…

- Bivariate relationship:
  - Association: X is associated/correlated with Y.
  - Influence: X has an impact on Y.
  - Causality: X causes Y.
    - (Randomized controlled) experiment: random assignment.
    - Observational study: association/influence + temporal order + no spuriousness.
  - Important implication: association or influence ≠ causality.
  - A third variable Z might complicate the observed bivariate relationship b/w X and Y:
    - Spuriousness: the observed XY relationship is due to a common cause Z (Z → X and Z → Y).
    - Interaction/specification/modification: the observed XY relationship differs across groups (levels) of Z.
    - Mediation/mechanism/intervening: X affects Y through Z (X → Z → Y).
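The spuriousness case can be illustrated with a small simulation. This is a minimal sketch under assumed conditions: a hypothetical binary common cause `z` raises both `x` and `y`, while `x` has no effect on `y` at all. The variable names, effect sizes (2), sample size, and seed are illustrative choices, not values from these notes. The pooled correlation between `x` and `y` comes out clearly positive, but within each level of `z` it is close to zero, which is the signature of a spurious relationship.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(n=20000, seed=1):
    """Hypothetical data: z causes both x and y; x does NOT cause y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)          # common cause (binary: 0 or 1)
        x = 2 * z + rng.gauss(0, 1)    # x depends on z plus noise only
        y = 2 * z + rng.gauss(0, 1)    # y depends on z plus noise only
        rows.append((z, x, y))
    return rows

rows = simulate()

# Bivariate (pooled) association between x and y, ignoring z
pooled = pearson([r[1] for r in rows], [r[2] for r in rows])

# Association between x and y holding z constant (within each level of z)
within = {}
for level in (0, 1):
    sub = [r for r in rows if r[0] == level]
    within[level] = pearson([r[1] for r in sub], [r[2] for r in sub])

print(f"pooled r = {pooled:.2f}")
for level, r in within.items():
    print(f"r within z={level}: {r:.2f}")
```

Note that controlling for Z would also shrink the XY association under a mechanism (X → Z → Y), so the arithmetic alone cannot tell spuriousness from mediation; the causal diagram and temporal order do that.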

### Practice Problems

1. Identify the level of measurement for each of the following variables:

(A) Type of residence (i.e., dorm, off-campus apartment, condominium, parent’s home, or other)

(B) Height in inches

(C) The rating of the overall quality of a textbook on a scale from “Excellent” to “Poor”

(D) Lab section

(E) Level of measurement

2. For each of the following research questions, identify the unit of analysis, the independent variable, and the dependent variable:

(A) Are social movement participation rates higher in countries that have a longer history of democratic rule?

(B) Do school districts with more highly educated teachers tend to have higher standardized test scores among the students in the district?

3. Assume that the answers to the research questions in Problem 2 are both yes. Consider the following additional variables that go with the above questions:

(A) median household income of the country

(B) median household income of the school district

For each research question, decide whether the additional variable is most likely to be a source of spuriousness, a possible mechanism, or a possible modifier of the implied relationships in question 2. Explain each answer in one or two sentences and draw a diagram showing how each additional variable is related to the independent and dependent variables from each part of Question 2.

[Note that mechanism, mediator, and intervenor are synonyms and that interaction, specification, and modification are synonyms.]

4. Among many other things, the United Nations Development Program collects data from various countries on the percentage of the adult workforce made up of women. Here are the percentages for 25 countries included in a study done by the program in the mid-1990s: 37; 39; 46; 39; 46; 42; 39; 27; 48; 39; 42; 42; 45; 45; 42; 39; 30; 47; 42; 30; 37; 42; 46; 42; 42.

Show your work for each of the following (for repetitive calculations, such as relative frequency, you only have to show work for the first few repetitions):

(A) Determine absolute and relative frequencies (i.e., proportions) of each value as well as the cumulative relative frequencies at each point. Do this by filling out a table in the following format:

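| % women in the workforce (X) | Frequency (f) | Relative Frequency (p, %) | Relative Cumulative Frequency (p, %) |
|---|---|---|---|
| 27 | 1 | 4 | 4 |
| 30 | 2 | 8 | 12 |
| 37 | 2 | 8 | 20 |
| 39 | 5 | 20 | 40 |
| 42 | 8 | 32 | 72 |
| 45 | 2 | 8 | 80 |
| 46 | 3 | 12 | 92 |
| 47 | 1 | 4 | 96 |
| 48 | 1 | 4 | 100 |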

(Note: you should include as many rows as you need to display frequencies for each data value.) Do not group the data into class intervals. Instead, use one row for each value present in the dataset.

(B) Plot your data on a frequency histogram. Now you will need to group your data into class intervals. Describe how you determine the intervals.

(C) Describe the shape of your frequency histogram.
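The frequency table for Problem 4 can be checked with a short script. This is a minimal sketch, assuming the 25 percentages listed in the problem: `frequency_table` tallies each observed value (ungrouped, one row per value present, as part (A) asks) and reports relative and cumulative relative frequencies as percentages, matching the table above. `grouped_frequencies` then groups the values into class intervals for a histogram as in part (B). The helper names, and the starting point 25 and width 5 used below, are illustrative choices, not a prescribed answer.

```python
from collections import Counter

# Percentage of the adult workforce made up of women, 25 countries (Practice Problem 4)
data = [37, 39, 46, 39, 46, 42, 39, 27, 48, 39, 42, 42, 45,
        45, 42, 39, 30, 47, 42, 30, 37, 42, 46, 42, 42]

def frequency_table(values):
    """Rows of (value, f, relative %, cumulative relative %), one per observed value."""
    n = len(values)
    counts = Counter(values)
    rows, cum = [], 0
    for x in sorted(counts):
        f = counts[x]
        cum += f
        rows.append((x, f, 100 * f / n, 100 * cum / n))
    return rows

def grouped_frequencies(values, start, width):
    """Counts per class interval; assumes integer data with all values >= start.

    Interval k covers start + k*width through start + (k+1)*width - 1 (inclusive).
    """
    counts = Counter((v - start) // width for v in values)
    return [(start + k * width, start + (k + 1) * width - 1, counts[k])
            for k in range(max(counts) + 1)]

for x, f, p, cp in frequency_table(data):
    print(f"{x:>3} {f:>3} {p:>6.1f} {cp:>6.1f}")

for low, high, f in grouped_frequencies(data, start=25, width=5):
    print(f"{low}-{high}: {f}")
```

Because `Counter` returns 0 for a missing key, empty class intervals still appear with a count of 0, which keeps the histogram bins contiguous.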

## Sheet1

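Example 1-1: Department vs. College

| | Male: # of applicants | Male: % admitted | Female: # of applicants | Female: % admitted | Male: # admitted | Female: # admitted |
|---|---|---|---|---|---|---|
| College | 2175 | 47 | 849 | 31 | | |
| Dept A | 825 | 62 | 108 | 82 | 511.5 | 88.56 |
| Dept B | 560 | 63 | 25 | 68 | 352.8 | 17 |
| Dept C | 417 | 33 | 375 | 35 | 137.61 | 131.25 |
| Dept D | 373 | 6 | 341 | 7 | 22.38 | 23.87 |
| Total (Depts A-D) | | | | | 1024.29 | 260.68 |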
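The department-versus-college contrast in Example 1-1 can be reproduced from the applicant counts and admission rates. This minimal sketch recomputes each department's admitted count (applicants × % admitted) and pools them into college-level rates. Women have the higher admission rate in every department, yet the lower rate once departments are pooled, because most female applicants applied to Departments C and D, which admit a small share of applicants. The helper name `college_rate` is hypothetical.

```python
# Applicants and admission rates by department, from Example 1-1
# dept: (male applicants, male % admitted, female applicants, female % admitted)
table = {
    "A": (825, 62, 108, 82),
    "B": (560, 63, 25, 68),
    "C": (417, 33, 375, 35),
    "D": (373, 6, 341, 7),
}

def college_rate(i_applicants, i_pct):
    """Pooled (college-level) admission rate in %, summed over departments."""
    applicants = sum(row[i_applicants] for row in table.values())
    admitted = sum(row[i_applicants] * row[i_pct] / 100 for row in table.values())
    return 100 * admitted / applicants

male_rate = college_rate(0, 1)    # 1024.29 admitted / 2175 applicants
female_rate = college_rate(2, 3)  # 260.68 admitted / 849 applicants

# Department level: unit of analysis = applicants within one department
for dept, (m_n, m_pct, f_n, f_pct) in table.items():
    print(f"Dept {dept}: male {m_pct}%, female {f_pct}%")

# College level: unit of analysis = all applicants pooled
print(f"College: male {male_rate:.0f}%, female {female_rate:.0f}%")
```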


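THE BIG PICTURE

| | | | Tabular | Graphic | Numerical |
|---|---|---|---|---|---|
| Univariate | Discrete | Nominal | Frequency Table | Bar Chart | Mode |
| | | Ordinal | | | Mode, Median, Mean |
| | Continuous | Interval | | Histogram | |
| | | Ratio | | | |
| Bivariate | Discrete-Discrete | | Contingency Table | TBC | TBC |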
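The numerical column of the big-picture table lists the mode, median, and mean. As an illustration, Python's standard `statistics` module computes all three; the data here are the Practice Problem 4 percentages, a continuous variable, so all three summaries are meaningful. This is a sketch using that one example dataset, not a general recipe for choosing a measure.

```python
import statistics

# Percentage of the adult workforce made up of women, 25 countries (Practice Problem 4)
data = [37, 39, 46, 39, 46, 42, 39, 27, 48, 39, 42, 42, 45,
        45, 42, 39, 30, 47, 42, 30, 37, 42, 46, 42, 42]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle value of the sorted data (n = 25, so the 13th)
mean = statistics.mean(data)      # arithmetic average

print("mode:", mode)
print("median:", median)
print("mean:", mean)
```

For a nominal variable only the mode applies, since its values have no order; the median additionally requires ordered values, and the mean requires meaningful distances between values.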