1.2 Data for SOC 4015

If you are enrolled in SOC 4015, you have a choice:

  1. The easier option is to follow the instructions in this section, which direct you to a pre-selected data set and provide you with a list of possible outcomes to “study” for the final project.
  2. The more difficult option is to identify your own data set. If you want to pursue this option, see the instructions in the next section.

1.2.1 The General Social Survey

The General Social Survey (GSS) is fielded by the National Opinion Research Center (NORC), which is based at the University of Chicago. It has been in continuous operation since 1972:

The GSS gathers data on contemporary American society in order to monitor and explain trends and constants in attitudes, behaviors, and attributes. Hundreds of trends have been tracked since 1972. In addition, since the GSS adopted questions from earlier surveys, trends can be followed for up to 70 years.

The GSS contains a standard core of demographic, behavioral, and attitudinal questions, plus topics of special interest. Among the topics covered are civil liberties, crime and violence, intergroup tolerance, morality, national spending priorities, psychological well-being, social mobility, and stress and traumatic events.

You can read more about the GSS on NORC’s website.

A data set containing the 2016 General Social Survey variables has been created and made available through GitHub. Since some questions are asked only in certain years, the data set that I am providing is limited to questions asked in 2016. Keep this in mind as you look through the documentation!

1.2.2 Picking an Outcome

There are at least 37 continuous variables (or combinations thereof) that make sense for use as outcomes for the final project. Pick two outcomes, a first choice and a second choice, from the list below.

Every effort will be made to give you your first choice. However, no more than two students may use the same outcome variable, so if two or more of your colleagues share your first choice, I will randomly select the two students who may use it.

Possible outcome variables:

  1. hrs1 - respondent’s hours worked last week
  2. prestg10 - prestige of respondent’s occupation
  3. prestg105plus - prestige of respondent’s occupation, alternate formula

  4. sphrs1 - spouse’s hours worked last week
  5. sppres10 - prestige of spouse’s occupation
  6. sppres105plus - prestige of spouse’s occupation, alternate formula

  7. papres10 - prestige of father’s occupation
  8. papres105plus - prestige of father’s occupation, alternate formula

  9. mapres10 - prestige of mother’s occupation
  10. mapres105plus - prestige of mother’s occupation, alternate formula

  11. sibs - number of siblings
  12. hompop - number of people living in household
  13. babies - number of household members under 6 years of age
  14. preteen - number of household members between 6 and 12 years of age
  15. teens - number of household members between 13 and 17 years of age
  16. adults - number of household members over 17 years of age
  17. unrelat - number of household members not related
  18. earnrs - number of earners in the household

  19. sei10 - respondent’s socioeconomic index
  20. spsei10 - spouse’s socioeconomic index
  21. pasei10 - father’s socioeconomic index
  22. masei10 - mother’s socioeconomic index

  23. snsmyear - year first joined social network
  24. intwkdyh - internet use, weekday, hours and intwkdym - internet use, weekday, minutes
  25. intwkenh - internet use, weekend, hours and intwkenm - internet use, weekend, minutes

  26. racethwh - ten-point scale for racial identity, white
  27. racethhi - ten-point scale for racial identity, Latino
  28. racethbl - ten-point scale for racial identity, black or African American
  29. racethas - ten-point scale for racial identity, Asian
  30. racethna - ten-point scale for racial identity, Native American
  31. racethot - ten-point scale for racial identity, other

  32. usualhrs - usual number of hours worked per week
  33. mosthrs - greatest number of hours worked per week in last month
  34. leasthrs - least number of hours worked per week in last month

  35. numwomen - number of female partners respondent has had sex with since their 18th birthday
  36. nummen - number of male partners respondent has had sex with since their 18th birthday

  37. agekdbrn - age at birth of first child
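
Outcomes 24 and 25 in the list above each combine two variables, one recording hours and one recording minutes. A minimal sketch of how such a pair might be merged into a single continuous measure is shown here in Python. The function name `total_hours` and the example values are hypothetical, and the handling of missing values is an assumption (a missing component counts as zero unless both are missing), so check the code book before adopting it.

```python
from typing import Optional


def total_hours(hours: Optional[float], minutes: Optional[float]) -> Optional[float]:
    """Combine an hours variable and a minutes variable into decimal hours.

    Assumption: if only one component is missing (None), it is treated as
    zero; the result is missing only when both components are missing.
    """
    if hours is None and minutes is None:
        return None
    return (hours or 0) + (minutes or 0) / 60


# Hypothetical respondents, each a pair of (intwkdyh, intwkdym) values.
respondents = [(2, 30), (0, 45), (5, None), (None, None)]

weekday_hours = [total_hours(h, m) for h, m in respondents]
print(weekday_hours)  # [2.5, 0.75, 5.0, None]
```

The same function would apply to the weekend pair, intwkenh and intwkenm.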

There may be other variables in the GSS that can be used as well. The major requirements are that the variable was asked of a majority of respondents in 2016 and that it is continuous. If you find another variable that you think may work, check with Chris before proceeding.
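
The two requirements can be screened roughly before you ask. The sketch below is only an illustration: the function `check_outcome`, the cutoff of ten distinct values as a stand-in for "continuous", and the example data are all hypothetical, and it assumes missing responses have already been converted to `None`.

```python
def check_outcome(values, min_distinct=10):
    """Summarize whether a candidate outcome meets the two requirements.

    Assumptions: missing responses are stored as None, and having at least
    `min_distinct` different valid values is used as a rough proxy for
    "continuous". Neither rule comes from the GSS documentation.
    """
    if not values:
        raise ValueError("values must not be empty")
    valid = [v for v in values if v is not None]
    response_rate = len(valid) / len(values)
    n_distinct = len(set(valid))
    return {
        "response_rate": response_rate,
        "n_distinct": n_distinct,
        "majority_asked": response_rate > 0.5,
        "looks_continuous": n_distinct >= min_distinct,
    }


# Hypothetical responses for a candidate variable (None = not asked or no answer).
candidate = [40, 35, None, 50, 20, 45, None, 60, 38, 42, 15, 55]

result = check_outcome(candidate)
print(result["majority_asked"], result["looks_continuous"])  # True True
```

A variable that passes this screen still needs to be approved, since a numeric variable with many distinct values is not necessarily a sensible continuous outcome.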

1.2.4 Selecting Independent Variables

Use the code book included in the final project data release to see if variables similar to those you identified above are also in the GSS. Feel free to also use variables not mentioned in the articles, as long as you can make an argument that they are plausibly connected to the outcome. Your goal here is to create a theoretically motivated list of independent variables that is rooted in the literature. Once again, do this only for the first-choice variable you’ve selected.

If this were a more substantial project, you would want to look at far more articles than just two! We typically conduct full literature searches before selecting a group of variables to use in a particular analysis.