1.3 Data for SOC 5050
These instructions apply to two groups of students:
- You are enrolled in SOC 5050, or
- you are enrolled in SOC 4015 and want to pick your own data set rather than use the 2016 General Social Survey
For both groups, these instructions lay out the process for identifying an appropriate data set for your final project.
1.3.1 Characteristics of an Appropriate Data Set
Data sets for your final project should have the following characteristics:
- It should have a substantial sample size of at least several hundred respondents
- There should be a continuous variable that is a suitable outcome variable, i.e. something whose variation we can estimate using other variables in the data
- There should be at least six to eight possible constructs that can serve as independent variables, i.e. the variables used to estimate variation in the outcome
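If you want a quick first screen of a candidate data set against the three criteria above, something like the following sketch may help. It is not a required part of the assignment, and it makes several assumptions: the data are stored as one Python dict per respondent, "several hundred" is read as 300, "six to eight constructs" is read as a minimum of 6, and "continuous" is approximated by a numeric outcome with at least 10 distinct values. The names `check_dataset` and `DatasetCheck` and the example variables are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds: "several hundred respondents" and "six to eight
# constructs" are interpreted here as 300 and 6. Adjust to taste.
MIN_RESPONDENTS = 300
MIN_CONSTRUCTS = 6
# Rough heuristic (an assumption): a numeric outcome with at least this many
# distinct values is treated as continuous.
MIN_DISTINCT_VALUES = 10


@dataclass
class DatasetCheck:
    n_respondents: int
    outcome_is_continuous: bool
    n_constructs: int

    @property
    def appropriate(self) -> bool:
        """True only if all three criteria are met."""
        return (
            self.n_respondents >= MIN_RESPONDENTS
            and self.outcome_is_continuous
            and self.n_constructs >= MIN_CONSTRUCTS
        )


def check_dataset(rows: list, outcome: str, constructs: list) -> DatasetCheck:
    """Screen a data set, stored as one dict per respondent, against the criteria."""
    values = [row[outcome] for row in rows if row.get(outcome) is not None]
    # Booleans are excluded because bool is a subclass of int in Python.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    continuous = numeric and len(set(values)) >= MIN_DISTINCT_VALUES
    # Count only the candidate constructs that actually appear in the data.
    present = {c for c in constructs if any(c in row for row in rows)}
    return DatasetCheck(len(rows), continuous, len(present))


# Hypothetical example: 400 respondents with a count outcome "sushi".
rows = [
    {"sushi": i % 25, "age": 18 + i % 60, "income": i * 10, "educ": i % 5,
     "region": i % 4, "married": i % 2, "kids": i % 3}
    for i in range(400)
]
check = check_dataset(rows, "sushi", ["age", "income", "educ", "region", "married", "kids"])
print(check.appropriate)  # True
```

The distinct-value rule is deliberately crude: a variable like `married` (two values) is flagged as not continuous, but you should still confirm the level of measurement in the codebook rather than rely on a count.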
1.3.2 Other Considerations
For those of you enrolled in SOC 5050, there are a few other considerations to take into account. If you have already identified a possible thesis topic, pick a data set that is either a possible candidate for inclusion in your thesis or, at the very least, conceptually related to it. You want to maximize the impact of your coursework: even if you are not sure whether the data set itself will be useful, choosing one in the same topic area means that your literature search can be reused in future assignments (such as in your Research Methods course).
1.3.3 Finding an Appropriate Data Set
In general, you are free to use any resource to identify a suitable data set that meets the above criteria, with a few caveats:
- There is not time for you to collect your own data.
- There is not time for you to go through the IRB process to gain access to confidential data (either data that is not publicly available or data collected by a thesis adviser or other faculty member).
- The data you use should be licensed for re-use (it cannot be proprietary or otherwise restrictively licensed).
- The data should be well documented; you want to be certain what each variable represents. If there is no codebook or other documentation, the data set is probably not appropriate for this project. See Chris if you have questions about this.
- The data should not be the 2016 General Social Survey. See Chris if you want to use another iteration of the GSS.
- If your data are longitudinal, you will want to pick data from one time period.
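For longitudinal data, restricting your analysis to one time period amounts to keeping only the rows from a single wave and confirming that each respondent then appears once. The sketch below assumes a "long" layout (one dict per respondent-wave observation) with hypothetical column names for the wave and the respondent ID; your data set's documentation will tell you what these are actually called.

```python
from collections import Counter


def select_wave(rows: list, wave_var: str, wave, id_var: str) -> list:
    """Keep only observations from one time period of a longitudinal data set.

    rows: one dict per respondent-wave observation (hypothetical layout).
    wave_var: name of the column identifying the time period.
    id_var: name of the respondent identifier column.
    """
    subset = [row for row in rows if row.get(wave_var) == wave]
    # Within a single wave, each respondent should appear at most once.
    counts = Counter(row[id_var] for row in subset)
    repeated = sorted(rid for rid, n in counts.items() if n > 1)
    if repeated:
        raise ValueError(f"respondents repeated within wave {wave}: {repeated}")
    return subset


# Hypothetical panel: three respondents observed in waves 1 and 2.
panel = [{"id": rid, "wave": w, "sushi": rid * w} for rid in (1, 2, 3) for w in (1, 2)]
wave_two = select_wave(panel, "wave", 2, "id")
print([row["id"] for row in wave_two])  # [1, 2, 3]
```

If your data instead arrive in a "wide" layout (one row per respondent, with separate columns for each wave), the equivalent step is to keep only the columns for the wave you choose.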
If you are not sure where to start, the best option is to search ICPSR, the Inter-university Consortium for Political and Social Research. SLU is an institutional member, so you will want to log in to ICPSR’s website by using this link and entering your SLU login credentials. Once you are logged in, you can use the search tools to conduct keyword searches.
1.3.4 A Quick Literature Search
Once you have an outcome identified, go to Sociological Abstracts, enter your SLU login credentials, and conduct a keyword search using the main construct represented by your selected outcome variable.
For example, if I picked a hypothetical variable sushi, representing the number of times the respondent had eaten sushi in the last year, I might use “sushi” or “Japanese food” as search terms.
Look for two recent peer-reviewed articles (i.e. published within roughly the last twenty years) that analyze this same outcome quantitatively (i.e. using statistics), read them, and note the independent variables used. Be aware that Sociological Abstracts also returns theses and other documents, so restrict your search and reading to peer-reviewed journal articles.
Qualitative studies can (and should!) also be used to inform variable selection, but since you are only being asked to find two relevant articles, we are going to prioritize quantitative research here.
1.3.5 Selecting Independent Variables
Use the documentation included with your data to see whether variables similar to those you identified above are also in the data set you’ve selected. You may also use variables not mentioned in the articles, as long as you can argue that they are plausibly connected to the outcome. Your goal is a theoretically motivated list of independent variables rooted in the literature.
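One way to keep track of this cross-check is to list each construct from the articles alongside the variable names you would look for, then compare that list against the variables documented in your codebook. The sketch below is a hypothetical bookkeeping aid, not a required step: the function name, the constructs, and the variable names are all illustrative assumptions.

```python
def match_constructs(from_literature: dict, codebook: set) -> tuple:
    """Match constructs found in the articles to variables in your data set.

    from_literature: construct -> candidate variable names to look for
                     (hypothetical names you would take from the codebook).
    codebook: the variable names documented for your data set.
    Returns (matched, missing): matched maps each construct to the variables
    that exist; missing lists constructs with no matching variable.
    """
    matched = {}
    missing = []
    for construct, candidates in from_literature.items():
        found = [name for name in candidates if name in codebook]
        if found:
            matched[construct] = found
        else:
            missing.append(construct)
    return matched, missing


# Hypothetical example: constructs noted from two articles on sushi consumption.
literature = {
    "age": ["age", "age_cat"],
    "income": ["income", "hh_income"],
    "urban residence": ["urban"],
}
codebook = {"age", "hh_income", "sex", "sushi"}
matched, missing = match_constructs(literature, codebook)
print(matched)   # {'age': ['age'], 'income': ['hh_income']}
print(missing)   # ['urban residence']
```

Constructs that end up in `missing` may still be measurable through a proxy variable; any variable you add that was not in the articles still needs a plausible argument connecting it to the outcome.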
If this were a more substantial project, you would want to look at far more articles than just two! We typically conduct full literature searches before selecting a group of variables to use in a particular analysis.