
SOCI 4392 & SOCI 3320: Research Methods & Sociology Capstone Research Guide

Data Basics

When doing research in Sociology, you may find it helpful to refer to data or statistics to support your arguments or provide more context. But what exactly are data and statistics? To start, it's important to note that these terms mean different things and should not be used interchangeably. Data can be considered unique pieces of information that can be analyzed, while statistics are often the result of doing that analysis to answer questions of "why" or "how."

 

 

You can also watch the following video from the University of Houston to learn more about different types of data and how they might be used:

Acknowledgment: Information in the box came from the University of Houston Libraries Finding Data Research Guide.

Data Definition

Data is defined as facts or information that can be used for reporting, calculations, planning, or analysis. Data can be analyzed and interpreted using statistical procedures to answer “why” or “how.” Data is used to create new information and knowledge, and has the following characteristics:

  • "Disaggregated" collection of observations with one or more characteristics
  • Generally requires manipulation or extraction using software tools
  • Can be values or observations of characteristics

Qualitative data describes the qualities or characteristics of something. It is non-numerical and often collected through interviews, participant observation, and focus groups. It can be subjective and typically describes a perception or point of view. It is particularly useful for gaining cultural insight into the social contexts and beliefs of a particular population. Qualitative data can take the form of field notes, audio, transcripts, and video.

Quantitative data attempts to quantify the answer to one or more questions. It is numerical and often collected through measurements, surveys, and observations. Quantitative data is usually analyzed in programs such as Excel, R, SPSS, and Stata.

Additional Terminology:

  • Primary Data are data collected directly through your own research study, using instruments such as surveys, observations, etc.
  • Raw Data are the actual observations recorded when the data is collected, before any processing or analysis.
  • Secondary Data are data from a research study conducted by someone else. When you are asked to locate statistics on a topic, you are usually using secondary data.
  • Time Series is a sequence of data points recorded at successive points in time, usually at regular intervals.
  • Codebook describes the contents, structure, and layout of a data collection.
  • Data Archive preserves research data and makes it accessible.
  • Microdata are data at the lowest level of observation, such as individual answers to questions. For example, the U.S. Census Bureau's Public-Use Microdata Samples (PUMS files) contain individual housing unit responses to census questions.
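To make the difference between data and statistics concrete, here is a minimal Python sketch that turns microdata (one record per household) into a statistic (median household income for each area). The records, the field names `area` and `household_income`, and the values are all hypothetical; real PUMS files use their own variable names, which are documented in their codebooks. Real survey microdata such as PUMS also include sampling weights that an actual estimate would need to account for; this sketch ignores them.

```python
from collections import defaultdict
from statistics import median

# Hypothetical microdata: one record per household (invented values).
microdata = [
    {"household_id": 1, "area": "A", "household_income": 42000},
    {"household_id": 2, "area": "A", "household_income": 58000},
    {"household_id": 3, "area": "A", "household_income": 61000},
    {"household_id": 4, "area": "B", "household_income": 35000},
    {"household_id": 5, "area": "B", "household_income": 47000},
]


def median_income_by_area(records):
    """Aggregate household-level records into one (unweighted) median per area."""
    incomes = defaultdict(list)
    for record in records:
        incomes[record["area"]].append(record["household_income"])
    return {area: median(values) for area, values in incomes.items()}


print(median_income_by_area(microdata))  # {'A': 58000, 'B': 41000.0}
```

Each record is data; the per-area median is a statistic derived from it.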

Open Data

Open data and content can be freely used, modified, and shared by anyone for any purpose. Open data should:

  • be free, with no monetary cost associated with its use.
  • be usable by as many people or organizations as possible. It must be available in a machine-readable format that is easily accessible for processing on computers.
  • be available for commercial and non-commercial purposes and for combining with other datasets.
  • require no login or personal account to access.
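Machine readability is what allows software to process a dataset directly instead of someone retyping numbers from a PDF or web page. As a minimal sketch, the Python below parses a small CSV (comma-separated values) file, a common machine-readable format for open data, using only the standard library. The column names, area name, and values are invented for illustration and do not come from any real dataset.

```python
import csv
import io

# A tiny, invented CSV file standing in for a downloaded open dataset.
raw_csv = """year,area,median_household_income
2014,Example County,50000
2015,Example County,52000
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(raw_csv)
for row in rows:
    # CSV values arrive as strings; convert numbers before analysis.
    print(row["year"], int(row["median_household_income"]))
```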

For more information about open data and examples of open data, see Unlocking the Power of Open Data, a tutorial created by the Data Equity for Main Street Project, a partnership between the California State Library and the Washington State Office of the Chief Information Officer funded by the John S. and James L. Knight Foundation.

Proprietary data

Use of proprietary data is generally governed by contracts, and such data legally should not be published or disclosed to outside entities. Proprietary data may be protected under copyright, patent, or trade secret laws. Examples of proprietary data include:

  • financial data
  • product research and development
  • computer software
  • business processes and marketing strategies

Data from library subscription databases are proprietary. Using them requires authentication, and they generally cannot be shared freely on the web or with people outside the university.

Restricted-use data

Restricted-use data contain sensitive information (i.e., information that could cause harm if revealed) or information that could enable the identification of respondents. Data may also be restricted-use because of confidentiality promises or proprietary restrictions.

Examples of sensitive information are reports of sexual behavior, criminal history, drug use, mental health history, HIV status, information collected from minors, or other materials that warrant extra discretion.

However, such data also offer valuable research opportunities. The government and the University therefore aim to ensure that restricted data is handled in a way that safeguards respondents and research subjects while still allowing research that benefits society as a whole. Files containing confidential information are available to researchers only under certain conditions and agreements. Standard requirements may include the following:

  • Your institution must classify you as a Principal Investigator (PI), eligible to lead a research project.
  • Proof of Institutional Review Board (IRB) review. For more information about the IRB, contact the IRB office.

Strategies for finding & evaluating data

Be specific about your topic so that you can narrow your search, but be flexible enough to tailor your needs to existing sources.

Identify the Unit of Analysis

You should be able to define the following:

Who or What?

Social Unit: This is the population that you want to study.
It can be...

  • People
    For example: individuals, couples, households
  • Organizations and Institutions
    For example: companies, political parties, nation states
  • Commodities and Things
    For example: crops, automobiles, arrests

When?

Time: This is the period of time you want to study.
Things to think about...

  • Point in time
    A "snapshot" or one-time study
  • Time Series
    Study changes over time
  • Current information
    Keep in mind that there is usually a time lag before data is published. The most current information available may be a couple of years old.
  • Historical information

 Where?

Space: Geography or place.
There are two main types of geographic classifications...

  • Political boundaries
    For example: nation, state, county, school district, etc.
  • Statistical/census geography
    For example: metropolitan statistical areas, tracts, block groups, etc.

Keep in mind...
Data is not available for every conceivable topic. Some data is private, must be purchased, was never collected, or is otherwise unavailable. Be prepared to try alternative data sources.

Content from MSU Libraries-Finding Data & Statistics

 

Once you have defined the boundaries of your topic, you can use them to identify search terms, or keywords, to get started in the search process. This will help make your search efficient and effective, save you time, and yield more relevant results.

  • To get started, divide your topic into different pieces and identify the main concepts.

For example, suppose you are studying education equity in schools and would like to collect median household income for Houston from 2010 to 2015. For this question, your main concepts are 2010-2015, median household income, and Houston.

  • Use these concepts as keywords when searching for your data. Be sure to consider synonyms and word variations when coming up with appropriate search terms.

For example, instead of “Houston household income” you might try the search terms “Houston income,” “Houston family income,” or “Houston family earnings.”
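The keyword strategy described above amounts to choosing one term for each main concept and trying the combinations. A minimal Python sketch of that idea for the Houston example follows; the synonym lists are illustrative only, not a recommended vocabulary, and you would still run each query yourself.

```python
from itertools import product

# One list of interchangeable terms per main concept (illustrative only).
concepts = [
    ["Houston"],
    ["median household income", "household income", "income",
     "family income", "family earnings"],
    ["2010-2015"],
]


def build_queries(concept_terms):
    """Return every search string formed by picking one term per concept."""
    return [" ".join(terms) for terms in product(*concept_terms)]


for query in build_queries(concepts):
    print(query)
```

The number of queries grows multiplicatively with each synonym you add, so it helps to keep each list short.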

 

Acknowledgment: Information in the box came from the University of Houston Libraries Finding Data Research Guide.

Search strategy #1: Search in a Data Archive

This is a good strategy if you are not sure what types of variables exist or what data would be relevant for your project. Look in a data archive that collects data in the general subject area you are researching.

Search strategy #2: Targeted search

Ask yourself: Who might collect and publish this type of data? This can be a good strategy if you are familiar with library databases or have a sense of who is a major source of the data you are seeking. Visit the Data and Statistics Library Databases page, or go to the website of a relevant organization to look for data. This guide also lists several commonly used secondary data sources you can try.

These are some of the main types of producers of statistical information:

  • Government Agencies: The government collects data to aid in policy decisions and is the largest producer of statistics overall. The U.S. Census Bureau, the City of San Antonio, and the State of Texas are all examples of government agencies that collect data.
  • Non-Government Organizations: Many independent non-commercial and nonprofit organizations, as well as intergovernmental organizations, collect and publish statistics that support their missions. For example, the International Monetary Fund, United Nations, World Health Organization, and many others collect and publish statistics.
  • Academic Institutions: Academic research projects funded by public and private foundations create a wealth of data. For example, the Michigan State of the State Survey and many other research projects publish statistics based on their data collection projects. Some statistical publications are available freely online, but others may require access through library resources.
  • Private Sector: Commercial firms collect and publish data and statistics as a paid service to clients or to sell broadly. Examples include marketing firms, pollsters, trade organizations, and business information providers. This information is almost always fee-based and may not be available for public release.

Search strategy #3: Turn to the literature.

By searching through existing literature, you can discover datasets. When you find a relevant article, it may point you to the dataset it used. What data sources are they using in their methods?  Are they working with general-purpose datasets, or did the researchers have to collect their own data? This will give you an idea of the possibilities and limitations of data on your topic. If they are using a secondary dataset, you can try to track that source down. Knowing the exact name of a specific source (or even better, the DOI) can make it much easier to locate.

The library provides access to hundreds of databases that you can search to find scholarly articles on your subject. Check out our Databases A-Z list and limit by subject to find databases that may work for your subject area.

Search strategy #4: Google Search

I know, it's obvious! When searching Google, be sure to identify your topic keywords carefully and try using synonyms. Add terms like “data” or “statistics,” or method terms such as “survey.”

You may need to include the dates and variables you are looking for in your search, for example, “2010 Houston median household income.” If you are getting too few results, try decreasing the number of concepts in your search. For example, we could change our original search to “Houston income” or “Houston household income.” Another way to broaden your search is to use synonyms or related terms. In our example, you might also try “Houston family income” or “Houston family earnings.”
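Broadening a search by removing concepts one at a time can be written out as a simple procedure: start from the most specific query and drop the least essential term at each step. A minimal Python sketch using the Houston example, where the drop order (date first, then "median," then "household") is an illustrative choice rather than a rule:

```python
def broaden(terms, drop_order):
    """Return progressively broader queries, most specific first.

    `terms` is the full query as an ordered list of words; `drop_order`
    lists the words to remove, one per step, least essential first.
    Each word in `drop_order` must appear in `terms` (list.remove raises
    ValueError otherwise).
    """
    current = list(terms)
    queries = [" ".join(current)]
    for word in drop_order:
        current.remove(word)
        queries.append(" ".join(current))
    return queries


full_query = ["2010", "Houston", "median", "household", "income"]
for query in broaden(full_query, ["2010", "median", "household"]):
    print(query)
```

This prints the four queries from the example, from “2010 Houston median household income” down to “Houston income.”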

Search strategy #5: Ask for help.

Contact your subject librarian for assistance if you encounter problems in locating the data source you need.

Content from MSU Libraries-Finding Data & Statistics

When considering whether to use data created by someone else in your research, it is important to evaluate the data to determine its usefulness to your work as well as its validity and trustworthiness. To do this, ask yourself these questions:

  • What is the purpose of the data?
    • What context was it meant to be used in? 
    • Does the data seem to have a bias towards one outcome or answer? Can you acknowledge that bias and still use the data responsibly?
  • Who collected the data?
    • Does the person who collected the data have the necessary skills or training to do so? 
    • Are they an expert in the field? 
  • When was the information collected?
    • Some topics require data collected over a long time period while others require very timely data. 
  • How was the information obtained?
    • What tools did the collector use and are those standard for their field?
    • Was the sample size large enough for the study?
    • What was the response rate among those sampled?
    • How were the respondents selected? Who was asked? Who wasn’t? Why or why not?
  • Does the data make sense in context of previous studies?
    • Data that contradicts other data in the field should be viewed with skepticism. What did the researchers do differently to cause such a contradiction?
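One of the evaluation questions above asks about the response rate. In its simplest form, a response rate is the share of eligible sampled people who completed the survey. Survey research organizations such as the American Association for Public Opinion Research (AAPOR) publish more detailed standard definitions that handle partial responses and cases of unknown eligibility. A minimal Python sketch of the simple version, using hypothetical numbers:

```python
def response_rate(completed, eligible_sampled):
    """Simple response rate: completed responses / eligible sampled units.

    This is a simplified definition; published standards (e.g., AAPOR's)
    distinguish partial interviews, unknown eligibility, and more.
    """
    if eligible_sampled <= 0:
        raise ValueError("eligible_sampled must be positive")
    if not 0 <= completed <= eligible_sampled:
        raise ValueError("completed must be between 0 and eligible_sampled")
    return completed / eligible_sampled


# Hypothetical survey: 2,000 eligible people sampled, 640 responded.
rate = response_rate(640, 2000)
print(f"{rate:.1%}")  # 32.0%
```

A low response rate does not automatically make data unusable, but it is a reason to ask who did not respond and why.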

 

Acknowledgment: Information in the box came from the University of Houston Libraries Finding Data Research Guide.

Data & Statistics

What is secondary data? Secondary data is data that was collected in the past by someone other than the researcher using it. This data can be analyzed to address the researcher’s questions.

Below are examples of resources that provide data or statistics on a variety of topics. This list is not meant to be exhaustive. For more examples, see the Data for Social Sciences Guide.

Public Opinion

Crime and Criminal Justice

Education

Health

State Resources

Campaign & Elections Data

Economics & Housing

The online data sources below cover several disciplines including health, the social sciences, science, and more!