
In the modern age, data is everywhere. We are so inundated with technology and information that we almost instinctively recognize certain bits of information as ‘data.’ Because data points are so easy to spot, we can be overwhelmed by their sheer volume and struggle to recognize where the most relevant and meaningful data comes from. But the power of data in biostatistics cannot be overstated. Without data, there is no study, and without a study, there is nothing to draw answers from. And with the wrong data, the wrong conclusions could be drawn.

Important steps to take when working with data:

Understand the Investigation’s Objectives and What Data Is Needed

Once a public health crisis is identified, an epidemiologist will launch a field investigation and define what data are necessary and relevant to the crisis and its response. Food recalls are among the most familiar responses, and they are enacted thanks to investigations that took food and grocery habits into account. It’s important to recognize the objective of the study when you develop a list of data points to examine. Are you trying to identify the source of an outbreak? Do you need to identify behaviors that facilitate the spread of illness? Are there environmental factors that need to be identified? Understanding the objectives and purposes of an investigation is critical to its success.

Make the Most of Data Sources

After identifying objectives, an epidemiologist should look for sources that can address them. As stated in the introduction, we are inundated with data collections that could address any number of questions. Don’t assume that all information needs to be collected from scratch. Numerous statistics and data points can be used without even leaving the lab:

  • Mortality statistics – Death rates and mortality statistics have been collected for hundreds of years, and are commonly available to the public through many sources.
  • Disease Reporting – The Council of State and Territorial Epidemiologists typically designates which diseases and conditions should be reported to the CDC, but each state and territory has its own reporting mandates.
  • Laboratory Data – Most states require laboratories that identify diseases or causative agents to send information to public health agencies. 
  • Population Surveys – Though census data is useful, here we use ‘population surveys’ to mean surveys of specific populations. For example, the Pregnancy Risk Assessment Monitoring System tracks issues relating to pregnancy.
  • Environmental Exposure Data – Environmental contaminants and zoonotic diseases are two of the most common routes of disease exposure. Environmental data is hugely important for tracking disease outbreaks, and it can come from any number of areas. Animal migration patterns or location distribution could fall under this category, and local information such as water systems and housing quality could be useful as well.
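
As a small illustration of putting an existing source such as mortality statistics to work, the sketch below computes a crude death rate per 100,000 people so that regions of different sizes can be compared. The function name, region names, and counts are all hypothetical and chosen only for illustration. They are not real data.

```python
def crude_rate_per_100k(deaths: int, population: int) -> float:
    """Return the crude death rate: deaths per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    if deaths < 0:
        raise ValueError("deaths cannot be negative")
    return deaths / population * 100_000


# Hypothetical counts for two made-up regions (not real data).
regions = {
    "Region A": {"deaths": 120, "population": 250_000},
    "Region B": {"deaths": 45, "population": 60_000},
}

# Compute one rate per region so the regions can be compared directly.
rates = {
    name: crude_rate_per_100k(r["deaths"], r["population"])
    for name, r in regions.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} deaths per 100,000")
```

In this made-up example the region with fewer deaths has the higher rate, which is why raw counts from a data source usually need to be put on a common scale before they are compared.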


Because of the power that data has in biostatistics, we must take care to assess the purpose of our data collection appropriately and wisely, as well as the quality and efficiency of our sources.