Where Does Healthcare Data Come From? Identifying Sources of Data in the Healthcare Industry

When we talk about Big Data in the healthcare industry, or in any industry, we’re not talking about a single repository of data. Instead, we’re referring to a large volume of data aggregated from many sources. While each of those sources could be the subject of its own thorough review and investigation, we’ll briefly touch on the ones that most commonly feed aggregated healthcare data.

Clinical Data

Clinical data is most likely what comes to mind when the average person thinks about healthcare data. It’s usually derived from sources like trials and health records and can be informative when it comes to health trends, demographic data, and treatment development.

Electronic Health Records (EHR)

Electronic Health Records (EHRs) contain information about individual patients. On its own, each record serves the same purpose as a physical file: it holds a single patient’s history, medications, diagnoses, and other pertinent information. Collectively, however, EHRs can provide extremely useful information regarding trends, patterns, treatment success, prescription efficacy, and more.
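To make the "collectively" point concrete, here is a minimal Python sketch of how individual records might be rolled up into a simple trend. The records, field names, and the `improvement_rates` helper are all hypothetical and invented for illustration; real EHR systems use far richer, standardized formats, and real analyses would need de-identification and much larger samples.

```python
from collections import defaultdict

# Hypothetical, de-identified records. Field names and values are
# illustrative assumptions, not a real EHR schema.
records = [
    {"patient_id": "p1", "diagnosis": "type 2 diabetes", "medication": "drug_a", "improved": True},
    {"patient_id": "p2", "diagnosis": "type 2 diabetes", "medication": "drug_a", "improved": False},
    {"patient_id": "p3", "diagnosis": "type 2 diabetes", "medication": "drug_b", "improved": True},
    {"patient_id": "p4", "diagnosis": "hypertension", "medication": "drug_c", "improved": True},
    {"patient_id": "p5", "diagnosis": "type 2 diabetes", "medication": "drug_a", "improved": True},
]


def improvement_rates(records):
    """Return the share of patients who improved, per (diagnosis, medication)."""
    totals = defaultdict(int)
    improved = defaultdict(int)
    for record in records:
        key = (record["diagnosis"], record["medication"])
        totals[key] += 1
        if record["improved"]:
            improved[key] += 1
    return {key: improved[key] / totals[key] for key in totals}


rates = improvement_rates(records)
for (diagnosis, medication), rate in sorted(rates.items()):
    print(f"{diagnosis} / {medication}: {rate:.0%} improved")
```

No single record says anything about how well a medication works; the pattern only appears once many records are grouped and compared.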

Clinical Trials Data

Perhaps the most obvious source of clinical data is clinical trials data. These trials are conducted frequently to make a variety of medical determinations. Some clinical trials exist to test the efficacy of a certain treatment on a particular disease or condition. Others exist to deepen existing knowledge about a certain condition and more fully explore treatment options based on the data collected. While clinical trials aren’t typically designed to feed the broader pool of healthcare data, they inevitably do so anyway. Even a narrowly focused trial can inform that larger pool and lead to benefits that reach far beyond the trial’s original scope and intent.

Health Surveys

Health surveys also contribute to healthcare data. Surveys can be used in a number of ways and can provide data on patient habits, patient sentiment, lifestyle choices, satisfaction ratings, health and safety practices, and more. This data, when analyzed in aggregate in a Big Data context, can provide valuable information that healthcare researchers and practitioners can use to make better-informed decisions.

Cost Data

Cost data includes Medicare and Medicaid records, group practice costs, and physician accounting records, to name a few. While volumes could be written on each contributing source of cost data in the healthcare system, the key point is that knowing where money is spent, where waste occurs most frequently, and where funds are used most efficiently can help healthcare organizations (HCOs) and administrative teams. With cost data, these teams and organizations can develop strategies and policies that address areas of waste and create more efficient financial practices. This data can also help software developers and technology innovators create products and services that better address the accounting and record-keeping needs of HCOs and facility management teams. Additionally, efforts to improve how cost data is collected can improve the quality of the information gathered from these sources.

Patient-generated Health Data (PGHD)

Unlike clinical data, where physicians and healthcare professionals are in charge of collecting and disseminating the data, patient-generated health data (PGHD) comes entirely from patients. It’s the patients who decide what to share, with whom to share it, and how to share it. For example, a patient who uses a blood glucose meter at home and tracks the measurements is generating PGHD once they share those readings with their physician or a trial.

Other types of PGHD include health and treatment histories that patients provide to their caregivers. This type of information is very often divulged during the course of initial paperwork but can be shared verbally, as well, and recorded by the healthcare practitioner. Biometric data, such as the blood glucose meter readings mentioned above, can also be informative, as can patient descriptions of symptoms. Finally, a patient’s lifestyle choices can greatly impact the efficacy of any treatments they receive, and any information they’re willing to divulge, when assessed cumulatively with other patients’ lifestyle data, can be a wealth of information all its own.
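As a rough illustration of how shared biometric readings could be summarized before they are pooled with other patients’ data, here is a small Python sketch. The patient labels, the readings, the `summarize` helper, and the `high_threshold` value are all hypothetical; in particular, the threshold is an arbitrary number chosen for the example, not clinical guidance.

```python
from statistics import mean

# Hypothetical home blood glucose readings (mg/dL) that two patients
# chose to share. All values are invented for illustration.
readings = {
    "patient_a": [95, 110, 102, 180],
    "patient_b": [140, 150, 165],
}


def summarize(values, high_threshold=140):
    """Summarize one patient's readings.

    high_threshold is an arbitrary illustrative cutoff, not a clinical one.
    """
    high_count = sum(1 for value in values if value > high_threshold)
    return {
        "count": len(values),
        "mean": mean(values),
        "share_high": high_count / len(values),
    }


summaries = {patient: summarize(values) for patient, values in readings.items()}
for patient, summary in summaries.items():
    print(patient, summary)
```

Each summary describes only what one patient decided to share; the value for researchers comes from comparing many such summaries side by side.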

Claims Data

Claims data, such as insurance claim reporting and health services research information, can help inform the healthcare community about insurance-related and coverage-related activity. Health expenditure data, for example, can help provide information about where money is going within the system and what areas of healthcare cost the most. Understanding the strengths and limitations of databases, processes, and procedures within the claims and insurance industry can also be a benefit as it can help professionals in that area to develop better practices. Health services research can provide insights as to how the various healthcare services are performing and where there is room for increased efficiency and overall improvement. Medicaid and Medicare data also helps to inform researchers, policymakers, and physicians alike as they seek to better understand which claims are made, how often they’re made, who tends to make them, and where increased quality and efficiency or additional programs could benefit the most patients.


There are many forms and sources of healthcare data that contribute to Big Data and analytics. The more information we have, the better decisions we can make and the more appropriate solutions we can formulate as an industry, from clinical care to artificial intelligence and technology. By pushing the industry to treat healthcare data as a centralized asset rather than as departmentalized, siloed record-keeping, Big Data can only improve the system at large and the quality of care delivered within it. No single source of data would be sufficient to tell the healthcare industry what needs to be done and which areas of need are most pressing. But when we combine all the available sources of reliable data, real change can happen.