Counting and units

Establish counts of records and unique records

When linking data, it is important to keep track of the number of records at each level of analysis.

Before making any alterations to your data, establish the initial count of rows (records) in each dataset. Also note the number of unique persons and events in each table. These counts will help you identify duplicate or missing records, or other potential errors, before you begin detailed analysis.
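A minimal sketch of recording these initial counts, assuming each table is held as a list of dictionaries with hypothetical ID fields `person_id` and `admission_id` (the field names and the toy rows are illustrative only, not the datasets described in this guide):

```python
from typing import Iterable


def initial_counts(rows: list[dict], id_fields: Iterable[str]) -> dict[str, int]:
    """Return the total row count plus the number of distinct values of each ID field."""
    counts = {"rows": len(rows)}
    for field in id_fields:
        # A set keeps only distinct values, so its size is the unique count.
        counts[f"unique_{field}"] = len({row[field] for row in rows})
    return counts


# Hypothetical admissions table: intended to hold one row per admission episode.
admissions = [
    {"person_id": 1, "admission_id": "A1"},
    {"person_id": 1, "admission_id": "A2"},
    {"person_id": 2, "admission_id": "A3"},
    {"person_id": 3, "admission_id": "A4"},
    {"person_id": 3, "admission_id": "A5"},
    {"person_id": 4, "admission_id": "A6"},
]

print(initial_counts(admissions, ["person_id", "admission_id"]))
# {'rows': 6, 'unique_person_id': 4, 'unique_admission_id': 6}
```

If `rows` were larger than `unique_admission_id` in a table meant to hold one row per admission, that gap would point to duplicate records worth investigating before any further processing.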

This is another limitation of being provided with wide data that incorporates multiple levels: you are relying on the data analyst who prepared your data to have joined the datasets correctly. Data providers are not immune to making mistakes, so any such mistakes should be made as easy to detect as possible, for example by checking the counts in the wide file against the counts expected from the source tables.
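One way to make join errors easy to detect is to check a wide file against the counts you expect. The sketch below assumes a wide table that should contain one row per admission, with hypothetical `admission_id` and `person_id` fields; the function name and the toy data, including the deliberately duplicated row, are illustrative assumptions rather than a prescribed procedure:

```python
from collections import defaultdict
from typing import Optional


def check_wide_table(rows, event_field, person_field, expected_events: Optional[int] = None):
    """Return a list of problems in a wide table that should have one row per event."""
    problems = []
    row_count = defaultdict(int)          # rows seen for each event ID
    persons_for_event = defaultdict(set)  # distinct persons linked to each event ID
    for row in rows:
        row_count[row[event_field]] += 1
        persons_for_event[row[event_field]].add(row[person_field])
    for event, n in row_count.items():
        if n > 1:
            problems.append(f"{event_field}={event!r} appears in {n} rows")
        if len(persons_for_event[event]) > 1:
            problems.append(f"{event_field}={event!r} is linked to {len(persons_for_event[event])} persons")
    # Compare against the count established from the source table before the join.
    if expected_events is not None and len(row_count) != expected_events:
        problems.append(f"expected {expected_events} distinct {event_field} values, found {len(row_count)}")
    return problems


# Hypothetical wide file in which a faulty join duplicated A2 and dropped A5 and A6.
wide = [
    {"person_id": 1, "admission_id": "A1", "sex": "F"},
    {"person_id": 1, "admission_id": "A2", "sex": "F"},
    {"person_id": 1, "admission_id": "A2", "sex": "F"},
    {"person_id": 2, "admission_id": "A3", "sex": "M"},
    {"person_id": 3, "admission_id": "A4", "sex": "M"},
]

for problem in check_wide_table(wide, "admission_id", "person_id", expected_events=6):
    print(problem)
# admission_id='A2' appears in 2 rows
# expected 6 distinct admission_id values, found 4
```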

Unit of analysis

Identifying the appropriate 'unit of analysis' is an important step in analysing linked data.

To give a simple example, consider the datasets introduced earlier.

You can see that there are 6 people in the study cohort, of whom only 4 are represented in the hospital admissions data. These 4 people account for a total of 6 admissions.

If you were to calculate statistics related to deaths based on this data, you could say:

  • 1/6 = 16.7% of the cohort died, or
  • 1/4 = 25.0% of the admitted people died, or
  • 1/6 = 16.7% of all admission episodes were recorded as ending in a death.
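The three statistics above share a numerator (one death) but use different denominators. A minimal sketch that keeps each numerator and denominator together, using hypothetical toy data constructed only to reproduce the counts quoted above (6 cohort members, 4 admitted people, 6 admissions, 1 death); for simplicity it assumes the death is known from an admission record ending in death, which is an illustrative assumption and not necessarily how the earlier datasets record deaths:

```python
# Hypothetical toy data matching the counts in the text; not the actual datasets.
cohort = {1, 2, 3, 4, 5, 6}          # person IDs in the study cohort
admissions = [                       # (admission_id, person_id, ended_in_death)
    ("A1", 1, False),
    ("A2", 1, False),
    ("A3", 2, False),
    ("A4", 3, False),
    ("A5", 3, True),
    ("A6", 4, False),
]

admitted = {person for _, person, _ in admissions}
died = {person for _, person, ended in admissions if ended}

# Each statistic stores (numerator, denominator) so the unit of analysis stays explicit.
stats = {
    "cohort members who died": (len(died & cohort), len(cohort)),
    "admitted people who died": (len(died & admitted), len(admitted)),
    "admission episodes ending in death": (
        sum(1 for _, _, ended in admissions if ended),
        len(admissions),
    ),
}

for label, (num, den) in stats.items():
    print(f"{label}: {num}/{den} = {num / den:.1%}")
# cohort members who died: 1/6 = 16.7%
# admitted people who died: 1/4 = 25.0%
# admission episodes ending in death: 1/6 = 16.7%
```

The first two statistics count persons (the cohort or the admitted subset), while the third counts admission episodes, so identical-looking fractions can describe different units.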

Which of these calculations is meaningful depends on your research question. However, when dealing with linked data measured at different levels, you must be clear about both the numerator and the denominator of your analysis.

Last updated: 6 November 2018