Introduction and references

Introduction

The ultimate aim of working with linked data should not be to produce a single, integrated dataset containing all the information from each table. 'Linkage' should not be confused with 'merging':

For records that can be linked, data merging refers to the process of combining individual records (or information in those records) into an integrated dataset. (National Statistical Service, 2015)

Linking separate datasets by a common identifier, or key, is what makes this merging possible: only the subsets of each dataset that a given question requires need to be combined.
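The distinction between linking and merging can be sketched in a few lines of Python. This is a minimal illustration, not a recommended implementation: the tables `sites` and `samples`, the key `site_id`, the helper `merge_on_key` and all values are hypothetical, and the helper assumes the key is unique in the right-hand table.

```python
# Two hypothetical tables linked by the shared key "site_id" (illustrative data only).
sites = [
    {"site_id": "S1", "region": "North", "elevation_m": 120},
    {"site_id": "S2", "region": "South", "elevation_m": 45},
    {"site_id": "S3", "region": "East", "elevation_m": 300},
]
samples = [
    {"sample_id": 1, "site_id": "S1", "count": 14},
    {"sample_id": 2, "site_id": "S1", "count": 9},
    {"sample_id": 3, "site_id": "S2", "count": 21},
    {"sample_id": 4, "site_id": "S9", "count": 5},  # no matching site record
]


def merge_on_key(left, right, key, right_fields):
    """Inner join: attach selected fields from `right` to each matching `left` row."""
    # Assumption: `key` is unique in `right`, so each key maps to one record.
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = lookup.get(row[key])
        if match is None:
            continue  # this record cannot be linked, so it is left out of the merge
        merged.append({**row, **{field: match[field] for field in right_fields}})
    return merged


# Merge only the subset needed for one question: samples from site S1, with region.
s1_samples = [row for row in samples if row["site_id"] == "S1"]
merged = merge_on_key(s1_samples, sites, key="site_id", right_fields=["region"])
for row in merged:
    print(row)
```

The two tables stay separate and are linked only through `site_id`. The merged result is built on demand and holds just the rows and columns needed, rather than every field from every table.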

References

Borer, E.T., Seabloom, E.W., Jones, M.B., & Schildhauer, M. (2009). Some simple guidelines for effective data management. Bulletin of the Ecological Society of America, 90(2), 205–214. https://doi.org/10.1890/0012-9623-90.2.205
Braun, M.T., Kuljanin, G., & DeShon, R.P. (2018). Special considerations for the acquisition and wrangling of Big Data. Organizational Research Methods, 21(3), 633–659. https://doi.org/10.1177/1094428117690235
Broman, K.W., & Woo, K.H. (2018). Data organization in spreadsheets. The American Statistician, 72(1), 2–10. https://doi.org/10.1080/00031305.2017.1375989
Ellis, S.E., & Leek, J.T. (2018). How to share data for collaboration. The American Statistician, 72(1), 53–57. https://doi.org/10.1080/00031305.2017.137598
Hart, E.M., Barmby, P., LeBauer, D., Michonneau, F., Mount, S., Mulrooney, P., Poisot, T., Woo, K.H., Zimmerman, N.B., & Hollister, J.W. (2016). Ten simple rules for digital data storage. PLoS Computational Biology, 12(10), e1005097. https://doi.org/10.1371/journal.pcbi.1005097
ORNL DAAC [Oak Ridge National Laboratory Distributed Active Archive Center]. (2018). Data Management - Best Practices for Data Management. Accessed 20 July 2018 from https://daac.ornl.gov/datamanagement/
Murrell, P. (2013). “Data Intended for Human Consumption, Not Machine Consumption,” in Bad Data Handbook, ed. McCallum, Q.E. Sebastopol, CA: O’Reilly Media, pp. 31–51.
National Statistical Service. (2015). A Guide for Data Integration Projects Involving Commonwealth Data for Statistical and Research Purposes. Accessed 18 July 2018 from https://statistical-data-integration.govspace.gov.au/project-delivery/linking-and-merging-of-data
Schildhauer, M. (2018). “Data Integration: Principles and Practice,” in Ecological Informatics, ed. Recknagel, F. & Michener, W.K. Springer, Cham, pp. 129–157.
Strasser, C.A., Cook, R., Michener, W.K., & Budden, A. (2012). Primer on Data Management: What you always wanted to know. UC Office of the President: California Digital Library http://escholarship.org/uc/item/7tf5q7n3
White, E.P., Baldridge, E., Brym, Z.T., Locey, K.J., McGlinn, D.J., & Supp, S.R. (2013). Nine simple ways to make it easier to (re)use your data. Ideas in Ecology and Evolution, 6(2), 1–10. https://doi.org/10.4033/iee.2013.6b.6.f
Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. Accessed 18 July 2018 from http://r4ds.had.co.nz/relation-data.html
(Join diagrams reproduced here under Creative Commons licence: https://creativecommons.org/licenses/by-nc-nd/3.0/us/)
Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1–23. https://www.jstatsoft.org/article/view/v059i10/v59i10.pdf
Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., & Teal, T.K. (2017). Good enough practices in scientific computing. PLoS Computational Biology, 13(6), e1005510. https://doi.org/10.1371/journal.pcbi.1005510

Last updated: 9 November 2018