Journal: Topics in Spinal Cord Injury Rehabilitation
Year, Volume, Issue, Page(s):, 26, 4, 221-231
Background: Linking records from the National Spinal Cord Injury Model Systems (SCIMS) database to the National Trauma Data Bank (NTDB) provides a unique opportunity to study early variables as predictors of long-term outcomes after traumatic spinal cord injury (SCI). The public use data sets of SCIMS and NTDB are stripped of protected health information, including dates and zip code.
Objectives: To develop and validate a probabilistic algorithm linking data from an SCIMS center and its affiliated trauma registry.
Method: Data on SCI admissions from 2011 to 2018 were retrieved from an SCIMS center (n = 302) and its trauma registry (n = 723); 202 records shared the same medical record number in both sources. The SCIMS records were divided equally into two data sets, one for algorithm development and one for validation. We used a two-step approach: blocking, followed by weight generation for the linking variables (race, insurance, height, and weight).
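The two-step approach can be illustrated with a minimal sketch. It assumes blocking on sex, age, and injury year (the variables shared by candidate pairs in the Results) and substitutes standard Fellegi-Sunter agreement weights for the weight-generation step; the abstract does not specify how the weights were actually derived. The field names, the m/u probabilities, and the threshold are hypothetical and would in practice be estimated from the development set.

```python
import math
from collections import defaultdict

# Assumed blocking variables (taken from the Results section).
BLOCK_KEYS = ("sex", "age", "injury_year")

# Hypothetical (m, u) probabilities per linking variable:
# m = P(agree | true match), u = P(agree | non-match).
M_U = {
    "race": (0.95, 0.40),
    "insurance": (0.85, 0.25),
    "height": (0.80, 0.10),
    "weight": (0.75, 0.05),
}

def block(scims, trauma):
    """Step 1: yield only SCIMS-trauma pairs that agree on every blocking key."""
    index = defaultdict(list)
    for t in trauma:
        index[tuple(t[k] for k in BLOCK_KEYS)].append(t)
    for s in scims:
        for t in index.get(tuple(s[k] for k in BLOCK_KEYS), []):
            yield s, t

def match_weight(s, t):
    """Step 2: sum Fellegi-Sunter log2 weights over the linking variables."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if s[field] == t[field]:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total

def link(scims, trauma, threshold=3.0):
    """Return (scims_id, trauma_id, weight) for pairs at or above the threshold."""
    links = []
    for s, t in block(scims, trauma):
        w = match_weight(s, t)
        if w >= threshold:  # hypothetical cutoff
            links.append((s["id"], t["id"], w))
    return links
```

In this sketch, a pair that agrees on all four linking variables accumulates a large positive weight, while a pair within the same block that disagrees on all of them receives a negative weight and is rejected.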
Results: In the development set, 257 SCIMS-trauma pairs shared the same sex, age, and injury year across 129 clusters, of which 91 were true matches. The probabilistic algorithm identified 65 of the 91 true matches (sensitivity, 71.4%) with a positive predictive value (PPV) of 80.2%. The algorithm was validated over 282 SCIMS-trauma pairs across 127 clusters and had a sensitivity of 73.7% and a PPV of 81.1%. Post hoc analysis showed that adding injury date and zip code improved specificity from 57.9% to 94.7%.
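The development-set figures can be checked with a short calculation. The count of 81 flagged pairs is not stated in the abstract; it is inferred here as the integer consistent with 65 true positives and a PPV of 80.2%, and should be read as an assumption. The function name is illustrative.

```python
def linkage_metrics(true_positives, total_true_matches, total_flagged):
    """Return (sensitivity, PPV) for a set of proposed links."""
    sensitivity = true_positives / total_true_matches
    ppv = true_positives / total_flagged
    return sensitivity, ppv

# 65 of 91 true matches identified (from the abstract);
# 81 flagged pairs is inferred from the reported PPV, not reported directly.
sens, ppv = linkage_metrics(true_positives=65, total_true_matches=91, total_flagged=81)
print(f"sensitivity = {sens:.1%}, PPV = {ppv:.1%}")
```

Under that assumption, 16 of the flagged pairs would be false links and 26 true matches would be missed.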
Conclusion: We demonstrate the feasibility of probabilistic linkage between SCIMS and trauma records, although the algorithm needs further refinement and validation. Gaining access to injury date and zip code would substantially improve record linkage.