VDH LTCF Outbreaks Dataset Inconsistencies

Posted by: 0PENalbert1979 - Posted on:

VDH’s LTCF Outbreaks dataset comes in two flavors: machine-readable (CSV) and Tableau (a third-party vendor). VDH took almost two weeks after publishing the Tableau flavor to release the machine-readable version. Annoying as that is, I mistakenly assumed the two were the same dataset, so I stopped performing daily dataops on the Tableau version and focused only on the CSV.

My fault entirely. I apologize, Virginia.

Today it was brought to my attention that they are not the same dataset. Silly me. The raw CSV has data that the Tableau version does not, which you can see by downloading both and searching for Canterbury Health and Rehab in Henrico County: the raw data returns two records, while the Tableau version returns one. More differences may exist between the two formats, but at the moment I’m not willing to pull the data out of the Tableau version to diff the two. Not because it isn’t worth doing, but because I simply have no time. Before this, the only differences obvious to me were in the columns: the raw data has a FIPS column that the Tableau version lacks, and the Tableau version adds three essentially worthless columns. I don’t consider those column differences dealbreakers or even noteworthy, unlike the difference in actual data that this post points out.

Screenshot of VDH LTCF Dataset for 2020-07-09 in Tableau, showing one record for Canterbury Health and Rehab.
Screenshot of VDH LTCF Dataset for 2020-07-09 in CSV (rendered in GitHub), showing two records for Canterbury Health and Rehab.
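
For anyone who would rather script the search than eyeball it, here is a minimal Python sketch of counting the records for one facility in a CSV. The column name `Facility` and the helper `count_facility_rows` are my assumptions for illustration, not confirmed VDH headers, and the sample rows are placeholders rather than real VDH records; adjust them to match the downloaded file.

```python
import csv
import io


def count_facility_rows(csv_file, facility, column="Facility"):
    """Count rows whose `column` value contains `facility`, ignoring case.

    `column` is an assumed header name; change it to match the real file.
    """
    reader = csv.DictReader(csv_file)
    needle = facility.casefold()
    return sum(
        1 for row in reader
        if needle in (row.get(column) or "").casefold()
    )


# Placeholder rows for illustration only; these are NOT real VDH records.
SAMPLE_CSV = """Locality,FIPS,Facility
Example County,00001,Example Care Center
Example County,00001,Example Care Center
Other County,00002,Another Facility
"""

if __name__ == "__main__":
    print(count_facility_rows(io.StringIO(SAMPLE_CSV), "example care center"))
```

With a real download, you would pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample.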

You can reproduce this by downloading both the raw data and the Tableau data and comparing them. Unfortunately, the live downloads will stop showing this discrepancy once VDH posts its daily data update in the morning. After the data changes, the archive I’ve been maintaining throughout COVID-19 and the Wayback Machine provide reproducibility. Tableau makes archiving from the source incredibly difficult, so I exported the Tableau version as PDF and TWBX. The CSV version is also backed up in the repository and, more importantly, in the Wayback Machine for all to access and view.
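
If someone does get the Tableau data into CSV form, the comparison could be sketched roughly like this in Python. This is not something I have run against the real files: it assumes both exports share `Locality` and `Facility` columns (my guess at the headers), and the helper names and sample rows are hypothetical placeholders. It only compares how many times each facility appears, so extra columns such as FIPS are ignored.

```python
import csv
import io
from collections import Counter

# Assumed key columns; change these to match the real headers.
KEY_COLUMNS = ("Locality", "Facility")


def facility_counts(csv_file, key_columns=KEY_COLUMNS):
    """Count how many rows each (locality, facility) key has."""
    reader = csv.DictReader(csv_file)
    return Counter(
        tuple((row.get(col) or "").strip().casefold() for col in key_columns)
        for row in reader
    )


def count_differences(raw_file, tableau_file, key_columns=KEY_COLUMNS):
    """Return {key: (raw_count, tableau_count)} wherever the counts differ."""
    raw = facility_counts(raw_file, key_columns)
    tableau = facility_counts(tableau_file, key_columns)
    return {
        key: (raw[key], tableau[key])  # a Counter yields 0 for missing keys
        for key in sorted(raw.keys() | tableau.keys())
        if raw[key] != tableau[key]
    }


# Placeholder rows for illustration only; these are NOT real VDH records.
RAW_SAMPLE = """Locality,FIPS,Facility
Example County,00001,Example Care Center
Example County,00001,Example Care Center
Other County,00002,Another Facility
"""

TABLEAU_SAMPLE = """Locality,Facility
Example County,Example Care Center
Other County,Another Facility
"""

if __name__ == "__main__":
    diffs = count_differences(io.StringIO(RAW_SAMPLE), io.StringIO(TABLEAU_SAMPLE))
    for key, (raw_n, tableau_n) in diffs.items():
        print(key, "raw:", raw_n, "tableau:", tableau_n)
```

Matching on names is fragile if the two versions spell a facility differently, so treat any reported difference as a lead to check by hand rather than proof.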
