VDH LTCF Outbreaks Dataset Inconsistencies
VDH’s LTCF Outbreaks dataset comes in two flavors: a machine-readable CSV and a version published through third-party vendor Tableau. VDH released the machine-readable version almost two weeks after the Tableau version. Annoying as that delay was, I mistakenly assumed the two were the same dataset, so I stopped performing daily dataops on the Tableau version and focused only on the CSV.
My fault entirely; I apologize, Virginia.
Today it was brought to my attention that they are not the same dataset, silly me. The raw CSV has data that the Tableau version does not, which you can see by downloading both and searching for Canterbury Health and Rehab in Henrico County: the raw data returns two entries, while the Tableau version returns one. More differences may exist between the two formats, but at the moment I’m not willing to pull the data out of the Tableau version to diff the two. Not because it isn’t worth it, but because I simply have no time. Before this, the only obvious differences to me were in the columns: the raw data has a FIPS column that the Tableau version lacks, and the Tableau version adds three essentially worthless columns. I don’t consider either of these column differences a dealbreaker, nor are they as noteworthy as the differences in the actual data that this post points out.
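If you’d rather not eyeball the two downloads, the comparison can be scripted. Below is a minimal Python sketch, assuming both versions are already available as CSV text (the Tableau data would first have to be pulled out into a CSV, which I haven’t done). The column headers `Facility Name` and `Locality` are hypothetical placeholders, not VDH’s actual headers, and the helper names are my own; substitute whatever the real exports use. `count_facility_rows` repeats the Canterbury Health and Rehab search, and `row_diff` compares only the columns both versions share, so the extra FIPS column and the Tableau-only columns don’t register as differences.

```python
import csv
from collections import Counter
from typing import Iterable, Sequence, Tuple

# Hypothetical column headers; replace with the headers the real exports use.
FACILITY_COL = "Facility Name"
LOCALITY_COL = "Locality"


def _cell(row: dict, col: str) -> str:
    """Return a stripped cell value, treating missing cells as empty."""
    return (row.get(col) or "").strip()


def count_facility_rows(lines: Iterable[str], facility: str, locality: str) -> int:
    """Count rows whose facility and locality match, ignoring case."""
    return sum(
        1
        for row in csv.DictReader(lines)
        if _cell(row, FACILITY_COL).lower() == facility.lower()
        and _cell(row, LOCALITY_COL).lower() == locality.lower()
    )


def row_diff(
    a: Iterable[str], b: Iterable[str], columns: Sequence[str]
) -> Tuple[Counter, Counter]:
    """Compare two CSVs on shared columns; return rows only in a and only in b.

    A Counter (multiset) is used so duplicated rows are counted, not collapsed.
    """
    def tally(lines: Iterable[str]) -> Counter:
        return Counter(
            tuple(_cell(row, c) for c in columns) for row in csv.DictReader(lines)
        )

    count_a, count_b = tally(a), tally(b)
    return count_a - count_b, count_b - count_a
```

An open file works as the `lines` argument, e.g. `with open("raw.csv", newline="") as f: count_facility_rows(f, "Canterbury Health and Rehab", "Henrico")`, where the file name is a placeholder and the locality string must match however the export spells it. The diff uses a `Counter` rather than a `set` so that a facility listed twice in one file and once in the other still shows up as a difference.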


To reproduce this, download the raw data and the Tableau data and compare them. Unfortunately, this only works until VDH posts its daily data update each morning. Once the data changes, the archive I’ve been maintaining throughout COVID-19 and the Wayback Machine provide reproducibility. Because Tableau makes archiving from the source incredibly difficult, I exported the Tableau version as PDF and TWBX. The CSV version is also backed up in the repository and, more importantly, in the Wayback Machine, for all to access and view.
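For anyone reproducing this after the data has changed, the Wayback Machine’s public Availability API can locate the capture closest to a given date. The sketch below only builds the request URLs and reads a decoded response; `target` stands for the CSV’s URL (not given here), the timestamp is any `YYYYMMDD` prefix you choose, and the function names are hypothetical helpers of my own, not part of any library.

```python
from typing import Optional
from urllib.parse import urlencode

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(target: str, timestamp: str) -> str:
    """Build an Availability API query for the capture of `target` closest to `timestamp`."""
    return AVAILABILITY_API + "?" + urlencode({"url": target, "timestamp": timestamp})


def snapshot_url(target: str, timestamp: str) -> str:
    """Build a direct Wayback URL; the Wayback Machine redirects to the nearest capture."""
    return f"https://web.archive.org/web/{timestamp}/{target}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest capture's URL from a decoded Availability API response, if any."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Fetching `availability_query(...)` with `urllib.request.urlopen` and decoding the body with `json.load` yields the dictionary that `closest_snapshot` expects.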