Why Oneida County briefly showed the most COVID-19 symptoms in Upstate NY
By Brenton Recht
Facebook recently released a map of COVID-19 symptoms based on a survey of Facebook users conducted by Carnegie Mellon University's Delphi Research Center. Local journalists quickly noticed that Oneida County stuck out compared to its neighbors.
According to the map, 1.97% of surveyed Facebook users in Oneida County reported symptoms, the highest percentage in NY state outside of the NY Metro area.
In a news conference, Oneida County Executive Anthony Picente claimed the map was “irresponsible” and misrepresented Oneida County. Oneida County’s percentage has subsequently decreased on Facebook’s map to less than 1%, a number close to other upstate counties.
What happened? Did Oneida County really have a spike in COVID-19 cases last week?
The Delphi Research Center makes its data available online. In addition to what Facebook shows on its map, this data includes the standard error of each estimate, which is a measure of how uncertain the estimate is. There will always be some uncertainty in any real-world survey.
Below is a graph of the survey results in Oneida County. Each dot is the estimate for one day. The shaded area is an estimate of a confidence interval for each dot. Don’t worry too much about the exact definition of confidence interval: just think of it as the range in which the actual value for the entire population probably falls, given the degree of uncertainty.
Higher standard error means wider confidence intervals. The confidence intervals in this graph use a standard approach in statistics, assuming a normally distributed variable. The 95% confidence interval is approximately the estimate plus or minus two times the standard error.
The confidence intervals are pretty large, especially around the April 12th data point used in Facebook’s original map.
There the interval ranges from 0.8% to 3.2%. The lower end of that interval is similar to the estimates for other upstate counties. The intervals begin to tighten after April 13th and 14th.
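For readers who want to check the arithmetic, here is a minimal Python sketch of the "plus or minus two standard errors" rule. The standard error used here is not taken from Delphi's files; it is backed out of the 0.8% to 3.2% interval quoted above, so treat it as an assumption. The function name confidence_interval is just an illustrative choice.

```python
def confidence_interval(estimate, std_error, z=2.0):
    """Approximate 95% interval for a normally distributed estimate.

    z=2.0 is the "two standard errors" rule of thumb (1.96 is the exact value).
    """
    margin = z * std_error
    return estimate - margin, estimate + margin


# Oneida County, April 12: estimate shown on the map, in percent.
estimate = 1.97

# Assumption: recover the standard error from the reported interval.
# With z = 2, the full width of the interval is 4 standard errors.
reported_low, reported_high = 0.8, 3.2
implied_se = (reported_high - reported_low) / 4

low, high = confidence_interval(estimate, implied_se)
print(f"implied standard error: {implied_se:.2f} percentage points")
print(f"approximate 95% interval: {low:.2f}% to {high:.2f}%")
```

Under these assumptions the standard error works out to roughly 0.6 percentage points, which is large next to an estimate of about 2%.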
Let’s compare with the borough of Queens – one of the hardest-hit areas in the US – and the state of NY as a whole.
These confidence intervals are much smaller. What this comes down to is simple: there are a lot more people in Queens and New York State than Oneida County.
In their methodology section, Delphi states that “As of mid April, about 150,000 such surveys were filled daily throughout the United States.” A ballpark estimate amounts to about 100 surveys a day for Oneida County, 1,000 in Queens, and 9,000 in all of New York State, possibly fewer before mid-April. It’s a basic fact of survey design that the more respondents, the better the estimate.
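One way to arrive at ballpark figures like these is to assume surveys are spread across the country in proportion to population. The sketch below does that with rounded population figures, which are my own approximations and not part of Delphi's data. It also uses the textbook rule that, for a simple random sample, the standard error shrinks with the square root of the number of respondents; Delphi's actual estimates may involve weighting or smoothing that this ignores.

```python
import math

# From Delphi's methodology statement quoted above.
US_DAILY_SURVEYS = 150_000

# Assumption: rounded, approximate populations (not from the article).
POPULATIONS = {
    "United States": 330_000_000,
    "New York State": 19_500_000,
    "Queens": 2_250_000,
    "Oneida County": 230_000,
}


def daily_surveys(region):
    """Ballpark daily surveys, assuming responses are proportional to population."""
    share = POPULATIONS[region] / POPULATIONS["United States"]
    return US_DAILY_SURVEYS * share


def relative_interval_width(region, baseline):
    """How wide the interval in `region` is relative to `baseline`.

    Textbook simple-random-sample scaling: standard error ~ 1 / sqrt(n).
    """
    return math.sqrt(daily_surveys(baseline) / daily_surveys(region))


for region in ("Oneida County", "Queens", "New York State"):
    print(f"{region}: about {daily_surveys(region):,.0f} surveys a day")

width = relative_interval_width("Oneida County", baseline="Queens")
print(f"Oneida County intervals would be about {width:.1f}x as wide as Queens'")
```

With roughly ten times as many respondents, Queens would get intervals about a third as wide as Oneida County's, all else being equal.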
Oneida County isn’t unique. Other parts of the country have expressed concern about county mapping results.
If one looks across the U.S. on Facebook’s map, less-populated areas stick out. These outliers indicate run-of-the-mill statistical uncertainty, not a rash of isolated COVID-19 hot spots.
Real-world data is never perfect. Understanding that uncertainty helps prevent confusion and false conclusions, whether here in Oneida County or elsewhere.
Brenton Recht is a data scientist from Whitesboro, NY, who is currently working in property and casualty insurance.
Python code to obtain Delphi’s data and create the graphs in this article is available on GitHub Gist.