How Covid-19 stats are grossly exaggerated: a brief summary of the data

Sep 30, 2021

This is a brief and accessible summary of the various sources of Covid-19 statistical inaccuracies that, when combined, lead to a gross exaggeration of the direct harm from the virus.

Tam Hunt, J.D.

Blaine Williams, M.D.

Daniel Howard, Ph.D.

The US FDA warned in November 2020 that up to 96% of all Covid-19 antigen test positives could be false positives in screening programs at a low 0.1% active disease prevalence. We have been at or near that level of active disease prevalence for most of the pandemic.

UK government officials warned in internal email discussions that up to 98% of the antigen test positive results, after antigen tests were rolled out widely for weekly or even biweekly screening in that country, could be false positives because of low active disease prevalence.

A number of virologists warned in a New York Times article from August 2020 that PCR tests — the alleged gold standard for Covid-19 testing — were being used in such a way that up to 90% of the test positives were effectively false positives.

Similarly, a January 2021 study by Stanford researchers, published in a CDC journal, found that fully 96% of PCR test positives in people who had no symptoms were non-infectious, contradicting the then-widespread view that asymptomatic Covid-19 carriers could spread the virus.

A Pfizer vaccine study released in October 2021, which evaluated the vaccine’s effectiveness six months after the jab, used full genetic sequencing (the technique used to study Covid-19 “variants” but also to determine PCR test accuracy) to verify PCR test results and found that 91% of the PCR test positives were false positives (see appendix table S4).

A 2021 peer-reviewed study of pediatric Covid-19 hospitalizations (under 22 years old) in a large Southern California hospital found 86% were either not actually related to Covid-19 or were “minimally related,” and that the patients were in the hospital for other reasons not related to Covid-19, but were still counted as “Covid-19 hospitalizations.”

For older patients, who inevitably acquire lasting comorbidities (average number of comorbidities in Covid-19-related deaths tracked by CDC is 4.0), this tendency to count patients as “Covid-19 hospitalizations” due to incidental or minimally-related Covid-19 will increase.

In 2021, the Italian health minister stated that 88% of all deaths being attributed to Covid-19 in that country were not causally related to Covid-19: “On re-evaluation by the [Italian] National Institute of Health, only 12 per cent of death certificates have shown a direct causality from coronavirus, while 88% patients who have died have at least one pre-morbidity — many had two or three.”

What’s going on with all of these numbers showing very significant exaggeration or over-statement of the direct impacts from Covid-19?

In what follows, we’re going to review the definitions for a Covid-19 “case,” “hospitalization” and “death” in an attempt to show how these terms were defined to be maximally inclusive, and thus highly exaggerated, all in the name of “erring on the side of caution.”

What happened with the implementation of these definitions and related policies was a catastrophic backfire: erring on the side of caution led to a massively distorted perception of the actual harm from the virus, and the large majority of the harm that did in fact occur resulted from policy choices and individual behavior rather than from the virus.

The end result is that the cases, hospitalizations and deaths from Covid-19 are in most areas exaggerated such that the actual numbers (i.e. causally and meaningfully related to the virus itself rather than pandemic policies or other normal causes of illness and death) are probably around 90% lower than reported. Let’s go through each category to get a better understanding of what happened.

How are Covid-19 cases defined?

(For a more detailed description of CDC’s Covid-19 case definition and the related history see here).

For almost the first time in history with respect to a respiratory illness, in April of 2020, CDC defined a “confirmed case” of Covid-19 as a positive lab test result only. A “probable case” was even more loosely defined, with not even a test required. No consideration of symptoms is required to be a confirmed or probable case (see also Cohen et al. 2020 noting the same).

Dr. Monica Gandhi at UC San Francisco affirmed our analysis, independently, in an essay in the San Francisco Chronicle:

In the history of epidemiology, experts have never counted individuals without symptoms into these definitions. A “case” was previously defined as a symptomatic person who displayed signs of illness stemming from a pathogen.

In practice, what has happened is complete reliance on positive test results, regardless of symptoms, for almost all “cases” tallied in public databases.

Previously, including for SARS and many other similar outbreaks, the definition of a confirmed case included consideration of symptoms as well as lab tests. Consideration of both of these data points (symptoms and test results) is highly important.

The CDC provides a page on the history of the surveillance case definitions for various diseases. Uniform case definitions for a number of diseases were first established by CDC and CSTE in 1990. The last time the general criteria were modified was in 1997. The 1997 report states:

“These case definitions are to be used for identifying and classifying cases, both of which are often done retrospectively, for national reporting purposes. They should not be used as criteria for public health action.”

This admonition was not heeded with respect to the Covid-19 pandemic. But it should have been.

Recent case definitions from CDC, for example for the SARS outbreak in 2003 and the H1N1 outbreak in 2009, required clinical symptoms plus laboratory confirmation for a case to be “confirmed”. The CDC’s 2003 case definition for SARS requires (p. 2): “Clinically compatible illness (i.e., early, mild-to-moderate, or severe) that is laboratory confirmed.”

The influenza (flu) case definition, last updated in 2012, also requires both clinical and lab evidence for a confirmed case: “A case that meets the clinical and laboratory evidence criteria.”

As previously quoted, the CDC’s “confirmed case” definition for Covid-19 requires only “confirmatory laboratory evidence.”

So the 2020 case definition for Covid-19 was a substantial break from the policies in place for decades prior to 2020. This change in case definition, all by itself, played a major role in how the pandemic unfolded and was exaggerated, as we’ll see below.

In sum, the new CDC Covid-19 surveillance case definition used a very loose set of criteria when it was adopted in April 2020, and it was loosened even further in its August 2020 update to allow antigen tests for laboratory confirmation. This set up the U.S. for a self-reinforcing chain of pandemic data that rests largely on this new case definition and on laboratory tests that are in many cases highly inaccurate.

Most other nations, following WHO’s guidance (with the U.S., Gates Foundation and UK, in that order, the three biggest WHO funders, giving rise to accusations of “philanthroimperialism” by some critics like Vandana Shiva), adopted similarly over-inclusive case definitions, which is why we’re seeing this same dynamic unfold in many other countries too.

Any future respiratory ailment should, based on common sense and good public policy, require consideration of symptoms as well as lab tests for a “confirmed case.”

How does this extremely inclusive definition of “case” affect case counts and other Covid-19 stats?

The primary way it affects case counts is through testing of asymptomatic people in screening and surveillance programs in schools, for travel requirements, in universities, hospitals, sports teams, and other workplaces where regular testing is conducted of all people or randomly chosen people.

As the FDA and UK government emails described in the quotes at the beginning of this piece, Covid-19 testing when disease prevalence is low will result in a vast majority of false positives in screening and surveillance programs (which, contrary to “diagnostic” testing, don’t require any symptoms to be seen in the people being tested).

And disease prevalence, in terms of active infections, has been low throughout the pandemic (generally well under 1% of the population, as various studies have found, including the large vaccine clinical trials from Moderna and Johnson & Johnson).

It’s counter-intuitive but here’s why false positives are such a problem: it’s because screening program tests will produce the same percentage of false positives at high or low disease prevalence, but as disease prevalence goes down the false positives will start to swamp the true positives. And with active disease prevalence well under 1% at all times during the pandemic this translates to the vast majority of screening program test positives being false positives.
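The base-rate effect described above can be sketched with Bayes’ rule. This is a minimal illustration rather than a model of any particular test: the function name and the sensitivity and specificity values are assumptions chosen for illustration (the text above gives only the prevalence and the resulting false-positive share), and the output shifts when those inputs change.

```python
def false_positive_share(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Fraction of all positive results that are false positives (1 - PPV)."""
    true_pos = prevalence * sensitivity                    # infected and test positive
    false_pos = (1.0 - prevalence) * (1.0 - specificity)   # uninfected but test positive
    return false_pos / (true_pos + false_pos)


# Hypothetical test characteristics, assumed for illustration only.
SENSITIVITY = 0.85
SPECIFICITY = 0.98

for prevalence in (0.10, 0.01, 0.001):
    share = false_positive_share(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"prevalence {prevalence:.1%}: {share:.0%} of positives are false")
```

With these assumed inputs, the false-positive share climbs from under a fifth at 10% prevalence to about 96% at 0.1% prevalence, even though the test itself has not changed.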

Harvard Medical School professor and epidemiologist Westyn Branch-Elliman recently wrote about this phenomenon in U.S. News and World Report. She and her coauthors described how, at the 0.1% or so Covid-19 active infection rate we’re seeing in schools this summer and fall, and with a 95% accurate test, we’re likely to see literally 71 out of 72 test positives be false positives.

You read that right: the professors write that just one of 72 test positives would in this scenario be a true positive. Recent CDC data found that active infections in kids were actually much lower than 0.1% even during the height of the Delta spike in August 2021, at about 0.03%, so the share of false positives as school Covid-19 screening programs are ramped up will probably be even higher than the already catastrophically high rates that Branch-Elliman and others are warning about.

Another major cause of false positives is overly aggressive use of PCR tests.

The New York Times dropped a quiet bombshell at the end of August of 2020 with a story titled “Your coronavirus test is positive. Maybe it shouldn’t be.” It quotes a number of academics and researchers who express strong concerns about how the “gold standard” PCR tests for the coronavirus are being applied.

In short, the tests are being applied in a way that amplifies their sensitivity far beyond what is warranted for tracking active infections of Covid-19 — which is the purpose of the PCR test.

Here’s the key quote from the article:

In three sets of testing data that include cycle thresholds, compiled by officials in Massachusetts, New York and Nevada, up to 90 percent of people testing positive carried barely any virus, a review by The Times found.
On Thursday, the United States recorded 45,604 new coronavirus cases, according to a database maintained by The Times. If the rates of contagiousness in Massachusetts and New York were to apply nationwide, then perhaps only 4,500 of those people may actually need to isolate and submit to contact tracing.

The article never uses the term “false positive” but the primary researchers who track this issue do use this term. Jefferson et al. 2020 “Viral cultures for COVID-19 infectivity assessment — a systematic review (Update 3)”, the third update issued by Jefferson’s team calibrating the accuracy of PCR tests (by using viral cultures) through comprehensive tracking of published test results around the world, concludes (emphasis added):

Prospective routine testing of reference and culture specimens are necessary for each country involved in the pandemic to establish the usefulness and reliability of PCR for Covid-19 and its relation to patients’ factors. Infectivity is related to the date of onset of symptoms and cycle threshold level. A binary Yes/No approach to the interpretation RT-PCR unvalidated against viral culture will result in false positives with segregation of large numbers of people who are no longer infectious and hence not a threat to public health.

In layman’s terms, the PCR tests are being used overly aggressively to amplify a very small signal, which is probably in most cases dead viral fragments, through an excessive number of cycles, or simply background human or microbial genetic material (due to insufficiently specific PCR test probes and primers).

By going beyond the now-established cycle threshold (CT) of 25–30 cycles for detecting live infections, the PCR tests are in most cases creating an artificial positive test result through excessive amplification.

Another quote from the NY Times article:

“I’m really shocked that it could be that high — the proportion of people with high C.T. value results,” said Dr. Ashish Jha, director of the Harvard Global Health Institute. “Boy, does it really change the way we need to be thinking about testing.”

The article adds, quoting another virologist:

Any test with a cycle threshold above 35 is too sensitive, agreed Juliet Morrison, a virologist at the University of California, Riverside. “I’m shocked that people would think that 40 could represent a positive,” she said.
A more reasonable cutoff would be 30 to 35, she added. Dr. Mina said he would set the figure at 30, or even less. Those changes would mean the amount of genetic material in a patient’s sample would have to be 100-fold to 1,000-fold that of the current standard for the test to return a positive result — at least, one worth acting on.
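The fold figures in the quote above can be roughly checked with a back-of-the-envelope sketch. It assumes idealized PCR in which the target material doubles every cycle; real reactions are less than perfectly efficient, so treat the numbers as approximations. The function name and the `efficiency` parameter are illustrative and do not come from the article.

```python
def fold_difference(ct_high: int, ct_low: int, efficiency: float = 1.0) -> float:
    """Approximate ratio of starting genetic material needed to turn positive
    by cycle ct_low rather than by cycle ct_high.

    efficiency=1.0 is the idealized case of perfect doubling each cycle.
    """
    return (1.0 + efficiency) ** (ct_high - ct_low)


# Compare a cutoff of 40 cycles against several lower cutoffs.
for cutoff in (35, 33, 30):
    print(f"cutoff {cutoff} vs 40: ~{fold_difference(40, cutoff):,.0f}-fold more material")
```

Under the perfect-doubling assumption, a 10-cycle difference corresponds to roughly a 1,000-fold difference in starting material, and a 7-cycle difference to roughly 100-fold.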

Dr. Mina is Harvard Medical School epidemiologist Michael Mina, an assistant professor at the Center for Communicable Diseases. He also told the NY Times:

In Massachusetts, from 85 to 90 percent of people who tested positive in July with a cycle threshold of 40 would have been deemed negative if the threshold were 30 cycles, Dr. Mina said. “I would say that none of those people should be contact-traced, not one,” he said.

If 85–90% of positive tests “would have been deemed negative,” these are indeed false positive test results, even though Mina, in a dialogue with one of the authors (Hunt), shied away from using that term (he preferred “late stage noninfectious positive,” rather clunky and avoiding the truth of the matter: that these are simply false positives).

These testing issues are just two of the ways in which the extremely loose “case definition” for Covid-19 and faulty testing has led to a vast inflation of case numbers. But it’s not just case numbers that are exaggerated. Case numbers form the beginning of the pandemic data chain and they affect all later links in that chain, as we’ll see below.

How are Covid-19 hospitalizations defined?

Let’s turn now to “Covid-19 hospitalizations,” the next link in the chain. This term is defined in many states and by CDC’s COVID-NET program as anyone who has tested positive for Covid-19 either in the 14 days prior to hospital admission (it doesn’t matter why the person is admitted to the hospital), or during admission to the hospital (almost all hospitals screen for Covid-19 during admission), or during their time in the hospital (many hospitals test patients in the hospital on a regular basis too).

None of these criteria require any causal connection between the positive test result and either symptoms of Covid-19 itself or any causal connection between Covid-19 and why the patient is in the hospital.

And if the large majority of these Covid-19 tests are false positives, as we’ve seen in the previous section, either because they’re resulting from screening of asymptomatics upon admission to the hospital, or because of overly aggressive PCR cycle threshold levels, or some other reason, then the large majority of these “Covid-19 hospitalization” figures will also be inaccurate.

We are now starting to see the peer-reviewed data arrive to support this logic.

A new peer-reviewed study (Webb et al. 2021) of pediatric (under 22 year olds) Covid-19 hospitalizations at a hospital serving 12 million people in Southern California found that 86% of the “Covid hospitalizations” were not related to Covid or were only minimally related.

“[T]here is a substantial portion of patients for whom COVID-19 was either incidental or minimally related to hospitalization. In our large population, that percentage was a majority of patients (126 of 146, 86%). As such, we sought to provide a classification that more accurately represents the disease burden.”

This is actually an under-estimate because they exclude 7 false positives detected in their study that were otherwise tallied as a “Covid hospitalization” based on the general definition that requires only a positive test result before, upon admission, or during a hospital stay to be considered a “Covid-19 hospitalization.” Including the 7 false positives brings the total “Covid-19 hospitalization” tally that isn’t actually causally related to Covid-19 to over 90%.
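The combined percentage depends on how the 7 false positives enter the count, which the text above does not spell out. Here is a minimal sketch using only the counts quoted above (126 of 146, plus 7 false positives); the two ways of treating the denominator are both assumptions, shown side by side for comparison.

```python
def share(part: int, whole: int) -> float:
    """Return part / whole as a fraction."""
    return part / whole


# Counts quoted in the text above.
INCIDENTAL_OR_MINIMAL = 126
CLASSIFIED_TOTAL = 146
FALSE_POSITIVES = 7

as_reported = share(INCIDENTAL_OR_MINIMAL, CLASSIFIED_TOTAL)

# Assumption A: the 7 were outside the 146, so they are added to both counts.
added_to_both = share(INCIDENTAL_OR_MINIMAL + FALSE_POSITIVES,
                      CLASSIFIED_TOTAL + FALSE_POSITIVES)

# Assumption B: the total stays at 146 and only the numerator grows.
numerator_only = share(INCIDENTAL_OR_MINIMAL + FALSE_POSITIVES, CLASSIFIED_TOTAL)

print(f"as reported:    {as_reported:.1%}")
print(f"added to both:  {added_to_both:.1%}")
print(f"numerator only: {numerator_only:.1%}")
```

Under Assumption A the share stays near 87%; it exceeds 90% only under Assumption B, so the combined figure is sensitive to which denominator is used.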

They also state that ‘there was virtually no statistically or clinically meaningful difference between “incidental diagnosis” and “potentially symptomatic.”’

So these figures mean that close to 9 in 10 patients, on average, who were tallied as Covid-19 hospitalizations should have been placed in a different category, and the roughly 1 in 10 who were significantly symptomatic should have received appropriate care.

Is this study of one pediatric hospital indicative of Covid-19 hospitalizations nationwide? Additional studies will arrive and will likely support this high level of exaggeration. Two other studies have found that approximately half of Covid-19 hospitalizations were not causally related: a large study of all VA hospitals (Fillmore et al. 2021), which found that 52–64% of Covid-19 hospitalizations had moderate to severe Covid-19 and the rest were asymptomatic or light cases (based on a review of oxygen level data alone), and another looking at pediatric hospitalizations (Kushner et al. 2020), which found that ~45% of Covid-19 hospitalizations were not causally related to Covid-19.

A commentary by David Zweig on these studies provides a helpful summary and states:

The reported number of COVID-19 hospitalizations, one of the primary metrics for tracking the severity of the coronavirus pandemic, was grossly inflated for children in California hospitals, two research papers published Wednesday concluded.

The most important figure of them all: Covid-19 deaths

Let’s now turn to the next link in the chain, the really big one: deaths caused by Covid-19.

First we need to examine CDC’s mortality statistics and be careful in examining their definitions. CDC tracks deaths “involving” Covid-19, making no claims about causality. CDC’s website counts “all deaths involving Covid-19” in Table 1, as the figure shows:

CDC’s provisional mortality data tracking deaths “involving” Covid-19, influenza or pneumonia (source).

CDC’s definition of a death “involving” Covid-19 is as follows (see footnote 1 to Table 1): “Deaths with confirmed or presumed COVID-19, coded to ICD–10 code U07.1.”

Coding a death as U07.1 requires no claims about causality between Covid-19 (whether confirmed by a test or not) and the death at issue. Rather, CDC and state-level guidelines all but require any relationship to Covid-19 whatsoever to be listed in the death certificate. And when Covid-19 is listed the U07.1 code must be used and that’s how CDC picks it up, based on a largely automated data collection system, as a “death involving Covid-19.”

(One of the authors [Hunt] wrote a detailed explanation of the new CDC death certification guidelines issued in April 2020 here.)

Given these extremely broad definitions and guidelines, how many people are simply dying “with” Covid-19 rather than “from” it?

Dying “with” rather than “from” Covid-19

“If you died of a clear alternate cause, but you had Covid at the same time, it’s still listed as a Covid death. Everyone who is listed as a Covid death, doesn’t mean that was the cause of the death, but they had Covid at the time of death,” Illinois’s director of public health, Dr. Ngozi Ezike, explained to reporters in April 2020, early in the pandemic.

She expanded on these definitions as follows:

“I just want to be clear in terms of the definition of ‘people dying of Covid.’ The case definition is very simplistic. It means, at the time of death, it was a Covid positive diagnosis. That means, that if you were in hospice and had already been given a few weeks to live, and then you also were found to have Covid, that would be counted as a Covid death.”

Oregon is similarly broad: “Deaths in which a patient hospitalized for any reason within 14 days of a positive COVID-19 test result dies in the hospital or within the 60 days following discharge.” An interview with KGW news about this definition, in an article entitled, “Are dying with COVID-19 and dying from COVID-19 the same thing? In Oregon, they are,” led to this statement from Oregon Health Authority spokesman, Fred Modie:

We asked Modie about a hypothetical case where someone died from a motorcycle crash and also had COVID-19. Would that be counted as a COVID-19 death?
“It would be,” Modie explained.

This is the approach used for tracking almost all U.S. Covid-19 deaths because each state tallies the death statistics in much the same manner, counting a “Covid-19 death” as any death associated with Covid-19 in any way, regardless of the actual cause of death.

This loose approach to deaths tracking was summed up well by Dr. Deborah Birx, the former White House Coronavirus Response Coordinator, in her statement that “if someone dies with Covid-19 we are counting that” as a Covid-19 death.

Some countries with high Covid-19 death counts are even more liberal in their definitions than the US. The UK, for example, defined Covid-19 deaths initially as any death associated with a positive Covid-19 test at any point in time: “all deaths in people with a positive test at any point were reported to avoid underestimating the impact of COVID-19.”

This absurdly broad definition was later modified (see bottom of page 8 of the link) to be mildly less absurd as “as any death within 28 days of a positive SARS-CoV-2 test” or (p. 14) “all deaths who have laboratory-confirmed COVID and died within 60-days or where COVID-19 is mentioned on their death registration regardless of their time to death.”

A number of countries and counties have begun to reduce their Covid-19 death tallies because of a recognition that they were far too inclusive, including:

— UK, which reduced their deaths by over 10% in August 2020 due to the change in definition discussed above

— Alameda County in California reduced their death count by 25% in June 2021, resulting from changing the definition from any association with the virus to requiring causation or at least a plausible pathway of causation (still far too inclusive)

— Santa Clara County reduced their death count by 22% in July 2021 for the same reason

These are still vastly over-inclusive definitions and death counts, as we’ll see further below, but at least these changes demonstrate a little more critical thinking on the part of policymakers and public health departments, recognizing the importance of more accurate data.

When the general public learns that motorcycle accident deaths of people who had previously tested positive for Covid-19 were counted as Covid-19 deaths it seriously undermines confidence in public statistics — and rightfully so.

So here’s the “Covid-19 deaths” chain of logic:

— an extremely inclusive definition of a Covid-19 “case” requiring simply a positive test result in most cases, either PCR or antigen test

— highly aggressive testing protocols, either by widespread testing of asymptomatic people or overly high PCR cycle thresholds, resulting in vast majorities of false positives

— extremely inclusive definitions of a “Covid-19-involving death” that is tied in most cases simply to a positive test result

— the end result is probably causally-related “Covid-19 deaths” being 90% or more lower than the headline figures, because the large majority of the deaths being tallied are not causally related, or perhaps only minimally related, to Covid-19.

We don’t yet have many peer-reviewed studies to support this analysis of substantial exaggeration of Covid-19 deaths, but we can be confident those studies will come out before too long. CDC did complete a study in early 2021 (Gundlapalli et al. 2021) looking at the “plausibility” of Covid-19-attributed deaths. They conclude that indeed the number of deaths being attributed to Covid-19 in 2020, 378,048, was “plausible.” This is a low evidentiary bar indeed and says nothing about actual causality, or reasonable inferences of causality. How many of these “plausible” attributions are actually reliable? Dispelling the doubts about massive over-inclusion, which are based on similar logic and dynamics with respect to case counts and hospitalization counts, discussed above, requires a far higher bar than plausibility.

We are working with a team of scientists attempting to double check CDC’s excess deaths analysis (this figure tracks all excess deaths, not just Covid-19 deaths, but it also assumes that all Covid-19 deaths are excess deaths, which in itself risks massive over-inclusion, or “reassignment” of existing causes of death as “Covid-19 deaths”). We will release a preprint of our analysis soon.

So if it’s not Covid-19 why are so many people dying?

The U.S. has indeed seen a large jump in “excess deaths” since early 2020. This chain of logic and data requires asking the question: if it’s not Covid-19 causing all of those deaths, what is? And if it’s not Covid-19 we aren’t left with too many options: it has to be a mix of lockdown policies, exacerbating existing causes of death like cancer, heart disease, COPD, diabetes, Alzheimer’s, etc., and media hype causing massive amounts of fear and over-reaction, also leading to exacerbation of existing causes of death.

The peer-reviewed science is also gradually coming in on this issue and a later essay will explore lockdown-associated mortality. One of us (Hunt) wrote a short article recently looking at the massive and unprecedented increases in overdose deaths and homicides. These are probably indicative of other cause of death categories increasing in 2020 and 2021 as a direct result of lockdowns and partial lockdowns. More data and peer-reviewed science will come out before long on these unfortunate issues.

We also need to consider that 40% of US Covid-19-involving deaths have occurred in nursing homes, that the average age of death for Covid-19 deaths has been 76, and that the average number of comorbidities has been 4.0, according to CDC.

These figures show that a lot of what is going on may be primarily an acceleration of deaths in those who are already very old and frail. We calculate in another in-progress academic paper that the “years of life lost” on average for Covid-19-involving deaths is about 5.3 (we’ll add a link to the pre-print when ready). This is far lower than the 40 or more years of life lost for the average overdose or homicide death, and raises serious questions about whether lockdowns and related policies have caused more deaths than they’ve prevented.

It is not surprising that panic, lockdowns, and media hype would tip a lot of these people being labeled as Covid-19 deaths over the edge, regardless of direct harm from the virus itself.

Wrapping up, we acknowledge that the case for massive exaggeration of Covid-19-attributed deaths is not yet airtight — but the circumstantial case for it is quite good. We look forward to additional data and peer-reviewed studies on this highly important topic.

Last, this analysis should prompt in the reader the underlying questions: what went wrong? How did so many layers of exaggeration happen and not become more widely known far sooner?