How accurate are coronavirus tests?
The various coronavirus tests are all seriously flawed, either inherently or in how they’re being used, and this is fundamentally undermining our understanding of the pandemic
Two policy changes occurred in the beginning of the pandemic in the U.S., which seem to have led to serious data quality issues in a number of areas: 1) changing the longstanding guidelines for medical professionals to report deaths, which made attributing cause of death to the virus far more likely than under the previous guidelines that had been in place for 17 years; 2) substantially reducing the accuracy standard for coronavirus tests, as part of the nationwide emergency.
The entire pandemic surveillance and policy structure rests on the coronavirus tests being reasonably accurate. But as we’ll see below, they’re often wildly inaccurate, either inherently or in how they’re used. This leaves the surveillance structure on an unsound foundation. The following chart summarizes the discussion in this article.
Are coronavirus tests accurate?
Vermont found a cluster of 65 positive COVID-19 cases in July. Bracing for a broader outbreak, authorities followed up on these tests, which were rapid antigen tests, and found that only four of the 65 were actual positives. 61 of the positives, 94%, were found to be false positives. The follow up tests were PCR tests, considered to be the gold standard test for detecting the virus.
It should be clear that any test that returns up to 94% false positives is worse than having no data: it can and surely did create severe fear and over-reaction in the community and among policymakers. It also may have led to actual positives being grouped physically with false positives, turning false positives into real positives.
There are many other problems with false positives (New York Times has a good summary of the problems here), and of course also with false negatives, which have received more attention because of the justified fear of missing potentially infected patients.
So what happened in Vermont? And how on Earth could these tests be so inaccurate? We’ll see below that this, unfortunately, was not an isolated incident. The Big Island of Hawaii, where I live, recently experienced a very similar incident, with 93% false positives.
The NFL (National Football League) tested a number of its players in August of 2020, finding an alarming 77 positive results using the gold standard PCR tests. On re-testing, however, all 77 of these were found to be false positives, and the episode was chalked up to a lab error. How many similar errors are occurring elsewhere?
There are three widely used tests for the SARS-CoV-2 virus: 1) PCR (polymerase chain reaction) tests, which detect the presence of specific virus RNA sequences from nasal swabs or other samples; 2) antigen tests, which detect specific virus proteins (antigens) in the mucus, saliva or blood; 3) antibody tests, which measure the presence of antibodies produced by the immune system in response to antigens.
The first two are generally used to assess the presence of the virus at the time of the test, and the third is used to determine whether the patient had the virus at some point in the past. We’ll see, however, that these functions of the various tests are not possible given how the tests are used in practice.
An August 10, 2020, blog from Prof. Robert Shmerling, at Harvard Medical School, summarized accuracy data for the various types of tests. He concludes, alarmingly, as follows:
Unfortunately, it’s not clear exactly how accurate any of these tests are. There are several reasons for this:
We don’t have precise measures of accuracy for these tests — just some commonly quoted figures for false negatives or false positives, such as those reported above. False negative tests provide false reassurance, and could lead to delayed treatment and relaxed restrictions despite being contagious. False positives, which are much less likely [this is not accurate, as I describe below], can cause unwarranted anxiety and require people to quarantine unnecessarily.
How carefully a specimen is collected and stored may affect accuracy.
None of these tests is officially approved by the FDA. They are available because the FDA has granted their makers emergency use authorization. And that means the usual rigorous testing and vetting has not happened, and accuracy results have not been widely published.
A large and growing number of laboratories and companies offer these tests, so accuracy may vary. At the date of this posting, more than 170 molecular tests, two antigen tests, and 37 antibody tests are available.
All of these tests are new because the virus is new. Without a long track record, assessments of accuracy can only be approximate.
We don’t have a definitive “gold standard” test with which to compare them.
Shmerling mentions in the middle of his list the emergency use authorization provided by FDA. As with the major U.S. policy change discussed in Part I of this series, this is an extremely important policy change that set up our country to react in knee-jerk and damaging ways to the virus by allowing far too lenient a standard for test accuracy.
The most important result of this leniency seems to have been the lack of any required standard for the cycle threshold of the allegedly gold standard PCR test, which, according to some respected medical professionals, may have resulted in 90% or more effective false positives from PCR tests. I discuss this bombshell finding further below.
But first, let’s look at the policy changes at the FDA and why they were put in place.
Policy changes provided emergency use authorization for various virus tests with low accuracy requirements
FDA has provided “emergency use authorization” (EUA) for dozens of PCR tests, six different antigen tests, and a few dozen antibody/serology tests, as of late October 2020, described here.
FDA does not require any third party testing for accuracy under the EUA — this is, essentially, the purpose of having an EUA in place: allowing approval of a test, therapeutic or device without third party testing or clinical trials. The EUA process “requires a lower level of evidence,” FDA said, as reported by the Associated Press in June. Makers need only show that a test “may be effective” instead of the usual requirement to demonstrate “safety and effectiveness.” This higher threshold must be met once the federal government declares the emergency over. And that hasn’t happened since we’re stuck in a semi-permanent state of emergency.
FDA is, however, monitoring use and accuracy of these tests once they’re approved. Thus far it has revoked authorization (scroll down the page to “Individual EUAs for Serology Tests for SARS-CoV-2”) of two of the antibody/serology tests (ChemBio Diagnostic Systems and Autobio Diagnostics Co.), stating that the revocation in both cases occurred because:
FDA determined that it is not reasonable to believe the product may be effective in detecting IgM antibodies against SARS-CoV-2 or that the known and potential benefits of the device when used for this purpose outweigh its known and potential risks. FDA also concluded that based on the risks to public health from false test results, revocation is appropriate to protect the public health or safety.
This kind of revocation obviously has very serious financial implications for companies seeking to sell these tests.
Thus far, despite the wildly inaccurate antigen test results discussed below, FDA has not revoked its authorization for any of the six approved antigen tests (as of Oct. 25, 2020).
FDA did, however, issue a letter in early November warning about the potential for very high false positive antigen test results — particularly at low disease prevalence levels (active disease prevalence has been under 1% throughout the pandemic, resulting in very high false positives in screening and surveillance programs). This warning letter is a start but far more needs to be done to avoid continued wildly inaccurate test results.
Let’s now examine each of the three main types of tests in more detail, starting with the PCR tests.
Test #1: PCR tests are being misused, resulting in up to 90% effective false positives
Of the tests currently employed, the PCR test is considered the gold standard, and is technically described as the RT-qPCR test (Reverse Transcription quantitative Polymerase Chain Reaction). It tests for RNA from the virus itself (SARS-CoV-2, like other coronaviruses, is an RNA virus and has no DNA genome) using a well-established method for amplifying a small RNA signal for positive identification.
The U.S. developed its own specific PCR test early in the pandemic, designed to test for three RNA fragments that are thought to be unique to the SARS-CoV-2 virus. This U.S.-specific development process led to serious delays and inaccuracies that took a few weeks to recover from.
One peer-reviewed study (Lee 2020) found about 30% false positives in the PCR test (70% specificity) and 20% false negatives (80% sensitivity). This is an inherent inaccuracy issue, specific to how the test works. There are also inaccuracies resulting from how the test is used. These inaccuracies may be even larger.
A New York Times analysis found that 90% of PCR positives in three states (Massachusetts, Nevada and New York) were effectively false positives, or, as one of the scholars in this area (Michael Mina, see more below) clarified in an email dialogue with me, “non-infectious late-stage positives,” because of excessive amplification of what is a very small viral load. The “cycle threshold” (CT) is the number of amplification cycles a patient sample goes through before viral RNA becomes detectable. Labs usually set the cutoff between 30 and 40 cycles, and a sample that becomes detectable at or below that cutoff is reported as positive.
However, Mina, a professor at Harvard, has been critical of the way that PCR tests are being used, either for failing to include the CT numbers in the test results, or for running up to 40 cycles before calling a positive or negative. He thinks 30 cycles or below is far more reasonable because patients whose samples need more than that number of cycles are very unlikely to develop Covid-19 (the disease resulting in some people from the coronavirus) or to be infectious to those around them.
In other words, positives from PCR tests over a cycle threshold of 30 should be considered false positives, or at the least these should be labeled, as Mina prefers, “late non-infectious” positives, and thus not requiring quarantining or contact tracing. This understanding renders the PCR test highly inappropriate for finding infectious patients, which is its stated purpose.
Peer-reviewed science on this issue was available relatively early in the pandemic, with Bullard et al. 2020 published in the journal Clinical Infectious Diseases in May of 2020, and concluding that cycle thresholds over 24 were not finding any live virus: “SARS-CoV-2 Vero cell infectivity was only observed for RT-PCR Ct < 24 and [symptom onset to time of test] < 8 days. Infectivity of patients with Ct > 24 and duration of symptoms > 8 days may be low. This information can inform public health policy and guide clinical, infection control, and occupational health decisions.”
Tony Fauci is on the record agreeing with the overly high cycle threshold issue, stating in a July interview:
If you get a cycle threshold of 35 or more … the chances of it being replication-competent are minuscule … You almost never can culture a virus from a 37 threshold cycle … [or] even 36 …
As we’ve just seen, this figure should be 24 or 25, but Fauci was erring on the side of extreme caution in his statement.
The New York Times calculated, based on reviewing PCR test results that did include CT figures, that up to 90% of positive test results in the three states examined would be considered negative (i.e. they are false positives at CT higher than 30) if they had been limited to a 30 cycle threshold:
In three sets of testing data that include cycle thresholds, compiled by officials in Massachusetts, New York and Nevada, up to 90 percent of people testing positive carried barely any virus, a review by The Times found.
On Thursday, the United States recorded 45,604 new coronavirus cases, according to a database maintained by The Times. If the rates of contagiousness in Massachusetts and New York were to apply nationwide, then perhaps only 4,500 of those people may actually need to isolate and submit to contact tracing.
One solution would be to adjust the cycle threshold used now to decide that a patient is infected. Most tests set the limit at 40, a few at 37. This means that you are positive for the coronavirus if the test process required up to 40 cycles, or 37, to detect the virus.
Tests with thresholds so high may detect not just live virus but also genetic fragments, leftovers from infection that pose no particular risk — akin to finding a hair in a room long after a person has left, Dr. Mina said.
Any test with a cycle threshold above 35 is too sensitive, agreed Juliet Morrison, a virologist at the University of California, Riverside. “I’m shocked that people would think that 40 could represent a positive,” she said.
A more reasonable cutoff would be 30 to 35, she added. Dr. Mina said he would set the figure at 30, or even less. Those changes would mean the amount of genetic material in a patient’s sample would have to be 100-fold to 1,000-fold that of the current standard for the test to return a positive result, at least, one worth acting on.
Needless to say, this difference in what we should consider a positive test result is a very large difference in terms of diagnosing the spread of the virus and calibrating policy responses to the pandemic.
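The “100-fold to 1,000-fold” range in the Times quote can be roughly reproduced with a short sketch. It assumes idealized PCR, in which the target doubles on every cycle; real assays fall somewhat short of perfect doubling, and the function name and `efficiency` parameter are my own illustration, not part of any lab’s software.

```python
def fold_difference(ct_high: int, ct_low: int, efficiency: float = 1.0) -> float:
    """Fold difference in starting genetic material between two cycle thresholds.

    Assumption: each cycle multiplies the target by (1 + efficiency),
    so efficiency=1.0 means perfect doubling per cycle.
    """
    return (1.0 + efficiency) ** (ct_high - ct_low)


# Moving the cutoff from 37 or 40 cycles down to 30 cycles
fold_37_to_30 = fold_difference(37, 30)  # 2**7
fold_40_to_30 = fold_difference(40, 30)  # 2**10

print(f"37 -> 30 cycles: about {fold_37_to_30:,.0f}-fold more material needed")
print(f"40 -> 30 cycles: about {fold_40_to_30:,.0f}-fold more material needed")
```

Under that doubling assumption, the two cutoffs named in the quote (37 and 40) bracket roughly the same range the Times describes.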
Mina wrote up some of these concerns in a peer-reviewed commentary also in the journal Clinical Infectious Diseases.
The notion that 30 cycles is sufficient is based on research conducted by Xiao et al. 2020 and also more recent work by Tom Jefferson, Carl Heneghan and their colleagues in a meta-review of research comparing positive PCR test results and live cultures of the same sample. The fourth update of this work, collected by Jefferson’s team and published on medrxiv.org Sept. 29, 2020, found that a “cut-off RT-PCR Ct > 30 was associated with non-infectious samples.” And they found that the chance of infectiousness declined 33% with every additional cycle.
This means that positives found between 30 and 40 cycles carry a very small chance of infectiousness: a sample at a CT of 31 is 33% less likely to be infectious than one at a CT of 30, a sample at 32 is 33% less likely again than one at 31, and so on.
The end result is that we have only about a 1.8% chance of an infectious patient at 40 CT if we assume 100% at 30 CT (ten successive 33% reductions, or 0.67 multiplied by itself ten times). Assuming 100% at 30 is extremely conservative because, as we’ve seen above with the Bullard et al. study, many studies have found no live infections resulting from CT of 24 or 25, which is exponentially less sensitive than 30 CT.
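The compounding can be checked with a few lines of code. This is only a sketch under the essay’s simplifying assumptions: the 33% decline per cycle is treated as a decline in the chance of infectiousness (Jefferson’s team reports it as a decline per cycle, and I apply it here as a simple multiplier), and infectiousness is set to 100% at a CT of 30. The function name is hypothetical.

```python
def relative_infectiousness(ct: int, baseline_ct: int = 30,
                            drop_per_cycle: float = 0.33) -> float:
    """Chance of infectiousness at `ct`, relative to 100% at `baseline_ct`.

    Assumption: each additional cycle multiplies the chance by
    (1 - drop_per_cycle), i.e. a simple geometric decline.
    """
    return (1.0 - drop_per_cycle) ** (ct - baseline_ct)


for ct in (30, 33, 35, 39, 40):
    print(f"CT {ct}: {relative_infectiousness(ct):.1%}")
```

Ten steps from 30 to 40 give about 1.8%, while nine steps give about 2.7%. Either way, a positive first detected near 40 cycles is, under these assumptions, very unlikely to be infectious.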
Jefferson and his team do not shy from calling these high CT positives “false positives.” They conclude (p. 2 of the paper linked above): “A binary Yes/No approach to the interpretation RT-PCR unvalidated against viral culture will result in false positives with segregation of large numbers of people who are no longer infectious and hence not a threat to public health.”
In sum, the way PCR tests are being conducted in the US at this time, with very high cycle thresholds, renders the PCR test a good diagnostic tool for the shape of past infections, but a very poor tool for diagnosing current infections unless the cycle threshold is reduced to 30 or below, as various scholars are now recommending.
In this manner the PCR test is quite similar to the antibody/serology tests that we’ll discuss further below, which are intended to measure past infections and not current contagiousness.
The over-arching problem is that the vast majority of medical and public health professionals still view PCR tests as the gold standard for measuring current infections, when it is highly imperfect for that role based on how it is being implemented in the vast majority of labs in the U.S. and around the world.
And when the headline infection numbers (“cases”) are reported daily for each county, state and nationwide, based largely on highly misleading PCR test results, with up to 90% being late-stage non-infectious effective false positives, we have a serious data quality problem.
Test #2: Rapid antigen tests are even less accurate than PCR tests
Let’s turn now to the newer, but even less accurate, antigen tests, the second kind of test for detecting contagious infections.
In early May of 2020, FDA granted Quidel Corp. the first emergency use authorization (an “EUA” in FDA parlance) for an antigen test. According to the press release: “These diagnostic tests quickly detect fragments of proteins found on or within the virus by testing samples collected from the nasal cavity using swabs.” As mentioned above, there are now six FDA-authorized antigen tests in the U.S.
The World Health Organization (WHO) approved one antigen test, also under emergency approval, in September, and there are now (as of late September) at least two antigen tests being rolled out in large numbers around the world. One of these (from U.S.-based Abbott) doesn’t even have emergency approval from WHO at the time of writing. The U.K. publication The Guardian wrote: “The tests, which look like a pregnancy test, with two blue lines displayed for positive, are read by a health worker. One test has received emergency approval from the World Health Organization (WHO) and the other is expected to get it shortly.”
The Guardian article briefly addresses accuracy: “The companies claim their tests are about 97% accurate, but that is in optimal conditions. FIND [the nonprofit “Foundation for Innovative New Diagnostics”] puts [the test] sensitivity between 80% and 90% in real-world conditions. That would pick up most infections.”
We now have a number of examples of inaccuracy being wildly higher than these claimed accuracy figures. The Vermont example at the beginning of this essay found 94% false positives in a cluster of 65 positive test results using the Quidel antigen test. Nevada found 60% false positives in a nursing home cluster, resulting in the state banning the use of these tests in nursing homes — until they reversed this ban after federal government pressure was applied.
In my home state of Hawaii, we had a similar 93% false positive rate after the first week of using rapid antigen tests on the Big Island to screen arriving passengers. 15 positives were found, out of about 3,000 tests, but 14 of those antigen test positives were found to be negative when re-tested with PCR tests. This wildly inaccurate (93% false positives) testing process led the county mayor to ponder publicly about killing the on-arrival testing program.
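The Vermont and Hawaii figures can be reproduced from the counts given in this essay. The sketch below treats the PCR re-test as the truth, takes the “about 3,000” Hawaii tests as exactly 3,000, and assumes no infections were missed among the initial negatives; all three are assumptions on my part, and the function names are purely illustrative.

```python
def overturned_share(positives: int, confirmed: int) -> float:
    """Share of initial positives that were NOT confirmed on re-test."""
    return (positives - confirmed) / positives


def implied_specificity(tests: int, positives: int, confirmed: int) -> float:
    """Specificity implied by screening counts.

    Assumptions: the re-test is the truth, and nobody who tested
    negative initially was actually infected.
    """
    false_positives = positives - confirmed
    truly_negative = tests - confirmed
    return 1.0 - false_positives / truly_negative


vermont = overturned_share(positives=65, confirmed=4)
hawaii = overturned_share(positives=15, confirmed=1)
hawaii_spec = implied_specificity(tests=3000, positives=15, confirmed=1)

print(f"Vermont positives overturned: {vermont:.1%}")
print(f"Hawaii positives overturned:  {hawaii:.1%}")
print(f"Hawaii implied specificity:   {hawaii_spec:.2%}")
```

The last figure is the striking one: under these assumptions the Hawaii test was correct for well over 99% of uninfected travelers, and yet the great majority of its positives were still false, because so few of the people screened were actually infected.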
We don’t have aggregated antigen test accuracy data at this point but these examples in different locales around the country strongly suggest that these tests are worse than having no data.
CDC performed a study at two universities in September and October of 2020, finding a very low accuracy rate for the antigen tests in asymptomatic people: among the 871 asymptomatic people tested, sensitivity was only 41.2%, meaning the antigen tests missed about 59% of the infections that PCR detected. The CDC study blandly states, going so far as to put the key figures in a parenthetical and ignoring the fact that its own study has completely undermined the efficacy of antigen tests for rapid screening, that “accuracy was lower (sensitivity 41.2% and specificity 98.4%) when used for screening of asymptomatic persons.” But these results assume 100% accuracy in PCR tests, the reference standard for antigen tests, which is not at all a safe assumption, as discussed above.
Even if antigen tests were 98.4% specific in the real world, when they are used to test asymptomatic people (which is itself not an authorized use since the EUAs only authorize testing symptomatic people) in increasingly common screening and surveillance programs, the large majority of positive test results can be false positives, as we’ve seen was the case with the Vermont, Nevada and Hawaii examples.
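To see why a test that is 98.4% specific can still produce mostly false positives, here is a minimal sketch of the standard positive predictive value calculation (Bayes’ rule). The sensitivity and specificity are the CDC study figures quoted above; the prevalence values are hypothetical inputs I chose for illustration, and the function name is my own.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive result is a true positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


SENSITIVITY = 0.412  # CDC study, asymptomatic persons
SPECIFICITY = 0.984  # CDC study, asymptomatic persons

# Hypothetical prevalence levels among the people being screened
for prevalence in (0.001, 0.01, 0.05):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%}: "
          f"{ppv:.1%} of positives are true, {1 - ppv:.1%} are false")
```

At an assumed prevalence of 1%, only about one positive in five is a true positive, and the share falls further as prevalence drops. The result depends heavily on the prevalence assumed, which is the whole point.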
Despite these abysmal results there is still a large push for antigen tests to be used more widely.
This trend toward rapid antigen tests rather than PCR tests appears to be setting the stage for yet more staggering inaccuracies in terms of both false positives and false negatives. But the general feeling among policymakers and medical professionals seems to be that even very bad testing data is better than no data. This is manifestly not the case, however, because false results (whether positives or negatives) over 50% render the test no better than chance. It is, when it exceeds 50% inaccuracy, worse than flipping a coin. In other words: it is extremely misleading and potentially very damaging to continue to rely on antigen tests when there is a demonstrated track record of wildly inaccurate results.
Such inaccurate results will lead to wrong policy choices and wrong individual determinations such as quarantining, contact tracing or, the worst possible outcome, grouping alleged positives with actual positives and then actually infecting the formerly not infected person — as happened in numerous nursing homes around the country early in the pandemic, with tragic results.
Test #3: Antibody/serology tests are potentially even less accurate
Turning to antibody tests, which attempt to test for past infections based on the presence of antibodies in a patient’s blood, we find that these tests are also highly problematic. These tests measure the presence of antibodies that are assumed to have been formed by the body in response to a now-disappeared coronavirus infection.
And since vaccines were made available in early December of 2020, which create antibodies in response to the vaccine’s mRNA injections (in the case of the Moderna and Pfizer vaccines), antibody tests have become entirely mooted.
The problem with antibody tests, before vaccines became available, is that antibodies detected by these tests can be produced by other viruses and create the false impression of a past SARS-CoV-2 infection. One study found about 50% antibody test false positives, resulting from the presence of the common cold coronavirus (CCC), which is in the same family of coronaviruses as SARS-CoV-2. Grifoni et al. 2020, a peer-reviewed paper in Cell, stated:
Importantly, we detected SARS-CoV-2-reactive CD4+ T cells in 40%–60% of unexposed individuals, suggesting cross-reactive T cell recognition between circulating ‘‘common cold’’ coronaviruses and SARS-CoV-2. … [A]ntibody, CD4+, and CD8+ T cell responses to SARS-CoV-2 were generally well correlated.
The good news (yes, there is some good news) here is that apparently having a prior immune response to the common cold may in many cases confer at least some immunity to COVID-19, a phenomenon known as “cross-immunity,” which may help explain why so many patients are asymptomatic or have very mild symptoms despite SARS-CoV-2 being a novel coronavirus (along with the probability that a majority of asymptomatics are actually just false positives).
Some experts think antibody tests are even less accurate than this — far less accurate. Christopher Farnsworth at the Washington University School of Medicine estimated that literally six out of seven antibody positive test results, if only asymptomatic individuals were screened (as would be the case for any attempt to find previous, i.e. no longer active, infections) were probably false positives:
In our research, we estimated that if we screened asymptomatic individuals, only one out of seven positive antibody tests in Missouri would be true positives, even with a highly accurate test. So, the other six people may think they’re protected and let their guard down, and then they could get infected and spread the disease. Widespread antibody testing could do more harm than good if people do not understand the limitations of such testing.
An advisor to Israel’s Health Ministry confirmed the serious problems with antibody tests in a Jan. 2021 statement: “according to the data known today, serology tests are not a reliable or valid tool to determine the level of protection against infection, neither after recovery nor after getting a vaccine.”
In sum, antibody tests present yet another set of wildly inaccurate testing data that should not be relied on. As discussed above, any results that are more than 50% inaccurate are worse than chance and are thus worse than no data. A false positive rate of 86% (6/7) is far worse than having no data at all.
Mixing of antibody tests with PCR and antigen tests has been common
Another major data quality issue is mixing of different kinds of tests into a single pool of positive infections or “cases” data — which is then widely reported by news outlets as a single number. CDC has in fact advised states to combine PCR and antigen test results into a single number of “cases.” Most states have heeded this advice.
This is an issue because, as discussed, antigen tests have been shown in a number of cases to be wildly inaccurate. And with respect to mixing antibody tests with PCR and antigen tests, antibody tests assess the presence of past infections not current infections, and are also extremely inaccurate. Measuring a past infection and combining it with a pool of current infection data risks double or triple counting a single incident of viral infection. And it muddies the waters considerably in trying to track the current extent of the virus.
CDC itself was mixing testing data in this manner for some time and as far as I can tell is still doing it.
Johns Hopkins’ Coronavirus Resource Center warns states against this practice (see the bottom of the page):
When states report testing numbers for COVID-19 infection, they should not include serology or antibody tests. Antibody tests are not used to diagnose active COVID-19 infection and they do not provide insights into the number of cases of COVID-19 diagnosed or whether viral testing is sufficient to find infections that are occurring within each state. States that include serology tests within their overall COVID-19 testing numbers are misrepresenting their testing capacity and the extent to which they are working to identify COVID-19 infections within their communities. States that wish to track the number of serology tests being performed should report those numbers separately from viral tests performed to diagnose COVID-19.
However, and this is yet another serious “however,” we’ve seen above how the “gold standard” PCR tests, in about 90% of cases, do not “diagnose active COVID-19 infection,” nor do they “provide insights into the number of cases … diagnosed….” This is because the vast majority of PCR test positives are finding only dead viral RNA fragments and not a live infection.
U.S. testing data of the kind tracked by Johns Hopkins and other sites do, despite the admonition just quoted, often combine PCR tests with antibody tests and also rapid antigen tests. When there are wildly inaccurate results from all three of these tests, due to inherent flaws or flaws in how they’re being used, it becomes readily apparent that the entire U.S. testing structure is at this time fatally flawed.
The true gold standard: live culture calibration to test for infectiousness
What we need to measure with a virus test is live infections, i.e. patients who are infectious and thus capable of spreading to others. CDC researchers stated in a June 2020 research article that: “Viral RNA detection by RT-PCR does not prove the presence of infectious virus; culture isolation of virus is a better indication of contagiousness.”
There is a better way: using “live culture” tests to determine when PCR test positives are likely to indicate infectiousness. These tests take a patient sample and culture it in the lab to see whether a virus colony can be grown from the sample. If it can it means fairly clearly that the person was infectious at the time the sample was taken.
Jefferson, Heneghan, et al. 2020, discussed above in the section on PCR tests, review the global research on live culture tests for the virus and conclude that PCR false positives are a serious issue because the cycle threshold is far too high in most PCR tests. In 90% of the PCR test positives, also discussed above, these swabs do not grow live cultures. This means that there were no live viruses in the swab, and thus that person was not infected with live viruses and was not infectious to others.
One possible policy solution to the house of cards identified in this essay is to require, as Mina and others have called for, reporting the cycle threshold on every PCR test result so that CT over 30 results can be excluded from quarantine or contact tracing. This would reduce the concern over currently reported daily positive results by up to 90%.
A second policy response that could be effective is to follow the state of Nevada and ban all rapid antigen tests for testing asymptomatic people, particularly in nursing homes, which is an off label use anyway since antigen tests are meant to test only symptomatic people.
A third response is to also ban use of antibody tests entirely since they are far too cross-reactive with the common cold coronavirus, and possibly other viruses, to produce any kind of useful indication of previous infections.
CDC should revise its definition of a “case” in light of this information showing that all of the tests have serious problems. The current guidance, issued April 5, 2020, includes as a confirmed “case” of Covid-19 any positive PCR or antigen test. But as we’ve seen, the false positive and late-stage effective false positive data render this definition potentially wildly inaccurate for tracking cases.
FDA should also reconsider its EUA policy in terms of requiring far more data regarding test accuracy, not just in lab conditions but also in real world conditions, before granting EUAs for specific tests.
And, last, our mainstream media should really dig into these issues and try to explain to the public how these problems are creating a very inaccurate overall picture of the pandemic. Only when we get a good handle on the front-line diagnostic tests will we have any confidence in our understanding of the pandemic. And we’re not remotely close to that point now.