Graphs and numbers about the COVID situation dominate the news every day. Alarm bells ring when the number of COVID patients increases, while decreasing numbers are cause for celebration because improvement is expected. However, numbers and graphs can be misleading, and it is important to understand how data are presented as well as to gain insight into their context. For example, the number of positively tested patients during the first Corona period in the Netherlands, in March last year, was lower than during the second one around Christmas. Although this is a fact, we should not directly conclude that the situation in March was “better” than at the end of the year: when you look at the number of tests performed during each peak, drastically more tests were performed during the last half year, resulting in more COVID positives showing up in the graphs. An alternative strategy might be to look at the number of hospitalized patients, but those numbers cannot be directly compared between the two peaks either (e.g. during the second Corona period fewer patients needed intensive care and hospital stays were shorter, thanks to our expanding knowledge about treatment options). As you can see, it is not easy to critically analyze your data, as there are many ways to interpret the same dataset. Even in this example, which at a glance seems straightforward, it is difficult to choose how to analyze and present the data in order to draw correct conclusions.
We encountered difficulties with correctly interpreting our own data last month, while optimizing one of the assays for the COVID-19 project. Together with my colleague, I was developing an orthogonal assay to measure the amount of corona Spike protein taken up by cells expressing the human receptor protein ACE-2. Once we had a feasible setup, the next step was to test the actives from the primary screen for their effect in this cellular assay (see my previous blog, “Optimal assay conditions? Does that exist?”). The data from this orthogonal assay were analyzed automatically by a program that normalized our virus load by the amount of ACE-2 present in the cells. At first glance, our assay seemed to be working when we plotted the normalized data in a graph (Figure 1A): with increasing amounts of Spike, more of the protein appeared to be taken up by the cells expressing ACE-2. Taking a closer look at the raw counts without normalization, however, no such effect was observed (Figure 1B). The amount of Spike protein in the cells was similar to the background and did not depend on the amount of virus load we added or on the presence of ACE-2. The question we were asking ourselves: what is fair, looking at normalized or at raw data? How do we deal with this, and can we conclude that our assay is (not) working?
* Data shown is an illustration and not based on real numbers.
Figure 1. Orthogonal assay to assess the amount of Spike protein uptake in cells expressing ACE-2. A) Normalized data: the ratio of Spike protein signal versus the amount of ACE-2 present in the cell. B) Raw data underlying panel A: the amount of ACE-2 signal (top) and the amount of Spike protein (bottom).
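To make the pitfall concrete, here is a minimal sketch of how normalization alone can create an apparent dose response. The numbers are made up (like Figure 1, they are not real data), and the shrinking ACE-2 denominator at higher loads is an assumed confounder for illustration, such as fewer healthy cells per well:

```python
import numpy as np

# Illustrative counts only, not real data.
# Raw Spike counts stay at background no matter how much
# Spike protein is added -- the assay shows no real uptake.
spike_added  = np.array([0, 1, 2, 4, 8])          # relative Spike load
spike_signal = np.array([102, 97, 104, 99, 101])  # flat, ~background

# Assume the ACE-2 signal happens to shrink at higher loads
# (hypothetical confounder, e.g. fewer healthy cells per well).
ace2_signal  = np.array([1000, 800, 600, 400, 200])

# Dividing a flat numerator by a shrinking denominator
# produces a ratio that rises steadily with the Spike load.
ratio = spike_signal / ace2_signal
print(ratio)
```

The ratio climbs roughly five-fold across the series even though the raw Spike signal never moves, which is exactly why plotting only the normalized data made the assay look like it was working.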
We repeated this experiment once more and included even more control samples as a final check before drawing any conclusions. When doing science, including extra control experiments to address your question is crucial for drawing correct conclusions; it is better to have too many controls than to have to throw away your data in the end due to a lack of control samples. Unfortunately, based on the second outcome the assay was still not working in our hands, and we decided not to continue with it even though we had spent many hours on it.
I can imagine that it is sometimes very tempting to ignore the raw data, as we almost did during the first experiment, and to focus only on what you want to see. During my education and scientific career, I have learned to keep being critical, even if that means concluding that an assay you have been working on for months or sometimes years cannot be used. My advice: take a critical look at your data and your analysis before drawing conclusions that turn out not to be meaningful at all.