Genome #5: Omicron update and the reporting data mess
Two years into the pandemic, India's reported data has few hits, many misses
India has seen an uptick in Covid-19 cases over the past couple of weeks. The government’s official dashboard provides these data points:
Total samples tested
Number of active cases
Number of discharged cases
Number of deaths
Number of vaccination doses administered (dose1/dose2)

While this information was sufficient and useful in the early phase of the pandemic, the emergence of Omicron calls for more data. As we try to determine the clinical severity of Omicron, more fine-grained information will go a long way toward better-informed decisions. On December 25th, 2021, the Prime Minister announced that vaccination for the 15-18 age group will begin on January 3rd, 2022, and that a ‘precautionary’ third dose for individuals aged 60 and above will begin on January 10th, 2022. While these are welcome decisions, there has been no discussion of what is being done to improve data collection, surveillance, or even reporting.
Gap 1: Data collection and reporting
At this stage, the number of hospitalizations matters more than the number of cases. MyGov’s dashboard provides no information about how many hospital beds, ICU beds, or ventilators are currently in use across the states. Barring a few states, this information is also absent from the states' daily health bulletins. Beyond the missing information, the reporting itself is often obscure. Bihar, for example, publishes its Covid-19 health bulletin only as tweets, with no dashboard or website, which makes it difficult even to track the data day to day. By contrast, Kerala’s health bulletins (Figure 2) and Covid-19 dashboard are among the most informative in the country, reporting hospitalizations along with vaccination status. This metric is especially important in light of new variants: it tells us how well the vaccines are holding up in preventing hospitalizations (and infections). In New York City (NYC), for example, which hit a record of 44,000 daily cases on December 30th, hospitalizations are occurring mostly among the unvaccinated (Figure 3). We are still trying to figure out how the Omicron situation will unfold in India, and recording and reporting vaccination status is the first step in that direction.
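Reporting hospitalizations by vaccination status becomes truly informative only once the counts are normalized by the size of each group: when most adults are vaccinated, the vaccinated can account for a sizeable share of admissions even if their individual risk is much lower. Below is a minimal Python sketch of that normalization. It assumes a state can also report how many people fall in each group, and every number in it is a hypothetical placeholder, not Kerala or NYC data.

```python
def rate_per_100k(events: int, population: int) -> float:
    """Events (e.g. hospitalizations) per 100,000 people in a group."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so that round numbers stay exact in floating point.
    return events * 100_000 / population


def hospitalization_rates(hosp_vax: int, pop_vax: int,
                          hosp_unvax: int, pop_unvax: int):
    """Return (vaccinated rate, unvaccinated rate, unvaccinated/vaccinated ratio)."""
    r_vax = rate_per_100k(hosp_vax, pop_vax)
    r_unvax = rate_per_100k(hosp_unvax, pop_unvax)
    ratio = r_unvax / r_vax if r_vax > 0 else float("inf")
    return r_vax, r_unvax, ratio


# Hypothetical placeholder counts for one day in one state.
r_vax, r_unvax, ratio = hospitalization_rates(
    hosp_vax=120, pop_vax=6_000_000, hosp_unvax=300, pop_unvax=1_500_000
)
print(f"vaccinated: {r_vax:.1f}/100k, unvaccinated: {r_unvax:.1f}/100k, ratio {ratio:.1f}x")
```

In this made-up example the raw counts (120 vs 300) look unremarkable, while the per-capita rates (2.0 vs 20.0 per 100,000) differ tenfold. That gap stays invisible unless both the vaccination status and the group sizes are reported.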

Gap 2: Genomic surveillance: Missing data and clinical metadata
Missing data
INSACOG, a consortium set up to ramp up genomic surveillance of SARS-CoV-2, has yet to reach optimal sequencing levels. At this point, genomic surveillance can help us gauge the prevalence of Omicron in the country. Yet there is a huge lag (a median of 70 days!) before sequenced samples make it to GISAID. The weekly bulletins, which are supposed to report the prevalence of VOCs (variants of concern) in the country, are largely empty and lagging behind too: the last weekly bulletin came out on December 20, 2021, with no ‘National’ information (Figure 4).
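A lag figure like the 70-day median can be computed from per-sequence dates. The sketch below assumes the lag is measured from sample collection to submission and uses a simple hypothetical record layout (not GISAID's actual export format); the dates are made up. The median is used because a few extremely late uploads would otherwise dominate a mean.

```python
from datetime import date
from statistics import median

# Hypothetical (collection_date, submission_date) pairs, one per sequence.
# This layout is an assumption for illustration, not GISAID's export format.
records = [
    (date(2021, 9, 1), date(2021, 11, 20)),
    (date(2021, 10, 10), date(2021, 12, 1)),
    (date(2021, 11, 1), date(2021, 12, 15)),
    (date(2021, 12, 5), date(2021, 12, 1)),  # inconsistent: submitted before collected
]


def upload_lags(pairs):
    """Days from sample collection to submission, skipping impossible records."""
    lags = []
    for collected, submitted in pairs:
        if submitted < collected:
            continue  # likely a data-entry error; exclude instead of counting a negative lag
        lags.append((submitted - collected).days)
    return lags


def median_lag(pairs):
    """Median upload lag in days; robust to a handful of very late uploads."""
    lags = upload_lags(pairs)
    if not lags:
        raise ValueError("no valid records")
    return median(lags)


print(upload_lags(records))  # [80, 52, 44]
print(median_lag(records))   # 52
```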

Missing metadata
Scientists in South Africa were able to alert the world to Omicron because of a strong genomic surveillance network. In this case, the alarm came from a growing number of cases showing aberrant PCR results and, subsequently, aberrant genomic profiles.
It is currently not clear what metadata is collected when samples are sent for genomic sequencing in India. Collecting richer metadata for each sequenced sample, such as hospitalization status, vaccination status, and disease severity, would help us monitor variants more proactively and make better-informed policy decisions.
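To make "richer metadata" concrete, here is a minimal sketch of a per-sample record carrying the three fields above, serialized to JSON so it can travel with the sequence. The schema, field names, and category labels are all hypothetical, not an INSACOG or GISAID format. Each categorical field includes an explicit "unknown" value so that missing information is recorded as missing instead of being guessed.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from enum import Enum
from typing import Optional


class VaccinationStatus(str, Enum):
    UNVACCINATED = "unvaccinated"
    ONE_DOSE = "one_dose"
    TWO_DOSES = "two_doses"
    UNKNOWN = "unknown"


class Severity(str, Enum):
    ASYMPTOMATIC = "asymptomatic"
    MILD = "mild"
    MODERATE = "moderate"
    SEVERE = "severe"
    UNKNOWN = "unknown"


@dataclass
class SampleMetadata:
    """Hypothetical metadata record sent alongside a sample for sequencing."""
    sample_id: str
    collection_date: date
    state: str
    hospitalized: Optional[bool]  # None means "not recorded"
    vaccination_status: VaccinationStatus
    severity: Severity

    def to_json(self) -> str:
        record = asdict(self)
        # Convert non-JSON types to plain strings.
        record["collection_date"] = self.collection_date.isoformat()
        record["vaccination_status"] = self.vaccination_status.value
        record["severity"] = self.severity.value
        return json.dumps(record, sort_keys=True)


# Hypothetical example record.
m = SampleMetadata("SAMPLE-0001", date(2021, 12, 20), "StateX", True,
                   VaccinationStatus.TWO_DOSES, Severity.MODERATE)
print(m.to_json())
```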
Filling in the gaps
What should we be collecting and reporting? At the very least, states should publish the following information through dashboards and in a machine-readable format (rather than as images embedded in tweets):
Total samples tested
Number of active cases
Number of discharged cases
Number of deaths
Number of vaccination doses administered (dose1/dose2)
Number of hospitalizations (vaccinated/unvaccinated)
Number of ICU admissions (vaccinated/unvaccinated)
This is achievable with a simple mechanism: track the daily counts of the above fields in a spreadsheet, which can then be pushed to a dashboard for visualization.
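The spreadsheet approach can be sketched in a few lines of Python that write one row per state per day as CSV, a format any dashboard tool can ingest. The column names are hypothetical choices mirroring the list above, with dose 1 and dose 2 split into separate columns, and the example values are placeholders. A basic sanity check rejects negative counts before they reach the dashboard.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields
from datetime import date


@dataclass
class DailyReport:
    """One state's counts for one day (hypothetical column names)."""
    report_date: date
    state: str
    samples_tested: int
    active_cases: int
    discharged: int
    deaths: int
    dose1_administered: int
    dose2_administered: int
    hospitalized_vaccinated: int
    hospitalized_unvaccinated: int
    icu_vaccinated: int
    icu_unvaccinated: int

    def __post_init__(self):
        # Reject obviously invalid counts before they are published.
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, int) and value < 0:
                raise ValueError(f"{f.name} must be non-negative")


def to_csv(reports) -> str:
    """Serialize reports to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f.name for f in fields(DailyReport)])
    for r in reports:
        writer.writerow(astuple(r))  # dates are written as YYYY-MM-DD
    return buf.getvalue()


# Placeholder values for a hypothetical state.
example = DailyReport(date(2021, 12, 30), "StateX", 1000, 50, 900, 2,
                      500, 400, 10, 20, 1, 3)
print(to_csv([example]))
```

Each day's rows can be appended to a single growing file, which keeps the full time series available to anyone who wants to analyze it.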
For genomic surveillance, besides requiring sentinel sites to report fine-grained metadata (hospitalization status, vaccination status, and disease severity at a minimum), we need to be more proactive about opening the sequencing data and the associated metadata to the public. A 70-day lag in uploading (and analyzing) samples makes the entire surveillance exercise futile.
Status of Omicron surveillance in India
In hindsight, Delta's emergence as a rapidly growing variant is clear from the data deposited on GISAID (Figure 5). With the recent uptick in cases, we are now trying to gauge the prevalence of Omicron in India. Between December 1st and 31st, India deposited sequences amounting to only 0.3% of its total active cases to GISAID. Of the 1269 sequences deposited, 204 (about 16%) were Omicron (Figure 6). This share quite likely reflects selection bias, particularly given our 70-day median upload lag: the few December samples already on GISAID are probably those that were prioritized for sequencing. S gene target failure (SGTF) could serve as a proxy for Omicron prevalence, but no SGTF data is made public.
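Because SGTF is read directly off routine PCR output, it can be tallied far sooner than sequences reach GISAID. It works only on multi-target assays that include an S gene target (the TaqPath kit is the commonly cited example), and it flags variants carrying the S gene deletion that causes the dropout, which Omicron does and Delta generally does not. The following is a minimal sketch of the tally. It assumes per-target Ct values are available for each positive sample. The field names and the Ct cut-off of 30 are illustrative assumptions, and the Ct values are made up.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: a target counts as clearly detected at Ct <= 30. Programmes that
# use SGTF choose their own cut-off; 30 is only an illustrative value.
CT_THRESHOLD = 30.0


@dataclass
class PcrResult:
    s_ct: Optional[float]       # None means the S gene target was not detected
    n_ct: Optional[float]       # None means the N gene target was not detected
    orf1ab_ct: Optional[float]  # None means the ORF1ab target was not detected


def is_eligible(r: PcrResult, threshold: float = CT_THRESHOLD) -> bool:
    """N and ORF1ab both clearly detected, so a missing S signal is informative."""
    return (r.n_ct is not None and r.n_ct <= threshold
            and r.orf1ab_ct is not None and r.orf1ab_ct <= threshold)


def is_sgtf(r: PcrResult, threshold: float = CT_THRESHOLD) -> bool:
    """S gene target failure: S not detected while the other two targets are."""
    return is_eligible(r, threshold) and r.s_ct is None


def sgtf_share(results, threshold: float = CT_THRESHOLD) -> float:
    """Fraction of eligible positives showing SGTF (a rough Omicron proxy)."""
    eligible = [r for r in results if is_eligible(r, threshold)]
    if not eligible:
        raise ValueError("no eligible results")
    return sum(is_sgtf(r, threshold) for r in eligible) / len(eligible)


# Hypothetical Ct values for four positive samples.
results = [
    PcrResult(s_ct=None, n_ct=22.0, orf1ab_ct=23.5),  # SGTF
    PcrResult(s_ct=24.0, n_ct=21.0, orf1ab_ct=22.0),  # S detected
    PcrResult(s_ct=None, n_ct=35.0, orf1ab_ct=36.0),  # too weak to interpret
    PcrResult(s_ct=None, n_ct=18.0, orf1ab_ct=19.0),  # SGTF
]
print(f"SGTF share: {sgtf_share(results):.2f}")  # 0.67
```

Weak positives are excluded from the denominator because a missing S signal in a low-viral-load sample may simply mean the target was not amplified, not that it is absent.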
Some labs appear to be uploading sequences to GISAID more proactively than others. While NCDC, which leads INSACOG, has deposited only one sequence (which also happens to be an Omicron sample), the Gujarat Biotechnology Research Center has been consistently uploading sequences of both Omicron and Delta, reflecting more uniform sentinel sampling.
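The comparison between labs comes down to grouping deposited sequences by submitting lab and looking at each lab's volume and lineage mix. The sketch below uses made-up placeholder labs and counts, not the actual NCDC or GBRC submissions, and assumes each sequence already has a lineage label.

```python
from collections import Counter, defaultdict

# Hypothetical (submitting lab, lineage) pairs; labs and counts are placeholders.
submissions = [
    ("Lab A", "Omicron"), ("Lab A", "Delta"), ("Lab A", "Delta"),
    ("Lab B", "Omicron"),
    ("Lab C", "Delta"), ("Lab C", "Omicron"), ("Lab C", "Delta"), ("Lab C", "Delta"),
]


def per_lab_counts(pairs):
    """Map each lab to a Counter of the lineage labels it has submitted."""
    counts = defaultdict(Counter)
    for lab, lineage in pairs:
        counts[lab][lineage] += 1
    return dict(counts)


def omicron_share_by_lab(pairs):
    """Map each lab to (total submissions, fraction that are Omicron)."""
    result = {}
    for lab, c in per_lab_counts(pairs).items():
        total = sum(c.values())
        result[lab] = (total, c["Omicron"] / total)
    return result


for lab, (total, share) in sorted(omicron_share_by_lab(submissions).items()):
    print(f"{lab}: {total} sequences, {share:.0%} Omicron")
```

As a rough heuristic rather than a statistical test, a lab whose few submissions are all Omicron (Lab B here) more likely reflects targeted sequencing, while a larger, mixed set is more consistent with sentinel sampling.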

Early indications that Omicron causes less severe disease are reassuring, and it will hopefully not burden our health care system as much as Delta did. But the onset of this wave is a good reminder not to let our guard down, and to improve our data recording and reporting standards.