Lineages at a Glance
When presented with an overwhelming amount of information, how can we understand what’s actually important? Since the onset of the SARS-CoV-2 pandemic, researchers examining the genome of the virus across its many variants have been both helped and hindered by the unprecedented amount of data collected on the disease. As part of our ongoing effort to create a suite of tools for working with SARS-CoV-2 sequencing data, we saw the ever-shifting nature of the virus as an opportunity to help our collaborators home in on the most relevant mutations when presented with a new lineage.

From the data available, it’s easy enough to learn which mutations a set of lineages has in common, provided you already have specific lineages and a mutation in mind. For example, the ORF1a:T3255I mutation is shared by the BQ.1, BA.2.75, XBB, XBB.1.9.1, EG.1, XBF, and BF.7 SARS-CoV-2 lineages. But that statement alone leaves out major pieces of the puzzle. How else are these lineages similar? How do they differ? What other lineages share this mutation? Is this mutation even important?

We created Lineage Portraits to quickly view the differences and similarities in mutations between lineages, while at the same time highlighting the potential impact of those mutations on the evolution and spread of the virus.

Continuing to use the ORF1a:T3255I mutation as an example, you can see the same information noted above - that our specified mutation is shared by a particular set of lineages - with the added context of how these lineages stack up against each other more broadly.

Two contrasting color palettes are used to draw attention to how the lineages in view differ from one another. Mutations that are not shared across all the lineages present are flagged in warm reds and oranges, while consistently present mutations show up in the cooler blues. Now - at a glance - we can see the full picture of how these lineages compare to one another.

The S:L452R mutation appears in red because it is present in only two of the seven selected lineages.
The S:P681H mutation and its surrounding mutations are blue, indicating a region of high similarity across the seven selected lineages.
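The warm/cool split described above boils down to a set intersection: a mutation is "shared" if every selected lineage carries it, and "varying" otherwise. The snippet that follows is a minimal sketch of that idea, not the actual Lineage Portraits code. It assumes each lineage's mutations are already available as a set of strings, and the function names and toy lineage and mutation labels are hypothetical.

```python
from typing import Dict, Set

# Hypothetical input shape: lineage name -> set of mutation labels it carries.
LineageMutations = Dict[str, Set[str]]


def classify_mutations(lineages: LineageMutations) -> Dict[str, str]:
    """Label each mutation "shared" (carried by every selected lineage) or "varying"."""
    if not lineages:
        return {}
    all_mutations = set().union(*lineages.values())
    # Mutations present in every selected lineage.
    shared = set.intersection(*lineages.values())
    return {m: ("shared" if m in shared else "varying") for m in sorted(all_mutations)}


def carrier_count(lineages: LineageMutations, mutation: str) -> int:
    """Number of selected lineages that carry a given mutation."""
    return sum(mutation in muts for muts in lineages.values())


def palette_for(label: str) -> str:
    # Assumed mapping: cool blues for shared mutations, warm reds/oranges otherwise.
    return "cool" if label == "shared" else "warm"


if __name__ == "__main__":
    # Toy, made-up data for illustration only (not real lineage assignments).
    toy = {
        "lineage_A": {"mut_1", "mut_2", "mut_3"},
        "lineage_B": {"mut_1", "mut_3"},
        "lineage_C": {"mut_1", "mut_3", "mut_4"},
    }
    for mutation, label in classify_mutations(toy).items():
        print(mutation, label, palette_for(label), carrier_count(toy, mutation))
```

A caption like "present in only two of the seven selected lineages" corresponds to a `carrier_count` of 2 against seven selected lineages, which lands the mutation in the warm palette.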

However, for researchers in the field, the most pressing question when faced with an emerging lineage is often “Are we prepared to fight this variant?”, which comparing mutations between lineages cannot answer on its own. Because not all mutations and genomic locations pose the same level of risk, we applied PyR0 and Fitness Effect scores to indicate the potential transmissibility and resilience of a variant. Within both the warm and cool palettes, shade reflects each mutation’s score: lighter colors indicate lower scores and darker colors indicate higher scores.
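The post doesn't specify exactly how a PyR0 or Fitness Effect score becomes a shade. One plausible approach, sketched here purely as an assumption, is to min-max normalize the scores of the mutations in view and interpolate linearly between a light and a dark endpoint of the relevant palette. The RGB endpoints and function names are hypothetical.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Hypothetical light -> dark endpoints for each palette (not the tool's real colors).
PALETTES: Dict[str, Tuple[RGB, RGB]] = {
    "cool": ((222, 235, 247), (8, 48, 107)),
    "warm": ((254, 230, 206), (166, 54, 3)),
}


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores to [0, 1]; if all scores are equal, map them to 0.5."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {m: 0.5 for m in scores}
    return {m: (s - lo) / (hi - lo) for m, s in scores.items()}


def shade(palette: str, t: float) -> RGB:
    """Linearly interpolate from the light (t=0) to the dark (t=1) end of a palette."""
    light, dark = PALETTES[palette]
    r, g, b = (round(l + (d - l) * t) for l, d in zip(light, dark))
    return (r, g, b)


if __name__ == "__main__":
    # Made-up scores for illustration; higher scores come out darker.
    scores = {"mut_1": 0.2, "mut_2": 1.4, "mut_3": 0.8}
    for mutation, t in normalize(scores).items():
        print(mutation, shade("warm", t))
```

Normalizing over the mutations currently in view maximizes contrast within a selection, at the cost of shades not being directly comparable across different selections. A fixed, global score range would make the opposite trade-off.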

With this added layer of information, we can see our example of ORF1a:T3255I among these lineages with even more clarity: it stands out as a mutation of high potential impact, which may explain why it has been preserved across lineages that are only distantly related to each other.

With no scoring indicator selected, this flattened view still shows which mutations are present in each lineage, but without the hierarchy of relative importance that the color palettes provide.

While a handful of naming conventions have been developed to label new lineages by reference to their descent from existing lineages, such classifications quickly become too layered and complex to convey meaningful insight into these relationships beyond a few generations. Knowing where a lineage came from is a far cry from understanding that lineage. From developing effective vaccines to implementing prevention policies, the success of public health interventions depends on the early detection and understanding of emerging variants. As a launch pad for investigation, we hope that Lineage Portraits will narrow the gaps between data gathering, analysis, and response.

As we deepen our collaboration with the Sabeti Lab, we hope to continue developing useful tools to understand complexity in service of fighting the spread of disease.

We’d love to hear what you’re working on, what you’re intrigued by, and what messy data problems we can help you solve. Find us on the web, drop us a line at hello@fathom.info, or subscribe to our newsletter.