The frontier — RNA medicine, AI, and the genomics of the near future

Start here

Every module so far has described established science — things we know, tools that work, clinical applications that exist. Module 12 is different. It describes the leading edge: technologies that have proven themselves in the last five years, and trajectories that are reshaping what genomic medicine will look like by 2030.

The boundaries here are real. Some of what this module describes is in early-stage clinical trials. Some is in Phase 3. Some has been FDA-approved in the last two years. Very little of it was in clinical use ten years ago.

Understanding the frontier matters not just because the tools are exciting — it matters because the policy frameworks governing these technologies are being written right now, by people who may or may not understand the science. The students who go through Zylif are the generation that will be in those rooms.

By the end of this module you should be able to answer:

What are the major classes of RNA therapeutics and how does each work?
What did mRNA vaccines teach us about nucleic acid medicine?
What is AlphaFold and why was it called a "solution to a 50-year-old problem"?
What can AI currently do in genomics, and where does it fail?
What is the epigenome clock and what does it tell us about aging?
What are the most important unsolved problems in genomics?

---

RNA therapeutics — the class beyond CRISPR

While CRISPR edits DNA permanently, a complementary revolution has been building in RNA therapeutics: drugs that are made of RNA or that act on RNA, enabling disease treatment without permanently altering the genome.

RNA therapeutics include several mechanistically distinct classes:

mRNA therapeutics: The COVID-19 vaccines demonstrated at scale that synthetic mRNA, delivered via lipid nanoparticles, could instruct human cells to produce a specific protein. The mRNA is transient — it degrades within days — but the protein it produces can be immunogenic (vaccines), therapeutic (protein replacement), or informational (cancer neoantigens).

The next wave of mRNA therapeutics extends this logic:

mRNA protein replacement: Delivering mRNA encoding functional proteins to patients whose own genes produce non-functional versions. Moderna and others are developing mRNA therapeutics for rare diseases including propionic acidemia and methylmalonic acidemia (deficiencies in metabolic enzymes).
Personalized cancer vaccines: Tumor neoantigens (Module 11) can be encoded in individualized mRNA vaccines, training the immune system to recognize a specific patient's tumor. Moderna/Merck's mRNA-4157 in combination with pembrolizumab showed a 44% reduction in the risk of recurrence or death, compared with pembrolizumab alone, in high-risk melanoma in a Phase 2 trial (2023). Phase 3 trials are ongoing.

siRNA (small interfering RNA): Double-stranded RNA molecules, ~21 nucleotides, that trigger degradation of a complementary mRNA sequence through the RNA interference (RNAi) pathway. siRNA can silence any gene for which you can design a complementary sequence.
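The "complementary sequence" idea can be made concrete with a minimal sketch. It assumes a 21-nucleotide target site written 5'→3' and simply takes its reverse complement as the antisense (guide) strand. The target sequence and the function names are invented for illustration, and real siRNA design also weighs off-target matches, strand-selection rules, and chemical modifications, none of which are modeled here.

```python
# Minimal sketch: derive the antisense (guide) strand for an mRNA target site.
# The target below is a made-up sequence, not a real gene.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def guide_strand(target_mrna: str) -> str:
    """Return the reverse complement (5'->3') of an RNA target site."""
    target = target_mrna.upper().replace("T", "U")  # accept DNA-style input
    if set(target) - set("ACGU"):
        raise ValueError("target must contain only A, C, G, U (or T)")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return target.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases, a common first-pass design filter."""
    return (seq.count("G") + seq.count("C")) / len(seq)


target = "GCAUGCUAGCUAGGAUCCAUA"  # hypothetical 21-nt site, 5'->3'
guide = guide_strand(target)
print(guide, round(gc_fraction(guide), 2))
```

Applying `guide_strand` twice returns the original target, which is a quick sanity check that the strand pairs with the site it was designed against.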

Approved siRNA drugs:

Patisiran (Onpattro, 2018): First approved RNAi therapeutic. Silences TTR mRNA in the liver, reducing toxic transthyretin protein accumulation in hereditary ATTR amyloidosis. Delivered via LNP.
Inclisiran (Leqvio, 2021): Silences PCSK9, a liver-expressed gene whose protein product promotes degradation of LDL receptors. After the initial doses, two injections per year reduce LDL cholesterol by ~50%. Far more convenient than PCSK9 antibodies, which are injected every two to four weeks.
Multiple additional siRNA drugs approved or in late-stage trials for conditions including hyperoxaluria, hepatitis B, and hemophilia.

ASOs (antisense oligonucleotides): Single-stranded synthetic DNA or RNA analogs that bind to complementary RNA sequences and direct their degradation (via RNase H) or block their translation. Unlike siRNA, ASOs can also bind pre-mRNA and mask splicing signals, which is useful for redirecting splicing in diseases caused by splicing errors.

Nusinersen (Spinraza, 2016): Treats spinal muscular atrophy (SMA) by redirecting splicing of the SMN2 gene to include exon 7, producing more functional SMN protein. Delivered intrathecally (directly into cerebrospinal fluid). Transformed outcomes for SMA — previously the most common genetic cause of infant death.
Tofersen (Qalsody, 2023): Silences SOD1 mRNA in familial ALS caused by SOD1 mutations. First approved ASO for ALS.

The chemical engineering challenge: Naked RNA is rapidly degraded in the body, doesn't cross cell membranes efficiently, and triggers immune responses. The clinical success of RNA therapeutics required decades of chemical modification work: 2'-modifications, phosphorothioate backbones, GalNAc conjugation (for liver targeting), and LNP formulation development. This engineering is as important as the biology.

---

The delivery frontier

A theme across Modules 8 and 12: the biology of genome and RNA medicine is increasingly solved; delivery to the right tissue remains the limiting factor.

Current tissue tropism:

Liver: Best served by LNPs (systemically administered, naturally taken up by hepatocytes) and GalNAc-conjugated ASOs/siRNAs (GalNAc is a sugar recognized by the asialoglycoprotein receptor on liver cells). Efficient liver uptake is why patisiran, inclisiran, and investigational TTR-targeting CRISPR therapies can all reach their target.
CNS: Requires intrathecal delivery (directly into CSF, bypassing the blood-brain barrier) or direct injection. Nusinersen and tofersen use this route. Systemic delivery to the brain remains largely unsolved.
Muscle: AAV serotypes (AAV9, AAVrh74) can reach muscle after IV administration. Gene therapy for Duchenne muscular dystrophy (micro-dystrophin) uses this route.
Lung: Inhaled LNPs can reach pulmonary epithelium — relevant for cystic fibrosis gene therapy.
Eye: Local injection (subretinal or intravitreal) delivers directly to retinal cells. Luxturna, the approved gene therapy for RPE65-associated retinal dystrophy, is given by subretinal injection.

The most important unsolved delivery problems are the brain (for neurological disease), heart (for genetic cardiomyopathies), and immune cells beyond T cells.

---

AlphaFold and the protein structure revolution

Proteins fold into three-dimensional structures that determine their function. Predicting that structure from amino acid sequence alone — the protein folding problem — was one of the central unsolved challenges in biology for 50 years. Experimental structure determination (X-ray crystallography, cryo-EM) takes months to years per protein and doesn't scale to the entire proteome.

In December 2020, DeepMind's AlphaFold2 was unveiled at the Critical Assessment of Protein Structure Prediction (CASP14) competition — a biennial benchmark where teams attempt to predict protein structures from sequence. AlphaFold2 achieved accuracy rivaling experimental methods for most proteins. It was widely described as the solution to a 50-year problem.

In 2022, DeepMind and EMBL's European Bioinformatics Institute (EMBL-EBI) expanded the AlphaFold Protein Structure Database to predicted structures for over 200 million proteins, covering nearly every catalogued protein sequence across ~1 million organisms. This is freely available.

Consequences for genomics:

Variant interpretation: For any missense variant in a protein, you can now predict how it changes the 3D structure — does it disrupt a binding site? destabilize a critical fold? This provides mechanistic evidence for VUS reclassification.
Drug discovery: Predicting binding pockets and designing small molecules that fit them is dramatically accelerated when you know the target structure. Structure-based drug design timelines have compressed.
Understanding non-coding variants: Other deep-learning structure predictors (ESMFold, RoseTTAFold) were developed independently of AlphaFold, and methods of this kind are being extended to RNA structure prediction, which may help interpret non-coding regulatory variants.

What AlphaFold cannot yet do:

Predict protein-protein complex structures with full accuracy (though AlphaFold-Multimer addresses this partially)
Predict intrinsically disordered proteins (which have no stable fold)
Predict how proteins move and change shape during function (dynamics)
Predict the effect of post-translational modifications on structure

The successor to AlphaFold2, AlphaFold3 (2024), extends predictions to protein-DNA, protein-RNA, and protein-small molecule complexes — directly relevant to drug binding prediction.

---

AI in genomics — what works and what doesn't

Machine learning has become embedded in genomics at multiple levels. Understanding what AI actually does well and where it systematically fails is increasingly important for anyone who will use or regulate these tools.

What AI does well in genomics:

Variant calling and variant effect prediction: DeepVariant (Google) outperforms traditional algorithms at calling SNPs and indels from sequencing data. AlphaMissense (2023, DeepMind) scored all 71 million possible human missense variants and classified most of them as likely pathogenic or likely benign, with accuracy that compares well against expert-curated clinical annotations.
Regulatory element prediction: Models like Enformer predict gene expression from DNA sequence alone, learning the regulatory grammar of enhancers and promoters from large datasets of functional genomics data.
Polygenic score construction: Bayesian and ML-based PGS methods (LDpred2, PRS-CS, BayesR) generally outperform simple clumping-and-p-value-thresholding approaches (the strategy implemented in tools such as PRSice) by better modeling effect sizes and LD.
Cancer classification from sequencing data: ML models can classify tumor type from somatic mutation patterns with high accuracy — relevant for cancers of unknown primary.
Pathology: Deep learning on histology images can predict genomic features (MSI status, BRAF mutation) directly from tumor slides, without molecular testing.

Where AI fails in genomics — and why it matters:

Ancestry bias: Models trained predominantly on European-ancestry genomic data perform worse on non-European samples — for the same reasons PGS does (Module 7). AlphaMissense's training data skewed European; its performance is lower for variants common in African or South Asian populations. This is the Module 4 problem, embedded in ML.
Interpretability: Most deep learning models in genomics are black boxes. A model that predicts a variant as pathogenic cannot fully explain why — making it hard to catch errors, audit for bias, or build clinical trust.
Distribution shift: Models trained on data from academic medical centers fail when deployed in community hospitals with different sequencing platforms, patient populations, or disease frequencies. This is not hypothetical — it has caused clinical errors.
Spurious correlations: Genomic datasets contain many confounders. A model might learn that patients of a particular ancestry have higher rates of a disease — and use ancestry as a proxy in ways that are both scientifically wrong and ethically problematic.
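One practical way to surface ancestry bias or distribution shift is to report a model's performance separately for each group or site rather than as a single pooled number. The sketch below assumes a set of variants with trusted labels and a group tag per record. The record format, the group names, and the function name are all hypothetical, and the toy data exist only to show the calculation.

```python
from collections import defaultdict


def per_group_metrics(records):
    """Compute sensitivity and specificity separately for each group.

    records: iterable of (group, is_pathogenic, predicted_pathogenic) tuples,
    where the last two fields are booleans.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, truth, predicted in records:
        if truth:
            counts[group]["tp" if predicted else "fn"] += 1
        else:
            counts[group]["fp" if predicted else "tn"] += 1

    metrics = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        metrics[group] = {
            # None signals "not estimable" when a group has no such cases.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
            "n": positives + negatives,
        }
    return metrics


# Toy records (invented): the model is perfect on group_1 and weak on group_2.
toy_records = [
    ("group_1", True, True), ("group_1", True, True),
    ("group_1", False, False), ("group_1", False, False),
    ("group_2", True, True), ("group_2", True, False),
    ("group_2", False, False), ("group_2", False, True),
]

for group, m in per_group_metrics(toy_records).items():
    print(group, m)
```

A pooled accuracy over these eight records would hide the fact that every error falls in one group, which is the pattern a per-group audit is meant to expose.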

The AI regulation gap in genomics (introduced in Module 10) is particularly acute for ML-based variant interpretation tools, which are being used clinically without the regulatory framework that governs laboratory-developed tests or medical devices.

---

The epigenome, aging, and biological clocks

In Module 5 you learned that the epigenome — patterns of DNA methylation and histone modification — regulates gene expression across cell types. It turns out the epigenome also encodes biological age.

Epigenetic clocks are mathematical models that predict biological age from DNA methylation patterns. Steve Horvath's 2013 pan-tissue clock, trained on methylation data from multiple tissues and ages, could predict chronological age from blood or tissue methylation data with a median error of 3.6 years.

More important than predicting age is what happens when predicted age diverges from chronological age — epigenetic age acceleration:

Individuals whose epigenetic age exceeds their chronological age have higher all-cause mortality, increased disease risk, and accelerated cognitive decline
Accelerated epigenetic aging is associated with chronic stress, smoking, obesity, socioeconomic disadvantage, and racial discrimination (measured independently of other risk factors)
Individuals whose epigenetic age is younger than their chronological age tend to be healthier and longer-lived
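At its core, a clock of this kind is a weighted sum of methylation values, and age acceleration is the gap between that prediction and the calendar. The sketch below is a toy under stated assumptions: the three CpG names, the weights, and the intercept are invented, whereas published clocks use hundreds of sites with fitted coefficients. Acceleration is taken here as a simple difference, although many studies instead use the residual from regressing epigenetic age on chronological age.

```python
# Toy epigenetic clock: predicted age = intercept + sum(weight * methylation).
# Site names and coefficients are hypothetical, chosen only for illustration.

INTERCEPT = 20.0
WEIGHTS = {"cpg_a": 40.0, "cpg_b": -25.0, "cpg_c": 30.0}


def predict_epigenetic_age(betas: dict) -> float:
    """betas maps CpG site -> methylation fraction between 0 and 1."""
    missing = WEIGHTS.keys() - betas.keys()
    if missing:
        raise KeyError(f"missing methylation values for: {sorted(missing)}")
    return INTERCEPT + sum(w * betas[site] for site, w in WEIGHTS.items())


def age_acceleration(betas: dict, chronological_age: float) -> float:
    """Positive values mean the sample looks older than its calendar age."""
    return predict_epigenetic_age(betas) - chronological_age


sample = {"cpg_a": 0.75, "cpg_b": 0.5, "cpg_c": 0.25}
print(predict_epigenetic_age(sample))   # 45.0
print(age_acceleration(sample, 41.0))   # 4.0
```

Note that some sites carry negative weights: methylation at a given CpG can either rise or fall with age, and the model only needs the direction to be consistent.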

Newer clocks (PhenoAge, GrimAge, DunedinPACE) are trained on health-related targets such as clinical biomarkers, mortality risk, and the pace of physiological decline, rather than on chronological age, and are better predictors of health than the original Horvath clock.

Why this matters:

Epigenetic clocks provide a measurable, modifiable readout of biological aging — a potential endpoint for anti-aging interventions
They encode the biological embedding of social determinants: poverty, discrimination, and stress literally accelerate the epigenome's aging, providing a molecular mechanism linking social disadvantage to health outcomes
They are increasingly being used as endpoints in clinical trials of interventions ranging from diet to metformin (a diabetes drug proposed for testing as a longevity intervention in the TAME trial)

This connects genomics to one of the most profound questions in modern biology: what is aging, and can we slow it?

---

The unsolved problems

Genomics has accomplished more in the last 25 years than most scientists thought possible. It has also revealed how much remains unknown.

The non-coding genome: ~98% of the human genome doesn't encode protein. We have functional annotation for only a fraction of it. Enhancers, silencers, long non-coding RNAs, regulatory RNA structures — their functions are mapped incompletely. Most GWAS hits fall in this dark matter, and we don't know what most of them do.

Gene-environment interaction: Complex traits emerge from the interaction of thousands of genetic variants with each other and with a lifetime of environmental exposures. We can measure genes at a single point in time, but we can't yet integrate them with dynamic environmental data across a lifetime. Longitudinal omics — repeatedly measuring genome, epigenome, transcriptome, proteome, and metabolome in the same individuals — is the methodological frontier.

Single-cell heterogeneity and spatial genomics: scRNA-seq revealed that tissues are far more cellularly heterogeneous than suspected. Spatial transcriptomics (measuring gene expression at specific locations within tissue, maintaining spatial information) is now revealing how cell identity and communication depend on tissue architecture. We are only beginning to understand how spatial organization of gene expression shapes development and disease.

The RNA world: Most genomics has focused on DNA and mRNA. Non-coding RNAs — lncRNAs, circRNAs, small nucleolar RNAs — are pervasive, often conserved, and largely of unknown function. The regulatory logic of the transcriptome beyond protein-coding genes is substantially unmapped.

Protein-protein interaction networks: The function of any gene product depends on what it interacts with. The human interactome — the complete map of protein-protein interactions — is estimated to contain 300,000–600,000 interactions, of which only ~100,000 have been experimentally characterized. Most disease biology occurs in interaction networks we don't fully understand.

Causality vs. association: GWAS identifies associations; establishing causality requires functional experiments or Mendelian randomization. For most of the thousands of GWAS loci, the causal gene, causal variant, causal mechanism, and causal cell type remain unknown. The gap between "associated with disease" and "here is the mechanism and therapeutic target" is still enormous for most complex traits.
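For readers unfamiliar with Mendelian randomization, its simplest form is the single-variant Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The sketch below assumes the variant is a valid instrument (it affects the outcome only through the exposure), uses a first-order approximation for the standard error, and takes invented effect sizes as input. It illustrates the arithmetic only and is not a full MR analysis.

```python
def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float):
    """Single-variant Mendelian randomization estimate.

    beta_exposure: variant's effect on the exposure (e.g. a biomarker level)
    beta_outcome:  variant's effect on the outcome (e.g. log odds of disease)
    se_outcome:    standard error of beta_outcome

    Returns (estimate, approximate standard error). The standard error
    ignores uncertainty in beta_exposure (first-order approximation).
    """
    if beta_exposure == 0:
        raise ValueError("variant must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Hypothetical summary statistics for one variant.
estimate, se = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
low, high = estimate - 1.96 * se, estimate + 1.96 * se
print(f"causal effect per unit exposure: {estimate:.2f} ({low:.2f} to {high:.2f})")
```

The estimate is only as good as the instrument: if the variant influences the outcome through some other pathway, the ratio is biased, which is one reason the causal gene and mechanism still have to be established experimentally.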

---

Check yourself

1. A pharmaceutical company wants to treat a patient with a gain-of-function mutation in a liver-expressed gene causing a metabolic disease. They are choosing between an siRNA approach and a base-editing approach. What are the key tradeoffs — durability, delivery, safety profile, reversibility — that should inform this choice?

2. AlphaMissense classifies a missense variant in a cardiac ion channel gene as "likely pathogenic" with 94% confidence. The variant is in a South Asian patient and is rare in gnomAD's South Asian population. What additional steps should a clinical team take before acting on the AlphaMissense classification? What are the specific failure modes to check for?

3. An epigenetic clock study reports that patients of lower socioeconomic status have epigenetic ages 4 years older than their chronological age compared to high-SES patients of the same chronological age. A policy analyst claims this means poverty "gets into" the genome. Is this claim biologically accurate? What is the correct mechanistic description, and what are the policy implications?

4. A company develops a multi-cancer early detection blood test with 70% sensitivity and 99% specificity across 50 cancer types. They market it as a population screening tool. You are advising an insurer on whether to cover it. Walk through the clinical and health economics considerations that should determine your recommendation.

---

Key facts to remember

RNA therapeutics classes: mRNA (transient protein expression), siRNA (RNAi-mediated mRNA degradation), ASOs (degradation or splicing modulation)
Patisiran (2018): first approved siRNA drug; inclisiran (2021): PCSK9 silencing, 2x/year dosing
Nusinersen (2016): ASO splicing redirection for SMA; the first approved therapy for SMA
mRNA cancer vaccines: Moderna/Merck mRNA-4157 in Phase 3 for melanoma (2023 Phase 2 data: 44% reduction in risk of recurrence or death when added to pembrolizumab)
AlphaFold2 (2020): solved protein structure prediction for most proteins; AlphaFold DB: 200M+ structures freely available
AlphaMissense (2023): classified 71M missense variants as likely pathogenic/benign
Epigenetic clocks: methylation-based biological age prediction; epigenetic age acceleration correlates with mortality and encodes social determinants
AI in genomics: works well for variant calling, regulatory prediction, cancer classification; fails due to ancestry bias, distribution shift, interpretability gaps
Major unsolved problems: non-coding genome function, GxE interaction, spatial genomics, RNA world, interactome, GWAS-to-mechanism gap

---

Primary sources & references

Adams, D. et al. (2018). "Patisiran, an RNAi therapeutic, for hereditary transthyretin amyloidosis." NEJM, 379, 11–21.
Finkel, R. S. et al. (2017). "Nusinersen versus sham control in infantile-onset spinal muscular atrophy." NEJM, 377, 1723–1732.
Jumper, J. et al. (2021). "Highly accurate protein structure prediction with AlphaFold." Nature, 596, 583–589.
Cheng, J. et al. (2023). "Accurate proteome-wide missense variant effect prediction with AlphaMissense." Science, 381, eadg7492.
Nguyen, E. et al. (2024). "Sequence modeling and design from molecular to genome scale with Evo." bioRxiv.
Horvath, S. (2013). "DNA methylation age of human tissues and cell types." Genome Biology, 14, R115.
Belsky, D. W. et al. (2022). "DunedinPACE: a DNA methylation biomarker of the pace of aging." eLife, 11, e73420.
Liu, Z. et al. (2020). "Underlying features of epigenetic aging clocks in vivo and in vitro." Aging Cell, 19, e13229.