Capstone — putting it all together
You've covered 12 modules. You know what a genome is, how it's sequenced, how variants are identified and interpreted, how gene expression works, how mutations cause disease, how population-scale studies find disease associations, how CRISPR rewrites DNA, how pharmacogenomics personalizes medicine, why genomic data is ethically fraught, how cancer genomics is transforming oncology, and where the frontier is heading.
The capstone module doesn't introduce new concepts. It does three things:
- Integrates — shows how the modules connect into a coherent picture of genomic medicine
- Applies — walks through three realistic case studies that require synthesizing knowledge across multiple modules
- Orients — gives you a framework for continuing to learn as the field moves
This module is deliberately harder than the others. The case studies are not solved by remembering facts — they require reasoning across multiple concepts under uncertainty, the way real clinical and research problems work.
---
Part 1: The map of genomic medicine
Before the case studies, here is how the modules connect:
The information layer (Modules 1–3): The genome is the information system. DNA encodes genes (M1). Variation between individuals — SNPs, indels, CNVs, structural variants — creates the raw material for both disease and evolution (M2). Sequencing technologies — Sanger, short-read NGS, long-read — are the tools that read the information out (M3).
The reference and interpretation layer (Modules 4–6): To make sense of sequencing data, you need a reference (M4), and you need to understand that the reference encodes who has been studied, creating systematic gaps in our knowledge for non-European populations. Gene expression (M5) explains why identical genomes produce diverse cell types: the genome is not a static instruction set but a dynamic regulatory system. Variants cause disease through specific molecular mechanisms (M6), such as LOF, GOF, haploinsufficiency, and dominant-negative effects, and are classified using databases (ClinVar) and clinical guidelines (ACMG/AMP).
The population layer (Module 7): Most human traits are not Mendelian — they are polygenic, shaped by thousands of variants each contributing tiny effects. GWAS is the tool for finding those associations at scale. Polygenic scores aggregate them into predictions. Mendelian randomization uses them to make causal inferences. The ancestry bias in all of these — rooted in who participated in early studies — compounds the reference genome problem from M4.
The intervention layer (Modules 8–9): Once you understand the molecular mechanism of disease, you can intervene. CRISPR and its descendants (base editors, prime editors) edit DNA; RNA therapeutics (siRNA, ASOs, mRNA) act on RNA transiently. The right drug for the right patient depends on their genomic context — pharmacogenomics (M9) shows this concretely, from CYP450 metabolizer phenotypes to HLA hypersensitivity screening.
The governance layer (Module 10): All of this is embedded in a social, legal, and ethical context. Who owns genomic data? What protects patients from discrimination? What do historically marginalized communities have the right to control about research on their genomes? These questions don't have clean answers, and the regulatory frameworks are still catching up to the science.
The disease application layer (Module 11): Cancer is where genomics has had its most immediate clinical impact — precision oncology matches drugs to driver mutations, liquid biopsy enables real-time tumor monitoring, and clonal evolution explains why resistance is nearly inevitable.
The frontier (Module 12): RNA medicine is expanding the therapeutic toolkit beyond CRISPR. AI is transforming variant interpretation, drug discovery, and prognostic modeling — while also encoding and potentially amplifying the biases of the datasets it learned from. Epigenomic clocks reveal how social determinants get biologically embedded. The non-coding genome, the interactome, and causal mechanisms of GWAS loci remain largely dark.
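The population layer's idea that polygenic scores "aggregate" many small effects can be made concrete with a minimal sketch. It assumes the simplest additive form: each variant's effect size (for example, a log odds ratio from a GWAS) multiplied by the number of risk alleles a person carries, then summed. The variant names and effect sizes below are made up for illustration; real scores also involve LD adjustment, ancestry calibration, and quality control that are not shown here.

```python
from typing import Mapping


def polygenic_score(effect_sizes: Mapping[str, float],
                    dosages: Mapping[str, int]) -> float:
    """Additive score: sum of (effect size x risk-allele dosage) over variants."""
    score = 0.0
    for variant_id, beta in effect_sizes.items():
        # Simplification: a missing genotype is treated as 0 risk alleles.
        dosage = dosages.get(variant_id, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant_id} must be 0, 1, or 2")
        score += beta * dosage
    return score


# Hypothetical effect sizes (log odds ratios) for three invented variants.
toy_effects = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.02}
# Hypothetical person: 2 copies of a, 1 copy of b, 0 copies of c.
toy_person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

toy_score = polygenic_score(toy_effects, toy_person)
print(f"toy polygenic score: {toy_score:.2f}")
```

Because the effect sizes come from whichever population the GWAS studied, the same sum can be miscalibrated in another ancestry group, which is the bias the map above keeps returning to.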
---
Part 2: Three integrated case studies
These case studies are designed to require synthesis across multiple modules. Read each one, attempt your own reasoning, then work through the provided analysis.
---
Case Study 1: A child with unexplained disease
Presentation: A 4-year-old girl presents with progressive muscle weakness, developmental regression, and elevated serum creatine kinase (CK). Her parents are healthy and unrelated. Family history is unremarkable for neuromuscular disease. MRI shows muscle edema without atrophy.
Genomic workup: Whole-exome sequencing of the child and both parents (a trio) is ordered. The report returns:
- A de novo heterozygous missense variant in RYR1 (ryanodine receptor 1): c.14545C>T, p.R4849C. Not previously reported in ClinVar. Classified as VUS. gnomAD frequency: 0 (absent from all populations). CADD score: 28.4 (predicted highly deleterious). AlphaMissense: likely pathogenic (89% confidence).
- A paternally inherited heterozygous variant in DMD (dystrophin): c.9563G>A, p.R3188Q. ClinVar: Likely Benign. gnomAD: 0.003% in European population.
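The labels "de novo" and "paternally inherited" in a trio report come from comparing genotypes across child, mother, and father. The sketch below shows that comparison in its simplest form, using counts of the alternate allele in each person. The function name and return strings are illustrative choices, not part of any real pipeline; production callers also weigh read depth, genotype quality, mosaicism, and confirmation of sample identity before calling a variant de novo.

```python
def classify_inheritance(child_alt: int, mother_alt: int, father_alt: int) -> str:
    """Label a variant's origin from alternate-allele counts in a trio.

    Counts are the number of alternate alleles each person carries
    (0, 1, or 2; a father carries at most 1 for an X-linked gene).
    """
    if child_alt == 0:
        return "not present in child"
    if mother_alt == 0 and father_alt == 0:
        # Seen in the child only: a candidate de novo event.
        return "de novo (candidate)"
    if mother_alt > 0 and father_alt > 0:
        return "inherited (either or both parents)"
    return "maternally inherited" if mother_alt > 0 else "paternally inherited"


# The two variants from the report, expressed as alternate-allele counts.
ryr1_call = classify_inheritance(child_alt=1, mother_alt=0, father_alt=0)
dmd_call = classify_inheritance(child_alt=1, mother_alt=0, father_alt=1)

print("RYR1:", ryr1_call)
print("DMD: ", dmd_call)
```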
Questions to reason through:
Which variant is more likely to explain this child's disease — and why?
The RYR1 variant is the stronger candidate for several reasons:
- De novo inheritance: A de novo variant, present in the child but absent from both parents, immediately elevates suspicion for pathogenicity because it is consistent with the family picture: healthy parents who do not carry the variant and an affected child who does (M6). De novo dominant mutations are a major cause of severe pediatric disease.
- Gene-disease fit: RYR1 encodes the ryanodine receptor, a calcium release channel critical for muscle contraction. Dominant RYR1 mutations cause central core disease and malignant hyperthermia susceptibility — both presenting with muscle weakness and elevated CK, matching the phenotype.
- Variant characteristics: Absent from gnomAD (not observed in >700,000 alleles), which rules out a common benign variant, although absence alone does not prove pathogenicity. CADD scores are PHRED-scaled, so a score of 28.4 places the variant in roughly the top 0.15% of predicted deleteriousness among all possible single-nucleotide variants. AlphaMissense calls it likely pathogenic. The arginine at position 4849 is in the transmembrane domain critical for channel gating.
- VUS classification doesn't mean "probably benign": VUS means insufficient evidence for classification — not evidence of benignity (M6). The de novo occurrence plus absent population frequency plus computational evidence makes this VUS highly suspicious.
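Two of the numbers in the variant-characteristics bullet can be checked with short arithmetic. The sketch below assumes the standard PHRED-style scaling of CADD scores (a score of 10 corresponds to the top 10% of possible variants, 20 to the top 1%, 30 to the top 0.1%) and treats the gnomAD alleles as independent draws, which is a simplification. The function names are illustrative.

```python
import math


def cadd_phred_to_top_fraction(phred: float) -> float:
    """Fraction of possible variants scored at least this deleterious."""
    return 10 ** (-phred / 10)


def max_allele_frequency_if_unseen(n_alleles: int, confidence: float = 0.95) -> float:
    """Upper bound on allele frequency when 0 copies are seen in n_alleles.

    Solves (1 - f) ** n_alleles = 1 - confidence for f, assuming alleles
    are sampled independently from one population.
    """
    alpha = 1 - confidence
    return -math.expm1(math.log(alpha) / n_alleles)


top_fraction = cadd_phred_to_top_fraction(28.4)
freq_bound = max_allele_frequency_if_unseen(700_000)

print(f"CADD 28.4 -> top {top_fraction:.2%} of possible variants")
print(f"0 of 700,000 alleles -> frequency below about {freq_bound:.1e} (95% bound)")
```

The second result is the "rule of three" in action: with zero observations in N draws, the 95% upper bound is close to 3/N. It bounds how common the variant can be; it says nothing by itself about whether the variant is harmful.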
The DMD variant is almost certainly irrelevant. It is Likely Benign in ClinVar and has already been observed in gnomAD, so it is a previously seen variant rather than a novel finding. DMD-related muscular dystrophy is X-linked recessive: a girl would typically need both copies of DMD affected, and she is heterozygous. Most tellingly, her father has only one X chromosome, carries the variant on it, and is healthy, which is strong evidence that the variant does not cause disease (M6, X-linked inheritance).
What additional evidence would you seek?
- Functional assay: Test whether the R4849C variant alters RyR1 calcium release in a cell model — in vitro functional data can reclassify a VUS
- Additional cases: Search literature and variant databases for other patients with RYR1 variants at the same position or domain with similar phenotypes
- RNA sequencing of muscle tissue: Check whether the variant affects splicing or expression
- Family functional testing: Even though the parents don't carry the variant, testing whether their RYR1 channels function normally can establish a baseline
What is the policy dimension?
This child's RYR1 variant will likely be reclassified from VUS to Likely Pathogenic as evidence accumulates — but that reclassification may take years. In the interim, clinical management proceeds without a confirmed genetic diagnosis. This is the VUS problem (M4, M6): it disproportionately affects patients whose variants aren't well-represented in databases, but it also affects any patient with a novel variant. The gap between "sequencing happened" and "sequencing is informative" remains a major challenge in clinical genomics.
---
Case Study 2: A treatment decision shaped by ancestry
Presentation: A 58-year-old woman of Han Chinese ancestry is diagnosed with hormone receptor-positive, HER2-negative early breast cancer. She undergoes lumpectomy. The oncologist recommends adjuvant tamoxifen for 5 years.
Pharmacogenomic context:
Before initiating tamoxifen, her institution's pre-emptive pharmacogenomics program returns her CYP2D6 result: she is a Poor Metabolizer (*4/*10 genotype).
Questions to reason through:
What does this result mean for tamoxifen efficacy?
Tamoxifen requires CYP2D6-mediated conversion to endoxifen, its active metabolite. As a Poor Metabolizer, this patient produces minimal endoxifen — insufficient for the antiproliferative effect in breast tissue that tamoxifen's clinical benefit depends on (M9). Multiple studies show CYP2D6 PM status is associated with higher breast cancer recurrence rates in tamoxifen-treated patients.
What should the oncologist do?
CPIC guidelines for CYP2D6 and tamoxifen (updated 2022) recommend: for CYP2D6 poor metabolizers with ER-positive breast cancer, consider switching to an aromatase inhibitor (AI) — anastrozole, letrozole, or exemestane — which does not require CYP2D6 metabolism and achieves equivalent or superior endocrine suppression in postmenopausal women (M9). If the patient is premenopausal, ovarian suppression combined with an AI is the recommended alternative.
What is the ancestry dimension?
CYP2D6*10 is a reduced-function allele found at ~50% frequency in East Asian populations, dramatically higher than its ~5% frequency in Europeans (M9). Without pre-emptive genotyping, a clinician might not recognize the elevated likelihood of reduced CYP2D6 activity in a Han Chinese patient and might prescribe tamoxifen without considering her metabolizer status.
This case illustrates a broader principle: standard oncology protocols were largely developed in trials with predominantly European-ancestry participants. Pharmacogenomic variation that differs by ancestry means those protocols may systematically underserve non-European patients if PGx is not integrated (M4, M7, M9 all intersect here).
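To see how much an allele-frequency gap changes who walks into clinic carrying the allele, the sketch below converts the two *10 frequencies quoted above (~50% and ~5%) into expected genotype frequencies. It assumes Hardy-Weinberg proportions and considers *10 in isolation, ignoring other CYP2D6 alleles and gene copy-number changes, so the output is an illustration and not a phenotype prediction.

```python
from typing import Dict


def hardy_weinberg(allele_freq: float) -> Dict[str, float]:
    """Expected genotype frequencies for one allele under Hardy-Weinberg."""
    if not 0 <= allele_freq <= 1:
        raise ValueError("allele frequency must be between 0 and 1")
    p = allele_freq
    q = 1 - p
    return {"two_copies": p * p, "one_copy": 2 * p * q, "no_copies": q * q}


# Approximate *10 frequencies as quoted in the text.
east_asian = hardy_weinberg(0.50)
european = hardy_weinberg(0.05)

for label, genotypes in (("East Asian", east_asian), ("European", european)):
    carriers = genotypes["two_copies"] + genotypes["one_copy"]
    print(f"{label}: {genotypes['two_copies']:.2%} with two copies, "
          f"{carriers:.2%} with at least one")
```

Under these assumptions, a tenfold difference in allele frequency becomes a hundredfold difference in people carrying two copies, which is why a protocol tuned on one population can miss a common genotype in another.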
What is the health equity dimension?
Pre-emptive PGx programs are available primarily at large academic medical centers. A patient at a community hospital — which serves a disproportionate share of minority and lower-income patients — is less likely to have access to pre-emptive genotyping. The patients most likely to have ancestry-specific pharmacogenomic variation relevant to standard treatments are among the least likely to have access to testing that would identify it. This is a structural inequity embedded in the deployment, not just the science, of pharmacogenomics.
---
Case Study 3: A GWAS finding and its journey to the clinic
Presentation: A large GWAS for type 2 diabetes (T2D) identifies a genome-wide significant SNP (rs10830963) in the MTNR1B gene (melatonin receptor 1B). The risk allele (G) has an odds ratio of 1.09 for T2D and is present in ~30% of Europeans and ~70% of East Asians (a striking frequency difference).
A pharmaceutical company proposes developing a MTNR1B antagonist to treat T2D, based on this GWAS signal.
Questions to reason through:
Does the GWAS association establish that MTNR1B is a good drug target?
The association establishes that common variation near MTNR1B is correlated with T2D risk. But the chain from "GWAS hit" to "validated drug target" requires additional evidence (M7):
- Is this SNP causal, or is it a tag? rs10830963 may be in LD with the actual causal variant. Fine-mapping studies suggest this particular SNP is indeed a likely causal variant (it is a non-coding variant in the MTNR1B intron that appears to alter the gene's regulation rather than the receptor's protein sequence), but this must be established, not assumed.
- What is the mechanism? The G allele of rs10830963 increases MTNR1B expression in pancreatic beta cells, making them more responsive to melatonin's inhibitory signal and impairing insulin secretion during nocturnal fasting. This mechanistic evidence (from functional studies, not the GWAS alone) supports MTNR1B as the affected gene and improves confidence in it as a target.
- Does Mendelian randomization support causality? If genetic instruments for higher MTNR1B expression are associated with higher T2D risk, that's MR evidence for a causal relationship (M7). This analysis has been done and supports a causal role.
- Is the effect size clinically meaningful? An odds ratio of 1.09 means each copy of the risk allele increases T2D odds by about 9%. This is a small individual effect, but it doesn't mean the pathway is unimportant. The effect size of the GWAS hit reflects the allele's contribution in the context of all other genetic and environmental factors; a drug targeting the pathway directly could have a much larger effect than the allele.
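An odds ratio is easy to misread as a change in risk, so a small worked conversion may help with the effect-size question above. The sketch assumes a 10% baseline risk of T2D for non-carriers, which is a hypothetical number chosen only to make the arithmetic visible, and assumes the odds ratio applies per copy of the risk allele and multiplies across copies.

```python
def risk_after_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Convert a baseline risk and an odds ratio into the resulting risk."""
    if not 0 < baseline_risk < 1:
        raise ValueError("baseline risk must be strictly between 0 and 1")
    baseline_odds = baseline_risk / (1 - baseline_risk)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


ASSUMED_BASELINE_RISK = 0.10  # hypothetical non-carrier risk, not from the GWAS
PER_ALLELE_OR = 1.09          # odds ratio quoted in the case study

one_copy_risk = risk_after_odds_ratio(ASSUMED_BASELINE_RISK, PER_ALLELE_OR)
two_copy_risk = risk_after_odds_ratio(ASSUMED_BASELINE_RISK, PER_ALLELE_OR ** 2)

print(f"0 copies: {ASSUMED_BASELINE_RISK:.1%}")
print(f"1 copy:   {one_copy_risk:.1%}")
print(f"2 copies: {two_copy_risk:.1%}")
```

With these inputs, one copy moves an individual from 10% to a little under 11%. That is why a GWAS hit of this size is useful as a pointer to biology and not as a test for any one patient.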
What does the East Asian frequency difference mean?
The risk allele is far more common in East Asians (~70%) than Europeans (~30%). This means:
- East Asian individuals carry more copies of the risk allele on average, so this locus contributes more to background genetic risk in that population, even though the per-allele effect is small
- A drug targeting MTNR1B might have its largest population impact in East Asian populations — which is a reason to include East Asian participants prominently in clinical trials
- Effect sizes estimated in European-ancestry GWAS may not apply equally in East Asian populations (M4, M7)
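The population-level consequence of the frequency difference can be estimated with a rough calculation. The sketch below uses the allele frequencies and odds ratio quoted in the case study and assumes Hardy-Weinberg proportions, a per-allele multiplicative effect, and the same odds ratio in both populations. The last assumption is exactly what the third bullet warns may not hold, so treat the result as an illustration of scale only.

```python
import math


def mean_log_odds_shift(risk_allele_freq: float, per_allele_or: float) -> float:
    """Average log-odds contribution of one locus across a population.

    Under Hardy-Weinberg, the mean number of risk alleles per person is
    2 * frequency; each copy adds log(per_allele_or) to the log-odds.
    """
    return 2 * risk_allele_freq * math.log(per_allele_or)


PER_ALLELE_OR = 1.09  # from the case study
european_shift = mean_log_odds_shift(0.30, PER_ALLELE_OR)
east_asian_shift = mean_log_odds_shift(0.70, PER_ALLELE_OR)

# Ratio of average odds attributable to this locus, East Asian vs European.
average_odds_ratio = math.exp(east_asian_shift - european_shift)

print(f"mean risk alleles per person: European 0.6, East Asian 1.4")
print(f"average odds at this locus differ by a factor of about {average_odds_ratio:.2f}")
```

Under these assumptions the locus shifts average odds by only a few percent between the two populations. The stronger argument for including East Asian participants in trials is that most of them carry the allele a drug would act on.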
What are the preclinical and clinical steps needed?
- Establish MTNR1B antagonist improves insulin secretion in in vitro and animal models
- Confirm the relevant effect is in pancreatic beta cells (tissue-specific delivery or selectivity matters)
- Phase 1 safety and PK/PD trials — including participants of East Asian ancestry given the allele frequency difference
- Phase 2 proof-of-concept with glucose and insulin secretion endpoints
- Phase 3 cardiovascular outcomes trial (T2D drugs require cardiovascular safety demonstration post-2008 FDA guidance)
The governance question:
The GWAS data underlying this drug discovery program came from a large multi-ancestry biobank. Participants consented to "health research." Does drug discovery for profit fall within that consent? Under broad consent, probably yes legally. Ethically, though, many participants didn't envision their data contributing to drugs they might later be unable to afford. Benefit-sharing frameworks (referenced in M10 for Indigenous populations) are being discussed for general biobank participants, and this case illustrates why.
---
Part 3: What comes next for you
The genomics field moves fast. A student who understood the field well in 2020 would need to significantly update their knowledge by 2025. This is not a reason for despair — it's a feature of being in a field that matters.
How to stay current:
- Read primary literature selectively. You don't need to read every paper — you need to read the landmark papers in the areas you care about. Nature, Nature Genetics, NEJM, Science, and Cell publish the most impactful genomics work. Their research digests and news sections translate findings accessibly.
- Follow GWAS Catalog, ClinVar, and CPIC updates. These databases change constantly as new findings are added and classifications are revised. Setting alerts for genes or conditions you care about keeps you current without reading everything.
- Track FDA approvals. Every genomic medicine that reaches approval represents a completed translational arc — from gene discovery to mechanism to drug to clinical trial to regulatory decision. FDA approval letters and labels are publicly available and technically detailed.
- Engage with preprints skeptically. bioRxiv and medRxiv publish genomics findings before peer review. Important results appear here first, but so do findings that don't replicate. Learn to read methods sections critically.
The policy frontier:
The scientific knowledge in this track will largely be supplanted within a decade. The policy and ethical frameworks will shape the field for generations. The questions that remain actively contested — who owns genomic data, how we ensure genomic medicine reaches all populations equitably, how we govern germline editing, how we regulate AI-based genomic tools — are where your generation will make decisions that matter.
Zylif was built because the students who will be in those rooms need a foundation that most curricula don't provide. You now have that foundation.
---
Answer these without looking back — then revisit any you're unsure of.
- A 2026 biobank study sequences 200,000 individuals from 15 countries, predominantly lower-income nations in Africa, South Asia, and Latin America. It is funded by a consortium of European and American pharmaceutical companies. The consent model is broad consent translated into local languages. Data is stored on servers in the United States. Design the ethical framework this study should operate under — drawing on OCAP principles, GINA's limitations, the Havasupai case, and the gnomAD diversity gap. What would you require before the study begins?
- A patient with metastatic pancreatic cancer has a tumor with: KRAS G12D mutation, TP53 loss, SMAD4 loss, high TMB (22 mut/Mb), and no MSI. Their germline sequencing shows a BRCA2 pathogenic variant. They have failed two lines of standard chemotherapy. Walk through the precision oncology options in order of evidence strength, explaining the mechanism behind each recommendation.
- The year is 2035. Base editing for sickle cell disease is now available globally at a cost of $150,000. CRISPR germline editing for the same disease is technically feasible and could eliminate the disease from a family line permanently with one intervention. A couple who are both sickle cell carriers asks their genetic counselor whether to use PGT-M (preimplantation genetic testing for monogenic disorders, used here to select unaffected embryos), treat any affected children with base editing, or pursue germline editing of embryos. Walk through the ethical, medical, and policy considerations that should inform this conversation, without prescribing a single "right" answer.
- You are advising the FDA on a regulatory framework for AI-based tools that interpret whole-genome sequencing data for clinical use. The tools perform better than human experts for some variant classes and worse for others. They perform significantly better on European-ancestry samples than on African or South Asian samples. Propose a regulatory framework that addresses accuracy, bias, transparency, and post-market surveillance, drawing on what you know about existing regulatory gaps in genomics.
---
Primary sources & references
The following are the primary sources cited across all 13 modules, consolidated for reference.
Foundational genomics:
- Watson, J. D. & Crick, F. H. C. (1953). Nature, 171, 737–738.
- International Human Genome Sequencing Consortium (2001). Nature, 409, 860–921.
- 1000 Genomes Project Consortium (2015). Nature, 526, 68–74.
- Crick, F. (1970). Nature, 227, 561–563.
Sequencing:
- Sanger, F. et al. (1977). PNAS, 74, 5463–5467.
- Metzker, M. L. (2010). Nature Reviews Genetics, 11, 31–46.
- Nurk, S. et al. (2022). Science, 376, 44–53. (T2T)
Reference genome and diversity:
- Liao, W-W. et al. (2023). Nature, 617, 312–324. (pangenome)
- Popejoy, A. B. & Fullerton, S. M. (2016). Nature, 538, 161–164.
- Chen, S. et al. (2024). Nature, 625, 92–100. (gnomAD v4)
Gene expression and epigenetics:
- Pan, Q. et al. (2008). Nature Genetics, 40, 1413–1415.
- GTEx Consortium (2020). Science, 369, 1318–1330.
- Roadmap Epigenomics Consortium (2015). Nature, 518, 317–330.
Variants and disease:
- Knudson, A. G. (1971). PNAS, 68, 820–823.
- Richards, S. et al. (2015). Genetics in Medicine, 17, 405–424.
- Lek, M. et al. (2016). Nature, 536, 285–291. (gnomAD/pLI)
- Landrum, M. J. et al. (2016). Nucleic Acids Research, 44, D862–D868.
GWAS and complex traits:
- Visscher, P. M. et al. (2017). AJHG, 101, 5–22.
- Yengo, L. et al. (2022). Nature, 610, 704–712.
- Duncan, L. et al. (2019). Cell, 178, 1–12.
- Boyle, E. A. et al. (2017). Cell, 169, 1177–1186.
- Davey Smith, G. & Hemani, G. (2014). HMG, 23, R89–R98.
CRISPR and editing:
- Jinek, M. et al. (2012). Science, 337, 816–821.
- Komor, A. C. et al. (2016). Nature, 533, 420–424.
- Anzalone, A. V. et al. (2019). Nature, 576, 149–157.
- Frangoul, H. et al. (2021). NEJM, 384, 252–260.
- Lander, E. S. et al. (2019). Nature, 567, 165–168.
Pharmacogenomics:
- Relling, M. V. & Evans, W. E. (2015). Nature, 526, 343–350.
- Mallal, S. et al. (2008). NEJM, 358, 568–579.
- Relling, M. V. et al. (2022). Clin Pharm Ther, 111, 1296–1302.
Ethics and governance:
- Erlich, Y. et al. (2018). Science, 362, 690–694.
- Garrison, N. A. et al. (2019). Nature Reviews Genetics, 20, 256–263.
- Hudson, M. et al. (2020). Data Science Journal, 19, 43.
- Green, R. C. et al. (2013). Genetics in Medicine, 15, 565–574.
Cancer genomics:
- Alexandrov, L. B. et al. (2013). Nature, 500, 415–421.
- Vogelstein, B. et al. (2013). Science, 339, 1546–1558.
- Le, D. T. et al. (2017). Science, 357, 409–413.
- Klein, E. A. et al. (2021). Annals of Oncology, 32, 1167–1177.
RNA medicine and AI:
- Adams, D. et al. (2018). NEJM, 379, 11–21.
- Finkel, R. S. et al. (2017). NEJM, 377, 1723–1732.
- Jumper, J. et al. (2021). Nature, 596, 583–589.
- Cheng, J. et al. (2023). Science, 381, eadg7492.
- Horvath, S. (2013). Genome Biology, 14, R115.