Genomics | 6 of 13 | 40 min | Modules 1–5 (genome basics, variants, sequencing, reference genomes, gene expression)

Mutations and disease — from variants to mechanisms

Start here

In Module 2 you learned that any two humans differ at roughly 4–5 million positions. In Module 5 you learned how genes get expressed into functional proteins. Now the question is: what happens when a variant disrupts that process?

The answer isn't simple. Not all variants cause disease. Most don't. Of those that do, the mechanism varies enormously — some variants destroy a protein's function entirely, others hyperactivate it, others alter where or when it's expressed. Understanding these mechanisms is the difference between being able to read a genetics paper and actually understanding what it's telling you.

This module is where Module 2 (variants) and Module 5 (expression) converge into real disease biology.

By the end of this module you should be able to answer:

  • What is the difference between a benign variant and a pathogenic one?
  • What are the main molecular mechanisms by which mutations cause disease?
  • What is the difference between dominant and recessive inheritance — at the molecular level?
  • What is haploinsufficiency and how does it differ from a dominant negative effect?
  • How do somatic vs. germline mutations differ in their disease consequences?
  • What is OMIM and how is it used clinically?

---

Most variants don't cause disease — here's why

The human genome is 3.1 billion base pairs. Any two unrelated people differ at roughly 4–5 million positions. If every variant caused disease, the species would not exist.

[Figure: Four ways a mutation causes disease. Panels: loss of function (protein can't work, no output); constitutive activation (stuck "on", signals nonstop); dominant negative (one mutant subunit poisons a four-part complex); toxic aggregation (protein misfolds into an insoluble clump that harms the cell).]

The vast majority of variants are benign for several reasons:

Redundancy in the genetic code: There are 64 codons, but they encode only 20 amino acids (61 codons) plus a stop signal (3 codons). This means many single nucleotide changes in a coding sequence don't change the amino acid at all. These are synonymous variants (also called silent mutations). A change from GAA to GAG still codes for glutamic acid: no protein change and, usually, no consequence.
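To make the redundancy concrete, here is a minimal Python sketch that builds the standard genetic code and checks the GAA to GAG example. The names `CODON_TABLE` and `is_synonymous` are illustrative choices for this sketch, not part of any library.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """Return True if both codons translate to the same residue."""
    return CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]


# Number of codons per residue (stop codons are counted under "*").
codons_per_residue = Counter(CODON_TABLE.values())

print(is_synonymous("GAA", "GAG"))  # both encode glutamic acid (E)
print(is_synonymous("GAA", "GTA"))  # glutamic acid (E) vs. valine (V)
print(codons_per_residue["E"], codons_per_residue["*"])
```

Most of the redundancy sits in the third codon position, which is why third-position changes are the most likely to be synonymous.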

Tolerance in protein sequence: Most proteins can tolerate amino acid substitutions at many positions without losing function. Evolution has been experimenting with protein sequences for billions of years — positions where substitution is tolerable show natural variation across species; positions that are critical are highly conserved.

Variants in non-functional sequence: A significant fraction of the genome doesn't encode genes or regulatory elements. Variants in these truly inert regions have no consequence.

Minor allele frequency: Many variants that do slightly alter protein function are common in the population — they've persisted because the effect is small, recessive, or only relevant in specific environmental contexts.

The question "is this variant pathogenic?" is therefore not a yes/no binary — it's a probabilistic assessment that considers the type of change, the position in the gene, the gene's tolerance for variation, and the population frequency of the variant.

---

The spectrum of variant types

Before discussing mechanisms of disease, you need to understand the types of mutations that exist.

At the nucleotide level:

  • Synonymous (silent): Nucleotide change, same amino acid. Generally benign, though some affect splicing.
  • Missense: Nucleotide change, different amino acid. May be benign or pathogenic depending on position and amino acid properties.
  • Nonsense: Nucleotide change creates a premature stop codon. Usually causes loss of function via nonsense-mediated decay (the cell detects and destroys truncated mRNA) or produces a truncated, non-functional protein.
  • Frameshift: Insertion or deletion of a number of nucleotides not divisible by 3, shifting the reading frame. Alters every amino acid downstream of the mutation; almost always damaging.
  • Splice site: Variant at the boundary between exon and intron. Disrupts the spliceosome's recognition signals, causing intron retention, exon skipping, or activation of cryptic splice sites.

At a larger scale:

  • Copy number variants (CNVs): Duplications or deletions of entire genomic segments, from kilobases to megabases. Can delete or amplify entire genes or sets of genes.
  • Structural variants (SVs): Inversions, translocations, complex rearrangements. Can disrupt gene structure or move regulatory elements away from the genes they control.

Different variant types have very different probabilities of being pathogenic. Nonsense and frameshift variants in genes that are intolerant to loss-of-function are almost always damaging. Missense variants are the hardest to interpret — the majority of VUS (variants of uncertain significance) in clinical reports are missense.
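The nucleotide-level categories above can be expressed as a few simple rules. The sketch below is a simplified illustration, not a real annotation tool: the function names are hypothetical, it looks at a single codon in isolation, and it ignores splice effects. The labels "stop-loss" and "in-frame indel" are standard terms added here for completeness; they are not covered in this module.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G; "*" is a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a substitution by its effect on one codon."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # new premature stop codon
    if ref_aa == "*":
        return "stop-loss"  # original stop codon removed
    return "missense"


def classify_indel(length_change: int) -> str:
    """Classify an insertion (+n) or deletion (-n) by its effect on the frame."""
    if length_change == 0:
        raise ValueError("an indel must change the sequence length")
    return "frameshift" if length_change % 3 else "in-frame indel"


print(classify_substitution("GAA", "GAG"))  # E -> E
print(classify_substitution("GAA", "GTA"))  # E -> V
print(classify_substitution("GAA", "TAA"))  # E -> stop
print(classify_indel(-1), classify_indel(3))
```

Real annotation tools apply the same logic across whole transcripts and also account for splice sites, which this sketch leaves out.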

---

Loss of function — when a gene stops working

The simplest disease mechanism is loss of function (LOF): a variant reduces or eliminates the activity of a protein.

[Figure: Gene dosage and the pLI score. Axis: protein output from 0% (both copies lost) through 50% (one copy lost) to 100% (both working). A dosage-tolerant gene (pLI ≈ 0) is fine with one copy lost; a haploinsufficient gene (pLI ≈ 1) develops disease when one copy is lost.]

LOF can occur through multiple molecular paths:

  • A nonsense variant creates a premature stop → the mRNA is degraded by nonsense-mediated decay → no protein is made
  • A missense variant changes an amino acid in the protein's active site → the protein is made but cannot perform its function
  • A splice site variant causes intron retention → the reading frame is disrupted → no functional protein
  • A large deletion removes the gene entirely

The disease consequence of LOF depends critically on the gene and the inheritance pattern.

Recessive LOF: In recessive diseases, you need both copies of a gene to be non-functional to get disease. One working copy is sufficient to maintain normal function. Cystic fibrosis is the classic example: mutations in both copies of CFTR cause disease; carriers with one functional copy are healthy. This makes biological sense — the cell has enough protein from one functional copy.

Haploinsufficiency: Some genes are so dosage-sensitive that losing even one copy causes disease — having 50% of normal protein output is not enough. This is called haploinsufficiency. Examples include:

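  • BRCA1/2 in hereditary breast/ovarian cancer: one functional copy may not be enough for adequate DNA repair in some tissues, although tumors usually also lose the second copy (see the two-hit hypothesis under germline vs. somatic mutations)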
  • NF1 in neurofibromatosis — one functional copy doesn't provide enough neurofibromin to suppress tumor growth
  • Many transcription factors, which often must be present at precise concentrations to activate downstream targets correctly

Candidate haploinsufficient genes can be flagged in gnomAD by their pLI score (probability of being loss-of-function intolerant). A score close to 1 (0.9 or higher is a commonly used cutoff) means heterozygous LOF variants are seen far less often in the population than expected, which suggests, but does not prove, haploinsufficiency.
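As a rough illustration of how a pLI cutoff is applied, the sketch below flags genes whose score meets a threshold. The 0.9 threshold is a commonly used convention and is an assumption here. The gene names and scores are hypothetical placeholders, not gnomAD values.

```python
PLI_THRESHOLD = 0.9  # assumed cutoff; adjust to the convention you follow


def is_lof_intolerant(pli: float, threshold: float = PLI_THRESHOLD) -> bool:
    """Return True if a pLI score suggests intolerance to heterozygous LOF."""
    if not 0.0 <= pli <= 1.0:
        raise ValueError("pLI is a probability and must lie in [0, 1]")
    return pli >= threshold


# Hypothetical scores for illustration only; look up real values in gnomAD.
example_scores = {"GENE_A": 0.99, "GENE_B": 0.45, "GENE_C": 0.00}

flagged = [gene for gene, pli in example_scores.items() if is_lof_intolerant(pli)]
print(flagged)
```

A flag from a filter like this is a prompt to look more closely at the gene, not a diagnosis.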

---

Gain of function — when a gene becomes too active

Some mutations don't eliminate a protein's function — they change it in a way that produces a new, harmful activity. These gain-of-function (GOF) mutations are often dominant: even one mutant copy, alongside one normal copy, causes disease.

GOF mechanisms include:

Constitutive activation: A protein that is normally active only when a signal is present becomes permanently "on." The clearest examples come from oncology — mutations in KRAS (found in ~25% of all cancers) lock the RAS protein in a GTP-bound, active state, permanently driving cell proliferation regardless of growth signals.

Novel interaction: A mutant protein acquires the ability to interact with new partners, hijacking cellular pathways it normally wouldn't touch. The BCR-ABL fusion in chronic myelogenous leukemia (CML) creates a constitutively active kinase with novel substrates.

Toxic aggregation: A mutant protein misfolds and aggregates, and the aggregates are toxic to the cell. This is the mechanism in several neurodegenerative diseases — mutant huntingtin in Huntington's disease forms nuclear inclusions that impair transcription.

Dominant negative: A particularly important mechanism, grouped here with GOF because one mutant copy is enough to cause disease, although it is often classified separately. A mutant protein not only loses its function but actively interferes with the normal protein produced by the other allele. This happens frequently in proteins that work as multimers (complexes of multiple subunits). If one mutant subunit is incorporated into the complex, it can poison the entire complex, so a dominant negative mutant can be worse than a simple LOF, which would still leave about half of normal activity.

Classic example: TP53 mutations. p53 normally forms tetramers (complexes of four subunits). A dominant negative TP53 mutation produces a mutant p53 that incorporates into tetramers with normal p53, inactivating the entire complex's tumor suppressive function. This is why dominant negative TP53 mutations often give a more aggressive cancer phenotype than simple LOF.
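A back-of-the-envelope calculation shows why a dominant negative can be worse than simple LOF. The sketch below rests on three simplifying assumptions that are not taken from the text: both alleles are expressed equally, subunits assemble at random, and a single mutant subunit inactivates the whole complex. The function name is illustrative.

```python
def fully_normal_fraction(n_subunits: int, mutant_fraction: float = 0.5) -> float:
    """Fraction of complexes built only from normal subunits.

    Assumes random assembly, so each of the n positions is independently
    normal with probability (1 - mutant_fraction).
    """
    if n_subunits < 1:
        raise ValueError("a complex needs at least one subunit")
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must lie in [0, 1]")
    return (1.0 - mutant_fraction) ** n_subunits


tetramer = fully_normal_fraction(4)  # 0.5 ** 4 = 0.0625
dimer = fully_normal_fraction(2)     # 0.5 ** 2 = 0.25

print(f"tetramer: {tetramer:.2%} of complexes are fully normal")
print(f"dimer:    {dimer:.2%} of complexes are fully normal")
```

Under these assumptions a heterozygous dominant negative mutation in a tetramer leaves roughly 6% of complexes functional, compared with about 50% of normal protein for a heterozygous simple LOF.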

---

Dominant vs. recessive — the molecular logic

You've probably heard "dominant" and "recessive" as inheritance patterns. But it's worth understanding what they mean at the molecular level, because the mechanism varies.

[Figure: Why one bad copy sometimes matters, and sometimes doesn't. Rows: recessive with one working copy (healthy, one good copy makes enough protein); recessive with both copies broken (affected, no function); dominant haploinsufficient (affected, 50% output isn't enough); dominant gain of function (affected, mutant copy actively causes harm). Note: an X-linked recessive variant is recessive in females but acts dominant in males, e.g. hemophilia, Duchenne.]

Recessive: Disease requires two damaged copies. Usually LOF in a gene where one functional copy is sufficient.

Dominant (haploinsufficient): Disease requires only one damaged copy, because 50% protein output is insufficient for normal function.

Dominant (gain of function): Disease requires only one mutant copy, because the mutant protein actively causes harm — whether through constitutive activation, dominant negative effect, or toxic aggregation. Having a normal second copy doesn't rescue function because the mutant protein is actively sabotaging it.

De novo dominant: Some dominant mutations appear in an individual with no family history — they arose as new mutations in that person's germline or early development. This is increasingly recognized as a major cause of severe pediatric genetic disease (autism, intellectual disability, epilepsy). With whole-genome sequencing of parent-child trios, de novo mutations are now identifiable even without family history.

X-linked: Genes on the X chromosome follow different rules. Males have one X (XY) and females have two (XX). A recessive mutation on the X is recessive in females (who have a second X copy) but effectively dominant in males (who have no second copy). This is why X-linked recessive diseases like Duchenne muscular dystrophy and hemophilia affect males almost exclusively.
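The rules in this section can be condensed into a small decision function. This is a teaching sketch with hypothetical names. It ignores penetrance, variable expressivity, and X-inactivation, and it treats an X-linked gene in a male as a gene with a single copy.

```python
def is_affected(mechanism: str, damaged_copies: int, total_copies: int = 2) -> bool:
    """Predict disease status from mechanism and number of damaged copies.

    mechanism: "recessive_lof", "haploinsufficient", or "gain_of_function"
    total_copies: 2 for autosomal genes, 1 for an X-linked gene in a male
    """
    if total_copies < 1 or not 0 <= damaged_copies <= total_copies:
        raise ValueError("damaged_copies must be between 0 and total_copies")
    if mechanism == "recessive_lof":
        # Disease only when no working copy remains.
        return damaged_copies == total_copies
    if mechanism in ("haploinsufficient", "gain_of_function"):
        # A single damaged or mutant copy is enough.
        return damaged_copies >= 1
    raise ValueError(f"unknown mechanism: {mechanism}")


print(is_affected("recessive_lof", 1))                  # autosomal carrier
print(is_affected("recessive_lof", 2))                  # both copies broken
print(is_affected("haploinsufficient", 1))              # 50% output not enough
print(is_affected("recessive_lof", 1, total_copies=1))  # X-linked, male
```

The last call shows why X-linked recessive variants behave as if dominant in males: with only one copy, one damaged copy means no working copy remains.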

---

Germline vs. somatic mutations

The discussion so far has mostly assumed that a variant is present in every cell of the body, either because it was inherited or because it arose in a germ cell or very early in development. These are germline mutations.

But mutations also accumulate throughout life in individual cells. These somatic mutations are not heritable — they only exist in the cells descended from the cell where the mutation arose. They are the primary driver of cancer.

Germline mutations:

  • Present in every cell of the body
  • Heritable — can be passed to offspring
  • Detectable from any tissue (blood, saliva, buccal swab)
  • Associated with inherited disease risk (BRCA1/2 mutations, cystic fibrosis alleles)

Somatic mutations:

  • Present only in the clone of cells descended from the mutant cell
  • Not heritable
  • Require sequencing of the affected tissue (e.g., tumor biopsy) — not detectable from blood unless they've shed into circulation (liquid biopsy)
  • The primary cause of cancer
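In practice, the distinction is often made by sequencing a tumor sample alongside a matched normal sample such as blood. The sketch below shows the basic logic only. The function name and the 5% detection threshold are assumptions for illustration, and real pipelines also have to handle mosaicism, clonal hematopoiesis, and tumor cells contaminating the normal sample.

```python
def classify_origin(tumor_vaf: float, normal_vaf: float, min_vaf: float = 0.05) -> str:
    """Classify a variant from paired tumor/normal variant allele frequencies.

    min_vaf is a hypothetical detection threshold; below it the variant
    is treated as absent from that sample.
    """
    for vaf in (tumor_vaf, normal_vaf, min_vaf):
        if not 0.0 <= vaf <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    if normal_vaf >= min_vaf:
        return "likely germline"   # also seen in normal tissue
    if tumor_vaf >= min_vaf:
        return "likely somatic"    # seen only in the tumor
    return "not detected"


print(classify_origin(tumor_vaf=0.52, normal_vaf=0.49))
print(classify_origin(tumor_vaf=0.30, normal_vaf=0.00))
```

A heterozygous germline variant is expected at roughly 50% VAF in normal tissue, which is why the first example is classified as likely germline.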

Cancer as a somatic disease: Cancer requires the accumulation of somatic mutations in specific genes — typically tumor suppressors (which normally restrain growth) and oncogenes (which promote it). The two-hit hypothesis (Alfred Knudson, 1971) proposed that cancer requires inactivation of both copies of a tumor suppressor. In hereditary cancer syndromes like BRCA1/2-associated cancer, patients inherit one damaged copy (germline mutation) and only need one additional somatic "hit" to inactivate the second copy — which is why they develop cancer at higher rates and earlier ages.
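A toy probability model shows how much of a head start an inherited hit provides. It assumes that each copy is inactivated independently with the same small probability per cell, which is a deliberate simplification rather than Knudson's actual statistical analysis. The value of `mu` is hypothetical.

```python
def p_both_copies_lost(per_copy_hit_prob: float, inherited_hit: bool) -> float:
    """Probability that a single cell ends up with no working copy.

    Sporadic case: two independent somatic hits are needed.
    Hereditary case: one copy is already broken, so one hit suffices.
    """
    if not 0.0 <= per_copy_hit_prob <= 1.0:
        raise ValueError("per_copy_hit_prob must lie in [0, 1]")
    hits_needed = 1 if inherited_hit else 2
    return per_copy_hit_prob ** hits_needed


mu = 1e-6  # hypothetical per-copy inactivation probability per cell

sporadic = p_both_copies_lost(mu, inherited_hit=False)
hereditary = p_both_copies_lost(mu, inherited_hit=True)
relative_risk_per_cell = hereditary / sporadic

print(f"sporadic:   {sporadic:.1e}")
print(f"hereditary: {hereditary:.1e}")
print(f"ratio:      {relative_risk_per_cell:.1e}")
```

Because the sporadic probability scales with the square of a small number, the per-cell ratio is 1/mu. This is the intuition behind earlier onset and higher rates in hereditary syndromes.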

---

OMIM — the clinical reference for genetic disease

OMIM (Online Mendelian Inheritance in Man) is the primary curated database of human genes and genetic disorders. Every clinical geneticist, genetic counselor, and genomics researcher uses it.

[Figure: The two-hit hypothesis: why inherited cancer risk starts a step ahead. Sporadic: born with two good copies, needs two somatic hits before cancer. Hereditary: born with one bad copy in every cell, needs only one somatic hit, hence earlier onset and higher risk.]

OMIM entries exist for:

  • Genes: Describing a gene's function, known mutations, and associated phenotypes
  • Phenotypes: Describing a disease, its inheritance pattern, molecular basis, and clinical features

Each entry has a unique OMIM number. When you see a number like OMIM #219700 in a genetics paper, it refers to cystic fibrosis (caused by CFTR mutations).

How OMIM prefixes categorize entries:

  • # (number sign): Phenotype with known molecular basis
  • %: Phenotype with unknown molecular basis
  • +: Gene of known sequence and a phenotype, combined in one entry
  • \*: Gene, whether or not a disease association is known
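When OMIM identifiers appear in papers or reports, the prefix can be separated from the number mechanically. The sketch below uses a hypothetical helper, `parse_omim_id`, and assumes that MIM numbers are six digits. The short descriptions in the lookup table are paraphrases for illustration, so consult omim.org for the authoritative definitions.

```python
OMIM_PREFIXES = {
    "*": "gene",
    "+": "gene and phenotype, combined",
    "#": "phenotype, molecular basis known",
    "%": "phenotype, molecular basis unknown",
}


def parse_omim_id(label: str) -> tuple:
    """Split a label such as '#219700' into (entry_type, mim_number)."""
    label = label.strip()
    if label[:1] in OMIM_PREFIXES:
        prefix, number = label[0], label[1:]
    else:
        prefix, number = "", label
    if not (number.isdigit() and len(number) == 6):
        raise ValueError(f"not a six-digit MIM number: {label!r}")
    return OMIM_PREFIXES.get(prefix, "no prefix"), number


print(parse_omim_id("#219700"))  # the cystic fibrosis entry cited above
```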

OMIM is the first place a clinical geneticist goes when a patient's sequencing report flags a variant in an unfamiliar gene. Is there disease associated with this gene? What inheritance pattern? What are the reported pathogenic variants? What is the phenotype?

The database is maintained by Johns Hopkins and is freely accessible at omim.org.

---

ClinVar — the variant classification database

Once you know a variant exists in a gene associated with disease, you need to know whether that specific variant has been reported as pathogenic. This is where ClinVar comes in.

ClinVar is a public database maintained by NCBI that aggregates variant interpretations submitted by clinical labs and research groups worldwide. Each variant entry includes:

  • The gene and genomic coordinates
  • The type of variant (missense, nonsense, etc.)
  • The clinical significance classification (see below)
  • Which labs submitted interpretations and whether they agree

The five-tier classification system (ACMG/AMP guidelines):

  1. Pathogenic: Sufficient evidence that the variant causes disease
  2. Likely Pathogenic: Strong evidence, but not definitive
  3. Variant of Uncertain Significance (VUS): Insufficient evidence to classify
  4. Likely Benign: Strong evidence of no disease association
  5. Benign: Sufficient evidence of no disease association
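The five tiers, and the question of whether submitting labs agree, can be modeled with a small enumeration. This is a simplified sketch with hypothetical names. It groups the tiers into three broad camps and is not ClinVar's actual aggregation logic.

```python
from enum import Enum


class Significance(Enum):
    PATHOGENIC = "Pathogenic"
    LIKELY_PATHOGENIC = "Likely Pathogenic"
    VUS = "Variant of Uncertain Significance"
    LIKELY_BENIGN = "Likely Benign"
    BENIGN = "Benign"


# Broad camps used to decide whether two interpretations conflict.
_CAMP = {
    Significance.PATHOGENIC: "pathogenic",
    Significance.LIKELY_PATHOGENIC: "pathogenic",
    Significance.VUS: "uncertain",
    Significance.LIKELY_BENIGN: "benign",
    Significance.BENIGN: "benign",
}


def submitters_agree(submissions: list) -> bool:
    """Return True if all submitted interpretations fall in the same camp."""
    return len({_CAMP[s] for s in submissions}) <= 1


concordant = [Significance.PATHOGENIC, Significance.LIKELY_PATHOGENIC]
conflicting = [Significance.LIKELY_PATHOGENIC, Significance.VUS]

print(submitters_agree(concordant))
print(submitters_agree(conflicting))
```

Treating Pathogenic and Likely Pathogenic as agreeing is a design choice in this sketch, made because they differ in confidence rather than direction.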

VUS is the most clinically frustrating category, and it is returned disproportionately often to patients of non-European ancestry, for reasons you learned in Module 4. A VUS should not be used on its own to guide clinical decisions. It just means "we don't know yet." Families who receive VUS results often wait years for reclassification as databases grow.

---

Check yourself

1. A patient has a heterozygous nonsense variant in NF1 (pLI = 1.0). Both parents are unaffected and neither carries the variant. What type of mutation event most likely explains this? What disease would you expect, and why does one non-functional copy cause it?

2. A missense variant in TP53 is found in a tumor biopsy at 45% variant allele frequency (VAF) but is absent from the patient's blood. Is this a germline or somatic variant? What does 45% VAF suggest about when the mutation arose in tumor development?

3. Two siblings both have cystic fibrosis. Their mother carries one CFTR variant (heterozygous) and is healthy. Their father's sequencing report returns a VUS in CFTR — not a known pathogenic variant. What is the most likely explanation for the siblings' disease, and what experiment would you run to resolve the VUS?

4. A drug target is a protein that works as a homodimer (two identical subunits). A patient's tumor has a dominant negative mutation in one allele. A drug designed to inhibit the protein's enzymatic activity is being considered. Why might this drug be particularly effective — or ineffective — in this patient?

---

Key facts to remember

  • Most variants are benign: synonymous changes, variants in non-functional sequence, common variants with small effects
  • LOF mechanisms: nonsense-mediated decay, missense at active site, splice disruption, deletion
  • Haploinsufficiency: 50% protein output insufficient; identified by high pLI in gnomAD
  • GOF mechanisms: constitutive activation (KRAS), dominant negative (TP53), toxic aggregation (huntingtin)
  • De novo dominant mutations are a major cause of severe pediatric genetic disease
  • Germline = every cell, heritable; somatic = clone only, not heritable, cancer driver
  • Two-hit hypothesis: inherited germline hit + somatic second hit = hereditary cancer
  • OMIM: curated gene/disease database; ClinVar: variant classification database (Pathogenic → Benign, VUS in between)
  • VUS inflation disproportionately affects non-European patients (Module 4 connection)

---

Primary sources & references
  • Knudson, A. G. (1971). "Mutation and cancer: statistical study of retinoblastoma." PNAS, 68, 820–823.
  • Richards, S. et al. (2015). "Standards and guidelines for the interpretation of sequence variants." Genetics in Medicine, 17, 405–424. (ACMG/AMP classification framework)
  • Lek, M. et al. (2016). "Analysis of protein-coding genetic variation in 60,706 humans." Nature, 536, 285–291. (ExAC; precursor to gnomAD, established pLI)
  • Landrum, M. J. et al. (2016). "ClinVar: public archive of interpretations of clinically relevant variants." Nucleic Acids Research, 44, D862–D868.
  • McKusick, V. A. (2007). "Mendelian Inheritance in Man and its online version, OMIM." American Journal of Human Genetics, 80, 588–604.
  • Alexandrov, L. B. et al. (2020). "The repertoire of mutational signatures in human cancer." Nature, 578, 94–101.