I decided to write this blog post to accompany my appearance on the Insight Podcast, where I talked about heritability, how much of it is ‘missing’, and what the recent preprint from Peter Visscher’s group has shown.
Heritability is the fraction of trait variation in a population due to genetic inheritance (see here). As I have outlined previously, heritability analysis has its origins in the work of Victorian polymath and cousin of Charles Darwin, Francis Galton. However, heritability analysis only became a sophisticated field of scientific inquiry in humans with the advent of twin studies. The most important twin study design compares the phenotypic similarity (correlation) of identical (monozygotic) twins to non-identical (dizygotic) twins. Since monozygotic twins are genetically identical, whereas non-identical twins are only half identical on average, greater similarity of identical over non-identical twins is evidence for a contribution of genetic variation to trait variation. However, the twin design makes several assumptions, most importantly that there is no greater environmental similarity of identical over non-identical twins. Whether twin studies are overestimating heritability for human traits, especially social and behavioural traits, remains controversial (Felson 2014).
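The classical twin comparison is often summarised with Falconer's formula, h2 = 2 × (r_MZ − r_DZ). The formula is not spelled out in this post, and it rests on the same equal-environments assumption discussed above. A minimal sketch, using hypothetical twin correlations chosen purely for illustration:

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate of heritability from twin correlations.

    r_mz: phenotypic correlation between monozygotic twin pairs
    r_dz: phenotypic correlation between dizygotic twin pairs
    """
    return 2.0 * (r_mz - r_dz)


# Hypothetical correlations, not taken from any study.
r_mz, r_dz = 0.80, 0.45
h2 = falconer_h2(r_mz, r_dz)
print(f"Falconer h2 = {h2:.2f}")
```

If identical twins also share more of their environment than non-identical twins, r_MZ is inflated relative to r_DZ and this estimate is inflated with it, which is the core of the controversy.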
With the invention of cheap genotyping technologies, it became possible to measure hundreds of thousands of genome-wide genetic variants for thousands of individuals. The natural question then became: can we construct a model using specific genetic variants that explains the heritability estimated from twin studies?
This was the dawn of the genome-wide association study (GWAS) era, around the year 2007, when I was finishing high school. In the early days of GWAS, sample sizes were small and only a fraction of the genetic variation in a population was captured by the genotyping arrays, so the studies only had the power to identify common genetic variants with relatively strong effects. The amount of trait variation that these variants explained was typically only a small fraction of the heritability estimated by twin studies. For height, by 2010 around 40 variants had been identified that collectively explained around 5% of the variation in height, compared to a twin heritability of around 80%. This gap became labelled ‘the problem of missing heritability’ (Manolio et al., 2009) and created quite a fuss around ten years ago.
Many different explanations for the ‘missing heritability’ have been proposed (Eichler et al., 2010). I will focus on two: 1) that complex traits are highly polygenic and affected by many rare variants; 2) that twin studies have overestimated heritability. Note that these explanations are not mutually exclusive. The idea behind 1) was that GWAS were not well powered to detect the many genetic variants with weak effects on a trait like height, and the genotyping array technologies were not capturing the rare genetic variants that may explain a substantial fraction of the heritability (see the figure above). The idea behind 2) was that twin studies were overestimating heritability, perhaps due to genetic interactions, gene-environment interactions, or violation of twin-study assumptions about the environment, so that less heritability was in fact missing.
Perhaps the most influential paper in the missing heritability debate came from Peter Visscher’s group and was published in Nature Genetics in 2010 (Yang et al. 2010). Previously, people had compared twin heritability estimates to the variance explained by specific genetic variants that had been found to affect height, which was around 5% in 2010. This paper used a technique that later came to be known as ‘GREML’ to measure the height variation explained by all of the genetic variation captured on the genotyping array, which measured ~250k genetic variants. They estimated that the genetic variation captured on the array explained around 45% of the variability in height in the population. This was far more than the roughly 5% explained by the variants then known to affect height, and it suggested that many common genetic variants with weak effects contribute to height variation. However, there was still a gap between the 45% estimate from the genotyping array SNPs and the 70-80% heritability estimated from twin studies. Since the genotyping array used in that study mainly captured common genetic variants, it was plausible that some of the gap between the 45% number and the 70-80% number from twin studies was due to rare genetic variants that were not well captured on the genotyping array.
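To give a feel for how all array variants can be used at once, here is a minimal sketch on simulated data. It does not implement GREML itself: it swaps in the simpler Haseman-Elston regression, which targets the same quantity by regressing products of standardised phenotypes on genomic relatedness between pairs of unrelated individuals. The sample size, variant count, allele frequencies and true heritability are all assumptions made for the simulation, and the variants are simulated as independent (no LD).

```python
import numpy as np


def simulate(n=1500, m=300, h2=0.5, seed=0):
    """Simulate standardised genotypes and a polygenic trait (all assumed values)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, size=m)          # common variants only
    geno = rng.binomial(2, freqs, size=(n, m)).astype(float)
    z = (geno - geno.mean(axis=0)) / geno.std(axis=0)
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)  # many weak effects
    y = z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    y = (y - y.mean()) / y.std()
    return z, y


def he_regression(z, y):
    """Haseman-Elston estimate: slope of y_i * y_j on relatedness A_ij, i < j."""
    n, m = z.shape
    grm = z @ z.T / m                    # genomic relatedness matrix
    upper = np.triu_indices(n, k=1)      # distinct pairs only
    relatedness = grm[upper]
    products = np.outer(y, y)[upper]
    # Regression through the origin, since E[y_i * y_j] = h2 * A_ij
    return float(relatedness @ products / (relatedness @ relatedness))


z, y = simulate()
h2_hat = he_regression(z, y)
print(f"Estimated heritability from all simulated variants: {h2_hat:.2f}")
```

No single simulated variant has a detectable effect here, yet the variants jointly recover a heritability near the simulated value, which mirrors the logic of the 2010 paper.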
In 2015, the GREML methodology was extended to include rare genetic variants inferred by a statistical procedure called imputation (Yang et al. 2015). This increased the estimated heritability for height from 45% to 55%. While imputation allows you to infer rare variants with decent accuracy down to a fairly low frequency (say 0.1%), in most cases it cannot infer very rare genetic variants (frequency below 0.1%) with high accuracy. The question then remained: was the ~80% number from twin studies too high, or do very rare variants that cannot be imputed accurately explain the gap?
A preprint from Peter Visscher’s group was posted on bioRxiv recently, ‘Recovery of trait heritability from whole genome sequence data’, based on an impressive dataset of ~20,000 individuals of European ancestry with whole genome sequence data covering 47.1 million genetic variants. This paper is a sign of the kind of data we will have in the future: large samples with high coverage whole genome sequence data. What is exciting about such samples is that they present a close to complete picture of genetic variation in a sample. Does the genetic variation captured by such samples explain all of the heritability estimated from twin studies? That is one of the important questions that this paper addresses.
This paper is an important advance in the detail of quantitative genetic analysis. It gives us an idea of the value of large, whole genome-sequenced datasets for understanding the genetic architecture of complex traits such as height. The analysis presents evidence that rare variants that are not well imputed (typically below 0.1% minor allele frequency) contribute significantly to the heritable variation for height and BMI. This is an important result, showing that significant gains in genetic prediction ability could, in theory, be reached through whole genome sequence data. The paper also presents evidence for an outsized contribution from low frequency protein altering variants, especially those in low correlation with other variants in the genome (low LD). This has implications for mapping of particular genes affecting height and BMI and likely other complex traits.
The study also estimates the total heritability explained by the 47.1 million variants for height and BMI. The point estimates for height (0.79) and BMI (0.40) are in line with twin estimates, suggesting that the full heritability estimated by twin studies is waiting to be unlocked by whole genome sequence data. This is an important result if true. However, I have several concerns that give me pause before accepting that conclusion. First, a lack of precision in the heritability estimates. Second, analysis of rare variants may be particularly sensitive to non-normality of trait distribution. Third, it is not clear whether violation of methodological assumptions about the effect size distribution may be problematic for very rare variants measured through whole genome sequencing. Fourth, I have doubts about whether the methods they employ to deal with stratification are adequate, especially for rare variants.
First, the heritability estimates from whole genome sequence data are not very precise, which precludes any strong conclusions being drawn. While the point estimate for the heritability of height is high (79%), the results in this paper imply, with 95% confidence, that the heritability of height is above 61% and below 97%. The upper and lower ends of this range would be outside of the range of most twin estimates. Similarly, for BMI, the estimate of heritability is 40% with a standard error of 9%, which implies, with 95% confidence, that the heritability is between 22% and 57%. This is a wide range, parts of which are compatible with twin studies, although some have estimated the heritability of BMI at over 60% (Elks et al., 2012).
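The intervals above can be checked with a short calculation. This is a sketch that assumes a normal approximation (estimate ± 1.96 × standard error); the paper may have computed its intervals differently. The function names are my own.

```python
def wald_ci(estimate, se, z=1.96):
    """Approximate 95% confidence interval under a normal approximation."""
    return estimate - z * se, estimate + z * se


def implied_se(lower, upper, z=1.96):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


# BMI: point estimate 0.40 with standard error 0.09, as quoted above.
bmi_low, bmi_high = wald_ci(0.40, 0.09)
print(f"BMI 95% CI: {bmi_low:.3f} to {bmi_high:.3f}")

# Height: back out the standard error from the quoted 61% to 97% interval.
height_se = implied_se(0.61, 0.97)
print(f"Implied SE for height: {height_se:.3f}")
```

With the rounded inputs quoted here, the BMI interval comes out at roughly 0.22 to 0.58, and the height interval implies a standard error of about 0.09. Small differences from the figures in the text would be expected if the original calculation used unrounded values.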
Second, if a rare variant happens by chance to be carried by a few individuals who are outliers for height or BMI (very tall/short or very fat/thin), then this could lead to overestimation of the effect of that variant. The methodology employed assumes a normal trait distribution, so outlying individuals in the tails of the distribution can have an outsized influence on the estimated heritability if the trait is not transformed to be normal. It appears on the basis of Supplementary Figure 2 that the traits analysed for the main results were not transformed to have a normal distribution. Evidence that this may have led to overestimation of heritability in this paper is that the estimate for the heritability of BMI when BMI is transformed to a normal distribution is only 23% to 25% (see Supplementary Figure 8) compared to 40% for the untransformed phenotype (Supplementary Figure 2E). Furthermore, it appears that the height trait analysed has heavy tails compared to a normal distribution (Supplementary Figure 2D), so this may have also led to overestimation of heritability from rare variants for height. However, I don’t think that the authors present results for height transformed to follow a normal distribution, so this is hard to assess.
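One common way to transform a trait to a normal distribution is a rank-based inverse normal transform. The sketch below is an assumption about what such a transformation could look like; I do not know which transformation the preprint used for its supplementary analysis. The offset of 0.5 is one conventional choice, and ties are broken by order of appearance for simplicity.

```python
import numpy as np
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to normal quantiles based on their ranks.

    Uses quantile (rank - offset) / (n - 2 * offset + 1), so extreme
    observations are pulled in to the tails of a standard normal.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    ranks = np.argsort(np.argsort(values)) + 1   # ranks 1..n
    quantiles = (ranks - offset) / (n - 2 * offset + 1)
    standard_normal = NormalDist()
    return np.array([standard_normal.inv_cdf(float(q)) for q in quantiles])


# Illustrative trait with one extreme outlier (made-up numbers).
trait = [1.0, 2.0, 3.0, 1000.0]
transformed = rank_inverse_normal(trait)
print(np.round(transformed, 3))
```

After the transform the outlier keeps its rank but no longer sits hundreds of standard deviations from the rest, so a rare variant carried by that individual has far less leverage on the estimate.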
Third, the methodology used assumes that effect sizes are normally distributed within each bin, where the variants have been divided into bins based upon their frequency and on the strength of their correlations with other variants in the genome (LD). A recent preprint makes a case that this method can lead to biased estimates for common variants (Hou et al., 2019). For rare variants, it could be more problematic, as one expects that the heritability from rare variants has a large contribution from rare variants with large effects. The distribution of effect sizes among rare variants is therefore likely to deviate strongly from the modelling assumptions of the method employed.
The fourth concern is population stratification. Population stratification occurs when two genetically distinct subpopulations have different mean trait values. This implies that any genetic variant that is differentiated between these subpopulations (which is usually due to chance, ‘genetic drift’) will be correlated with the trait even though it has no causal effect on the trait. An example would be that, in a mixed sample of Dutch and Italians, any genetic variant that is at a higher frequency in the Dutch than the Italians would be correlated with height simply because Dutch are taller than Italians on average. Recently, two papers appeared in eLife along with an editorial showing that stratification has affected genome-wide association studies of height, leading to spurious inference of selection leading to greater height in northern vs southern Europe. This work has led people to realise that one of the standard methods to control for stratification in genetic studies, principal components analysis (PCA), is inadequate in some cases. (I recommend that everyone watch this brilliant lecture by Alex Bloemendal and Christina Chen to understand why PCA can fail to adequately control for stratification in real genetic data.) The preprint from Peter Visscher’s group uses PCA to control for stratification, but this is unlikely to be fully effective, especially for the very rare variants used in their analysis. I expect that the principal components inferred from millions of very rare variants in a sample of only 20,000 may be very noisy, and therefore not fully effective at controlling for stratification, which can be subtle for rare variants.
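The Dutch and Italian example can be reproduced in a small simulation. This is a sketch with made-up numbers: two subpopulations whose mean trait differs by one standard deviation, and a variant with allele frequency 0.7 in one and 0.3 in the other that has no causal effect at all. The adjustment shown uses the true population label, which is the idealised covariate that PCA tries to approximate.

```python
import numpy as np


def simulate_stratified(n_per_pop=10_000, freq_a=0.7, freq_b=0.3,
                        mean_diff=1.0, seed=1):
    """Two subpopulations; the variant has zero causal effect on the trait."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)
    freqs = np.where(pop == 0, freq_a, freq_b)
    genotype = rng.binomial(2, freqs).astype(float)
    # Trait depends only on population membership plus noise.
    trait = rng.normal(0.0, 1.0, size=pop.size) + np.where(pop == 0, mean_diff, 0.0)
    return pop, genotype, trait


def centre_within_pop(x, pop):
    """Subtract each subpopulation's mean (perfect stratification control)."""
    out = x.copy()
    for label in np.unique(pop):
        out[pop == label] -= x[pop == label].mean()
    return out


def corr(x, y):
    return float(np.corrcoef(x, y)[0, 1])


pop, genotype, trait = simulate_stratified()
pooled = corr(genotype, trait)
adjusted = corr(centre_within_pop(genotype, pop), centre_within_pop(trait, pop))
print(f"Pooled correlation:   {pooled:.3f}")
print(f"Adjusted correlation: {adjusted:.3f}")
```

The pooled correlation is clearly positive even though the variant does nothing, and it disappears once population membership is accounted for. The worry raised above is that noisy principal components only partly achieve that adjustment.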
Furthermore, in my own research, I have seen evidence that PCA is not fully effective at controlling for stratification in GREML type methods. In a paper published in Nature Genetics last year (Young et al. 2018, explained in this blog post), I described a novel method for estimating heritability, relatedness disequilibrium regression (RDR), that takes advantage of the randomisation of genetic material during the production of sperm and eggs to estimate heritability in a way that has negligible bias due to environmental effects and population stratification. In this paper, I found evidence that the heritability of height was being overestimated by the GREML-type method employed in the preprint (see Figure 2b of Young et al. 2018), and I suspect that this is in part due to population stratification that has not been properly controlled for by PCA.
It is also worth looking at estimates of the heritability of height from other, more robust methods. Beyond RDR, there is a method invented by Peter Visscher in 2006, Sib-Regression (Visscher et al. 2006), that also exploits the randomisation of genetic material in families to estimate heritability in a way that is robust to population stratification. Using RDR, I estimated the heritability of height in Iceland to be 55% with a standard error of 4%. I also estimated the heritability of height using Sib-Regression to be 68% with a standard error of 9.6%. Combining this estimate with a previous Sib-Regression estimate (Hemani et al. 2013), one obtains a combined estimate of 68% for the heritability of height with a standard error of 7.9%. The Sib-Regression estimate is not inconsistent with the whole genome sequence estimate, but together, the RDR and Sib-Regression estimates suggest that the heritability may be on the lower side of the range of the whole genome sequence estimate.
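A standard way to combine two independent estimates is inverse-variance weighting. The sketch below assumes that this is how the combined Sib-Regression figure was obtained, which the text does not state. Under that assumption it backs out the standard error the second estimate would need to have; that value, and the placeholder point estimate of 0.68 used for the second study, are derived for illustration and are not figures reported by Hemani et al.

```python
import math


def inverse_variance_combine(estimates, ses):
    """Fixed-effect combination of independent estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total
    return combined, math.sqrt(1.0 / total)


def implied_second_se(se_first, se_combined):
    """SE of a second study implied by the first SE and the combined SE."""
    return math.sqrt(1.0 / (1.0 / se_combined ** 2 - 1.0 / se_first ** 2))


# Quoted above: first estimate 0.68 (SE 0.096), combined SE 0.079.
second_se = implied_second_se(0.096, 0.079)
print(f"Implied SE of the second estimate: {second_se:.3f}")

# Placeholder point estimate for the second study (hypothetical).
combined, combined_se = inverse_variance_combine([0.68, 0.68], [0.096, second_se])
print(f"Combined: {combined:.2f} (SE {combined_se:.3f})")
```

The gain in precision from 9.6% to 7.9% is modest, which illustrates why much larger family samples are needed before these robust methods can give tight estimates.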
Furthermore, if one compares RDR and Sib-Regression estimates to twin estimates across traits, there is clearly a gap (see Figures below). This implies that there may still be some ‘missing heritability’, in that there is still a gap between heritability estimated from robust genomic methods (RDR, Sib-Regression) and twin estimates. However, an important caveat is that we do not yet have samples where we can obtain precise heritability estimates from twins and from RDR or Sib-Regression, which would be the fairest and best comparison.
In conclusion, the ‘missing heritability’ problem has not been solved by this preprint, although it does represent an important advance in the detail of quantitative genetic analysis. Perhaps my concerns about the methodology can be assuaged for a trait like height through further analyses, and larger sample sizes will provide a more precise answer, which would be very interesting. However, my own work has shown that the GREML approach employed in this preprint can lead to large overestimation of heritability for traits like educational attainment, where indirect genetic effects from relatives play a significant role (see previous blog). Solving the problem of missing heritability for traits like educational attainment is therefore likely to be more challenging.
I believe we will only truly settle this debate when we have large, genotyped samples including lots of parents and offspring or siblings, allowing precise estimates with robust methods such as RDR and Sib-Regression. However, this would only settle the debate over whether twin estimates of heritability are correct. A deeper solution to the problem of missing heritability would come from building genetic predictors that can predict as well as the heritability suggests is possible, ideally accompanied by a deeper understanding of genetic architecture and causal pathways. The kind of data and analyses presented in this paper represent an important step in that direction.