🧰 Quantile-Quantile Plots (QQ-plots)


📈 A Q-Q plot is an essential tool for detecting problems (such as unrecognized population structure, an inappropriate analytical approach, or genotyping artifacts) in a genome-wide association study (GWAS).
Introduction
A Quantile-Quantile (QQ) plot, in general, plots the observed quantiles of one distribution against those of another, or against the quantiles of a theoretical (ideal) distribution.
In GWAS, we use a QQ plot to compare the quantiles of the observed p-values (on the y-axis) with the quantiles of the expected p-values (on the x-axis). In an ideal situation, where there are NO causal polymorphisms, the QQ-plot will be a straight line along the diagonal (y = x).
The reason is that, in such a case, the observed p-values follow a uniform distribution, and the QQ-plot compares this observed distribution with the expected one, which is also uniform (both are -log10 transformed).
Note: if your GWAS analysis is correct but you do not have enough power to detect the positions of causal polymorphisms, you will get the same result (!!): a QQ-plot that follows the line. The QQ-plot is therefore also a way to assess whether you can detect any hits in your GWAS.
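The construction described above can be sketched by hand in base R. This is only a minimal illustration under stated assumptions: the p-values are simulated from a uniform distribution rather than taken from a real GWAS, and the names `n_snps`, `p_obs`, `observed`, and `expected` are made up for the example.

```r
set.seed(42)                 # reproducible simulation
n_snps <- 10000              # hypothetical number of tested variants
p_obs  <- runif(n_snps)      # assumed null scenario: p-values uniform on (0, 1)

# Observed quantiles: sorted p-values on the -log10 scale
observed <- -log10(sort(p_obs))
# Expected quantiles of a uniform distribution on the same scale
# (ppoints(n) returns (i - 0.5) / n for n > 10)
expected <- -log10(ppoints(n_snps))

# Under the null, the points should follow the diagonal y = x
plot(expected, observed,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)",
     main = "QQ plot of simulated null p-values")
abline(0, 1, col = "red")
```

With real data, `p_obs` would be replaced by the p-value column of the GWAS results; points that rise above the red line only at the far right of the plot are the candidate hits.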
To plot a QQ-plot
One way to do it is by using the qqman package in R.
install.packages('qqman')  # one-time installation
library(qqman)
# result$PVALUE: the vector of p-values from the GWAS results
qq(result$PVALUE, main = "QQ Plot of {{Project}}")
Lambda (λ)
When making a QQ-plot, it is important to calculate lambda (also called the genomic inflation factor, often written as λGC).
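- λ quantifies how much the observed test statistics deviate from what you’d expect under the null hypothesis (i.e., no association). It helps assess whether your test statistics are inflated due to technical or population structure issues.
- It is computed as the median of the observed chi-square statistics divided by the median expected under the null (about 0.455 for 1 degree of freedom), so it detects inflation or deflation of the p-values; a value close to 1 indicates neither.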
📈 If your QQ plot shows a systematic upward curve and λ is > 1 (e.g., 1.2 or higher), it suggests inflation, possibly from:
- Population stratification
- Cryptic relatedness
- Batch effects
- Genotyping errors
📈 If λ is <1, it might signal deflation, often due to:
- Overcorrection
- Conservative test statistics
- Sparse data
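# Convert p-values to 1-df chi-square statistics (upper tail avoids precision loss for tiny p-values)
chisq <- qchisq(result$PVALUE, df = 1, lower.tail = FALSE)
# Lambda: observed median divided by the expected median under the null
lambda <- median(chisq, na.rm = TRUE) / qchisq(0.5, df = 1)
# Add lambda to the QQ plot (run this after qq() has drawn the plot)
legend("topleft", legend = bquote(lambda == .(round(lambda, 3))), bty = "n")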
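To get a feel for how λ behaves, the same calculation can be wrapped in a small function and tried on simulated data. This is a sketch under assumptions, not part of qqman: `gc_lambda` is a hypothetical helper name, the statistics are simulated, and inflation is mimicked simply by multiplying the null chi-square statistics by 1.2.

```r
# Hypothetical helper: genomic inflation factor from a vector of p-values
gc_lambda <- function(p) {
  chisq <- qchisq(p, df = 1, lower.tail = FALSE)   # p-values -> 1-df chi-square
  median(chisq, na.rm = TRUE) / qchisq(0.5, df = 1)
}

set.seed(1)
n <- 100000                                        # assumed number of variants
chisq_null <- rchisq(n, df = 1)                    # simulated null test statistics

# p-values for the null statistics and for an artificially inflated copy
p_null     <- pchisq(chisq_null,       df = 1, lower.tail = FALSE)
p_inflated <- pchisq(1.2 * chisq_null, df = 1, lower.tail = FALSE)

lambda_null     <- gc_lambda(p_null)       # should be close to 1
lambda_inflated <- gc_lambda(p_inflated)   # 1.2 times lambda_null by construction
```

Because λ is a ratio of medians, it reflects a shift in the bulk of the distribution and is barely affected by a handful of true associations in the tail.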
To read more, see GitHub repository
References
- Statistical Horizons
- Ehret GB, Curr Hypertens Rep. 2010 Feb;12(1):17–25. doi: 10.1007/s11906-009-0086-6