DNA methylation #
DNA methylation is one of the most stable markers in epigenetics for assessing the causative landscape across phenotypes, and its analysis is among the most widely accepted techniques in the epigenetics research community. This is a fast-growing field that explores the devastating consequences of methylation in diseases such as cancer, lupus, muscular dystrophy, and a range of birth defects. DNA methylation is frequently described as a “silencing” epigenetic mark, and its mechanism was uncovered through the function of 5-methylcytosine, which was originally proposed in the 1970s. The development of high-throughput techniques such as chip-based arrays and next-generation sequencing has facilitated genome-wide screening and quantitative analysis of methylation sites. The function of methylation also varies with the location of the CpG locus, as clearly reviewed by Peter A. Jones [1] (Figure 1 and Figure 2).
Figure 1. Molecular anatomy of CpG (methylation) sites in chromatin and their roles in gene expression. About 60% of human genes have CpG islands (CGIs) at their promoters and frequently have nucleosome-depleted regions (NDRs) at the transcriptional start site (TSS). The nucleosomes flanking the TSS are marked by trimethylation of histone H3 at lysine 4 (H3K4me3), which is associated with active transcription, and by the histone variant H2A.Z, which is antagonistic to DNA methyltransferases (DNMTs). Downstream of the TSS, the DNA is mostly CpG-depleted and is predominantly methylated in repetitive elements and in gene bodies. CGIs, which are sometimes located in gene bodies, mostly remain unmethylated but occasionally acquire 5-methylcytosine (5mC) in a tissue-specific manner (not shown). Transcription elongation, unlike initiation, is not blocked by gene body methylation, and variable methylation may be involved in controlling splicing. Gene bodies are preferential sites of methylation in the CHG context (where H is A, C or T) in embryonic stem cells, but the function is not understood (not shown). DNA methylation is maintained by DNMT1 and also by DNMT3A and/or DNMT3B, which are bound to nucleosomes containing methylated DNA. Enhancers tend to be CpG-poor and show incomplete methylation, suggesting that a dynamic process of methylation or demethylation occurs, perhaps owing to the presence of ten-eleven translocation (TET) proteins in these regions, although this remains to be shown. They also have NDRs, and the flanking nucleosomes have the signature H3K4me1 mark and also the histone variant H2A.Z. The binding of proteins such as CTCF to insulators can be blocked by methylation of their non-CGI recognition sequences, thus leading to altered regulation of gene expression, but the generality of this needs further exploration. The sites flanking the CTCF sites are strongly nucleosome-depleted, and the flanking nucleosomes show a remarkable degree of phasing.
Figure 2. Silencing precedes DNA methylation. Active promoters and enhancers have nucleosome-depleted regions (NDRs) that are often occupied by transcription factors and chromatin remodelers. Loss of factor binding, for example during differentiation, leads to increased nucleosome occupancy of the regulatory region, providing a substrate for de novo DNA methylation. DNA methylation subsequently provides added stability to the silent state and is likely to be a mechanism for more accurate epigenetic inheritance during cell division. The example given is for the OCT4 and NANOG genes, and its generality is not yet known, but inactive genes are often more susceptible to de novo methylation than their more active counterparts. In the figure, OCT4 binding is shown and NANOG binding is not shown, although its expression is required. Recent experiments have demonstrated that the methylation must be removed by active and/or passive processes to reactivate the gene. DNMT3A, DNA methyltransferase 3A; siRNA, small interfering RNA.
Data processing #
Illumina has developed the Infinium Human Methylation microarray assay, which offers a cost-effective, high-throughput method for quantitatively assessing methylation across the genome. The initial HumanMethylation27 (27K) BeadChip interrogated 27,578 CpG sites associated with 14,495 protein-coding gene promoters. The more recent HumanMethylation450 (450K) BeadChip assays DNA methylation at 482,421 CpG sites, including 90% of the sites on the 27K array [2] (Figure 3).
This array combines two different probe technologies on the same chip (Infinium I and Infinium II). Data processing therefore needs special attention and normalization techniques that differ from those used in standard microarray analysis. Microarray data are generally normalized with quantile normalization, whereas for 450K BeadChips dedicated normalization techniques have been developed and implemented in R packages. Several R packages are available for analysing the 450K BeadChip, including methylumi, minfi, IMA, wateRmelon and RnBeads. DNA methylation at a specific CpG site is calculated as β = M/(M + U + α), where M and U are the methylated and unmethylated signal intensities and α is an arbitrary offset intended to stabilize β values where fluorescent intensities are low. An alternative index is the M-value, log2((M + α)/(U + α)) [3]. When the offset is negligible, the M-value is simply the logit (base 2) transformation of β, that is, log2(β/(1 − β)). Several methods have been developed to normalize these β values, namely beta-mixture quantile dilation (BMIQ) [4], subset quantile normalization (SQN and SWAN) [5, 6], and peak-based correction (PBC) [7], which are included in those R packages (Figure 4).
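The two methylation indices described above can be illustrated with a minimal Python sketch. It is not taken from any of the R packages mentioned here. The function names, the default offsets (100 for β, 1 for the M-value), the clamping of negative intensities to zero, and the probe intensities are all assumptions chosen for illustration.

```python
import math


def beta_value(meth, unmeth, alpha=100.0):
    """Beta = M / (M + U + alpha), where M and U are signal intensities.

    The default offset is an illustrative assumption, not a fixed standard.
    """
    # Assumption: negative (background-corrected) intensities are clamped to 0.
    meth, unmeth = max(meth, 0.0), max(unmeth, 0.0)
    return meth / (meth + unmeth + alpha)


def m_value(meth, unmeth, alpha=1.0):
    """M-value = log2((M + alpha) / (U + alpha)); unbounded, unlike beta."""
    meth, unmeth = max(meth, 0.0), max(unmeth, 0.0)
    return math.log2((meth + alpha) / (unmeth + alpha))


def beta_to_m(beta):
    """Base-2 logit of beta; equals the M-value only when offsets are negligible."""
    return math.log2(beta / (1.0 - beta))


if __name__ == "__main__":
    # Hypothetical (methylated, unmethylated) intensities for two probes.
    probes = {"probe_a": (3000.0, 1000.0), "probe_b": (150.0, 4200.0)}
    for name, (m, u) in probes.items():
        b = beta_value(m, u)
        print(f"{name}: beta={b:.3f}  M={m_value(m, u):.3f}  "
              f"logit2(beta)={beta_to_m(b):.3f}")
```

With non-zero offsets the logit of β and the M-value differ slightly, and the gap widens as intensities drop, which is the regime the offset is meant to stabilize.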
Among those R packages, [RnBeads](http://rnbeads.mpi-inf.mpg.de/) deserves special attention for analysing BeadChip data: it produces publication-standard plots and includes most of the normalization methods. RnBeads is an R package for comprehensive analysis of DNA methylation data obtained with any experimental protocol that provides single-CpG resolution, including Infinium 450K microarray and bisulfite sequencing protocols, but also MeDIP-seq and MBD-seq once the data have been preprocessed with DNA methylation inference software. RnBeads implements an analysis workflow that is significantly more comprehensive than those of existing tools. It documents its results in a highly annotated and readable hypertext report, and it scales to the large sample sizes that are becoming the norm for DNA methylation analysis in human cohorts. This page has given an overview of DNA methylation analysis.
1. Jones PA: Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nature Reviews Genetics 2012, 13(7):484-492.
2. Sandoval J, Heyn H, Moran S, Serra-Musach J, Pujana MA, Bibikova M, Esteller M: Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics 2011, 6(6):692-702.
3. Du P, Zhang X, Huang CC, Jafari N, Kibbe WA, Hou L, Lin SM: Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics 2010, 11:587.
4. Teschendorff AE, Marabita F, Lechner M, Bartlett T, Tegner J, Gomez-Cabrero D, Beck S: A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics 2013, 29(2):189-196.
5. Touleimat N, Tost J: Complete pipeline for Infinium® Human Methylation 450K BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation. Epigenomics 2012, 4(3):325-341.
6. Maksimovic J, Gordon L, Oshlack A: SWAN: Subset-quantile within array normalization for Illumina Infinium HumanMethylation450 BeadChips. Genome Biology 2012, 13(6):R44.
7. Dedeurwaerder S, Defrance M, Calonne E, Denis H, Sotiriou C, Fuks F: Evaluation of the Infinium Methylation 450K technology. Epigenomics 2011, 3(6):771-784.