Bioinformatic methods

A Systems Biology Approach to Deciphering Cell-to-Cell Communication and Molecular Transport in Plants:
Shaping Cell Fate and Organogenesis

RNA Isolation: Apical tips (approximately 1 cm, ~100 mg) will be harvested manually into tubes and immediately frozen in liquid nitrogen. RNA will be isolated with the RNeasy Plant Mini Kit (Qiagen), and residual genomic DNA will be removed with the TURBO DNase kit (Invitrogen), both according to the manufacturers' protocols. RNA quality will be verified with a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific) and by RNA gel electrophoresis.
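
The NanoDrop check comes down to simple arithmetic on absorbance readings. The sketch below assumes the standard conversion of 1 A260 unit to about 40 µg/mL of RNA. It also uses hypothetical pass thresholds of A260/280 and A260/230 ≥ 1.8. Neither threshold is specified in this proposal, and both should be set to lab practice. The instrument software reports these values directly, so this only makes the calculation explicit.

```python
# Assumed conversion: 1 A260 unit corresponds to ~40 ug/mL of RNA (10 mm path).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_qc(a260, a280, a230, dilution=1.0, min_260_280=1.8, min_260_230=1.8):
    """Return concentration and purity ratios for one NanoDrop reading.

    The pass thresholds are hypothetical defaults, not values from the proposal.
    """
    # ug/mL is numerically equal to ng/uL
    conc_ng_per_ul = a260 * RNA_UG_PER_ML_PER_A260 * dilution
    ratio_280 = a260 / a280  # protein contamination lowers this ratio
    ratio_230 = a260 / a230  # salts, phenol or carbohydrates lower this ratio
    return {
        "conc_ng_per_ul": conc_ng_per_ul,
        "a260_280": ratio_280,
        "a260_230": ratio_230,
        "pass": ratio_280 >= min_260_280 and ratio_230 >= min_260_230,
    }


sample = rna_qc(a260=1.0, a280=0.5, a230=0.48)
print(sample["conc_ng_per_ul"], sample["pass"])  # 40.0 True
```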

RNA-Sequencing (RNA-Seq): Sequencing will be performed on three biological replicates by the Illumina NGS service of Macrogen (Macrogen Europe B.V., Netherlands). Four samples will be multiplexed per lane and sequenced paired-end with 2 × 101 bp reads. Raw reads will then be filtered and cleaned. First, Illumina adapters will be removed with Trimmomatic (Bolger et al., 2014). Next, the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/index.html, version 0.0.13.2) will be applied in two steps. FASTQ Quality Trimmer will trim nucleotides with quality scores below 30 from the 3' ends of reads. FASTQ Quality Filter will then discard reads in which fewer than 70% of bases reach the minimum quality score.
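
The trimming and filtering rules can be expressed as a minimal Python sketch. It assumes Phred+33 quality encoding (Illumina 1.8+). It mirrors the intent of FASTQ Quality Trimmer (-t 30) and FASTQ Quality Filter (-q/-p) but does not reproduce their exact implementation. The minimum quality used by the filter (30 here) is an assumption, because the proposal states only the 70% cut-off.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def phred_scores(qual):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]


def trim_3prime(seq, qual, min_q=30):
    """Drop bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(qual, min_q=30, min_percent=70.0):
    """Keep a read only if at least min_percent of its bases have quality >= min_q."""
    scores = phred_scores(qual)
    if not scores:
        return False
    good = sum(1 for s in scores if s >= min_q)
    return 100.0 * good / len(scores) >= min_percent


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33
seq, qual = trim_3prime("ACGTA", "III##")
print(seq, qual, passes_filter("IIII#"), passes_filter("III##"))  # ACG III True False
```

The trimmer runs before the filter, matching the order described above. A trimmed read is judged only on the bases that remain.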

Clean reads will be mapped with STAR (Dobin et al., 2013; v. 2.7.1a) to the Cannabis sativa reference genome (https://www.ncbi.nlm.nih.gov/assembly/GCA_900626175.2) and, for male plants, to assembly GCA_013030025.1.
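
Mapping to two assemblies requires one STAR index per assembly. The sketch below only builds the command lines and executes nothing. The file and directory names are hypothetical, and it assumes gzipped FASTQ input and a GTF annotation file. --sjdbOverhang is set to read length minus one (100 for 2 × 101 bp reads), as the STAR manual recommends.

```python
READ_LENGTH = 101  # from the 2 x 101 bp paired-end design


def star_index_cmd(genome_dir, fasta, gtf, threads=8):
    """Command to build a STAR index for one assembly (not executed here)."""
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        # STAR manual recommendation: read length minus 1
        "--sjdbOverhang", str(READ_LENGTH - 1),
    ]


def star_align_cmd(genome_dir, read1, read2, out_prefix, threads=8):
    """Command to align one paired-end sample against a STAR index."""
    return [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--readFilesCommand", "zcat",  # assumes gzipped FASTQ input
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


# Hypothetical paths; one index per assembly
print(" ".join(star_index_cmd("star_GCA_900626175.2", "cs10.fna", "cs10.gtf")))
print(" ".join(star_align_cmd("star_GCA_900626175.2",
                              "S1_R1.clean.fq.gz", "S1_R2.clean.fq.gz", "S1_")))
```

Each list can be passed to subprocess.run(cmd, check=True). A separate genome directory would be built for GCA_900626175.2 and for GCA_013030025.1.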

We will use TopHat2 and Bowtie2 to align the reads, together with Cufflinks (Trapnell et al., 2010; v. 2.2) to assemble the transcripts. NCBI gene annotations will be supplied as a guide, so that assembled transcripts can be matched to known genes. This combination is intended to assemble transcripts accurately and to estimate their abundances precisely as FPKM values.
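
Cufflinks reports abundance as FPKM (fragments per kilobase of transcript per million mapped fragments): FPKM = fragments × 10^9 / (transcript length in bp × total mapped fragments). The helper below is a simplified sketch of that formula. Cufflinks itself uses an effective transcript length and assigns fragments to isoforms by likelihood, so its values will differ. The transcript names and counts in the example are hypothetical.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """FPKM = fragments / (length in kb * total mapped fragments in millions).

    Simplified: uses the plain transcript length, not Cufflinks' effective length.
    """
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def fpkm_table(counts, lengths, total_mapped_fragments):
    """Apply fpkm() to a dict of per-transcript fragment counts."""
    return {
        tx: fpkm(n, lengths[tx], total_mapped_fragments)
        for tx, n in counts.items()
    }


# Hypothetical transcripts: equal counts, but txB is twice as long
table = fpkm_table({"txA": 500, "txB": 500}, {"txA": 2000, "txB": 4000}, 10_000_000)
print(table)  # {'txA': 25.0, 'txB': 12.5}
```

The example shows why length normalization matters. Two transcripts with the same fragment count differ twofold in FPKM when one is twice as long.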

Next, differential gene expression will be analyzed with the edgeR (5) and/or DESeq2 (6) packages in R. This will identify genes whose expression differs significantly between IFM samples. Both packages normalize for differences in library size and adjust p-values to control the false discovery rate across the many genes tested. The results will be visualized with R's plotting functions to aid interpretation and presentation.
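
Both packages handle library size and multiple testing internally. The sketch below only illustrates the two ideas in plain Python and does not replace running the packages. The size factors follow the DESeq2-style median-of-ratios approach. edgeR uses TMM normalization by default, which is not shown here. The adjustment is the Benjamini-Hochberg procedure, the default in DESeq2 results() and edgeR topTags(). The counts and p-values in the example are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, in the style of DESeq2.

    counts: one row per gene, one column per sample (raw counts).
    """
    n_samples = len(counts[0])
    # Genes with a zero in any sample have no finite geometric mean; skip them
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical data: sample 2 was sequenced twice as deeply as sample 1
print(size_factors([[10, 20], [100, 200], [5, 10]]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```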

Principal component analysis (PCA) and heatmap visualization will be performed with R/Bioconductor (Gentleman et al., 2004). Differentially expressed transcripts will be clustered by their FPKM values with Expander 7 (Ulitsky et al., 2010) using the K-means algorithm (Shamir et al., 2005).
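
The clustering step can be sketched as follows. The sketch clusters genes on z-scored log2(FPKM + 1) profiles and seeds centroids by farthest-first initialization. Both are illustrative assumptions, and Expander's own implementation may differ. The core loop is standard Lloyd's K-means: each gene is assigned to its nearest centroid, and each centroid then moves to the mean of its members. The loop repeats until the assignments stop changing.

```python
import math


def standardize(profile):
    """z-score a gene's log2(FPKM + 1) values across samples."""
    logged = [math.log2(v + 1) for v in profile]
    mean = sum(logged) / len(logged)
    sd = math.sqrt(sum((x - mean) ** 2 for x in logged) / len(logged))
    return [(x - mean) / sd if sd > 0 else 0.0 for x in logged]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with farthest-first seeding (deterministic)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all current centroids
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical FPKM profiles over four samples: two rising, two falling
fpkm_profiles = [[1, 2, 4, 8], [2, 4, 8, 16], [8, 4, 2, 1], [16, 8, 4, 2]]
labels, _ = kmeans([standardize(p) for p in fpkm_profiles], k=2)
print(labels)
```

Standardizing first makes genes cluster by the shape of their expression pattern rather than by absolute FPKM. This is the usual choice when the goal is to group co-regulated transcripts.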

Bioinformatics analysis and integration of all data sets into a model

For each next-generation sequencing analysis, we perform the following steps. Reads are quality-checked with FastQC (1) and cleaned with cutadapt (2). For RNA-seq, the reads are then aligned to a reference genome with TopHat2 (3) and Bowtie2 (4). For ChIP-seq, differential analysis is performed with the edgeR (5) and/or DESeq2 (6) packages in R.
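
The adapter-cleaning step can be illustrated with a simplified 3' adapter trimmer. It assumes the common Illumina TruSeq adapter prefix AGATCGGAAGAGC, although the actual adapter depends on the library kit. It uses exact matching only, whereas cutadapt also tolerates mismatches and indels. The minimum overlap of 3 mirrors cutadapt's default for partial adapters at the read end.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed TruSeq adapter prefix; depends on the library kit


def trim_adapter(read, adapter=ADAPTER, min_overlap=3):
    """Remove a 3' adapter by exact matching (no mismatches, unlike cutadapt)."""
    # Case 1: the whole adapter occurs in the read; cut from its start
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends in a partial adapter; try the longest overlap first
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    return read


print(trim_adapter("ACGTACGTAGATCGGAAGAGCTTT"))  # ACGTACGT
print(trim_adapter("ACGTACGTAGATC"))             # ACGTACGT
print(trim_adapter("ACGTACGTAG"))                # ACGTACGTAG (overlap "AG" is below min_overlap)
```

The minimum overlap avoids trimming every read that happens to end in "A" or "AG". Such short matches would occur by chance at a high rate.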

Data are visualized with the Integrative Genomics Viewer (IGV) (10).
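
IGV can also be driven by a batch script, which helps when the same view is needed across many samples. The sketch below writes such a script using the IGV batch commands new, genome, load, snapshotDirectory, goto, snapshot and exit. The genome path, BAM names and locus are hypothetical. The resulting text can be saved to a file and run with igv.sh -b <file> or through Tools > Run Batch Script.

```python
def igv_batch_script(genome, tracks, loci, snapshot_dir):
    """Build the text of an IGV batch script that snapshots each locus."""
    lines = ["new", f"genome {genome}"]
    lines += [f"load {track}" for track in tracks]
    lines.append(f"snapshotDirectory {snapshot_dir}")
    for locus in loci:
        lines.append(f"goto {locus}")
        # File-name-safe version of the locus, e.g. chr1:1000-2000 -> chr1_1000_2000
        name = locus.replace(":", "_").replace("-", "_")
        lines.append(f"snapshot {name}.png")
    lines.append("exit")
    return "\n".join(lines) + "\n"


# Hypothetical genome FASTA, BAM files and locus
print(igv_batch_script("cs10.fna", ["S1.bam", "S2.bam"], ["chr1:1000-2000"], "igv_snapshots"))
```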

For mRNA-seq:

To obtain an overview of the differences between the transcript profiles of the samples, a principal component analysis (PCA) is conducted.
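
The PCA step can be sketched in plain Python. The sketch assumes a samples × genes matrix of log-transformed expression values. Each gene is centered across samples, and the principal components are then found by power iteration on the sample Gram matrix, with deflation between components. In practice this would be done with prcomp or DESeq2's plotPCA in R. This version only makes the computation explicit, and the signs of the scores are arbitrary. The example data are hypothetical.

```python
import math
import random


def pca_scores(matrix, n_components=2, iterations=500, seed=0):
    """PCA scores for each sample (row) of a samples x genes matrix."""
    n, n_genes = len(matrix), len(matrix[0])
    # Center every gene across samples
    means = [sum(row[g] for row in matrix) / n for g in range(n_genes)]
    x = [[row[g] - means[g] for g in range(n_genes)] for row in matrix]
    # Sample Gram matrix G = X X^T shares its non-zero eigenvalues with X^T X
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    scores = [[0.0] * n_components for _ in range(n)]
    for comp in range(n_components):
        # Random start: a uniform vector lies in the null space of centered data
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iterations):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        # Rayleigh quotient: eigenvalue = sum of squared scores on this component
        eigval = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n)) for i in range(n))
        eigval = max(eigval, 0.0)
        for i in range(n):
            scores[i][comp] = v[i] * math.sqrt(eigval)  # equals X @ (unit gene loading)
        # Deflate so the next pass finds the next component
        gram = [[gram[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return scores


# Hypothetical log2 expression: two control and two treated samples, three genes
data = [[1.0, 1.0, 5.0], [1.2, 1.0, 5.0], [5.0, 5.0, 1.0], [5.2, 5.0, 1.0]]
for row in pca_scores(data):
    print([round(s, 2) for s in row])
```

Working with the sample Gram matrix keeps the eigenproblem small. Its size is set by the number of samples rather than by the number of genes.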

Gene Ontology (GO) enrichment analysis is performed with the BiNGO plugin for the Cytoscape platform (11).
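
BiNGO offers a hypergeometric test for the overrepresentation of each GO term, with Benjamini-Hochberg FDR correction among its options. Below is a minimal sketch of the per-term p-value, P(X ≥ hits). Here hits is the number of study-set genes annotated to the term, and the study set could be, for example, the differentially expressed genes. The counts in the example are hypothetical.

```python
from math import comb


def go_term_pvalue(hits, study_size, term_size, population_size):
    """Upper-tail hypergeometric p-value P(X >= hits) for one GO term.

    population_size: annotated genes in the reference set
    term_size: reference genes annotated to the term
    study_size: genes in the study set
    hits: study-set genes annotated to the term
    """
    upper = min(term_size, study_size)
    tail = sum(
        comb(term_size, i) * comb(population_size - term_size, study_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(population_size, study_size)


# Hypothetical: 20000 reference genes, 200 in the term, 500 study genes, 15 hits
print(go_term_pvalue(15, 500, 200, 20000))
```

With these counts about 5 hits would be expected by chance, so 15 hits yields a small p-value. In a real analysis the p-values of all tested terms would then be corrected for multiple testing.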


References:

1. Andrews, S., FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc, 2010.

2. Martin, M., Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 2011. 17(1): p. 10-12.

3. Kim, D., et al., TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology, 2013. 14(4): p. R36.

4. Langmead, B. and S.L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nature Methods, 2012. 9(4): p. 357-359.

5. Robinson, M.D., D.J. McCarthy, and G.K. Smyth, edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 2010. 26(1): p. 139-140.

6. Love, M.I., W. Huber, and S. Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 2014. 15(12): p. 550.

7. Zhang, Y., et al., Model-based Analysis of ChIP-Seq (MACS). Genome Biology, 2008. 9(9): p. R137.

8. Quinlan, A.R. and I.M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 2010. 26(6): p. 841-842.

9. Ramirez, F., et al., deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Research, 2016. 44(W1): p. W160-W165.

10. Thorvaldsdottir, H., J.T. Robinson, and J.P. Mesirov, Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics, 2013. 14(2): p. 178-192.

11. Maere, S., K. Heymans, and M. Kuiper, BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics, 2005. 21(16): p. 3448-3449.
