Identification and freedom to operate analysis of potential genes for drought tolerance in maize

Trimmed Mean of M-values (TMM): This method assumes that most genes are not differentially expressed (DE). One sample is taken as the reference and the others as test samples. For each test sample, after excluding the genes with the most extreme average expression and the genes with the largest log ratios, TMM is computed as the weighted mean of the log ratios between this test sample and the reference. Because most genes are assumed not to be DE, the TMM factor should be close to 1. If it is not, its value provides an estimate of the correction factor that must be applied to the library sizes [ 21 ]. TMM is the default normalization method in the edgeR Bioconductor package [ 33 ]. Relative Log Expression (RLE): For a given sample, the RLE scaling factor is calculated as the median, over all genes, of the ratio of each gene's read count to its geometric mean across all samples.

Because most genes are assumed not to be DE, the median ratio for a given sample is used as the correction factor applied to all of its read counts [ 34 ].
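The two library size methods above can be sketched in a few lines of NumPy. This is a simplified illustration, not the edgeR or DESeq code: the function names `tmm_factors` and `rle_factors` are hypothetical, counts are assumed to be a genes × samples array, the TMM reference sample is simply passed in (edgeR chooses it from upper quartiles by default), and ties in the trimming ranks are broken arbitrarily rather than averaged. The trimming fractions (30% of log ratios, 5% of average expression from each end) follow edgeR's documented defaults.

```python
import numpy as np


def tmm_factors(counts, ref=0, logratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM normalization factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=0)
    refc, n_ref = counts[:, ref], lib[ref]
    factors = np.ones(counts.shape[1])
    for j in range(counts.shape[1]):
        if j == ref:
            continue
        obs, n_obs = counts[:, j], lib[j]
        with np.errstate(divide="ignore", invalid="ignore"):
            m = np.log2((obs / n_obs) / (refc / n_ref))  # M: log ratio vs. reference
            a = (np.log2(obs / n_obs) + np.log2(refc / n_ref)) / 2  # A: average log expression
            var = (n_obs - obs) / (n_obs * obs) + (n_ref - refc) / (n_ref * refc)  # approx. var of M
        ok = np.isfinite(m) & np.isfinite(a)  # drop genes with a zero count in either sample
        m, a, var = m[ok], a[ok], var[ok]
        n = m.size
        lo_m = int(n * logratio_trim) + 1  # trim 30% of M-values from each end
        lo_a = int(n * sum_trim) + 1       # trim 5% of A-values from each end
        rank_m = np.argsort(np.argsort(m)) + 1
        rank_a = np.argsort(np.argsort(a)) + 1
        keep = ((rank_m >= lo_m) & (rank_m <= n + 1 - lo_m)
                & (rank_a >= lo_a) & (rank_a <= n + 1 - lo_a))
        # Precision-weighted mean of the retained log ratios, back on the linear scale.
        factors[j] = 2 ** (np.sum(m[keep] / var[keep]) / np.sum(1 / var[keep]))
    # Rescale so the factors multiply to 1; they then multiply the library sizes.
    return factors / np.exp(np.mean(np.log(factors)))


def rle_factors(counts):
    """RLE (median-of-ratios) size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    log_geo_mean = log_counts.mean(axis=1)   # per-gene log geometric mean (-inf if any zero)
    usable = np.isfinite(log_geo_mean)       # drop genes with a zero in any sample
    log_ratios = log_counts[usable] - log_geo_mean[usable, None]
    # Median ratio per sample; the counts are then divided by this size factor.
    return np.exp(np.median(log_ratios, axis=0))
```

For a toy matrix `[[10, 20], [5, 10], [8, 16]]`, in which the second sample is exactly twice the first, `rle_factors` returns about `[0.707, 1.414]`, while `tmm_factors` returns `[1, 1]`: the TMM factor only adjusts library sizes, and here the raw library sizes already absorb the twofold depth difference.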

Across sample normalization
Because library size normalization methods mostly correct for sequencing depth and fail to adjust for other technical variation, across-sample normalization methods have been proposed to correct for other technical artifacts, improving data quality and the ability to detect biologically relevant genes.

Known technical artifact
In contrast to the more complex modeling methods, known technical artifacts can be adjusted for directly within the appropriate statistical model.

Unknown technical artifact
Recently, normalization methods have been developed to assess and remove unknown technical variation by estimating latent factors that capture these sources of variation.
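For the known-artifact case, direct adjustment simply means including the artifact (for example, sequencing batch) as a covariate alongside the factor of interest. The sketch below is illustrative only: it fits an ordinary least-squares model per gene on log-scale expression rather than a count model, and the function name `fit_with_known_batch` and the one-hot batch coding are assumptions, not the method used in the study.

```python
import numpy as np


def fit_with_known_batch(log_expr, group, batch):
    """Per-gene OLS of log expression on group + known batch.

    log_expr: samples x genes array; group: 0/1 per sample; batch: label per sample.
    Batch is one-hot coded with the first level as baseline.
    Returns the estimated group effect for every gene, adjusted for batch.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    group = np.asarray(group, dtype=float)
    batch = np.asarray(batch)
    levels = np.unique(batch)
    batch_cols = [(batch == lvl).astype(float) for lvl in levels[1:]]
    design = np.column_stack([np.ones_like(group), group, *batch_cols])
    coef, *_ = np.linalg.lstsq(design, log_expr, rcond=None)
    return coef[1]  # row 1 holds the group coefficient for each gene
```

With group `[0, 0, 0, 1, 1, 1]`, batch `["a", "b", "a", "b", "a", "b"]` and noise-free genes built as `1 + 2*group + 3*batch_b` and `0.5 - group + 3*batch_b`, the function recovers group effects of about 2 and -1 despite the batch shift.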

Remove Unwanted Variation (RUV): Under this approach, the factors of unknown technical variation are estimated and removed by performing factor analysis on suitable sets of negative control genes or samples, while keeping the primary factor of interest. RUVr, however, uses residuals instead of negative controls: the counts are first regressed on the covariates of interest (a first-pass GLM). SVD is then computed on the residual matrix to estimate the factors of unknown technical variation. The choice of k, the number of factors of unwanted variation, should be guided by considerations that include sample sizes, the extent of technical effects captured by the first k factors, and the extent of differential expression [ 17 , 25 ].
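A minimal sketch of the residual-based idea behind RUVr follows. It is a simplification, not the RUVSeq implementation: it uses ordinary least squares on log-scale data (for example, `log(counts + 1)`) in place of the first-pass GLM and its deviance residuals, and the name `ruv_residuals` is hypothetical. Because the residuals are orthogonal to the design, the estimated factors `W` are too, so subtracting their fit does not remove the modeled biological effect.

```python
import numpy as np


def ruv_residuals(log_y, design, k):
    """RUVr-style removal of k unwanted factors (simplified OLS sketch).

    log_y: samples x genes log-scale matrix.
    design: samples x p covariates of interest (including an intercept).
    Returns (W, corrected): W is samples x k; corrected has W's fit removed.
    """
    log_y = np.asarray(log_y, dtype=float)
    design = np.asarray(design, dtype=float)
    # First pass: regress on the covariates of interest only, keep the residuals.
    beta, *_ = np.linalg.lstsq(design, log_y, rcond=None)
    residuals = log_y - design @ beta
    # SVD of the residual matrix; the leading left singular vectors are the factors.
    u, _, _ = np.linalg.svd(residuals, full_matrices=False)
    w = u[:, :k]
    # Regress the data on W and subtract the fitted unwanted variation.
    alpha, *_ = np.linalg.lstsq(w, log_y, rcond=None)
    return w, log_y - w @ alpha
```

If the data are a group effect plus a single hidden sample-level factor that is orthogonal to the design, calling the function with `k=1` returns exactly the group-effect part.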

Surrogate Variable Analysis (SVA): Once the number of surrogate variables (SVs) has been estimated, the two-step algorithm of Leek and Storey [ 18 ] is used to estimate the unknown technical artifacts.

Principal Component Analysis (PCA): This approach applies SVD to the scaled residual matrix to estimate the factors of unknown technical variation [ 44 ].
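The PCA approach can be sketched as follows, assuming log-scale input and a least-squares first pass to obtain residuals; scaling means standardizing each gene's residuals before the SVD. The function name `pca_factors` is hypothetical, and the returned proportions of variance are included because one common way to choose the number of PCs is the share of variation they explain.

```python
import numpy as np


def pca_factors(log_y, design, n_pcs):
    """PCA factors from the gene-wise scaled residual matrix.

    log_y: samples x genes log-scale matrix; design: samples x p covariates of
    interest (with an intercept). Returns (scores, explained): samples x n_pcs
    PC scores and the proportion of residual variance explained by each PC.
    """
    log_y = np.asarray(log_y, dtype=float)
    design = np.asarray(design, dtype=float)
    beta, *_ = np.linalg.lstsq(design, log_y, rcond=None)
    residuals = log_y - design @ beta
    residuals -= residuals.mean(axis=0)       # center each gene
    sd = residuals.std(axis=0, ddof=1)
    keep = sd > 1e-12                          # skip genes with no residual variation
    scaled = residuals[:, keep] / sd[keep]     # scale each gene to unit variance
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return u[:, :n_pcs] * s[:n_pcs], explained
```

To pick the smallest number of PCs reaching a chosen share of variation (the 80% threshold here is only an example), one could use `int(np.searchsorted(np.cumsum(explained), 0.8)) + 1`.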

The number of PCs to include in the model can be determined by several methods, including selecting the PCs that explain a given percentage of the variation, or the PCs that are associated with the biological factors of interest.

Issues of loss of degrees of freedom
For practical purposes, it is more convenient to perform downstream analyses on the batch-adjusted or normalized data without further consideration of the effects of technical artifacts.
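Working on adjusted data hides a cost: each estimated factor removed from the data consumes one residual degree of freedom per gene. A downstream model with p covariates fitted to n adjusted samples still reports n − p residual degrees of freedom, although only n − p − k remain after k factors are removed. The sketch below checks this numerically on simulated noise; the helper names are hypothetical, and the factors are estimated from least-squares residuals as in a simplified RUVr/PCA-style procedure.

```python
import numpy as np


def residualize(y, x):
    """Residuals of a least-squares fit of every column of y on x."""
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    return y - x @ coef


def df_after_factor_removal(log_y, design, k):
    """Return (naive_df, effective_df) for a per-gene linear model fitted to
    data from which k factors, estimated from the residuals, were removed."""
    n, p = design.shape
    u, _, _ = np.linalg.svd(residualize(log_y, design), full_matrices=False)
    adjusted = residualize(log_y, u[:, :k])    # "normalized" data with k factors removed
    remaining = residualize(adjusted, design)  # residuals of the downstream model
    # The rank of the residuals is the number of degrees of freedom actually left.
    return n - p, np.linalg.matrix_rank(remaining)
```

For 12 samples, an intercept-plus-group design (p = 2) and k = 2 removed factors, the naive count is 10 while the residuals span only 8 dimensions, so t-statistics computed with 10 degrees of freedom would be overly optimistic.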

Comparison of methods
TCGA cervical study
Simulation study
Library size normalization

Fig 2. Across sample normalization.

Fig 4. Assessing the impact of correctly accounting for the loss of degrees of freedom due to normalization, using workflow 3.

Simulation study
To compare the performance of the different methods in estimating technical artifacts, we first calculated the percentage of times the number of SVs was estimated correctly.

Fig 5.

Fig 6.

Fig 7.

Discussion
Although some studies have suggested that RNA-Seq data do not need complicated normalization [ 2 ], in practice normalization has been shown to have a great influence on the analysis of gene expression data generated by RNA-Seq.

Conclusion
We recommend assessing and, if required, applying across-sample normalization methods in addition to library size normalization when analyzing RNA-Seq data.

