## PCA for metabolomics data

Humic acids are one of the major chemical components of humic substances, which are the major organic constituents of soil (humus), peat, coal, many upland streams, dystrophic lakes, and ocean water. They are produced by biodegradation of dead organic matter. They are not a single acid; rather, they are a complex mixture of many different acids containing carboxyl and phenolate groups, so that the mixture behaves functionally as a dibasic acid or, occasionally, as a tribasic acid. Humic acids can form complexes with ions that are commonly found in the environment, creating humic colloids. Humic and fulvic acids (fulvic acids are humic acids of lower molecular weight and higher oxygen content than other humic acids) are commonly used as a soil supplement in agriculture, and less commonly as a human nutritional supplement. Humic and fulvic acids are considered soil bioindicators and reflect an equilibrium between living organic and non-organic matter.

Mass spectrometry has been used to estimate signature analytes and patterns specific to some soils (Mugo & Bottaro, 2004). Fulvic acids were prepared from a soil using different extraction protocols, resulting in 5 samples: H1, H1H2, EVM1, EVM2 and EAA. Are these extraction protocols similar, and which analytes does each extract more efficiently? MALDI MS spectra were recorded over the $m/z$ 150 to 1500 range in the presence of the MALDI matrix α-cyano-4-hydroxycinnamic acid (CHCA). Intensities were normalized to the $m/z$ 379 analyte common to these samples, and Pareto scaling was chosen during the alignment process performed by MarkerView. Figure 9A shows that PCA separates the samples poorly, with $\mathrm{PC}_{1}$ explaining $25.8\%$, $\mathrm{PC}_{2}$ $18.7\%$ and $\mathrm{PC}_{3}$ $14.8\%$ of the variability (a total of $59.3\%$ captured). The samples are not well separated along the first PC axis, demonstrating the large influence of factors other than soil extraction differences (chemical precipitation, physical precipitation, filtration).

Discriminant analysis associated with PCA (supervised PCA-DA) was attempted to further separate these 5 known groups (Figure 9B). The technique is supervised: it uses class information, based on the assigned sample group, to improve the separation. Figure 9B shows a dramatically improved separation, but this may be based on noise. Peaks that happen to be more intense in one group than in another can influence the results, so careful examination of the loading plots and of the analyte profiles across the samples is necessary to avoid mistaking noise or batch effects for real differences. The analysis is also affected by outliers and by samples assigned to the wrong group.
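The preprocessing and PCA steps described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic intensities, not the MarkerView implementation and not the fulvic acid data: the matrix size, the 5 extracts × 3 replicates layout, and the column index `ref_col` standing in for the $m/z$ 379 peak are all assumptions made for the example.

```python
import numpy as np

def normalize_to_reference(X, ref_col):
    """Divide every spectrum (row) by the intensity of its reference peak."""
    return X / X[:, [ref_col]]

def pareto_scale(X):
    """Mean-center each peak and divide by the square root of its standard deviation."""
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant peaks (e.g. the reference itself) stay at zero
    return centered / np.sqrt(sd)

def pca(X_centered, n_components=3):
    """PCA by SVD of a column-centered matrix.

    Returns sample scores, peak loadings, and the fraction of total
    variance explained by each retained component.
    """
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]
    return scores, Vt[:n_components], explained[:n_components]

# Synthetic stand-in for a peak table (assumption: 15 spectra x 200 aligned peaks).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=2.0, sigma=1.0, size=(15, 200))

ref_col = 0  # hypothetical column holding the reference analyte
X_norm = normalize_to_reference(X, ref_col)
X_scaled = pareto_scale(X_norm)
scores, loadings, explained = pca(X_scaled)

print("explained variance ratio:", np.round(explained, 3))
print("cumulative:", round(float(explained.sum()), 3))
```

Pareto scaling sits between no scaling and unit-variance scaling: intense peaks still weigh more than weak ones, but less overwhelmingly than in the raw data. The rows of `loadings` are what one would inspect to see which peaks drive each component.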

## What are batch effects

Batch effects are an often overlooked complication in “omics” studies. They occur because high-throughput measurements are affected by multiple factors other than the primary biological conditions being tested (Leek et al., 2010; Leek & Storey, 2008). These factors include laboratory conditions, reagent batches, differences between even highly trained personnel, and hardware maintenance. A batch effect becomes a problem when these conditions vary during the course of an experiment, and a major problem when it is correlated with an outcome of interest and leads to incorrect conclusions (Ransohoff, 2005; Baggerly et al., 2004). Batch effects are defined as subgroups of measurements that behave qualitatively differently across conditions and are primarily unrelated to the biological or scientific variables under study.

A typical batch effect is seen when all samples of one group are measured first and all samples of a second group are measured next. A batch effect also occurs when a particular batch of reagent (e.g., Taq polymerase for PCR experiments) is used with all samples of the first group and another reagent batch is used with all samples of the second group. Typical batch effects are likewise seen when one experimentalist or technician acquires all samples from the first group and a different one works with the other group, or when the characteristics of the instrument used to acquire the data have been substantially modified (for MALDI mass spectrometry, for example, a laser or detector replacement). Data normalization generally does not remove a batch effect unless the normalization takes into account the study design or the existence of a batch problem.
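A small simulation can make the last point concrete. The sketch below is an illustration under stated assumptions, not a recommended correction method: the data are synthetic, the effect sizes (`group_shift`, `batch_shift`) are arbitrary, and per-batch mean centering is used only as the simplest example of a batch-aware normalization. It compares a design in which group and batch are fully confounded with a balanced design in which both groups are measured in both batches.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features = 50

def simulate(group, batch, group_shift=2.0, batch_shift=5.0):
    """Noise, plus a real group effect on feature 0 and a batch offset on every feature."""
    X = rng.normal(size=(len(group), n_features))
    X[:, 0] += group_shift * group
    X += batch_shift * batch[:, None]
    return X

def center_globally(X):
    """Batch-unaware normalization: subtract the overall mean of each feature."""
    return X - X.mean(axis=0)

def center_per_batch(X, batch):
    """Batch-aware normalization: subtract each batch's own feature means."""
    out = X.copy()
    for b in np.unique(batch):
        out[batch == b] -= X[batch == b].mean(axis=0)
    return out

def group_gap(X, group, feature):
    """Mean of group 1 minus mean of group 0 for one feature."""
    return X[group == 1, feature].mean() - X[group == 0, feature].mean()

# Confounded design: group 0 is run entirely in batch 0, group 1 in batch 1.
group_conf = np.repeat([0, 1], 20)
batch_conf = group_conf.copy()
X_conf = simulate(group_conf, batch_conf)

# Balanced design: each batch contains 10 samples of each group.
group_bal = np.tile([0, 0, 1, 1], 10)
batch_bal = np.tile([0, 1, 0, 1], 10)
X_bal = simulate(group_bal, batch_bal)

# Feature 1 carries no biology, yet the confounded design shows a large gap.
spurious = group_gap(center_globally(X_conf), group_conf, feature=1)
# Batch-aware centering cannot help here: it erases the real effect as well.
erased = group_gap(center_per_batch(X_conf, batch_conf), group_conf, feature=0)
# With a balanced design the real effect on feature 0 survives the correction.
kept = group_gap(center_per_batch(X_bal, batch_bal), group_bal, feature=0)

print("confounded, global centering, null feature:", round(float(spurious), 2))
print("confounded, per-batch centering, real feature:", round(float(erased), 2))
print("balanced, per-batch centering, real feature:", round(float(kept), 2))
```

Global centering leaves the group difference untouched, so the batch offset shows up as an apparent group effect. Batch-aware centering only separates the two when the design allows it, which is why the order of acquisition and the allocation of reagents and operators should be balanced across groups before any data are collected.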

