



From the Department of Pathology* and Center for Pathology Informatics, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania
| Introduction |
|---|
It is believed that the major causes for platform-dependent differences in gene expression are attributable to variations in array design, probe deposition, probe sequence, and gene annotation.4 Although these are major causes of variability in gene expression data, there are other methodological differences that can introduce minor but systematic biases. In this context, little attention has been paid to methodological differences such as the amplification and labeling reactions of different manufacturers. Linear, high-fidelity amplification is critical because it ensures accurate replication of the size, distribution, and complexity of the initial mRNA population. Several studies have suggested that systematic biases are introduced by variations in amplification technique that could impact expression results regardless of the choice of array platform.5, 6 These results challenge the common underlying assumption that representation of transcripts in a sample remains unchanged by the amplification and labeling protocols used before hybridization.
The most widely used RNA amplification and labeling technique is the T7-based method developed by Van Gelder and colleagues (the Eberwine method).7 A growing number of T7-based amplification systems are now commercially available, and most incorporate modifications of the original technique. The goal of the present study is to test specifically the effect of variations in amplification and labeling protocols on gene expression results. To achieve this goal, we compare three widely used, commercially available target amplification methods.8, 9 We delineate the variation introduced by each one and determine its potential impact on gene expression data.
| Materials and Methods |
|---|
Target Preparation Methods
The methods compared in this study are described briefly in this section. For details, readers are referred to the manufacturers' manuals and selected references.8, 9, 10, 11 Table 1 summarizes the major differences and similarities among the three target labeling kits used.
First-Strand and Second-Strand cDNA Synthesis: All reagents are from Invitrogen Corp. unless otherwise specified. Recommended amounts of total RNA (Table 1) in 8 µl of nuclease-free water were spiked with 2 µl of diluted poly(A) RNA control (Affymetrix, Santa Clara, CA) and then incubated with 2 µl of 50 µmol/L T7-Oligo (dT)24 primer (Affymetrix) at 70°C for 10 minutes and cooled on ice. Poly(A) RNA controls were diluted to appropriate concentrations immediately before performing the experiment to maintain the same proportionate final concentration of the spike-in controls to the total RNA. First-strand cDNA was synthesized by adding 4 µl of 5x first-strand buffer, 2 µl of 0.1 mol/L dithiothreitol, 1 µl of 10 mmol/L dNTP, 1 µl of Superscript II reverse transcriptase, and incubating at 42°C for 1 hour. Second-strand cDNA was synthesized by adding 91 µl of nuclease-free water, 30 µl of 5x second-strand buffer, 3 µl of 10 mmol/L dNTP, 1 µl of Escherichia coli DNA ligase, 4 µl of E. coli DNA polymerase I, 1 µl of RNase H, and incubating at 16°C for 2 hours. Two µl of T4 DNA polymerase were added, and the reaction was incubated at 16°C for 5 minutes. Reactions were stopped by adding 10 µl of 0.5 mol/L ethylenediaminetetraacetic acid. Double-stranded cDNA was purified using the Sample Cleanup Module (Affymetrix).
Synthesis of Biotin-Labeled cRNA with the Enzo Kit: Purified double-stranded cDNA was used in the IVT reaction using the Enzo BioArray high-yield RNA transcript labeling kit (Affymetrix) at 37°C for 4 hours in a 40-µl reaction volume containing 4 µl of 10x HY reaction buffer, 4 µl of 10x biotin-labeled ribonucleotides, 4 µl of 10x dithiothreitol, 4 µl of 10x RNase inhibitor mix, 2 µl of 20x T7 RNA polymerase, and variable amounts of RNase-free water.
Synthesis of Biotin-Labeled cRNA with the Affy Kit: Purified double-stranded cDNA was used in the IVT reaction using the GeneChip expression 3'-amplification reagents for IVT labeling kit (Affymetrix) at 37°C for 16 hours in a 40-µl reaction volume, containing purified double-stranded cDNA, 4 µl of 10x IVT labeling buffer, 12 µl of IVT labeling NTP mix, 4 µl of IVT labeling enzyme mix, and variable amounts of RNase-free water. Ten additional labeling reactions incubated for only 4 hours were also performed (Affy4h method).
Fragmentation and Hybridization for Enzo and Affy Protocols: One µl of purified biotin-labeled cRNA was analyzed for purity and concentration with the ND-1000 spectrophotometer and the Agilent 2100 bioanalyzer. For the cRNA prepared by the Affy4h method, purified cRNA from two reactions was pooled to achieve the amount of cRNA required for hybridization. Fifteen µg of purified cRNA were incubated with the appropriate amount of fragmentation buffer (Affymetrix) at 94°C for 35 minutes. A 1-µl aliquot was used to confirm complete fragmentation by capillary electrophoresis.
GE Health Care CodeLink Expression System Target Preparation
Twelve biotin-cRNA samples were prepared by the CodeLink method using the CodeLink expression assay reagent kit (GE Health Care, Piscataway, NJ). All reagents used are from this kit unless otherwise specified. One µg of total RNA in 8 µl of nuclease-free water was spiked with 1 µl of working solution of bacterial control mRNAs and 2 µl of diluted poly(A) RNA control (Affymetrix), then incubated with 1 µl of T7-oligo (dT) primer at 70°C for 10 minutes and cooled on ice. First-strand cDNA was synthesized by adding 2 µl of 10x first-strand buffer, 4 µl of 5 mmol/L dNTP mix, 1 µl of RNase inhibitor, and 1 µl of reverse transcriptase, and then incubating at 42°C for 2 hours. Second-strand cDNA was synthesized in a 100-µl reaction volume by adding 63 µl of nuclease-free water, 10 µl of 10x second-strand buffer, 4 µl of 5 mmol/L dNTP mix, 2 µl of DNA polymerase mix, and 1 µl of RNase H, and then incubating at 16°C for 2 hours. Double-stranded DNA was purified using the QIAquick PCR purification kit (Qiagen).
IVT reaction was performed by mixing purified double-stranded DNA with 4 µl of 10x T7 reaction buffer, 4 µl of T7 ATP solution, 4 µl of T7 GTP solution, 4 µl of T7 CTP solution, 3 µl of T7 UTP solution, 7.5 µl of 10 mmol/L biotin-11-UTP (Perkin-Elmer Corp., Wellesley, MA), and 4 µl of 10x T7 enzyme mix and then incubating for 14 hours at 37°C; final reaction volume was 40 µl. Biotin-labeled cRNA products were purified with the RNeasy mini kit (Qiagen). Fifteen µg of cRNA from each sample were fragmented following the recommended procedures in the CodeLink target preparation manual.
Evaluation of Amplification Products
cRNA yield for all methods was assessed with an ND-1000 spectrophotometer (Nanodrop Technologies). Fold amplification was calculated by dividing the total cRNA yield by the estimated mRNA content (2% of total RNA) in the starting total RNA of each reaction. mRNA or cRNA size distribution was obtained by capillary electrophoresis with the Agilent 2100 bioanalyzer (Agilent Technologies, Inc.) using the Smear Analysis function of the 2100 Expert software version B.01.02.SI136 (Agilent Technologies, Inc.). Six transcript size regions (0–0.2 kb, 0.2–0.5 kb, 0.5–1.0 kb, 1.0–2.0 kb, 2.0–4.0 kb, and 4.0 kb–max) were defined in the electropherograms and then used to determine the percentage of area under the curve for each size interval. Six individual mRNA samples were evaluated to determine the size distribution of unamplified transcripts. All size distribution data were corrected for rRNA contamination. It is important to note that size distribution in the Agilent bioanalyzer is relative to the fluorescence intensity and does not reflect the actual number of transcripts of a given size.
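The two calculations described above can be illustrated with a minimal Python sketch. It is not the software used in the study: the function names, the dictionary layout, and the example numbers are hypothetical, and the only values taken from the text are the 2% mRNA assumption and the idea of expressing each size region as a percentage of the total area under the curve.

```python
def fold_amplification(crna_yield_ug, total_rna_ug, mrna_fraction=0.02):
    """Fold amplification = cRNA yield / estimated mRNA input.

    mrna_fraction defaults to 0.02 (mRNA assumed to be 2% of total RNA,
    as stated in the text). Function and argument names are illustrative.
    """
    if total_rna_ug <= 0 or not 0 < mrna_fraction <= 1:
        raise ValueError("total RNA must be > 0 and mrna_fraction in (0, 1]")
    estimated_mrna_ug = total_rna_ug * mrna_fraction
    return crna_yield_ug / estimated_mrna_ug


def size_distribution_percent(region_areas):
    """Convert area under the curve per size region into percentages.

    region_areas maps a region label (e.g. "0.2-0.5 kb") to its area.
    """
    total = sum(region_areas.values())
    if total <= 0:
        raise ValueError("total area must be > 0")
    return {region: 100.0 * area / total for region, area in region_areas.items()}


# Hypothetical example: 40 ug of cRNA obtained from 5 ug of total RNA.
example_fold = fold_amplification(40.0, 5.0)
example_distribution = size_distribution_percent({"0-0.2 kb": 1.0, "0.2-0.5 kb": 3.0})
```

Because the bioanalyzer areas reflect fluorescence rather than molecule counts, the percentages from such a calculation describe relative signal per size region, not the number of transcripts.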
Hybridization, Washing, Staining, and Data Processing
Five cRNA samples from each method were hybridized to Affymetrix GeneChip HG-U95Av2 arrays, which contain 12,625 probe sets representing approximately 10,000 full-length genes. Briefly, 15 µg of fragmented cRNA were mixed in a hybridization cocktail with control oligonucleotide B2 (Affymetrix), eukaryotic hybridization controls (Affymetrix), herring sperm DNA (Promega Corp., Madison, WI), acetylated bovine serum albumin (BSA) solution (Invitrogen Corp.), 2x hybridization buffer (made from MES-free acid monohydrate) (Sigma-Aldrich Corp., St. Louis, MO), MES sodium salt (Sigma-Aldrich Corp.), 5 mol/L NaCl (Ambion, Inc., Austin, TX), 0.5 mol/L ethylenediaminetetraacetic acid (Sigma-Aldrich Corp.), molecular biology grade water, 10% Tween 20 (Calbiochem, San Diego, CA), and 10% dimethyl sulfoxide (for Affy and Affy4h methods only), and variable amounts of water to a final volume of 300 µl. Two hundred µl of hybridization cocktail was hybridized on each array at 37°C for 16 hours. Each array was then washed and stained with streptavidin-phycoerythrin in a GeneChip Fluidics Station 400 (Affymetrix) and scanned by a GeneChip Scanner 3000 (Affymetrix) as recommended by the manufacturer. Quality control (QC) parameters were derived from the MAS 5.0 algorithm of the GCOS software (version 1.1; Affymetrix). Numerical gene expression data were derived from the raw intensity files using two distinct algorithms: the MAS 5.0 and the MBEI algorithm from the dChip software (http://www.dchip.org).12, 13
Gene expression data have been submitted to the National Center for Biotechnology Information's Gene Expression Omnibus with accession number GSE3254.
Analysis of Gene Expression Data
Present (P) and absent (A) calls are based on the detection calls made by the GCOS software. For the purposes of this study, we defined a transcript (probe set) as truly present in the UHR RNA if it was identified as P at least three times in five replicates of any amplification labeling method.
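The "truly present" rule above can be written as a short Python sketch. The function name, the input layout (a dictionary from method name to a list of per-replicate detection calls), and the example calls are assumptions made for illustration; only the threshold of at least three P calls in five replicates of any method comes from the text.

```python
def is_truly_present(calls_by_method, min_present=3):
    """Return True if a probe set is called P in at least `min_present`
    replicates of ANY amplification/labeling method.

    calls_by_method: dict mapping a method name to a list of detection
    calls, one per replicate array (e.g. ["P", "P", "A", "P", "A"]).
    Any call other than "P" is counted as not present.
    """
    return any(
        sum(1 for call in calls if call == "P") >= min_present
        for calls in calls_by_method.values()
    )


# Hypothetical probe set: present in 3 of 5 Enzo replicates only.
example_calls = {
    "Enzo": ["P", "P", "A", "P", "A"],
    "Affy": ["A", "A", "A", "A", "A"],
}
example_present = is_truly_present(example_calls)
```

Note that the rule is applied per method, so P calls are not pooled across methods: two P calls in one method and two in another do not satisfy it.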
Data from the MBEI PM-only model12 of the dChip software were used for all transcript list analyses. The Avadis Pride software package v3.3 (Strand Genomics, Redwood City, CA) was used for annotation, filtering, and integration of gene expression data. Michael Eisen's Cluster and TreeView software tools (http://rana.lbl.gov/EisenSoftware.htm)14 were used to perform hierarchical clustering and view clustering results. The coefficient of variation (CV) for each transcript across samples was calculated by dividing the SD of its intensity values by the mean and was expressed as a percentage (%CV).
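As an illustration of the %CV definition above, the following Python sketch computes it for one transcript. The text does not state whether the sample or population SD was used; the sample SD (n − 1 denominator) is assumed here, and the function name and intensity values are hypothetical.

```python
import statistics


def percent_cv(intensities):
    """%CV = 100 * SD / mean for one transcript across samples.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(intensities)
    if mean == 0:
        raise ValueError("mean intensity is zero; %CV is undefined")
    return 100.0 * statistics.stdev(intensities) / mean


# Hypothetical intensities of one probe set across three arrays.
example_cv = percent_cv([90.0, 100.0, 110.0])
```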
Two-class unpaired comparisons of gene expression data from two methods were performed with the Significance Analysis of Microarrays (SAM)15 software tool v1.21 (http://www-stat.stanford.edu/~tibs/SAM/). All gene expression profile comparisons with SAM were performed at a false discovery rate (FDR) of less than 0.03% (Delta level of 3.0), except the comparison between Affy and Affy4h data, which was performed at an FDR of 0.32% (Delta = 2.0). STATA software v8.01 (STATA Corp., College Station, TX) was used for all other statistical analysis including correlation studies, Mann-Whitney tests, analysis of variance (all QC data), and regression analysis. SigmaPlot v.8.0 (SPSS Inc., Chicago, IL) and Microsoft Excel (Microsoft, Redmond, WA) were used for all plots.
For each method A to method B comparison of intensity values with SAM, transcripts that showed significantly increased values in method A over B were labeled as "affected by A." Conversely, transcripts significantly increased in method B, therefore decreased in method A, were labeled "affected by B." For the Enzo versus Affy4h comparison, we calculated differences in cytosine content in the target sequence of transcripts affected by these methods. The target sequence of a transcript is defined as the region interrogated by all probes in a probe set in the Affymetrix HG-U95Av2 array. Differences in cytosine content were calculated as the ratio of cytosine (c) to uracil (u) and expressed as guanine/adenine (G/A), thus reflecting the actual mRNA sequence. For the Affy versus Affy4h comparison, transcript sizes reported correspond to the target mRNA sizes reported by the array manufacturer. Both transcript lengths and probe sequence information were obtained from the NetAffx website (www.affymetrix.com).
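The G/A ratio described above can be sketched in Python as follows. Labeled cRNA is antisense to the mRNA, so each cytosine in the cRNA pairs with a guanine in the mRNA target sequence and each uracil pairs with an adenine; counting G and A in the sense-strand target sequence therefore gives the cytosine-to-uracil ratio of the labeled product. The function name and the example sequence are hypothetical, and the input is assumed to be the sense-strand target sequence.

```python
def ga_ratio(target_sequence):
    """Ratio of guanine to adenine in the mRNA (sense) target sequence.

    This equals the cytosine/uracil ratio of the complementary cRNA,
    which is the strand that carries the biotin label.
    """
    seq = target_sequence.upper()
    guanines = seq.count("G")
    adenines = seq.count("A")
    if adenines == 0:
        raise ValueError("no adenine in sequence; G/A ratio is undefined")
    return guanines / adenines


# Hypothetical six-base target sequence: 3 G and 2 A.
example_ratio = ga_ratio("GGGAAC")
```

Under this definition, a ratio above 1 means the labeled cRNA for that target contains more cytosines than uracils, which is the case in which a second labeled nucleotide (biotin-CTP) would be expected to add signal.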
| Results |
|---|
10 µg on average. The CodeLink method had the highest cRNA fold amplification and showed more variability in cRNA yields, which was mostly based on lot-to-lot differences of the amplification kit (Table 2).
There were no significant differences across samples in the 3'/5' ratios of GAPDH, Lys, and Phe (Table 3). However, the 3'/5' ratios for β-actin, Dap, and Thr were significantly higher in the samples labeled with the CodeLink method compared to Affy, Enzo, and Affy4h methods (β-actin and Thr, P < 0.001 for all methods; Dap, P = 0.004, 0.006, and 0.011 for each method, respectively). Interestingly, control transcripts that showed increased 3'/5' ratios in the CodeLink method are all nearly 2 kb long, while the controls not affected by this bias (GAPDH, Lys, and Phe) are all less than 1.5 kb long. Additionally, rRNA sequences were identified as present by the MAS5 algorithm in all but the Enzo labeling method. Interestingly, intensity values for the rRNA probe sets in the Enzo method were not significantly decreased as compared to the others, which indicates that the present/absent calls are not directly related to a higher abundance of labeled rRNA transcripts (Supplemental Figure 1 at http://jmd.amjpathol.org/) and most likely reflect the higher noise and background observed with this method (Table 2).
Reproducibility of Gene Expression Measurements
Pair-wise Pearson correlation coefficients of normalized gene expression measurements, within and between methods, were calculated using the set of present transcripts. Gene expression data showed excellent intramethod reproducibility and sensitivity, with correlation coefficients >0.990 for all methods (Table 5). The Affy and Affy4h methods had the highest intermethod correlation coefficient (r = 0.989), whereas the Enzo and CodeLink data correlated with each other the least (r = 0.949). With unsupervised hierarchical clustering, the arrays formed distinct clusters based on target preparation methods, confirming that intermethod variability is greater than intramethod variability (data not shown).
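A pair-wise Pearson comparison of this kind can be sketched in plain Python as shown below. This is an illustration only (the study used STATA for correlation analysis); the function names, the array labels, and the intensity values are hypothetical, and each list is assumed to hold normalized values for the same present transcripts in the same order.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(arrays):
    """Correlate every pair of arrays.

    arrays: dict mapping an array label to its list of expression values.
    Returns a dict keyed by (label_a, label_b) with label_a < label_b.
    """
    labels = sorted(arrays)
    return {
        (a, b): pearson_r(arrays[a], arrays[b])
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    }


# Hypothetical normalized intensities for four transcripts on three arrays.
example = pairwise_correlations({
    "Affy_1": [1.0, 2.0, 3.0, 4.0],
    "Affy_2": [1.1, 2.1, 2.9, 4.2],
    "Enzo_1": [1.5, 1.8, 3.6, 3.9],
})
```

Averaging the coefficients for pairs of arrays from the same method versus pairs from different methods would give intramethod and intermethod summaries analogous to those reported in Table 5.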
Sources of Variation
Dual Labeling
The Enzo method uses double-nucleotide labeling (biotin-CTP and biotin-UTP) whereas others use single labeling (Table 1). Samples labeled by the Enzo method had higher average unnormalized fluorescence intensity values than all other methods (Table 2). As seen in Table 6 for the Enzo/Affy4h comparison, 61.4% of all transcripts have significantly different gene expression values and are therefore affected by the method-dependent variation.
We hypothesized that if this method-dependent variation is a direct result of the double-nucleotide labeling, then the transcripts that show higher gene expression values with the Enzo method will have a higher cytosine content in the transcript sequence interrogated by the probe set, because this nucleotide is only labeled by this method. This was expressed as the G/A ratio of the target transcript sequence as defined in the Materials and Methods section. The average G/A ratio of transcripts showing elevated expression in Enzo data was 1.166 ± 0.485, which is significantly higher than that of transcripts increased by the Affy4h method (0.773 ± 0.305; Mann-Whitney test: z = 32.477; P < 0.00001). When transcripts that are affected significantly by the two methods were categorized according to their G/A ratio, we found that 93.7% of transcripts with ratios >2.0 show significantly higher values with the Enzo method and 84.7% of transcripts with ratios <0.5 show higher values with the Affy4h method (Figure 3).
Based on the transcript size shift observed with long IVT reactions, we hypothesized that transcripts with significantly higher expression values in samples labeled with a long (overnight) IVT are more likely to be short transcripts. Therefore, we investigated if genes <1.5 kb would be preferentially amplified by a long IVT labeling method. Figure 4 shows the percentage of transcripts <1.0 kb that are selectively increased in the Affy method in comparison to the Affy4h. These data show an inverse relationship between transcript length and the percentage of transcripts whose expression values were increased by the long IVT. Linear regression analysis shows an R² of 0.9291, indicating a strong association between the increase of transcript length and the decrease of the proportion of long IVT affected transcripts. This association could not be found when a comparison of both long IVT methods (Affy/CodeLink) was done (Supplemental Figure 4; http://jmd.amjpathol.org/).
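The kind of regression summarized above can be sketched with an ordinary least-squares fit in Python. This is an illustration only, not the STATA analysis used in the study; the function name is hypothetical and the binned values below are invented placeholders, not data from Figure 4.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares line y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("fit is undefined for constant input")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return slope, intercept, 1.0 - ss_res / ss_tot


# Placeholder values: upper bound of each transcript length bin (kb) and the
# percentage of transcripts in that bin increased by the long IVT.
length_bin_kb = [0.2, 0.4, 0.6, 0.8, 1.0]
percent_increased = [70.0, 62.0, 57.0, 48.0, 43.0]
slope, intercept, r_squared = linear_fit_r2(length_bin_kb, percent_increased)
```

A negative slope with R² close to 1 corresponds to the inverse relationship described in the text: the longer the transcripts in a bin, the smaller the share whose signal is increased by the long IVT.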
| Discussion |
|---|
Our results suggest that, when choosing or designing labeling kits for clinical applications, attention should be paid to the number of biotinylated ribonucleotides used for labeling at the IVT step. When comparing single versus double-nucleotide labeling with normalized data, we found that 30% of the present genes had substantially higher gene expression values in Enzo (double nucleotide) compared to Affy4h (single nucleotide), suggesting the data sets generated from methods using two labeling nucleotides are not directly comparable to data sets derived by using a single labeling nucleotide. It has previously been shown that incorporation of biotin-CTP is not as efficient as biotin-UTP.8 Our results are in agreement with these findings, because we found differences when the guanine/adenine (G/A) ratio of the targeted sequence was higher than 2, indicating that at least two incorporated biotin-CTPs per biotin-UTP are necessary to significantly increase the amount of fluorescent signal per transcript. It is essential to note that Enzo and Affy4h methods only differ at the IVT reaction, with all other steps and reagents being identical (Table 1). Therefore, the IVT reaction is the only source of the observed variation between these two methods. Although other components in this reaction might also vary, the most significant difference between these two methods is the number of biotinylated nucleotides. The possibility that differences in enzyme concentration or buffer composition between the two IVT kits may contribute to the observed variation cannot be formally excluded in our experiments. However, the correlation between the number of guanines in the target sequence and higher expression values in the Enzo method is a strong argument for the role of double-nucleotide labeling as a source of this variation.
We also demonstrate that the distribution of transcripts shifts toward shorter cRNA products in protocols with long IVT incubations, suggesting enhanced amplification of short transcripts. This is further corroborated by the fact that short transcripts were more likely to be increased in cRNA samples from long IVT labeling methods. Interestingly, Spiess and collaborators20 reported a similar cRNA size shift with long IVT incubation, but suggested that degradation of cRNA molecules by T7 RNA polymerase accounted for this observation. However, in our results, long incubations consistently gave higher yields, contrasting with the decrease in cRNA yield after 5 hours observed in their study. Furthermore, in their description of the exonuclease activity of T7 RNA polymerases, Sastry and Ross21 indicated that this activity is only unmasked in paused/arrested transcription complexes and that the kinetic balance during normal transcription is shifted toward polymerization. We speculate that the degradation and/or decrease in IVT yields seen by Zhao and colleagues18 and Spiess and colleagues20 with IVT reactions exceeding 4 hours could be a result of paused transcription complexes due to depletion of reaction components. New IVT kits that are designed for longer incubation times seem to overcome this problem. Although the degree of amplification correlated with the increase in short cRNA transcripts, we were unable to assess the role of enzyme concentration between protocols with identical incubation times because the kit manufacturers would not provide this proprietary information.
In this study, the number of transcripts identified as P in a sample was directly related to the degree of amplification achieved in all methods but one (Affy4h). This suggests that transcripts actually present in a sample are not always amplified successfully, which contributes to the variability within and between assays. In fact, as seen in other studies,4 variability in gene expression measurements was most pronounced in the low-fluorescence intensity range (i.e., in the low-expressor transcript range), as would be expected if low-abundance transcripts are not efficiently amplified each time. It is interesting to note that the Affy4h method, in which we used pooled reactions due to low fold amplification, yielded a similar number of P calls to the CodeLink method, which showed the highest fold amplification. These results suggest that multiple labeling reactions may be more effective at amplifying low-expressor transcripts, because more transcription initiation events may occur with multiple short-term incubations. Further testing of this hypothesis is currently underway in our laboratory.
In the present study, all methods provided low intramethod CVs, but intermethod variability was considerably higher. Intramethod variability reflects random errors created during the performance of a specific method, whereas intermethod variability comprises both random experimental errors and systematic biases. Average CVs across any two methods ranged from 15.65 to 20.44%, approximating the average CV across all methods of 19.93%. Other studies have reported correlation coefficients for the CodeLink and Affymetrix platforms between 0.59 and 0.79.4, 22, 23 In our study we obtained higher correlation coefficients between these two platforms, which could reflect the fact that all samples were hybridized to the same array type, therefore isolating only the variability contributed by the labeling method.
Another significant difference observed between labeling methods was underrepresentation of 5' probes from genes larger than 1.5 kb with the CodeLink method. This phenomenon was observed by Baugh and colleagues,6 and was demonstrated to be related to inefficient reverse transcription. Indeed, when comparing the CodeLink method against all others, which share a common reverse transcription step, the former requires a longer incubation period (2 hours versus 1 hour) that may lead to depletion of dNTPs and early termination of reverse transcription reactions yielding 5' truncated cDNA products. It is also possible that IVT further contributes to 5' underrepresentation when the T7 RNA polymerase fails to transcribe full-length transcripts. It is likely that the majority of gene expression results are not affected by this phenomenon, because most probes in current array designs are 3' biased, but this factor should be taken into account for probes that interrogate the 5' region of selected transcripts.
Until now, reports on gene expression biases were limited to a description of this problem; however, we have performed a characterization of the factors that contribute to this variability. In summary, our results indicate that variability introduced by T7 RNA polymerase-based amplification methods can be explained, at least partially, by two factors: the number of biotinylated nucleotides used in the labeling reaction and the length of the IVT reaction. These biases are not corrected by intensity-based normalization techniques, such as the invariant set normalization method,12 and therefore can generate discordant results even if the same sample is analyzed with different labeling methods. Although our results do not address the impact of these biases in the classification of samples by gene expression (gene expression profiling) or the question of which labeling method best reflects the actual transcriptome, they explain, in part, the discordant results seen between studies with similar experimental designs.3 Our results show that the bias introduced by the IVT method is insufficient to overcome biological variability. However, the fact that it introduces sequence- and transcript size-dependent variation in a systematic manner can lead to erroneous experimental results. This is relevant for researchers using gene expression profiling as a discovery tool and those performing meta-analysis of gene expression profiles from different studies.
It is expected that newly developed sequence-based normalization methods could overcome these biases in gene expression data. As shown recently, concordance between different platforms has improved substantially thanks to advances in gene annotation and array design,24 and high reproducibility among laboratories can be achieved when standardized protocols and array platforms are used.25, 26 As shown by Dobbin and colleagues,27 biological variability is maintained if a standardized operating protocol (SOP) is used. Therefore, studying the effect of different labeling protocols on the ability to detect biological variability would have been redundant with other studies.26 However, it is expected that standard operating procedures to perform clinical tests based on gene expression profiles will be developed. To this end, data from our experiments could also be used to establish which microarray probes have acceptable performance across multiple labeling protocols, as suggested by Daly and colleagues.19 Our results emphasize the importance of standardization in target preparation methods to optimize gene expression analysis and achieve a consistency compatible with the clinical application of this technology.
| Footnotes |
|---|
Supported by the Pennsylvania Department of Health (Pennsylvania Cancer Alliance Bioinformatics Consortium grant ME-01740 to M.J.B.) and the College of American Pathologists Foundation (scholars award to F.A.M.).
This work was performed at the Clinical Genomics Facility of the University of Pittsburgh Cancer Institute.
Supplemental material for this article can be found on http://jmd.amjpathol.org/.
Accepted for publication November 18, 2005.