| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
From the Division of Experimental Medicine, Genomic Medicine Group, Eli Lilly and Company, Indianapolis, Indiana
| Abstract |
|---|
| Introduction |
|---|
The Affymetrix GeneChip platform is the most widely used commercially available microarray for expression analysis. In this technology, pairs of 25-nucleotide oligos are synthesized in situ on silica wafers. Each probe pair contains an oligo that exactly matches the target sequence (perfect match, PM) and a second oligo that differs by a single nucleotide in the center of the oligo (mismatch, MM).4, 5 The presence or absence of a given target sequence in the sample can then be calculated using one of several algorithms based on comparing the PM and MM signals across a probeset of 8 to 16 probe pairs for a given sequence. In addition, the relative expression level of a target sequence can be estimated by the intensity of the signal.
Sample preparation for Affymetrix GeneChip analysis is a multistep process that includes isolation and cleanup of RNA from target tissue, generation of cDNA by reverse transcription, synthesis of biotinylated cRNA from this cDNA template, hybridization to the Affymetrix chip, and staining using streptavidin-phycoerythrin. Each of these steps has the potential to introduce analytical variability into the final results. For many research applications using microarrays, this variability is reduced by batch analyzing samples from a given experiment using a single analyst in a limited number of runs with single lots of reagents. Alternatively, large populations of patients can be analyzed and the data evaluated on a group basis, which minimizes the impact of variability in individual analyses by evaluating changes across a population. These approaches contrast sharply with potential clinical applications of microarray technology, in which patient samples will likely be analyzed in real time as they are collected and will need to be analyzed on an individual patient basis, not as part of a combined population. In addition, changes over time in reagent lots, analysts, and machine performance could also introduce analytical variation in results obtained on different occasions or at different testing locations. For these reasons, understanding the analytical variability of Affymetrix microarrays in such a context will be critical.
For any clinical assay, an understanding of the normal variation of the marker is needed to allow interpretation of patient results. The overall variability of the assay is influenced by preanalytical factors, by the analytical precision of the assay, and by the baseline biological variability of the marker being examined. Biological variability is an inherent characteristic of the analyte being measured, and protocols to evaluate within- and between-patient variability of clinical assays have been described.6 However, there is usually little that can be done to influence the amount of biological variability. In contrast, preanalytical variability can often be reduced by identifying sources of variation and limiting their impact. For Affymetrix analyses, the quality of input RNA and the consistency of sample collection and processing methods have been identified as crucial factors. Dumur and colleagues7 recently described QC criteria for input sample RNA derived from a variety of sources, and similar criteria have been recommended in a recent best practices document from the Tumor Analysis Best Practices Working Group.8 Once preanalytical variation has been reduced as much as possible, the analytical precision of the assay becomes the single biggest factor in determining what level of biological change can be seen.
Because of the complexity of the Affymetrix analytical process, there are many factors that can contribute to analytical variation. Several groups have looked at the overall precision of Affymetrix analyses and have found average probeset coefficients of variation (CVs) ranging from 8 to 13% on a variety of human and rodent microarrays.9, 10, 11 However, the majority of these studies were done using a relatively small number of replicates (<15 in most cases) and were analyzed in a limited number of runs. In our study we have tried to more closely mimic potential clinical applications by evaluating precision using a much larger number of technical replicates over an extended period of time, using multiple operators, instruments, and reagent lots. In addition, we have specifically looked at the contributions of each of these components to overall variability to identify steps for which rigorous control criteria would provide the most benefit. Using this approach, we have generated a more realistic estimate of the analytical precision of this technology in clinical use, and provide a model protocol that could be used to validate clinical applications of Affymetrix microarrays.
| Materials and Methods |
|---|
Tumor RNA was isolated from a uterine leiomyoma specimen obtained from the Department of Pathology, Indiana University School of Medicine, in accordance with the guidelines of Indiana University and with the approval of the Indiana University-Purdue University Indianapolis institutional review board. The tissue was surgically removed, snap-frozen in liquid nitrogen, and stored at −80°C until ready for use. Two- to three-mm cube sections of tissue were cut off and mechanically homogenized in the presence of TRIzol reagent. Homogenates were pooled together, mixed well, separated into 1-ml aliquots, and stored at −80°C.
General Microarray Analytical Procedure
Total RNA was isolated from the TRIzol aliquots following the manufacturer's instructions and purified using an RNeasy kit (Qiagen Inc., Valencia, CA). RNA integrity and yield were assessed by determining sample absorbance at 260 and 280 nm and by analysis on the Agilent RNA 6000 Nano LabChip (Agilent Technologies, Inc., Palo Alto, CA). All samples had 260:280 ratios >1.8 and clear 18S and 28S ribosomal RNA bands on the Agilent. Complementary RNA synthesis and gene expression profiling were performed following the protocol described in the Affymetrix GeneChip Expression Analysis Technical Manual,13 with only minor changes. Briefly, 5 µg of cleaned total RNA was used to generate double-stranded cDNA by reverse transcription, using a SuperScript double-stranded cDNA synthesis kit (Invitrogen) and an oligo deoxythymidylic acid primer with a T7 RNA polymerase promoter site added to the 3' end. After second-strand synthesis, cDNA was cleaned with a GeneChip Sample Cleanup Module (Affymetrix). Biotin-labeled cRNA was produced by in vitro transcription, using the Enzo BioArray high-yield RNA transcript labeling kit (Enzo Diagnostics, Farmingdale, NY). Labeled cRNA was cleaned with a GeneChip Sample Cleanup Module, dried down in a Savant Speed Vac concentrator (Savant Instruments, Inc., Holbrook, NY), and resuspended to a concentration of 1 µg/ml. Twelve µg of the concentrated cRNA product was fragmented by metal-induced hydrolysis at 94°C for 35 minutes. The efficiency of the fragmentation procedure was checked by analyzing the size of the fragmented cRNA on the Agilent 2100 bioanalyzer. Each fragmented sample was then used to prepare 200 µl of hybridization cocktail containing 100 mmol/L MES, 1 mol/L NaCl, 20 mmol/L ethylenediamine tetraacetic acid, 0.01% Tween 20, 0.1 mg/ml herring sperm DNA (Promega Corp., Madison, WI), 0.5 mg/ml acetylated bovine serum albumin (Invitrogen), 50 pmol/L control oligonucleotide B2, 100 pmol/L eukaryotic hybridization controls (Affymetrix), and 6 µg of fragmented sample. Samples were then hybridized for 16 hours to human genome U95Av2 or U133A arrays (Affymetrix).
After hybridization, GeneChips were washed and stained with streptavidin-phycoerythrin (Molecular Probes, Inc., Eugene, OR), according to the appropriate standard protocol for each chip type. Arrays were scanned using the Affymetrix GeneChip Scanner 3000, and image analysis was performed using Affymetrix Microarray Analysis Suite (MAS), version 5.1 (reference 14). Each sample was scaled to a target intensity of 500 for all probe sets. Signal values, detection calls (present, absent, marginal), and P values for each detection call were generated using MAS 5.1 Absolute Expression Analysis.15
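As an illustration of the scaling step, the sketch below rescales one chip's signal values so that their trimmed mean equals the target intensity of 500. The text does not describe the internals of the MAS 5.1 scaling; the 2% trim at each end and the function name `scale_to_target` are assumptions made for this example only.

```python
def scale_to_target(signals, target=500.0, trim=0.02):
    """Scale one chip's signals so that their trimmed mean equals `target`.

    `trim` is the fraction of values dropped at each end before averaging.
    The 2% default is an assumption of this sketch, not a value from the text.
    Returns the scaled signals and the scaling factor that was applied.
    """
    ordered = sorted(signals)
    k = int(len(ordered) * trim)  # number of values dropped at each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    trimmed_mean = sum(kept) / len(kept)
    factor = target / trimmed_mean
    return [s * factor for s in signals], factor


# Hypothetical raw signals for a four-probeset toy chip.
scaled, factor = scale_to_target([100.0, 200.0, 300.0, 400.0])
```

Applying one multiplicative factor per chip changes the overall brightness of the chip but leaves the ratios between probesets on that chip unchanged.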
Experimental Design for Analytical Precision Studies
A total of 64 HG-U95Av2 chips were used to assess probeset precisions and the contributions of components of variation to the assay, including day/run, analyst, cDNA synthesis reaction, cRNA synthesis reaction, fluidic station, and chip lot. Eight runs of eight chips were performed using aliquots of CEM-MTA cells in 1 ml of TRIzol. Each run consisted of two analysts, each extracting total RNA from one aliquot of cells (Figure 1A). Both analysts worked side-by-side throughout the procedure in this initial experiment, to ensure that all samples were frozen for the same lengths of time between each step. Each total RNA sample was used as a template for two cDNA synthesis reactions, using the Proligo T7-(dT)24 primer. Each cDNA sample was then in vitro-transcribed, fragmented, and hybridized to two separate arrays, yielding a total of four chips per analyst per run. Within each run, the four chips of each analyst were alternated between two fluidics stations. Five runs (40 chips) were completed using one lot of HG-U95Av2 arrays and three runs (24 chips) were performed using a different lot of arrays to assess chip lot-to-lot variation. With the exception of chip lot, the lot number was fixed for all other reagents within the experiment. Chips were always scanned in the same order, 1 to 8, for every run.
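The nested layout described above can be enumerated in a few lines, which makes it easy to confirm the chip counts. This is only a sketch: the lot labels `lot_A` and `lot_B`, the dictionary keys, and the exact order in which chips are assigned to fluidics stations are assumptions of the example, not details given in the text.

```python
from itertools import product


def build_design(n_runs=8, first_lot_runs=5):
    """Enumerate the chips of the U95Av2 precision study.

    Each run has 2 analysts x 2 cDNA reactions x 2 arrays = 8 chips.
    """
    chips = []
    for run in range(1, n_runs + 1):
        # Hypothetical lot labels: five runs on one lot, three on another.
        lot = "lot_A" if run <= first_lot_runs else "lot_B"
        position = 0
        for analyst, cdna, array in product((1, 2), (1, 2), (1, 2)):
            position += 1
            chips.append({
                "run": run,
                "analyst": analyst,
                "cdna_reaction": cdna,
                "array": array,
                # Alternate each analyst's four chips between two stations.
                "fluidics_station": 1 + (position - 1) % 2,
                "chip_lot": lot,
                "scan_order": position,
            })
    return chips


design = build_design()
print(len(design))  # total number of chips in the design
```

Laying the design out as one record per chip also gives the factor columns (run, analyst, cDNA reaction, station, lot) that a variance components model needs.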
Tumor tissue matrix studies were performed using a total of 16 U133A GeneChips analyzed by two analysts in four runs of four chips each (Figure 1C). For each run, analysts each extracted total RNA from two aliquots of homogenized tissue, stored in TRIzol. Each total RNA sample was used as a template for two cDNA synthesis reactions that were then transcribed, fragmented, and hybridized to a single U133A array per sample. A single reagent lot was used for all reagents within this experiment.
Statistical Analysis of Data
A variance components analysis was done by gene and then summarized across genes using SAS (SAS Institute Inc., Cary, NC). Genes were categorized by the percentage of the chips on which they were called present or marginal or by median Affymetrix detection P value. The CV for each probeset was calculated as 100*sqrt(variance)/mean, where the mean is the overall average of the chip values within each experiment. Variance was calculated using a mixed model approach16 with run and cDNA/cRNA production treated as nested random effects and analyst, machine, and chip lot treated as fixed effects in the initial study. For the reagent lot experiment, chip lot and run were treated as nested random effects and analyst, cDNA kit, IVT kit, hyb mix, and RNA clean-up kit were treated as fixed. The residual variance is composed of pure chip-to-chip variability, which represents the underlying variation in the assay contributed by factors other than those specifically tested above. The fixed effect means were converted to variance components under strict assumptions,17 which assume that the same two analysts, machines, and lots are used continuously and with equal probability for any sample. The square root of the sum of the variance components (total variance) is used to estimate the SD of a gene measurement on one chip.
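The last two calculations, the CV formula and the total SD as the square root of the summed variance components, can be sketched as follows. The component names and numeric values are hypothetical and chosen only to make the arithmetic easy to follow. The mixed model fit itself, which the authors ran in SAS, is not reproduced here.

```python
import math


def total_sd(variance_components):
    """SD of one gene measurement on one chip: sqrt of the summed components."""
    return math.sqrt(sum(variance_components.values()))


def probeset_cv(variance_components, mean_signal):
    """CV in percent, computed as 100 * sqrt(variance) / mean."""
    return 100.0 * total_sd(variance_components) / mean_signal


def percent_contribution(variance_components):
    """Share of the total variance attributed to each component, in percent."""
    total = sum(variance_components.values())
    return {name: 100.0 * v / total for name, v in variance_components.items()}


# Hypothetical variance components for a single probeset (signal units squared).
components = {"run": 400.0, "analyst": 100.0, "residual": 1100.0}
cv = probeset_cv(components, mean_signal=200.0)  # total SD 40 on a mean of 200
shares = percent_contribution(components)
```

Because variances add while SDs do not, contributions are expressed as shares of the total variance rather than shares of the CV.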
| Results |
|---|
The majority of genes were consistently classified in the same category (P/M versus A) throughout the experiment. Of the 12,625 probesets, 7087 had identical calls on all 55 chips analyzed, while 83.5% of probesets were concordant on 50 or more chips (Figure 2A). CVs for individual probesets ranged from 6 to 353% and were related to signal intensity, with the highest CVs occurring in probesets with mean signal intensities lower than 100 (Figure 2B). However, the majority of the highly variable probesets were for absent calls. Present probesets (defined as probesets with median P values of <0.06 in the 55 chips) had much more reproducible signals, with an average CV of 21.9% (SD 9.6%) and a 95th percentile of 40% CV. It is also interesting to note that this relatively low CV continues to hold for probesets with P values >0.06, and only starts to rise at a faster rate for P values >0.1 (Figure 2D). This suggests that from an analytical standpoint, a cutoff of 0.06 is relatively conservative.
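The two classifications used in this paragraph can be sketched in a few lines: a probeset counts as present when its median detection P value across chips is below 0.06, and call concordance is measured by how many chips agree on the P/M versus A category. Counting agreement with the majority category is an assumption of this sketch, and the function names and sample values are hypothetical.

```python
from statistics import median


def is_present(p_values, cutoff=0.06):
    """True when the median detection P value across chips is below `cutoff`."""
    return median(p_values) < cutoff


def concordant_chip_count(calls):
    """Number of chips that agree on the majority P/M-versus-A category.

    `calls` holds one detection call per chip: "P", "M", or "A".
    Present and marginal calls are pooled, as in the text.
    """
    pm = sum(1 for c in calls if c in ("P", "M"))
    return max(pm, len(calls) - pm)


# Hypothetical data for one probeset measured on four chips.
present = is_present([0.01, 0.02, 0.03, 0.50])
agreeing = concordant_chip_count(["P", "M", "A", "P"])
```

Using the median rather than the mean keeps a single chip with an outlying P value from changing the classification of an otherwise consistent probeset.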
40% of the total variability (Figure 3A)
Twelve runs of four chips were performed using the same cell line RNA preparation as before. Analyses were run by two different analysts on 12 different occasions over a 7-month period. Four different chip lots were included in the study, and two different lots of cDNA synthesis kits, IVT kits, cRNA cleanup kits, and hybridization cocktail were used (Figure 1B). All other assay components were fixed to a single lot. No run failures occurred, and data from all 48 chips were included in the final analysis.
Concordance of P+M calls was again good between chips, with 17,130 out of 22,283 probesets (76.9%) giving identical calls on >90% of the chips analyzed. As before, the highest CVs occurred at signal intensities <100, mainly in absent calls (Figure 4A). Mean CV for present probesets was 27.2% (SD, 9.7%), with 90.8% of present probesets having a CV of <40%. As a population, the variability of U133A probesets under the tested conditions was slightly higher than that seen for the U95Av2 chips (Figure 4B). Once again, chip-to-chip variation was the largest contributor to overall variability, responsible for 55% of the overall variability seen in the present probesets. The only reagents that substantially contributed to variation were different RNA cleanup kits and chip lots (Figure 3B). Changes in reagent lots of the cDNA synthesis kits, IVT kits, and hybridization mixes contributed minimal amounts to the overall variation. Interestingly, run-to-run and analyst variability were higher in this study than in the U95Av2 study, with each contributing 10% of the overall variability. This may be due to the longer time period over which these studies were run (7 months as opposed to 3 months) as well as the fact that the analysts were no longer working side-by-side, and may more accurately represent the effects of different analysts in a clinical laboratory context.
To determine whether probeset performance was consistent between samples, we compared probeset CVs for the 7434 probesets that were present in both the cell line and tumor RNA samples. As shown in Figure 5A, the correlation between probeset CVs in the two experiments was low. This is not entirely surprising because CV is related to signal intensity and the relative signal intensities were different for most probesets between samples. If the results are divided into quadrants using the 40% CV cutoff described above, one can see that the majority of probesets showed less than a 40% CV in both the cell line and tumor homogenate (Figure 5A). Table 1 lists the probesets with the highest variability in the other three quadrants. In general, probesets that were highly variable in one sample but not the other tended to have lower median signal intensities in the more variable sample (Figure 5B). It is interesting to note that many of the probesets that showed high variability in both samples were AFFX ribosomal RNA probesets, five of which have been removed from the U133A 2.0 GeneChips (Affymetrix website: http://www.affymetrix.com/support/help/faqs/hgu133_2/faq_14.jsp).
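The quadrant split described above can be sketched as a simple classification of each probeset by its CV in the two samples. The quadrant labels, the probeset identifiers, and the CV values below are hypothetical. Only the 40% cutoff comes from the text.

```python
from collections import Counter


def cv_quadrant(cv_cell_line, cv_tumor, cutoff=40.0):
    """Assign a probeset to a quadrant from its CV (percent) in each sample."""
    high_cell = cv_cell_line >= cutoff
    high_tumor = cv_tumor >= cutoff
    if high_cell and high_tumor:
        return "high in both"
    if high_cell:
        return "high in cell line only"
    if high_tumor:
        return "high in tumor only"
    return "low in both"


def quadrant_counts(cv_pairs):
    """Count probesets per quadrant; `cv_pairs` maps id -> (cell CV, tumor CV)."""
    return Counter(cv_quadrant(cell, tumor) for cell, tumor in cv_pairs.values())


# Hypothetical probeset CVs: (cell line, tumor homogenate).
example = {
    "probeset_1": (15.0, 22.0),
    "probeset_2": (55.0, 18.0),
    "probeset_3": (12.0, 61.0),
    "probeset_4": (70.0, 83.0),
    "probeset_5": (25.0, 39.9),
}
counts = quadrant_counts(example)
```

Probesets in the "high in both" quadrant are the ones most likely to be poor performers regardless of sample, since high CV in only one sample can simply reflect low signal in that sample.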
| Discussion |
|---|
Chip-to-chip (residual) variation was the largest component of overall variation in all experiments, suggesting that controllable factors such as reagent lots, analyst, and fluidic station play a limited role in the precision of this technique. This is promising from the standpoint of clinical application of Affymetrix GeneChip arrays, because it suggests that assays could be performed in a clinical laboratory context without seriously compromising precision. Although the precise cause of the underlying chip-to-chip variation is unknown, possibilities include subtle differences in within-lot chip manufacturing, chip-to-chip surface variability in hybridization conditions, or, most likely, a composite of multiple factors. Among the controllable factors, chip lot and cRNA cleanup kits were the most notable, suggesting that efforts to better standardize these steps would yield the most return in terms of improved precision.
By the nature of this experimental design, the CVs generated here represent an upper estimate of overall variability because no acceptance/rejection criteria were applied to chip output. The elimination of chips by using hybridization QC metrics8 would likely reduce the CVs for many probesets. We intentionally did not include such manipulations in our study because the precise criteria used can vary from institution to institution, and because we wanted to capture an estimate of the maximal variability of the assay over time. However, adoption of standardized criteria could potentially be useful in this regard.
Data generated using the paradigm demonstrated in this study can be applied in a number of ways. One use would be to establish QC material for potential clinical applications. Probeset CVs could be used to derive control ranges for a standard RNA sample, which could then be run as a QC material in production runs. These control ranges could be used to set acceptance/rejection criteria for each run, based either on whole-chip analysis or on a limited subset of probesets. This would be most applicable for uses such as tumor profiling, in which the QC criteria could be limited to those probesets that make up a tumor signature. Because CVs can vary substantially between probesets and within probesets depending on signal intensity, it will be important to empirically determine appropriate CV ranges in an assay-specific manner using a control material that represents the appropriate probesets for the target application.
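One way such control ranges might be derived is sketched below: each probeset's range is taken as its mean signal plus or minus a multiple of its SD, with the SD recovered from the CV. The multiplier of 2, the function names, the probeset identifiers, and all numeric values are assumptions made for illustration. The text does not prescribe a specific rule.

```python
def control_range(mean_signal, cv_percent, k=2.0):
    """Return (low, high) limits as mean +/- k * SD, where SD = mean * CV / 100.

    The multiplier k=2 is an assumed convention, not a value from the text.
    The lower limit is clipped at zero because signals cannot be negative.
    """
    sd = mean_signal * cv_percent / 100.0
    return max(0.0, mean_signal - k * sd), mean_signal + k * sd


def out_of_range(run_signals, ranges):
    """List the probesets in a QC run whose signal falls outside its range."""
    flagged = []
    for probeset, signal in run_signals.items():
        low, high = ranges[probeset]
        if not low <= signal <= high:
            flagged.append(probeset)
    return flagged


# Hypothetical signature probesets: (mean signal, CV in percent).
precision = {"probeset_A": (1000.0, 20.0), "probeset_B": (100.0, 60.0)}
ranges = {name: control_range(m, cv) for name, (m, cv) in precision.items()}

# Hypothetical signals from one QC run of the standard RNA sample.
flagged = out_of_range({"probeset_A": 1500.0, "probeset_B": 150.0}, ranges)
```

A run-level acceptance rule could then be built on top of this, for example by limiting the check to the probesets in a tumor signature, as the text suggests.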
Another potential application of the approach demonstrated here is to identify probesets with consistently poor precision so that inclusion of these probesets in diagnostic signatures or in QC metrics can be avoided. One limitation of this study is that not all probesets represented on the chips could be examined because many RNAs were not expressed in the tissues tested. Fifty-three percent of the U133A probesets were present in one of the two samples tested. In addition, because CV is related to signal intensity, targets that happened to have low expression levels in the two RNAs tested could appear to have an unrealistically high CV in our study that might be improved at higher target levels. This is a natural limitation of using cellular RNA as a test material because no sample will express all genes represented on the chip. Once again, this demonstrates the need to tailor potential QC materials to the assay in question so that all of the relevant probesets are adequately represented in the material.
In conclusion, we have described a paradigm for precision profiling of Affymetrix microarrays that could be used as part of a validation protocol for clinical applications of this technology. The ultimate goal with any test is to eliminate as much analytical variability as possible, thereby reducing the contribution of nonbiological effects to the reported result. The results reported here give an estimate of the magnitude of these analytical effects, and serve as a marker against which process modifications can be compared. The variability measured in these experiments is probably an upper limit, as no acceptance/rejection criteria were applied to these samples and all data were included in the final analysis. Analytical precision could potentially be improved by such an approach, either by using standard Affymetrix chip-based parameters such as background and 5':3' ratios or by incorporating control materials in the analytical process, using limits derived from experiments such as these. Defining appropriate control strategies will be important for the widespread dissemination of this platform to support clinical applications of this technology.
| Footnotes |
|---|
Accepted for publication April 1, 2005.
| References |
|---|