Special Article |


From the Department of Pathology, Memorial Sloan-Kettering Cancer Center, New York, New York; the Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, Nebraska; and the Childrens Hospital of Los Angeles and Keck School of Medicine, University of Southern California, Los Angeles, California
| Introduction: Molecular Classification of Tumors by Analysis of Gene Expression Profiles |
|---|
Expression profiling is a new investigative and diagnostic modality made possible by recent scientific and technological advances (for recent reviews, see references 1, 2). The key scientific advance is the Human Genome Project, in particular its so-called "expressed sequence tag" (EST) component, which focused on collecting the sequences (in most cases only partial, i.e., "sequence tags") of all of the genes (both known and unknown) expressed in a wide variety of human tissues. By now, the majority of named and anonymous human genes are represented in the GenBank and EST sequence databases, respectively. The key technological advance has been the miniaturization and automation of molecular biology, specifically the advent of DNA microarray technology. DNA microarrays can be made on nylon membranes, glass slides, or synthetic "chips," and can consist of either cDNA clones or cDNA clone-specific oligonucleotides. The number of DNA spots on a microarray can range from the hundreds to the tens of thousands.
Expression profiling refers to the process of establishing the pattern of expression of thousands of individual genes simultaneously in a given cell or tissue sample by extracting its RNA, converting it to cDNA, and hybridizing the labeled cDNA to a DNA microarray. In a sense, pathologists already use gene expression studies diagnostically on a daily basis in the form of immunohistochemistry. Expression profiling provides a supraexponential increase in the amount of gene expression data available on a given tumor, but lacks the topographical information of immunohistochemistry. Nonetheless, it is widely believed that the latter shortcoming will be more than compensated for by the sheer multiplicity of the gene expression data. It is already apparent that expression profiling can be used to distinguish different human tumors. With fascinating exceptions,3 some of the differential diagnoses achieved so far by this approach have been rather straightforward by the standards of histopathology.4, 5, 6, 7, 8 These types of microarray-based "assays" have not yet been subjected to a rigorous evaluation of sensitivity, specificity, and cost-effectiveness as a diagnostic modality. However, ongoing developments such as the rapid and continual scientific refinement of expression profiling, the possibility of mass production of smaller, more narrowly targeted microarrays, and the potential for automated software-based recognition of specific gene expression patterns promise to address many of these concerns and present limitations.
Aside from its diagnostic applications, expression profiling can also provide important prognostic information.6 It is evident that morphology alone, while quite useful for the diagnostic classification of tumors, does not reflect the entire biology of a given tumor. Tumors of similar appearance can behave differently and expression profiling is expected to predict tumor behavior and response to chemotherapy more reliably than tumor histopathology.
In recent years, the impact of diagnostic molecular genetic markers, typically based on the detection of specific translocations, has been felt most acutely in hematopathology and sarcoma pathology, where the concept of morphological/genetic entities has become well accepted. Expression profiling promises to extend this process to all tumors, including carcinomas. A key difference is that, as a more "global" approach, expression profiling may require less prior expert histopathological "triage" than any previous ancillary diagnostic technique. Will expression profiling evolve into another "ancillary" technique, or will it come to be seen as an alternative to expert surgical pathology? At present, expression profiling studies are highly dependent on histopathology for validation,1, 9 but it is conceivable that expression profiles may eventually become "disconnected" from the underlying morphology, assuming the role of clinically relevant entities. The extent of involvement of the pathology community in the further development of microarray-based expression profiling as a diagnostic test may determine the nature of its ultimate relationship to conventional tissue-based diagnosis.9
| Molecular Classification of Lymphomas |
|---|
Neoplastic transformation is generally initiated by a genetic lesion due to an error occurring during normal cell function or due to unrepaired genotoxic damage.10 This initial event provides an increased chance for other genetic lesions to develop, usually over a number of years. When the "proper combination" of genetic lesions has accumulated in a cell, it will then have the full potential to generate a malignant tumor. As the neoplastic cells continue to divide and expand, more genetic lesions may be acquired and some of these may contribute to enhanced growth rate, independence of growth signals, resistance to death inducing signals and other features that make the tumor more clinically "aggressive" and/or resistant to treatment.
We can postulate that the characteristics of a tumor and its clinical behavior are determined by the unique set of genetic lesions harbored by the tumor cells. The genetic lesions in a neoplastic cell will alter the pattern of mRNA expression in the cell and this pattern can be regarded as the "molecular signature or fingerprint" of the tumor. Tumors with closely related genetic lesions will have very similar "signatures" and also will be expected to have similar clinical behaviors. It is, therefore, logical to make the following assumptions: 1) gene expression profiling will help us accurately classify lymphomas, 2) gene expression profiles will be helpful in predicting the clinical behavior of lymphomas, and 3) it is possible from studying gene expression profiles to identify genes that are important determinants of the behavior of lymphomas.
There are over 40,000 known sequences (including ESTs) of mRNAs, which constitute a substantial fraction of all possible mRNAs. Technical advances have allowed investigators to put 10,000 or more of these known gene sequences on a solid support (microarray).11, 12 These microarrays can then be used to determine the concentration of mRNA species in a tumor extract. The large amounts of data generated from these experiments (5,000 to 10,000 data points for each tumor sample, which can add up to several million pieces of information in a study) also require the development of sophisticated methods of data management and analysis.3, 5, 13
A multi-institutional, multinational study has been initiated to obtain the gene expression profiles of five major types of B-cell lymphoma. All of the cases entered into this study must have adequate stored frozen tumor tissue as well as adequate clinical data and follow-up information. About 20 to 30% of the cases also have cytogenetic data. The study began with diffuse large B-cell lymphoma (DLBCL) because it is one of the most common types of lymphoma, which means that we will have enough cases to arrive at statistically significant conclusions at the end of the study. Dr. Lou Staudt at the National Cancer Institute has designed a microarray, the "Lymphochip", that is especially suitable for the analysis of B-cell lymphomas, and a preliminary study of 42 cases of DLBCL from the University of Nebraska and Stanford University has shown that it is feasible to determine the gene expression profiles of archival frozen tissue collected in a clinical setting.6 These cases were divided into at least two subgroups according to their gene expression profiles. One group had a gene expression profile reminiscent of that of normal germinal center cells, while the profile of the other group resembled that seen in peripheral blood B lymphocytes activated by a number of mitogenic stimuli in vitro. Analysis of the clinical behavior of these two groups of patients suggested that patients whose tumors expressed the germinal center cell profile had significantly better survival, and this seemed to apply even to patients with low clinical risk factors. The preliminary data are very promising, and we are following up with a much larger study (approximately 300 samples have now been collected for analysis) to confirm and extend our current findings. An initial analysis of the first 100 cases in this new study confirms the validity of subdividing DLBCL using gene expression profiling.
While gene expression profiling is a very powerful technique, it is essential to have the appropriate corresponding clinical data, and preferably also cytogenetic data, to interpret the results of gene profiling experiments optimally. The latter point is illustrated by our recent FISH analysis of t(14;18) in the DLBCL cases from the microarray study.14 All seven positive cases were in the GC-B cell group, and six of them were tightly clustered, indicating that they have a very similar gene expression pattern. It is likely that other significant genetic alterations with unique gene expression profiles will be discovered in the future.
We anticipate the identification of key genetic components that determine the characteristics of a lymphoma and its clinical behavior in the next five to ten years. It is possible to use this information to design a simpler and less expensive microarray, perhaps containing several hundred genes instead of 10,000 or more genes, for diagnostic use. This "diagnostic chip" could provide rapid molecular characterization of every B-cell lymphoma at presentation for optimal treatment decisions and prognostication. Alternatively, a panel of monoclonal antibodies may be designed based on the corresponding protein expression data to serve the same purpose. We also anticipate the identification of new and significant genetic lesions that will lead to better understanding of the key events in the pathogenesis of lymphoma and in tumor progression. The insight gained may help to identify novel targets for preventive and therapeutic intervention.
| Molecular Classification of Sarcomas: Gene Expression Profiling of Childhood Sarcomas |
|---|
The Genetic Basis of Cancer and DNA Microarrays
Cancer is by definition a genetic disease. The genetic defects affect many genes, by mutation, deletion, amplification, and probably most especially dysregulation: e.g., inappropriate expression of structurally normal or abnormal genes. In aggregate, this results in a host of gene expression abnormalities detectable by DNA microarrays and other complementary technologies.15, 16 DNA microarrays in particular have become a preferred method of simultaneously analyzing thousands of expressed genes. Even the absence of expression can be useful information. The thought is that the aggregate body of information, including semiquantitative expression levels, will yield an expression profile largely unique to a given tumor, but comparable to other similar tumors, leading to a molecular classification of tumors.
Although many authors are using this technology to analyze many of the common forms of cancer,5, 7, 17 we have chosen to focus on sarcomas of childhood and adolescence, for several reasons: 1) they routinely express unique chimeric genes that are pathogenic in many cases for a class of tumors, 2) survival among patients with seemingly comparable tumors varies widely, even when treated on the same regimen, suggesting the existence of inherently aggressive and non-aggressive forms of the disease not detectable by conventional means, and 3) growth and progression in these tumors is thought to be driven by genes such as growth factors and their receptors that might represent feasible targets for future therapeutic drug development.
DNA Microarray Technology
There are currently three major forms of high-density microarrays capable of analyzing as many as 12,000 genes at a time.18, 19, 20 Perhaps the most common are cDNA spotted arrays, typically on standard 1 × 3 inch glass microscope slides. These require competitive hybridization and are usually produced by the investigator or a core facility. Problems with the identity of the cDNA clones have plagued this method, but many labs and commercial suppliers claim to have rectified this problem. A derivative technique with the potential to ensure accurate identity of the genes in question is the spotted oligomeric DNA microarray. Several labs and at least one commercial firm use this approach. The oligos are usually about 50 bases in length, and multiple representation along the length of a gene can be incorporated. Cost and performance issues have to date limited the use of this approach, but the method does address several of the issues noted above with cDNA arrays. The third form of microarray is commercially produced by Affymetrix (Santa Clara, CA). In this method, 25-base oligos are synthesized in situ, with each gene generally represented by at least 20 oligos distributed along the 5' to 3' length of the gene. Each is paired with a single-base mismatch oligo. The differential hybridization of each pair to fragmented, labeled cRNA is then taken as the specific hybridization signal, and the mean of all such pairs yields the gross expression value. Unlike spotted arrays, no competitive "normal" sample is used; instead, control cRNAs are spiked into the mixture and used as internal hybridization controls.
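The perfect-match/mismatch calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as stated in the text (the mean of the per-pair differences); the function name `average_difference` and the intensity values are hypothetical, and the vendor software may apply additional filtering that is not modeled here.

```python
from statistics import mean

def average_difference(perfect_match, mismatch):
    """Mean of (perfect match - mismatch) intensity over all probe pairs of one gene."""
    if len(perfect_match) != len(mismatch) or not perfect_match:
        raise ValueError("need equal-length, non-empty intensity lists")
    return mean(pm - mm for pm, mm in zip(perfect_match, mismatch))

# Hypothetical intensities for four probe pairs of a single gene
# (the text describes at least 20 pairs per gene on a real chip).
pm_intensities = [1200.0, 950.0, 1100.0, 1010.0]
mm_intensities = [300.0, 250.0, 400.0, 310.0]

signal = average_difference(pm_intensities, mm_intensities)
print(signal)
```

A gene whose mismatch probes hybridize as strongly as its perfect-match probes yields a value near zero, which is how nonspecific signal is discounted.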
Sample Preparation
A frequently overlooked issue in microarray analysis is the quality and identity of the starting material.21 We address this issue by freezing all tumor samples in OCT, cutting pilot frozen sections, reviewing an H&E stain thereof, and trimming unsuitable tissue from the block. Only representative, viable tumor tissue is subsequently processed. Typically, a dozen frozen sections are sufficient to generate the requisite 5 µg of total RNA needed for a chip. This RNA is used to generate cDNA by reverse transcriptase, then cRNA using RNA polymerase II. In this step, biotinylated, fluorescently labeled nucleotides are incorporated into the cRNA. This labeled mixture is then fragmented and hybridized with the GeneChip. Amplification of the signal is achieved using fluorescently labeled avidin. Arrays are read in a confocal laser scanner and the data analyzed by a software package (MicroArray Suite) supplied by Affymetrix.
Data Analysis
The data from a single 12,000-gene Affymetrix GeneChip occupy 87 megabytes of storage and require significant manipulation before interpretation. Normalization of all values to a mean of 1000 is commonly used. Data cleanup is an important first step, and various algorithms to reduce noise, increase sensitivity, and manage artifacts of data set transformation are often used to improve the reproducibility and reliability of the expression data.21 Additional considerations include handling of "bad" data points, incorporation of covariate data such as pathological diagnosis and clinical status, and avoiding overfitting of the data due to large numbers of gene expression values from a limited number of samples.
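As a minimal sketch of the normalization step mentioned above, the following Python snippet rescales one chip's expression values so that their mean equals 1000. It assumes a plain arithmetic mean and a single global scaling factor; the function name `scale_to_target_mean` and the raw values are hypothetical, and production software may use a trimmed mean or other refinements.

```python
def scale_to_target_mean(values, target=1000.0):
    """Multiply every value by one factor so that the chip mean equals `target`."""
    current_mean = sum(values) / len(values)
    if current_mean == 0:
        raise ValueError("cannot rescale a chip whose mean intensity is zero")
    factor = target / current_mean
    return [value * factor for value in values]

# Hypothetical raw expression values for four genes on one chip (mean = 500).
raw_values = [100.0, 200.0, 300.0, 1400.0]
scaled_values = scale_to_target_mean(raw_values)
print(scaled_values)
```

Because every chip is brought to the same mean, values from different hybridizations become comparable while the relative ranking of genes within a chip is unchanged.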
Once reasonable datasets have been created, the expression values for all 12,000 genes, or for a subset of "present" (i.e., expressed) genes, may be used. These values are typically exported into one of a variety of software analysis packages. A large variety of commercial, shareware, and free analytic packages is now available.20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43 Typical analyses include scatter analyses, whereby two samples or sample sets are compared pairwise to identify genes deviating from an ideal regression line by two-fold or more, and various forms of clustering, from simple hierarchical phylogeny trees (or dendrograms) to more sophisticated unsupervised learning methods such as k-means clustering, self-organizing maps, and a variety of related clustering methods. Increasingly, methods such as relational networks and imputed pathways, to name only two, are appearing, designed to relate groups of seemingly related genes to discrete functional groups or pathways.
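The pairwise scatter analysis can be illustrated with a short Python sketch. It assumes that both samples have already been normalized to the same scale, so that the ideal regression line is simply y = x, and it flags genes whose ratio between the two samples is two-fold or more in either direction. The function name `two_fold_outliers`, the `floor` parameter used to avoid dividing by very small values, and the gene names and values are all hypothetical.

```python
import math

def two_fold_outliers(sample_a, sample_b, floor=1.0):
    """Return {gene: log2 ratio} for genes differing two-fold or more between samples."""
    flagged = {}
    for gene in sample_a.keys() & sample_b.keys():
        # Clamp very low values so the ratio is defined and not dominated by noise.
        a = max(sample_a[gene], floor)
        b = max(sample_b[gene], floor)
        log_ratio = math.log2(b / a)
        if abs(log_ratio) >= 1.0:  # |log2 ratio| >= 1 means a two-fold change or more
            flagged[gene] = log_ratio
    return flagged

# Hypothetical normalized expression values for two tumor samples.
tumor_a = {"gene_1": 1000.0, "gene_2": 400.0, "gene_3": 800.0, "gene_4": 50.0}
tumor_b = {"gene_1": 1100.0, "gene_2": 1600.0, "gene_3": 200.0, "gene_4": 60.0}

hits = two_fold_outliers(tumor_a, tumor_b)
print(sorted(hits.items()))
```

Working on the log2 scale treats a four-fold increase and a four-fold decrease symmetrically (+2 and -2), which is why the threshold is applied to the absolute value.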
Use in Cancer Studies
Diagnosis. In our studies to date, we have found expression profiles to be powerful tools to separate different diagnostic groups of tumors. In one pilot study, simple hierarchical clustering readily separated all but two of 23 "small round cell tumors" of childhood. Further, more sensitive clustering methods readily distinguish subcategories such as embryonal vs. alveolar rhabdomyosarcoma (RMS), and even distinguish subsets of alveolar RMS with a Pax3/FKHR, Pax7/FKHR, or no translocation from one another. It appears that gene expression profiles are powerful diagnostic tools whose full potential is only beginning to be realized.
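To make the idea of simple hierarchical clustering concrete, here is a small pure-Python sketch of agglomerative clustering. It is not the software used in the study: the choice of average linkage, the use of 1 minus the Pearson correlation as the distance, the function names, and the four toy expression profiles are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_distance(x, y):
    """Distance between two expression profiles: 1 - Pearson correlation."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (sx * sy)

def agglomerate(profiles, n_clusters):
    """Average-linkage agglomerative clustering, stopped at n_clusters groups."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = mean(pearson_distance(profiles[a], profiles[b])
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]

# Hypothetical expression profiles (five genes) for four tumor samples.
profiles = {
    "tumor_A1": [9.0, 8.0, 1.0, 2.0, 1.0],
    "tumor_A2": [8.0, 9.0, 2.0, 1.0, 2.0],
    "tumor_B1": [1.0, 2.0, 9.0, 8.0, 7.0],
    "tumor_B2": [2.0, 1.0, 8.0, 9.0, 8.0],
}

groups = agglomerate(profiles, n_clusters=2)
print(groups)
```

Recording the order of the merges instead of stopping at a fixed number of groups would give the dendrogram that such analyses usually display.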
Prognosis. Even more intriguing is the ability of gene expression profiles to detect different prognostic groups. In a pilot study of 45 cases of mixed favorable (i.e., responsive to chemotherapy) and unfavorable (i.e., unresponsive to therapy) osteosarcomas admixed with comparably favorable (e.g., embryonal) and unfavorable (e.g., alveolar) rhabdomyosarcomas, clustering methods such as expectation maximization correctly identified the majority of patients who survived and those who did not, with P values less than 0.0001. Notably, no other existing method, such as histology or even response to therapy, was as reliable. Further, this was independent of diagnosis, suggesting a broader applicability of this approach.
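One common way to attach a P value to the agreement between cluster membership and outcome is Fisher's exact test on a 2 × 2 table. The text does not state which test was used, so the following Python sketch is an assumption offered for illustration only; the function name `fisher_exact_two_sided` and the counts are hypothetical and are not the study's data.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of x first-column cases falling in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in map(table_probability, range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))

# Hypothetical counts: rows are expression clusters, columns are survived / died.
p_value = fisher_exact_two_sided(15, 5, 4, 16)
print(p_value)
```

A small P value indicates that the split of survivors between the two clusters would be unlikely if cluster membership were unrelated to outcome.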
Therapeutic Targets. An inherent byproduct of gene expression profiling is the identification of genes seemingly intrinsically associated with the tumor and its characteristics. When, for example, metastatic (and therefore generally lethal) tumors were compared to non-metastatic and favorable tumors, several candidate genes that represent potential therapeutic targets, much like the HER2/neu receptor in breast cancer, were readily identified. This will require biology studies to validate a functional role for such genes in these tumors, but their identification frequently circumvents years of work to identify such candidates by conventional methods. This is a particularly important potential clinical use of microarray technology.
| Molecular Classification of Carcinomas |
|---|
Carcinomas are a common and complex group of malignant tumors presenting many diagnostic and therapeutic problems. The promise of high throughput parallel molecular analysis to provide a blueprint reflecting the basic biology underlying tumor phenotypes has captured the imagination of scientists and clinicians. It is expected that the modulation of gene expression will correlate with a particular tumor type and could ultimately contribute to diagnostic decisions and therapies tailored to an individual patient.44 Significant practical and theoretical issues remain, but even in these early days of large scale molecular profiling, it appears likely that a systematic approach to the study of gene expression patterns in carcinomas can make important contributions as a means to identify biologically and clinically relevant tumor types and genes.
Many studies have begun to evaluate the potential of gene expression analysis. We have used both cDNA and oligonucleotide arrays for gene expression analysis with several types of carcinomas. It is evident that there are significant technical challenges associated with expression profiling of clinical samples that must be addressed such as the extreme intrinsic biological diversity and tissue heterogeneity of tumors. Nonetheless, the results of early studies are encouraging. Unsupervised methods of analysis have shown that underlying biological distinctions (different cell and tissue types) are associated with discrete expression profiles. Using supervised methods, individual samples can be effectively classified based on gene expression profile similarities to known classes (expression profiles of cell lines resemble their tissue counterpart, tumors of common origin at least to some extent have similar profiles).7, 45, 46, 47 It is also evident that biological differences that exist even among related tumors (e.g., prostate carcinomas in an androgen-rich or androgen-deprived environment) are associated with expression profiles that diverge.48 The differentially expressed genes in this case include some that are known to be regulated by androgens and others that have been postulated to play a role in androgen independent disease. This result suggests that the gene expression differences identified in such cases may be physiologically relevant.
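Supervised classification by profile similarity can be sketched as a nearest-centroid classifier: each known class is summarized by its average profile, and a new sample is assigned to the class whose average it correlates with best. This is one simple instance of a supervised method, not necessarily the one used in the cited studies; the function names, class labels, and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)

def class_centroids(training):
    """Average the training profiles of each class, gene by gene."""
    return {label: [mean(gene_values) for gene_values in zip(*samples)]
            for label, samples in training.items()}

def classify(profile, centroids):
    """Assign the profile to the class whose centroid it correlates with best."""
    return max(centroids, key=lambda label: pearson(profile, centroids[label]))

# Hypothetical training profiles (four genes) for two known tumor classes.
training = {
    "class_A": [[9.0, 8.0, 1.0, 2.0], [8.0, 9.0, 2.0, 1.0]],
    "class_B": [[1.0, 2.0, 9.0, 8.0], [2.0, 1.0, 8.0, 9.0]],
}
centroids = class_centroids(training)

unknown_sample = [7.0, 9.0, 3.0, 1.0]
predicted = classify(unknown_sample, centroids)
print(predicted)
```

In practice the classifier would be evaluated on samples held out from training, since with thousands of genes and few samples an apparent fit on the training set can be misleading.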
It appears that the large data sets produced by comprehensive analysis of gene expression have the potential to provide novel insights into biology at the molecular level. However, it is also evident that marked heterogeneity, not yet understood, exists in the expression profiles of carcinomas. In studies to identify gene expression profiles that discriminate between traditional clinicopathologic subtypes (such as primary site), carcinomas do not always cluster as expected. This suggests that molecular structure may define previously unidentified categories of neoplastic disease for which clinical relevance has yet to be defined. Further improvements in profiling technology, analysis of large numbers of samples to assess frequency, and integration of clinical and histopathologic data are needed to define the utility and limitations of molecular characterization. Ultimately, clinical uses will be dependent on large-scale validation studies of appropriate design and a better understanding of the functional properties of individual genes.49 It is likely that molecular characterization in some form will become an increasingly useful standard clinical tool.
| Footnotes |
|---|
Supported by the following NCI Director's Challenge U01 Awards: CA84967 (W.C.C.), CA88199 (T.J.T.), and CA84999 (W.L.G.). M.L. is supported by NIH P01 CA47179 and American Cancer Society RPG99-216.
Authorship is in order of oral presentation.
Timothy Triche also represented co-investigators D. Schofield, E. Mjolsness, B. Wold, J. Buckley, K. Siegmund, M. Krailo and P. Sorensen.
Originally presented at the Inaugural Association for Molecular Pathology Companion Meeting at the United States and Canadian Academy of Pathology (USCAP) Meeting, Sunday, March 4, 2001, 7:00 to 10:00 PM, at the Atlanta Marriott Marquis Hotel, Atlanta, GA. Moderated by Marc Ladanyi, M.D.
Accepted for publication May 31, 2001.
| References |
|---|