Special Article

From the Pathogenetics Unit, Laboratory of Pathology, National Cancer Institute, Bethesda, Maryland; the Cancer Genome Anatomy Project, Office of the Director, National Cancer Institute, Bethesda, Maryland; the Laboratory of Pathology, National Cancer Institute, Bethesda, Maryland; the Departments of Pediatrics and Physiology and Biophysics, University of Iowa, Iowa City, Iowa; the Laboratory of Integrative and Medical Biophysics, National Institute of Child Health and Disease, Bethesda, Maryland; Bostwick Laboratories, Richmond, Virginia; the Laboratory of Population Genetics, National Cancer Institute, Bethesda, Maryland; the Genome Sequencing Center, Washington University, St. Louis, Missouri; the Office of the Director, National Cancer Institute, Bethesda, Maryland; the National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland; the Integrated Molecular Analysis of Genomes and their Expression Consortium, Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, California; the Urologic Oncology Branch, National Cancer Institute, Bethesda, Maryland; the Genome Sequence Centre, British Columbia Cancer Research Centre, Vancouver, British Columbia, Canada; and the Mathematical and Statistical Computing Laboratory, Center for Information Technology, National Institutes of Health, Bethesda, Maryland.
| Introduction |
|---|
| Molecular Profiling |
|---|
There are several experimental systems available for molecular profiling, including human cells in vitro and animal models that mimic human pathologies. Each of these approaches has proven valuable in past studies and holds excellent potential to produce important discoveries in expression profiling studies. However, in parallel, it is critical that patients be studied directly. Molecular profiles of human cells in vivo, as they exist in patients, may lead to unique insights that are not readily evident in laboratory-based investigations, and are the gold standard against which model systems should be compared.22 Certainly, the ability to peer directly into the molecular anatomy of normal and diseased human cells in their complex tissue milieu is a particularly exciting application of molecular profiling.
However, expression profiling of clinical samples poses significant technical challenges that must be addressed. For example, investigators are confronted with the difficulty of procuring specific microscopic cell foci from heterogeneous tissues. Moreover, high-throughput expression studies require recovery of a diverse and complex transcriptome, not a trivial task when using small numbers of cells as template. Although the approach is exciting in concept, few experimental data are yet available to support its feasibility. Therefore, a study was designed to answer two key questions. Is molecular profiling of histopathologically defined cell populations from clinical tissue specimens feasible using available technologies and methodologies? If so, what are near-term and long-term applications of global gene expression data sets from patient samples?
Feasibility: Prostate Cancer Study
Molecular profiling studies generate large data sets for analysis,
representing a significant challenge for investigators. Moreover,
clinical studies ideally include multiple samples, such that molecular
findings can be assessed for their frequency among patients and/or
correlated with particular features of a disease. Thus, integration of
clinical information, histopathology, developing technologies and
laboratory methods, and bioinformatics algorithms is essential for
profiling efforts. The present study was performed as part of the
Cancer Genome Anatomy Project (CGAP) of the National Cancer Institute
(NCI).23, 24, 25
CGAP is an interdisciplinary program that
aims to establish the information and technological tools needed to
decipher the molecular anatomy of cancer cells. All data from the
project are immediately made available to the public and can be used
without restriction.
The feasibility of molecular profiling of microdissected cell populations was assessed using cDNA library sequencing as an initial gene expression platform and prostate cancer progression as a disease for study. Sample collection, microdissection, and library production were performed at the NCI (for additional information on the technical features of the study, see "Molecular Profiling of Prostate Cancer" below). The libraries were subsequently arrayed at Lawrence Livermore National Laboratory, and selected clones were sent to the Genome Sequencing Center at Washington University. The sequence data were returned to the National Center for Biotechnology Information, where they were filtered and entered into the database of expressed sequence tags (dbEST). The flow of reagents and information essentially followed that initially designed by the Integrated Molecular Analysis of Genomes and their Expression consortium.5
Twelve microdissection-based libraries were produced from epithelial
components of radical prostatectomy or biopsy specimens, including
normal epithelium, premalignant foci, locally invasive cancer, and
metastatic cancer (see Table 1). A total of 29,183 sequences were
successfully obtained. Analysis of the
number and frequency of genes expressed showed that all of the
libraries exhibited a high level of complexity. The majority of genes
were observed only once or twice in each library, and the overall gene
diversity (number of genes identified/number of sequences analyzed)
averaged 39.1%, which compares favorably with standard libraries
derived from whole tissue specimens or cultured cells. Moreover, a wide
range of expression was seen, from genes observed at high levels
(prostate-specific antigen, β-microseminoprotein) that are known to
be abundant in prostate epithelium, to a large number of low-abundance
genes that were observed infrequently. Thus, the data clearly
demonstrate the feasibility of recovering complex transcriptomes from
microdissected cell populations, encouraging news for investigators
interested in molecular profiling studies of clinical samples.
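The gene diversity measure defined above (number of genes identified divided by number of sequences analyzed) can be illustrated with a minimal sketch. The input format (one gene label per sequenced clone) and the toy library are assumptions made for illustration; they are not data from the study, and the placeholder names `GENE_A` through `GENE_E` are hypothetical.

```python
from collections import Counter


def gene_diversity(clone_gene_ids):
    """Return (diversity, counts) for one library.

    diversity = number of distinct genes / number of sequences analyzed.
    clone_gene_ids holds one gene label per successfully sequenced clone.
    """
    n_sequences = len(clone_gene_ids)
    if n_sequences == 0:
        raise ValueError("library has no sequences")
    counts = Counter(clone_gene_ids)
    return len(counts) / n_sequences, counts


def fraction_seen_at_most(counts, k=2):
    """Fraction of distinct genes observed k or fewer times."""
    return sum(1 for c in counts.values() if c <= k) / len(counts)


# Hypothetical toy library (not study data). KLK3 and MSMB are the gene
# symbols for prostate-specific antigen and beta-microseminoprotein.
toy_library = [
    "KLK3", "KLK3", "KLK3", "MSMB", "MSMB",
    "GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E",
]
diversity, counts = gene_diversity(toy_library)
print(f"gene diversity: {diversity:.1%}")
print(f"genes seen once or twice: {fraction_seen_at_most(counts):.1%}")
```

A library dominated by a few abundant transcripts yields a low value, whereas a library in which most genes appear only once or twice yields a high one.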
Prostate-Unique Gene Expression
Comparison of the expression patterns in the prostate libraries
with all of the library sequence information in dbEST permits
identification of genes that are unique to prostate epithelium as well
as those that are expressed at significantly higher levels in
prostate than in other cell types. These genes are of biological
interest, due to their presumed specialized function in the gland, as
well as potentially useful as diagnostic or therapeutic targets. For
example, prostate-specific proteins localized to the cell surface may
serve as targets for antibody-mediated delivery of therapeutic
compounds.26
Alternatively, knowledge of the promoter
regions of prostate-unique genes could have value for virally mediated
gene therapy by restricting transcription to prostate epithelial cells.
For new serum protein markers of cancer, transcripts that are both
highly expressed in tumors and unique to prostate epithelium have the
most potential, because their gene products will be the easiest to
detect and monitor based on levels of abundance. As an example,
prostate-specific antigen is the current standard as a serum marker for
prostate cancer, and its transcript was consistently observed at high
levels in the libraries.
Integration: Genome, Expression, and Disease
Expression profiles of the prostate epithelial libraries can be
integrated with GeneMap99 to examine specific areas of the genome
implicated in cancer. For example, chromosomal arms 1q, 8q, 8p, 13q,
16q, and Xq have been identified as important in prostate tumorigenesis
based on linkage studies or chromosomal abnormalities observed in
tumors.27, 28, 29, 30, 31, 32, 33
The responsible gene at each of these
regions has yet to be identified. The standard approach to finding such
genes involves narrowing the physical size of the candidate interval
using techniques such as meiotic recombination or marker disequilibrium
in affected families, or tumor deletion/amplification in sporadic
cases.34, 35, 36
An adjunct approach is to use expression
patterns to narrow the region, ie, to prioritize the subset of genes
for analysis that map to the minimal search interval and are expressed
in the involved tissue. The MEN1 and PTEN genes
are examples of recently identified tumor suppressor genes that are
found in appropriate libraries (MEN1, NCI_CGAP_Lu5;
PTEN, NCI_CGAP_Pr3/Pr22).37, 38, 39
Integration of
cell type-specific gene expression and transcript map location is
likely to become an increasingly valuable approach for disease gene
hunting as molecular profiling databases grow and sequencing and
mapping of all human genes are completed.
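The adjunct approach described above, prioritizing genes that both map to the minimal search interval and are expressed in the involved tissue, can be sketched as a simple filter. Everything in the example is an assumption for illustration: the `MappedGene` record, the map positions, the interval bounds, and the placeholder gene names are hypothetical, and only the library identifiers (written here with underscores) echo names mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedGene:
    symbol: str
    chromosome_arm: str
    position: float        # map position in arbitrary units (hypothetical)
    libraries: frozenset   # libraries in which an EST for the gene was seen


def prioritize(genes, chromosome_arm, start, end, tissue_libraries):
    """Keep genes inside the candidate interval that were observed in at
    least one library from the involved tissue, ordered by map position."""
    tissue_libraries = set(tissue_libraries)
    hits = [
        g for g in genes
        if g.chromosome_arm == chromosome_arm
        and start <= g.position <= end
        and g.libraries & tissue_libraries
    ]
    return sorted(hits, key=lambda g: g.position)


# Hypothetical transcript map entries (not real map data).
genes = [
    MappedGene("GENE_A", "8p", 12.0, frozenset({"NCI_CGAP_Pr3"})),
    MappedGene("GENE_B", "8p", 15.5, frozenset({"NCI_CGAP_Lu5"})),
    MappedGene("GENE_C", "8p", 18.0, frozenset({"NCI_CGAP_Pr22", "NCI_CGAP_Lu5"})),
    MappedGene("GENE_D", "8p", 40.0, frozenset({"NCI_CGAP_Pr3"})),
    MappedGene("GENE_E", "13q", 14.0, frozenset({"NCI_CGAP_Pr3"})),
]
prostate_libraries = {"NCI_CGAP_Pr3", "NCI_CGAP_Pr22"}
candidates = prioritize(genes, "8p", 10.0, 20.0, prostate_libraries)
print([g.symbol for g in candidates])
```

The filter only ranks candidates for follow-up; absence of a gene from a library does not prove that it is not expressed in the tissue, particularly at the sequencing depths discussed in this article.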
cDNA Microarray-Based Profiling
Investigators using expression arrays to study prostate
tumorigenesis can prioritize the prostate epithelial unigene set for
study. This has both short-term and long-term advantages. In the near
term, a practical strategy is to use the prostate unigene set on an
expression array and focus on measuring the genes of moderate or high
abundance whose expression levels change substantially during
tumorigenesis. To facilitate these efforts the prostate expression data
were used to create a commercially available prostate cDNA expression
microarray, which includes a majority of the epithelial unigenes,
including those uniquely expressed in prostate.40
The
major long-term challenge of array-based studies will be quantitative
measurement of small expression changes, particularly for those genes
present at low levels. Refinement of experimental strategies will
likely be required, such as gene-specific primers to prepare cDNA for
analysis and careful selection of sequences used on the array to avoid
cross-hybridization. Efforts to design such custom arrays will be
facilitated by a successive reduction in the number of genes required
for analysis, beginning with prioritization of the relevant unigene set
and eventually reducing to the specific set of genes that mediate the
pathways and processes under study.
Single Nucleotide Polymorphisms (SNPs)
The genetic variation in genes that are found to be important in
prostate cancer can be determined through the Genetic Annotation
Initiative (GAI) section of the CGAP website. The GAI focuses on
identifying SNPs in genes expressed in cancers.25, 41
Analysis of the frequency and transmission of SNPs can be used for many
genetic studies, including traditional linkage mapping and dissection
of complex pathways. Gene-specific SNPs are also valuable polymorphic
markers for finely mapping regions of allelic loss in tumor loss of
heterozygosity studies. The GAI identifies candidate SNPs through an
analytical software package called SNPpipeline and then verifies
the variation by sequencing DNA from several individuals. To date, more
than 10,000 candidate SNPs have been identified. To make the
information easy to access, all SNP data are placed on an integrated
genetic/physical SNP map available through the GAI website.
Differential Gene Expression
An important use of molecular profiling data sets is to compare
and contrast the expression profiles that occur during evolution of a
disease process. Thus, we analyzed the sequence data from the normal
epithelial, premalignant, and invasive tumor libraries using a variety
of statistical methods and identified the genes that were
differentially expressed during tumor progression. The transcripts
that showed the largest change between normal and tumor cells were a
subset of mRNAs that encode ribosomal proteins. This finding is
expected in cancer cells due to their requirement for increased protein
synthesis for cell division.12
Interestingly, though,
these ribosomal protein mRNAs were not increased in libraries from
premalignant cells, which showed expression levels similar to those of
normal epithelium. This finding is at odds with most current thinking, which
presumes that premalignant foci develop due to a marked increase in
growth rate, with subsequent transition to cancer primarily involving
acquisition of an invasive phenotype. Based on the present gene
expression data set, one can propose two alternative hypotheses for
testing. First, premalignant cells do not proliferate at a rate near
that of invasive tumor cells, and fundamental alterations in oncogene
and/or tumor suppressor gene pathways that substantially increase the
rate of cell division are still required for their progression to
cancer. Second, a decreased rate of apoptosis is an important early
event in prostate tumor progression; ie, it is a decreased rate of cell
death, as opposed to an increase in cell division, that mediates the
development of premalignant foci.
In addition to expected findings such as increased ribosomal protein
transcripts in cancer, several unanticipated discoveries were made,
including both quantitative and qualitative alterations in gene
expression. For example, the transcript for T cell receptor γ was
found in normal and cancerous prostate epithelium, and observed at
statistically elevated levels in cancer libraries. The presence of T
cell receptor γ mRNA in prostate epithelium and the high level of
expression in tumor cells are both surprising and puzzling. A second
example was detection of a novel splice variant of PB39 transcript in a
library derived from premalignant cells. PB39 mRNA was previously
reported to be overexpressed in prostate cancer, but was not known to
exist in an alternative splice form.42
Interestingly,
based on a search of all cDNA libraries and sequences in dbEST the
novel splice variant is primarily expressed in fetal tissues and tumors
and thus may be associated with the loss of cellular differentiation
that occurs during prostate tumor progression.43
Additionally, computer-based analysis of the predicted amino acid
sequence of PB39 with PHDhtm and SignalP indicates that the N-terminus
contains a signal peptide characteristic of a secreted protein. Thus, the
protein product of the alternative splice form could potentially serve
as a serum marker of early prostate cancer development.
Certainly, the significance of ribosomal protein mRNAs,
T cell receptor γ mRNA, and PB39 splice variant mRNA
in prostate tumors and premalignant lesions remains to be determined in
follow-up studies. However, the larger implication of these findings is
immediately clear. There is much yet to be learned with respect to gene
expression profiles in complex human tissues. Thus, exploratory studies
using developing expression technologies and the information provided
by the Human Genome Project are likely to have a unique and important
role in the study of normal cell physiology and the development of
diseases.17
In this regard, the present study is
encouraging and indicates molecular profiling of clinical tissue
specimens is a feasible and promising experimental approach.
| Molecular Profiling of Prostate Cancer |
|---|
Tissue Acquisition
The goal of molecular profiling of human tissue specimens is to
measure global gene expression levels as they exist in cells in
patients. In the present study the libraries were created from tissues
that had been surgically removed; thus, it is possible that alterations
in gene expression profiles occurred during or after the resection, eg,
transcription of new genes due to environmental stress or loss of
transcripts during tissue handling. This is an important issue that
needs to be addressed experimentally in the future by comparing
molecular profiles of needle biopsy samples (immediate removal and
freezing) with surgically resected samples of the same tissue type. If
molecular alterations are shown to occur in surgical specimens, then
two potential scenarios arise that will affect how samples should be
acquired for future molecular profiling studies. In the first scenario,
the induced changes are minimal and occur reproducibly, and thus can be
predicted and factored into subsequent data analyses. In this case
surgically resected samples will be useful templates for study as long
as they are appropriately acquired and processed. In the second, the
induced changes are substantial and cannot reliably be predicted. In
this case, future molecular profiling efforts will need to use biopsy
or cytology samples as templates, and/or will need to be performed like
intraoperative diagnostic frozen section analysis; ie, at the outset of
the operation the surgeon will need to procure and immediately freeze
several small tissue samples for molecular profiling studies.
Microdissection
Cells were procured by either manual microdissection or the
initial prototype laser capture microdissection
instrument.44, 45
Based on careful histopathological review
of the tissue sections, it is estimated each sample contained >90% of
desired cells. Newer laser-based dissection systems and associated
methodologies currently allow for dissections approaching 100%
purity.46, 47, 48
Following are some technical observations
made during the course of the study. Rapid dehydration of cryostat
sections is important to inactivate endogenous RNases. Staining with
hematoxylin and eosin allows microscopic visualization during
microdissection and does not significantly diminish mRNA recovery.
Approximately 5000 microdissected cells are required to produce a
library with acceptable numbers of recombinants (>100,000) and gene
diversity (>20%).
Library Preparation and Characteristics
Detailed protocols for all of the 156 CGAP libraries are indicated
on the web page. Each of the 12 prostate libraries in the present study
was made using microdissection library protocol no.
1.49, 50
Evaluation of the library sequence data showed two
important characteristics that impact on the overall utility of a
microdissection-based approach. First, the clone insert size averaged
only 500 to 600 bp in length due to the fragmented mRNA recovered from
tissue samples. Technical attempts to increase the insert size were not
considered a high priority, because the libraries were intended solely
for gene profiling and not as templates for full-length gene cloning.
Second, the number of recombinants ranged from approximately 100,000 to
200,000 per library, substantially less than in traditional libraries.
Additional PCR cycles of cDNA could increase the number of recombinants
significantly; however, because the libraries were prepared for
expressed sequence tag (EST) analysis as opposed to traditional
screening, the number of PCR cycles was limited to 10 to minimize
amplification bias.
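The reasoning behind limiting the number of PCR cycles can be illustrated with the standard exponential model of amplification, in which each cycle multiplies a template by (1 + efficiency). The sketch below is a simplified textbook model, not a description of the library protocol, and the per-transcript efficiencies are hypothetical values chosen only to show how a small difference compounds with cycle number.

```python
def fold_amplification(efficiency, cycles):
    """Fold increase after `cycles` rounds of PCR for a template amplified
    with per-cycle `efficiency` (0 = no amplification, 1 = perfect doubling)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return (1.0 + efficiency) ** cycles


def relative_bias(efficiency_a, efficiency_b, cycles):
    """Factor by which the ratio of transcript A to transcript B is
    distorted after amplification; 1.0 means the ratio is preserved."""
    return fold_amplification(efficiency_a, cycles) / fold_amplification(efficiency_b, cycles)


# Hypothetical efficiencies for two transcripts (assumed, not measured).
for n_cycles in (10, 20, 30):
    bias = relative_bias(0.95, 0.85, n_cycles)
    print(f"{n_cycles} cycles: A/B ratio distorted {bias:.2f}-fold")
```

Under this model the distortion grows geometrically with cycle number, which is consistent with trading a lower number of recombinants for a transcript population closer to native abundance.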
Assessment of Library Quality
Measurement of one or a few genes from small numbers of cells
using RT-PCR is relatively straightforward to perform. However, global
expression profiling studies are significantly more challenging,
because the recovered mRNA and subsequent cDNA must contain a complex
set of genes reflective of the native abundance of the transcript
population. Gene diversity (number of genes observed/number of
sequences) was used as the measure of cDNA library quality and was
determined by sequencing a minimum of 500 randomly selected clones per
library. This sample size was sufficient to provide a statistically
reliable and rigorous measure of library complexity. Additionally, the
expression frequency of
all individual genes observed was calculated to determine relative
levels of abundance.
Informatics Analysis
Several analysis tools and all of the present prostate data
are provided on the CGAP website (www.ncbi.nlm.nih.gov/ncicgap/)
to allow statistical comparison of gene expression profiles in the
libraries. For additional information, relevant website links
include:
NCBI, www.ncbi.nlm.nih.gov/
Unigene, www.ncbi.nlm.nih.gov/UniGene/index.html
Library Browser, www.ncbi.nlm.nih.gov/UniGene/lbrowse.cgi?ORG=Hs
GeneMap99, www.ncbi.nlm.nih.gov/genemap/
CGAP GAI, www.lpg.nci.nih.gov/GAI/
dbEST, www.ncbi.nlm.nih.gov/dbEST/index.html
Genes and Diseases, www.ncbi.nlm.nih.gov/disease/
CGAP Update, www.nih.gov/news/pr/aug99/nci-10a.htm
The dbEST and Unigene sites are continually updated. Investigators should query the data sets using the latest Unigene build for the most up-to-date information. As with all projects using EST data, one must use caution in interpreting results, and candidate genes of interest should be subjected to rigorous follow-up analysis.
Prostate-Unique and Prostate-Specific Genes
The CGAP website tools can generate
two different classifications. Prostate-unique genes are those
that have been observed only in libraries derived from prostate and are
precalculated on the website (query "prostate" under the "Summary
Tables of Libraries, Genes and Sequences" section). Prostate-specific
genes include those expressed at statistically elevated levels in
prostate epithelial libraries compared to libraries from other cell and
tissue types. These can be determined using the Digital Differential
Display tool.
(Prostate-unique genes are included in this category based solely on detection in cDNA libraries used as part of EST projects. A subset of these genes may have been observed in non-prostate tissue in other studies. It is anticipated that with additional EST sequencing some of these genes will shift to the "prostate-specific" category or will drop out of both classifications.)
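The distinction between the two classifications can be sketched with simple set logic. This is an illustration under stated assumptions, not the implementation behind the CGAP website tools: the input layout (EST counts per gene, per library), the library names, and the gene names are all hypothetical. The sketch separates prostate-unique genes from genes also seen elsewhere; deciding which of the latter are prostate-specific requires a statistical comparison of expression levels, which is not attempted here.

```python
def classify_genes(library_counts, prostate_libraries):
    """Split genes observed in prostate libraries into two groups.

    library_counts: {library_name: {gene: EST count}}
    Returns a dict with
      "unique": genes seen only in prostate libraries
      "shared": genes seen in prostate libraries and in at least one
                other library (candidates for a statistical test of
                elevated prostate expression)
    """
    prostate_libraries = set(prostate_libraries)
    in_prostate, elsewhere = set(), set()
    for library, counts in library_counts.items():
        observed = {gene for gene, count in counts.items() if count > 0}
        if library in prostate_libraries:
            in_prostate |= observed
        else:
            elsewhere |= observed
    return {"unique": in_prostate - elsewhere, "shared": in_prostate & elsewhere}


# Hypothetical counts (not study data).
toy_counts = {
    "prostate_lib_1": {"GENE_P": 4, "GENE_S": 9, "GENE_H": 2},
    "prostate_lib_2": {"GENE_P": 1, "GENE_S": 7},
    "other_lib_1": {"GENE_S": 1, "GENE_H": 3, "GENE_X": 5},
}
result = classify_genes(toy_counts, ["prostate_lib_1", "prostate_lib_2"])
print(sorted(result["unique"]), sorted(result["shared"]))
```

As the note above cautions, membership in the unique group depends entirely on which libraries have been sequenced: adding a single non-prostate library that contains `GENE_P` would move it to the shared group.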
cDNA Microarray-Based Studies
We have observed two noteworthy features of expression array
studies. First, intense artifactual hybridization signals can be
problematic for 3' cDNA clone-based arrays due to hybridization to
polyA sequences and repetitive DNA elements, ie, samples can appear to
hybridize successfully to a large number of genes on an array when in
fact the majority of signal is artifactual. Thus, one must be careful
in evaluating the apparent success of methods to prepare array samples
from small numbers of microdissected cells, and must also be cautious
in using array results to construct a unigene set of expressed genes
from a given cell type. Second, individual transcripts often
hybridize strongly to at least a few additional DNAs on arrays
besides the intended DNA and hybridize less strongly to many DNAs. This
cross-hybridization reduces the overall sensitivity of arrays to
detect expression changes, and importantly, must be carefully
considered when using gene cluster algorithms to analyze array results.
Differential Gene Expression
This effort was considered a lesser priority goal of the project,
since it was thought that relatively few statistically valid
differences in gene expression could be determined based on the amount
of sequencing planned for the study. In fact, analysis of the library
data proved this to be the case. Even with completion of nearly 30,000
total sequences, the majority of epithelial genes were not expressed at
sufficient levels or in enough libraries to permit a reliable
statistical assessment of differential expression. However, the gene
distribution profile in the libraries indicates that comparison of
expression levels of a significant fraction of the prostate epithelial
unigene set could be achieved by using substantially greater sequencing
depth.
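The dependence of statistical power on sequencing depth can be illustrated with a two-library comparison of EST counts. The text does not name the statistical methods that were used, so the choice of a two-sided Fisher's exact test here is an assumption, as are the counts and library sizes; the sketch uses only the Python standard library.

```python
from math import comb


def fisher_exact_two_sided(hits_a, size_a, hits_b, size_b):
    """Two-sided Fisher's exact test for one gene in two libraries.

    hits_a of size_a sequences in library A and hits_b of size_b sequences
    in library B were assigned to the gene. Returns the p-value under the
    hypergeometric null of equal expression frequency.
    """
    total_hits = hits_a + hits_b
    total = size_a + size_b
    denominator = comb(total, size_a)

    def probability(x):
        # Probability that exactly x of the gene's ESTs fall in library A.
        return comb(total_hits, x) * comb(total - total_hits, size_a - x) / denominator

    lowest = max(0, size_a - (total - total_hits))
    highest = min(total_hits, size_a)
    observed = probability(hits_a)
    # Sum every table that is no more likely than the observed one.
    p_value = sum(
        p for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: the same threefold difference in frequency,
# sampled at two sequencing depths.
p_shallow = fisher_exact_two_sided(2, 2500, 6, 2500)
p_deep = fisher_exact_two_sided(20, 25000, 60, 25000)
print(f"shallow: p = {p_shallow:.3g}; deep: p = {p_deep:.3g}")
```

With identical underlying frequencies, the comparison based on tenfold more sequences gives a far smaller p-value, which is the sense in which greater sequencing depth would permit assessment of a larger fraction of the unigene set.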
Initially, it was presumed that contamination by lymphocytes during the
dissection step was responsible for the presence of T cell receptor γ
transcript in the prostate libraries.51
However,
epithelial localization was confirmed by in situ
hybridization studies of tissue sections and appears selective for the
γ chain, as other components of the T cell receptor were not observed
in the prostate libraries.52
| Footnotes |
|---|
The work of C. P. was supported by the U.S. Department of Energy under contract W-7405-Eng-43.
Primary contributors to this article were M. E.-B., R. S., and D. K.
Accepted for publication February 18, 2000.
| References |
|---|