Chromatin modifications and their function
The surface of nucleosomes is studded with a multiplicity of modifications. At least eight different classes have been characterized to date and many different sites have been identified for each class. Operationally, modifications function either by disrupting chromatin contacts or by affecting the recruitment of nonhistone proteins to chromatin. Their presence on histones can dictate the higher-order chromatin structure in which DNA is packaged and can orchestrate the ordered recruitment of enzyme complexes to manipulate DNA. In this way, histone modifications have the potential to influence many fundamental biological processes, some of which may be epigenetically inherited.
Transformation of intact yeast cells treated with alkali cations
Intact yeast cells treated with alkali cations took up plasmid DNA. Li+, Cs+, Rb+, K+, and Na+ were effective in inducing competence. Conditions for the transformation of Saccharomyces cerevisiae D13-1A with plasmid YRp7 were studied in detail with CsCl. The optimum incubation time was 1 h, and the optimum cell concentration was 5 × 10^7 cells per ml. The optimum concentration of Cs+ was 1.0 M. Transformation efficiency increased with increasing concentrations of plasmid DNA.
Polyethylene glycol was absolutely required. Heat pulse and various polyamines or basic proteins stimulated the uptake of plasmid DNA. Besides circular DNA, linear plasmid DNA was also taken up by Cs+-treated yeast cells, although the uptake efficiency was considerably reduced. The transformation efficiency with Cs+ or Li+ was comparable with that of conventional protoplast methods for a plasmid containing ars1, although not for plasmids containing a 2-micron origin of replication.
Improved tools for biological sequence comparison
We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.
The FASTA program, a more sensitive derivative of the FASTP program, can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition.
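The shuffling idea behind this kind of significance test can be illustrated with a minimal Python sketch. It is not the RDF2 implementation: the function names, the window size, the number of trials, and the toy ungapped identity score are all assumptions chosen for illustration. The sketch shuffles the second sequence only within fixed-size windows, so that local composition is preserved, and reports how far the observed score lies from the shuffled scores in standard deviations.

```python
import random
import statistics


def toy_score(a, b):
    """Stand-in similarity score (an assumption, not the FASTA score):
    the best count of identical positions over all ungapped offsets."""
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i in range(max(0, offset), min(len(a), offset + len(b)))
            if a[i] == b[i - offset]
        )
        best = max(best, matches)
    return best


def windowed_shuffle(seq, window, rng):
    """Shuffle residues only within consecutive windows of the given size,
    so each window keeps its original composition."""
    out = []
    for start in range(0, len(seq), window):
        chunk = list(seq[start:start + window])
        rng.shuffle(chunk)
        out.extend(chunk)
    return "".join(out)


def shuffle_z_score(query, target, window=10, trials=200, seed=0):
    """Compare the observed score with scores against locally shuffled targets.
    Returns (observed, mean, sd, z); z is NaN if all shuffled scores are equal."""
    rng = random.Random(seed)
    observed = toy_score(query, target)
    shuffled = [
        toy_score(query, windowed_shuffle(target, window, rng))
        for _ in range(trials)
    ]
    mean = statistics.mean(shuffled)
    sd = statistics.pstdev(shuffled)
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, mean, sd, z


if __name__ == "__main__":
    # Hypothetical sequences, used only to exercise the functions.
    print(shuffle_z_score("MKTAYIAKQRQISFVK", "MKTAYIAKQRQISFVK", window=8))
```

A larger z value suggests that the observed score is unlikely to arise from local composition alone. A real analysis would replace `toy_score` with the actual alignment score being evaluated.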
The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a “graphic matrix” plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.
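Reporting every local similarity above a threshold can be sketched in a much-simplified form. The Python example below is an illustration only, not the LFASTA algorithm: it ignores scoring matrices and gaps, and the function name and the `min_len` threshold are assumptions. It scans each diagonal of the comparison matrix and reports every maximal run of identical residues whose length reaches the threshold, which is the information a simple "graphic matrix" plot would show.

```python
def diagonal_runs(a, b, min_len=4):
    """Return (start_in_a, start_in_b, length) for every maximal run of
    identical residues along a diagonal whose length is at least min_len."""
    runs = []
    # offset = i - j identifies one diagonal of the len(a) x len(b) matrix.
    for offset in range(-(len(b) - 1), len(a)):
        i = max(0, offset)
        j = i - offset
        run = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                if run >= min_len:
                    runs.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        # A run may extend to the end of the diagonal.
        if run >= min_len:
            runs.append((i - run, j - run, run))
    return runs


if __name__ == "__main__":
    # Hypothetical sequences sharing the segment "ACGTAC".
    print(diagonal_runs("XXACGTACYY", "ZACGTACZ", min_len=4))  # [(2, 1, 6)]
```

Each reported triple corresponds to one diagonal line segment in a dot-matrix plot. Lowering `min_len` reports more, and shorter, local similarities.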
Recombinant genomes which express chloramphenicol acetyltransferase in mammalian cells
We constructed a series of recombinant genomes which directed expression of the enzyme chloramphenicol acetyltransferase (CAT) in mammalian cells. The prototype recombinant in this series, pSV2-cat, consisted of the beta-lactamase gene and origin of replication from pBR322 coupled to a simian virus 40 (SV40) early transcription region into which CAT coding sequences were inserted.
Readily measured levels of CAT accumulated within 48 h after the introduction of pSV2-cat DNA into African green monkey kidney CV-1 cells. Because endogenous CAT activity is not present in CV-1 or other mammalian cells, and because rapid, sensitive assays for CAT activity are available, these recombinants provided a uniquely convenient system for monitoring the expression of foreign DNAs in tissue culture cells. To demonstrate the usefulness of this system, we constructed derivatives of pSV2-cat from which part or all of the SV40 promoter region was removed.
Deletion of one copy of the 72-base-pair repeat sequence in the SV40 promoter caused no significant decrease in CAT synthesis in monkey kidney CV-1 cells; however, an additional deletion of 50 base pairs from the second copy of the repeats reduced CAT synthesis to 11% of its level in the wild type. We also constructed a recombinant, pSV0-cat, in which the entire SV40 promoter region was removed and a unique HindIII site was substituted for the insertion of other promoter sequences.
BLAT–the BLAST-like alignment tool
Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT’s speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and needs to be computed only once for each genome assembly.
BLAT has several major stages. First, it uses the index to find regions in the genome likely to be homologous to the query sequence. Second, it performs an alignment between homologous regions. Third, it stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.
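The first stage, finding candidate homologous regions through an index of nonoverlapping K-mers, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not BLAT's implementation: the function names, the choice to scan every overlapping K-mer of the query, the grouping of hits by diagonal, and the `min_hits` parameter are simplifications introduced here, and the later alignment, stitching, and splice-site stages are omitted.

```python
from collections import defaultdict


def build_index(genome, k):
    """Record the start position of every non-overlapping k-mer in the genome."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_hits(index, query, k):
    """Look up every overlapping k-mer of the query in the index and
    return the matches as (query_pos, genome_pos) pairs."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, gpos))
    return hits


def candidate_diagonals(hits, min_hits=2):
    """Count hits per diagonal (genome_pos - query_pos) and keep the diagonals
    with at least min_hits hits as candidate homologous regions."""
    counts = defaultdict(int)
    for qpos, gpos in hits:
        counts[gpos - qpos] += 1
    return {diag: n for diag, n in counts.items() if n >= min_hits}


if __name__ == "__main__":
    # Hypothetical toy genome and query; the query occurs at genome position 4.
    genome = "GGGGACGTTGCACCCC"
    index = build_index(genome, k=4)
    hits = find_hits(index, "ACGTTGCA", k=4)
    print(hits)                       # [(0, 4), (4, 8)]
    print(candidate_diagonals(hits))  # {4: 2}
```

Because only nonoverlapping genome K-mers are stored, the index holds roughly one entry per K bases, which is what keeps it small enough to hold in memory. Requiring several hits on one diagonal trades some sensitivity for fewer spurious candidates.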
Comprehensive molecular portraits of human breast tumours
We analysed primary breast cancers by genomic DNA copy number arrays, DNA methylation, exome sequencing, messenger RNA arrays, microRNA sequencing and reverse-phase protein arrays. Our ability to integrate information across platforms provided key insights into previously defined gene expression subtypes. Combining data from five platforms demonstrated the existence of four main breast cancer classes, each of which shows significant molecular heterogeneity.
Somatic mutations in only three genes (TP53, PIK3CA and GATA3) occurred at >10% incidence across all breast cancers; however, there were numerous subtype-associated and novel gene mutations including the enrichment of specific mutations in GATA3, PIK3CA and MAP3K1 with the luminal A subtype. We identified two novel protein-expression-defined subgroups, possibly produced by stromal/microenvironmental elements, and integrated analyses identified specific signalling pathways dominant in each molecular subtype including a HER2/phosphorylated HER2/EGFR/phosphorylated EGFR signature within the HER2-enriched expression subtype.
Comparison of basal-like breast tumours with high-grade serous ovarian tumours showed many molecular commonalities, indicating a related aetiology and similar therapeutic opportunities. The biological finding of the four main breast cancer subtypes caused by different subsets of genetic and epigenetic abnormalities raises the hypothesis that much of the clinically observable plasticity and heterogeneity occurs within, and not across, these major biological subtypes of breast cancer.