===========================================
Description of Oliverarray Annotations
===========================================

Expression data presented here represent at least two hybridized slides per species, with RNA from male and female whole flies labelled with either Cy3 or Cy5.

Analysis pipeline:
-----------------

Arrays are scanned and feature reports are generated using NimbleScan. These reports are loaded into Bioconductor.

Variation by channel is minimized using loess normalization across the entire slide, using the limma package. No background subtraction is performed. Between-slide quantile normalization by platform is also performed.

Channel-specific background intensity is calculated, for each channel, as the mean intensity of probes targeting negative control elements. These are probes targeting the Arabidopsis gene AF048702, for which no transcript was added to the sample.

To determine whether signal was detected for each probe, and hence whether a putative transcript is present, median probeset intensity was compared to background. Percentile ranks of the median intensities of all probesets were calculated. Probes with a percentile rank greater than background in either channel, in each hybridization, were considered above background. Background values were in the 20-25th percentile. The maximum percentile rank in any hybridization for each probe is reported in the annotation file.

Mapping probes to assembly:
--------------------------

Arrays were designed using sequence versions available in April 2005. Probe sequences were re-mapped to the current (CAF1) sequence using standalone BLAT v.25x1. A "MapScore" was calculated for each mapping as:

    MapScore = matches - mismatches - gap bases

Since these are 50-mers, the maximum score is 50. Probes with a MapScore of at least 45 were kept. There may be multiple hits in the assembly per probe, and all hits equal to the maximum MapScore are kept. Probes with more than 10 hits were removed.
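The detection call described under "Analysis pipeline" can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the pipeline actually used, which ran in Bioconductor/limma.

The sketch makes the following assumptions:

* Intensities are already loess- and quantile-normalized.
* Percentile rank uses one common definition: the percentage of probeset medians less than or equal to a value. The exact definition behind the annotation files is not stated.
* The background is placed on that same percentile scale.
* The reported maximum is taken over both channels as well as over hybridizations.

The function names `percentile_rank`, `detection_calls` and `summarize` are hypothetical.

```python
from statistics import mean, median

# Sketch only: assumes intensities are already loess/quantile-normalized
# (done with limma in the real pipeline) and uses one common definition
# of percentile rank. Function names are hypothetical.


def percentile_rank(value, population):
    """Percentage of values in `population` that are <= `value`."""
    return 100.0 * sum(1 for x in population if x <= value) / len(population)


def detection_calls(probe_intensities, control_ids):
    """Score one channel of one hybridization.

    probe_intensities: dict probeset_id -> list of probe intensities.
    control_ids: ids of the negative-control (AF048702) probesets.
    Returns (background_rank, {probeset_id: percentile_rank}).
    """
    # Channel background: mean intensity of all negative-control probes.
    background = mean(v for pid in control_ids for v in probe_intensities[pid])
    # The median probe intensity summarizes each probeset.
    medians = {pid: median(vals) for pid, vals in probe_intensities.items()}
    population = list(medians.values())
    # Assumption: express the background on the same percentile scale.
    bg_rank = percentile_rank(background, population)
    ranks = {pid: percentile_rank(m, population) for pid, m in medians.items()}
    return bg_rank, ranks


def summarize(hybridizations, control_ids):
    """hybridizations: list of dicts mapping channel ("Cy3"/"Cy5") -> probe_intensities.

    Returns {probeset_id: (max percentile rank, [above-background call per hybridization])}.
    Assumption: the maximum is taken over both channels of every hybridization.
    """
    result = {}
    for hyb in hybridizations:
        per_channel = [detection_calls(data, control_ids) for data in hyb.values()]
        for pid in per_channel[0][1]:
            ranks = [channel_ranks[pid] for _, channel_ranks in per_channel]
            # Above background if the rank beats background in either channel.
            above = any(r > bg for (bg, _), r in zip(per_channel, ranks))
            best, calls = result.get(pid, (0.0, []))
            result[pid] = (max(best, *ranks), calls + [above])
    return result
```

Comparing ranks rather than raw intensities keeps the call independent of each channel's overall brightness.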
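The MapScore filter under "Mapping probes to assembly" can be expressed as a short Python sketch.

* The `Hit` record and its fields are hypothetical stand-ins for parsed BLAT output.
* How "gap bases" were counted from BLAT's output columns is not stated, so the sketch takes it as a precomputed input.
* The sketch assumes the more-than-10-hits rule applies to the hits left after keeping only the top-scoring ones.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one BLAT alignment of a 50-mer probe."""
    probe_id: str
    target: str
    start: int
    end: int
    matches: int
    mismatches: int
    gap_bases: int  # assumed precomputed from the BLAT output


def map_score(hit):
    """MapScore = matches - mismatches - gap bases (maximum 50 for a 50-mer)."""
    return hit.matches - hit.mismatches - hit.gap_bases


def filter_hits(hits, min_score=45, max_hits=10):
    """Return {probe_id: [best-scoring hits]} for probes that pass the filters."""
    by_probe = defaultdict(list)
    for h in hits:
        if map_score(h) >= min_score:  # keep only hits with MapScore >= 45
            by_probe[h.probe_id].append(h)
    kept = {}
    for probe_id, probe_hits in by_probe.items():
        best = max(map_score(h) for h in probe_hits)
        # Keep every hit tied with the probe's maximum MapScore.
        top = [h for h in probe_hits if map_score(h) == best]
        # Assumption: the >10-hits rule is applied to these top hits.
        if len(top) <= max_hits:
            kept[probe_id] = top
    return kept
```

Probes with no hit scoring at least 45 never enter `by_probe`, so they drop out silently, as in the description.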
File names:
------------------------------------------------------

The following files contain results for each species:

    ana.caf1.gff      D. ananassae
    moj.caf1.gff      D. mojavensis
    pse.caf1.gff      D. pseudoobscura
    pserec.caf1.gff   D. pseudoobscura (reconciled assembly)
    sim.caf1.gff      D. simulans
    vir.caf1.gff      D. virilis
    yak.caf1.gff      D. yakuba
    yakrec.caf1.gff   D. yakuba (reconciled assembly)

Field descriptions:

    seqname       The name of the reference sequence (chromosome or scaffold)
    source        The source of this annotation: "Oliverarray"
    feature       The feature type name: "mRNA"
    start, end    Coordinates of the best match for this probe
    score         Maximum percentile rank of intensity for this probe (max = 100). Higher numbers are stronger signals.
    strand        '+' or '-'
    frame         Not used: "."
    [attribute]   Probe ID, along with the score for the probe re-mapping (range 45..50)

===========================================
Design overview for multiple species arrays:
===========================================

Single-slide expression arrays: 380K-spot slides from NimbleGen. Each slide has probes for ~19K gene predictions, with 10 probes per gene in duplicate (19K x 10 x 2 = 380K). Probes are not 3'-biased; they are randomly distributed along the length of the gene.

Sources of gene predictions:

    EIS: Annotations from the Eisen lab (v.1.0, Feb. 2005). Provided by Venky Iyer.
    GID: Ab initio gene predictions on raw sequence. All gene predictions not overlapping with Eisen annotations are kept.
    GBK: Individual sequences deposited in GenBank, including ESTs. Probes were also designed based on reverse complements of ESTs.
    FBA: Annotations from FlyBase (D. pseudoobscura only, release 1.03).

Total gene predictions for each type:

              sim     yak     ana     moj     vir     pse
    EIS     13561   12675   12265    9817   11316   12395
    GID      1919    5870    8248    7540    5582    2549
    GBK       843    1273       4       0      92     150
    FBA         0       0       0       0       0     651
    total   16323   19818   20517   17357   16990   15745

Each sequence was screened for uniqueness, and probes were selected at N=10 density (density is lower for some smaller genes).
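A minimal Python reader for the annotation files listed under "File names" might look like the sketch below.

* It assumes the standard tab-separated GFF layout, with nine columns in the order given under "Field descriptions".
* The internal syntax of the attribute column is not documented here, so it is kept as a raw string.
* `ProbeFeature`, `parse_gff_line`, `read_features` and `above_threshold` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ProbeFeature(NamedTuple):
    """One annotation line (hypothetical record type)."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    max_percentile_rank: float  # the GFF score column
    strand: str
    attribute: str  # probe ID and re-mapping score, kept unparsed


def parse_gff_line(line: str) -> Optional[ProbeFeature]:
    """Parse one tab-separated GFF line; return None for blank or comment lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, _frame, attribute = fields[:9]
    return ProbeFeature(seqname, source, feature, int(start), int(end),
                        float(score), strand, attribute)


def read_features(path):
    """Yield ProbeFeature records from a file such as sim.caf1.gff."""
    with open(path) as handle:
        for line in handle:
            feature = parse_gff_line(line)
            if feature is not None:
                yield feature


def above_threshold(features, cutoff):
    """Keep features whose maximum percentile rank exceeds `cutoff`."""
    return [f for f in features if f.max_percentile_rank > cutoff]
```

For example, `above_threshold(read_features("sim.caf1.gff"), 25)` keeps probes ranked above the 25th percentile. Background fell in the 20-25th percentile, so 25 is one illustrative cutoff. It is an assumption, not a value prescribed by the authors.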
=================================================================

Updated May 24, 2006

Dave Sturgill, Yu Zhang, Michael Parisi, and Brian Oliver at the Laboratory of Cellular and Developmental Biology, NIDDK, NIH.

These results are unpublished. Please contact us at davidsturgill@niddk.nih.gov if you are interested in using these data.