medp_pilpe-20110620
-------------------

-- stats.txt: Summary statistics associated with contigs.fa,
   including the total number of sequences and bases in the contig
   set, N50, etc. Q1, Q2, and Q3 are the quartiles of the reported
   contig lengths. B1000 and B2000 indicate the percentage of bases
   contained in contigs of at least 1000 bp and 2000 bp,
   respectively.

-- contigs.fa: Contigs from the assembly, min. 100 bp. Contigs may
   include UTRs. Sequences may contain IUPAC ambiguity codes
   representing ambiguous bases; see
   http://www.bioinformatics.org/sms/iupac.html.

-- peptides.fa: Protein products predicted by ESTScan, min. 30 aa.
   These do not necessarily include an initial methionine. Sequence
   identifiers for these predicted products correspond to the
   associated nucleotide sequence in contigs.fa, with the suffixes
   #1, #2, etc. appended to accommodate multiple predictions.

-- readcounts/*.dat: Read counts, per sample, obtained by post hoc
   alignment of reads to the reported contigs using BWA, via gsnap
   with default parameters. The files are tab-delimited, with
   columns in the format

       sample  contig_id  all_aligned  unique_aligned  paired_aligned  contig_len

   where sample indicates the sample, library, tissue, etc.;
   contig_id is the contig identifier (for example,
   medp_pilpe-20110620|1234); all_aligned is the number of reads
   aligned to this contig; unique_aligned is the number of reads
   that aligned uniquely to this contig; paired_aligned is the
   number of read pairs aligned to this contig; and contig_len is
   the length of the contig in bp.

***PLEASE NOTE*** Read counts are provided for quality assessment of
the contig set only. For differential expression analyses, it is
recommended that more sophisticated estimators of relative
expression level be employed.

------------------------------------
National Center for Genome Resources
http://www.ncgr.org
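The summary values described for stats.txt (sequence and base counts, N50, quartiles Q1 to Q3, B1000 and B2000) can be recomputed from contigs.fa as a cross-check. The Python sketch below is illustrative only: it assumes the common N50 definition, Python's default statistics.quantiles method for the quartiles, and a "length >= threshold" rule for B1000/B2000. The conventions actually used to produce stats.txt are not stated here, so small differences (especially in Q1 to Q3) are possible. All function names are hypothetical.

```python
# Sketch: recompute stats.txt-style summaries from contigs.fa.
# Assumed conventions (not stated in the README): the common N50
# definition, Python's default statistics.quantiles method for Q1-Q3,
# and "length >= threshold" for B1000/B2000. Names are hypothetical.
import statistics
from typing import Dict, Iterable, Iterator, List


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each record in a FASTA stream."""
    length = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            # IUPAC ambiguity codes are counted as bases, like A/C/G/T.
            length += len(line)
    if length is not None:
        yield length


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the running total (longest first)
    first reaches half of all bases; 0 for an empty set."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def pct_bases_in_contigs_at_least(lengths: List[int], min_len: int) -> float:
    """Percentage of all bases that lie in contigs of >= min_len bp."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    return 100.0 * sum(n for n in lengths if n >= min_len) / total


def summarize(lengths: List[int]) -> Dict[str, float]:
    """stats.txt-like summary; needs at least two contigs for quartiles."""
    q1, q2, q3 = statistics.quantiles(lengths, n=4)
    return {
        "sequences": len(lengths),
        "bases": sum(lengths),
        "N50": n50(lengths),
        "Q1": q1,
        "Q2": q2,
        "Q3": q3,
        "B1000": pct_bases_in_contigs_at_least(lengths, 1000),
        "B2000": pct_bases_in_contigs_at_least(lengths, 2000),
    }
```

For example, with open("contigs.fa") as fh: print(summarize(list(fasta_lengths(fh)))). Streaming the FASTA and keeping only lengths avoids holding the sequences themselves in memory.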
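The six-column readcounts/*.dat layout described above can be loaded with a short Python sketch. It assumes each row has exactly six tab-delimited fields and that a header row may or may not be present (rows whose count columns are not integers are skipped). The helper names and the per-sample summary (total aligned reads, fraction aligned uniquely, contigs with no aligned reads) are hypothetical illustrations. Consistent with the note above, this is meant for quality assessment of the contig set, not for differential expression.

```python
# Sketch: load readcounts/*.dat files and summarize counts per sample.
# Assumptions (not stated in the README): six tab-delimited columns per
# row, no guaranteed header (rows with non-integer counts are skipped),
# and hypothetical helper names. For QA only, not expression estimates.
import glob
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class ReadCount:
    sample: str
    contig_id: str       # e.g. "medp_pilpe-20110620|1234"
    all_aligned: int     # reads aligned to this contig
    unique_aligned: int  # reads aligned uniquely to this contig
    paired_aligned: int  # read pairs aligned to this contig
    contig_len: int      # contig length in bp


def parse_readcounts(lines: Iterable[str]) -> Iterator[ReadCount]:
    """Yield one ReadCount per well-formed six-column row."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 6:
            continue  # blank or malformed line
        sample, contig_id = fields[0], fields[1]
        try:
            counts = [int(x) for x in fields[2:]]
        except ValueError:
            continue  # e.g. a header row, if one is present
        yield ReadCount(sample, contig_id, *counts)


def summarize_by_sample(records: Iterable[ReadCount]) -> Dict[str, Dict[str, float]]:
    """Per sample: contigs seen, aligned reads, unique fraction, zero-count contigs."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"contigs": 0, "all_aligned": 0, "unique_aligned": 0,
                 "zero_count_contigs": 0}
    )
    for rec in records:
        t = totals[rec.sample]
        t["contigs"] += 1
        t["all_aligned"] += rec.all_aligned
        t["unique_aligned"] += rec.unique_aligned
        if rec.all_aligned == 0:
            t["zero_count_contigs"] += 1
    for t in totals.values():
        t["unique_fraction"] = (
            t["unique_aligned"] / t["all_aligned"] if t["all_aligned"] else 0.0
        )
    return dict(totals)


def load_readcounts(pattern: str = "readcounts/*.dat") -> List[ReadCount]:
    """Parse every file matching pattern (relative to the current directory)."""
    records: List[ReadCount] = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            records.extend(parse_readcounts(fh))
    return records
```

For example, summarize_by_sample(load_readcounts()), run from the assembly directory, returns one dictionary per sample. A low unique_fraction or many zero-count contigs in a sample may be worth a closer look during quality assessment.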