=========================
Allen Human Brain Atlas
=========================

This data set contains all RNA-Seq samples for all genes in 1 brain.

RNAseqTPM.csv

    Contains TPM values arranged by (row, column) = (genes, samples).  
	These expression values are scaled fragment (read) counts derived 
	using RSEM.  The first column is the gene symbol.

RNAseqCounts.csv 

    Contains fragment counts arranged by (row, column) = (genes, samples).  
	Counts can be fractional, as ambiguous reads are distributed between
	relevant transcripts.  For present/absent calling, a value of 0 indicates
	that no transcript was seen.  The first column is the gene symbol.

Genes.csv

    Metadata for the genes in RNAseqTPM.csv, including information about
    gene identification, gene location, and gene size.	
	
	gene_symbol
		Unique identifier used by RSEM to summarize expression levels  
	
	gene_id, entrez_id
		Additional identifiers for mapping genes to other Allen Institute
		data sets.  NA indicates that this gene symbol is not used elsewhere.
	
	chromosome, strand, median_gene_start, median_gene_end
		Information about gene location in the genome.
	
	number_of_transcripts, median_transcriptome_length, median_genome_length, 
	median_number_of_exons	
		Transciptome length indicates the number of base pairs of the RNA 
		transcript (i.e., this excludes introns).  Genome length indicates the
		number of base pairs between the gene start site and the gene end site
		in the genome.  For single-exon genes, these values are the same.  For 
		genes with multiple isoforms, median values across isoforms are used. 

Ontology.csv

    The ontology on brain structures used for sampling.

SampleAnnot.csv

    The samples are listed in the same order as the columns in 
	RNAseqTPM.csv.  Information for the corresponding microarray sample 
	can be found by matching the "well_id" column in this table 
	with the "well_id" column in the corresponding microarray data table.

	RNAseq_sample_name, replicate_sample
	   unique identifier of the RNA profiled for RNA-Seq, and whether 
	   this is a replicate sample compared with microarray RNA

    sample_name, well_id
       unique identifier of the well containing the profiled RNA
	
    microarray_run_id, rnaseq_run_id
       used to assess batch effects for microarray and RNA-Seq, respectively
    
	ontology_color, main_structure, sub_structure, hemisphere, brain
	   additional biological information about the sample to supplement
	   information from corresponding microarray data table
	
	ontology_structure_id, ontology_structure_acronym
	   additional biological information for mapping to the Human Brain Atlas ontology
    
    million_clusters, A.Pct, C.Pct, G.Pct, T.Pct, N.Pct
	   information about the number of reads and the sequence composition

	clip_percentage, RIN_RNA_quality
	   measures of data quality for the sequencing run and the RNA
