HIERTAGS - Protein dataset

Protein dataset:

A collection of proteins tagged with their known function annotations, which describe molecular functions. The original data were taken from the Gene Ontology, which provides both a hierarchy of the molecular functions and a quality-controlled list of known function annotations for the proteins. We restricted the molecular functions considered to the descendants of "catalytic activity" in the hierarchy. Thus, the dataset is limited to proteins having at least one annotation among the descendants of "catalytic activity", and any tag from other parts of the hierarchy is excluded from the annotation lists.
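The restriction described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes that each hierarchy edge points from a parent tag to a child tag, that the root tag itself is not counted among its descendants, and it uses made-up identifiers (`catalytic`, `hydrolase`, `P1`, ...) in place of real Gene Ontology and protein ids.

```python
from collections import defaultdict, deque

def descendants(edges, root):
    """Return every tag reachable from root via parent -> child edges.

    Assumption: each edge is a (parent, child) pair. The root itself
    is not included in the result.
    """
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def restrict_annotations(annotations, allowed):
    """Keep only allowed tags and drop proteins left without any tag."""
    result = {}
    for protein, tags in annotations.items():
        kept = [tag for tag in tags if tag in allowed]
        if kept:
            result[protein] = kept
    return result

# Toy data with hypothetical identifiers, for illustration only.
edges = [
    ("catalytic", "hydrolase"),
    ("hydrolase", "peptidase"),
    ("binding", "ion_binding"),
]
annotations = {
    "P1": ["peptidase", "ion_binding"],
    "P2": ["ion_binding"],
    "P3": ["hydrolase"],
}

allowed = descendants(edges, "catalytic")
filtered = restrict_annotations(annotations, allowed)
```

In this toy example, `P2` is dropped because none of its tags descends from `catalytic`, and `P1` keeps only its `peptidase` tag.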

Reference

G. Tibély et al.: Extracting tag hierarchies. PLoS ONE 8(12): e84133 (2013).

Files:

File name	Description	Format	Size
List_of_proteins.zip	List of proteins, where each row corresponds to a protein	compressed plain text file; 1st column: protein id; remaining columns: function ids	31 MB
Protein_tag_hierarchy.txt	Tag hierarchy, giving the directed acyclic graph between the tags	plain text file; 1st column: source id; 2nd column: target id	52 kB
Protein_function_names.zip	Function names, giving the names of the molecular functions	compressed plain text file; 1st column: function id; remaining columns: the name	356 kB

Protein_dataset.zip	All files in the Protein dataset	zip archive	32 MB
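The column layouts listed above can be read with a few lines of Python. The following is only a sketch under stated assumptions: columns are taken to be separated by whitespace, and header lines are taken to start with `#`. Neither detail is specified here, so check the actual file headers first. The sample contents and ids are invented for illustration.

```python
import io

def data_lines(handle):
    """Yield non-empty lines, skipping header lines.

    Assumption: header lines start with '#'. Adjust this to the
    real header format of the files.
    """
    for line in handle:
        line = line.strip()
        if line and not line.startswith("#"):
            yield line

def read_proteins(handle):
    """Map each protein id (1st column) to its list of function ids."""
    proteins = {}
    for line in data_lines(handle):
        fields = line.split()
        proteins[fields[0]] = fields[1:]
    return proteins

def read_hierarchy(handle):
    """Return the hierarchy as a list of (source id, target id) pairs."""
    edges = []
    for line in data_lines(handle):
        source, target = line.split()[:2]
        edges.append((source, target))
    return edges

def read_function_names(handle):
    """Map each function id (1st column) to its name (rest of the line)."""
    names = {}
    for line in data_lines(handle):
        parts = line.split(maxsplit=1)
        names[parts[0]] = parts[1] if len(parts) > 1 else ""
    return names

# Invented sample contents standing in for the real files.
sample_proteins = "# header (assumed syntax)\nP1 10 20\nP2 20\n"
sample_hierarchy = "# header (assumed syntax)\n1 10\n10 20\n"
sample_names = "# header (assumed syntax)\n10 hydrolase activity\n"

proteins = read_proteins(io.StringIO(sample_proteins))
edges = read_hierarchy(io.StringIO(sample_hierarchy))
names = read_function_names(io.StringIO(sample_names))
```

For the real data, the same functions can be given a file opened with `open(...)` after unpacking the archives, or a member read through the standard `zipfile` module.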

Note:

The header of each file contains instructions for processing the data with the Hierarchy Extracting Algorithms.

Contact

hiertags@hal.elte.hu