Protein dataset:
A collection of proteins tagged with their known molecular function annotations. The original data were taken from the Gene Ontology, which provides both a hierarchy of the molecular functions and a quality-controlled list of known function annotations for the proteins. We restricted the molecular functions considered to the descendants of "catalytic activity" in the hierarchy. Thus, the dataset contains only proteins with at least one annotation among the descendants of "catalytic activity", and any tag from other parts of the hierarchy is excluded from the annotation lists.
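The restriction described above can be illustrated with a minimal sketch. It is not the authors' actual pipeline: the toy hierarchy, the tag and protein ids, and the function names `descendants` and `filter_annotations` are hypothetical. The sketch also assumes that edges point from parent to child and that the root tag itself is not counted among its descendants.

```python
from collections import defaultdict, deque


def descendants(edges, root):
    """Return every tag reachable from `root`.

    Assumption: each edge is a (parent, child) pair. The root itself
    is not included in the result.
    """
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)

    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def filter_annotations(annotations, allowed):
    """Keep only tags in `allowed`; drop proteins left without any tag."""
    filtered = {}
    for protein, tags in annotations.items():
        kept = [tag for tag in tags if tag in allowed]
        if kept:
            filtered[protein] = kept
    return filtered


# Toy data with hypothetical ids, for illustration only.
edges = [
    ("molecular_function", "catalytic"),
    ("molecular_function", "binding"),
    ("catalytic", "hydrolase"),
    ("catalytic", "transferase"),
    ("hydrolase", "peptidase"),
]
annotations = {
    "P1": ["hydrolase", "binding"],
    "P2": ["binding"],
    "P3": ["peptidase", "transferase"],
}

allowed = descendants(edges, "catalytic")
result = filter_annotations(annotations, allowed)
```

In this toy example, protein P2 is dropped because its only tag lies outside the "catalytic" subtree, and the "binding" tag is removed from P1.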
Reference
G. Tibély et al.: Extracting tag hierarchies. PLoS ONE 8(12): e84133 (2013).
Files:
File name: List_of_proteins.zip
Description: List of proteins, where each row corresponds to a protein
Format: plain text file; 1st column: protein id; rest of the columns: function ids
Size: 31 MB

File name: Protein_tag_hierarchy.txt
Description: Tag hierarchy, giving the directed acyclic graph between the tags
Format: plain text file; 1st column: source id; 2nd column: target id
Size: 52 kB

File name: Protein_function_names.zip
Description: Function names, giving the names of the molecular functions
Format: compressed plain text file; 1st column: function id; rest of the columns: the name
Size: 356 kB

File name: Protein_dataset.zip
Description: All files in the Protein dataset as a zip archive
Size: 32 MB
Note:
The header of each file contains instructions for processing the data with the Hierarchy Extracting Algorithms.
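As an illustration of the column layouts listed above, the following minimal sketch reads the three kinds of files. Several details are assumptions and should be checked against the actual files: columns are taken to be whitespace-separated, and header lines are taken to start with `#`. The function names and the toy ids are hypothetical.

```python
import io


def read_rows(handle, comment_prefix="#"):
    """Yield whitespace-split rows, skipping blank lines and header lines.

    Assumption: header lines start with `comment_prefix`. The header
    format is not specified in this description.
    """
    for line in handle:
        line = line.strip()
        if not line or line.startswith(comment_prefix):
            continue
        yield line.split()


def read_annotations(handle):
    """Map each protein id (1st column) to its list of function ids."""
    return {row[0]: row[1:] for row in read_rows(handle)}


def read_hierarchy(handle):
    """Return the tag hierarchy as a list of (source id, target id) pairs."""
    return [(row[0], row[1]) for row in read_rows(handle)]


def read_function_names(handle):
    """Map each function id (1st column) to its name (remaining columns)."""
    return {row[0]: " ".join(row[1:]) for row in read_rows(handle)}


# Toy in-memory files with hypothetical ids, standing in for the real data.
annotations = read_annotations(io.StringIO("# header\nP1 F1 F2\nP2 F3\n"))
hierarchy = read_hierarchy(io.StringIO("# header\nF1 F2\n"))
names = read_function_names(io.StringIO("# header\nF1 hydrolase activity\n"))
```

For the real data, the same functions could be given a handle from `open("Protein_tag_hierarchy.txt")`. The zipped files would first have to be extracted.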

Contact
hiertags@hal.elte.hu