Protein dataset:
A collection of proteins tagged with their known function annotations, which describe molecular functions. The original data were taken from the Gene Ontology, which provides both a hierarchy of the molecular functions and a quality-controlled list of known function annotations for the proteins. We restricted the molecular functions considered to the descendants of "catalytic activity" in the hierarchy. Thus, the dataset contains only proteins with at least one annotation among the descendants of "catalytic activity", and any tag from other parts of the hierarchy is excluded from the annotation lists.
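The restriction described above can be illustrated with a minimal sketch. It is not the procedure used to build the dataset, only one plausible way to reproduce the idea. It assumes that edges are given as (parent, child) pairs and that the root term itself is not counted among its descendants; the function names `descendants` and `restrict_annotations` and the toy data are hypothetical.

```python
from collections import defaultdict, deque


def descendants(edges, root):
    """Return all nodes reachable from root along parent -> child edges.

    Assumption: each edge is a (parent, child) pair. The root itself is
    not included in the result.
    """
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)

    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def restrict_annotations(annotations, allowed):
    """Keep only tags in `allowed`; drop proteins left without any tag."""
    result = {}
    for protein, tags in annotations.items():
        kept = [tag for tag in tags if tag in allowed]
        if kept:
            result[protein] = kept
    return result


# Toy data for illustration only (not taken from the dataset).
edges = [
    ("molecular_function", "catalytic activity"),
    ("molecular_function", "binding"),
    ("catalytic activity", "kinase activity"),
    ("kinase activity", "protein kinase activity"),
]
annotations = {
    "p1": ["kinase activity", "binding"],
    "p2": ["binding"],
    "p3": ["protein kinase activity"],
}

allowed = descendants(edges, "catalytic activity")
filtered = restrict_annotations(annotations, allowed)
print(filtered)
```

In this toy example, protein `p2` is dropped because its only annotation lies outside the "catalytic activity" branch, and the "binding" tag is removed from `p1`.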
Reference
G. Tibély et al:
Extracting tag hierarchies
PLoS ONE 8(12): e84133 (2013).
Files:
File name | Description | Format | Size |
---|---|---|---|
List_of_proteins.zip | List of proteins, where each row corresponds to a protein | plain text file; 1st column: protein id; remaining columns: function ids | 31 MB |
Protein_tag_hierarchy.txt | Tag hierarchy, giving the directed acyclic graph between the tags | plain text file; 1st column: source id; 2nd column: target id | 52 kB |
Protein_function_names.zip | Function names, giving the names of the molecular functions | compressed plain text file; 1st column: function id; remaining columns: the name | 356 kB |
Protein_dataset.zip | All files in the Protein dataset | zip archive | 32 MB |
Note:
Each file header contains instructions for processing the data with the Hierarchy Extracting Algorithms.
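The column layouts listed under Files can be read with a few lines of Python. The sketch below is only an illustration under stated assumptions: columns are taken to be whitespace-separated, and header lines are assumed to start with `#`. Neither detail is confirmed here, so check the actual file headers first. All function names and the sample lines are hypothetical.

```python
def read_rows(lines, comment="#"):
    """Yield the whitespace-split fields of each data line.

    Assumption: blank lines and lines starting with `comment` are header
    or filler lines and are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith(comment):
            continue
        yield line.split()


def parse_proteins(lines):
    """1st column: protein id; remaining columns: function ids."""
    return {row[0]: row[1:] for row in read_rows(lines)}


def parse_hierarchy(lines):
    """1st column: source id; 2nd column: target id."""
    return [(row[0], row[1]) for row in read_rows(lines) if len(row) >= 2]


def parse_names(lines):
    """1st column: function id; remaining columns: the name."""
    return {row[0]: " ".join(row[1:]) for row in read_rows(lines)}


# Made-up sample lines for illustration; real ids will differ.
protein_lines = ["# header line", "p1 f10 f11", "p2 f12"]
hierarchy_lines = ["# header line", "f10 f11", "f10 f12"]
name_lines = ["# header line", "f10 catalytic activity", "f11 kinase activity"]

proteins = parse_proteins(protein_lines)
hierarchy = parse_hierarchy(hierarchy_lines)
names = parse_names(name_lines)
print(proteins, hierarchy, names)
```

Each parser accepts any iterable of lines, so an open file handle for the unzipped text file (for example `open("Protein_tag_hierarchy.txt")`) can be passed directly.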