Critical Assessment of Information Extraction in Biology - data sets are available from Resources/Corpora and require registration.

BioCreative II.5

Homonym homolog mapper [2010-04-12]

This is an installable Python script (for Python versions 2.5-2.7) that extracts homonym homologs (HHs) from a UniProt HH cluster file, such as the one provided with the BC II.5 Elsevier corpus.

The script can extract HH mappings from such a cluster file in various formats, for example in the format that the BC II.5 evaluation library uses for its homonym ortholog mapping option. You can also extract mappings for any other purpose that requires a list of possible HH mappings for a given list of UniProt accessions, based on clusters of UniProt accessions and their NCBI Tax IDs.
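To make the idea of an HH mapping concrete, here is a minimal Python 3 sketch of what such an extraction involves. It assumes the clusters are already loaded in memory as lists of (UniProt accession, NCBI Tax ID) pairs; it does not parse the real cluster file format (documented in doc/html) and is not the script's own code. The names CLUSTERS, build_index and hh_mappings, the placeholder accessions, and the illustrative Tax IDs are all hypothetical. The two keyword flags loosely mirror the grouping by Tax ID and the same-Tax-ID exclusion that the script's command-line options control.

```python
from collections import defaultdict

# Hypothetical in-memory clusters (NOT read from a real cluster file):
# each cluster lists (UniProt accession, NCBI Tax ID) pairs. The accessions
# are made-up placeholders; the Tax IDs are only illustrative.
CLUSTERS = [
    [("ACC_A1", 9606), ("ACC_A2", 10090), ("ACC_A3", 10116)],
    [("ACC_B1", 9606), ("ACC_B2", 9606), ("ACC_B3", 7227)],
]


def build_index(clusters):
    """Map each accession to its (cluster index, Tax ID).

    Assumes an accession occurs in at most one cluster; if it occurred in
    several, the last occurrence would win.
    """
    index = {}
    for cluster_id, members in enumerate(clusters):
        for accession, tax_id in members:
            index[accession] = (cluster_id, tax_id)
    return index


def hh_mappings(clusters, sources, group_by_taxon=True, exclude_same_taxon=False):
    """Collect the other members of each source accession's cluster.

    With group_by_taxon=True each source maps to {Tax ID: [accessions]};
    otherwise it maps to a flat list of accessions. With
    exclude_same_taxon=True, members sharing the source's Tax ID are dropped.
    Sources found in no cluster map to an empty dict or list.
    """
    index = build_index(clusters)
    result = {}
    for source in sources:
        hits = []
        if source in index:
            cluster_id, source_tax = index[source]
            for accession, tax_id in clusters[cluster_id]:
                if accession == source:
                    continue  # a source is not its own homolog
                if exclude_same_taxon and tax_id == source_tax:
                    continue  # drop mappings to the source's own taxon
                hits.append((accession, tax_id))
        if group_by_taxon:
            grouped = defaultdict(list)
            for accession, tax_id in hits:
                grouped[tax_id].append(accession)
            result[source] = dict(grouped)
        else:
            result[source] = [accession for accession, _ in hits]
    return result


print(hh_mappings(CLUSTERS, ["ACC_B2"]))
# {'ACC_B2': {9606: ['ACC_B1'], 7227: ['ACC_B3']}}
print(hh_mappings(CLUSTERS, ["ACC_B2"], group_by_taxon=False, exclude_same_taxon=True))
# {'ACC_B2': ['ACC_B3']}
```

Keeping a single accession-to-cluster index means each lookup is a dictionary access rather than a scan over all clusters, which matters for cluster files covering a whole UniProt release.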

Installation and Usage

After downloading and unpacking the compressed file, you will find a README.txt with detailed instructions in the created directory, and in the folder doc/html, explanations of the script's input/output formats and its Python API. To install the package on UNIX systems, run the following command in a terminal inside the package's root folder:

sudo python setup.py install
Given a flat file of UniProt accessions (one per line; let's assume it is called "source_acc_list.txt") for which you would like to extract mappings (e.g., to use them as homonym ortholog input in the BC II.5 evaluation library), and the default cluster file from the BC II.5 Elsevier corpus page, you can now run the script as follows:
bc-hhmap -ue uniprot_15.0_homonym_identity_clusters.tsv source_acc_list.txt
The flag -u tells the script not to group ("ungroup") the found mappings by NCBI Tax ID, and -e tells it to exclude mappings to the same Tax ID as the source accession. You can see all options the script provides by using the help option (bc-hhmap -h).
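If you want to call the tool from a Python pipeline instead of a terminal, a minimal Python 3 sketch could write the accession list and invoke bc-hhmap through the standard subprocess module. The helper names write_accession_list, build_command and run_hhmap are hypothetical; the sketch assumes bc-hhmap is installed and on the PATH, and it simply returns whatever the script writes to standard output instead of assuming any output format. Passing -u and -e as separate arguments should be equivalent to the combined -ue for a standard option parser, but that is an assumption.

```python
import subprocess


def write_accession_list(path, accessions):
    """Write one UniProt accession per line (the flat-file input format)."""
    with open(path, "w") as handle:
        for accession in accessions:
            handle.write(accession + "\n")


def build_command(cluster_file, accession_file, ungroup=True, exclude_same_taxon=True):
    """Assemble the bc-hhmap argument list; -u and -e are passed separately."""
    command = ["bc-hhmap"]
    if ungroup:
        command.append("-u")  # do not group mappings by NCBI Tax ID
    if exclude_same_taxon:
        command.append("-e")  # exclude mappings to the source's own Tax ID
    command += [cluster_file, accession_file]
    return command


def run_hhmap(command):
    """Run bc-hhmap and return its standard output as text.

    Requires Python 3.7+ (capture_output/text) and bc-hhmap on the PATH;
    raises CalledProcessError if the script exits with a non-zero status.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

For the example above, calling run_hhmap(build_command("uniprot_15.0_homonym_identity_clusters.tsv", "source_acc_list.txt")) corresponds to the bc-hhmap -ue command line. Because the tool runs in a separate process, the wrapper can use Python 3 even though the script itself targets Python 2.5-2.7.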