We invite text mining teams to develop a system that assists curators in selecting relevant articles for curation for the Comparative Toxicogenomics Database (CTD).
Given a chemical (input), the system should present the curator with a list of PubMed IDs in ranked order, from more likely to less likely curatable, along with information that will help the curator to assess such ranking.
Therefore, for each abstract, the system should provide the following information in a TAB-delimited flat file format:
- PubMed ID
- Title
- Abstract
- Journal
- Cited Gene Actors (Entries are '|' delimited)
- Cited Chemical Actors (Entries are '|' delimited)
- Cited Disease Actors (Entries are '|' delimited)
- Marked-up HTML of abstract with tagged links back to CTD for all actors and terms (see note below)
- Document Relevancy Score (normalized from 0, non-curatable, to 1, curatable)
- *Marked-up HTML of relevant sentences/phrases extracted with tagged links back to CTD for all actors and terms (sentences/phrases are '|' delimited)
- *Cited Action Terms (Entries are '|' delimited)
- *Cited Interactions (Entries are '|' delimited)
Fields preceded by * are optional; contributors unable to provide this information should account for the field's position, but simply leave the individual entry blank. An example of the output format is provided in Downloads at the end of this page.
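As a rough illustration of the layout described above, the following Python sketch writes one TAB-delimited line per abstract, ranked from most to least likely curatable. The field names, the four-decimal score formatting, and the whitespace clean-up are assumptions made for this example; the example output file in Downloads remains the authoritative reference.

```python
# Column order follows the field list above; the Python names are illustrative.
FIELDS = [
    "pubmed_id", "title", "abstract", "journal",
    "gene_actors", "chemical_actors", "disease_actors",
    "abstract_html", "relevancy_score",
    "sentences_html", "action_terms", "interactions",  # the three optional fields
]
MULTI_VALUED = {
    "gene_actors", "chemical_actors", "disease_actors",
    "sentences_html", "action_terms", "interactions",
}


def clean(value):
    """Collapse tabs and newlines so a value cannot break the flat-file layout."""
    return " ".join(str(value).split())


def format_row(record):
    """Return one TAB-delimited line holding all 12 columns in order."""
    score = float(record["relevancy_score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError("relevancy score must be normalized to [0, 1]")
    columns = []
    for name in FIELDS:
        if name == "relevancy_score":
            columns.append(f"{score:.4f}")  # assumed precision
        elif name in MULTI_VALUED:
            # Multi-valued entries are '|' delimited; missing ones stay blank.
            columns.append("|".join(clean(v) for v in record.get(name) or []))
        else:
            columns.append(clean(record.get(name) or ""))
    return "\t".join(columns)


def to_tsv(records):
    """Rank records by descending relevancy score and render the flat file."""
    ranked = sorted(records, key=lambda r: float(r["relevancy_score"]), reverse=True)
    return "".join(format_row(r) + "\n" for r in ranked)
```

Optional fields that a system cannot supply are simply omitted from the record; the sketch still emits an empty column so that every field keeps its position.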
NOTE: Links to CTD should take the following form for chemical, disease, and gene actors:
http://ctd.mdibl.org/basicQuery.go?bqCat=<gene|chem|disease>&bq=<term>. Please refer to the output file for examples.
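A minimal sketch of building such links in Python is shown here. The helper names `ctd_url` and `ctd_link`, the URL-encoding of the term, and the plain `<a>` markup are assumptions for illustration; the example output file shows the exact form expected.

```python
from html import escape
from urllib.parse import quote_plus

CTD_QUERY = "http://ctd.mdibl.org/basicQuery.go"
CATEGORIES = {"gene", "chem", "disease"}


def ctd_url(category, term):
    """Build a CTD basic-query URL for a gene, chemical, or disease term."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown CTD category: {category!r}")
    # Encoding spaces and punctuation in the term is an assumption of this sketch.
    return f"{CTD_QUERY}?bqCat={category}&bq={quote_plus(term)}"


def ctd_link(category, term, text=None):
    """Wrap a mention in an HTML anchor pointing back to CTD."""
    url = ctd_url(category, term)
    return f'<a href="{escape(url)}">{escape(text or term)}</a>'
```

For example, `ctd_url("chem", "sodium arsenite")` yields `http://ctd.mdibl.org/basicQuery.go?bqCat=chem&bq=sodium+arsenite`.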
To help you with the system development we provide:
The columns for each file are as follows:
- PubMed ID
- Title
- Abstract
- Journal
- Date
- Curatable?
- Number of Interactions
- Curated Interactions (Entries are '|' delimited)
- Curated Gene Actors (Entries are '|' delimited)
- Curated Chemical Actors (Entries are '|' delimited)
- Curated Disease Actors (Entries are '|' delimited)
- Curated Action Terms (Entries are '|' delimited)
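The column list above could be read into Python records along the following lines. This is a sketch under several assumptions: the files are TAB-delimited like the output format, they carry no header row, and the column names used as dictionary keys are invented for this example. The value of the "Curatable?" column is kept as a raw string because its encoding is not stated here.

```python
# Names are illustrative; the order mirrors the column list above.
TRAINING_COLUMNS = [
    "pubmed_id", "title", "abstract", "journal", "date",
    "curatable", "num_interactions",
    "curated_interactions", "curated_genes", "curated_chemicals",
    "curated_diseases", "curated_action_terms",
]
PIPE_COLUMNS = TRAINING_COLUMNS[7:]  # the five '|' delimited columns


def parse_training_line(line):
    """Split one TAB-delimited line into a dict keyed by column name."""
    values = line.rstrip("\r\n").split("\t")
    # Pad short lines so trailing blank columns are still present.
    values += [""] * (len(TRAINING_COLUMNS) - len(values))
    record = dict(zip(TRAINING_COLUMNS, values))
    for name in PIPE_COLUMNS:
        record[name] = [entry for entry in record[name].split("|") if entry]
    return record


def read_training_file(path):
    """Parse every non-empty line of a training file."""
    with open(path, encoding="utf-8") as handle:
        return [parse_training_line(line) for line in handle if line.strip()]
```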
- For chemicals: http://ctd.mdibl.org/downloads/#allchems
- For genes: http://ctd.mdibl.org/downloads/#allgenes
- For diseases: http://ctd.mdibl.org/downloads/#alldiseases
- For action terms: http://ctd.mdibl.org/downloads/#gcixntypes
Systems requirements:
Text Mining System Description:
Each participating team should provide a description of its system. The description should be no longer than 6 pages, including figures; Word or .rtf format is preferred.
Important Dates
Item | Deadline | Submit via | Comment |
---|---|---|---|
Team Registration | November 15, 2011 | web | register here |
Release of Test Data | February 06, 2012 | email to twiegers@mdibl.org | Subject:Track1 commitment |
Text Mining System Description | February 20, 2012 | email to twiegers@mdibl.org | Subject:BioCreative-2012 Track I |
Submission of Benchmarking Results | February 20, 2012 | email to twiegers@mdibl.org | Subject:BioCreative-2012 Track I-results |
Interface Available for Testing | March 1, 2012 | email to twiegers@mdibl.org | Subject:BioCreative-2012 Track I-testing |