
BioCreative II.5

FAQ and guidelines [2009-03-29]

Table of Contents

  1. General Participation
  2. Corpus and Data Curation
  3. Online Challenge
  4. Offline Challenge
  5. Test Data/Phase
  6. Result Data
  7. Evaluation

General Participation [top]

Which resources are allowed in the challenge?
In principle, any resource available to you can be used to create the annotations. However, you can declare that for any number of your submissions you did not use any existing annotations on the articles (MeSH terms, GOA or SwissProt annotations, etc.), to better simulate the most realistic setting in the context of author-based annotations. Also, a questionnaire will be part of your result submission, to determine differences in approaches based on the resources used.
Should participants use SwissProt or the entire UniProt space for the protein mappings?
Obviously, it is much more challenging to perform well on the training set when trying to map against the entire UniProt accession space. At the same time, it is also more realistic to map to that larger set, as newly published articles frequently contain proteins that have not yet been annotated by the SwissProt curators. The distribution of SwissProt/TrEMBL accessions in the test set will be as close as possible to that of the training set, but there will be notable differences in difficulty depending on the accession space you choose, which will be taken into account in the evaluation.
Can a team participate in both the online and the offline challenge?
Yes, you can choose to submit results for both challenge classes, and in both cases you can submit up to the five allotted runs (see Test Data/Phase - "How many different runs may a team submit?").
Can teams choose to participate only in a specific task of the challenge?
You are free to choose to only submit results for a specific task (ACT, INT, IPT) for any of your Annotation Servers or offline runs (in both cases simply by submitting only the relevant results).

Corpus and Data Curation [top]

How are the interaction-relevant articles chosen and which interactions are "curatable"?
The articles must contain direct, physical protein-protein interactions that have been specifically demonstrated in the article.
Can interaction annotations stem from figures or tables?
Yes. Figures often provide experimental evidence information, while for large interaction experiments tables are used to list annotations. Note, however, that these annotations (interaction type, experimental method, etc.) are not part of this challenge.
Are there cases where a protein interaction is between two proteins from different organisms (e.g. protein A from mouse and protein B from human)?
Yes. Although such cases are not very common, they do occur.
So what kind of interaction pairs should participants extract?
All unique (in terms of the two UniProt accessions), undirected pairs found in the article which are "curatable" (see first question and section Result Data).
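As an illustration, a minimal Python sketch of reducing extracted pairs to unique, undirected pairs before submission might look as follows; the function and variable names are only illustrative and not part of the challenge tools.

def unique_undirected_pairs(pairs):
    """Reduce (accession, accession) tuples to unique, undirected pairs."""
    # (Q9HBI1, Q9ES28) and (Q9ES28, Q9HBI1) count as the same pair,
    # so each pair is stored with its accessions in sorted order.
    return {tuple(sorted(pair)) for pair in pairs}

mentions = [("Q9HBI1", "Q9ES28"), ("Q9ES28", "Q9HBI1"), ("Q13153", "P63001")]
print(unique_undirected_pairs(mentions))  # three mentions -> two unique pairs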

Online Challenge [top]

How will the full-text data be sent to the Annotation Servers?
The data will be transferred via XML-RPC and will be structured just like the plain-text files in the training set. If you want to receive XML data, please read the next answer first. Following XML-RPC guidelines, this means each article will be sent as a list of structs (dictionaries, hashes, associative arrays). For example, the first four blocks of article 10.1016/j.febslet.2008.01.064 are (shortened):
FIGURE Immunofluorescence analysis ... cells. Scale bar, 20μm.
QUALIFIER Fig. 1

FIGURE (A) Confirmation of the ... with RP2.
QUALIFIER Fig. 2

FIGURE Identification of affixin-binding ... anti-(His)6 antibody.
QUALIFIER Fig. 3

FIGURE Pulldown assay of active ... using the cell lysates.
QUALIFIER Fig. 4
This would be sent as the first four elements of an array, where each element contains a struct with the text, the section name, an optional qualifier, and a number in case the section name and qualifier are not unique in the article (not the case in this example). The array therefore will be:
[{
	"section": "figure",
	"qualifier": "Fig. 1",
	"content": "Immunofluorescence analysis ... cells. Scale bar, 20μm.",
},
{
	"section": "figure",
	"qualifier": "Fig. 2",
	"content": "(A) Confirmation of the ... with RP2.",
},
{
	"section": "figure",
	"qualifier": "Fig. 3",
	"content": "Identification of affixin-binding ... anti-(His)6 antibody.",
},
{
	"section": "figure",
	"qualifier": "Fig. 4",
	"content": "Pulldown assay of active ... using the cell lysates.",
}]
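To make the transport concrete, below is a minimal Python sketch of an Annotation Server endpoint that accepts such an array of structs over XML-RPC. The method name process_article and the exact arguments and return format are assumptions for illustration only; the actual interface is defined by the MetaServer documentation.

from xmlrpc.server import SimpleXMLRPCServer

def process_article(doi, blocks):
    """Receive one article as a list of structs (section/qualifier/content)."""
    # NOTE: the method name, arguments and return value here are hypothetical.
    for block in blocks:
        section = block.get("section")      # e.g. "figure"
        qualifier = block.get("qualifier")  # e.g. "Fig. 1" (optional)
        text = block.get("content", "")
        # ... run your NER / interaction extraction on `text` here ...
    return {"doi": doi, "pairs": []}

server = SimpleXMLRPCServer(("0.0.0.0", 8080), allow_none=True)
server.register_function(process_article, "process_article")
server.serve_forever()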
Why is the data preferably sent as structured text and not as XML?
The problem with the XML format is that if you wanted to make your Annotation Server available later on the BCMS platform, the XML format would be of hardly any value: you would have to write parsers for every other format that might be used as input, and you would have to think about how to handle plain-text input, too. Given that this challenge's setting is to evaluate author-annotation assistance, plain text is the more probable starting ground and has better long-term value, and you do not have to occupy yourself with disambiguating the Unicode symbols in the text - which, at least in our baseline NER tests, results in a huge performance boost if you get the UTF-8 symbols right. Therefore, the BCMS platform will provide the full text as structured data that you can use directly, without the need to constantly write and maintain parsers for various XML formats and entities, or to recognize the text structure in pure plain-text data. Only if enough teams insist on using the XML format will it be made available, and solely for the purpose of the challenge - but in that case your Annotation Server would not benefit from the unified text structure and parsing the BCMS platform offers, and you would have to "reinvent the wheel" yourself each time another XML format is added. In other words, this measure is intended to make life easier for you, not to limit you to unreasonable settings, but we can obviously provide the XML if several teams are interested in receiving the raw data. However, keep in mind that, especially given the benefit of the extra data found in the XML, use of the UTF-8 text resource will be credited during the evaluation.
How much time do the Annotation Servers have to report results for an article?
Mainly to avoid the possibility of manual annotations in the online challenge and to simulate a real-time setting for generating the annotations, the total time for doing all annotations will be short. The current goal is that the servers report their results within 2 days for all articles; as you will receive about the same number of articles as found in the training set, this means a server will have about 5 minutes per article on average to report the results. Given our previous experience with the BCMS system, where any server was able to report results on MEDLINE abstracts in less than 10 seconds, this should be a generous timeframe.
In summary, this means that you can set any timeout you wish per article for your Annotation Server, but after the total time span has passed, no further annotation requests will be sent from the MetaServer to the Annotation Servers and therefore no further results will be accepted. In other words, if a particular article takes very long to annotate, you can either continue trying to report annotations for it or abort the annotation process for that article after a certain time span has elapsed. The web interface will allow you to retry any article for which the annotation process has failed, or to skip it and continue with the next article in the set while the online test phase is still running.
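If you want to enforce such a per-article timeout inside your own Annotation Server, a minimal Python sketch could look like the following; the annotate function and the 300-second limit are illustrative assumptions, loosely based on the roughly five minutes per article mentioned above.

from concurrent.futures import ThreadPoolExecutor, TimeoutError

executor = ThreadPoolExecutor(max_workers=1)

def annotate_with_timeout(article, annotate, limit_seconds=300):
    """Try to annotate one article, giving up after limit_seconds."""
    # `annotate` is your own (hypothetical) annotation function; on timeout
    # the article is skipped so the remaining articles still fit into the
    # overall time budget.
    future = executor.submit(annotate, article)
    try:
        return future.result(timeout=limit_seconds)
    except TimeoutError:
        future.cancel()  # best effort; an already running task keeps running
        return None      # skip this article and move on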

Offline Challenge [top]

What will the available input formats/files be?
The files you will receive for the test phase will come in two sets: the articles in UTF-8 plain text and in XML format, in both cases in the same format as the *.noSDA.utf8 and *.noSDA.xml files in the training set. That is, the files will not contain the SDAs in either format.
How will the files be structured or ordered?
All files will be anonymized with a unique identifier (i.e., not their DOI), which will be the filename minus the suffix (*.noSDA.utf8 and *.noSDA.xml). Every participant will receive all files (negative articles as well as positive, interaction-containing ones) without any indication of which of the two sets (negative/positive) a file belongs to.
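Under this naming scheme, the article identifiers can be recovered by stripping the suffix from the filenames, for example with a short Python sketch such as the following (the directory name is a placeholder):

from pathlib import Path

test_dir = Path("test_set_utf8")  # placeholder; wherever you unpacked the test set

# Each file is named <identifier>.noSDA.utf8; stripping the suffix yields
# the anonymized article identifier to use in your result files.
article_ids = sorted(p.name[:-len(".noSDA.utf8")]
                     for p in test_dir.glob("*.noSDA.utf8"))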
In which (file) format should the results be reported?
As plain-text, tab-separated files, exactly as explained in the evaluation documentation. Other formats are currently not available, but if a significant number of teams want to use another common format (ieXML, etc.) and a conversion script to the official challenge format can be created in due time, we can arrange for that.
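As a concrete illustration, writing such a tab-separated file in Python is straightforward; the rows below follow the column layout of the ranking example in the Result Data section (article identifier, accession 1, accession 2, rank, confidence), and the file name and values are placeholders - refer to the evaluation documentation for the exact column specification.

import csv

# Illustrative result rows: (article id, accession 1, accession 2, rank, confidence).
rows = [
    ("10.1016/j.febslet.2008.01.064", "Q9HBI1", "Q9ES28", 1, 0.8),
    ("10.1016/j.febslet.2008.01.064", "Q13153", "P63001", 2, 0.6),
]

with open("ipt_results.tsv", "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    writer.writerows(rows)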
How can participants make sure the result file has the correct format?
You can check your files using the evaluation script. If the script returns an evaluation result and does not abort with an error, the format is valid.

Test Data/Phase [top]

What data and information will the participants receive?
For the online challenge, the participants will be sent the full text (corresponding to the UTF-8 plain-text version) for all articles (positive, interaction-containing articles as well as negative articles), one by one, from the MetaServer, with no additional information. In the case of the offline challenge, participants will download the articles, grouped in two folders - for UTF-8 and XML-formatted articles - for which the filename of each article (sans the file suffix, just as in the training set) will act as the article identifier. As with the online challenge, no additional information will be provided beyond that. I.e., you will not know a priori for which articles you should generate normalization and pair annotations (also see Evaluation - "Will normalizations or pairs annotated on negative, non-interaction containing articles be evaluated?" about this issue).
Which interaction types and detection methods will the protein interaction pairs have in the test set?
Any interaction type and detection method listed in the vocabulary, which is part of the training set data, may occur with the interaction pairs in the test set. However, note that interaction type and detection method annotations are not part of the challenge evaluation itself.
When will a team be allowed to enter the test phase?
Before entering the actual test phase, every participant will have to complete a full run using the training data, to ensure their systems are working and the formats are correct. For the offline submission, this means you will send us the annotations you generated for the training-set files you received from us. For the online challenge, you will have a web interface to activate the MetaServer, which will then try to query your Annotation Server and keep you informed about errors and problems. Once you have submitted valid annotations for the complete training set, you will be ready for the test phase. Additionally, we will use this data to compare performance changes when moving from the training set to the test set.
How many different runs may a team submit?
You can set up five different servers (online) or submit up to five different runs (offline) per team. You will be asked to state fundamental differences between each run you submit (other resources used, XML or UTF-8 articles, MeSH terms, machine learning algorithms, possible ACT/INT/IPT task focus, etc.).
How will the test phase proceed?
After all participants have successfully tested their Annotation Servers with the training set, the online annotation of the test set can be started individually for each participating team from the team's web interface. Teams will have approximately two days (see Online Challenge - "How much time do the Annotation Servers have to report results for an article?") to complete the online test phase, and will finally fill in a survey. After all teams have completed the online phase, the complete article set will be made available for download and the offline phase will start, in which teams will again have two days to submit offline results, together with a survey. After the offline phase has passed, the whole test phase ends and the organizers' evaluation of the results begins.

Result Data [top]

Will non-unique normalization/pair results be allowed?
No. Pairs and normalizations have to be unique. You can check this by running the evaluation script on your plain-text formatted results - it will abort with an error if your results are not unique.
Will non-existing UniProt accessions be accepted?
No. For the online evaluation, the MetaServer would normally raise an error if you try to submit an accession that does not exist (NOTE: this has been disabled for the challenge to keep it comparable to the offline results - any valid string is accepted and you have to make sure it is usable, i.e., just as for the offline evaluation). For the offline evaluation, non-existing accessions will nevertheless be "accepted", as they have no impact on the actual evaluation process.
What will the survey for result submission be asking from the teams?
For each active server/submitted run the participants will be asked to fill in a survey in addition to the result data itself to complete the test phase. You will be asked if you used any existing annotations, which mapping you attempted (SwissProt/UniProt), if you used the XML files or not, which resources you used, and a general short description of your approach (e.g., machine learning and NLP methods).
How should the results be ranked?
One question was raised about the ranking of the results; the following is a slightly modified quote from Jörg Hakenberg:
Could you please provide us with an exemplary ranking for the PPIs in the training data? In the annotated pairs:
10.1016/j.febslet.2008.01.064	Q9HBI1	Q9ES28
10.1016/j.febslet.2008.01.064	Q9HBI1	Q8K4I3
10.1016/j.febslet.2008.01.064	Q13153	P63001
10.1016/j.febslet.2008.01.064	Q13153	P70766
10.1016/j.febslet.2008.01.064	Q9HBI1	Q9ESD7
10.1016/j.febslet.2008.01.064	Q9HBI1	O55222
which of these 6 is the most important (rank 1), which the least (rank 6).
There is no order in the gold standard; your evaluation score will be the same no matter in which order you rank these results. If you want to indicate that a block of consecutive, ranked classifications is of equal importance, report them all with the same confidence score. Only the ranking (not the confidence values) influences the results, i.e. (DOI - Acc1 - Acc2 - Rank - Confidence):
10.1016/j.febslet.2008.01.064	Q9HBI1	Q9ES28	1	0.8
10.1016/j.febslet.2008.01.064	Q9HBI1	Q8K4I3	2	0.8
10.1016/j.febslet.2008.01.064	FALSE_P	FALSE_P	3	0.6
10.1016/j.febslet.2008.01.064	Q13153	P63001	4	0.6
10.1016/j.febslet.2008.01.064	Q13153	P70766	5	0.4
10.1016/j.febslet.2008.01.064	Q9HBI1	Q9ESD7	6	0.4
10.1016/j.febslet.2008.01.064	Q9HBI1	O55222	7	0.1
is exactly equal to, for example:
10.1016/j.febslet.2008.01.064	Q13153	P63001	1	0.8
10.1016/j.febslet.2008.01.064	Q9HBI1	Q9ES28	2	0.8
10.1016/j.febslet.2008.01.064	FALSE_P	FALSE_P	3	0.6
10.1016/j.febslet.2008.01.064	Q9HBI1	Q9ESD7	4	0.6
10.1016/j.febslet.2008.01.064	Q13153	P70766	5	0.4
10.1016/j.febslet.2008.01.064	Q9HBI1	O55222	6	0.4
10.1016/j.febslet.2008.01.064	Q9HBI1	Q8K4I3	7	0.1
If you were to compare these two cases with the evaluation script, you would get the same evaluation scores. The only factor influencing the score is how many true positives - and how few false positives - appear in your top, score-influencing ranks.
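To see why only the positions of true and false positives matter, consider a simple rank-based score such as average precision (the actual challenge metric may differ; this is only an illustration). Both example submissions above place their single false positive at rank 3, so they reduce to the same true/false pattern and therefore to the same score:

def average_precision(ranked_hits):
    """Average precision over a ranked list of booleans (True = true positive)."""
    # Only where the True/False values sit in the ranking matters,
    # not which particular pair occupies which position.
    hits, total = 0, 0.0
    for rank, is_tp in enumerate(ranked_hits, start=1):
        if is_tp:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

pattern = [True, True, False, True, True, True, True]  # shared by both orderings
print(average_precision(pattern))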

Evaluation [top]

Which evaluation types will there be?
In principle, we will make a major distinction between online and offline submissions of results and treat the offline results separately, as the goal is to show the online, real-time capability of text mining in the context of author and database-curator annotation. Additionally, given the setting, it is more realistic to use no existing annotations on the articles (MeSH terms, GOA, etc.), so we will separate these two groups, too. The most prestigious class is therefore the online challenge without the use of any existing annotations. Finally, for the reasons explained in the section Online Challenge - "Why is the data preferably sent as structured text and not as XML?", it will be more difficult, yet more realistic, to generate the annotations from the UTF-8 articles, and we will also analyze differences between annotations depending on the text resource used (UTF-8 or XML) and the mapping resource used (SwissProt or UniProt).
Will normalizations or pairs annotated on negative, non-interaction containing articles be evaluated?
No. Given the setting, an author would know whether the article contains that information, and database curators would already have decided whether or not to annotate the article; so if you do annotate these articles, these classifications will not be counted as false positives against your total.

Finally, we want to remind you that another good source of information is the FAQ for the BioCreative II PPI task.