Links

Project Partners

Dr Maria Liakata


Principal investigator for the SAPIENTA project.
Maria is an Assistant Professor at the University of Warwick. Previously she held an Early Career Fellowship from the Leverhulme Trust (2010-2013) and had a joint affiliation with the Department of Computer Science at Aberystwyth University, UK, and the text mining group at the European Bioinformatics Institute (EMBL-EBI) in Cambridge, where she was hosted for the duration of her fellowship. Her research interests include computational linguistics, biomedical text mining, knowledge discovery, and machine learning applications for natural language processing.

Dr Colin Batchelor


Senior Informatics Analyst, Royal Society of Chemistry, Thomas Graham House, Cambridge, UK CB4 0WF.
Role on project: knowledge expert in chemistry and publishing, summary evaluation.

Dr Simone Teufel


Senior Lecturer, University of Cambridge, Computer Laboratory, Natural Language and Information Processing Group.
Role on project: advisor in natural language processing and especially in argumentative zoning, annotation schemes and text summarisation.

Prof. Sophia Ananiadou


Director of the National Centre for Text Mining (NaCTeM), Professor in text mining, University of Manchester.
Role on project: Advisor on text mining and biolexical resources, provision of annotated data.

Dr Amanda Clare


Lecturer in Computer Science, Department of Computer Science, Aberystwyth University.
Role on project: Advisor on semantic web technologies and machine learning.

Miss Shyamasree Saha


Software engineer, Text mining group, EMBL-EBI
Role on project: Software Engineer

Dr Simon Dobnik


Postdoctoral Research Fellow in Language Technology
Dialogue Technology Lab, Centre for Language Technology and Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg.
Role on project: Collaboration in automatic summary generation and evaluation

Dr Colin Sauze


Postdoctoral Research Associate, Department of Computer Science, Aberystwyth University.
Role on project: Collaboration involving the creation of a semantic wiki that links to SAPIENT, implementation of web-service components of SAPIENT.

Other Links

ART Project


The project that produced the SAPIENT tool for annotation of general scientific papers.

Easily browsable ART corpus


A site for browsing papers in the ART corpus hosted at UKOLN.
The site contains the corpus description, and the pages can also be downloaded from it.

The ART Corpus


As part of the ART project, 265 chemistry papers were manually annotated using core scientific concepts. The resultant corpus contains over 1 million words, or 40,000 sentences. For further information and to download the corpus, visit:
http://www.aber.ac.uk/en/cs/research/cb/projects/art/art-corpus/

Please reference the corpus as:
Liakata, Maria and Soldatova, Larisa. 2009. The ART corpus. Technical report, Aberystwyth University.

All 265 papers (225 + 41 from phase II of corpus development) can be obtained by contacting m.liakata@warwick.ac.uk

Multi-CoreSC CRA corpus (MCCRA)


As part of the SAPIENTA project, 50 papers from the domain of Cancer Risk Assessment (CRA) were manually annotated by three biology experts, allowing multiple core scientific concepts per sentence. The corpus and its evaluation are described in our LREC 2016 paper:
Multi-label Annotation in Scientific Articles – The Multi-label Cancer Risk Assessment Corpus

You can download the corpus from here.

Please reference the corpus as:
James Ravenscroft, Maria Liakata, Anika Oellrich, and Shyamasree Saha. Multi-label Annotation in Scientific Articles – The Multi-label Cancer Risk Assessment Corpus. Proceedings of LREC 2016.

Multi-CoreSC Annotation Guidelines


Here you can find the annotation guidelines used by experts to annotate publications with multiple Core Scientific Concepts (CoreSC) per sentence.