CoreSC automation and extractive summaries
In the SAPIENT Automation project we have also evaluated the CoreSC scheme and the ART/CoreSC corpus by incorporating machine learning algorithms into SAPIENT to automate the annotation of core scientific concepts. The resulting system, SAPIENTA, has been trained and tested on the ART corpus and has also been used to annotate biology papers from PubMed Central. On the ART/CoreSC corpus the classifier averages over 50% accuracy across all 11 CoreSC concepts. The best-performing categories are Experiment, Background and Model, with F-scores of 76%, 62% and 53% respectively. [Publication to follow soon].
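For readers unfamiliar with the measures quoted above, the following is a minimal sketch of how overall accuracy and a per-category F-score can be computed from sentence-level labels. It is not the SAPIENTA evaluation code; the function name `per_category_scores` and the toy label lists are hypothetical and serve only as illustration.

```python
from collections import Counter


def per_category_scores(gold, predicted):
    """Return {category: (precision, recall, f_score)} for two label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    gold_count = Counter(gold)        # sentences per category in the gold standard
    pred_count = Counter(predicted)   # sentences per category assigned by the classifier
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)
    scores = {}
    for cat in set(gold_count) | set(pred_count):
        tp = true_pos[cat]
        precision = tp / pred_count[cat] if pred_count[cat] else 0.0
        recall = tp / gold_count[cat] if gold_count[cat] else 0.0
        total = precision + recall
        # F-score is the harmonic mean of precision and recall.
        f_score = 2 * precision * recall / total if total else 0.0
        scores[cat] = (precision, recall, f_score)
    return scores


# Toy data (hypothetical, not taken from the ART/CoreSC corpus).
gold = ["Background", "Experiment", "Experiment", "Model", "Background"]
predicted = ["Background", "Experiment", "Model", "Model", "Experiment"]

scores = per_category_scores(gold, predicted)
accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
```

Accuracy counts every correctly labelled sentence equally, whereas the per-category F-score shows how well each individual concept is recognised, which is why both kinds of figure are reported.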
We have also evaluated the usefulness of automated CoreSCs for real users by using them to create extractive summaries, which were then assessed in a question answering task. The questions concerned the content of the paper, were set by a chemistry expert and were evaluated by 12 more experts. We measured to what extent experts could answer the questions correctly by reading only the summaries. Automated extractive summaries using CoreSCs had 75% precision and 65% recall. [Publication to follow soon].
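The text does not describe how sentences were selected for the summaries. As an illustration only, here is a minimal sketch of one way an extractive summary could be assembled from CoreSC-labelled sentences, assuming each sentence carries a single category label and a fixed quota of sentences is kept per chosen category. The function `extractive_summary`, its parameters and the example sentences are all hypothetical.

```python
def extractive_summary(sentences, keep_categories, per_category=1):
    """Pick up to `per_category` sentences for each kept category.

    `sentences` is a list of (text, category) pairs in document order;
    the returned summary preserves that order.
    """
    taken = {cat: 0 for cat in keep_categories}
    summary = []
    for text, cat in sentences:
        if cat in taken and taken[cat] < per_category:
            summary.append(text)
            taken[cat] += 1
    return summary


# Hypothetical labelled sentences, not drawn from any real paper.
labelled = [
    ("Earlier studies examined compound A.", "Background"),
    ("We measured the reaction rate at 300 K.", "Experiment"),
    ("The measurement was repeated three times.", "Experiment"),
    ("The rate is described by a first-order model.", "Model"),
]

summary = extractive_summary(labelled, ["Experiment", "Model"])
```

Keeping document order and capping the number of sentences per category is a simple design choice for the sketch; a real system could equally rank sentences by classifier confidence.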