Comparison of Named Entity Recognition tools for raw OCR text


This short paper analyses an experiment comparing the efficacy of several Named Entity Recognition (NER) tools at extracting entities directly from the output of an optical character recognition (OCR) workflow. The authors describe how they first created a set of test data, consisting of raw and corrected OCR output manually annotated with people, locations, and organizations. They then ran each of the NER tools against both the raw and the corrected OCR output, comparing each tool's precision, recall, and F1 score against the manually annotated data.
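The evaluation described above scores each tool's output against the gold annotations. The sketch below is illustrative only and is not the authors' code. It assumes that an entity is a hypothetical `(surface text, type)` pair and that a prediction counts as correct only on an exact match of both. The paper may use a different matching criterion, such as partial-span credit. The example data is invented. It shows how a single OCR error (`J0hn` for `John`) lowers both precision and recall on raw text.

```python
from collections import Counter

# Hypothetical representation: an entity is a (surface text, type) pair,
# e.g. ("London", "LOC"). Assumption: only exact matches on both fields count.
Entity = tuple[str, str]


def precision_recall_f1(gold: list[Entity], predicted: list[Entity]) -> tuple[float, float, float]:
    """Score predicted entities against manually annotated gold entities."""
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # True positives: entities found in both lists (multiset intersection).
    tp = sum((gold_counts & pred_counts).values())
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented example data, not taken from the paper.
    gold = [("John Smith", "PER"), ("London", "LOC"), ("Royal Society", "ORG")]
    raw = [("J0hn Smith", "PER"), ("London", "LOC")]  # OCR error breaks the match
    corrected = [("John Smith", "PER"), ("London", "LOC"), ("Society", "ORG")]
    for name, pred in [("raw", raw), ("corrected", corrected)]:
        p, r, f = precision_recall_f1(gold, pred)
        print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
    # raw: P=0.50 R=0.33 F1=0.40
    # corrected: P=0.67 R=0.67 F1=0.67
```

Using `Counter` intersection means a duplicated entity is credited only as many times as it appears in the gold data.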

Empirical Methods in Natural Language Processing