Biomedical Language Processing

Automated information extraction with natural language processing (NLP) tools is crucial for analyzing a volume of medical publications that exceeds human reading capacity. A key challenge for NLP is the variability of medical terminology, especially for new diseases or emerging fields. We present an NLP toolbox comprising extensive English dictionaries of synonyms for SARS-CoV-2 (including variants) that are compatible with dictionary-based NLP tools. The toolbox also includes a silver standard corpus generated from these dictionaries and a gold standard corpus of manually annotated PubMed abstracts, covering key medical terms. Available on GitHub and Zenodo, it supports a range of COVID-19 text analytics tasks, such as building knowledge graphs and developing text mining tools.
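To illustrate what dictionary-based annotation means in practice, here is a minimal Python sketch that tags synonym mentions in a sentence, which is roughly how a silver standard corpus could be derived from a synonym dictionary. The tiny `SYNONYMS` mapping, the function names, and the example sentence are assumptions made for illustration; they do not reflect the toolbox's actual dictionary format, contents, or tooling.

```python
import re

# Hypothetical mini-dictionary (concept label -> synonyms), for illustration only.
# The real toolbox dictionaries are far larger and may use a different format.
SYNONYMS = {
    "SARS-CoV-2": [
        "SARS-CoV-2",
        "2019-nCoV",
        "severe acute respiratory syndrome coronavirus 2",
    ],
    "COVID-19": ["COVID-19", "coronavirus disease 2019"],
}


def build_matcher(synonyms):
    """Compile one case-insensitive pattern covering every synonym."""
    lookup = {}
    for concept, terms in synonyms.items():
        for term in terms:
            lookup[term.lower()] = concept
    # Try longer synonyms first so a long name is not split by a shorter one.
    ordered = sorted(lookup, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # The lookarounds reject matches embedded inside a longer word.
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)
    return pattern, lookup


def annotate(text, pattern, lookup):
    """Return (start, end, matched_text, concept) for each dictionary hit."""
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


PATTERN, LOOKUP = build_matcher(SYNONYMS)
text = "The novel virus 2019-nCoV, now called SARS-CoV-2, causes COVID-19."
annotations = annotate(text, PATTERN, LOOKUP)

for start, end, mention, concept in annotations:
    print(f"{start}-{end}\t{mention}\t{concept}")
```

Exact string matching of this kind only finds the surface forms listed in the dictionary, which is why broad synonym coverage matters and why a manually annotated gold standard is useful for checking the automatically generated annotations.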
