Deep Mining for Information Discovery
DIMRC researches artificial intelligence (AI) and deep learning technologies that help curate online information resources. This research supports Goal 1 of the NLM Strategic Plan: Accelerate discovery and advance health by providing the tools for data-driven research.
Disaster Information Curation
Maintaining up-to-date collections of high-quality, highly relevant, and timely content that authoritative sources publish on the Web is labor-intensive. This is particularly challenging in disaster health, where timeliness is of the essence. New content is usually announced via social media and other communication channels, so to find it, information specialists scan a large number of sources, including RSS feeds, a curated list of hundreds of selected content providers on Twitter, and more.
Machine Learning Approach
We are using an artificial intelligence approach based on Bidirectional Encoder Representations from Transformers (BERT) to automatically detect Twitter posts that promote new content of interest. The model is a pre-trained deep learning network with over 340 million parameters, fine-tuned on our own data. One advantage of this state-of-the-art approach is that, because the network is already pre-trained, it needs relatively few labeled examples to learn the desired behavior.
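To show how a fine-tuned classifier turns a post into a "good lead" probability, here is a minimal sketch of a classification head. It assumes the common setup in which a single-logit sigmoid layer sits on top of BERT's pooled [CLS] vector (for two classes this is equivalent to the usual two-way softmax). The encoder itself is not shown. The 4-dimensional vector is a made-up stand-in for BERT-Large's 1024-dimensional output, and the function names are hypothetical. For brevity, only the head is updated here, whereas real fine-tuning also updates the encoder's weights.

```python
import math
from typing import List, Tuple

# Toy hidden size for illustration; BERT-Large's actual hidden size is 1024.
HIDDEN_SIZE = 4


def sigmoid(z: float) -> float:
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def good_lead_probability(cls_vector: List[float], weights: List[float], bias: float) -> float:
    """P(post is a good lead) from a single-logit head over the [CLS] vector."""
    logit = sum(w * h for w, h in zip(weights, cls_vector)) + bias
    return sigmoid(logit)


def head_sgd_step(
    cls_vector: List[float], label: int, weights: List[float], bias: float, lr: float = 0.1
) -> Tuple[List[float], float]:
    """One gradient-descent step on binary cross-entropy, updating the head only.

    In real fine-tuning the gradient also flows into the encoder's weights;
    here the encoder output is treated as fixed to keep the sketch small.
    """
    p = good_lead_probability(cls_vector, weights, bias)
    error = p - label  # derivative of the BCE loss with respect to the logit
    new_weights = [w - lr * error * h for w, h in zip(weights, cls_vector)]
    return new_weights, bias - lr * error


# Usage: a made-up encoder output for one post labeled as a good lead (1).
cls_vec = [0.5, -1.0, 0.25, 2.0]
w, b = [0.0] * HIDDEN_SIZE, 0.0
before = good_lead_probability(cls_vec, w, b)  # 0.5: the untrained head is undecided
for _ in range(50):
    w, b = head_sgd_step(cls_vec, 1, w, b)
after = good_lead_probability(cls_vec, w, b)
print(f"{before:.2f} -> {after:.2f}")
```

Because the encoder already produces informative representations from pre-training, only a small amount of labeled data is needed to move the output probability in the right direction. This is the intuition behind the few-examples advantage described above.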
DisasterTweet Miner (DTM), a tool under development, reduces the time and effort required to find promising new document leads in social media.
Posts are shown in descending order of relevance, which here we define as the probability that a post links to relevant content (i.e., that the post is a “good lead”). The information specialist can set a probability threshold to automatically discard posts that are not relevant enough. The tool also automatically discards posts that do not meet certain key conditions. In addition, DTM has features for collecting examples to further fine-tune the machine learning model.
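The triage behavior described above (ranking by probability, applying a user-set threshold, enforcing required conditions, and collecting labeled examples) can be sketched as follows. DTM's actual key conditions are not specified here, so `has_link` is a hypothetical example condition. `Post`, `triage`, and `record_feedback` are likewise illustrative names, not DTM's API, and the posts and scores are made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Post:
    text: str
    url: Optional[str]  # first link found in the post, if any
    probability: float  # model's estimate that the post is a good lead


Condition = Callable[[Post], bool]


def has_link(post: Post) -> bool:
    """Hypothetical key condition: a lead must point to some content."""
    return post.url is not None


def triage(posts: List[Post], threshold: float, conditions: List[Condition]) -> List[Post]:
    """Drop posts below the threshold or failing any condition; rank the rest."""
    kept = [
        p for p in posts
        if p.probability >= threshold and all(cond(p) for cond in conditions)
    ]
    return sorted(kept, key=lambda p: p.probability, reverse=True)


def record_feedback(examples: List[Dict[str, object]], post: Post, is_good_lead: bool) -> None:
    """Store a specialist's judgment as a labeled example for later fine-tuning."""
    examples.append({"text": post.text, "label": int(is_good_lead)})


# Usage with made-up posts and scores.
posts = [
    Post("Updated shelter list", "https://example.org/b", 0.77),
    Post("Stay safe, everyone!", None, 0.82),
    Post("New flood guidance", "https://example.org/a", 0.91),
    Post("Webinar recap", "https://example.org/c", 0.35),
]
leads = triage(posts, threshold=0.5, conditions=[has_link])
print([p.text for p in leads])  # ['New flood guidance', 'Updated shelter list']

examples: List[Dict[str, object]] = []
record_feedback(examples, leads[0], is_good_lead=True)
```

In this sketch, the high-scoring post without a link is removed by the rule-based condition rather than by the threshold. This illustrates why a hard filter can complement the model's probabilities. Keeping conditions as a list of plain functions makes it easy to add or remove rules without touching the ranking logic.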