BioCaster project - An Introduction
BioCaster is a research project that aims to provide advanced search and analysis of Internet news and research literature for public health workers, clinicians and researchers interested in communicable diseases. The portal is currently under development at the National Institute of Informatics by Dr. Nigel Collier, in cooperation with colleagues at the National Institute of Infectious Diseases, National Institute of Genetics, Okayama University, Vietnam National University at Ho Chi Minh City and Kasetsart University. Using text mining technology, we aim to provide intelligent tools that help users obtain a clearer picture of actual and potential disease outbreaks in a timely manner.
Detecting and tracking infectious disease outbreaks requires access to information from a variety of sources. Increasingly this means monitoring many hundreds of Internet news feeds simultaneously. However, three difficulties arise when looking for this information with traditional search methods. Firstly, the massive volume of dynamically changing, unstructured news data available on the Internet makes it extremely difficult for governments and public health workers to obtain a clear picture of an outbreak. Secondly, the initial reports of an outbreak are contained in only a few news articles, which will usually be overlooked by simple keyword indexing methods. Thirdly, the initial reports of an infectious disease will usually appear in local non-English news media. To capture outbreak information in the most timely manner, it is therefore crucial for computer systems to have an understanding of several languages. BioCaster is currently developing language modules for Japanese, Vietnamese and Thai, and we hope to expand this capability in the future.
The BioCaster system has two major components: a web/database server and a backend cluster computer equipped with text mining technology, which continuously scans hundreds of RSS news feeds from local and national news providers. Because the text mining system has detailed knowledge of important concepts such as diseases, pathogens, symptoms, people, places and drugs, it can semantically index relevant parts of news articles, giving users quicker and more precise access to information. The knowledge we use comes from annotated text collections, gazetteer lists of nomenclature and the BioCaster ontology.
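As a rough illustration of how gazetteer lists can drive semantic indexing, the sketch below tags concept mentions in article text and builds a small index from concepts to articles. It is not the BioCaster implementation: the gazetteer entries, the category names and the functions `tag_concepts` and `build_index` are all hypothetical, and a real system would also draw on annotated text collections and the ontology rather than plain string matching.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer (illustrative entries only): surface term -> category.
GAZETTEER = {
    "avian influenza": "DISEASE",
    "h5n1": "PATHOGEN",
    "fever": "SYMPTOM",
    "hanoi": "LOCATION",
}

def tag_concepts(text):
    """Return (term, category, start_offset) for each gazetteer match in text."""
    matches = []
    lowered = text.lower()
    for term, category in GAZETTEER.items():
        # Whole-word, case-insensitive match of the gazetteer term.
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            matches.append((term, category, m.start()))
    # Order matches by their position in the text.
    return sorted(matches, key=lambda t: t[2])

def build_index(articles):
    """Map each (category, term) pair to the set of article ids mentioning it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for term, category, _ in tag_concepts(text):
            index[(category, term)].add(article_id)
    return index

# Made-up example articles, for illustration only.
articles = {
    "a1": "Two cases of avian influenza (H5N1) reported in Hanoi.",
    "a2": "Patients presented with fever.",
}
index = build_index(articles)
```

With an index of this kind, a query for a disease concept retrieves only the articles in which that concept was recognised, rather than every article containing a loosely related keyword.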
The BioCaster Ontology is one of the main components of the BioCaster project. It includes an ontology of 50 infectious diseases and a geographical ontology (243 countries and 4,025 sub-countries). The infectious disease ontology was built by a team consisting of a linguist, an epidemiologist, a geneticist and a medical anthropologist. The ontology is multilingual, supporting six languages (English, Japanese, Vietnamese, Thai, Korean and Chinese), and has links to external ontologies (such as MeSH, SNOMED and ICD9/10) and resources (like Wikipedia).
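One way to picture a multilingual ontology entry with external links is as a record holding per-language labels and per-resource identifiers. The sketch below is only an assumed shape, not the actual BioCaster ontology schema: the class `OntologyTerm`, its field names, the identifier `EXAMPLE:0001` and the placeholder link value are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class OntologyTerm:
    """Hypothetical ontology entry: one concept, many languages, external links."""
    term_id: str
    labels: dict = field(default_factory=dict)          # language code -> label
    external_links: dict = field(default_factory=dict)  # resource name -> identifier

    def label(self, lang, fallback="en"):
        """Return the label in `lang`, or the fallback-language label if absent."""
        return self.labels.get(lang, self.labels.get(fallback))

# Illustrative entry; the id and the link value are placeholders, not real data.
influenza = OntologyTerm(
    term_id="EXAMPLE:0001",
    labels={"en": "influenza", "ja": "インフルエンザ"},
    external_links={"MeSH": "<placeholder>"},
)
```

Looking up `influenza.label("ja")` returns the Japanese label, while a language with no entry falls back to the English one.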
For further information about the BioCaster project and the BioCaster ontology, please see http://biocaster.nii.ac.jp