NLP and Data Modeling

Listed below are tools developed for NLP and Data Modeling.

Click on the name in the left column to download or access the tools.

Name Description

iDASH Virtual Machine of Clinical NLP Tools

(Download and install VirtualBox)

The NLP virtual machine (VM) is a downloadable VirtualBox image with a suite of open-source NLP tools preinstalled. These tools can be hard for a typical user (e.g., a doctor) to install and configure. The VM also includes guides to help users get started with the installed tools. The NLP tools included are eHOST, cTAKES, and the Automated Retrieval Console (ARC).

Platform: The VM runs on Oracle VirtualBox, which is available for Windows, macOS, and Linux hosts.

PhenDisco (Phenotype Discoverer)

PhenDisco provides a more user-friendly and robust way to search dbGaP for studies that contain phenotypes of interest. PhenDisco supports concept-based searches by expanding search terms to keywords and child concepts, using a concept ontology based on the Unified Medical Language System (UMLS). Its advanced search menu provides structured search options for more focused and precise searches. To make the search results easier to review, PhenDisco displays the records in order of relevance and highlights the keywords in each record. Users can also customize the results display to show the metadata they are interested in.
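The concept-based search described above can be illustrated with a minimal Python sketch. This is not PhenDisco's implementation: the toy ontology, the function names `expand_term` and `search`, the match-count relevance score, and the square-bracket highlighting are all assumptions made for illustration. A real system would draw synonyms and child concepts from the UMLS and would likely use a more refined ranking.

```python
import re

# Hypothetical toy ontology (NOT the UMLS): each concept maps to
# its synonyms (keywords) and its child concepts.
TOY_ONTOLOGY = {
    "diabetes": {
        "synonyms": ["diabetes mellitus"],
        "children": ["type 1 diabetes", "type 2 diabetes"],
    },
    "type 2 diabetes": {"synonyms": ["t2dm"], "children": []},
}


def expand_term(term, ontology):
    """Return the term plus its synonyms and all descendant concepts."""
    expanded = []
    seen = set()
    stack = [term.lower()]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        expanded.append(current)
        entry = ontology.get(current, {})
        for synonym in entry.get("synonyms", []):
            if synonym not in seen:
                seen.add(synonym)
                expanded.append(synonym)
        # Child concepts are expanded in turn.
        stack.extend(entry.get("children", []))
    return expanded


def search(term, records, ontology):
    """Return matching records, most matches first, with matches in [brackets]."""
    terms = expand_term(term, ontology)
    # Try longer terms first so "diabetes mellitus" wins over "diabetes".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    hits = []
    for record in records:
        count = len(pattern.findall(record))
        if count:
            highlighted = pattern.sub(lambda m: f"[{m.group(0)}]", record)
            hits.append((count, highlighted))
    # Assumed relevance score: number of matched terms in the record.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [text for _, text in hits]


if __name__ == "__main__":
    records = [
        "Study of T2DM and diabetes mellitus outcomes",
        "Asthma cohort",
        "Type 1 diabetes in children",
    ]
    for line in search("diabetes", records, TOY_ONTOLOGY):
        print(line)
```

In this sketch, a search for "diabetes" also retrieves records that mention only a synonym or a child concept, which is the practical benefit of concept-based expansion over plain keyword matching.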

Platform: Available as a web service. Written in PHP, Python, Java, and JavaScript.

Tutorial: Click “Download Manual” after login (no registration required).


Citation: Doan S, Lin KW, Conway M, Ohno-Machado L, Hsieh A, Feupe SF, Garland A, Ross MK, Jiang X, Farzaneh S, Walker R, Alipanah N, Zhang J, Xu H, Kim HE. "PhenDisco: Phenotype Discovery System for the Database of Genotypes and Phenotypes (dbGaP)." J Am Med Inform Assoc 2014;21:31-36. doi:10.1136/amiajnl-2013-001882.