Article ID Journal Published Year Pages File Type
10356085 Journal of Biomedical Informatics 2012 11 Pages PDF
Abstract
► Annotated documents are necessary for NLP machine learning, modeling and testing. ► We create a method to determine a required sample size for the annotation set. ► The probability of word capture from a corpus provides the basis for the method. ► Dictation letters from a pain management medical practice are used as an example. ► We also demonstrate steps for creating a representative sample of dictations.
Related Topics
Physical Sciences and Engineering Computer Science Computer Science Applications
Authors
,