Abstract
This chapter presents computational methods for text analysis and text classification, covering both rule-based methods and machine learning-based methods, the latter including unsupervised and supervised approaches.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2018 The Author(s)
About this chapter
Cite this chapter
Dalianis, H. (2018). Computational Methods for Text Analysis and Text Classification. In: Clinical Text Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-78503-5_8
DOI: https://doi.org/10.1007/978-3-319-78503-5_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78502-8
Online ISBN: 978-3-319-78503-5
eBook Packages: Computer Science, Computer Science (R0)