Assessing the Heterogeneity of Complaints Related to Tinnitus and Hyperacusis from an Unsupervised Machine Learning Approach: An Exploratory Study

Palacios, Guillaume; Noreña, Arnaud; Londero, Alain

doi:10.1159/000504741

Journal Article

Citation	Palacios G, Noreña A, Londero A. Audiol. Neurootol. 2020; 25(4): 174-189.
Copyright	(Copyright © 2020, Karger Publishers)
DOI	10.1159/000504741
PMID	32062654
Abstract	INTRODUCTION: Subjective tinnitus (ST) and hyperacusis (HA) are common auditory symptoms that may become incapacitating in a subgroup of patients, who then seek medical advice. Both conditions can result from many different mechanisms. As a consequence, patients may report a vast repertoire of associated symptoms and comorbidities that can dramatically reduce quality of life and even lead to suicide attempts in the most severe cases. This exploratory study investigates patients' symptoms and complaints through an in-depth statistical analysis of their natural narratives in a real-life environment where, thanks to the anonymity of contributions and to peer-to-peer interaction, the wording is assumed to be free of self-limitation and self-censorship.
METHODS: We applied a purely statistical, unsupervised machine learning approach to the analysis of verbatim messages exchanged by patients on an Internet forum. After automated data extraction, the dataset was preprocessed to make it suitable for statistical analysis. We used a variant of the Latent Dirichlet Allocation (LDA) algorithm to reveal clusters of symptoms and complaints of HA patients (topics). Each topic is uniquely characterized by its probability distribution over words. The log-likelihood of the LDA model converged after 2,000 iterations. Several statistical parameters were tested for topic modeling and for the word relevance factor within each topic.
RESULTS: Despite a rather small dataset, this exploratory study demonstrates that patients' free narratives available on the Internet constitute valuable material for machine learning and statistical analysis aimed at categorizing ST/HA complaints. The LDA model with K = 15 topics appears to be the most relevant in terms of relative weights and correlations, and in its capability to individualize subgroups of patients displaying specific characteristics. The study of the relevance factor may help unveil weak but important signals present in patients' narratives.
DISCUSSION/CONCLUSION: We claim that the unsupervised LDA approach makes it possible to gain knowledge of the patterns of ST- and HA-related complaints and of patient-centered domains of interest. The merits and limitations of LDA algorithms are compared with other natural language processing methods and with more conventional methods of qualitative analysis of patients' output. Future directions and research topics emerging from this innovative algorithmic analysis are proposed.
Language	en
Keywords	Humans; Internet; Quality of Life; Natural Language Processing; Data Mining; Tinnitus; Statistics as Topic; Machine learning; Machine Learning; Text mining; Unsupervised Machine Learning; Computational predictive medicine; Hyperacusis
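The methods summary mentions an LDA topic model, a log-likelihood convergence check, and a per-topic word relevance factor. The sketch below illustrates those three ideas on a toy corpus. It is not the authors' implementation: the abstract does not say which LDA variant was used, so this swaps in plain collapsed Gibbs sampling (Griffiths and Steyvers) and tracks log p(w | z) per sweep as the convergence signal. It also assumes that "relevance" follows the Sievert and Shirley (LDAvis) definition, r = λ·log φ(k,w) + (1 − λ)·log(φ(k,w) / p(w)). The function names, the toy vocabulary, K = 2, 50 iterations, and λ = 0.6 are all hypothetical choices. The study itself reports K = 15 and 2,000 iterations.

```python
import math
import random
from collections import Counter


def log_p_w_given_z(nkw, nk, V, beta):
    """Log-likelihood log p(w | z) of the current topic assignments (Griffiths & Steyvers)."""
    ll = len(nk) * (math.lgamma(V * beta) - V * math.lgamma(beta))
    for k in range(len(nk)):
        ll += sum(math.lgamma(c + beta) for c in nkw[k]) - math.lgamma(nk[k] + V * beta)
    return ll


def gibbs_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for plain LDA (an assumed stand-in for the paper's variant).

    docs: list of documents, each a list of integer word ids in range(V).
    Returns (phi, log_liks): topic-word distributions and one log-likelihood per sweep.
    """
    rng = random.Random(seed)
    V = 1 + max(w for doc in docs for w in doc)
    K = n_topics
    ndk = [[0] * K for _ in docs]       # tokens assigned to topic k in document d
    nkw = [[0] * V for _ in range(K)]   # occurrences of word w assigned to topic k
    nk = [0] * K                        # total tokens assigned to topic k
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initial assignment
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    log_liks = []
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]             # remove the token from the counts
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k             # add the token back under its new topic
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
        log_liks.append(log_p_w_given_z(nkw, nk, V, beta))  # monitor convergence

    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)] for k in range(K)]
    return phi, log_liks


def relevance(phi, word_freq, lam=0.6):
    """LDAvis-style relevance (assumed definition); every word must have word_freq > 0."""
    total = sum(word_freq)
    return [[lam * math.log(p) + (1 - lam) * math.log(p / (word_freq[w] / total))
             for w, p in enumerate(row)] for row in phi]


# Hypothetical toy corpus: word ids index into `vocab`.
vocab = ["ringing", "ear", "sleep", "noise", "pain", "loud"]
docs = [[0, 1, 2, 0], [3, 4, 5, 4], [0, 2, 1], [5, 3, 4]]
phi, lls = gibbs_lda(docs, n_topics=2, n_iter=50)
counts = Counter(w for doc in docs for w in doc)
freq = [counts[w] for w in range(len(vocab))]
rel = relevance(phi, freq, lam=0.6)
for k, row in enumerate(rel):
    top = sorted(range(len(vocab)), key=lambda w: -row[w])[:3]
    print(k, [vocab[w] for w in top])
```

Plotting `lls` against the sweep index is one simple way to judge when the sampler has stabilized. Lowering λ favors words that are distinctive for a topic over words that are merely frequent overall. This matches the abstract's point that relevance can surface weak but important signals.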