Comparative analysis of supervised and unsupervised learning algorithms for online user content suicidal ideation detection

Narynov, S.; Mukhtarkhanuly, D.; Kerimov, I.; Omarov, B.

Journal Article

Citation	Narynov S, Mukhtarkhanuly D, Kerimov I, Omarov B. J. Theor. Appl. Inf. Technol. 2019; 97(22): 3304-3317.
Copyright	(Copyright © 2019, JATIT)
DOI	unavailable
PMID	unavailable
Abstract	Suicide is one of the leading causes of death in most countries around the world and one of the three most common causes of death among young people (15-24 years old), but so far no methods have been developed for diagnosing suicidal tendencies. The problem of developing methods for identifying people prone to suicidal behavior is therefore becoming especially topical. One direction of such research is the search for typological features of suicide-related speech using the methods of mathematical linguistics, automatic text processing, and machine learning. In international research, texts written by people who were motivated by suicide (mainly suicide notes) are studied using natural language processing and machine learning methods, and models are constructed to classify whether a text is related to suicide or not. It seems clear that, in order to develop methods for identifying people who are prone to suicide, it is necessary to analyze not only suicide notes (which are usually short texts) but also other texts created by people who have committed suicide. The purpose of this work is to build a machine learning model, apply supervised and unsupervised learning methods, and then use comparative analysis to select the most efficient algorithm for classifying whether a text is connected to suicide. Our research contributes to the detection of depressive content that can lead to suicide and to helping such people reach help from psychologists at the national suicide prevention center in Kazakhstan. The highest result, an F1-score of 95%, was obtained by Random Forest (supervised) with the tf-idf vectorization model; in conclusion, K-means (unsupervised) with tf-idf also shows impressive results, only 4% lower in F1-score and precision. © 2005 - ongoing JATIT & LLS. Language: en
Keywords	Machine Learning; K-means; Random Forest; Sentiment Analysis; Suicidal Ideation Detection
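
The abstract compares a supervised classifier and K-means on precision and F1-score. Scoring a clustering this way requires mapping cluster ids onto class labels first, and the record does not say how the authors did that. The sketch below assumes a simple majority-vote mapping, where the true labels are used only for scoring and not for fitting the clusters. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def map_clusters_to_labels(cluster_ids, true_labels):
    """Give each cluster the most frequent true label among its members."""
    votes = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        votes[cluster][label] += 1
    return {cluster: counts.most_common(1)[0][0]
            for cluster, counts in votes.items()}


def precision_recall_f1(true_labels, predicted_labels, positive=1):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: 1 = suicide-related text, 0 = other text.
true_labels = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical cluster assignments, e.g. from K-means with two clusters.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 0]

mapping = map_clusters_to_labels(cluster_ids, true_labels)
predicted = [mapping[c] for c in cluster_ids]
precision, recall, f1 = precision_recall_f1(true_labels, predicted)
print(mapping, precision, recall, f1)
```

The same `precision_recall_f1` function can be applied directly to the predictions of a supervised model, so both approaches are scored on the same terms.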