Characteristics of multiclass suicide risks Tweets through feature extraction and machine learning techniques

Lim, Y.Q.; Loo, Y.L.

doi:10.30630/joiv.7.4.2284

Journal Article

Characteristics of multiclass suicide risks Tweets through feature extraction and machine learning techniques
Citation	Lim YQ, Loo YL. JOIV Int. J. Inform. Vis. 2023; 7(4): 2297-2305.
Copyright	(Copyright © 2023, Society of Visual Informatics, and Institute of Visual Informatics)
DOI	10.30630/joiv.7.4.2284
PMID	unavailable
Abstract	This paper analyzes the linguistic characteristics associated with specific levels of suicide risk and examines how feature extraction techniques affect the effectiveness of predictive models of suicide ideation. Much prior research has addressed the detection of suicide ideation in social media posts through feature extraction and machine learning, but little has addressed the multiclass classification of suicide risk or the impact of linguistic characteristics on predictability. To address this gap, the paper proposes a machine learning framework that performs multiclass classification of suicide risk from social media posts, with an extended analysis of the linguistic characteristics that contribute to suicide risk detection. A supervised dataset of 552 Twitter posts was manually annotated for suicide risk modeling. Features were extracted by combining term frequency-inverse document frequency (TF-IDF), Part-of-Speech (PoS) tagging, and the valence-aware dictionary for sentiment reasoning (VADER). Models were trained with the Random Forest technique. Testing on 138 samples, with scenarios of detection in real-time data, yielded 86.23% accuracy, 86.71% precision, and 86.23% recall, an improvement obtained by combining feature extraction techniques rather than data modeling techniques. The extended analysis of linguistic characteristics showed that a sentence's context is the main contributor to suicide risk classification accuracy, whereas grammatical tags and strong conclusive terms are not. © 2023, Politeknik Negeri Padang. All rights reserved. Language: en
Keywords	machine learning; sentiment analysis; feature extraction; suicide ideation detection; Multiclass suicide risks
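
The abstract describes combining TF-IDF vectors with additional linguistic features before training a Random Forest classifier. The following is a minimal sketch of that general pattern using scikit-learn, not the authors' implementation. The toy posts, the three risk labels, the small word lists, and the helper names `extra_features` and `build_matrix` are all hypothetical. The word-list ratios are simple stand-ins for PoS-tag and VADER features, which in practice would come from a tagger and a sentiment lexicon.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical toy posts and risk labels; not taken from the paper's dataset.
train_texts = [
    "i feel hopeless and alone tonight",
    "nobody would miss me i am worthless",
    "so tired of everything lately",
    "work was rough and i feel tired",
    "had a great day with my friends",
    "i love this song so much fun",
]
train_labels = ["high", "high", "medium", "medium", "low", "low"]
test_texts = ["i am alone and hopeless", "what a fun and happy day"]
test_labels = ["high", "low"]

# Placeholder word lists standing in for PoS tags and a sentiment lexicon.
PRONOUNS = {"i", "me", "my", "myself"}
NEGATIVE = {"hopeless", "alone", "tired", "worthless"}
POSITIVE = {"happy", "great", "love", "fun"}


def extra_features(texts):
    """Return one row per text: pronoun, negative-word and positive-word ratios."""
    rows = []
    for text in texts:
        tokens = [t.strip(".,!?") for t in text.lower().split()]
        n = max(len(tokens), 1)
        rows.append([
            sum(t in PRONOUNS for t in tokens) / n,
            sum(t in NEGATIVE for t in tokens) / n,
            sum(t in POSITIVE for t in tokens) / n,
        ])
    return np.array(rows, dtype=float)


def build_matrix(vectorizer, texts, fit=False):
    """Stack TF-IDF columns and the extra feature columns side by side."""
    tfidf = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
    return hstack([tfidf, csr_matrix(extra_features(texts))]).tocsr()


vectorizer = TfidfVectorizer()
X_train = build_matrix(vectorizer, train_texts, fit=True)
X_test = build_matrix(vectorizer, test_texts)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, train_labels)
pred = model.predict(X_test)

# Weighted averaging is an assumption; the abstract does not state the scheme.
accuracy = accuracy_score(test_labels, pred)
precision = precision_score(test_labels, pred, average="weighted", zero_division=0)
recall = recall_score(test_labels, pred, average="weighted", zero_division=0)
print(accuracy, precision, recall)
```

Under support-weighted averaging, multiclass recall always equals accuracy, which would be consistent with the identical 86.23% accuracy and recall figures in the abstract, although the abstract does not say which averaging was used. If accuracy was computed over all 138 test samples, 86.23% corresponds to 119 correct predictions (119/138 ≈ 0.8623).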