SAFETYLIT WEEKLY UPDATE

Journal Article

Citation

Diaz M, Panangadan A. IEEE Int Conf Inf Reuse Integr Data Sci 2020; 2020: 259-264.

Copyright

(Copyright © 2020, IEEE (Institute of Electrical and Electronics Engineers))

DOI

10.1109/iri49571.2020.00044

PMID

34853666

PMCID

PMC8631306

Abstract

From 21st conference (2020)

There is increasing interest in automatically identifying advertisements related to sex trafficking on online review sites. The main challenge is to identify the changing patterns in text reviews that are used to indicate illegal businesses. This work describes a novel means of identifying illegal business advertisements using natural language processing and machine learning. The method relies on building a training set of reviews of known illegal businesses. This training data is created by integrating a small, high-precision set of known illegal businesses (Rubmaps) with a large collection of online reviews from a general-purpose review site (Yelp). Standard natural language pre-processing techniques are then applied to the text reviews, which are converted into a bag-of-words model with term frequency-inverse document frequency (TF-IDF) weighting. The resulting document-term matrix is used to train a classifier, which is then used to identify suspicious activity in the remaining reviews. This approach therefore leverages a high-precision, low-recall dataset to identify relevant instances in a large low-precision, high-recall dataset. The approach was evaluated on a collection of 456,050 reviews from the Yelp online forum with a variety of machine learning algorithms and different numbers of text features. The method achieved an F1-score of 0.77 with a random forest classifier. The number of text features could also be reduced from 1,473 to 447 for a compact classifier with only a small drop in accuracy.
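
The labelling and weighting steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy reviews, the business names, the stopword list, and the helper names (normalize_name, tokenize, label_reviews, tfidf_matrix) are all hypothetical, exact name matching stands in for whatever record linkage the paper uses, and the unsmoothed weight tf × log(N/df) is only one common TF-IDF variant.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: (business_name, review_text) pairs standing in for
# a large general-purpose review collection.
reviews = [
    ("Lotus Spa", "Great massage, very discreet staff and private rooms"),
    ("Lotus Spa", "Private rooms, discreet entrance, cash only"),
    ("Corner Cafe", "Great coffee and friendly staff"),
    ("City Gym", "Clean gym with friendly staff and great classes"),
]

# Hypothetical small, high-precision list of known illegal businesses.
known_illegal = {"lotus spa"}

# Assumed stopword list, for illustration only.
STOPWORDS = {"a", "and", "the", "with", "very", "of", "to"}


def normalize_name(name):
    """Lowercase a business name and drop punctuation before matching."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and remove stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def label_reviews(reviews, known_illegal):
    """Label a review 1 if its business is on the high-precision list, else 0."""
    return [1 if normalize_name(name) in known_illegal else 0 for name, _ in reviews]


def tfidf_matrix(docs):
    """Build a dense document-term matrix with unsmoothed TF-IDF weights."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        matrix.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, matrix


labels = label_reviews(reviews, known_illegal)
vocab, X = tfidf_matrix([text for _, text in reviews])
```

Under these assumptions, X and labels could then be passed to any standard classifier, such as a random forest, and the trained model applied to reviews of businesses that are not on the high-precision list.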

Keywords: Human trafficking


Language: en

Keywords

machine learning; record linkage; sex trafficking
