An accurate violence detection framework using unsupervised spatial-temporal action translation network