Framework for performance evaluation of face, text, and vehicle detection and tracking in video: data, metrics, and protocol