Audio-coupled video content understanding of unconstrained video sequences