Multi-modal Data-efficient Learning for 3D Machine Vision