Predicting Desired Temporal Waypoints From Camera And Route Planner Images Using End-To-Mid Imitation Learning