Learning and exploiting reward machines for reinforcement learning