ENHANCING POLICY OPTIMIZATION FOR IMPROVED SAMPLE EFFICIENCY AND GENERALIZATION IN DEEP REINFORCEMENT LEARNING