Vanishing Gradient Problem in Training Neural Networks