find some problem in DQN and DDPG : the stock is a Markov process instead of a decision Markov process. implement the recurrent reinforcement learning algorithm.
do some experiments with DRRL.