LONG SHORT TERM MEMORY CELLS

Riccardo Viviano edited this page Apr 29, 2021 · 8 revisions

Here are some tips for using a recurrent model (rmodel):

  • Each LSTM cell computes the following calculations:

[Figure: LSTM cell equations for the f, i, o, c gates]

  • Input dimension = dimension of x; output dimension = dimension of c, h.

  • After a backpropagation pass you can receive as output the tensor dfioc: the partial derivatives, with respect to f, i, o and c of the picture above, computed for the first horizontal LSTM cells of the rmodel.

  • If you are using the rmodel in test mode and you applied dropout correctly during training, remember that the h_t and c_t passed to the feed-forward function have had entries deleted according to the dropout function. For this reason you have to shift the h_t and c_t arrays according to the dropout used for the LSTM.

  • Group normalization, my advice: if you have a many-to-many model, don't use group normalization. If you have many-to-one, you can use it, but not on the final layers: when group normalization is applied to the final layers of a many-to-one model, the backpropagated error seems to make training unstable.
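As a reference for the gate calculations mentioned above, here is a minimal sketch of the standard LSTM cell equations in Python. It uses scalar values for readability; in the real cells every quantity is a vector, with the input dimension equal to dim(x) and the output dimension equal to dim(c, h). The function name and the dict-based parameter layout are illustrative, not the rmodel API.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One standard LSTM step (scalar sketch, not the rmodel API).
    W, U, b are dicts keyed by gate name: 'f', 'i', 'o', 'c'."""
    f = sigmoid(W['f'] * x + U['f'] * h_prev + b['f'])          # forget gate
    i = sigmoid(W['i'] * x + U['i'] * h_prev + b['i'])          # input gate
    o = sigmoid(W['o'] * x + U['o'] * h_prev + b['o'])          # output gate
    c_tilde = math.tanh(W['c'] * x + U['c'] * h_prev + b['c'])  # candidate cell
    c = f * c_prev + i * c_tilde                                # new cell state
    h = o * math.tanh(c)                                        # new hidden state
    return h, c, (f, i, o, c_tilde)
```

The tensor dfioc described above would hold the partial derivatives with respect to f, i, o and c_tilde from a step like this one.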
