Recurrent Neural Networks
- Suppose your training examples are sentences (sequences of words). Which of the following refers to the $j$-th word in the $i$-th training example? Answer: $x^{(i)<j>}$.
- Consider this RNN:
- This specific type of architecture is appropriate when: $T_x = T_y$ (a many-to-many architecture where input and output lengths match).
- To which of these tasks would you apply a many-to-one RNN architecture? (Check all that apply.) Answer: sentiment classification; gender recognition from speech.
- Using this as the training model below, answer the following:
- True/False: At the $t$-th time step the RNN is estimating $P(y^{<t>} \mid y^{<1>}, y^{<2>}, \ldots, y^{<t-1>})$. (True)
Explanation: in a training model we try to predict the next step based on the knowledge of all prior steps.
- You have finished training a language model RNN and are using it to sample random sentences, as follows:
- What are you doing at each time step $t$?
  - (i) Use the probabilities output by the RNN to pick the highest probability word for that time-step as $\hat{y}^{<t>}$. (ii) Then pass the ground-truth word from the training set to the next time-step.
  - (i) Use the probabilities output by the RNN to randomly sample a chosen word for that time-step as $\hat{y}^{<t>}$. (ii) Then pass the ground-truth word from the training set to the next time-step.
  - (i) Use the probabilities output by the RNN to pick the highest probability word for that time-step as $\hat{y}^{<t>}$. (ii) Then pass this selected word to the next time-step.
  - (i) Use the probabilities output by the RNN to randomly sample a chosen word for that time-step as $\hat{y}^{<t>}$. (ii) Then pass this selected word to the next time-step. (Correct; see the sketch below.)
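A minimal NumPy sketch of the correct procedure, assuming a trained model exposed through a hypothetical `forward_step(x, a_prev)` that returns the softmax distribution over the vocabulary and the next hidden state (`vocab`, `eos_idx`, and `max_len` are likewise illustrative):

```python
import numpy as np

def sample_sentence(forward_step, a0, vocab, eos_idx, max_len=20, seed=0):
    """Sample a random sentence from a trained RNN language model."""
    rng = np.random.default_rng(seed)
    x = np.zeros(len(vocab))          # first input: zero vector (no previous word)
    a_prev = a0
    words = []
    for _ in range(max_len):
        y_probs, a_prev = forward_step(x, a_prev)   # softmax over the vocabulary
        # (i) randomly sample a word index according to the output probabilities
        idx = rng.choice(len(vocab), p=y_probs)
        if idx == eos_idx:            # stop at the end-of-sentence token
            break
        words.append(vocab[idx])
        # (ii) pass the *sampled* word (as a one-hot vector) to the next time step
        x = np.zeros(len(vocab))
        x[idx] = 1.0
    return " ".join(words)
```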
- True/False: If you are training an RNN model, and find that your weights and activations are all taking on the value of NaN (“Not a Number”), then you have an exploding gradient problem. (True)
Explanation: Exploding gradients happen when large error gradients accumulate and result in very large updates to the NN model weights during training. These weights can become too large and cause an overflow, which shows up as NaN.
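A common guard against this is gradient clipping before the weight update; a minimal sketch (the threshold of 5.0 and the dictionary layout of the gradients are illustrative assumptions):

```python
import numpy as np

def clip_gradients(gradients, max_value=5.0):
    """Clip every gradient elementwise to [-max_value, max_value] so one
    large error gradient cannot blow up the update and produce NaNs."""
    return {name: np.clip(g, -max_value, max_value)
            for name, g in gradients.items()}

# Usage: grads = {"dWax": ..., "dWaa": ..., "dba": ...}
# grads = clip_gradients(grads, max_value=5.0)
```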
- Suppose you are training an LSTM. You have a 10000-word vocabulary, and are using an LSTM with 100-dimensional activations $a^{<t>}$. What is the dimension of $\Gamma_u$ at each time step? Answer: 100.
Explanation: each gate is a vector with the same dimension as the hidden activation $a^{<t>}$, so $\Gamma_u$ is 100-dimensional.
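To see where the 100 comes from, here is a quick shape check of the gate computation $\Gamma_u = \sigma(W_u[a^{<t-1>}, x^{<t>}] + b_u)$ (the weight names follow the course convention; the random values are placeholders):

```python
import numpy as np

n_a, n_x = 100, 10000                  # activation size, vocabulary size
Wu = np.random.randn(n_a, n_a + n_x)   # gate weights act on [a_prev; x_t]
bu = np.zeros((n_a, 1))
a_prev = np.random.randn(n_a, 1)
x_t = np.random.randn(n_x, 1)

concat = np.vstack([a_prev, x_t])                # shape (10100, 1)
gamma_u = 1 / (1 + np.exp(-(Wu @ concat + bu)))  # sigmoid gate
print(gamma_u.shape)                             # (100, 1): same as the activation
```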
- True/False: In order to simplify the GRU without vanishing gradient problems even when training on very long sequences, you should remove $\Gamma_r$, i.e., set $\Gamma_r = 1$ always. (True)
Explanation: If $\Gamma_u \approx 0$ for a timestep, the gradient can propagate back through that timestep without much decay. For the signal to backpropagate without vanishing, we need $c^{<t>}$ to be highly dependent on $c^{<t-1>}$.
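For reference, the full GRU equations as given in the course; with $\Gamma_u \approx 0$ the last line reduces to $c^{<t>} \approx c^{<t-1>}$, which is exactly the strong dependence the explanation calls for:

$$
\begin{aligned}
\tilde{c}^{<t>} &= \tanh\big(W_c[\Gamma_r * c^{<t-1>},\ x^{<t>}] + b_c\big) \\
\Gamma_u &= \sigma\big(W_u[c^{<t-1>},\ x^{<t>}] + b_u\big) \\
\Gamma_r &= \sigma\big(W_r[c^{<t-1>},\ x^{<t>}] + b_r\big) \\
c^{<t>} &= \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}
\end{aligned}
$$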
- True/False: Using the equations for the GRU and LSTM below, the Update Gate and Forget Gate in the LSTM play a role similar to $1 - \Gamma_u$ and $\Gamma_u$. (False)
Explanation: Instead of using $\Gamma_u$ to compute $1 - \Gamma_u$, the LSTM uses two separate gates ($\Gamma_u$ and $\Gamma_f$) to compute the final value of the hidden state, with $\Gamma_f$ used in place of $1 - \Gamma_u$. So the Update Gate plays the role of $\Gamma_u$ and the Forget Gate plays the role of $1 - \Gamma_u$, not the reverse.
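Placed side by side, the two memory-cell updates make the correspondence explicit: the Update Gate lines up with $\Gamma_u$ in both models, while the Forget Gate $\Gamma_f$ takes over the role that $1 - \Gamma_u$ plays in the GRU:

$$
\begin{aligned}
\text{GRU:} \quad c^{<t>} &= \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>} \\
\text{LSTM:} \quad c^{<t>} &= \Gamma_u * \tilde{c}^{<t>} + \Gamma_f * c^{<t-1>}
\end{aligned}
$$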
- You have a pet dog whose mood is heavily dependent on the current and past few days’ weather. You’ve collected data for the past 365 days on the weather, which you represent as a sequence $x^{<1>}, \ldots, x^{<365>}$. You’ve also collected data on your dog’s mood, which you represent as $y^{<1>}, \ldots, y^{<365>}$. You’d like to build a model to map from $x \rightarrow y$. Should you use a Unidirectional RNN or Bidirectional RNN for this problem?
  - Unidirectional RNN, because the value of $y^{<t>}$ depends only on $x^{<1>}, \ldots, x^{<t>}$, but not on $x^{<t+1>}, \ldots, x^{<365>}$. (Correct; see the forward-pass sketch below.)
  - Unidirectional RNN, because the value of $y^{<t>}$ depends only on $x^{<t>}$, and not other days’ weather.
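A minimal forward-pass sketch that makes the dependency structure explicit: the prediction for day $t$ is computed from a hidden state that has only ever seen $x^{<1>}, \ldots, x^{<t>}$ (the weight names and shapes are illustrative):

```python
import numpy as np

def unidirectional_rnn_forward(X, Waa, Wax, Wya, ba, by):
    """X: list of column vectors x<1>..x<T>. Returns a prediction per day.

    y_hat[t] is computed from a[t], which depends only on x[1..t];
    information never flows backward from future inputs.
    """
    a = np.zeros((Waa.shape[0], 1))       # a<0> = 0
    y_hats = []
    for x_t in X:                         # strictly left-to-right
        a = np.tanh(Waa @ a + Wax @ x_t + ba)
        y_hats.append(Wya @ a + by)       # mood prediction for this day
    return y_hats
```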