- Consider using this encoder-decoder model for machine translation.
- True/False: This model is a “conditional language model” in the sense that the encoder portion (shown in green) is modeling the probability of the input sentence $x$.
Explanation: The encoder-decoder model for machine translation models the probability of the output sentence $y$ conditioned on the input sentence $x$. The encoder portion is shown in green, while the decoder portion is shown in purple.
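In symbols: a plain language model estimates $P(y^{\langle 1 \rangle}, \ldots, y^{\langle T_y \rangle})$, while this conditional language model factorizes the output probability given the encoding of the input sentence:

$$
P(y^{\langle 1 \rangle}, \ldots, y^{\langle T_y \rangle} \mid x) \;=\; \prod_{t=1}^{T_y} P\!\left(y^{\langle t \rangle} \mid x,\, y^{\langle 1 \rangle}, \ldots, y^{\langle t-1 \rangle}\right)
$$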
- In beam search, if you increase the beam width B, which of the following would you expect to be true?
- Beam search will generally find better solutions (i.e. do a better job maximizing $P(y \mid x)$).
Explanation: As the beam width increases, beam search runs more slowly, uses more memory, and converges after more steps, but generally finds better solutions.
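For intuition, here is a minimal beam-search sketch in Python; `next_token_log_probs` is a hypothetical stand-in for the decoder (mapping a prefix to per-token log-probabilities), not code from the course:

```python
def beam_search(next_token_log_probs, B, max_len, eos="</s>"):
    """Keep the B highest-scoring partial hypotheses at every step."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    completed = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, lp in next_token_log_probs(prefix).items():
                candidates.append((prefix + (tok,), score + lp))
        # A larger B keeps more candidates alive: slower, more memory,
        # more steps to converge, but less likely to discard the best one.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:B]:
            (completed if prefix[-1] == eos else beams).append((prefix, score))
        if not beams:
            break
    return max(completed + beams, key=lambda c: c[1])
```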
- In machine translation, if we carry out beam search without using sentence normalization, the algorithm will tend to output overly short translations.
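To see why: beam search maximizes a product of per-token probabilities, each less than 1, so every additional token can only lower the score, and shorter hypotheses win by default. The length-normalized objective divides the log-likelihood by $T_y^{\alpha}$ (with $0 \le \alpha \le 1$; $\alpha \approx 0.7$ is a common heuristic), removing this bias:

$$
\arg\max_{y} \; \frac{1}{T_y^{\alpha}} \sum_{t=1}^{T_y} \log P\!\left(y^{\langle t \rangle} \mid x,\, y^{\langle 1 \rangle}, \ldots, y^{\langle t-1 \rangle}\right)
$$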
- Suppose you are building a speech recognition system, which uses an RNN model to map from audio clip $x$ to a text transcript $y$. Your algorithm uses beam search to try to find the value of $y$ that maximizes $P(y \mid x)$. On a dev set example, given an input audio clip, your algorithm outputs the transcript $\hat{y}$ = “I’m building an A Eye system in Silly con Valley.”, whereas a human gives a much superior transcript $y^*$ = “I’m building an AI system in Silicon Valley.” According to your model, $P(\hat{y} \mid x) = 1.09 \times 10^{-7}$ and $P(y^* \mid x) = 7.21 \times 10^{-8}$. Would you expect increasing the beam width B to help correct this example?
- Yes, because $P(y^* \mid x) \leq P(\hat{y} \mid x)$ indicates the error should be attributed to the RNN rather than to the search algorithm.
- Yes, because $P(y^* \mid x) \leq P(\hat{y} \mid x)$ indicates the error should be attributed to the search algorithm rather than to the RNN.
- No, because $P(y^* \mid x) \leq P(\hat{y} \mid x)$ indicates the error should be attributed to the RNN rather than to the search algorithm.
- No, because $P(y^* \mid x) \leq P(\hat{y} \mid x)$ indicates the error should be attributed to the search algorithm rather than to the RNN.
- Continuing the example from Q4, suppose you work on your algorithm for a few more weeks, and now find that for the vast majority of examples on which your algorithm makes a mistake, $P(y^* \mid x) > P(\hat{y} \mid x)$. This suggests you should focus your attention on improving the search algorithm.
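The decision rule behind Q4 and Q5 can be written down directly; here is a minimal sketch (function and variable names are mine, not from the quiz):

```python
def attribute_error(log_p_human, log_p_beam):
    """Compare log P(y*|x) against log P(y-hat|x) for one dev-set mistake.

    If the model scores the human transcript y* higher, beam search failed
    to find it, so the search is at fault; otherwise the RNN itself assigned
    the worse transcript a higher probability, so the model is at fault.
    """
    return "search algorithm" if log_p_human > log_p_beam else "RNN"
```

Tallying this verdict over the dev set shows which component dominates the errors; in Q5, where $P(y^* \mid x) > P(\hat{y} \mid x)$ for most mistakes, the tally points at the search algorithm, so a larger beam width is a reasonable next step.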
- Consider the attention model for machine translation.
- Which of the following statements about $\alpha^{\langle t,t' \rangle}$ are true? Check all that apply.
- We expect $\alpha^{\langle t,t' \rangle}$ to be generally larger for values of $a^{\langle t' \rangle}$ that are highly relevant to the value the network should output for $y^{\langle t \rangle}$. (Note the indices in the superscripts.) $\alpha^{\langle t,t' \rangle}$ is equal to the amount of attention $y^{\langle t \rangle}$ should pay to $a^{\langle t' \rangle}$.
- This should not be $\sum_{t} \alpha^{\langle t,t' \rangle} = 1$; the attention weights are normalized over the input positions, so $\sum_{t'} \alpha^{\langle t,t' \rangle} = 1$.
- The network learns where to “pay attention” by learning the values $e^{\langle t,t' \rangle}$, which are computed using a small neural network. We can replace $s^{\langle t-1 \rangle}$ with $s^{\langle t \rangle}$ as an input to this neural network because $s^{\langle t \rangle}$ is independent of $\alpha^{\langle t,t' \rangle}$ and $e^{\langle t,t' \rangle}$.
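A numpy sketch of this computation may help (the one-hidden-layer scoring network and all sizes here are illustrative assumptions, not the course's exact architecture). The softmax over $t'$ is what makes the weights for each output step sum to 1, and note that the input is $s^{\langle t-1 \rangle}$: $s^{\langle t \rangle}$ is not yet available when $e^{\langle t,t' \rangle}$ must be computed, which is why it cannot be swapped in.

```python
import numpy as np

rng = np.random.default_rng(0)
Tx, n_a, n_s = 6, 8, 10            # input length, encoder/decoder state sizes (illustrative)
a = rng.normal(size=(Tx, n_a))     # encoder activations a<t'>
s_prev = rng.normal(size=n_s)      # decoder state s<t-1>

# Small scoring network: e<t,t'> = w2 . tanh(W1 [s<t-1>; a<t'>]).
# s<t> cannot be used here: it is computed only after alpha<t,t'> (and hence
# e<t,t'>) is known.
W1 = rng.normal(size=(16, n_s + n_a))
w2 = rng.normal(size=16)
e = np.array([w2 @ np.tanh(W1 @ np.concatenate([s_prev, a[tp]])) for tp in range(Tx)])

alpha = np.exp(e) / np.exp(e).sum()         # softmax over t'
context = (alpha[:, None] * a).sum(axis=0)  # context vector fed to the decoder
print(alpha.sum())                          # ~1.0: the weights sum to 1 over t'
```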
- Compared to the encoder-decoder model shown in Question 1 of this quiz (which does not use an attention mechanism), we expect the attention model to have the least advantage when:
- The input sequence length $T_x$ is large.
- The input sequence length $T_x$ is small.
Explanation: The encoder-decoder model works quite well on short sentences. The true advantage of the attention model appears when the input sentence is long.
- Under the CTC model, identical repeated characters not separated by the “blank” character (_) are collapsed. What does the following string collapse to?
__c_oo_o_kk___b_ooooo__oo__kkk
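The collapsing rule stated in the question is easy to implement: merge runs of identical characters, then delete the blanks. A minimal Python sketch (the function name is mine):

```python
from itertools import groupby

def ctc_collapse(s, blank="_"):
    """Merge repeated characters not separated by the blank, then drop blanks."""
    return "".join(ch for ch, _ in groupby(s) if ch != blank)

print(ctc_collapse("__c_oo_o_kk___b_ooooo__oo__kkk"))  # -> cookbook
```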
- In trigger word detection, $x^{\langle t \rangle}$ represents:
- Whether someone has just finished saying the trigger word at time $t$.
- Features of the audio (such as spectrogram features) at time $t$.
- Whether the trigger word is being said at time $t$.
- The $t$-th input word, represented as either a one-hot vector or a word embedding.
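For concreteness, spectrogram features like these can be computed with a short-time Fourier transform; a minimal sketch using `scipy.signal.spectrogram` (the synthetic audio and parameter values are illustrative):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16_000                          # 16 kHz sample rate, typical for speech
audio = np.random.randn(2 * fs)      # 2 seconds of stand-in audio
freqs, times, Sxx = spectrogram(audio, fs=fs, nperseg=400, noverlap=240)

# Each column Sxx[:, t] is one feature vector x<t>: the energy in each
# frequency band at time step t -- the input the trigger-word RNN consumes.
print(Sxx.shape)                     # (frequency bins, time steps)
```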