- Consider using this encoder-decoder model for machine translation.
- True/False: This model is a “conditional language model” in the sense that the encoder portion (shown in green) is modeling the probability of the input sentence $x$.
Explanation: The encoder-decoder model for machine translation models the probability of the output sentence $y$ conditioned on the input sentence $x$. The encoder portion is shown in green, while the decoder portion is shown in purple.
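In symbols: a plain language model estimates $P(y^{\langle 1 \rangle}, \ldots, y^{\langle T_y \rangle})$, while this conditional language model factorizes the output probability given the encoding of the input sentence:

$$
P(y^{\langle 1 \rangle}, \ldots, y^{\langle T_y \rangle} \mid x) \;=\; \prod_{t=1}^{T_y} P\!\left(y^{\langle t \rangle} \mid x,\, y^{\langle 1 \rangle}, \ldots, y^{\langle t-1 \rangle}\right)
$$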
- In beam search, if you increase the beam width B, which of the following would you expect to be true?
- Beam search will generally find better solutions (i.e. do a better job maximizing $P(y \mid x)$).
Explanation: As the beam width increases, beam search runs more slowly, uses more memory, and converges after more steps, but generally finds better solutions.
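For intuition, here is a minimal beam-search sketch in Python; `next_token_log_probs` is a hypothetical stand-in for the decoder (mapping a prefix to per-token log-probabilities), not code from the course:

```python
def beam_search(next_token_log_probs, B, max_len, eos="</s>"):
    """Keep the B highest-scoring partial hypotheses at every step."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    completed = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, lp in next_token_log_probs(prefix).items():
                candidates.append((prefix + (tok,), score + lp))
        # A larger B keeps more candidates alive: slower, more memory,
        # more steps to converge, but less likely to discard the best one.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:B]:
            (completed if prefix[-1] == eos else beams).append((prefix, score))
        if not beams:
            break
    return max(completed + beams, key=lambda c: c[1])
```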
- In machine translation, if we carry out beam search without using sentence normalization, the algorithm will tend to output overly short translations.
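To see why: beam search maximizes a product of per-token probabilities, each less than 1, so every additional token can only lower the score, and shorter hypotheses win by default. The length-normalized objective divides the log-likelihood by $T_y^{\alpha}$ (with $0 \le \alpha \le 1$; $\alpha \approx 0.7$ is a common heuristic), removing this bias:

$$
\arg\max_{y} \; \frac{1}{T_y^{\alpha}} \sum_{t=1}^{T_y} \log P\!\left(y^{\langle t \rangle} \mid x,\, y^{\langle 1 \rangle}, \ldots, y^{\langle t-1 \rangle}\right)
$$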
- Suppose you are building a speech recognition system, which uses an RNN model to map from audio clip $x$ to a text transcript $y$. Your algorithm uses beam search to try to find the value of $y$ that maximizes $P(y \mid x)$. On a dev set example, given an input audio clip, your algorithm outputs the transcript $\hat{y}$ = “I’m building an A Eye system in Silly con Valley.”, whereas a human gives a much superior transcript $y^*$ = “I’m building an AI system in Silicon Valley.” According to your model, $P(\hat{y} \mid x) = 1.09 \times 10^{-7}$ and $P(y^* \mid x) = 7.21 \times 10^{-8}$. Would you expect increasing the beam width B to help correct this example?
- Yes, because $P(y^* \mid x) \leq P(\hat{y} \mid x)$ indicates the error should be attributed to the RNN rather than to the search algorithm.
- Yes, because $P(y^* \mid x) \leq P(\hat{y} \mid x)$ indicates the error should be attributed to the search algorithm rather than to the RNN.
- No, because $P(y^* \mid x) \leq P(\hat{y} \mid x)$ indicates the error should be attributed to the RNN rather than to the search algorithm.
- No, because $P(y^* \mid x) \leq P(\hat{y} \mid x)$ indicates the error should be attributed to the search algorithm rather than to the RNN.
- Continuing the example from Q4, suppose you work on your algorithm for a few more weeks, and now find that for the vast majority of examples on which your algorithm makes a mistake, $P(y^* \mid x) > P(\hat{y} \mid x)$. This suggests you should focus your attention on improving the search algorithm.
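The decision rule behind Q4 and Q5 can be written down directly; here is a minimal sketch (function and variable names are mine, not from the quiz):

```python
def attribute_error(log_p_human, log_p_beam):
    """Compare log P(y*|x) against log P(y-hat|x) for one dev-set mistake.

    If the model scores the human transcript y* higher, beam search failed
    to find it, so the search is at fault; otherwise the RNN itself assigned
    the worse transcript a higher probability, so the model is at fault.
    """
    return "search algorithm" if log_p_human > log_p_beam else "RNN"
```

Tallying this verdict over the dev set shows which component dominates the errors; in Q5, where $P(y^* \mid x) > P(\hat{y} \mid x)$ for most mistakes, the tally points at the search algorithm, so a larger beam width is a reasonable next step.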
- Consider the attention model for machine translation.
- Which of the following statements about $\alpha^{\langle t,t' \rangle}$ are true? Check all that apply.
- We expect $\alpha^{\langle t,t' \rangle}$ to be generally larger for values of $a^{\langle t' \rangle}$ that are highly relevant to the value the network should output for $y^{\langle t \rangle}$. (Note the indices in the superscripts.) $\alpha^{\langle t,t' \rangle}$ is equal to the amount of attention $y^{\langle t \rangle}$ should pay to $a^{\langle t' \rangle}$.
- This should not be $\sum_{t} \alpha^{\langle t,t' \rangle} = 1$; the attention weights are normalized over the input positions, so $\sum_{t'} \alpha^{\langle t,t' \rangle} = 1$.
- The network learns where to “pay attention” by learning the values $e^{\langle t,t' \rangle}$, which are computed using a small neural network. We can replace $s^{\langle t-1 \rangle}$ with $s^{\langle t \rangle}$ as an input to this neural network because $s^{\langle t \rangle}$ is independent of $\alpha^{\langle t,t' \rangle}$ and $e^{\langle t,t' \rangle}$.
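A numpy sketch of this computation may help (the one-hidden-layer scoring network and all sizes here are illustrative assumptions, not the course's exact architecture). The softmax over $t'$ is what makes the weights for each output step sum to 1, and note that the input is $s^{\langle t-1 \rangle}$: $s^{\langle t \rangle}$ is not yet available when $e^{\langle t,t' \rangle}$ must be computed, which is why it cannot be swapped in.

```python
import numpy as np

rng = np.random.default_rng(0)
Tx, n_a, n_s = 6, 8, 10            # input length, encoder/decoder state sizes (illustrative)
a = rng.normal(size=(Tx, n_a))     # encoder activations a<t'>
s_prev = rng.normal(size=n_s)      # decoder state s<t-1>

# Small scoring network: e<t,t'> = w2 . tanh(W1 [s<t-1>; a<t'>]).
# s<t> cannot be used here: it is computed only after alpha<t,t'> (and hence
# e<t,t'>) is known.
W1 = rng.normal(size=(16, n_s + n_a))
w2 = rng.normal(size=16)
e = np.array([w2 @ np.tanh(W1 @ np.concatenate([s_prev, a[tp]])) for tp in range(Tx)])

alpha = np.exp(e) / np.exp(e).sum()         # softmax over t'
context = (alpha[:, None] * a).sum(axis=0)  # context vector fed to the decoder
print(alpha.sum())                          # ~1.0: the weights sum to 1 over t'
```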
- Compared to the encoder-decoder model shown in Question 1 of this quiz (which does not use an attention mechanism), we expect the attention model to have the least advantage when:
- The input sequence length $T_x$ is large.
- The input sequence length $T_x$ is small.
Explanation: The encoder-decoder model works quite well on short sentences. The true advantage of the attention model appears when the input sentence is long.
- Under the CTC model, identical repeated characters not separated by the “blank” character (_) are collapsed. What does the following string collapse to?
__c_oo_o_kk___b_ooooo__oo__kkk
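The collapsing rule stated in the question is easy to implement: merge runs of identical characters, then delete the blanks. A minimal Python sketch (the function name is mine):

```python
from itertools import groupby

def ctc_collapse(s, blank="_"):
    """Merge repeated characters not separated by the blank, then drop blanks."""
    return "".join(ch for ch, _ in groupby(s) if ch != blank)

print(ctc_collapse("__c_oo_o_kk___b_ooooo__oo__kkk"))  # -> cookbook
```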
- In trigger word detection, $x^{\langle t \rangle}$ represents:
- Whether someone has just finished saying the trigger word at time $t$.
- Features of the audio (such as spectrogram features) at time $t$.
- Whether the trigger word is being said at time $t$.
- The $t$-th input word, represented as either a one-hot vector or a word embedding.
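For concreteness, spectrogram features like these can be computed with a short-time Fourier transform; a minimal sketch using `scipy.signal.spectrogram` (the synthetic audio and parameter values are illustrative):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16_000                          # 16 kHz sample rate, typical for speech
audio = np.random.randn(2 * fs)      # 2 seconds of stand-in audio
freqs, times, Sxx = spectrogram(audio, fs=fs, nperseg=400, noverlap=240)

# Each column Sxx[:, t] is one feature vector x<t>: the energy in each
# frequency band at time step t -- the input the trigger-word RNN consumes.
print(Sxx.shape)                     # (frequency bins, time steps)
```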