A Bilingual Generative Transformer for Semantic Sentence Embedding
John Wieting, CMU
What scientific problem does the paper address?
Cross-lingual semantic sentence embedding.
Abstract
Semantic sentence embedding models: encode sentences with the same meaning to nearby points in the embedding space.
Bilingual data: a parallel sentence pair shares its semantics, so the properties on which the two sentences diverge are either stylistic or language-specific.
Key point: the paper proposes a deep latent variable model that attempts to perform source separation on parallel sentences, isolating what they have in common in a latent semantic vector and explaining what is left over with language-specific latent vectors.
Model highlights:
- Uses a variational probabilistic framework to introduce priors that encourage source separation, and the model's posterior can be used to predict sentence embeddings for monolingual data at test time (see the sketch after this list).
- Uses high-capacity Transformers as both the data-generating distributions and the inference networks.
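To make the source-separation idea concrete, here is a sketch of the generative factorization implied by the abstract (notation $z_{sem}$, $z_{en}$, $z_{fr}$ as above), together with the standard ELBO form that a variational framework of this kind would optimize; the paper's exact priors and objective may differ in detail:

$$
p(x_{en}, x_{fr}, z_{sem}, z_{en}, z_{fr}) = p(z_{sem})\,p(z_{en})\,p(z_{fr})\;p(x_{en}\mid z_{sem}, z_{en};\theta)\;p(x_{fr}\mid z_{sem}, z_{fr};\theta)
$$

$$
\mathcal{L}(\theta,\phi) = \mathbb{E}_{q_\phi}\!\big[\log p(x_{en}\mid z_{sem}, z_{en};\theta) + \log p(x_{fr}\mid z_{sem}, z_{fr};\theta)\big] - \mathrm{KL}\!\big(q_\phi(z_{sem}, z_{en}, z_{fr}\mid x_{en}, x_{fr}) \,\big\|\, p(z_{sem})\,p(z_{en})\,p(z_{fr})\big)
$$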
1 Introduction
Paragraph 1: an overview of prior work on language representations.
Mikolov et al.: word2vec
Pennington et al.: GloVe
Sentence level: Kiros et al.
Deep neural models: ELMo, BERT
This paper focuses on learning semantic sentence embeddings.
Advantage of such models: they work out-of-the-box and can be applied directly to downstream tasks such as semantic textual similarity, bitext mining, and paraphrase identification.
Three main factors in designing a sentence embedding model:
- architecture
- training data
- objective function
Notes that past models (e.g., those based on averaging word embeddings) work well on in-domain data but are fundamentally limited by their inability to capture word order.
Specifically, the paper proposes a generative model that performs source separation on parallel sentences: it isolates what the two sentences have in common in a latent semantic embedding and explains what is left over with language-specific latent vectors.
At test time, inference networks are used directly to estimate the model's posterior over the semantic variable, which yields sentence embeddings even for monolingual input (a hypothetical sketch follows).
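A minimal sketch of this test-time step, assuming a Gaussian approximate posterior whose parameters come from a semantic inference network; the name `semantic_encoder` and its interface are illustrative, not the paper's actual code:

```python
import torch

def embed(sentence_ids: torch.Tensor, semantic_encoder: torch.nn.Module) -> torch.Tensor:
    """Embed one (monolingual) sentence as the mean of q(z_sem | x)."""
    mu, log_var = semantic_encoder(sentence_ids)  # parameters of the approximate posterior
    return mu  # the posterior mean serves as the semantic sentence embedding
```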
2 Model
Decoder: a Transformer, modelling
(1) $p(x_{fr} \mid z_{sem}, z_{fr}; \theta)$
(2) $p(x_{en} \mid z_{sem}, z_{en}; \theta)$
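A minimal sketch of these two conditional decoders, assuming each language gets its own Transformer decoder and that the latent vectors are concatenated into a single conditioning vector the decoder cross-attends to; the paper's exact conditioning mechanism and hyperparameters may differ, and all names and sizes below are illustrative:

```python
import torch
import torch.nn as nn

class ConditionalDecoder(nn.Module):
    """Transformer decoder for one language, conditioned on [z_sem; z_lang]."""

    def __init__(self, vocab_size: int, d_model: int = 512, latent_dim: int = 1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.cond_proj = nn.Linear(2 * latent_dim, d_model)  # project [z_sem; z_lang]
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor, z_sem: torch.Tensor, z_lang: torch.Tensor) -> torch.Tensor:
        # Conditioning vector acts as a one-step "memory" for cross-attention
        # (causal target mask omitted for brevity).
        memory = self.cond_proj(torch.cat([z_sem, z_lang], dim=-1)).unsqueeze(1)
        h = self.decoder(self.embed(tokens), memory)
        return self.out(h)  # logits defining p(x | z_sem, z_lang; theta)

# One decoder per language: p(x_en | z_sem, z_en; theta) and p(x_fr | z_sem, z_fr; theta).
decoder_en = ConditionalDecoder(vocab_size=32000)
decoder_fr = ConditionalDecoder(vocab_size=32000)
```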