A Bilingual Generative Transformer for Semantic Sentence Embedding
John Wieting, CMU
What scientific problem does the paper address?
Cross-lingual semantic sentence embedding.
Abstract
Semantic sentence embedding models: encode sentences with the same meaning to nearby points in the embedding space.
Bilingual data: a parallel sentence pair shares its semantics, so the properties on which the two sentences diverge are either stylistic or language-specific.
Key point: the paper proposes a deep latent variable model that attempts to perform source separation on parallel sentences, isolating what they have in common in a latent semantic vector and explaining what is left over with language-specific latent vectors.
Model highlights:
- Uses a variational probabilistic framework to introduce priors that encourage source separation, and the model's posterior can be used to predict sentence embeddings for monolingual data at test time (see the sketch after this list).
- Uses high-capacity Transformers as both the data-generating distributions and the inference networks.
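To make the source-separation idea concrete, here is a sketch of the generative factorization implied by the abstract (notation $z_{sem}$, $z_{en}$, $z_{fr}$ as above), together with the standard ELBO form that a variational framework of this kind would optimize; the paper's exact priors and objective may differ in detail:

$$
p(x_{en}, x_{fr}, z_{sem}, z_{en}, z_{fr}) = p(z_{sem})\,p(z_{en})\,p(z_{fr})\;p(x_{en}\mid z_{sem}, z_{en};\theta)\;p(x_{fr}\mid z_{sem}, z_{fr};\theta)
$$

$$
\mathcal{L}(\theta,\phi) = \mathbb{E}_{q_\phi}\!\big[\log p(x_{en}\mid z_{sem}, z_{en};\theta) + \log p(x_{fr}\mid z_{sem}, z_{fr};\theta)\big] - \mathrm{KL}\!\big(q_\phi(z_{sem}, z_{en}, z_{fr}\mid x_{en}, x_{fr}) \,\big\|\, p(z_{sem})\,p(z_{en})\,p(z_{fr})\big)
$$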
1 Introduction
Paragraph 1: an overview of prior work on language representations.
Mikolov et al.: word2vec
Pennington et al.: GloVe
Sentence level: Kiros et al.
Deep neural models: ELMo, BERT
This paper focuses on learning semantic sentence embeddings.
Advantage of such models: they work out-of-the-box and can be applied directly to downstream tasks such as semantic textual similarity, bitext mining, and paraphrase identification.
Three main factors in designing a sentence embedding model:
- architecture
- training data
- objective function
Notes that past models (e.g., those based on averaging word embeddings) work well on in-domain data but are fundamentally limited by their inability to capture word order.
Specifically, the paper proposes a generative model that performs source separation on parallel sentences: it isolates what the two sentences have in common in a latent semantic embedding and explains what is left over with language-specific latent vectors.
At test time, inference networks are used directly to estimate the model's posterior over the semantic variable, which yields sentence embeddings even for monolingual input (a hypothetical sketch follows).
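A minimal sketch of this test-time step, assuming a Gaussian approximate posterior whose parameters come from a semantic inference network; the name `semantic_encoder` and its interface are illustrative, not the paper's actual code:

```python
import torch

def embed(sentence_ids: torch.Tensor, semantic_encoder: torch.nn.Module) -> torch.Tensor:
    """Embed one (monolingual) sentence as the mean of q(z_sem | x)."""
    mu, log_var = semantic_encoder(sentence_ids)  # parameters of the approximate posterior
    return mu  # the posterior mean serves as the semantic sentence embedding
```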
2 Model
Decoder: a Transformer, modelling
(1) $p(x_{fr} \mid z_{sem}, z_{fr}; \theta)$
(2) $p(x_{en} \mid z_{sem}, z_{en}; \theta)$
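A minimal sketch of these two conditional decoders, assuming each language gets its own Transformer decoder and that the latent vectors are concatenated into a single conditioning vector the decoder cross-attends to; the paper's exact conditioning mechanism and hyperparameters may differ, and all names and sizes below are illustrative:

```python
import torch
import torch.nn as nn

class ConditionalDecoder(nn.Module):
    """Transformer decoder for one language, conditioned on [z_sem; z_lang]."""

    def __init__(self, vocab_size: int, d_model: int = 512, latent_dim: int = 1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.cond_proj = nn.Linear(2 * latent_dim, d_model)  # project [z_sem; z_lang]
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor, z_sem: torch.Tensor, z_lang: torch.Tensor) -> torch.Tensor:
        # Conditioning vector acts as a one-step "memory" for cross-attention
        # (causal target mask omitted for brevity).
        memory = self.cond_proj(torch.cat([z_sem, z_lang], dim=-1)).unsqueeze(1)
        h = self.decoder(self.embed(tokens), memory)
        return self.out(h)  # logits defining p(x | z_sem, z_lang; theta)

# One decoder per language: p(x_en | z_sem, z_en; theta) and p(x_fr | z_sem, z_fr; theta).
decoder_en = ConditionalDecoder(vocab_size=32000)
decoder_fr = ConditionalDecoder(vocab_size=32000)
```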