[Paper Reading] A Bilingual Generative Transformer for Semantic Sentence Embedding

John Wieting, CMU

What scientific problem does the paper address?
Cross-lingual semantic sentence embedding.

Abstract

Semantic sentence embedding models: encode semantically similar sentences to nearby points in the embedding space.

Bilingual data: what parallel sentences share is largely semantic, so for parallel data the divergent properties are likely to be stylistic or language-specific.

Key point: the paper proposes a deep latent variable model that attempts to perform source separation on parallel sentences, isolating what they have in common in a latent semantic vector and explaining what is left over with language-specific latent vectors.

Model highlights:

  1. A variational probabilistic framework that introduces priors encouraging source separation; the model's posterior can then be used to predict sentence embeddings for monolingual data at test time (see the sketch after this list).

  2. High-capacity Transformers.
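
A minimal PyTorch sketch of that variational setup (my own mock-up, not the authors' code), assuming diagonal-Gaussian posteriors and standard-normal priors; the class and function names here are hypothetical:

```python
import torch
import torch.nn as nn

class GaussianInferenceNet(nn.Module):
    """Hypothetical inference network: maps a pooled sentence encoding h
    to a diagonal-Gaussian posterior q(z | x) = N(mu, diag(sigma^2))."""
    def __init__(self, hidden_dim, latent_dim):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, I) ): the prior term that keeps each
    latent compact and encourages the source separation."""
    return 0.5 * torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar, dim=-1)

# Training objective (ELBO) for one parallel pair, schematically:
#   E_q[ log p(x_en | z_sem, z_en) + log p(x_fr | z_sem, z_fr) ]
#     - KL(q(z_sem | x_en, x_fr) || p(z_sem))
#     - KL(q(z_en | x_en) || p(z_en))
#     - KL(q(z_fr | x_fr) || p(z_fr))
```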

1 Introduction

Paragraph 1 briefly reviews prior work on language representations:

  • Word level: Mikolov (word2vec); Pennington (GloVe)
  • Sentence level: Kiros et al.
  • Deep neural models: ELMo, BERT

This paper focuses on learning semantic sentence embeddings.

The advantage of such models: they work out of the box and can be applied directly to downstream tasks such as semantic textual similarity, bitext mining, and paraphrase identification.

Three main factors in designing a sentence embedding model:

  • architecture
  • training data
  • objective function

The paper points out that prior models work on in-domain data but are fundamentally limited by their inability to capture word order (see the toy example below).
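
A toy illustration of that limitation (word vectors made up for this example): averaging word embeddings, as the earlier non-sequential models effectively do, assigns identical representations to sentences that differ only in word order.

```python
import numpy as np

# Toy word vectors, invented purely for illustration
emb = {"man": np.array([1.0, 0.0]),
       "bites": np.array([0.0, 1.0]),
       "dog": np.array([1.0, 1.0])}

def avg_embed(sentence):
    # Bag-of-words averaging: word order is ignored entirely
    return np.mean([emb[w] for w in sentence.split()], axis=0)

a = avg_embed("man bites dog")
b = avg_embed("dog bites man")
print(np.allclose(a, b))  # True: the two sentences are indistinguishable
```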

Specifically, the generative model performs source separation on parallel sentences: it isolates what they have in common in a latent semantic embedding and explains what is left over with language-specific latent vectors.

At test time, inference networks are used directly to estimate the model's posterior over the semantic variable, which serves as the sentence embedding.
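
A sketch of that test-time step, assuming (as is standard for Gaussian inference networks) that the posterior mean of $z_{sem}$ is taken as the deterministic sentence embedding; the network below is a hypothetical stand-in for the trained one:

```python
import torch
import torch.nn.functional as F

class SemanticInferenceNet(torch.nn.Module):
    """Stand-in for the trained semantic inference network: maps a pooled
    sentence representation h to the posterior parameters (mu, logvar)."""
    def __init__(self, hidden_dim=8, latent_dim=4):
        super().__init__()
        self.mu = torch.nn.Linear(hidden_dim, latent_dim)
        self.logvar = torch.nn.Linear(hidden_dim, latent_dim)

    def forward(self, h):
        return self.mu(h), self.logvar(h)

net = SemanticInferenceNet()
h1, h2 = torch.randn(1, 8), torch.randn(1, 8)  # pooled encoder states

# No sampling at test time: the posterior mean mu of z_sem is the embedding.
e1, _ = net(h1)
e2, _ = net(h2)
print(F.cosine_similarity(e1, e2, dim=-1))  # downstream similarity score
```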

Figure 1: model overview (source separation of parallel sentences into a shared semantic variable and language-specific variables).

2 Model

Decoder: a Transformer is used.

Modeling: the decoder defines one conditional likelihood per language,

$$(1)\ p(x_{fr} \mid z_{sem}, z_{fr}; \theta) \qquad (2)\ p(x_{en} \mid z_{sem}, z_{en}; \theta)$$
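
A sketch of one plausible way to parameterize these likelihoods: a language-specific Transformer decoder that cross-attends to the projected concatenation of $z_{sem}$ and the language latent. The conditioning mechanism and all names below are my own assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class LatentConditionedDecoder(nn.Module):
    """Transformer decoder scoring p(x | z_sem, z_lang; theta); one such
    decoder would be instantiated per language (en, fr)."""
    def __init__(self, vocab_size, d_model=512, latent_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(2 * latent_dim, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, z_sem, z_lang):
        # Condition on both latents by projecting [z_sem; z_lang] into a
        # one-element "memory" that every decoder layer cross-attends to.
        memory = self.proj(torch.cat([z_sem, z_lang], dim=-1)).unsqueeze(1)
        T = tokens.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.decoder(self.embed(tokens), memory, tgt_mask=causal)
        return self.out(h)  # next-token logits, i.e. log p(x | z_sem, z_lang)
```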
