Introduction:
one-hot:
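from tensorflow.keras.preprocessing.text import one_hot
### sentences
sentences = [
    'the glass of milk',
    'the glass of juice',
    'the cup of tea',
    'I am a good boy',
    'I am a good developer',
    'understand the meaning of words',
    'your videos are good',
]
### Vocabulary size
voc_size = 10000
### Despite its name, one_hot hashes each word to an integer index below voc_size;
### it does not build one-hot vectors
onehot_repr = [one_hot(words, voc_size) for words in sentences]
print(onehot_repr)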
'''Result (the exact integers come from a hash and may differ between runs):
[[1607, 1898, 6281, 9401], [1607, 1898, 6281, 3401], [1607, 6359, 6281, 2217], [7508, 378, 2733, 8693, 7438], [7508, 378, 2733, 8693, 5363], [8292, 1607, 4448, 6281, 8555], [1825, 3648, 3717, 8693]]'''
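In the result above, repeated words share an index ("the" is 1607 and "of" is 6281 in every sentence), because each word is hashed into a bucket below voc_size. As far as I can tell, Keras uses Python's built-in hash by default, which is salted per process, so the integers are not reproducible across runs. The sketch below mirrors the idea with an MD5-based hash so the output is stable. `stable_one_hot` is a hypothetical helper written for illustration, not a Keras function, and unlike Keras it does not strip punctuation.

```python
from hashlib import md5


def stable_one_hot(text, n):
    """Map each word of `text` to an integer in [1, n - 1] via an MD5 hash.

    Hypothetical helper for illustration only; not part of Keras.
    """
    words = text.lower().split()
    # Index 0 is never produced, so it stays free for padding
    return [int(md5(w.encode()).hexdigest(), 16) % (n - 1) + 1 for w in words]


sentences = ['the glass of milk', 'the glass of juice', 'the cup of tea']
voc_size = 10000
encoded = [stable_one_hot(s, voc_size) for s in sentences]
print(encoded)
```

Since the mapping is a hash, two different words can land on the same index; a larger voc_size makes such collisions less likely.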
pad_sequences:
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
import numpy as np
sent_length = 8
### padding='pre' adds zeros at the front until every sentence has sent_length entries
embedded_docs = pad_sequences(onehot_repr, padding='pre', maxlen=sent_length)
print(embedded_docs)
'''Result:
[[   0    0    0    0 1607 1898 6281 9401]
 [   0    0    0    0 1607 1898 6281 3401]
 [   0    0    0    0 1607 6359 6281 2217]
 [   0    0    0 7508  378 2733 8693 7438]
 [   0    0    0 7508  378 2733 8693 5363]
 [   0    0    0 8292 1607 4448 6281 8555]
 [   0    0    0    0 1825 3648 3717 8693]]
'''
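The padding step can be sketched in plain Python to show what happens to each row. `pad_sequence` below is a hypothetical helper, not the Keras function. It assumes a positive maxlen and that sequences longer than maxlen lose items from the front, which I believe matches the pad_sequences default (truncating='pre').

```python
def pad_sequence(seq, maxlen, padding='pre', value=0):
    """Pad `seq` with `value` up to `maxlen`, or keep only its last `maxlen` items.

    Hypothetical helper for illustration; assumes maxlen > 0.
    """
    seq = list(seq)[-maxlen:]                 # drop items from the front if too long
    fill = [value] * (maxlen - len(seq))      # zeros needed to reach maxlen
    return fill + seq if padding == 'pre' else seq + fill


onehot_repr = [[1607, 1898, 6281, 9401], [7508, 378, 2733, 8693, 7438]]
sent_length = 8
for seq in onehot_repr:
    print(pad_sequence(seq, sent_length))
# [0, 0, 0, 0, 1607, 1898, 6281, 9401]
# [0, 0, 0, 7508, 378, 2733, 8693, 7438]

print(pad_sequence(onehot_repr[0], sent_length, padding='post'))
# [1607, 1898, 6281, 9401, 0, 0, 0, 0]

print(pad_sequence(range(1, 11), sent_length))
# [3, 4, 5, 6, 7, 8, 9, 10]
```

Index 0 is used only as the filler value here, which is why it helps that the word indices start at 1.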
Building the model:
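### Embedding dimension: each word index is mapped to a vector of 10 floats
dim = 10
model = Sequential()
### input_length is accepted by TF 2.x Keras; newer Keras releases deprecate it
model.add(Embedding(voc_size, dim, input_length=sent_length))
model.compile('adam', 'mse')
model.summary()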
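Conceptually, the Embedding layer is a lookup table with voc_size rows and dim columns: each integer in a padded sentence selects one row. The sketch below imitates that lookup in plain Python. The randomly filled `table` is a stand-in for the layer's trainable weights, not the values Keras would actually produce.

```python
import random

voc_size, dim, sent_length = 10000, 10, 8

# Stand-in for the layer's weight matrix: one row of `dim` floats per word index
rng = random.Random(0)
table = [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(voc_size)]

# One padded sentence, taken from the pad_sequences result
padded = [0, 0, 0, 0, 1607, 1898, 6281, 9401]

# The lookup: each index picks its row, giving a sent_length x dim matrix
vectors = [table[i] for i in padded]
print(len(vectors), len(vectors[0]))   # 8 10

# Number of trainable parameters in such a table
print(voc_size * dim)                  # 100000
```

With the real model, calling `model.predict(embedded_docs)` on the 7 padded sentences should return an array of shape (7, 8, 10), one 10-dimensional vector per position. The padding index 0 also gets a vector of its own unless masking is configured.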