Padding and truncating text sequences with Keras
from keras.preprocessing.sequence import pad_sequences
# help(pad_sequences)
# pad_sequences(sequences, maxlen=None,
#               dtype='int32', padding='pre',
#               truncating='pre', value=0.0)
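# Three sequences of different lengths (3, 2 and 8 items)
x = [[1, 2, 3], [4, 5], [1, 3, 4, 5, 6, 7, 8, 9]]
# Bring every sequence to length 5, padding and cutting at the end
x_trans = pad_sequences(x, maxlen=5,
                        padding='post',
                        truncating='post')
x_trans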
Result: every sequence now has the same length. Note that the output is already a NumPy ndarray.
array([[1, 2, 3, 0, 0],
       [4, 5, 0, 0, 0],
       [1, 3, 4, 5, 6]])
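To see what the padding and truncating options change, here is a minimal pure-Python sketch of the same logic for a single list. The helper name pad_or_truncate is hypothetical and is not part of Keras; it only mimics the 'pre' and 'post' behaviour described in the signature above, and it assumes maxlen is at least 1.

```python
def pad_or_truncate(seq, maxlen, padding="pre", truncating="pre", value=0):
    """Hypothetical helper (not a Keras function) for one sequence."""
    # Truncate first: 'pre' drops items from the front, 'post' from the end.
    if len(seq) > maxlen:
        seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
    # Then pad: 'pre' puts the fill values in front, 'post' appends them.
    fill = [value] * (maxlen - len(seq))
    return fill + list(seq) if padding == "pre" else list(seq) + fill


x = [[1, 2, 3], [4, 5], [1, 3, 4, 5, 6, 7, 8, 9]]

# Same settings as the pad_sequences call above
post = [pad_or_truncate(s, 5, padding="post", truncating="post") for s in x]
# The defaults ('pre' for both) work on the front of each sequence instead
pre = [pad_or_truncate(s, 5) for s in x]

print(post)  # [[1, 2, 3, 0, 0], [4, 5, 0, 0, 0], [1, 3, 4, 5, 6]]
print(pre)   # [[0, 0, 1, 2, 3], [0, 0, 0, 4, 5], [5, 6, 7, 8, 9]]
```

With the 'pre' defaults the long sequence keeps its last five items, so which setting fits depends on which end of the text matters more for the model.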
One-hot encoding the labels
from keras.utils import to_categorical
y = [0, 2, 9]
y_onehot = to_categorical(y)
y_onehot
Result:
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
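The output has 10 columns because the largest label is 9, so the number of classes is taken as max(y) + 1 when it is not given. A rough NumPy equivalent, assuming non-negative integer labels, is sketched below. The function one_hot is a hypothetical stand-in written for illustration, not the Keras implementation.

```python
import numpy as np


def one_hot(labels, num_classes=None):
    """Hypothetical NumPy stand-in for to_categorical (illustration only)."""
    labels = np.asarray(labels, dtype=int)
    # Without an explicit class count, infer it from the largest label.
    if num_classes is None:
        num_classes = int(labels.max()) + 1
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]


y = [0, 2, 9]
y_onehot = one_hot(y)

print(y_onehot.shape)           # (3, 10)
print(y_onehot.argmax(axis=1))  # [0 2 9], the original labels
```

If a batch might not contain the largest class, pass the class count explicitly (to_categorical accepts num_classes for the same purpose), otherwise the width of the matrix can change from batch to batch.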