Be aware, overflowing tokens are not returned for the setting you have chosen - CFANZ Programming Community

The original warning reads:

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.

When it appears:

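from transformers import BertTokenizer

# Assumption: a BERT-style slow (Python) tokenizer; the checkpoint name is only illustrative.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

encode_tokens = tokenizer.encode_plus(
    text="1 1 1 1 1",
    text_pair="2 2 2",
    padding="max_length",
    max_length=10,
    truncation=True,  # True means the 'longest_first' strategy
)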

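The two text segments above contain 8 words (five 1s and three 2s). Adding 1 [CLS] and 2 [SEP] tokens gives 11 tokens in total, which exceeds the maximum length of 10. The input is therefore truncated, and that is why the warning is issued.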
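To make the arithmetic concrete, here is a minimal pure-Python sketch of how a "longest first" truncation behaves on this example. It is a simplified imitation written for illustration, not the library's actual code; the function name `longest_first_truncate` and the `num_special` parameter are hypothetical.

```python
def longest_first_truncate(first, second, max_length, num_special=3):
    """Drop tokens one at a time from the longer sequence until the pair fits.

    num_special counts the special tokens added around the pair
    (assumed here: 1 [CLS] + 2 [SEP], as in the example above).
    """
    first, second = list(first), list(second)
    overflow = len(first) + len(second) + num_special - max_length
    for _ in range(max(0, overflow)):
        # Remove the last token of whichever sequence is currently longer.
        if len(first) > len(second):
            first.pop()
        else:
            second.pop()
    return first, second


# 5 + 3 + 3 = 11 tokens, one more than max_length=10, so one "1" is removed.
print(longest_first_truncate("1 1 1 1 1".split(), "2 2 2".split(), 10))
# (['1', '1', '1', '1'], ['2', '2', '2'])
```

With 7 words in total (for example four 1s and three 2s) the overflow is 0, the loop never runs, and nothing is removed.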
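Solutions:

1. Change the example above to 7 words. Then 7 + 3 = 10 tokens, nothing is truncated, and no warning is shown.

2. Other blog posts suggest adding the following lines: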

import transformers
transformers.logging.set_verbosity_error()

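This is equivalent to changing the logging level: only messages at error level are shown, so this warning (along with every other transformers warning) is hidden.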
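If hiding every warning feels too broad, a narrower option is to filter out just this one message with the standard `logging` module. This is only a sketch under an assumption: that the warning comes from the logger named `transformers.tokenization_utils_base` (the module that handles truncation in transformers 4.x). The names `DropOverflowWarning` and `silence_overflow_warning` are made up for this example.

```python
import logging

# Assumption: logger name of the module that emits the warning in transformers 4.x.
LOGGER_NAME = "transformers.tokenization_utils_base"
PREFIX = "Be aware, overflowing tokens are not returned"


class DropOverflowWarning(logging.Filter):
    """Drop only the overflowing-tokens warning and keep every other record."""

    def filter(self, record: logging.LogRecord) -> bool:
        return not record.getMessage().startswith(PREFIX)


def silence_overflow_warning() -> None:
    # A filter attached to a logger applies to records logged through that logger.
    logging.getLogger(LOGGER_NAME).addFilter(DropOverflowWarning())


silence_overflow_warning()
```

Call `silence_overflow_warning()` once before tokenizing. Other warnings from the library stay visible, which is the main difference from `set_verbosity_error()`.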
