When Do LMs Need Retrieval Augmentation
Retrieval should be triggered when the language model cannot provide a correct answer on its own. Much of the existing work therefore focuses on determining whether the model can answer a given question correctly.
LMs’ Perception of Their Knowledge Boundaries
These methods focus on determining whether the model can provide a correct answer but do not perform adaptive Retrieval-Augmented Generation (RAG).
White-box Investigation
These methods require access to the full set of model parameters, e.g., to train the model or to read its internal signals.
Training The Language Model
- [EMNLP 2020, Token-prob-based] Calibration of Pre-trained Transformers, Shrey Desai et al., 17 Mar 2020
- [TACL 2021, Token-prob-based] How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering, Zhengbao Jiang et al., 2 Dec 2020
- [TMLR 2022] Teaching Models to Express Their Uncertainty in Words, Stephanie Lin et al., 28 May 2022
- [ACL 2023] A Close Look into the Calibration of Pre-trained Language Models, Yangyi Chen et al., 31 Oct 2022
- [NeurIPS 2024] Alignment for Honesty, Yuqing Yang et al., 12 Dec 2023
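To make the notion of calibration these papers study concrete, here is a minimal sketch of Expected Calibration Error (ECE), the metric commonly reported in this line of work. It assumes you already have per-question confidence scores and correctness labels; the function and variable names are illustrative, not taken from any of the papers above.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE sketch: bin predictions by confidence and average the gap between
    each bin's mean confidence and its empirical accuracy, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Toy usage: confidence = model's probability for its answer, correct = 1/0 labels.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 0, 1, 1]))
```

A well-calibrated model has confidence close to its accuracy in every bin, so ECE approaches zero; the training-based methods above aim to move the model toward that regime.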
Utilizing Internal States or Attention Weights
These papers determine whether a statement is true, or whether the model can answer a question correctly, by analyzing the model's internal states or attention weights. They typically extract features with mathematical methods or train a lightweight MLP (multi-layer perceptron) probe on top of the hidden states; a minimal sketch of such a probe follows the list below.
- [EMNLP 2023 Findings] The Internal State of an LLM Knows When It’s Lying, Amos Azaria et al., 26 Apr 2023
- [ICLR 2024] Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models, Mert Yuksekgonul et al., 26 Sep 2023
- [EMNLP 2023] The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models, Aviv Slobodkin et al., 18 Oct 2023
- [ICLR 2024] INSIDE: LLMs’ Internal States Retain the Power of Hallucination Detection, Chao Chen et al., 6 Feb 2024
- [ACL 2024 Findings, MIND] Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models, Weihang Su et al., 11 Mar 2024
- [NAACL 2024] On Large Language Models’ Hallucination with Regard to Known Facts, Che Jiang et al., 29 Mar 2024
- [Arxiv, FacLens] Hidden Question Representations Tell Non-Factuality Within and Across Large Language Models, Yanling Wang et al., 8 Jun 2024
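As promised above, a minimal sketch of the common recipe in this line of work: take a hidden state from an intermediate layer (here, the last token's state) and train a small MLP to predict whether the model will answer correctly. The model name, layer index, and labeling scheme are illustrative assumptions, not the setup of any specific paper above.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative stand-in; real work uses much larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True).eval()

@torch.no_grad()
def last_token_hidden(text, layer=-8):
    """Hidden state of the final token at an intermediate layer (assumed choice)."""
    ids = tok(text, return_tensors="pt")
    out = lm(**ids)
    return out.hidden_states[layer][0, -1]  # shape: (hidden_dim,)

# Lightweight probe: hidden state -> P(model answers this question correctly)
probe = nn.Sequential(
    nn.Linear(lm.config.hidden_size, 256), nn.ReLU(), nn.Linear(256, 1)
)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def train_step(questions, labels):
    """labels[i] = 1.0 if the LM answered questions[i] correctly, else 0.0."""
    feats = torch.stack([last_token_hidden(q) for q in questions]).float()
    logits = probe(feats).squeeze(-1)
    loss = loss_fn(logits, torch.tensor(labels, dtype=torch.float))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

The labels are obtained by first letting the LM answer a held-out QA set and checking correctness; the frozen LM only supplies features, so the probe stays cheap to train.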
Grey-box Investigation
These methods require access to the probabilities of the generated tokens. Some of the training-based methods above also rely on token probabilities, but because they involve training they are not placed in this category. A minimal sketch of a token-probability confidence score follows the list below.
- [ICML 2017, Token-prob-based] On Calibration of Modern Neural Networks, Chuan Guo et al., 14 Jun 2017
- [ICLR 2023] Prompting GPT-3 To Be Reliable, Chenglei Si et al., 17 Oct 2022
- [ICLR 2023 Spotlight, Semantic Uncertainty] Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation, Lorenz Kuhn et al., 19 Feb 2023
- [ACL 2024] Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models, Abhishek Kumar et al., 25 May 2024
- [CCIR 2024] Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence? Shiyu Ni et al., 19 Aug 2024
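A minimal sketch of the grey-box signal itself: generate an answer and score it by the probabilities the model assigned to its own tokens, here via average log-probability and minimum token probability. The model choice and score definitions are illustrative assumptions, not the exact estimators used in the papers above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM whose logits are accessible works
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def answer_with_confidence(prompt, max_new_tokens=32):
    ids = tok(prompt, return_tensors="pt").input_ids
    out = lm.generate(
        ids,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        return_dict_in_generate=True,
        output_scores=True,
        pad_token_id=tok.eos_token_id,
    )
    gen_ids = out.sequences[0, ids.shape[1]:]
    # out.scores holds one logits tensor per generated step, shape (1, vocab_size)
    step_logprobs = torch.stack([
        torch.log_softmax(s[0], dim=-1)[t] for s, t in zip(out.scores, gen_ids)
    ])
    answer = tok.decode(gen_ids, skip_special_tokens=True)
    return answer, {
        "avg_logprob": step_logprobs.mean().item(),     # sequence-level confidence
        "min_token_prob": step_logprobs.exp().min().item(),  # weakest single token
    }

print(answer_with_confidence("Q: What is the capital of France?\nA:"))
```

Low values of either score can be treated as a sign the model is unsure; several papers above study how well such raw probabilities align with actual correctness.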
Black-box Investigation
These methods only require access to the model’s text output; a minimal sketch of a sampling-based consistency check follows the list below.
- [EMNLP 2023, SelfCheckGPT] SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models, Potsawee Manakul et al., 15 Mar 2023
- [EMNLP 2023] Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback, Katherine Tian et al., 24 May 2023
- [ACL 2023 Findings] Do Large Language Models Know What They Don’t Know? Zhangyue Yin et al., 29 May 2023
- [ICLR 2024] Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs, Miao Xiong et al., 22 Jun 2023
- [EMNLP 2023, SAC3] SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency, Jiaxin Zhang et al., 3 Nov 2023
- [Arxiv] Large Language Model Confidence Estimation via Black-Box Access, Tejaswini Pedapati et al., 1 Jun 2024
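A minimal sketch of a SelfCheckGPT-style consistency check that uses only text outputs: sample several answers and treat their agreement as a confidence proxy. The real methods measure agreement with NLI models, question answering, or n-gram scores; the crude normalized-string agreement and the `ask_model` callback below are stand-ins.

```python
import re
from collections import Counter

def normalize(answer: str) -> str:
    """Crude normalization: lowercase, strip punctuation and extra whitespace."""
    return re.sub(r"[^a-z0-9 ]", "", answer.lower()).strip()

def consistency_confidence(ask_model, question: str, n_samples: int = 5):
    """ask_model(question) -> str is any black-box sampling call (temperature > 0).
    Returns (majority_answer, agreement_rate)."""
    samples = [normalize(ask_model(question)) for _ in range(n_samples)]
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / n_samples

# Usage: plug in any API call for ask_model; a low agreement rate suggests the
# model may be guessing or hallucinating, without ever touching its parameters.
```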
Adaptive RAG
These methods address “when to retrieve” directly, designing decision strategies and evaluating them within Retrieval-Augmented Generation (RAG) pipelines; a sketch of a simple confidence-threshold policy follows the list below.
- [ACL 2023 Oral, Adaptive RAG] When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories, Alex Mallen et al., 20 Dec 2022
- [EMNLP 2023, FLARE] Active Retrieval Augmented Generation, Zhengbao Jiang et al., 11 May 2023
- [EMNLP 2023 Findings, SKR] Self-Knowledge Guided Retrieval Augmentation for Large Language Models, Yile Wang et al., 8 Oct 2023
- [ICLR 2024 Oral, Self-RAG] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, Akari Asai et al., 17 Oct 2023
- [Arxiv, Rowen, enhanced SAC3] Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models, Hanxing Ding et al., 16 Feb 2024
- [ACL 2024 Findings] When Do LLMs Need Retrieval Augmentation? Mitigating LLMs’ Overconfidence Helps Retrieval Augmentation, Shiyu Ni et al., 18 Feb 2024
- [Arxiv, position paper] Reliable, Adaptable, and Attributable Language Models with Retrieval, Akari Asai et al., 5 Mar 2024
- [ACL 2024 Oral, DRAGIN, enhanced FLARE] DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models, Weihang Su et al., 15 Mar 2024
- [EMNLP 2024 Findings, UAR] Unified Active Retrieval for Retrieval Augmented Generation, Qinyuan Cheng et al., 18 Jun 2024
- [Arxiv, SEAKR] SEAKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation, Zijun Yao et al., 27 Jun 2024
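Tying the two threads together, a minimal sketch of a confidence-thresholded adaptive RAG loop: answer from parametric memory when confidence is high, otherwise retrieve and answer with the retrieved context. The callbacks, threshold, and prompt format are illustrative assumptions, not the recipe of any specific paper above.

```python
def adaptive_rag_answer(
    question: str,
    generate,            # generate(prompt: str) -> str: any LM call
    confidence,          # confidence(question: str) -> float in [0, 1]
    retrieve,            # retrieve(query: str, k: int) -> list of passage strings
    threshold: float = 0.7,
    k: int = 3,
) -> str:
    """Retrieve only when the model seems unlikely to answer correctly on its own."""
    if confidence(question) >= threshold:
        # Model appears to know the answer: use parametric knowledge only.
        return generate(f"Question: {question}\nAnswer:")
    # Low confidence: augment the prompt with retrieved passages.
    passages = retrieve(question, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```

The `confidence` callback can be any of the signals surveyed above (an internal-state probe, token probabilities, sampling agreement, or a verbalized score), which is exactly the design space these adaptive RAG papers explore.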
Future updates will be maintained on GitHub: https://github.com/ShiyuNee/Awesome-When-To-Retrieve-Papers