When Do LMs Need Retrieval Augmentation
Retrieval should be triggered when the language model cannot provide a correct answer on its own. Much of the existing work therefore focuses on determining whether the model can answer a given question correctly.
LMs’ Perception of Their Knowledge Boundaries
These methods focus on determining whether the model can provide a correct answer but do not perform adaptive Retrieval-Augmented Generation (RAG).
White-box Investigation
These methods require access to the full set of model parameters, e.g., to train the model or to read its internal signals.
Training The Language Model
- [EMNLP 2020, Token-prob-based] Calibration of Pre-trained Transformers, Shrey Desai et al., 17 Mar 2020
- [TACL 2021, Token-prob-based] How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering, Zhengbao Jiang et al., 2 Dec 2020
- [TMLR 2022] Teaching Models to Express Their Uncertainty in Words, Stephanie Lin et al., 28 May 2022
- [ACL 2023] A Close Look into the Calibration of Pre-trained Language Models, Yangyi Chen et al., 31 Oct 2022
- [NeurIPS 2024] Alignment for Honesty, Yuqing Yang et al., 12 Dec 2023
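To make the notion of calibration these papers study concrete, here is a minimal sketch of Expected Calibration Error (ECE), the metric commonly reported in this line of work. It assumes you already have per-question confidence scores and correctness labels; the function and variable names are illustrative, not taken from any of the papers above.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE sketch: bin predictions by confidence and average the gap between
    each bin's mean confidence and its empirical accuracy, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Toy usage: confidence = model's probability for its answer, correct = 1/0 labels.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 0, 1, 1]))
```

A well-calibrated model has confidence close to its accuracy in every bin, so ECE approaches zero; the training-based methods above aim to move the model toward that regime.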
Utilizing Internal States or Attention Weights
These papers determine whether a statement is true, or whether the model can answer a question correctly, by analyzing the model's internal states or attention weights. They typically extract features with mathematical methods or train a lightweight MLP (multi-layer perceptron) probe on top of the hidden states; a minimal sketch of such a probe follows the list below.
- [EMNLP 2023 Findings] The Internal State of an LLM Knows When It’s Lying, Amos Azaria et al., 26 Apr 2023
- [ICLR 2024] Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models, Mert Yuksekgonul et al., 26 Sep 2023
- [EMNLP 2023] The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models, Aviv Slobodkin et al., 18 Oct 2023
- [ICLR 2024] INSIDE: LLMs’ Internal States Retain the Power of Hallucination Detection, Chao Chen et al., 6 Feb 2024
- [ACL 2024 Findings, MIND] Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models, Weihang Su et al., 11 Mar 2024
- [NAACL 2024] On Large Language Models’ Hallucination with Regard to Known Facts, Che Jiang et al., 29 Mar 2024
- [Arxiv, FacLens] Hidden Question Representations Tell Non-Factuality Within and Across Large Language Models, Yanling Wang et al., 8 Jun 2024
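As promised above, a minimal sketch of the common recipe in this line of work: take a hidden state from an intermediate layer (here, the last token's state) and train a small MLP to predict whether the model will answer correctly. The model name, layer index, and labeling scheme are illustrative assumptions, not the setup of any specific paper above.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative stand-in; real work uses much larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True).eval()

@torch.no_grad()
def last_token_hidden(text, layer=-8):
    """Hidden state of the final token at an intermediate layer (assumed choice)."""
    ids = tok(text, return_tensors="pt")
    out = lm(**ids)
    return out.hidden_states[layer][0, -1]  # shape: (hidden_dim,)

# Lightweight probe: hidden state -> P(model answers this question correctly)
probe = nn.Sequential(
    nn.Linear(lm.config.hidden_size, 256), nn.ReLU(), nn.Linear(256, 1)
)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def train_step(questions, labels):
    """labels[i] = 1.0 if the LM answered questions[i] correctly, else 0.0."""
    feats = torch.stack([last_token_hidden(q) for q in questions]).float()
    logits = probe(feats).squeeze(-1)
    loss = loss_fn(logits, torch.tensor(labels, dtype=torch.float))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

The labels are obtained by first letting the LM answer a held-out QA set and checking correctness; the frozen LM only supplies features, so the probe stays cheap to train.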
Grey-box Investigation
These methods require access to the probabilities of the generated tokens. Some of the training-based methods above also rely on token probabilities, but because they involve training they are not placed in this category. A minimal sketch of a token-probability confidence score follows the list below.
- [ICML 2017, Token-prob-based] On Calibration of Modern Neural Networks, Chuan Guo et al., 14 Jun 2017
- [ICLR 2023] Prompting GPT-3 To Be Reliable, Chenglei Si et al., 17 Oct 2022
- [ICLR 2023 Spotlight, Semantic Uncertainty] Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation, Lorenz Kuhn et al., 19 Feb 2023
- [ACL 2024] Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models, Abhishek Kumar et al., 25 May 2024
- [CCIR 2024] Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence? Shiyu Ni et al., 19 Aug 2024
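A minimal sketch of the grey-box signal itself: generate an answer and score it by the probabilities the model assigned to its own tokens, here via average log-probability and minimum token probability. The model choice and score definitions are illustrative assumptions, not the exact estimators used in the papers above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM whose logits are accessible works
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def answer_with_confidence(prompt, max_new_tokens=32):
    ids = tok(prompt, return_tensors="pt").input_ids
    out = lm.generate(
        ids,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        return_dict_in_generate=True,
        output_scores=True,
        pad_token_id=tok.eos_token_id,
    )
    gen_ids = out.sequences[0, ids.shape[1]:]
    # out.scores holds one logits tensor per generated step, shape (1, vocab_size)
    step_logprobs = torch.stack([
        torch.log_softmax(s[0], dim=-1)[t] for s, t in zip(out.scores, gen_ids)
    ])
    answer = tok.decode(gen_ids, skip_special_tokens=True)
    return answer, {
        "avg_logprob": step_logprobs.mean().item(),     # sequence-level confidence
        "min_token_prob": step_logprobs.exp().min().item(),  # weakest single token
    }

print(answer_with_confidence("Q: What is the capital of France?\nA:"))
```

Low values of either score can be treated as a sign the model is unsure; several papers above study how well such raw probabilities align with actual correctness.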
Black-box Investigation
These methods only require access to the model’s text output; a minimal sketch of a sampling-based consistency check follows the list below.
- [EMNLP 2023, SelfCheckGPT] SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models, Potsawee Manakul et al., 15 Mar 2023
- [EMNLP 2023] Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback, Katherine Tian et al., 24 May 2023
- [ACL 2023 Findings] Do Large Language Models Know What They Don’t Know? Zhangyue Yin et al., 29 May 2023
- [ICLR 2024] Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs, Miao Xiong et al., 22 Jun 2023
- [EMNLP 2023, SAC3] SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency, Jiaxin Zhang et al., 3 Nov 2023
- [Arxiv] Large Language Model Confidence Estimation via Black-Box Access, Tejaswini Pedapati et al., 1 Jun 2024
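A minimal sketch of a SelfCheckGPT-style consistency check that uses only text outputs: sample several answers and treat their agreement as a confidence proxy. The real methods measure agreement with NLI models, question answering, or n-gram scores; the crude normalized-string agreement and the `ask_model` callback below are stand-ins.

```python
import re
from collections import Counter

def normalize(answer: str) -> str:
    """Crude normalization: lowercase, strip punctuation and extra whitespace."""
    return re.sub(r"[^a-z0-9 ]", "", answer.lower()).strip()

def consistency_confidence(ask_model, question: str, n_samples: int = 5):
    """ask_model(question) -> str is any black-box sampling call (temperature > 0).
    Returns (majority_answer, agreement_rate)."""
    samples = [normalize(ask_model(question)) for _ in range(n_samples)]
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / n_samples

# Usage: plug in any API call for ask_model; a low agreement rate suggests the
# model may be guessing or hallucinating, without ever touching its parameters.
```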
Adaptive RAG
These methods address “when to retrieve” directly, designing decision strategies and evaluating them within Retrieval-Augmented Generation (RAG) pipelines; a sketch of a simple confidence-threshold policy follows the list below.
- [ACL 2023 Oral, Adaptive RAG] When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories, Alex Mallen et al., 20 Dec 2022
- [EMNLP 2023, FLARE] Active Retrieval Augmented Generation, Zhengbao Jiang et al., 11 May 2023
- [EMNLP 2023 Findings, SKR] Self-Knowledge Guided Retrieval Augmentation for Large Language Models, Yile Wang et al., 8 Oct 2023
- [ICLR 2024 Oral, Self-RAG] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, Akari Asai et al., 17 Oct 2023
- [Arxiv, Rowen, enhanced SAC3] Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models, Hanxing Ding et al., 16 Feb 2024
- [ACL 2024 Findings] When Do LLMs Need Retrieval Augmentation? Mitigating LLMs’ Overconfidence Helps Retrieval Augmentation, Shiyu Ni et al., 18 Feb 2024
- [Arxiv, position paper] Reliable, Adaptable, and Attributable Language Models with Retrieval, Akari Asai et al., 5 Mar 2024
- [ACL 2024 Oral, DRAGIN, enhanced FLARE] DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models, Weihang Su et al., 15 Mar 2024
- [EMNLP 2024 Findings, UAR] Unified Active Retrieval for Retrieval Augmented Generation, Qinyuan Cheng et al., 18 Jun 2024
- [Arxiv, SEAKR] SEAKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation, Zijun Yao et al., 27 Jun 2024
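Tying the two threads together, a minimal sketch of a confidence-thresholded adaptive RAG loop: answer from parametric memory when confidence is high, otherwise retrieve and answer with the retrieved context. The callbacks, threshold, and prompt format are illustrative assumptions, not the recipe of any specific paper above.

```python
def adaptive_rag_answer(
    question: str,
    generate,            # generate(prompt: str) -> str: any LM call
    confidence,          # confidence(question: str) -> float in [0, 1]
    retrieve,            # retrieve(query: str, k: int) -> list of passage strings
    threshold: float = 0.7,
    k: int = 3,
) -> str:
    """Retrieve only when the model seems unlikely to answer correctly on its own."""
    if confidence(question) >= threshold:
        # Model appears to know the answer: use parametric knowledge only.
        return generate(f"Question: {question}\nAnswer:")
    # Low confidence: augment the prompt with retrieved passages.
    passages = retrieve(question, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```

The `confidence` callback can be any of the signals surveyed above (an internal-state probe, token probabilities, sampling agreement, or a verbalized score), which is exactly the design space these adaptive RAG papers explore.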
Future updates will be maintained on GitHub: https://github.com/ShiyuNee/Awesome-When-To-Retrieve-Papers