Advanced RAG with Azure AI Search and LlamaIndex

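A new collaboration between Azure AI Search and LlamaIndex lets developers build better applications with advanced retrieval-augmented generation (RAG), using a comprehensive RAG framework and a state-of-the-art retrieval system.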

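RAG is a popular way to incorporate company information into applications based on large language models (LLMs). With RAG, AI applications can access up-to-date information in near real time, and teams keep control of their data.

In RAG, you can evaluate and modify each stage to improve results. The stages fall into three categories: pre-retrieval, retrieval, and post-retrieval.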

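Pre-retrieval improves the quality of the retrieved data using techniques such as query rewriting.
Retrieval improves results using advanced techniques such as hybrid search and semantic ranking.
Post-retrieval focuses on refining the retrieved information and enhancing the prompt.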

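LlamaIndex gives both beginners and experienced developers a comprehensive framework and ecosystem for building LLM applications over their data sources.

Azure AI Search is an information retrieval platform with cutting-edge search technology and seamless platform integrations, built for high-performance generative AI applications at any scale.

This post focuses on the pre-retrieval and retrieval stages. We show how to use LlamaIndex for query transformations in pre-retrieval and how to use Azure AI Search for advanced retrieval techniques.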

Figure 1: Pre-retrieval, retrieval, and post-retrieval in advanced RAG

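Pre-retrieval techniques and optimized query orchestration

To optimize pre-retrieval, LlamaIndex offers query transformations, a powerful feature that refines user input. Some query transformation techniques include:

Routing: keep the query unchanged, but identify the relevant subset of tools the query applies to. Output those tools as the relevant choices.
Query rewriting: keep the tools unchanged, but rewrite the query in a variety of ways to execute against the same tools.
Sub-questions: decompose the query into multiple sub-questions over different tools, identified by their metadata.
ReAct agent tool picking: given the initial query, identify (1) the tool to pick and (2) the query to execute on that tool.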

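Take query rewriting as an example. Query rewriting uses an LLM to rephrase your initial query in several forms. This lets developers explore different aspects of the data, which leads to more nuanced and accurate responses. With query rewriting, developers can generate multiple queries for ensemble retrieval and fusion retrieval, yielding higher-quality retrieved results. Using Azure OpenAI, an initial query can be decomposed into multiple sub-queries.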

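Consider this initial query:

"What happened to the author?"

If the question is overly broad, or a direct match seems unlikely to be found in the corpus text, it is advisable to break the question down into multiple sub-queries.

Sub-queries:

"What is the latest book written by the author?"
"Has the author won any literary awards?"
"Are there any upcoming events or interviews with the author?"
"What is the author's background and writing style?"
"Are there any controversies or scandals surrounding the author?"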

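Sub-question query engine

One of the great things about LlamaIndex is that advanced retrieval strategies like this are built into the framework. For example, the sub-queries above can be handled in a single step with the sub-question query engine, which breaks the question into simpler questions and then combines the answers into a single response.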

response = query_engine.query("What happened to the author?")
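
The decompose, answer, combine flow described above can be sketched in plain Python. This is only an illustration of the pattern, not LlamaIndex's implementation: `run_sub_questions`, `decompose`, `answer`, and `combine` are hypothetical names, and the stand-in functions return fixed strings where a real setup would call an LLM and an index.

```python
def run_sub_questions(question, decompose, answer, combine):
    """Decompose a question, answer each part, then merge the answers."""
    sub_questions = decompose(question)
    sub_answers = [(sub_q, answer(sub_q)) for sub_q in sub_questions]
    return combine(question, sub_answers)


# Stand-ins for the LLM and the index (hypothetical, for illustration only).
def decompose(question):
    return [
        "What did the author work on before college?",
        "What did the author study in college?",
    ]


def answer(sub_question):
    return f"<answer to: {sub_question}>"


def combine(question, sub_answers):
    lines = [f"- {sub_q} {sub_a}" for sub_q, sub_a in sub_answers]
    return question + "\n" + "\n".join(lines)


result = run_sub_questions(
    "What happened to the author?", decompose, answer, combine
)
print(result)
```

Passing the three steps in as callables keeps the orchestration separate from whichever model or index answers each sub-question.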

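Retrieval with Azure AI Search

To enhance retrieval, Azure AI Search offers hybrid search and semantic ranking. Hybrid search runs keyword retrieval and vector retrieval together and applies a fusion step, Reciprocal Rank Fusion (RRF), to select the best results from each technique.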
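
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of document IDs. It is not the service's internal code: the function name, the sample IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_ranking = ["doc_a", "doc_b", "doc_c"]  # e.g. a keyword (BM25) ordering
vector_ranking = ["doc_c", "doc_a", "doc_d"]  # e.g. a vector-similarity ordering
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document that appears in only one list can still make the fused ranking.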

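The semantic ranker adds a secondary ranking on top of the initial BM25-ranked or RRF-ranked results. This secondary ranking uses a multilingual deep learning model to promote the most semantically relevant results.

The semantic ranker is easy to enable by setting the "query_type" parameter to "semantic". Semantic ranking runs inside the Azure AI Search stack, and our data shows that the semantic ranker combined with hybrid search is the most effective approach for improving relevance out of the box.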
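
As a rough illustration of what a hybrid query with semantic ranking looks like on the wire, the sketch below builds a request body for the Search Documents REST API. The field names follow the 2023-11-01 REST API as best I understand it, where the SDK's `query_type` appears as `queryType`. The helper name, the semantic configuration name, and the `embedding` vector field are assumptions, and nothing is sent to a service here.

```python
import json


def build_hybrid_semantic_request(query_text, query_vector, semantic_config, k=5):
    """Build a request body that combines keyword, vector, and semantic ranking."""
    return {
        "search": query_text,  # keyword part of the hybrid query
        "queryType": "semantic",  # turns on the semantic ranker
        "semanticConfiguration": semantic_config,  # assumed to exist on the index
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": query_vector,  # embedding of the query text
                "fields": "embedding",  # assumed vector field name
                "k": k,
            }
        ],
        "top": k,
    }


body = build_hybrid_semantic_request(
    "What happened to the author?",
    [0.1, 0.2, 0.3],  # placeholder vector; real embeddings are much longer
    "my-semantic-config",  # hypothetical configuration name
    k=3,
)
print(json.dumps(body, indent=2))
```

When the index is queried through LlamaIndex, the vector store builds the equivalent query, so a body like this is mainly useful for understanding or debugging what is sent.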

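Azure AI Search also supports filters in vector queries. You can set the filter mode to apply the filter before or after the vector query executes:

Pre-filter mode: applies the filter before query execution, reducing the search surface area over which the vector search algorithm looks for similar content. Pre-filtering is generally slower than post-filtering but favors recall and precision.
Post-filter mode: applies the filter after query execution, narrowing the search results. Post-filtering favors speed over selection.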
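
The difference between the two modes can be seen in a toy brute-force search. This sketch is not how the service implements filtering; the documents, the `author` field, and the helper names are made up for illustration. It shows why post-filtering can return fewer than k results.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(docs, query_vec, k):
    """Return the k documents most similar to the query vector."""
    ranked = sorted(
        docs, key=lambda d: cosine(d["embedding"], query_vec), reverse=True
    )
    return ranked[:k]


def pre_filter_search(docs, query_vec, k, predicate):
    # Filter first, then rank only the surviving documents.
    return top_k([d for d in docs if predicate(d)], query_vec, k)


def post_filter_search(docs, query_vec, k, predicate):
    # Rank everything first, then drop non-matching documents from the top k.
    return [d for d in top_k(docs, query_vec, k) if predicate(d)]


docs = [
    {"id": "1", "author": "alice", "embedding": [1.0, 0.0]},
    {"id": "2", "author": "bob", "embedding": [0.9, 0.1]},
    {"id": "3", "author": "bob", "embedding": [0.8, 0.2]},
    {"id": "4", "author": "alice", "embedding": [0.0, 1.0]},
]
query = [1.0, 0.0]


def by_alice(doc):
    return doc["author"] == "alice"


pre_ids = [d["id"] for d in pre_filter_search(docs, query, 2, by_alice)]
post_ids = [d["id"] for d in post_filter_search(docs, query, 2, by_alice)]
print(pre_ids, post_ids)
```

With k=2, pre-filtering returns two matching documents, while post-filtering returns only one because a non-matching document took a slot in the top k before the filter ran.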

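We are excited to work with LlamaIndex to offer simpler ways to optimize pre-retrieval and retrieval when implementing advanced RAG. Running advanced RAG does not stop at pre-retrieval and retrieval optimization, and we are just getting started! Stay tuned for the approaches we are exploring together.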

Example

Set up Azure OpenAI

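from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI

aoai_api_key = "YourAzureOpenAIAPIKey"
aoai_endpoint = "YourAzureOpenAIEndpoint"
aoai_api_version = "2023-05-15"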

llm = AzureOpenAI(
    model="YourAzureOpenAICompletionModelName",
    deployment_name="YourAzureOpenAICompletionDeploymentName",
    api_key=aoai_api_key,
    azure_endpoint=aoai_endpoint,
    api_version=aoai_api_version,
)

# You need to deploy your own embedding model as well as your own chat completion model
embed_model = AzureOpenAIEmbedding(
    model="YourAzureOpenAIEmbeddingModelName",
    deployment_name="YourAzureOpenAIEmbeddingDeploymentName",
    api_key=aoai_api_key,
    azure_endpoint=aoai_endpoint,
    api_version=aoai_api_version,
)

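Set up Azure AI Search

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient

search_service_api_key = "YourAzureSearchServiceAdminKey"
search_service_endpoint = "YourAzureSearchServiceEndpoint"
search_service_api_version = "2023-11-01"
credential = AzureKeyCredential(search_service_api_key)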

# Index name to use
index_name = "llamaindex-vector-demo"

# Use index client to demonstrate creating an index
index_client = SearchIndexClient(
    endpoint=search_service_endpoint,
    credential=credential,
)

# Use search client to demonstrate using an existing index
search_client = SearchClient(
    endpoint=search_service_endpoint,
    index_name=index_name,
    credential=credential,
)

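Create a new index

from llama_index.vector_stores.azureaisearch import (
    AzureAISearchVectorStore,
    IndexManagement,
    MetadataIndexFieldType,
)

metadata_fields = {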
    "author": "author",
    "theme": ("topic", MetadataIndexFieldType.STRING),
    "director": "director",
}

vector_store = AzureAISearchVectorStore(
    search_or_index_client=index_client,
    filterable_metadata_field_keys=metadata_fields,
    index_name=index_name,
    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,
    id_field_key="id",
    chunk_field_key="chunk",
    embedding_field_key="embedding",
    embedding_dimensionality=1536,
    metadata_string_field_key="metadata",
    doc_id_field_key="doc_id",
    language_analyzer="en.lucene",
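    vector_algorithm_type="exhaustiveKnn",
)

Load documents

from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)

documents = SimpleDirectoryReader("../data/paul_graham/").load_data()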
storage_context = StorageContext.from_defaults(vector_store=vector_store)

Settings.llm = llm
Settings.embed_model = embed_model
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

Vector search

from llama_index.core.vector_stores.types import VectorStoreQueryMode

default_retriever = index.as_retriever(
    vector_store_query_mode=VectorStoreQueryMode.DEFAULT
)
response = default_retriever.retrieve("What is inception about?")

# Loop through each NodeWithScore in the response
for node_with_score in response:
    node = node_with_score.node  # The TextNode object
    score = node_with_score.score  # The similarity score
    chunk_id = node.id_  # The chunk ID

    # Extract the relevant metadata from the node
    file_name = node.metadata.get("file_name", "Unknown")
    file_path = node.metadata.get("file_path", "Unknown")

    # Extract the text content from the node
    text_content = node.text if node.text else "No content available"

    # Print the results in a user-friendly format
    print(f"Score: {score}")
    print(f"File Name: {file_name}")
    print(f"Id: {chunk_id}")
    print("\nExtracted Content:")
    print(text_content)
    print("\n" + "=" * 40 + " End of Result " + "=" * 40 + "\n")

Hybrid search

from llama_index.core.vector_stores.types import VectorStoreQueryMode

hybrid_retriever = index.as_retriever(
    vector_store_query_mode=VectorStoreQueryMode.HYBRID
)
hybrid_retriever.retrieve("What is inception about?")

Hybrid search with semantic ranking

hybrid_retriever = index.as_retriever(
    vector_store_query_mode=VectorStoreQueryMode.SEMANTIC_HYBRID
)
hybrid_retriever.retrieve("What is inception about?")

Query rewriting

from llama_index.core import PromptTemplate
 
query_gen_str = """\
You are a helpful assistant that generates multiple search queries based on a \
single input query. Generate {num_queries} search queries, one on each line, \
related to the following input query:
Query: {query}
Queries:
"""
query_gen_prompt = PromptTemplate(query_gen_str)

def generate_queries(query: str, llm, num_queries: int = 5):
    response = llm.predict(
        query_gen_prompt, num_queries=num_queries, query=query
    )
    # Assume the LLM puts each query on its own line; drop blank lines
    queries = [q.strip() for q in response.split("\n") if q.strip()]
    queries_str = "\n".join(queries)
    print(f"Generated queries:\n{queries_str}")
    return queries

queries = generate_queries("What happened to the author?", llm)

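Generated queries:

What is the latest book written by the author?
Has the author won any literary awards?
Are there any upcoming events or interviews with the author?
What is the author's background and writing style?
Are there any controversies or scandals surrounding the author?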
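
Once several queries have been generated, each can be run through a retriever and the results merged, which is the idea behind fusion retrieval. The sketch below is one simple way to do that, assuming a hypothetical `retrieve` callable that returns (chunk ID, score) pairs. The canned results stand in for a real retriever and are not actual output.

```python
def fuse_retrievals(queries, retrieve, top_k=3):
    """Run every query through `retrieve` and merge the hits.

    `retrieve` is a hypothetical callable returning (doc_id, score) pairs.
    Duplicates are collapsed, keeping each document's best score.
    """
    best = {}
    for query in queries:
        for doc_id, score in retrieve(query):
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    # Highest score first; ties broken by document ID for determinism.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Stand-in retriever with canned results, for illustration only.
CANNED = {
    "What is the latest book written by the author?": [
        ("chunk_1", 0.82),
        ("chunk_2", 0.40),
    ],
    "Has the author won any literary awards?": [
        ("chunk_2", 0.75),
        ("chunk_3", 0.31),
    ],
}


def fake_retrieve(query):
    return CANNED.get(query, [])


merged = fuse_retrievals(list(CANNED), fake_retrieve, top_k=2)
print(merged)
```

Keeping the best score per chunk is only one merging rule; a rank-based scheme such as Reciprocal Rank Fusion is a common alternative when scores from different queries are not directly comparable.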

Sub-question query engine

from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# setup base query engine as tool
query_engine_tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(
            name="pg_essay",
            description="Paul Graham essay on What I Worked On",
        ),
    ),
]
# build a sub-question query engine over this tool
# this allows decomposing the question down into sub-questions which then execute against the tool
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)

response = query_engine.query("What happened to the author?")

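Generated 1 sub-question.

[pg_essay] Q: What did the author mainly work on?

[pg_essay] A: Before college, the author worked on writing and programming. They wrote short stories and tried programming on an IBM 1401 computer using an early version of Fortran. Later they used microcomputers, built one themselves, and eventually got a TRS-80. They wrote simple games, a program to predict how high rockets would fly, and a word processor. In college, the author initially planned to study philosophy but switched to AI because of an interest in intelligent computers.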