1. Introduction

LangChain Expression Language (LCEL) is a core concept in LangChain: a declarative language for composing chains. It provides a unified interface through which different components (retrievers, prompts, LLMs, and so on) can be connected via the common Runnable interface. Every Runnable component implements the same methods, such as .invoke(), .stream(), and .batch(), which is what makes components easy to chain together with the | operator.
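
As a minimal sketch of this idea (assuming an OpenAI API key is configured in the environment; the model name and prompt are illustrative):

    from langchain_openai import ChatOpenAI
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser

    # Three Runnables composed with |; each one supports the same calling interface.
    prompt = ChatPromptTemplate.from_template("Say one fact about {topic}.")
    model = ChatOpenAI(model="gpt-4o-mini")
    chain = prompt | model | StrOutputParser()

    print(chain.invoke({"topic": "FAISS"}))  # single synchronous call -> str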

1.1 Advantages of LCEL

LCEL makes it easy to build complex chains from basic components, and it ships with streaming, parallel execution, logging, and other capabilities out of the box:

  • Unified interface: LCEL unifies disparate components behind the Runnable interface, which simplifies building complex operations.
  • Modularity: each component can be developed and tested independently, then integrated easily through LCEL.
  • Extensibility: LCEL supports async invocation, batching, and streaming, adapting to different application scenarios (see the sketch below).
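
A quick sketch of that last point, reusing the chain from the snippet above (topics are illustrative):

    # The same chain exposes batch, streaming, and async variants with no extra code.
    print(chain.batch([{"topic": "FAISS"}, {"topic": "Chroma"}]))  # list of str

    for token in chain.stream({"topic": "LCEL"}):  # yields tokens as they arrive
        print(token, end="", flush=True)

    import asyncio
    print(asyncio.run(chain.ainvoke({"topic": "LangChain"})))  # async variant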
2. Examples

    Check the installed versions:

    $ pip show langchain langchain_community
    Name: langchain
    Version: 0.3.7
    Summary: Building applications with LLMs through composability
    Home-page: https://github.com/langchain-ai/langchain
    Author: 
    Author-email: 
    License: MIT
    Location: /home/xjg/.conda/envs/langchain/lib/python3.10/site-packages
    Requires: aiohttp, async-timeout, langchain-core, langchain-text-splitters, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
    Required-by: langchain-community
    ---
    Name: langchain-community
    Version: 0.3.5
    Summary: Community contributed LangChain integrations.
    Home-page: https://github.com/langchain-ai/langchain
    Author: 
    Author-email: 
    License: MIT
    Location: /home/xjg/.conda/envs/langchain/lib/python3.10/site-packages
    Requires: aiohttp, dataclasses-json, httpx-sse, langchain, langchain-core, langsmith, numpy, pydantic-settings, PyYAML, requests, SQLAlchemy, tenacity
    Required-by: langchain-experimental
    
    

    2.1 RAG Example with a Persisted Vector Store

    Code:

    from langchain_openai import ChatOpenAI, OpenAIEmbeddings
    from langchain_community.vectorstores import FAISS
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnableParallel, RunnablePassthrough
    from langchain.docstore.document import Document
    from dotenv import load_dotenv, find_dotenv
    import os
    
    # Remove the proxy environment variables
    if 'all_proxy' in os.environ:
        del os.environ['all_proxy']
    
    if 'ALL_PROXY' in os.environ:
        del os.environ['ALL_PROXY']
    
    _ = load_dotenv(find_dotenv())
    
    # Initialize the model
    model = ChatOpenAI(model="gpt-4o-mini")
    
    # Create or load the persisted vector store
    embedding_model = OpenAIEmbeddings()
    vectorstore_path = "faiss_index"
    
    if os.path.exists(vectorstore_path):
    # Try to load the existing FAISS index
    vectorstore = FAISS.load_local(vectorstore_path, embedding_model, allow_dangerous_deserialization=True)
        print("Loaded existing FAISS index.")
    else:
    # If no index is found, create a new one
        texts = ["harrison worked at kensho", "bears like to eat honey"]
        docs = [Document(page_content=text) for text in texts]
        vectorstore = FAISS.from_documents(docs, embedding_model)
        vectorstore.save_local(vectorstore_path)
        print("Created and saved new FAISS index.")
    
    retriever = vectorstore.as_retriever()
    
    # Create a chat prompt template (written in Chinese) that builds the full model input from a given context and question
    template = """根据以下上下文回答问题:
    {context}
    
    问题: {question}
    """
    prompt = ChatPromptTemplate.from_template(template)
    
    # Initialize the output parser, which converts the model output to a string
    output_parser = StrOutputParser()
    
    # Set up the handling of context and question
    setup_and_retrieval = RunnableParallel(
        {"context": retriever, "question": RunnablePassthrough()}
    )
    
    # Build the chain: context/question setup, prompt construction, model call, and output parsing
    chain = setup_and_retrieval | prompt | model | output_parser
    
    # Invoke the chain with the question "harrison在哪里工作?" ("where did harrison work?") and generate an answer from the stored context
    print(chain.invoke("harrison在哪里工作?"))
    
    # Print the retriever to sanity-check that it is set up
    print(retriever)
    
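    To see exactly what setup_and_retrieval feeds into the prompt, it can be invoked on its own; the result is roughly the following (the Document list and metadata will vary with your index):

    # RunnablePassthrough() forwards the raw input unchanged, while the
    # retriever runs against the same input in parallel.
    print(setup_and_retrieval.invoke("harrison在哪里工作?"))
    # -> {'context': [Document(page_content='harrison worked at kensho'), ...],
    #     'question': 'harrison在哪里工作?'}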

    The faiss_index directory holds the raw FAISS index (index.faiss) and a pickled docstore (index.pkl), which is why loading it involves deserializing a pickle:

    $ tree faiss_index/
    faiss_index/
    ├── index.faiss
    └── index.pkl
    
    0 directories, 2 files
    

    vectorstore = FAISS.load_local(vectorstore_path, embedding_model, allow_dangerous_deserialization=True)
    If the allow_dangerous_deserialization parameter here is left at its default (False), the load fails with a security error:

    ValueError: The de-serialization relies loading a pickle file. Pickle files can be modified to deliver a malicious payload that results in execution of arbitrary code on your machine.You will need to set `allow_dangerous_deserialization` to `True` to enable deserialization. If you do this, make sure that you trust the source of the data. For example, if you are loading a file that you created, and know that no one else has modified the file, then this is safe to do. Do not set this to `True` if you are loading a file from an untrusted source (e.g., some random site on the internet.).
    
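    If you would rather not enable pickle deserialization at all, an alternative is to treat the saved index purely as a cache for sources you control and rebuild it otherwise. A minimal sketch reusing the names from the script above (the trusted flag is an illustrative assumption):

    # Opt in to pickle deserialization only for indexes this machine created;
    # otherwise rebuild the index from the original documents.
    def load_or_rebuild(path, embedding_model, docs, trusted=False):
        if trusted and os.path.exists(path):
            return FAISS.load_local(path, embedding_model,
                                    allow_dangerous_deserialization=True)
        vs = FAISS.from_documents(docs, embedding_model)
        vs.save_local(path)
        return vs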

    2.2 Multiple Chains

    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI
    from langchain_core.runnables import RunnablePassthrough
    from operator import itemgetter
    import os
    from dotenv import load_dotenv, find_dotenv
    
    # Remove the all_proxy environment variable
    if 'all_proxy' in os.environ:
        del os.environ['all_proxy']

    # Remove the ALL_PROXY environment variable
    if 'ALL_PROXY' in os.environ:
        del os.environ['ALL_PROXY']
    
    _ = load_dotenv(find_dotenv())
    
    
    planner = (
        ChatPromptTemplate.from_template("总结功能需求: {input}")
        | ChatOpenAI(model="gpt-4o-mini")
        | StrOutputParser()
        | {"base_response": RunnablePassthrough()}
    )
    
    # Generate the C++ code
    code_cpp = (
        ChatPromptTemplate.from_template(
            "写出关于{base_response}的cpp代码"
        )
        | ChatOpenAI(model="gpt-4o-mini")
        | StrOutputParser()
    )
    
    # Generate the Python code
    code_python = (
        ChatPromptTemplate.from_template(
            "写出关于{base_response}的python代码"
        )
        | ChatOpenAI(model="gpt-4o-mini")
        | StrOutputParser()
    )
    
    # Final responder: combine the original summary with the generated C++ and Python code into the final reply
    final_responder = (
        ChatPromptTemplate.from_messages(
            [
                ("ai", "{original_response}"),
                ("human", "cpp代码:\n{code_cpp}\n\npython代码:\n{code_python}"),
                ("system", "打印出生成的完整的cpp代码,逐步检查cpp代码,发现是否有问题,并提出解决方法;同理,python代码同样"),
            ]
        )
        | ChatOpenAI(model="gpt-4o-mini")
        | StrOutputParser()
    )
    
    # Build the full chain: summarize the requirement, generate both code variants in parallel, then produce the final review
    chain = (
        planner
        | {
            "code_cpp": code_cpp,
            "code_python": code_python,
            "original_response": itemgetter("base_response"),
        }
        | final_responder
    )
    
    print(chain.invoke({"input": "异步多线程并行运算的简单demo"}))
    

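    One detail in the chain above is easy to miss: the plain dict between planner and final_responder is coerced by LCEL into a RunnableParallel, so code_cpp, code_python, and the itemgetter all run against the same planner output. Written out explicitly, that middle step is equivalent to:

    from langchain_core.runnables import RunnableParallel

    middle = RunnableParallel(
        code_cpp=code_cpp,
        code_python=code_python,
        original_response=itemgetter("base_response"),
    )
    # chain = planner | middle | final_responder  # same behavior as the dict form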
    Output:

    Alright, let's step through the provided C++ and Python code, confirm they are free of problems, and point out anything that could be improved.

    Step-by-step review of the C++ code

    Full code recap
    #include <iostream>
    #include <vector>
    #include <thread>
    #include <future>
    #include <numeric>
    #include <chrono>
    
    // Compute task: sum a range of the data
    int computeSum(const std::vector<int>& data, size_t start, size_t end) {
        int sum = 0;
        for (size_t i = start; i < end; ++i) {
            sum += data[i];
        }
        return sum;
    }
    
    // Main entry point
    int main() {
        // 1. Read input data from the user
        size_t dataSize;
        std::cout << "Enter the data size: ";
        std::cin >> dataSize;

        std::vector<int> data(dataSize);
        std::cout << "Enter " << dataSize << " integers: ";
        for (size_t i = 0; i < dataSize; ++i) {
            std::cin >> data[i];
        }
    
        // 2. Choose the number of threads
        const size_t numThreads = std::thread::hardware_concurrency();
        std::vector<std::future<int>> futures;
        size_t chunkSize = dataSize / numThreads;
    
        // 3. Launch the compute tasks asynchronously
        auto startTime = std::chrono::high_resolution_clock::now();
        for (size_t i = 0; i < numThreads; ++i) {
            size_t start = i * chunkSize;
            size_t end = (i == numThreads - 1) ? dataSize : start + chunkSize;
    
            // Start an asynchronous task with std::async
            futures.push_back(std::async(std::launch::async, computeSum, std::ref(data), start, end));
        }
    
        // 4. Aggregate the results
        int totalSum = 0;
        for (auto& fut : futures) {
            totalSum += fut.get();
        }
    
        auto endTime = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double> elapsed = endTime - startTime;
    
        // 5. Report the results
        std::cout << "Result: " << totalSum << std::endl;
        std::cout << "Elapsed: " << elapsed.count() << " s" << std::endl;
    
        return 0;
    }
    
    Step-by-step review
    1. Input validation: the code does not validate user input; dataSize and the data array should be checked so that negative numbers and non-integer input are rejected.
       Fix: use std::cin.fail() to detect invalid input, clear the stream, and prompt the user to re-enter.
    2. Thread count: std::thread::hardware_concurrency() may return 0, which means the number of hardware threads could not be determined.
       Fix: check the value before using it and fall back to a default, e.g. 2, when it is 0.
    3. Memory safety: computeSum takes a reference to data, so make sure no data races arise in the multithreaded context.
       Fix: no race occurs in the current setup, but a std::vector<std::mutex> could be introduced to protect shared resources if the code grows.
    4. Performance: if dataSize is smaller than numThreads, some threads end up with no work.
       Fix: adjust numThreads according to dataSize when partitioning, so every thread gets work.

    Step-by-step review of the Python code

    Full code recap
    import concurrent.futures
    import random
    import time
    
    # Function that simulates a compute task
    def compute_task(n):
        # Simulate an expensive computation
        time.sleep(random.uniform(0.1, 1.0))  # random delay
        return n * n  # example: return the square
    
    def main():
        # Read the task data from the user
        try:
            tasks = input("Enter the numbers to process, separated by commas: ")
            numbers = list(map(int, tasks.split(',')))
        except ValueError:
            print("Invalid input; please enter numbers.")
            return

        results = []  # store the computed results
    
        # Run the computations asynchronously with a thread pool
        with concurrent.futures.ThreadPoolExecutor() as executor:
            future_to_num = {executor.submit(compute_task, num): num for num in numbers}
    
            # Collect the results as they complete
            for future in concurrent.futures.as_completed(future_to_num):
                num = future_to_num[future]
                try:
                    result = future.result()
                    results.append((num, result))
                    print(f"数字 {num} 的平方是 {result}")
                except Exception as e:
                    print(f"任务 {num} 处理时出现错误:{e}")
    
        # Summarize the results
        print("\nAll computations finished.")
        print("Summary of results:")
        for num, result in results:
            print(f"The square of {num} is {result}")
    
    if __name__ == "__main__":
        main()
    
    Step-by-step review
    1. Input validation: the input has basic exception handling, but the values are unbounded; extremely large numbers can cause performance problems.
       Fix: after parsing, validate the numbers list and limit its size and range.
    2. Random delay: the random sleep in compute_task makes task duration unpredictable, which is unhelpful for performance testing.
       Fix: use a fixed delay, or a more substantial computation, to make runs reproducible.
    3. Result storage: keeping every number/result pair in the results list is fine for small inputs but may exhaust memory at scale.
       Fix: process results in batches, or write them to a file instead of holding them in memory.
    4. Exception handling: the current handler catches all exceptions but reports little detail.
       Fix: record the specific exception type and message in the error output.

    Summary

    Stepping through both programs surfaced several areas for improvement in the C++ and Python code, with a corresponding fix suggested for each issue. Applying these changes would make the code more robust and improve the user experience.

    2.3 RAG Application

    RAG is a technique that combines retrieved document context with a large language model (LLM) to generate an answer.

    The overall process breaks down into the following steps:

  • Load documents: bring the raw data (from websites, local files, various platforms, and so on) into LangChain.
  • Split documents: cut the loaded documents into smaller chunks that fit the model's context window and are easier to embed and retrieve.
  • Store embeddings: embed the chunks into a vector space and store them in a vector database for later retrieval.
  • Retrieve documents: query the vector database for the chunks most relevant to the question.
  • Generate the answer: combine the retrieved chunks with the user's question to generate and return an answer.

    Together, these steps yield a capable question-answering system that breaks a complex task into smaller steps and produces detailed answers.

    import bs4
    from langchain_community.document_loaders import WebBaseLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_openai import ChatOpenAI
    from langchain_openai import OpenAIEmbeddings
    from langchain_chroma import Chroma
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough
    from langchain import hub
    from langchain_core.prompts import PromptTemplate
    
    import os
    from dotenv import load_dotenv, find_dotenv
    
    # Remove the all_proxy environment variable
    if 'all_proxy' in os.environ:
        del os.environ['all_proxy']

    # Remove the ALL_PROXY environment variable
    if 'ALL_PROXY' in os.environ:
        del os.environ['ALL_PROXY']
    
    _ = load_dotenv(find_dotenv())
    
    os.environ["http_proxy"] = "socks5://127.0.0.1:1080"
    os.environ["https_proxy"] = "socks5://127.0.0.1:1080"
    
    # Class names of the HTML elements to keep: only elements whose class is
    # post-title, post-header, or post-content survive parsing; every other
    # HTML element is discarded.
    bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
    loader = WebBaseLoader(web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
                           bs_kwargs={"parse_only": bs4_strainer},)
    docs = loader.load()
    
    
    # Split the documents into 1000-character chunks with 200 characters of overlap using RecursiveCharacterTextSplitter
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=200, add_start_index=True
    )
    all_splits = text_splitter.split_documents(docs)
    
    vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())
    print(type(vectorstore))
    
    retriever = vectorstore.as_retriever(search_type="similarity",
                    search_kwargs={'k': 6})
    
    # Model used by the RAG chain to combine the user question with retrieved documents
    llm = ChatOpenAI(model="gpt-4o-mini")
    
    # Pull the RAG prompt template from the LangChain hub
    prompt = hub.pull("rlm/rag-prompt")
    
    # Helper that formats the retrieved documents
    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)
    
    # Build the RAG chain with LCEL
    rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    
    print(rag_chain.invoke("What is ToT?"))
    # Custom prompt template
    template = """Use the following pieces of context to answer the question at the end.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    Use three sentences maximum and keep the answer as concise as possible.
    Always say "thanks for asking!" at the end of the answer.
    
    {context}
    
    Question: {question}
    
    Helpful Answer:"""
    
    custom_rag_prompt = PromptTemplate.from_template(template)
    custom_rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | custom_rag_prompt
        | llm
        | StrOutputParser()
    )
    print("*"*40)
    print(custom_rag_chain.invoke("What is ToT?"))
    

    Output:

    ToT stands for Tree of Thoughts, a reasoning framework that extends the Chain of Thought (CoT) approach. It involves breaking down problems into multiple thought steps, generating various thoughts at each step, and organizing them into a tree structure for evaluation. The process can utilize search strategies like breadth-first or depth-first search to explore different reasoning possibilities.
    ****************************************
    ToT refers to "Tree of Thoughts," a framework that extends Chain of Thought (CoT) reasoning by exploring multiple reasoning possibilities at each step of problem-solving. It decomposes tasks into various thought steps and organizes them into a tree structure, utilizing search strategies like BFS or DFS for evaluation. Thanks for asking!
    

    Problems encountered:

    2.3.1 Version conflict between langchain_core and langchain_openai

    Error:
    ImportError: cannot import name 'InputTokenDetails' from 'langchain_core.messages.ai'

    Traceback (most recent call last):
      File "/home/xjg/workspace/openai-quickstart/langchain/pycharm/LCEL/rag.py", line 4, in <module>
        from langchain_openai import ChatOpenAI
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/langchain_openai/__init__.py", line 1, in <module>
        from langchain_openai.chat_models import AzureChatOpenAI, ChatOpenAI
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/langchain_openai/chat_models/__init__.py", line 1, in <module>
        from langchain_openai.chat_models.azure import AzureChatOpenAI
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/langchain_openai/chat_models/azure.py", line 29, in <module>
        from langchain_openai.chat_models.base import BaseChatOpenAI
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/langchain_openai/chat_models/base.py", line 66, in <module>
        from langchain_core.messages.ai import (
    ImportError: cannot import name 'InputTokenDetails' from 'langchain_core.messages.ai' (/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/langchain_core/messages/ai.py)
    

    Check the version:

    $ pip show langchain-core
    Name: langchain-core
    Version: 0.2.43
    Summary: Building applications with LLMs through composability
    Home-page: https://github.com/langchain-ai/langchain
    Author:
    Author-email:
    License: MIT
    Location: /home/xjg/.conda/envs/langchain/lib/python3.10/site-packages
    Requires: jsonpatch, langsmith, packaging, pydantic, PyYAML, tenacity, typing-extensions
    Required-by: langchain, langchain-chroma, langchain-community, langchain-experimental, langchain-openai, langchain-text-splitters
    

    Fix:

    $ pip install --upgrade langchain-openai langchain-core --proxy=""
    
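    If you prefer explicit pins over an open-ended upgrade, constraining both packages to compatible ranges also works; the bounds below are illustrative assumptions, so check what your other langchain-* packages require:

    $ pip install "langchain-core>=0.3,<0.4" "langchain-openai>=0.2,<0.3" --proxy=""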

    Check again:

    $ pip show langchain-core
    Name: langchain-core
    Version: 0.3.15
    Summary: Building applications with LLMs through composability
    Home-page: https://github.com/langchain-ai/langchain
    Author:
    Author-email:
    License: MIT
    Location: /home/xjg/.conda/envs/langchain/lib/python3.10/site-packages
    Requires: jsonpatch, langsmith, packaging, pydantic, PyYAML, tenacity, typing-extensions
    Required-by: langchain, langchain-chroma, langchain-community, langchain-experimental, langchain-openai, langchain-text-splitters
    

    2.3.2 Connection reset

    Error:
    requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

    USER_AGENT environment variable not set, consider setting it to identify your requests.
    Traceback (most recent call last):
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
        response = self._make_request(
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/connectionpool.py", line 490, in _make_request
        raise new_e
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
        self._validate_conn(conn)
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
        conn.connect()
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/connection.py", line 730, in connect
        sock_and_verified = _ssl_wrap_socket_and_match_hostname(
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/connection.py", line 909, in _ssl_wrap_socket_and_match_hostname
        ssl_sock = ssl_wrap_socket(
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 469, in ssl_wrap_socket
        ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 513, in _ssl_wrap_socket_impl
        return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/ssl.py", line 513, in wrap_socket
        return self.sslsocket_class._create(
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/ssl.py", line 1104, in _create
        self.do_handshake()
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/ssl.py", line 1375, in do_handshake
        self._sslobj.do_handshake()
    ConnectionResetError: [Errno 104] Connection reset by peer
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
        resp = conn.urlopen(
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
        retries = retries.increment(
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/util/retry.py", line 474, in increment
        raise reraise(type(error), error, _stacktrace)
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/util/util.py", line 38, in reraise
        raise value.with_traceback(tb)
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/connectionpool.py", line 789, in urlopen
        response = self._make_request(
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/connectionpool.py", line 490, in _make_request
        raise new_e
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/connectionpool.py", line 466, in _make_request
        self._validate_conn(conn)
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
        conn.connect()
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/connection.py", line 730, in connect
        sock_and_verified = _ssl_wrap_socket_and_match_hostname(
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/connection.py", line 909, in _ssl_wrap_socket_and_match_hostname
        ssl_sock = ssl_wrap_socket(
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 469, in ssl_wrap_socket
        ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 513, in _ssl_wrap_socket_impl
        return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/ssl.py", line 513, in wrap_socket
        return self.sslsocket_class._create(
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/ssl.py", line 1104, in _create
        self.do_handshake()
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/ssl.py", line 1375, in do_handshake
        self._sslobj.do_handshake()
    urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/home/xjg/workspace/openai-quickstart/langchain/pycharm/LCEL/rag.py", line 38, in <module>
        docs = loader.load()
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/langchain_core/document_loaders/base.py", line 31, in load
        return list(self.lazy_load())
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/langchain_community/document_loaders/web_base.py", line 329, in lazy_load
        soup = self._scrape(path, bs_kwargs=self.bs_kwargs)
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/langchain_community/document_loaders/web_base.py", line 308, in _scrape
        html_doc = self.session.get(url, **self.requests_kwargs)
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/requests/sessions.py", line 602, in get
        return self.request("GET", url, **kwargs)
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
        resp = self.send(prep, **send_kwargs)
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
        r = adapter.send(request, **kwargs)
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/requests/adapters.py", line 682, in send
        raise ConnectionError(err, request=request)
    requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
    

    Fix:
    Enable the proxy, then add the following to the code:

    import os
    
    os.environ["http_proxy"] = "socks5://127.0.0.1:1080"
    os.environ["https_proxy"] = "socks5://127.0.0.1:1080"
    
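    Note that requests can only use a SOCKS proxy when the PySocks extra is installed, and the socks5h:// scheme (note the h) additionally routes DNS resolution through the proxy:

    $ pip install "requests[socks]"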

    2.3.3 Missing library

    Error:
    ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-x86_64-linux-gnu.so)

    File "/home/xjg/workspace/openai-quickstart/langchain/pycharm/LCEL/rag.py", line 6, in <module>
        from langchain_chroma import Chroma
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/langchain_chroma/__init__.py", line 6, in <module>
        from langchain_chroma.vectorstores import Chroma
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/langchain_chroma/vectorstores.py", line 24, in <module>
        import chromadb
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/chromadb/__init__.py", line 6, in <module>
        from chromadb.auth.token_authn import TokenTransportHeader
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/chromadb/auth/token_authn/__init__.py", line 24, in <module>
        from chromadb.telemetry.opentelemetry import (
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 13, in <module>
        from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/grpc/trace_exporter/__init__.py", line 20, in <module>
        from grpc import ChannelCredentials, Compression
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/grpc/__init__.py", line 22, in <module>
        from grpc import _compression
      File "/home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/grpc/_compression.py", line 20, in <module>
        from grpc._cython import cygrpc
    ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/xjg/.conda/envs/langchain/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-x86_64-linux-gnu.so)
    

    Reference:
    ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found

    # Locate copies of libstdc++ on the system
    sudo find / -name "libstdc++.so.6*"
    # Confirm the conda-provided copy exports the needed GLIBCXX symbols
    strings /home/xjg/.conda/envs/ai_endpoint/lib/libstdc++.so.6.0.29 | grep GLIBCXX
    # Copy the newer library into place and re-point the symlink
    sudo cp /home/xjg/.conda/envs/ai_endpoint/lib/libstdc++.so.6.0.29 /usr/lib/x86_64-linux-gnu/
    sudo rm /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    sudo ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29 /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    # Verify the symlinked library now provides GLIBCXX_3.4.29
    strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX
    

    3. References

    1. rlm/rag-prompt
    2. LangChain Expression Language (LCEL)

Author: 爱学习的小道长
