
Cohere's Command R delivers high accuracy on retrieval-augmented generation (RAG) and tool-use tasks. It offers low latency and high throughput, with a long 128k-token context window. It also demonstrates strong multilingual capabilities across 10 key languages.

In this Studio, we are building a fully self-hosted "chat with your documents" RAG application, using:

  • Cohere's Command R, served locally using Ollama
  • Qdrant vector database (self-hosted)
  • FastEmbed for generating embeddings

Here's a quick demo of what we're building:

https://youtu.be/aLLw3iCPhtM

Run main notebook

You can start by running the main.ipynb notebook, which contains the essential code to set up a query engine for interacting with the documents you provide.

Getting started in a notebook

Chat with your documents app

You can also interact with your docs through a friendly UI we've built with Streamlit, served directly from the Studio.

Follow the steps below to launch the app:

1. Click on the Streamlit plugin:

Launching the streamlit plugin

2. Then create a new app by clicking on the "New App" button on the top right (or clicking on "select a Studio file"):

Launching a new app

3. Now select the app.py, which contains the Streamlit application, and click on "Run":

Selecting the streamlit app code file

4. And there you go, you're now all set to chat with your documents!

Chat with your code streamlit app

Key architecture components

Building a robust RAG application involves a lot of moving parts. The architecture diagram below illustrates the key components and how they interact with each other, followed by detailed descriptions of each component. We've used:

  • LlamaIndex for orchestration
  • Streamlit for creating the chat UI
  • Cohere's Command R as the LLM
  • Qdrant vector database (self-hosted)
  • FastEmbed for generating embeddings

A chat with your docs RAG application

1. Custom knowledge base

Custom Knowledge Base: A collection of relevant and up-to-date information that serves as a foundation for RAG. It can be a database, a set of documents, or a combination of both. In this case, it's a PDF you provide, which will be used as the source of truth for answering user queries.

2. Chunking

Chunking is the process of breaking down a large input text into smaller pieces. This ensures that the text fits the input size of the embedding model and improves retrieval efficiency.

The following code loads PDF documents from a user-specified directory using LlamaIndex's SimpleDirectoryReader:

from llama_index.core import SimpleDirectoryReader

# load data from the directory the user specified in input_dir_path
loader = SimpleDirectoryReader(
    input_dir=input_dir_path,
    required_exts=[".pdf"],
    recursive=True,
)
docs = loader.load_data()
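LlamaIndex chunks these documents automatically when the index is built later on. If you want explicit control over chunking, here is a minimal sketch using LlamaIndex's SentenceSplitter; the chunk size and overlap values are illustrative assumptions, not settings prescribed by this Studio:

from llama_index.core.node_parser import SentenceSplitter

# Split the loaded documents into overlapping chunks; chunk_size and
# chunk_overlap below are illustrative assumptions, not Studio defaults.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents(docs)
print(f"Produced {len(nodes)} chunks from {len(docs)} documents")

Smaller chunks tend to improve retrieval precision at the cost of context; the overlap helps preserve continuity across chunk boundaries.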

3. Embeddings model

A technique for representing text data as numerical vectors that can be fed into machine learning models. The embedding model is responsible for converting text into these vectors. We will use BAAI/bge-large-en-v1.5 as the embedding model, served via FastEmbed.

FastEmbed is a lightweight library with minimal dependencies, ideal for serverless runtimes. It prioritizes speed using the faster ONNX Runtime and offers accuracy surpassing OpenAI Ada-002, with support for various models, including multilingual ones.

FastEmbed integrates seamlessly with the Qdrant vector database, which we are going to use here.

from llama_index.embeddings.fastembed import FastEmbedEmbedding

embed_model = FastEmbedEmbedding(model_name="BAAI/bge-large-en-v1.5")
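As a quick sanity check, you can embed a sample string directly (the sample text below is arbitrary); BAAI/bge-large-en-v1.5 produces 1024-dimensional vectors:

# Embed a sample string and inspect the vector dimensionality
vector = embed_model.get_text_embedding("RAG grounds LLM answers in your own documents.")
print(len(vector))  # 1024 for BAAI/bge-large-en-v1.5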

4. Vector databases

A collection of pre-computed vector representations of text data for fast retrieval and similarity search, with capabilities like CRUD operations, metadata filtering, and horizontal scaling.

In this Studio we are using the Qdrant vector database, self-hosted within the Studio, so your data stays completely on-premises.

import uuid

import qdrant_client
from llama_index.core import Settings, StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Use the FastEmbed model defined above for indexing
Settings.embed_model = embed_model

# Connect to the self-hosted Qdrant instance
client = qdrant_client.QdrantClient(
    host="localhost",
    port=6333
)

# Give each session its own collection
unique_collection_name = f"document_chat_{uuid.uuid4()}"
vector_store = QdrantVectorStore(client=client, collection_name=unique_collection_name)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Creating an index over loaded data
index = VectorStoreIndex.from_documents(
    docs,
    storage_context=storage_context,
)
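Once built, the index can already serve similarity search on its own. A minimal retrieval sketch (the query text and top-k value are illustrative):

# Fetch the chunks most similar to a query, without LLM synthesis
retriever = index.as_retriever(similarity_top_k=3)
for node_with_score in retriever.retrieve("What is this document about?"):
    print(node_with_score.score, node_with_score.text[:80])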

5. User chat interface

A user-friendly interface that allows users to interact with the RAG system, providing input queries and receiving output. We have built a Streamlit app for this; its code can be found in app.py.
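For orientation, the core of such a Streamlit chat loop looks roughly like the sketch below; this is a simplified, hypothetical outline rather than the actual contents of app.py:

import streamlit as st

st.title("Chat with your documents")

# Keep the running conversation in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay earlier turns
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

# Handle a new user query via the query engine built in section 6
if prompt := st.chat_input("Ask a question about your documents"):
    st.chat_message("user").write(prompt)
    answer = str(query_engine.query(prompt))
    st.chat_message("assistant").write(answer)
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.session_state.messages.append({"role": "assistant", "content": answer})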

6. Query engine

The query engine takes a query string, uses it to fetch relevant context, and then sends both together as a prompt to the LLM to generate a final natural-language response. The LLM used here is Cohere's Command R! The final response is displayed in the user interface.

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

# Set up the LLM, served locally by Ollama
llm = Ollama(model="command-r", request_timeout=60.0)

# Create the query engine
Settings.llm = llm
query_engine = index.as_query_engine()
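as_query_engine() also accepts tuning parameters. For example, you can stream tokens as they are generated and widen retrieval; the values below are illustrative assumptions, not settings used in this Studio:

# Optional: stream the answer and retrieve more context chunks
streaming_engine = index.as_query_engine(streaming=True, similarity_top_k=5)
streaming_response = streaming_engine.query("Summarize the document in two sentences.")
streaming_response.print_response_stream()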

7. Prompt template

A custom prompt template is used to refine the response from the LLM and to include the retrieved context:

from llama_index.core import PromptTemplate

qa_prompt_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above I want you to think step by step "
    "to answer the query in a crisp manner; in case you don't know the "
    "answer say 'I don't know!'.\n"
    "Query: {query_str}\n"
    "Answer: "
)

qa_prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str)
query_engine.update_prompts({"response_synthesizer:text_qa_template": qa_prompt_tmpl})

response = query_engine.query('What is RAFT algorithm?')
print(response)

Conclusion

In this Studio, we developed a completely self-hosted Retrieval Augmented Generation (RAG) application that lets you "chat with your documents" without compromising your data privacy. Along the way, we learned about LlamaIndex, the go-to library for building RAG applications, and Cohere's Command R, a model designed for RAG and tool use, served locally using Ollama.

We also learned how to self-host a vector database: we used Qdrant, along with FastEmbed, a fast, lightweight library for embedding generation.

We also explored the concept of prompt engineering to refine and steer the responses of our LLM. These techniques can similarly be applied to anchor your LLM to various knowledge bases, such as documents, PDFs, videos, and more.

LlamaIndex offers a variety of data loaders; you can learn more about them here.

Next Steps

As we continue to enhance the system, there are several promising directions to explore:

  • To further improve the accuracy of the retrieved context, we can use a dedicated reranker (see the sketch after this list).

  • Optimizing the ingestion process is another critical area; utilizing parallel ingestion with LlamaIndex can speed up indexing of large document collections.

  • We can also experiment with different chunking strategies by varying chunk sizes and the overlap between chunks.
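As one way to realize the reranking idea above, here is a minimal sketch using LlamaIndex's SentenceTransformerRerank postprocessor (it requires the sentence-transformers package; the cross-encoder model name and top_n are illustrative assumptions):

from llama_index.core.postprocessor import SentenceTransformerRerank

# Retrieve broadly, then let a cross-encoder keep only the best chunks;
# the model name and top_n are illustrative assumptions.
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2",
    top_n=3,
)
reranked_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[reranker],
)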

By pursuing these avenues, we aim to continuously improve the RAG system we have built!
