
Introduction

Large Language Models (LLMs) are increasingly utilised across various domains, including question answering over private enterprise documents, where data security and robustness are paramount.

Retrieval-Augmented Generation (RAG) is a prominent framework for building such applications, but ensuring its robustness requires extensive customisation.

This study shares experiences in deploying an LLM application for question answering over private organisational documents, using a system called Tree-RAG (T-RAG) that incorporates entity hierarchies for improved performance.

Evaluations demonstrate the effectiveness of this approach, providing valuable insights for real-world LLM applications.

Data Privacy

Security risks are a primary concern due to the sensitive nature of these documents, making it impractical to use proprietary LLM models over public APIs, given the risk of data leakage.

This calls for the use of open-source models that can be deployed on-premise.

Additionally, limited computational resources and smaller training datasets based on available documents present challenges.

Furthermore, ensuring reliable and accurate responses to user queries adds complexity, necessitating extensive customisation and decision-making in deploying robust applications in such environments.

Take-Aways

What interested me in this study is that the researchers developed an application that integrates Retrieval-Augmented Generation (RAG) with a fine-tuned open-source Large Language Model (LLM) for generating responses. The model is trained on an instruction dataset derived from the organisation’s documents.

They introduce a novel evaluation metric, termed Correct-Verbose, designed to assess the quality of generated responses. This metric evaluates responses based on their correctness while also considering the inclusion of additional relevant information beyond the scope of the original question.

T-RAG

Below is the workflow of Tree-RAG (T-RAG)…

For a given user query, the vector database is searched for relevant document chunks; these chunks serve as the contextual reference for the LLM’s in-context learning.

If the query mentions any organisation-related entities, information about those entities is extracted from the entities tree and added to the context. The fine-tuned Llama-2 7B model then generates a response from the presented data.
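This workflow can be sketched roughly as follows. All data, names and functions here are made-up stand-ins for illustration, not the paper’s actual code:

```python
# Illustrative sketch of the T-RAG query workflow; the corpus, entity facts
# and function names are hypothetical stand-ins, not the paper's actual code.

def retrieve_chunks(query: str) -> list[str]:
    # Stand-in for a vector-database similarity search.
    corpus = {
        "budget": "The annual budget is approved by the Finance Committee.",
        "leave": "Leave requests are handled by Human Resources.",
    }
    return [text for key, text in corpus.items() if key in query.lower()]

def entity_context(query: str) -> list[str]:
    # Stand-in for the entities-tree lookup; contributes context only when
    # the query actually names an organisational entity.
    tree_facts = {
        "finance committee": "The Finance Committee reports to the Board of Directors.",
    }
    return [fact for name, fact in tree_facts.items() if name in query.lower()]

def build_prompt(query: str) -> str:
    # Merge both context sources into the prompt handed to the fine-tuned LLM.
    context = retrieve_chunks(query) + entity_context(query)
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("Who approves the budget, and where does the Finance Committee sit?")
print(prompt)
```

In the real system, the prompt built this way would be passed to the fine-tuned Llama-2 7B model for response generation.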



Entities Tree

One distinguishing aspect of T-RAG is its incorporation of an entities tree alongside the vector database for context retrieval. The entities tree stores details regarding the organization’s entities and their hierarchical arrangement. Each node within this tree represents an entity, with parent nodes indicating their respective group memberships.
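A minimal way to picture such a tree is a map from each entity to its parent group; the hierarchy is then recovered by walking parent pointers upward. The org chart below is a hypothetical example, not the paper’s data:

```python
# A minimal entities-tree representation: each node records its parent group,
# with None marking the root. The org chart is a made-up example.
PARENT = {
    "Board of Directors": None,
    "Finance Committee": "Board of Directors",
    "Audit Team": "Finance Committee",
}

def hierarchy_path(entity: str) -> list[str]:
    """Walk parent pointers from an entity up to the root of the tree."""
    path, node = [], entity
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

print(hierarchy_path("Audit Team"))
```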

During the retrieval process, the framework leverages the entities tree to enhance the context retrieved from the vector database.

The procedure for entity tree search and context generation unfolds as follows:

  1. Initially, a parser module scans the user query for keywords corresponding to entity names within the organisation.
  2. Upon identifying one or more matches, details regarding each matched entity are extracted from the tree.
  3. These details are transformed into textual statements that furnish information about the entity and its position within the organisation’s hierarchy.
  4. Subsequently, this information is amalgamated with the document chunks retrieved from the vector database to construct the context.
  5. By adopting this approach, the model gains access to pertinent information about entities and their hierarchical positioning within the organisation when users inquire about them.
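The steps above can be sketched as a small function pair; the entity names and document chunks are illustrative assumptions, not the paper’s data:

```python
# Sketch of entity tree search and context generation; the tree and the
# example chunk are hypothetical.

PARENT = {
    "Board of Directors": None,
    "Finance Committee": "Board of Directors",
}

def entity_statements(query: str) -> list[str]:
    # Steps 1-3: scan the query for known entity names and turn each match's
    # position in the tree into a textual statement.
    statements = []
    for entity, parent in PARENT.items():
        if entity.lower() in query.lower() and parent is not None:
            statements.append(f"{entity} is part of {parent}.")
    return statements

def build_context(query: str, retrieved_chunks: list[str]) -> str:
    # Step 4: amalgamate the statements with chunks from the vector database.
    return "\n".join(retrieved_chunks + entity_statements(query))

context = build_context(
    "What does the Finance Committee do?",
    ["The Finance Committee reviews quarterly spending."],
)
print(context)
```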

The image above uses an illustrative organisational chart to demonstrate how tree search and retrieval are executed during context generation.

In addition to fetching contextual documents, the spaCy library is used with custom rules to identify the organisation’s named entities.

If the query contains one or more such entities, relevant information regarding the entity’s hierarchical location is extracted from the tree and transformed into textual statements. These statements are then incorporated into the context along with the retrieved documents.

However, if the user’s query does not mention any entities, the tree search is omitted, and only the context from the retrieved documents is utilised.
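The paper performs this matching with spaCy and custom rules; as a rough, dependency-free stand-in, whole-word matching against a known list of entity names captures the same gating behaviour. The entity names below are hypothetical:

```python
import re

# Hypothetical list of the organisation's entity names. The paper uses spaCy
# with custom rules; this is a simplified keyword-matching stand-in.
ENTITY_NAMES = ["Finance Committee", "Audit Team", "Human Resources"]

def match_entities(query: str) -> list[str]:
    """Return known entity names mentioned in the query (whole-word,
    case-insensitive). An empty result means tree search is skipped."""
    return [
        name for name in ENTITY_NAMES
        if re.search(r"\b" + re.escape(name) + r"\b", query, re.IGNORECASE)
    ]

print(match_entities("Does the audit team report to the Finance Committee?"))
```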

In Conclusion

I found this study fascinating in that it combines RAG with fine-tuning, while using an open-source model hosted on-premise to address data privacy, and simultaneously solving for inference latency, token usage cost, and regional and geographic availability.

It is also interesting how entities are matched via the spaCy framework for entity search and context generation, and notable that this was not just a research piece, but lessons learned from building an LLM application for real-world use.