The Evaluation & Testing framework for LLMs & ML models
Control risks of performance, bias and security issues in AI models
Install Giskard 🐢
Install the latest version of Giskard from PyPI using pip:
pip install "giskard[llm]" -U
We officially support Python 3.9, 3.10 and 3.11.
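To check an interpreter against the supported versions before installing, a minimal sketch could look like the following. `is_officially_supported` is a hypothetical helper written for this illustration, not part of Giskard.

```python
import sys

# Versions listed as officially supported in this README.
SUPPORTED_VERSIONS = {(3, 9), (3, 10), (3, 11)}


def is_officially_supported(version=None) -> bool:
    """Return True if the (major, minor) version is in the supported set.

    `version` is any tuple starting with (major, minor); it defaults to
    the running interpreter's version.
    """
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    print("Officially supported:", is_officially_supported())
```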
Try in Colab 📙
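Giskard is an open-source Python library that automatically detects performance, bias and security issues in AI applications. The library covers LLM-based applications such as RAG agents, all the way to traditional ML models for tabular data.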
Scan: Automatically assess your LLM-based agents for performance, bias & security issues ⤵️
Issues detected include:
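- Hallucinations
- Harmful content generation
- Prompt injection
- Robustness issues
- Sensitive information disclosure
- Stereotypes & discrimination
- many more...
RAG Evaluation Toolkit (RAGET): Automatically generate evaluation datasets & evaluate RAG application answers ⤵️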
If you're testing a RAG application, you can get an even more in-depth assessment using RAGET, Giskard's RAG Evaluation Toolkit.
- RAGET can automatically generate a list of question, reference_answer and reference_context from the knowledge base of the RAG. You can then use this generated test set to evaluate your RAG agent.
- RAGET computes scores for each component of the RAG agent. The scores are computed by aggregating the correctness of the agent’s answers on different question types.
- Here is the list of components evaluated with RAGET:
  - Generator: the LLM used inside the RAG to generate the answers
  - Retriever: fetches relevant documents from the knowledge base according to a user query
  - Rewriter: rewrites the user query to make it more relevant to the knowledge base or to account for chat history
  - Router: filters the user's query based on their intentions
  - Knowledge Base: the set of documents given to the RAG to generate the answers
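The idea of aggregating answer correctness per question type into per-component scores can be sketched as below. This is an illustration only: the mapping from question types to components is hypothetical, and RAGET's actual mapping and aggregation are not given here.

```python
from collections import defaultdict

# Hypothetical mapping (an assumption for this sketch, not RAGET's real one):
# which components a given question type is taken to exercise.
QUESTION_TYPE_TO_COMPONENTS = {
    "simple": ["Generator", "Retriever"],
    "conversational": ["Rewriter"],
}


def component_scores(results, mapping=QUESTION_TYPE_TO_COMPONENTS):
    """Aggregate (question_type, is_correct) pairs into a score per component.

    Each score is the fraction of correct answers among the questions
    whose type is mapped to that component.
    """
    totals = defaultdict(lambda: [0, 0])  # component -> [correct, total]
    for question_type, is_correct in results:
        for component in mapping.get(question_type, []):
            totals[component][0] += int(is_correct)
            totals[component][1] += 1
    return {name: correct / total for name, (correct, total) in totals.items()}


example_results = [("simple", True), ("simple", False), ("conversational", True)]
scores = component_scores(example_results)
```

With these three example answers, the two components mapped to "simple" questions share the same score, because they are judged on the same answers.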
Giskard works with any model, in any environment and integrates seamlessly with your favorite tools ⤵️
Contents
- 🤸♀️ Quickstart
- 👋 Community
🤸♀️ Quickstart
1. 🏗️ Build an LLM agent
Let's build an agent that answers questions about climate change, based on the 2023 Climate Change Synthesis Report by the IPCC.
Before starting, let's install the required libraries:
pip install langchain tiktoken "pypdf<=3.17.0"
from langchain import OpenAI, FAISS, PromptTemplate
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Prepare vector store (FAISS) with IPCC report
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100, add_start_index=True)
loader = PyPDFLoader("https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_LongerReport.pdf")
db = FAISS.from_documents(loader.load_and_split(text_splitter), OpenAIEmbeddings())

# Prepare QA chain
PROMPT_TEMPLATE = """You are the Climate Assistant, a helpful AI assistant made by Giskard.
Your task is to answer common questions on climate change.
You will be given a question and relevant excerpts from the IPCC Climate Change Synthesis Report (2023).
Please provide short and clear answers based on the provided context. Be polite and helpful.

Context:
{context}

Question:
{question}

Your answer:
"""

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
prompt = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=["question", "context"])
climate_qa_chain = RetrievalQA.from_llm(llm=llm, retriever=db.as_retriever(), prompt=prompt)
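To build intuition for the `chunk_size=1000` and `chunk_overlap=100` arguments used above, here is a deliberately simplified fixed-window splitter in plain Python. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which also tries to break on separators such as paragraphs and sentences; `split_with_overlap` is a hypothetical helper that only shows how consecutive chunks share text and how a start index can be recorded per chunk.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100):
    """Split `text` into character windows that overlap by `chunk_overlap`.

    Returns a list of (start_index, chunk) tuples.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Synthetic 2500-character text, used only to exercise the helper.
sample_text = "".join(str(i % 10) for i in range(2500))
sample_chunks = split_with_overlap(sample_text)
```

The overlap means that a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.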
2. 🔎 Scan your model for issues
Next, wrap your agent to prepare it for Giskard's scan:
import giskard
import pandas as pd

def model_predict(df: pd.DataFrame):
    """Wraps the LLM call in a simple Python function.

    The function takes a pandas.DataFrame containing the input variables needed
    by your model, and must return a list of the outputs (one for each row).
    """
    return [climate_qa_chain.run({"query": question}) for question in df["question"]]

# Don’t forget to fill the `name` and `description`: they are used by Giskard
# to generate domain-specific tests.
giskard_model = giskard.Model(
    model=model_predict,
    model_type="text_generation",
    name="Climate Change Question Answering",
    description="This model answers any question about climate change based on IPCC reports",
    feature_names=["question"],
)
✨✨✨Then run Giskard's magical scan✨✨✨
scan_results = giskard.scan(giskard_model)
Once the scan completes, you can display the results directly in your notebook:
display(scan_results)

# Or save it to a file
scan_results.to_html("scan_results.html")
If you're facing issues, check out our docs for more information.
3. 🪄 Automatically generate an evaluation dataset for your RAG applications
If the scan found issues in your model, you can automatically extract an evaluation dataset based on the issues found:
test_suite = scan_results.generate_test_suite("My first test suite")
By default, RAGET automatically generates 6 different question types (these can be selected if needed, see advanced question generation). The total number of questions is divided equally among the question types. To make the question generation more relevant and accurate, you can also provide a description of your agent.
import pandas as pd
from giskard.rag import generate_testset, KnowledgeBase

# Load your data and initialize the KnowledgeBase
df = pd.read_csv("path/to/your/knowledge_base.csv")
knowledge_base = KnowledgeBase.from_pandas(df, columns=["column_1", "column_2"])

# Generate a testset with 10 questions & answers for each question type (this will take a while)
testset = generate_testset(
    knowledge_base,
    num_questions=60,
    language='en',  # optional, we'll auto detect if not provided
    agent_description="A customer support chatbot for company X",  # helps generating better questions
)
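The `num_questions=60` value above yields 10 questions per type because the total is divided equally across the 6 default question types. The arithmetic can be sketched as follows. `questions_per_type` is a hypothetical helper, not part of Giskard, and its handling of totals that do not divide evenly is an assumption, since that case is not described here.

```python
def questions_per_type(num_questions: int, num_question_types: int = 6) -> list:
    """Split a question budget as evenly as possible across question types.

    Assumption for this sketch: when the total does not divide evenly,
    the first `remainder` types each receive one extra question.
    """
    if num_questions < 0 or num_question_types <= 0:
        raise ValueError("need num_questions >= 0 and num_question_types > 0")
    base, remainder = divmod(num_questions, num_question_types)
    return [base + 1 if i < remainder else base for i in range(num_question_types)]


print(questions_per_type(60))  # [10, 10, 10, 10, 10, 10]
```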
Depending on how many questions you generate, this can take a while. Once you’re done, you can save this generated test set for future use:
# Save the generated testset
testset.save("my_testset.jsonl")
You can easily load it back:
from giskard.rag import QATestset

loaded_testset = QATestset.load("my_testset.jsonl")

# Convert it to a pandas dataframe
df = loaded_testset.to_pandas()
Here’s an example of a generated question:
question | reference_context | reference_answer | metadata |
---|---|---|---|
For which countries can I track my shipping? | Document 1: We offer free shipping on all orders over $50. For orders below $50, we charge a flat rate of $5.99. We offer shipping services to customers residing in all 50 states of the US, in addition to providing delivery options to Canada and Mexico. Document 2: Once your purchase has been successfully confirmed and shipped, you will receive a confirmation email containing your tracking number. You can simply click on the link provided in the email or visit our website’s order tracking page. | We ship to all 50 states in the US, as well as to Canada and Mexico. We offer tracking for all our shippings. | {"question_type": "simple", "seed_document_id": 1, "topic": "Shipping policy"} |
Each row of the test set contains 5 columns:
- question: the generated question
- reference_context: the context that can be used to answer the question
- reference_answer: the answer to the question (generated with GPT-4)
- conversation_history: not shown in the table above; contains the history of the conversation with the agent as a list. It is only relevant for conversational questions; otherwise it contains an empty list.
- metadata: a dictionary with various metadata about the question, including the question_type, the seed_document_id (the id of the document used to generate the question) and the topic of the question
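To inspect a saved test set without Giskard installed, the file can be read with the standard library alone. The sketch below assumes, as the `.jsonl` extension suggests, that the file holds one JSON object per line whose keys match the five columns listed above; the exact on-disk layout written by `testset.save` is not confirmed here. `read_jsonl` and `missing_columns` are hypothetical helpers, and the sample row is a shortened version of the example shown earlier.

```python
import json
import tempfile
from pathlib import Path

# Column names as listed in this README.
EXPECTED_COLUMNS = {
    "question",
    "reference_context",
    "reference_answer",
    "conversation_history",
    "metadata",
}


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows


def missing_columns(row):
    """Return the expected column names that are absent from `row`."""
    return EXPECTED_COLUMNS - set(row)


# Write a one-row sample file (assumed layout), then read it back.
sample_row = {
    "question": "For which countries can I track my shipping?",
    "reference_context": "Document 1: We offer free shipping on all orders over $50.",
    "reference_answer": "We ship to all 50 states in the US, as well as to Canada and Mexico.",
    "conversation_history": [],
    "metadata": {"question_type": "simple", "seed_document_id": 1, "topic": "Shipping policy"},
}
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample_testset.jsonl"
    sample_path.write_text(json.dumps(sample_row) + "\n", encoding="utf-8")
    rows = read_jsonl(sample_path)
```

Running `missing_columns` over every row is a cheap sanity check before feeding a hand-edited test set back into an evaluation.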