
Building a Personalized ChatGPT: Architecting Your Own Q&A Engine

Published on
September 19, 2023
Contributors
Maximilian Bielecki
CEO Essentio CodeLab
Christian Vancea
CEO & Co-Founder Essentio

In today's burgeoning AI landscape, Large Language Models (LLMs) like ChatGPT and GPT-4 have emerged as significant players. Many companies and individuals are intrigued by the prospect of integrating these models with their specific datasets. But how feasible is it to cultivate a tailored ChatGPT experience, especially with one's proprietary corporate data?

Join us on this exploration as we uncover the architecture and data requisites for crafting "your private ChatGPT", enriched with your unique datasets. We will also delve into the inherent potential and the nuances of circumventing the model's intrinsic limitations.

Disclaimer: While the architectural concepts in this article are universal in nature, we will reference Azure services for illustrative purposes, given the author's role as a Solution Architect at Microsoft.

1. Disadvantages of Finetuning an LLM with Your Own Data

Often, finetuning a pretrained language model with specific data is seen as the solution for personalized results. However, there are inherent challenges, such as hallucinations, as spotlighted during GPT-4's launch. Additionally, GPT-4's knowledge only extends up to September 2021.

Some notable drawbacks of finetuning LLMs include:

  • Factual Correctness & Traceability: It is unclear where an answer originates, making it hard to verify facts or trace them back to a source.
  • Access Control: It's challenging to restrict certain documents to specific users or groups.
  • Cost Implications: Every addition of new documents necessitates model retraining, escalating costs.

Given these obstacles, finetuning is an impractical approach to Question Answering (QA). How, then, can we get the best out of these LLMs?

2. Separate Your Knowledge from Your Language Model

For accuracy, we must separate the language model from our knowledge base. This strategy harnesses the LLM's semantic capabilities while supplying only the relevant data, in real time and without any model retraining.

Although feeding every document to the model during execution may seem ideal, it's impractical due to token processing limits. For instance, while GPT-3 supports up to 4K tokens, GPT-4 ranges from 8K to 32K tokens. Economically, fewer tokens can also lead to cost savings.
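To get a feel for how quickly documents eat into this budget, here is a minimal sketch using the tiktoken library (assuming the cl100k_base encoding these models use) for counting the tokens a piece of text would consume:

```python
import tiktoken

# cl100k_base is the encoding used by gpt-3.5-turbo and gpt-4.
encoding = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    """Number of tokens the model would see for this text."""
    return len(encoding.encode(text))

print(count_tokens("What is the deductible for the employee plan?"))
```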

Here's a simplified approach:

  • The user poses a question.
  • The application retrieves the most pertinent text likely to contain the answer.
  • This precise prompt, containing the relevant data, is fed to the LLM.
  • The user either gets a well-informed answer or a 'No answer found' reply.

This methodology is often termed Retrieval Augmented Generation (RAG), wherein the application offers the LLM added context to generate answers rooted in factual sources.
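As a minimal sketch of this flow in Python, the three helpers below are hypothetical stand-ins for the search index, prompt template, and LLM call that the following sections cover in more detail:

```python
from typing import List

# Hypothetical stand-ins for the real components; later sections sketch fuller versions.
def retrieve_chunks(question: str, top_k: int = 3) -> List[str]:
    return ["<most relevant text chunks from your knowledge base>"]

def build_prompt(question: str, sources: List[str]) -> str:
    return "Answer only from these sources:\n" + "\n".join(sources) + f"\nQuestion: {question}"

def ask_llm(prompt: str) -> str:
    return "<answer produced by the (Azure) OpenAI API>"

def answer_question(question: str) -> str:
    chunks = retrieve_chunks(question)        # 1. retrieve the most relevant text
    prompt = build_prompt(question, chunks)   # 2. build a prompt grounded in that text
    answer = ask_llm(prompt)                  # 3. let the LLM answer from the context
    return answer or "No answer found in the provided sources."  # 4. fallback reply
```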

3. Retrieve the Most Relevant Data

The importance of context cannot be overstated. By establishing a robust knowledge base, we can utilize semantic search to feed the LLM the most appropriate documents, enabling it to produce accurate answers.

3.1 Chunk and Split Your Data:

Given the token limit, documents need to be split into manageable chunks. Depending on chunk size, several chunks can also be combined to give the model enough context for a comprehensive answer.

One can start with basic strategies like page-wise splits or employ text splitters based on token length. Post-segmentation, it's crucial to develop a search index responsive to user queries. Embedding metadata, such as the original source or page number, can enrich the search experience and aid in traceability.
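As an illustration, here is a minimal page-wise splitter that keeps the source name and page number as metadata for traceability; the input structure and field names are assumptions:

```python
from typing import Dict, List

def split_by_page(pages: List[str], source: str) -> List[Dict]:
    """Turn a list of page texts into chunks that remember where they came from."""
    chunks = []
    for page_number, text in enumerate(pages, start=1):
        text = text.strip()
        if not text:
            continue
        chunks.append({
            "content": text,
            "source": source,      # e.g. 'employee_handbook.pdf'
            "page": page_number,   # lets an answer cite 'employee_handbook.pdf, page 12'
        })
    return chunks

# Usage: chunks = split_by_page(pdf_page_texts, source="employee_handbook.pdf")
```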

Options for Implementation:

  • Leverage a pre-existing service like Azure's Cognitive Search, which offers semantic ranking aided by Bing's language models.
  • Utilize OpenAI's text embedding models for a more granular control over your search index. Here, precomputed embeddings are stored and used for real-time comparison with user-generated embeddings.
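A sketch of the second option, assuming the openai Python SDK (v1), numpy, and the text-embedding-ada-002 model; in production you would persist the precomputed vectors in a vector store rather than keep them in memory:

```python
import numpy as np
from openai import OpenAI  # or AzureOpenAI with your endpoint and deployment

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    """Compute an embedding vector for a piece of text."""
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(response.data[0].embedding)

# Precompute embeddings for every chunk once and store them alongside the text.
chunk_texts = ["...chunk 1...", "...chunk 2..."]  # your chunked documents
chunk_vectors = np.array([embed(t) for t in chunk_texts])

def retrieve(question: str, top_k: int = 3) -> list:
    """Return the chunks whose embeddings are most similar to the question."""
    q = embed(question)
    # Cosine similarity between the question and every stored chunk.
    scores = chunk_vectors @ q / (np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q))
    best = np.argsort(scores)[::-1][:top_k]
    return [chunk_texts[i] for i in best]
```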

3.2 Improve Relevancy with Different Chunking Strategies:

Understanding the data and potential user queries is pivotal. Depending on the data nature, certain strategies like the sliding window or additional context provision can bolster relevancy.
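For example, a sliding-window splitter produces overlapping chunks so that sentences near a chunk boundary keep their surrounding context. A minimal sketch using tiktoken, with chunk size and overlap as tunable assumptions:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def sliding_window_chunks(text: str, chunk_tokens: int = 500, overlap_tokens: int = 100):
    """Split text into overlapping, token-bounded chunks.
    chunk_tokens must be larger than overlap_tokens."""
    tokens = encoding.encode(text)
    step = chunk_tokens - overlap_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_tokens]
        chunks.append(encoding.decode(window))
        if start + chunk_tokens >= len(tokens):
            break
    return chunks
```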

4. Craft a Precise Prompt to Ward Off Hallucinations

In the evolving world of AI, the art of prompting is akin to programming. Here, through meticulous instructions or deftly crafted examples, we instruct the model. As foundational as it is pivotal, the prompt is your line of defense against misleading or out-of-context responses. With the burgeoning recognition of "prompt engineering" as a distinct skill, there's an influx of sample prompts being shared, expanding the collective wisdom.

4.1. The Principles of Prompt Design

Clarity: The prompt must unequivocally direct the model to remain succinct and rely exclusively on the provided context.

Fallback Mechanism: In instances where the model hits a roadblock and cannot deduce an answer, it must resort to a default ‘no answer’ response.

Source Citation: Every output should be appended with footnotes or references linking back to the original document. This transparency not only boosts credibility but also offers users a route to validate the model's response.

Here's a prompt blueprint:

"You are a smart aide for Contoso Inc, assisting employees with queries related to their healthcare plan and employee handbook.
Reference 'you' when addressing the individual posing the question, even if the query is framed using 'I'.
Respond to the subsequent query solely based on the data gleaned from the sources that follow.
Should the response include tabulated information, format it as an HTML table, eschewing markdown.
Each information source is tagged with a distinct name. Ensure your response encapsulates the source name for each fact cited.
In the event you cannot deduce an answer from the provided sources, simply state 'I don't know'."

Example:
Question: 'What's the deductible for the employee plan concerning a visit to Overlake in Bellevue?'
Sources:
info1.txt: In-network vs. out-of-network deductibles vary. While in-network stands at $500 for an individual and $1000 for a family, out-of-network is $1000 and $2000 respectively.
info2.pdf: Overlake, for the employee plan, is categorized as in-network.
info3.pdf: Overlake encompasses an area inclusive of a park and ride in Bellevue's proximity.
info4.pdf: Institutions like Overlake and Swedish, among others, are under the in-network umbrella.
Answer:
The in-network deductible is $500 for an individual and $1000 for a family [info1.txt]. This applies because Overlake is in-network for the employee plan [info2.pdf][info4.pdf].


In practice, "{q}" is populated with the user's question and "{retrieved}" incorporates relevant sections from the knowledge base to sculpt the final prompt.
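A minimal sketch of how such a template could be assembled in Python; the wording condenses the blueprint above, and the helper name is an assumption:

```python
SYSTEM_PROMPT = """You are a smart aide for Contoso Inc, assisting employees with queries
related to their healthcare plan and employee handbook.
Respond to the query solely based on the data in the sources that follow.
Each source is tagged with a name; cite the source name for each fact you use.
If you cannot deduce an answer from the provided sources, simply state 'I don't know'."""

def build_prompt(q: str, retrieved: str) -> str:
    """Combine the instructions, the retrieved chunks and the user's question."""
    return f"{SYSTEM_PROMPT}\n\nSources:\n{retrieved}\n\nQuestion: {q}\nAnswer:"
```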

4.2. Harnessing One-Shot Learning and Tweaking Parameters

The potency of one-shot learning is leveraged to refine the output. By furnishing an exemplar of the desired user query handling, the model is better positioned to deliver. And if the pursuit is a more consistent, predictable response, it's advisable to dial down the temperature in your parameters. Cranking it up, however, can yield more inventive outcomes.

Ultimately, this prompt is the key to eliciting a response through platforms like the (Azure) OpenAI API. Utilizing models like gpt-35-turbo (ChatGPT), the conversation's historical context can be integrated, paving the way for clarifying questions or alternate cognitive tasks, such as summaries. A fantastic repository for prompt engineering insights is dair-ai/Prompt-Engineering-Guide available on GitHub.
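A sketch of such a call with the openai Python SDK (v1+); the deployment name, API version, and environment variables are assumptions, and build_prompt refers to the template sketch above:

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2023-05-15",
)

# 'prompt' combines the instructions, one-shot example, retrieved sources and question.
prompt = build_prompt(q="What is the deductible for the employee plan?",
                      retrieved="info1.txt: The in-network deductible is $500 per individual.")

response = client.chat.completions.create(
    model="gpt-35-turbo",   # the name of your Azure deployment (assumption)
    temperature=0.1,        # lower temperature -> more consistent, predictable answers
    messages=[
        # Earlier turns of the conversation could be appended here as extra messages.
        {"role": "user", "content": prompt},
    ],
)
print(response.choices[0].message.content)
```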

5. Next Steps

There are a plethora of tools and resources available for those interested in venturing into this domain:

  • Azure OpenAI Service – On Your Data: This feature provides a seamless way to integrate OpenAI models like ChatGPT and GPT-4 with your data.
  • ChatGPT Retrieval Plugin: This will enable ChatGPT to fetch up-to-date information. While it currently supports only the public ChatGPT, we anticipate expanded capabilities in the future.
  • LangChain: An esteemed library for integrating LLMs with other knowledge sources or computational methods.
  • Azure Cognitive Search + OpenAI Accelerator: Ready-to-deploy solutions for a ChatGPT-like experience over your proprietary data.
  • OpenAI Cookbook: Demonstrates leveraging OpenAI embeddings for Q&A in Jupyter notebooks.
  • Semantic Kernel: This innovative library facilitates the fusion of conventional programming languages with LLMs.

Exploring tools like LangChain or Semantic Kernel further can amplify the capabilities of 'your own ChatGPT'.

6. Conclusion

Building a private ChatGPT, armed with your own data, is not a journey without its challenges. However, with the right approach, architecture, and guidance, it is very much within reach. It's crucial to separate the knowledge from the language model, retrieve data efficiently, and craft concise prompts to ensure the desired output.

We are proud to share that in collaboration with our esteemed client, 123Sonography, we have already successfully developed and deployed such a model. This experience has allowed us to understand the intricate nuances and best practices, which we are eager to share with the broader community.

In essence, relying solely on a language model to generate factual text can be misleading. Fine-tuning a model, while having its own set of challenges, doesn't grant the model new knowledge nor a reliable verification mechanism. By building a Q&A engine atop an LLM and separating the knowledge base from the large language model, you can generate answers anchored in the provided context, ensuring accuracy and reliability.
