
RAG is more "design" than "development"


I recently tried setting up a Retrieval Augmented Generation (RAG) chatbot, as part of my prep work for an upcoming project at Good Work.

Speedrun setup

For a proof-of-concept, I chose to use as many third-party APIs as possible to speed up the prototyping process. Basically:

  1. I set up a Postgres database with pgvector enabled, and created a table where one of the columns is a 1024-dimension vector. This vector length matches the embedding size in step 2.
  2. I used VoyageAI to generate embeddings for stored knowledge and user query input.
  3. I used Anthropic for text generation. (A minimal end-to-end sketch of all three steps follows below.)
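To make the setup concrete, here is a minimal sketch of the whole pipeline in Python. The table name (`documents`), the embedding model (`voyage-2`, which produces 1024-dimension vectors), and the exact Claude model string are my own illustrative choices, not fixed parts of the setup; any 1024-dimension embedding model would slot in the same way.

```python
import os
import psycopg2
import voyageai
import anthropic

vo = voyageai.Client()          # reads VOYAGE_API_KEY from the environment
claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
conn = psycopg2.connect(os.environ["DATABASE_URL"])

# Step 1: a table with a 1024-dimension vector column,
# matching voyage-2's embedding size.
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id bigserial PRIMARY KEY,
            content text NOT NULL,
            embedding vector(1024)
        );
    """)

def add_document(text: str) -> None:
    # Step 2a: embed a knowledge block and store it alongside the raw text.
    emb = vo.embed([text], model="voyage-2", input_type="document").embeddings[0]
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)",
            (text, str(emb)),
        )

def answer(query: str, k: int = 3) -> str:
    # Step 2b: embed the user query and retrieve the k nearest blocks
    # (<=> is pgvector's cosine-distance operator).
    emb = vo.embed([query], model="voyage-2", input_type="query").embeddings[0]
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM documents ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(emb), k),
        )
        context = "\n\n".join(row[0] for row in cur.fetchall())
    # Step 3: hand the retrieved context to Claude for generation.
    message = claude.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        system=f"Answer using only this context:\n\n{context}",
        messages=[{"role": "user", "content": query}],
    )
    return message.content[0].text
```

The nice part of this shape is that steps 2 and 3 are just two API calls each; all the interesting decisions live in what you feed to `add_document`.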

Thoughts so far

Here are my thoughts on how it feels to actually be on the “editor” side as well as the “user” side of things.

  1. The big idea is information retrieval using some form of “semantic search”, then presenting the “most correct” information to the user as accurately and context-appropriately as possible.
  2. How the information is split and inserted into the database is probably important, so that:
    1. We anticipate the granularity of the user’s query and match the “block size” of information to it, so that the most accurate unit of information can be retrieved, but not so granular that there is not enough context to construct a meaningful reply from.
    2. On top of the “semantic search”, if there is a clear user journey on the application side, we could also tag information blocks with that metadata for a combined (presumably more effective) retrieval strategy; see the metadata-filtered sketch after this list.
  3. “Prompt engineering” plays quite a big part in the tone of the final response, as well as its length and specificity. So far, the hints here gave me a really good starting point.
  4. RAG is way less complex than I thought, but the whole setup is a lot more subtle than I anticipated. There are many parameters available for fine-tuning, including how information is split, stored, retrieved and represented.
  5. LLMs can easily be asked to write HTML or pseudo-XML tags. This is an interesting way to provide rich content in responses (see the tag-rendering sketch after this list).
  6. Information can be provided (input to the database) in a very human-like manner. Instead of building out a schema to capture every type of data structure, it is possible to just describe it in a list, e.g.:
    • Here is a list of ingredients and their pictures.

      ```
      item: Egg
      image: egg.jpg

      item: Chicken
      image: chicken.jpg

      item: Spinach
      image: spinach.jpg
      ```
      
    • It is then possible for the LLM to provide an image when asked to show what “egg” looks like. Coupled with (5), this provides a cool way to input non-textual information with metadata and resurface it in LLM-generated replies.
  7. When retrieving matching blocks of information, it might be interesting to use smaller blocks but fuzzier matching (e.g. returning more blocks). Is there a context vs accuracy “slider” here? The retrieval sketch below exposes this as a single parameter.
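Here is a sketch of the combined retrieval strategy from (2), with the “slider” from (7) as a parameter. It assumes the `documents` table above gained a hypothetical `stage` column holding a user-journey tag, and that `query_embedding` comes from the same `vo.embed(..., input_type="query")` call as in the earlier sketch.

```python
def retrieve(cur, query_embedding: list[float], stage: str, k: int = 8) -> list[str]:
    # Combined strategy: filter by user-journey metadata first, then rank the
    # remaining blocks by semantic similarity. A larger k is the "fuzzier" end
    # of the context-vs-accuracy slider; a smaller k is the stricter end.
    cur.execute(
        """
        SELECT content
        FROM documents
        WHERE stage = %s
        ORDER BY embedding <=> %s::vector
        LIMIT %s
        """,
        (stage, str(query_embedding), k),
    )
    return [row[0] for row in cur.fetchall()]
```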
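And a sketch of the rich-content idea from (5) and (6): ask the model to wrap images in a made-up tag, then translate that tag into real markup on the application side. The `<ingredient-image>` tag and the `/images/` path are my own inventions for illustration.

```python
import re

SYSTEM_PROMPT = """Answer using the provided context.
When the context gives an image for an item, include it in your reply as
<ingredient-image src="FILENAME"/>."""

def render(reply: str) -> str:
    # Swap the pseudo-XML tag for real HTML before displaying the reply.
    return re.sub(
        r'<ingredient-image src="([^"]+)"/>',
        r'<img src="/images/\1" alt="ingredient">',
        reply,
    )

# e.g. asking "What does egg look like?" might yield:
#   'An egg looks like this: <ingredient-image src="egg.jpg"/>'
# which render() turns into a working <img> tag.
```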

I will update this post with additional thoughts.