RAG is more "design" than "development"
I tried setting up a Retrieval Augmented Generation (RAG) type chatbot recently, as part of my prep work for an upcoming project at Good Work.
Speedrun setup
For a proof-of-concept, I chose to use as many 3rd-party APIs as possible to speed up the prototyping process. Basically:
- I set up a Postgres database with `pgvector` enabled, and created a table where one of the columns is a 1024-length vector. This vector length matches the embedding size in step 2.
- I used VoyageAI to generate embeddings for stored knowledge and user query input.
- I used Anthropic for text generation.
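Here is a minimal sketch of that pipeline in Python, assuming `psycopg2` plus the official `voyageai` and `anthropic` clients. The connection string, table name, and model identifiers are placeholders I picked for illustration, not prescriptions; check each provider's docs for current model ids.

```python
import anthropic
import psycopg2
import voyageai

# Connection string and table name are placeholders for this demo.
conn = psycopg2.connect("dbname=rag_demo")
cur = conn.cursor()

# Step 1: pgvector-enabled table; the vector length (1024) matches
# the Voyage embedding size used in step 2.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS blocks (
        id SERIAL PRIMARY KEY,
        content TEXT,
        embedding vector(1024)
    )
""")
conn.commit()

# Step 2: embeddings (the client reads VOYAGE_API_KEY from the environment).
vo = voyageai.Client()

def embed(text, input_type):
    # input_type is "document" for stored knowledge, "query" for user input.
    return vo.embed([text], model="voyage-2", input_type=input_type).embeddings[0]

def store(content):
    cur.execute(
        "INSERT INTO blocks (content, embedding) VALUES (%s, %s::vector)",
        (content, str(embed(content, "document"))),
    )
    conn.commit()

def retrieve(query, k=3):
    # <=> is pgvector's cosine-distance operator: nearest blocks first.
    cur.execute(
        "SELECT content FROM blocks ORDER BY embedding <=> %s::vector LIMIT %s",
        (str(embed(query, "query")), k),
    )
    return [row[0] for row in cur.fetchall()]

# Step 3: text generation (the client reads ANTHROPIC_API_KEY).
def answer(query):
    context = "\n\n".join(retrieve(query))
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",  # substitute a current model id
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"<context>\n{context}\n</context>\n\n"
                       f"Using only the context above, answer: {query}",
        }],
    )
    return msg.content[0].text
```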
Thoughts so far
Here are my thoughts on how it feels to actually be on the "editor" as well as the "user" side of things.
- The big idea is information retrieval using some form of "semantic search", then presenting the "most correct" information to the user in as contextually accurate a way as possible.
- How the information is split and inserted into the database is probably important so that:
  - We anticipate the granularity of the user's query and match the "block size" of information so that the most accurate unit of information can be retrieved, but not so granular that there is not enough context to construct a meaningful reply (see the splitter sketch after this list).
  - Probably on top of the "semantic search", if there is a clear user journey on the application side, we could tag information blocks with that metadata as well, for a combined (presumably more effective) retrieval strategy (see the retrieval sketch after this list).
- "Prompt engineering" plays quite a big part in the tone of the final response generated, as well as its length and specificity. So far the hints here gave me a really good starting point.
- RAG is way less complex than I thought, but the whole setup is a lot more subtle than I anticipated. There are many parameters available for fine-tuning, including how information is split, stored, retrieved and represented.
- LLMs can easily be asked to write HTML or pseudo-XML tags. This is an interesting way to provide rich content responses (see the tag-rewriting sketch after this list).
- Information can be provided (input to the database) in a very human-like manner. Instead of building out a schema to capture every type of data structure, it is possible to just describe it in a list. e.g.
  ```
  Here is a list of ingredients and their pictures.
  item: Egg
  image: egg.jpg
  item: Chicken
  image: chicken.jpg
  item: Spinach
  image: spinach.jpg
  ```
  - It is then possible for the LLM to provide an image when asked to show what "egg" looks like. Coupled with the pseudo-XML point above, this provides a cool way to input non-textual information with metadata, and resurface it in LLM-generated replies.
- When retrieving matching blocks of information, it might be interesting to use smaller blocks but fuzzier matching (e.g. returning more blocks). Is there a context vs. accuracy "slider" here? (The `k` parameter in the retrieval sketches below is effectively that dial.)
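On the granularity point, here is the kind of knob I mean: a naive paragraph-based splitter of my own invention, where `max_chars` sets how coarse or fine the stored blocks are.

```python
def split_into_blocks(text, max_chars=500):
    # max_chars is the "block size" dial: small enough that a block is a
    # precise unit of meaning, large enough to still carry context.
    blocks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            blocks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        blocks.append(current.strip())
    return blocks
```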
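For the combined retrieval strategy: assuming the `blocks` table from the setup sketch gained a hypothetical `stage` text column tagged at insert time, metadata could narrow the candidates before the vector ranking kicks in.

```python
def retrieve_for_stage(query_embedding, stage, k=3):
    # Hypothetical "stage" column (e.g. "onboarding", "checkout") narrows
    # the candidate set before pgvector ranks by cosine distance.
    cur.execute(
        """
        SELECT content FROM blocks
        WHERE stage = %s
        ORDER BY embedding <=> %s::vector
        LIMIT %s
        """,
        (stage, str(query_embedding), k),
    )
    return [row[0] for row in cur.fetchall()]
```

The `k` here (and in `retrieve` above) doubles as the context-vs-accuracy slider from the last bullet: returning more, smaller blocks buys context at the cost of per-block precision.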
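Lastly, the pseudo-XML idea in miniature. The `<ingredient>` tag is invented for this example: the prompt tells the model to use it, and the application rewrites it into real HTML before rendering.

```python
import re

# Imagine the LLM was instructed: 'wrap any pictured ingredient as
# <ingredient image="...">name</ingredient>' and replied with:
reply = 'You could start with an <ingredient image="egg.jpg">Egg</ingredient>.'

# The application swaps the invented tag for a real <img> element.
html = re.sub(
    r'<ingredient image="([^"]+)">([^<]+)</ingredient>',
    r'<img src="\1" alt="\2">',
    reply,
)
print(html)  # You could start with an <img src="egg.jpg" alt="Egg">.
```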
I will update this post with additional thoughts.