Reasoning for Retrieval Augmented Generation

Jan 2, 2024

Can we do better Retrieval-Augmented Generation by checking how the model is internally "thinking" and processing sources?

pleias is releasing an initial series of experiments to enhance the reliability of RAG by drawing on attention scores. The basic idea is to leverage the internal reading process, as the model goes back and forth between the sources to find information and potential quotes.
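
To make this concrete, here is a minimal sketch of how attention weights can be captured at generation time with the Hugging Face transformers library. The hub id for Pleias-Pico and the toy prompt are illustrative assumptions, not a description of our actual pipeline.

```python
# A minimal sketch: capture attention weights for every generated token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PleIAs/Pleias-Pico")
# "eager" attention is needed so the weights are actually materialized.
model = AutoModelForCausalLM.from_pretrained(
    "PleIAs/Pleias-Pico", attn_implementation="eager"
)

prompt = "Query: ...\nSource 1: ...\nSource 2: ...\nAnswer:"  # placeholder
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        return_dict_in_generate=True,
        output_attentions=True,  # keep the attention maps of each step
        output_scores=True,      # keep the next-token logits as well (used below)
    )

# outputs.attentions: one tuple per generated token, one tensor per layer,
# each of shape (batch, heads, query_len, key_len).
print(outputs.attentions[0][-1].shape)
```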

The experiments are based on Pleias-Pico, our multilingual 350M-parameter model pretrained with built-in support and structure for RAG and source analysis. Its design and size make it one of the most appropriate models for studying attention across various ranges.
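
Pleias-Pico structures its RAG prompts around explicit source sections. The sketch below uses plain-text delimiters as stand-ins for the model's actual special tokens, and records where each source lands in the tokenized prompt so that its attention share can be measured afterwards.

```python
# Continuing the sketch: record the token span occupied by each source.
# The plain-text delimiters are placeholders, not Pleias-Pico's RAG tokens.
query = "When was the Eiffel Tower built?"
sources = [
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "The tower is 330 metres tall and made of wrought iron.",
]

prompt = f"Query: {query}\n"
char_spans = []
for i, text in enumerate(sources, start=1):
    start = len(prompt)
    prompt += f"Source {i}: {text}\n"
    char_spans.append((start, len(prompt)))
prompt += "Answer:"

# Map character spans to token index ranges (requires a fast tokenizer).
enc = tokenizer(prompt, return_offsets_mapping=True)
token_spans = []
for start, end in char_spans:
    idx = [t for t, (s, e) in enumerate(enc["offset_mapping"])
           if s >= start and e <= end]
    token_spans.append((idx[0], idx[-1] + 1))  # half-open (first, last + 1)
```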

What is especially promising is tracing the dynamics of attention across the text and how they relate to citation practices. The text attends to different sources at different times, as it is either actively citing them or drawing information from them. Overall, attention scores seem to be more reliable than the generated text: in one instance the model cited a text from source 2 while claiming to use source 1, and the attention scores did track the discrepancy. This opens up many opportunities to enhance the reliability and verifiability of RAG.
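
A sketch of what that tracking can look like, reusing the token_spans and outputs from the snippets above: for each generated token, sum the attention mass falling on each source's span, then flag answers whose dominant source differs from the one being cited.

```python
# Per-source attention profile: for each generated token, the attention mass
# (last layer, averaged over heads) landing on each source's token span.
def source_attention_profile(attentions, token_spans):
    profile = []
    for step_attn in attentions:            # one entry per generated token
        last_layer = step_attn[-1]          # (batch, heads, query_len, key_len)
        attn = last_layer[0, :, -1, :].mean(dim=0)  # heads averaged, last position
        profile.append([attn[s:e].sum().item() for s, e in token_spans])
    return profile

# A simple discrepancy check: the model cites source `claimed` (0-indexed)
# while its attention mass concentrates on another source.
def flag_discrepancy(claimed, scores):
    return scores.index(max(scores)) != claimed

profile = source_attention_profile(outputs.attentions, token_spans)
# e.g. flag_discrepancy(0, profile[t]) at the step t where a citation begins
```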

The overall approach is similar to the Entropix project and belongs to the current wave of new LLM sampling research. Here we are not drawing from the entropy of token probabilities but from the attention relationships. There are currently many attempts to "augment" models with supplementary processes like MCTS, which may not be that necessary if we properly leverage the internal processes and metrics that remain surprisingly under-used.
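
For comparison, the Entropix-style signal can be computed from the same generate() call: the entropy of the next-token distribution, a scalar measure of how uncertain the model is rather than of where it is reading. A sketch:

```python
# Shannon entropy of the next-token distribution, from outputs.scores
# (kept earlier via output_scores=True).
def next_token_entropy(logits):
    probs = torch.softmax(logits, dim=-1)
    return -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=-1)

entropies = [next_token_entropy(step)[0].item() for step in outputs.scores]
```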

Maybe, after all, attention is all you need.