Memory

TARS manages memory with a Retrieval-Augmented Generation (RAG) system. The system offers two search methods: Naive RAG (legacy) and Hybrid RAG, the newer and recommended approach.

1. Naive RAG (Legacy)

Vector search (Naive RAG) converts text into high-dimensional numerical representations (embeddings) and uses cosine similarity to find semantically similar documents. While fast and effective for capturing general meaning, it may sometimes miss specific keyword matches.

Key characteristics:

Fast retrieval speed
Good at understanding semantic similarity
Works well with conceptual queries
May occasionally miss exact keyword matches

Example:

# Get the embedding for your query text
embedding = get_embedding("your query text")
# Perform a vector search based on that embedding
results = vector_search(embedding)
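
To make the cosine-similarity step concrete, here is a minimal sketch in plain Python. It is not the TARS implementation: `cosine_similarity`, `top_k_by_cosine`, and the toy three-dimensional embeddings are hypothetical names and values chosen for illustration, and a real deployment would typically delegate this to a vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k_by_cosine(query_embedding, stored, k=3):
    """Rank (text, embedding) pairs by similarity to the query, highest first."""
    scored = [(text, cosine_similarity(query_embedding, emb)) for text, emb in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional embeddings (made up); real embeddings are much larger.
memories = [
    ("User likes jazz", [0.9, 0.1, 0.0]),
    ("User lives in Toronto", [0.0, 0.2, 0.9]),
    ("User plays saxophone", [0.8, 0.3, 0.1]),
]
ranked = top_k_by_cosine([1.0, 0.0, 0.0], memories, k=2)
```

Because the ranking depends only on vector direction, two texts that share no words can still score highly, which is also why an exact keyword can be missed.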

2. Hybrid RAG (Recommended)

Our hybrid search (Hybrid RAG) combines multiple techniques to provide more accurate and reliable results:

Vector Search: Creates embeddings to understand semantic meaning
BM25 Full Text Search: Performs keyword-based matching
Reciprocal Rank Fusion (RRF): Merges the ranked results from both methods into a single ranking
Neural Reranking: Re-scores the fused results using a CrossEncoder model

Benefits of each component:

Vector Search: Captures conceptual similarity and context
BM25: Ensures important keywords aren’t missed
RRF: Balances semantic and keyword matching without having to compare their raw scores
Neural Reranking: Improves the final ordering by scoring each query-document pair with a neural model
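
The fusion step can be sketched as follows. This is the standard RRF formula, where each document scores the sum of `1 / (k + rank)` over the lists it appears in. The function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration. The value TARS actually uses is not stated here.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs: score(d) = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # Ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical outputs of the two retrievers, best match first.
vector_ranking = ["m1", "m2", "m3"]
bm25_ranking = ["m3", "m1", "m4"]
fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
```

RRF uses only rank positions, so BM25 scores and cosine similarities never need to be normalized onto a common scale. Documents found by both retrievers (`m1` and `m3` here) rise above documents found by only one.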

Example:

# Perform a hybrid search that combines all methods
results = hybrid_search("your query text")
 
# Example response format:
[
  {
    "document": "Memory text example 1",
    "score": 0.95,
    "metadata": {
      "source": "conversation_123",
      "timestamp": "2024-03-20T10:30:00Z"
    }
  },
  # Additional results...
]
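
A short sketch of consuming results in the format above, assuming `hybrid_search` returns a list of dictionaries with exactly these keys. The helper `filter_results`, the `min_score` threshold, and the second sample entry are hypothetical.

```python
from datetime import datetime

def filter_results(results, min_score=0.5):
    """Keep results at or above min_score and parse their timestamps."""
    kept = []
    for item in results:
        if item["score"] < min_score:
            continue
        # Older Python versions reject a trailing "Z", so normalize it to an offset.
        stamp = item["metadata"]["timestamp"].replace("Z", "+00:00")
        kept.append({
            "document": item["document"],
            "score": item["score"],
            "source": item["metadata"]["source"],
            "timestamp": datetime.fromisoformat(stamp),
        })
    return kept

# Sample data: the first entry mirrors the format above, the second is made up.
sample = [
    {"document": "Memory text example 1", "score": 0.95,
     "metadata": {"source": "conversation_123", "timestamp": "2024-03-20T10:30:00Z"}},
    {"document": "Memory text example 2", "score": 0.30,
     "metadata": {"source": "conversation_124", "timestamp": "2024-03-21T08:00:00Z"}},
]
relevant = filter_results(sample)
```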

💡

Best Practice: Use Hybrid RAG for most applications. Fall back to Naive RAG only if you need maximum speed and can tolerate slightly lower accuracy.

Future Work

Our system currently ranks the top-k results by relevance alone, which makes conversational context hard to track. For example, in a conversation where a user asks “What’s the capital of France?”, gets the answer “Paris”, and then asks “What’s its population?”, the system might miss that “its” refers to “Paris” because it prioritizes relevance over temporal context.

We are actively working on solutions such as temporal decay factors for relevance scores, sliding windows of recent context, and multi-hop retrieval to connect related pieces of information. For TARS to truly function as a personal assistant, we’re also exploring more sophisticated memory management approaches: knowledge graphs to track connections between entities and events, and structured memory types that manage episodic, semantic, and procedural information separately.
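
As an illustration of the temporal decay idea only, and not a description of a shipped feature, one option is to multiply each relevance score by an exponential factor that halves every `half_life_hours`. The function name, the half-life parameter, and the sample data below are all assumptions.

```python
from datetime import datetime, timezone

def apply_temporal_decay(results, now, half_life_hours=24.0):
    """Scale each score by 0.5 ** (age / half_life) and re-sort, newest-favoring."""
    rescored = []
    for item in results:
        stamp = item["metadata"]["timestamp"].replace("Z", "+00:00")
        age_hours = (now - datetime.fromisoformat(stamp)).total_seconds() / 3600.0
        decay = 0.5 ** (max(age_hours, 0.0) / half_life_hours)
        # Copy the item so the caller's results are left unchanged
        rescored.append({**item, "score": item["score"] * decay})
    rescored.sort(key=lambda item: item["score"], reverse=True)
    return rescored

# Made-up results: an older, highly relevant memory and a recent, weaker one.
now = datetime(2024, 3, 20, 12, 0, tzinfo=timezone.utc)
results = [
    {"document": "old but relevant", "score": 0.90,
     "metadata": {"source": "conversation_1", "timestamp": "2024-03-18T12:00:00Z"}},
    {"document": "recent", "score": 0.60,
     "metadata": {"source": "conversation_2", "timestamp": "2024-03-20T11:00:00Z"}},
]
reranked = apply_temporal_decay(results, now)
```

With a 24-hour half-life, the 48-hour-old memory keeps a quarter of its score, so the recent memory ranks first. A shorter half-life favors the current conversation more strongly, at the cost of forgetting long-term facts faster.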

📊

Detailed performance benchmarks comparing different RAG approaches are currently in development and will be published soon.

⚠️

Implementation of these advanced features requires careful consideration of privacy, computational resources, and complexity trade-offs.

👉

Documentation Contributors: @alexander-wang03 @latishab
