The Future of Information Retrieval

A Deep Dive into RAG


Barry S. Stahl

Solution Architect & Developer

@bsstahl@cognitiveinheritance.com

https://CognitiveInheritance.com

Transparent Half Width Image 800x800.png

Favorite Physicists & Mathematicians

Favorite Physicists

  1. Harold "Hal" Stahl
  2. Carl Sagan
  3. Richard Feynman
  4. Marie Curie
  5. Nikola Tesla
  6. Albert Einstein
  7. Neil Degrasse Tyson
  8. Niels Bohr
  9. Galileo Galilei
  10. Michael Faraday

Other notables: Stephen Hawking, Edwin Hubble, Leonard Susskind, Christiaan Huygens

Favorite Mathematicians

  1. Ada Lovelace
  2. Alan Turing
  3. Johannes Kepler
  4. Rene Descartes
  5. Isaac Newton
  6. Emmy Noether
  7. George Boole
  8. Blaise Pascal
  9. Johann Gauss
  10. Grace Hopper

Other notables: Daphne Koller, Grady Booch, Leonardo Fibonacci, Evelyn Berezin, Benoit Mandelbrot

Fediverse Supporter

Logos.png

Some OSS Projects I Run

  1. Liquid Victor : Media tracking and aggregation [used to assemble this presentation]
  2. Prehensile Pony-Tail : A static site generator built in c#
  3. TestHelperExtensions : A set of extension methods helpful when building unit tests
  4. Conference Scheduler : A conference schedule optimizer
  5. IntentBot : A microservices framework for creating conversational bots on top of Bot Framework
  6. LiquidNun : Library of abstractions and implementations for loosely-coupled applications
  7. Toastmasters Agenda : A c# library and website for generating agenda's for Toastmasters meetings
  8. ProtoBuf Data Mapper : A c# library for mapping and transforming ProtoBuf messages

http://GiveCamp.org

GiveCamp.png

Achievement Unlocked

bss-100-achievement-unlocked-1024x250.png
 
 
 
 

Keyword Search

  • Tokenization

    • Break into lower-case tokens
    • best,ways,to,learn,about,my,problem,domain
  • Remove Stop Words

    • Words like "to," and "my" are removed
    • best,ways,learn,problem,domain
  • Stemming/Lemmatization

    • Reduce tokens to their root form
    • "runner" => "run", "children" => "child"
  • Inverted Index Lookup:

    • Find documents from words
Card Catalog 800x800.jpg

Evolution of Text Models

Timeline-Evolution of Search Tech.png

Questions to be Answered

  • What are Embeddings?
  • How does Vector Search leverage Embeddings to find relevant info?
  • How does RAG build on top of Vector Search?
  • How do we overcome the limitations of these models?
  • How do we leverage these tools for the benefit of our Users?
Questions to be Answered 800x800.jpg

Embeddings

  • A point in multi-dimensional space
  • Mathematical representation of a word or phrase
  • Encode both semantic and contextual information

  • Model: text-embedding-ada-002
  • Vectors normalized to unit length
  • Use 1536 dimensions
Embeddings - Cosmic Desert 800x800.jpg
  Ram - Just Statements.png
  Ram - With Clusters.png

Embedding

Creating Order from Chaos

  • Unstructured string => Structured float[]
  • Allows mathematical operations
    • Cosine Similarity & Distance
    • Nearest Neighbor Search
    • Clustering
    • Vector Addition & Subtraction
    • Dimensionality Reduction (e.g., PCA)
    • Anomaly Detection
Order from Chaos 800x800.jpg

LM Studio

  • Interface to language & embedding models
  • Fine-tune, evaluate, and integrate models into apps
  • Local server for testing models and applications
  • Models can be downloaded from HuggingFace
LMStudio - Features 800x800.png
 
 
 
 

Embedding Services

  • Local Models

    1. Load Model in LM Studio

    2. Start Local Inference Server

    3. Issue HTTP request for embedding

    4. Retrieve embedding from response

  • Remote Models

    1. Deploy model API

    2. Issue HTTP request for embedding

    3. Retrieve embedding from response

Postman - Get Embedding from Local Model.png

Cosine Similarity & Distance

Relate vectors based on the angle between them

  • Cosine Similarity ranges from -1 to 1, where:

    • +1 indicates that the vectors represent similar semantics & context
    • 0 indicates that the vectors are orthogonal (no similarity)
    • -1 indicates that the vectors have opposing semantics & context
  • Cosine Distance is defined as 1 - cosine similarity where:

    • 0 = Synonymous
    • 1 = Orthogonal
    • 2 = Antonymous

Note: For normalized vectors, cosine similarity is the same as the dot-product

Cosine Unit Circle - Enhanced.jpg

Cosine Distance

Cosine Distance 989x600.png

Cosine Distance

Angles2.svg

Embedding Distance

Feature Example
Synonym "Happy" is closer to "Joyful" than to "Sad"
Language "The Queen" is very close to "La Reina"
Idiom "He kicked the bucket" is closer to "He died" than to "He kicked the ball"
Sarcasm "Well, look who's on time" is closer to "Actually Late" than "Actually Early"
Homonym "Bark" (dog sound) is closer to "Howl" than to "Bark" (tree layer)
Collocation "Fast food" is closer to "Junk food" than to "Fast car"
Proverb "The early bird catches the worm" is closer to "Success comes to those who prepare well and put in effort" than to "A bird in the hand is worth two in the bush"
Metaphor "Time is money" is closer to "Don't waste your time" than to "Time flies"
Simile "He is as brave as a lion" is closer to "He is very courageous" than to "He is a lion"

Vector Databases

  • Store/retrieve high-dimensional vectors
  • Values are retrieved using similarity searches
  • Leverage data-structures such as K-D Trees
  • Examples
    • Azure AI Search
    • Redis
    • Qdrant
    • Pinecone
    • Chroma
VectorDB-650x650.png

KD-Tree

  • "Binary" Search across K-Dimensions
    • O(n log n) for construction
    • O(log n) for search
  • Construction
    • Recursively partitioning dataset
    • Rotate the dimension for each depth level
  • Nearest Neighbor Search
    • Recursively search for nearest neighbor
    • Backtrack to find additional neighbors
KD-Tree 800x800.jpg

Vector Search

Vector Search: critical to modern information retrieval systems

  • Closest vectors to a query vector
    • Identifies embeddings that are similar to the query
    • Requires the same embedding model for query and search vectors
  • Works with nearly any type of data
    • Images, text, audio, etc.
  • Still a stochastic process
    • May vary between executions
    • Will vary between models
    • May produce unexpected results
Vector Search 800x800.jpg
 
 
 

Resume Scanning

  • Armando's Resume: 12.7% "worse"
    • More distant from a match to the job listing
    • A 95 for Jonathon's resume ≈ an 84 for Armando's
  • If Armando had 2 additional years of experience
    • His score only increases by ≈ 1 point to 85
TwoResumes_800x800.jpg

Embeddings Search

Transformer Architectures - Embedding Only - 938x800.png

Cosine Distances

Country D(USA,C)
United Kingdom 0.11981422
United Kingdom of Great Britain and Northern Ireland 0.16426033
England 0.16678649
Argentine Republic 0.18766826
República Argentina 0.20281911
Country Distances.png

Embeddings Differences

  • Language
    • English (different dialects) vs Spanish
  • Historical Ties
    • USA and UK - Political (colonial), economic and cultural ties
  • Cultural Similarities
    • USA and UK - shared media, literature, entertainment, etc.
  • Political Systems
    • USA and UK - Democracies w/ common roots
  • Other differences
    • Economic Ties
    • Geopolitical Alliances
    • Educational Systems
    • Legal Systems
Embedding Differences - USA UK Argentina - 800x800.jpg
 

Resume Clustering

  • Clustering
    • Unsupervised machine learning technique
    • Clusters form around centroids (the geometric center)
    • Data points are grouped (clustered) based on similarity
    • Minimize the error (distance from centroid)
  • Resume Clustering
    • Identify a group of relevant resumes
    • Not for ranking, just filtering
    • Cluster the resumes, the GPT summaries, or both
    • Still not perfect
k-means results.png

Resume Summarization

  • System: As a resume scanner specializing in software engineers, focus solely on summarizing their key skills and experience durations, omitting demographics like age and location.

  • User: [Submits appropriate resume]

  • Assistant:

- Expertise in Java, J2EE, Spring Framework, Hibernate
- Experience with front-end technologies such as HTML, CSS, JavaScript, and AngularJS
- Proficient in SQL databases and ORM technologies
- Familiar with containerization and orchestration tools like Docker and Kubernetes
- Strong understanding of software design patterns and best practices
- Excellent problem-solving abilities and attention to detail
- Effective team player with strong communication skills

**Experience:**
- Senior Java Developer at XYZ Tech Solutions (June 2016 – Present, 5 years)
- Java Software Engineer at Innovative Software Co. (May 2012 – May 2016, 4 years)

**Certifications:**
- Oracle Certified Professional, Java SE 8 Programmer
- Certified ScrumMaster (CSM)

**Education:**
- Bachelor of Science in Computer Science, University of Technical Excellence (Graduated: May 2012)

Retrieval Augmented Generation (RAG)

  • Combines the benefits of retrieval-based and generative models

  • Identify and retrieve relevant information

    • Agument context of the generative models
    • Generative responses based on the augmented context
  • Potential uses include

    • Explore large documentation sets
    • Summarize articles in light of known relevant information
RAG 800x800.jpg

Beary - The Beary Barry Bot

Beary_600x600.png
 
 

Beary Flow

Beary Demo - Flowchart - Horizontal Flow - 1280x381.png
 
 
 
 

No More Search Engines

Information Recommendation 600x600.jpeg

We now use Information Recommendation Engines

More than Info Presenters

Information Radiation 600x600.jpeg

Our applications must be Information Radiators

More than just a query

Don't leave important information "on the table"

  • Leverage contextual data to enrich the user experience
    • Ensure interactions are relevant
  • Always Maintain high standards
    • User Privacy
    • User Data Protection
    • Consent where appropriate
More than Just a Query 800x800.jpeg

Contextual Clues

Use Responsibly - Be careful to respect user privacy

  • Time of Day & Week: Routine vs urgent
  • Mouse & Eye Movements: Regions of interest
  • Device & Platform: Accessibility preferences
  • Location: Geographical relevance
  • Browsing History: Interests
  • Social Media: Personal interests
  • Purchases: Preferences and future needs
  • Content Consumption Rate: Casual vs focused
  • Feedback: Satisfaction and preferences
Contextual Clues 800x800.jpeg
 

Meet Bentley

  • Role: Operations Manager
  • Location: Tolleson Dealership
  • Form of Address: Bentley
  • Pronouns: He/Him
  • Date Format: American (M/D/Y)
  • Time Format: 12-hour (1:45 pm)
  • Time Zone: Arizona (MST)
  • Info Format: Bullet-Points
bentley-silverstone 800x294.png

Operations Manager Role

Including details of the user's role allows the model to make better predictions about what is important to that user.

  • Key Responsibilities
  • Primary Goals and Metrics
  • Factors that impact decisions
  • Common Challenges
  • Tools and Technologies Used
Operations-Manager_Job-Description_800x269.png

Additional Information

Including additional context allows the model to make predictions about how this information might impact the user's activities and experiences

  • Location
    • Including local weather
    • Traffic if relevant
  • Current Situation
    • Upcoming events and requirements
    • Current state of the network
  • Other possibilities
    • Purchase propensity
    • User Survey Information
Weather and Key Info 800x379.png

Prompts

Allowing the model to make predictions about what information this user most needs to know, we can improve the user's experience and the relevance of our application's content

  • Better Awareness of Issues
  • Improved Decision-Making
  • Greater Efficiency
  • Improved User Satisfaction
  • Easier Adaptability
Prompts_800x600.png
 

Yo Dawg!

  • I heard you like 'cooking' so I calculated some recipes from your shopping list so you so you can turn that grocery haul into a Michelin-star meal
  • I heard you like 'apple products' so I ordered you a ladder so you can reach those elevated features
  • I heard you like coffee, so I scoured local social-media to make you a map so you can find all the best hidden cafes in town
  • I heard you like 'JavaScript' so I created a playlist for you composed entirely of loops so you can get into the proper frame-of-mind for coding
Definitely not Xzibit 800x800.jpg

What Context is Important?

Critical Context 600x600.jpeg

Consider carefully what context matters to your users

RAG via MCP

  • Allow the host agent to determine when and how to use our data
    • Based on our instructions and descriptions
    • Can also be used to take actions on the user's behalf
BearyMCP-Code-800x800.png

Challenge: Think Outside the App

How can we leverage these tools to create amazing experiences for our users?

  • Move Beyond Tables and Chat Boxes
    • Explore unconventional formats for information
  • Understand the user's goals
    • Design interactions that guide them to solutions
    • Example: CoPilot Suggestions

Resources

PresentationQR.png

What Are Embeddings?

  • Arrays of 1536 floating-point values
  • Numeric data representing unstructured data
  • Representations of the semantics and context of data
  • Vectors that support standard mathematical operations
Embeddings - Cosmic Desert 800x800.jpg

Usage of Embeddings

Embeddings can be used directly, or as an input to other models

  • Direct Usage
    • Measuring Semantic Distance
      • Quantify similarity between pieces of text
      • Useful for tasks like semantic search
    • Clustering for Pattern Discovery
      • Discover groupings in the data
      • Useful when categorizing user comments or other textual characteristics
  • Indirect Usage
    • Inputs to traditional ML models
    • Input to Transformer Attention Mechanisms
      • Dynamically adjusted by attention blocks
      • Powers text generation tasks
Cosmic Desert under the Milky Way 800x800.jpg

Retrieval Augmented Generation (RAG)

  • Combines the benefits of retrieval-based and generative models

  • Identify and retrieve relevant information

    • Agument context of the generative models
    • Generative responses based on the augmented context
  • Potential uses include

    • Explore large documentation sets
    • Summarize articles in light of known relevant information
RAG 800x800.jpg

Resources

PresentationQR.png