
How to Add Real-Time News to Your RAG Pipeline

Most RAG pipelines have a freshness problem. Here's how to add verified, confidence-scored intelligence to your vector database so your agent always has current context.

Your RAG pipeline probably has a freshness problem. The documents in your vector database were indexed days or weeks ago. When a user asks “What happened with the Fed rate decision?” your agent retrieves stale context and hallucinates the rest.

The fix: continuously ingest verified intelligence into your vector database, with confidence scores and bias metadata on every document. Here's how.

The Problem with Static RAG

Without real-time intelligence

Agent retrieves a 3-week-old article about the Fed and hallucinates a rate decision that didn't happen.

With Polaris intelligence

Agent retrieves a verified brief from 2 hours ago with a confidence score of 0.94, a source count of 7, and counter-arguments.
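To make that difference concrete, here's a minimal sketch of gating retrieved context on the metadata a brief carries. The field names and thresholds are assumptions based on the example above; tune them for your domain:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds -- adjust for your use case.
MIN_CONFIDENCE = 0.7
MAX_AGE = timedelta(hours=24)

def is_usable(brief, now=None):
    """Return True if a retrieved brief is fresh and confident enough to cite."""
    now = now or datetime.now(timezone.utc)
    published = datetime.fromisoformat(brief["published_at"])
    return brief["confidence"] >= MIN_CONFIDENCE and now - published <= MAX_AGE

fresh = {"confidence": 0.94, "published_at": "2025-01-15T10:00:00+00:00"}
stale = {"confidence": 0.94, "published_at": "2024-12-20T10:00:00+00:00"}
now = datetime.fromisoformat("2025-01-15T12:00:00+00:00")
print(is_usable(fresh, now), is_usable(stale, now))  # True False
```

An agent that falls back to "I don't have current information on that" when no usable brief survives the gate is strictly better than one that answers from stale context.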

Ingest Briefs into Your Vector DB

```python
from polaris_news import PolarisClient
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

client = PolarisClient(api_key="pr_live_xxx")
embeddings = OpenAIEmbeddings()
vectorstore = Chroma(embedding_function=embeddings, persist_directory="./polaris_db")

def ingest_latest():
    """Pull the latest briefs and add them to the vector DB with metadata."""
    feed = client.feed(per_page=20)
    docs = []
    for brief in feed.briefs:
        docs.append({
            "page_content": f"{brief.headline}\n\n{brief.summary}",
            "metadata": {
                "source": "polaris",
                "brief_id": brief.id,
                "category": brief.category,
                "confidence": brief.confidence,
                "bias_score": brief.bias_score,
                # Chroma metadata values must be str/int/float/bool,
                # so store the timestamp as a string.
                "published_at": str(brief.published_at),
                "source_count": brief.source_count,
            }
        })
    vectorstore.add_texts(
        texts=[d["page_content"] for d in docs],
        metadatas=[d["metadata"] for d in docs],
    )
    print(f"Ingested {len(docs)} briefs")

ingest_latest()
```
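One caveat: re-running the ingest on overlapping feed pages will insert duplicate documents. A way to make ingestion idempotent, sketched here with plain dicts standing in for brief objects, is to derive stable vector-store IDs from the brief IDs and pass them as the `ids` argument to `add_texts` (recent Chroma versions upsert on a matching ID rather than duplicating):

```python
def to_records(briefs):
    """Map briefs (dicts here, for illustration) to parallel
    ids/texts/metadatas lists for vectorstore.add_texts."""
    ids, texts, metadatas = [], [], []
    for b in briefs:
        ids.append(f"polaris-{b['id']}")  # stable ID -> upsert instead of duplicate
        texts.append(f"{b['headline']}\n\n{b['summary']}")
        metadatas.append({
            "source": "polaris",
            "brief_id": b["id"],
            "confidence": b["confidence"],
            "published_at": b["published_at"],
        })
    return ids, texts, metadatas

briefs = [{"id": "b1", "headline": "Fed holds rates", "summary": "No change.",
           "confidence": 0.94, "published_at": "2025-01-15T10:00:00+00:00"}]
ids, texts, metas = to_records(briefs)
# vectorstore.add_texts(texts=texts, metadatas=metas, ids=ids)
print(ids)  # ['polaris-b1']
```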

Filter by Confidence at Retrieval

Not all sources are equal. Use the confidence score to filter at retrieval time.

```python
# Only retrieve high-confidence documents
results = vectorstore.similarity_search(
    "What happened with the Fed rate decision?",
    k=5,
    filter={"confidence": {"$gte": 0.7}}
)

for doc in results:
    conf = doc.metadata.get("confidence", 0)
    print(f"[{conf:.0%}] {doc.page_content[:100]}...")
```
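You can combine confidence with recency in the same filter. The sketch below assumes you also store the publish time as Unix epoch seconds in a `published_ts` metadata field (a hypothetical addition to the ingest metadata, since numeric fields are what Chroma's range operators work with):

```python
import time

def make_filter(min_confidence=0.7, max_age_hours=24, now=None):
    """Build a Chroma-style where-filter combining confidence and recency.

    Assumes metadata carries 'published_ts' as Unix epoch seconds --
    an addition to the ingest metadata shown earlier.
    """
    now = now if now is not None else time.time()
    cutoff = now - max_age_hours * 3600
    return {"$and": [
        {"confidence": {"$gte": min_confidence}},
        {"published_ts": {"$gte": cutoff}},
    ]}

f = make_filter(min_confidence=0.8, max_age_hours=6, now=1_000_000)
print(f)
# {'$and': [{'confidence': {'$gte': 0.8}}, {'published_ts': {'$gte': 978400.0}}]}
```

Pass the result as `filter=` to `similarity_search` the same way as above.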

Schedule Continuous Ingestion

```python
import schedule
import time

schedule.every(10).minutes.do(ingest_latest)

while True:
    schedule.run_pending()
    time.sleep(1)
```

Or use a GitHub Action, a cron job, or the Polaris webhook system to push new briefs into your pipeline as they're published.
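If you go the webhook route, the receiving side reduces to turning each delivery into a document and adding it to the store. Here's a sketch of that mapping; the payload shape is an assumption, so check your webhook settings for the real schema:

```python
import json

def brief_from_webhook(body):
    """Turn a (hypothetical) Polaris webhook payload into a document dict
    ready for the vector DB. The payload shape is assumed here."""
    event = json.loads(body)
    brief = event["brief"]
    return {
        "page_content": f"{brief['headline']}\n\n{brief['summary']}",
        "metadata": {
            "source": "polaris",
            "brief_id": brief["id"],
            "confidence": brief["confidence"],
        },
    }

payload = json.dumps({"brief": {"id": "b42", "headline": "CPI cools",
                                "summary": "Inflation slows.",
                                "confidence": 0.9}}).encode()
doc = brief_from_webhook(payload)
print(doc["metadata"]["brief_id"])  # b42
```

Inside your HTTP handler you'd call `brief_from_webhook(request_body)` and then `vectorstore.add_texts(...)` exactly as in the ingest function above, so new briefs land in the index seconds after they publish.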

Get Started