
RAG-Based AI Assistant with pgvector


This project was built as part of the Datalumina AI Engineering program — a production-focused curriculum designed to bridge backend engineering with modern Generative AI.

Project Summary

  • Program: Datalumina AI Engineering
  • Duration: Oct 2025 – Mar 2026
  • Role: AI Engineer

Key Outcomes:

  • End-to-end RAG application deployed in a production-like environment
  • Full LLM observability with Langfuse tracing
  • Type-safe AI pipelines using Pydantic
  • Sub-second retrieval from PostgreSQL/pgvector
  • Modular FastAPI backend ready for team integration

Challenge

Teams often have vast amounts of internal data — documents, knowledge bases, support tickets — but no reliable way to query it conversationally. Generic LLMs hallucinate or lack context. The challenge was building a system that answers real questions from real organizational data with accuracy that teams can depend on.

Approach

Built a Retrieval-Augmented Generation (RAG) pipeline that embeds documents and indexes the vectors in a PostgreSQL database using the pgvector extension, enabling semantic similarity search. For each question, the application retrieves the most relevant context and passes it to the LLM alongside the prompt, substantially reducing hallucinations by grounding answers in real data.
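The retrieval step can be sketched as follows. This is an illustrative outline, not the project's actual code: the table and column names in the SQL are assumptions, and the pure-Python `top_k` helper mirrors the ranking that pgvector's cosine-distance operator (`<=>`) performs inside Postgres.

```python
import math

# Query shape the API would issue against Postgres/pgvector (illustrative;
# "document_chunks" and "embedding" are assumed names, not the project's schema).
# "<=>" is pgvector's cosine-distance operator.
PGVECTOR_QUERY = """
SELECT content
FROM document_chunks
ORDER BY embedding <=> %(query_embedding)s
LIMIT %(k)s;
"""

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), as computed by pgvector's <=>."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def top_k(query_vec, chunks, k=3):
    """Rank (text, embedding) pairs by distance to the query; return best k texts."""
    ranked = sorted(chunks, key=lambda c: cosine_distance(query_vec, c[1]))
    return [text for text, _ in ranked[:k]]
```

In production the ranking runs in the database (with an IVFFlat or HNSW index for sub-second retrieval at scale); the in-memory version above just makes the similarity logic explicit.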

The backend was built with FastAPI and Python, with PydanticAI ensuring type safety throughout the AI pipeline. Langfuse was integrated for full LLM observability — tracing every request, prompt, and response.
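The "type safety" point can be made concrete with a small sketch. This uses stdlib dataclasses as a stand-in for the PydanticAI models in the project (so it runs anywhere), and the field names are illustrative assumptions, not the project's actual schema; the idea is the same: malformed data fails loudly at the pipeline boundary instead of silently downstream.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RetrievedChunk:
    """One retrieved context chunk with its similarity score."""
    content: str
    score: float

    def __post_init__(self):
        # Validate at construction time, like a Pydantic validator would.
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range: {self.score}")

@dataclass(frozen=True)
class RagAnswer:
    """Structured LLM answer plus the sources it was grounded in."""
    answer: str
    sources: list[RetrievedChunk] = field(default_factory=list)

def build_answer(llm_text: str, chunks: list[RetrievedChunk]) -> RagAnswer:
    """A malformed chunk raises here, not somewhere deep in the response path."""
    return RagAnswer(answer=llm_text, sources=chunks)
```

With PydanticAI the same contract additionally drives the LLM toward structured output and gives FastAPI validated request/response schemas for free.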

Results & Impact

  • Accurate, grounded responses based on retrieved organizational data
  • Full observability: every LLM call traced and logged via Langfuse
  • Type-safe pipeline preventing silent failures in production
  • Clean, modular architecture that a team can extend and maintain
  • Production engineering practices applied throughout (not just a prototype)

Solution Overview

Architecture Diagram

End-to-end RAG architecture with retrieval, LLM orchestration, and observability

Tech Stack

  • Python & FastAPI
  • PostgreSQL with pgvector (vector similarity search)
  • PydanticAI (type-safe AI pipelines)
  • LangChain (LLM orchestration)
  • Langfuse (LLM observability and tracing)
  • OpenAI / Claude (LLM providers)
  • Docker (containerization)
  • GitHub Actions (CI/CD)
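To show what "every request, prompt, and response traced" means in practice, here is a minimal stdlib sketch of the pattern. It is not the Langfuse SDK (which provides its own decorator and sends traces to a server); the in-memory `TRACES` list and the stub LLM function are assumptions for illustration only.

```python
import functools
import time

TRACES = []  # in-memory stand-in for the Langfuse trace store

def traced(fn):
    """Record name, inputs, output, and latency of each call --
    a stdlib sketch of the data an LLM-tracing tool captures."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "name": fn.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result
    return wrapper

@traced
def generate_answer(prompt: str) -> str:
    # Placeholder for the real LLM call (OpenAI / Claude).
    return f"stub answer to: {prompt}"
```

Wrapping the retrieval and generation steps this way means every production answer can be replayed and debugged from its trace.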

Additional Context

  • Timeline: 6 months (as part of Datalumina program)
  • Role: AI Engineer (solo)
  • Focus: Production-grade reliability, not just prototyping
  • Engineering practices: Pydantic type safety, structured logging, modular design
Interested in similar work? I'd love to discuss how a RAG-based solution could work for your team's data and workflows.

Email Me