CareerVault
  • Chat UI - Default Screen: A polished interface built with Next.js and Tailwind CSS.

  • Chat UI - First Question: Engaging the user with suggested query chips to reduce cognitive load.

  • Chat UI - Second Question: Maintaining deep conversational context across multiple turns using LlamaIndex's CondensePlusContextChatEngine.

  • Hybrid Retrieval Pipeline: Combining vector search (MiniLM-L6) with cross-encoder reranking (MS MARCO) for high-precision context injection.

  • Serverless Architecture: Deployed on Google Cloud Run with Firestore for persistent, server-side chat history storage.

  CareerVault: An Interactive RAG Portfolio with Persistent Memory

    Project Description

    A production-ready RAG application that serves as an intelligent 'Digital Twin,' answering queries about my professional history using a hybrid retrieval pipeline and serverless state management.
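The serverless state management described above can be modeled in a few lines. Below is a minimal sketch of the session store, where an in-memory dict stands in for Firestore and the `ChatStore` class name is illustrative, not part of the actual codebase:

```python
import uuid
from collections import defaultdict

class ChatStore:
    """In-memory stand-in for the Firestore-backed session store:
    one record per session id holding an ordered list of messages."""
    def __init__(self):
        self._sessions: dict[str, list[dict]] = defaultdict(list)

    def append(self, session_id: str, role: str, content: str) -> None:
        # In production this would write a document to Firestore.
        self._sessions[session_id].append({"role": role, "content": content})

    def history(self, session_id: str) -> list[dict]:
        # Replayed into the chat engine on each request, so context
        # survives page reloads and stateless serverless instances.
        return list(self._sessions[session_id])

sid = str(uuid.uuid4())  # client-generated UUID, as in the useSession hook
store = ChatStore()
store.append(sid, "user", "What did you build with LlamaIndex?")
store.append(sid, "assistant", "CareerVault's conversational engine.")
print(len(store.history(sid)))  # prints 2
```

Keeping the history server-side (rather than only in the browser) is what enables both long-term memory and offline quality analysis.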

    Responsibilities
    • Architected a serverless RAG pipeline on Google Cloud Run, utilizing LlamaIndex's 'CondensePlusContextChatEngine' to maintain deep conversational context across multiple turns.
    • Engineered a robust state management system using Firestore to persist chat history server-side, enabling both long-term memory for the user and offline quality analysis for the developer.
    • Reduced the Docker image size by over 70% (from 5.5GB to ~1.5GB) by enforcing CPU-only PyTorch wheels and implementing a multi-stage build that excludes heavy CUDA GPU drivers.
    • Solved the 'Serverless Cold Start' problem (typically 40s+ for LLM apps) by deploying a Cloud Scheduler 'Warmer Bot' that pings the service every 15 minutes, ensuring sub-second response times.
    • Implemented a two-stage retrieval strategy: broad semantic search using 'all-MiniLM-L6-v2' followed by a precision pass using the 'ms-marco-MiniLM-L-2-v2' Cross-Encoder to minimize hallucination.
    • Built a polished, terminal-themed UI using Next.js and Tailwind CSS, featuring a custom 'useSession' hook that generates and persists UUIDs client-side to maintain chat history across page reloads.
    • Configured production-grade autoscaling via 'autoscaling.knative.dev' annotations with a max-instance limit of 5, balancing high availability with strict cost controls.
    • Integrated a 'Typewriter' effect and 'Suggested Query Chips' to guide user interaction and reduce the cognitive load of the 'Blank Page Problem' common in chatbot interfaces.
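The two-stage retrieval strategy above can be sketched as follows. The scoring functions here are toy word-overlap scorers standing in for the 'all-MiniLM-L6-v2' bi-encoder and 'ms-marco-MiniLM-L-2-v2' cross-encoder; function names and the sample corpus are illustrative only:

```python
from typing import Callable

def two_stage_retrieve(
    query: str,
    corpus: list[str],
    embed_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    k_broad: int = 3,
    k_final: int = 1,
) -> list[str]:
    """Stage 1: broad semantic recall; Stage 2: precision rerank."""
    # Stage 1: score the whole corpus with the cheap bi-encoder-style
    # scorer and keep the top-k_broad candidates.
    candidates = sorted(corpus, key=lambda d: embed_score(query, d),
                        reverse=True)[:k_broad]
    # Stage 2: re-score only those candidates with the expensive
    # cross-encoder-style scorer and keep the top-k_final passages.
    return sorted(candidates, key=lambda d: rerank_score(query, d),
                  reverse=True)[:k_final]

def word_overlap(q: str, d: str) -> float:
    # Toy relevance signal: number of shared lowercase tokens.
    return len(set(q.lower().split()) & set(d.lower().split()))

docs = [
    "CareerVault is deployed on Google Cloud Run",
    "Chat history is persisted in Firestore",
    "The reranker is an MS MARCO cross-encoder",
]
print(two_stage_retrieve("Where is chat history stored?", docs,
                         word_overlap, word_overlap))
# ['Chat history is persisted in Firestore']
```

The design rationale: the bi-encoder is fast enough to scan everything but imprecise, while the cross-encoder is precise but too slow for the full corpus, so it only sees the shortlist. Feeding the LLM fewer, higher-precision passages is what reduces hallucination.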
    Related Links
    • Chat with Me
    Technology
    Docker, FastAPI, Firestore, LlamaIndex, Next.js, Python
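The container-size optimization described under Responsibilities can be sketched as a multi-stage Dockerfile. This is a generic sketch, not the project's actual file: the file names, the uvicorn entrypoint, and the Python version are assumptions; the CPU-only wheel index is PyTorch's standard one:

```dockerfile
# Build stage: install dependencies into an isolated prefix.
FROM python:3.11-slim AS builder
WORKDIR /app
# Pull CPU-only PyTorch wheels; the default PyPI wheels bundle
# several GB of CUDA libraries that a CPU-only service never uses.
RUN pip install --no-cache-dir --prefix=/install \
      torch --index-url https://download.pytorch.org/whl/cpu
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Runtime stage: copy only the installed packages, leaving pip caches
# and build tooling behind in the discarded builder layer.
FROM python:3.11-slim
COPY --from=builder /install /usr/local
COPY . /app
WORKDIR /app
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
```

Installing torch before the rest of the requirements ensures the CPU wheel is resolved first; the multi-stage copy is what keeps build-time artifacts out of the final image.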