Hey there 👋
We’re two AI practitioners with 15+ years of combined experience building production-grade ML systems in the wild. Over the years, we’ve built RAG pipelines, AI agents, recommender systems, MLOps platforms, and everything in between. All in production!
⚠️ Important: Want hands-on help as you build?
Join our live weekly Code Walkthrough + Q&A Hours (Sundays, 16:00–17:30 CET) for architecture insights, real-world debugging tips, and hands-on guidance. Register here → https://topmate.io/shantanuladhwe/1657955
Seats are limited. More details at the end of this blog.
Note: All Phase 1 code + weekly blog posts are completely free. But the Code Walkthrough + Q&A Hours are paid - that’s where we go deeper and deliver serious value.
Now, we’re opening that experience to the world - through a hands-on, open-source, no-hype learning initiative we call
The Mother of AI Project
This isn’t a typical AI course or bootcamp.
It’s a build-first, learn-by-doing, production-ready AI roadmap - made up of multiple focused phases, each guiding you through building a complete system from scratch.
The Project Phases
Phase 1 – RAG Systems: Zero to Hero
Build a personalized AI research assistant from scratch
Ingest 1000+ research papers, chunk, embed, and run hybrid search + LLM-based RAG + Monitoring
(This is what we’re starting with - full details below)
Phase 2 – AI Agents + Tool Use + Monitoring (new project)
Move beyond static RAGs.
Build decision-making agents with memory, planning, and tool use.
Multi-step reasoning, better interactivity, and deeper retrieval.
Phase 3 – Recommendation Systems (new subproject)
Build real-time content-based or hybrid multi-stage recommender systems.
Cover ranking, user personalization, feedback loops, and metrics.
Phase 4 – MLOps + LLMOps (continued)
Take your models to production
Make use of the most popular cloud services and pipelines.
CI/CD, evaluation harnesses, fine-tuning, prompt versioning, data pipelines, security, and testing - everything!
Phase 5 – Full App Integration + Cloud Deployment (continued)
Building on the most suitable project from the phases above.
Bring everything together.
Containerization, API orchestration, front-end demo, cloud deployment (AWS/GCP), CI/CD, IaaS, cost optimization, etc.
Phase 6 – Monitoring + Alerting Mastery (continued)
Build reliable systems that never go silent.
Logging, tracing, drift detection, alerts, and incident-ready dashboards.
Each phase isn’t a toy demo - it’s a production-grade system you’d be proud to showcase.
You’ll learn how to think and build like a real-world AI/ML engineer - using tools that teams actually use in production: Docker, FastAPI, Airflow, Ollama, LangGraph, OpenSearch, Langfuse, and many more.
(we only teach what’s truly useful and community-backed).
💼 Everything you build is CV-worthy!
We’ll not only walk you through best practices -
we’ll show you how to extend each system, adapt it for your own use case, and communicate it effectively on your resume or portfolio.
Who is this for?
If you’re tired of surface-level tutorials and want to go deeper - this is for you.
Whether you’re:
A student working toward your first real AI project
A Data Scientist looking to level up your software and infra skills
A Data/Backend/Software Engineer curious about LLMs, RAG, or Agents
A working AI/ML Engineer who wants to learn GenAI systems, for real
Anyone looking to enter the industry by building great projects
Pre-requisites: Good Python knowledge and a solid grasp of software engineering fundamentals.
This will accelerate your path - with structure, hands-on examples, and mentorship from people who’ve built this in the wild.
Next: dive into Phase 1: RAG Systems (Zero → Hero)
Phase 1: Build Your Own AI Research Assistant
We’re kicking off the Mother of AI Project with what we believe is one of the most impactful AI skills right now:
RAG (Retrieval-Augmented Generation) - used in everything from chatbots to search, internal knowledge bases, agents, and assistants.
But we’re not stopping at “just” RAG.
You’ll build a complete research assistant that:
Runs automated data ingestion pipelines
Downloads and parses 100+ academic PDFs (e.g. arXiv papers) through APIs
Searches across papers with keywords and embeddings
Answers questions using local or foundation LLMs (Ollama, OpenAI, etc.)
Shows sources, collects feedback, and maintains privacy
A complete, real, production-grade system!
This is what we call the ArXiv Paper Curator - an AI system that turns the flood of new AI research into structured, searchable knowledge.
What you’ll build (Technical Breakdown)
You will build, from scratch, a production-grade RAG system that runs fully locally (with optional API integrations):
Data Ingestion: Auto-download PDFs daily from arXiv using Airflow
Dual Parsing: Extract structured content via GROBID + Docling fallback
Metadata Storage: Store metadata (authors, titles, abstracts, etc.) in PostgreSQL
Search Engine: Use OpenSearch with BM25 + semantic vectors (hybrid)
Chunking Engine: Semantic-aware chunking (evaluating different chunking strategies)
Embedding Store: SentenceTransformers + LlamaIndex indexing
RAG Pipeline: Query expansion + retrieval + prompt templating
Local LLM: Answer questions using Ollama locally or via API (Llama 3, OpenAI, etc.)
Observability: Use Langfuse for prompt versioning, tracing, quality
Evaluation: RAGAS metrics, nDCG scoring, accuracy, latency tracking
Frontend: Ask questions and explore results via Streamlit or Gradio
FastAPI Backend: Async API server for integration and extensions
Dev Best Practices: uv, ruff, pre-commit, pydantic, pytest, logging, etc.
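To make the “semantic vectors” piece concrete: under the hood, semantic search ranks papers by cosine similarity between embedding vectors. A minimal sketch with toy 3-dimensional vectors (real embeddings from SentenceTransformers have hundreds of dimensions; the paper names are just illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim "embeddings" -- real ones come from an embedding model.
query = [0.9, 0.1, 0.0]
papers = {
    "attention-is-all-you-need": [0.8, 0.2, 0.1],
    "resnet": [0.1, 0.9, 0.3],
}
# Rank papers by similarity to the query vector, best first.
ranked = sorted(papers, key=lambda p: cosine_similarity(query, papers[p]), reverse=True)
print(ranked[0])  # → attention-is-all-you-need
```

Hybrid search then blends this score with BM25 keyword relevance, which is what OpenSearch does for you in the project.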
📆 The 6-Week Learning Plan
Every Thursday, we release a new blog, notebook, and GitHub commit.
You follow along and build that week’s part. Here’s the weekly breakdown:
Week 1 (Aug 7): Infrastructure & API setup
What You'll Build:
FastAPI skeleton project setup
Complete Docker Compose stack orchestrating all services
FastAPI application with health checks and basic endpoints
PostgreSQL and OpenSearch containers with proper networking
Ollama container for local LLM inference
Mock data pipeline for testing without external dependencies
Key Learning Outcomes:
Understanding microservices architecture for AI applications
Setting up development environments with hot-reloading
Implementing async FastAPI endpoints with proper error handling
Container networking and service discovery
Environment configuration and secrets management
🔗 [Notebook + Blog – Coming Soon]
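To give a feel for the Week 1 endpoint work: a FastAPI route handler is just an async Python function. Here’s a minimal sketch of the handler logic alone (the FastAPI decorator wiring is omitted, and the always-healthy service checks are placeholders - the real project would ping the actual PostgreSQL, OpenSearch, and Ollama containers):

```python
import asyncio

async def health_check() -> dict:
    """Report overall service status for a /health endpoint.

    The hard-coded checks below are stand-ins; real checks would
    asynchronously ping each dependency and catch failures.
    """
    services = {"postgres": True, "opensearch": True, "ollama": True}
    status = "ok" if all(services.values()) else "degraded"
    return {"status": status, "services": services}

# In FastAPI this function would be registered with @app.get("/health");
# here we just run it directly to inspect the payload.
print(asyncio.run(health_check()))
```

Returning a structured payload (status plus per-service detail) makes the endpoint useful both for container orchestration and for human debugging.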
Week 2 (Aug 14): Ingestion pipeline
What You'll Build:
arXiv API client with rate limiting and retry logic
Automated metadata fetcher for academic papers
Parallel PDF downloader with progress tracking
GROBID integration for scientific PDF parsing
Docling fallback for robust document processing
Airflow DAGs for orchestrating daily ingestion
Key Learning Outcomes:
Working with external APIs respectfully (rate limiting)
Handling large file downloads efficiently
Implementing fallback patterns for reliability
Understanding PDF parsing challenges in academic documents
Building fault-tolerant data pipelines
Async/await patterns for I/O operations
🔗 [Notebook + Blog – Coming Soon]
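The “rate limiting and retry logic” above boils down to a simple pattern: retry failed calls with exponential backoff plus jitter. A minimal sketch - the flaky endpoint is simulated; a real client would wrap an actual arXiv API request:

```python
import random
import time

def fetch_with_retry(fetch, max_retries=3, base_delay=1.0):
    """Call `fetch`, retrying on failure with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            # Back off 1s, 2s, 4s... plus jitter to avoid thundering herds.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))

# Simulated flaky endpoint: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated timeout")
    return "<arxiv-feed/>"

print(fetch_with_retry(flaky, base_delay=0.01))  # → <arxiv-feed/>
```

The same pattern generalizes to the PDF downloader; pairing it with a delay between successful calls is what keeps you respectful of the arXiv API.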
🔍 Week 3 (Aug 21): Search infrastructure
What You'll Build:
OpenSearch mapping optimized for academic content
PostgreSQL schema with JSONB for flexible metadata
Dual-storage strategy implementation
Custom analyzers for scientific terminology
Multi-field search with BM25 scoring
Category, author, and date range filtering
Search result ranking and relevance tuning (e.g. boosting recent papers)
Key Learning Outcomes:
Designing schemas for dual-storage systems
OpenSearch analyzers and tokenizers for academic text
Building complex search queries programmatically
Understanding BM25 and relevance scoring
Implementing faceted search and filters
Performance optimization for search operations
🔗 [Notebook + Blog – Coming Soon]
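For intuition on the BM25 scoring mentioned above: OpenSearch computes it for you, but the formula is simple enough to sketch by hand over a toy corpus of tokenized documents:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """BM25 relevance of `doc` (a token list) to the query terms,
    computed over a tiny in-memory corpus. OpenSearch applies the
    same formula internally for keyword search."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)  # average doc length
    N = len(corpus)
    tf = Counter(doc)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        f = tf[term]
        # Term frequency saturates via k1; b normalizes for doc length.
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [
    "retrieval augmented generation with transformers".split(),
    "convolutional networks for image classification".split(),
]
query = "retrieval augmented".split()
scores = [bm25_score(query, d, docs) for d in docs]
print(scores[0] > scores[1])  # → True
```

Seeing the `k1` and `b` knobs here demystifies the relevance-tuning work: those are exactly the parameters you adjust in the OpenSearch similarity settings.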
Week 4 (Aug 28): Chunking + retrieval evaluation
What You'll Build:
Context-aware chunking that preserves paper structure
Chunk size optimization based on retrieval performance
nDCG, precision, and recall metrics implementation
Query expansion for better recall
Key Learning Outcomes:
Why naive chunking fails for academic documents
Preserving semantic boundaries in technical text
Implementing evaluation metrics for retrieval systems
Testing different chunking strategies
Understanding the precision-recall tradeoff
Query expansion techniques that work
🔗 [Notebook + Blog – Coming Soon]
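As a taste of the Week 4 metrics work, here’s nDCG in a few lines of pure Python - it compares your ranking’s discounted gain against the ideal ordering of the same relevance grades:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain over a ranked list of relevance grades."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_relevances, k=None):
    """nDCG: DCG of the system's ranking / DCG of the ideal ranking."""
    r = ranked_relevances[:k] if k else ranked_relevances
    ideal = sorted(ranked_relevances, reverse=True)
    ideal = ideal[:k] if k else ideal
    return dcg(r) / dcg(ideal) if dcg(ideal) > 0 else 0.0

# Retrieved chunks with human-judged relevance (3 = perfect, 0 = irrelevant).
# A perfect ranking scores 1.0; swapping good chunks downward costs you.
print(round(ndcg([3, 0, 2]), 3))  # → 0.939
```

This is the metric you’ll use to compare chunking strategies: re-chunk, re-retrieve, re-score, and keep whichever strategy lifts nDCG.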
Week 5 (Sept 4): Full RAG system + prompts
What You'll Build:
End-to-end question answering system
LlamaIndex integration for RAG orchestration
Source tracking with paragraph-level citations
Prompt templates optimized for academic content
Context window management for long papers
Truncation strategies that preserve meaning
Answer generation with confidence scores
Key Learning Outcomes:
Implementing production RAG pipelines
Prompt engineering for factual accuracy
Managing context windows effectively
Handling multi-document synthesis
Building citation systems users trust
Balancing comprehensiveness with conciseness
🔗 [Notebook + Blog – Coming Soon]
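To illustrate prompt templating with source tracking and a context budget, here’s a toy sketch - it truncates by characters for simplicity, whereas the real pipeline would budget tokens; the chunk contents are illustrative:

```python
def build_rag_prompt(question, chunks, max_chars=800):
    """Assemble a cited prompt from retrieved (source, text) chunks,
    dropping lowest-ranked chunks once the context budget is spent.
    Assumes `chunks` is ordered best-first by the retriever."""
    context_parts, used = [], 0
    for i, (source, text) in enumerate(chunks, start=1):
        entry = f"[{i}] ({source}) {text}\n"
        if used + len(entry) > max_chars:
            break  # budget exhausted: keep the higher-ranked chunks
        context_parts.append(entry)
        used += len(entry)
    return (
        "Answer using ONLY the context below. Cite sources like [1].\n\n"
        + "".join(context_parts)
        + f"\nQuestion: {question}\nAnswer:"
    )

chunks = [
    ("arXiv:1706.03762", "The Transformer relies entirely on attention."),
    ("arXiv:2005.11401", "RAG combines retrieval with generation."),
]
prompt = build_rag_prompt("What is a Transformer?", chunks)
print(prompt)
```

The numbered `[i] (source)` markers are what lets the answer carry paragraph-level citations back to specific papers.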
Week 6 (Sept 11): Observability + versioning
What You'll Build:
Langfuse integration for complete observability
Prompt versioning and A/B testing framework
Performance monitoring dashboards
Request tracing from question to answer
Gradio interface for easy interaction
Caching layer for common queries
Production deployment configurations
Key Learning Outcomes:
Implementing observability for LLM applications
Data-driven prompt optimization
Building intuitive research interfaces
Performance profiling and optimization
Caching strategies for AI systems
Production deployment best practices
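The “caching layer for common queries” can be as simple as a TTL (time-to-live) dictionary keyed by the question. A toy sketch - in production you’d likely back this with Redis or similar:

```python
import time

class TTLCache:
    """Tiny time-based cache for repeated questions."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        hit = self._store.get(key)
        if hit is None:
            return None  # never cached
        value, expires = hit
        if time.monotonic() > expires:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=60)
cache.set("what is RAG?", "Retrieval-Augmented Generation ...")
print(cache.get("what is RAG?"))
```

Even a cache this simple pays off for RAG: a repeated question skips retrieval and LLM generation entirely, cutting both latency and cost.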
Note: All of the above is subject to change as the project progresses.
GitHub repo: Coming soon - all notebooks, code, and updates
Important! Code walkthrough + Q&A hours!
We’re opening up Code Walkthrough + Q&A Hours every Sunday, 16:00–17:30 CET - where you can learn directly from two AI/ML engineers with 15+ years of combined production experience.
Not just another Zoom session. This is where the real learning happens.
Here’s what you’ll get:
✅ Live walkthrough of that week’s code
✅ Deeper insights into design tradeoffs, infrastructure, and architecture
✅ Debugging support on your implementation
✅ Stories from production - when building RAGs, Agents, MLOps systems
✅ Interactive Q&A and one-on-one guidance during the session
✅ Career tips and ideas to extend this project further
This is where our best insights come out - the kind that doesn’t fit into a blog post.
⚠️ Seats are limited and it’s intentionally priced low to stay accessible.
Register now for Office Hours → https://topmate.io/shantanuladhwe/1657955
Note: All Phase 1 code + weekly blog posts are completely free. But the Code Walkthrough + Q&A Hours are paid - that’s where we go deeper and deliver serious value.
What to do now?
✅ Subscribe for weekly updates
✅ Register for Office Hours before slots fill up
✅ Bookmark this blog - we update links as the weeks go
✅ Get your hands dirty - every week’s notebook will be ready to run
The future of AI systems is being built - you should be building them too.
Let’s go 💪