Monday, 1 September 2025

RAG vs Fine-Tuning: When to Pick Which?

The rapid evolution of large language models (LLMs) has made them increasingly useful across industries. However, when tailoring these models to specific domains or tasks, two strategies dominate the conversation: Retrieval-Augmented Generation (RAG) and Fine-Tuning. While both enhance model performance, they serve different purposes and come with distinct trade-offs. Choosing the right approach depends on your data, use case, and long-term scalability needs.


What is Retrieval-Augmented Generation (RAG)?

RAG combines a pretrained LLM with an external knowledge source (e.g., a vector database). Instead of relying solely on what the model has memorized, RAG retrieves relevant documents or passages at runtime and uses them to ground the model’s responses.
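Concretely, the runtime loop is: embed the query, retrieve the top-k passages, and ground the prompt with them. Below is a minimal sketch assuming a sentence-transformers embedder and a FAISS index; the `generate()` call is a hypothetical stand-in for whatever LLM client you use.

```python
# Minimal RAG sketch: embed docs, retrieve at query time, ground the prompt.
# Assumes sentence-transformers + FAISS; `generate()` is a hypothetical LLM call.
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "Policy A covers dental procedures up to $1,000 per year.",
    "Policy B covers vision care, including annual eye exams.",
    "Claims must be filed within 30 days of treatment.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True).astype("float32")

index = faiss.IndexFlatIP(doc_vecs.shape[1])   # inner product = cosine on unit vectors
index.add(doc_vecs)

def answer(question: str, k: int = 2) -> str:
    q = embedder.encode([question], normalize_embeddings=True).astype("float32")
    _, ids = index.search(q, k)                      # retrieval happens at runtime
    context = "\n".join(docs[i] for i in ids[0])
    prompt = f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"
    return generate(prompt)                          # hypothetical LLM client call

# answer("What does Policy A cover?")  -> response grounded in retrieved passages
```

Updating knowledge is then just a matter of re-indexing documents; the model weights never change.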

Key advantages:

  • Dynamic knowledge updates without retraining.

  • Reduced risk of hallucination by grounding answers in retrieved context.

  • Cost-effective, as no model weights are modified.

Best suited for:

  • Knowledge-intensive applications (e.g., legal, healthcare, enterprise search).

  • Use cases requiring frequent updates to knowledge.

  • Situations where domain-specific expertise is too broad or fast-changing to fine-tune effectively.


What is Fine-Tuning?

Fine-tuning involves training an LLM further on domain-specific datasets, modifying its weights to adapt to new tasks or knowledge. This approach hardwires the new knowledge or style into the model itself.
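For intuition, here is a minimal supervised fine-tuning sketch using the Hugging Face Trainer on a stand-in classification dataset; the base model, dataset, and hyperparameters are placeholders, not a recommendation.

```python
# Sketch of supervised fine-tuning with the Hugging Face Trainer.
# Base model, dataset, and hyperparameters are placeholders for illustration.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"                     # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")                             # stand-in "domain" dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

train = dataset["train"].shuffle(seed=42).select(range(2000)).map(tokenize, batched=True)
test = dataset["test"].shuffle(seed=42).select(range(500)).map(tokenize, batched=True)

args = TrainingArguments(output_dir="ft-out",
                         per_device_train_batch_size=16,
                         num_train_epochs=1,
                         learning_rate=2e-5)

# Training updates the model weights: the task behavior is baked into the model,
# so no retrieval infrastructure is needed at inference time.
Trainer(model=model, args=args, train_dataset=train, eval_dataset=test,
        tokenizer=tokenizer).train()
```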

Key advantages:

  • Produces highly specialized models aligned with domain-specific tasks.

  • Reduces reliance on retrieval infrastructure.

  • Better for style adaptation (e.g., customer service tone, brand voice).

Best suited for:

  • Narrow, well-defined tasks (e.g., classification, structured outputs).

  • Domains where the knowledge base changes slowly.

  • Scenarios requiring consistent behavior without runtime dependencies.


RAG vs Fine-Tuning: A Comparison

  • Adaptability: RAG is easily updated with new data; fine-tuning requires retraining for updates.

  • Cost: RAG is cheaper (no retraining); fine-tuning is higher (compute + data labeling).

  • Knowledge freshness: RAG stays current as the retrieval DB is updated; a fine-tuned model becomes outdated over time.

  • Specialization: RAG stays general-purpose with external grounding; fine-tuning is deeply tailored to a domain/task.

  • Infrastructure: RAG needs a retriever + vector database; fine-tuning is model-only, but with a larger upfront effort.

When to Pick Which?

  • Pick RAG if your system must stay current with fast-evolving knowledge, such as customer support, regulatory compliance, or research-based queries.

  • Pick Fine-Tuning if you need a tightly controlled, domain-specific model with consistent outputs, especially where knowledge is stable and tasks are repetitive.

In many real-world systems, the best solution isn’t a binary choice. Hybrid approaches—using RAG for knowledge freshness and fine-tuning for stylistic alignment—are emerging as the most effective strategy.


Closing Thoughts

The decision between RAG and fine-tuning comes down to the balance between knowledge freshness and domain specialization. By understanding the strengths and trade-offs of both, organizations can design LLM-powered systems that are accurate, cost-effective, and scalable.


Self-Improving AI: The Role of Feedback Loops in Model Alignment

Artificial Intelligence is no longer just about training a model once and deploying it. The frontier of AI research is shifting toward self-improving systems: models that continuously refine themselves based on feedback. At the heart of this evolution lies the concept of feedback loops, which are central to aligning AI systems with human values, goals, and safety standards.

Why Feedback Loops Matter

In traditional machine learning, improvement happens through retraining with curated datasets. But AI systems in the real world need dynamic adaptation. They must handle changing data, evolving objectives, and unforeseen edge cases. Feedback loops act as the mechanism by which models can:

  • Identify errors (detect misalignments or harmful outputs).

  • Adapt behaviors (reweight predictions or modify responses).

  • Learn continuously (update policies or embeddings without full retraining).

This mimics how humans learn: not just from one-time training but from ongoing correction and real-world reinforcement.

Types of Feedback Loops in AI Alignment

  1. Human-in-the-Loop (HITL):
    Models adjust based on direct human feedback (e.g., Reinforcement Learning with Human Feedback, RLHF).

  2. Self-Reflection Loops:
    The model critiques its own outputs before finalizing responses, reducing hallucinations or logical errors (a toy sketch follows this list).

  3. Environmental Feedback:
    Interaction with the environment (e.g., in robotics or simulations) provides reinforcement signals.

  4. Peer AI Feedback:
    Multiple models cross-verify and refine each other’s outputs, creating a collaborative alignment ecosystem.
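As noted in item 2, a self-reflection loop can be as simple as draft, critique, revise. The `llm()` function below is a hypothetical stand-in for any chat-completion call; wire in your own client.

```python
# Toy self-reflection loop: draft, critique, revise.
# `llm()` is a hypothetical stand-in for any chat-completion call.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug your model client in here")

def answer_with_reflection(question: str, max_rounds: int = 2) -> str:
    draft = llm(f"Answer the question:\n{question}")
    for _ in range(max_rounds):
        critique = llm(
            f"Question: {question}\nDraft answer: {draft}\n"
            "List any factual or logical problems, or reply OK if there are none."
        )
        if critique.strip().upper().startswith("OK"):
            break                               # the model is satisfied with its draft
        draft = llm(
            f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
            "Rewrite the answer, fixing the problems listed above."
        )
    return draft
```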

Self-Improving AI in Practice

  • Chatbots and LLMs: They use human feedback to improve conversational alignment and reduce harmful biases.

  • Recommendation Systems: Algorithms refine predictions based on click-through rates and user engagement loops.

  • Autonomous Vehicles: Continuous feedback from real-world driving helps adjust navigation and safety protocols.

Challenges of Feedback-Driven Alignment

  • Bias Amplification: Feedback loops can reinforce existing biases if not carefully managed.

  • Overfitting to Feedback: Models may over-optimize for short-term signals (e.g., likes, clicks) instead of long-term goals.

  • Scalability: Human feedback doesn’t scale infinitely; automated or hybrid feedback systems are necessary.

The Road Ahead

For AI to become truly safe, aligned, and adaptive, it must leverage feedback loops as the engine of self-improvement. Future research is likely to explore:

  • Hybrid feedback frameworks combining human, synthetic, and environmental signals.

  • Meta-learning systems that learn how to learn from feedback more efficiently.

  • Alignment diagnostics that monitor feedback loops to prevent drift or harmful optimization.

In essence, feedback loops are the nervous system of self-improving AI. They don’t just help models learn faster; they ensure that learning is aligned with human intent, which may well define the next decade of AI progress.


Image Classification with CNN Feature Extraction and Traditional ML Algorithms

Introduction

Deep learning dominates image classification today, but traditional machine learning algorithms are far from obsolete. In fact, one powerful approach combines the best of both worlds:

  • Use Convolutional Neural Networks (CNNs) to extract robust, high-level image features.

  • Feed those features into classical ML algorithms (like SVM, Random Forest, or Logistic Regression) for classification.

This hybrid pipeline often yields competitive results—especially when datasets are small, compute resources are limited, or interpretability matters.


Step 1: Why CNNs for Feature Extraction?

CNNs are excellent at automatically learning spatial hierarchies of features:

  • Early layers detect edges and textures.

  • Middle layers capture shapes and patterns.

  • Deeper layers represent complex objects and semantics.

Instead of training a full CNN end-to-end, we can freeze a pretrained CNN (like VGG, ResNet, or MobileNet) and use its output as a feature vector for each image.

Example:

  • Input: 224×224 image.

  • Pass through ResNet-50 up to the penultimate layer.

  • Output: A 2048-dimensional feature vector.

This vector becomes the input for traditional ML algorithms.
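A minimal sketch of this step with a pretrained torchvision ResNet-50 (weights API from torchvision 0.13+): the classification head is swapped for an identity so each image maps to the 2048-dimensional pooled feature vector described above. The image path is a placeholder.

```python
# Sketch: frozen ResNet-50 as a feature extractor.
# The final classifier is replaced with an identity layer, so the penultimate
# 2048-d pooled features come out instead of class logits.
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.fc = torch.nn.Identity()          # drop the final classifier layer
model.eval()

preprocess = weights.transforms()       # resize, center-crop to 224x224, normalize

def extract_features(image_path: str) -> torch.Tensor:
    img = Image.open(image_path).convert("RGB")
    x = preprocess(img).unsqueeze(0)    # shape: (1, 3, 224, 224)
    with torch.no_grad():
        feats = model(x)                # shape: (1, 2048)
    return feats.squeeze(0)

# vec = extract_features("example.jpg")  # placeholder path; vec.shape == (2048,)
```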


Step 2: Feeding Features to ML Algorithms

Once you have CNN-extracted features, you can apply:

  1. Support Vector Machines (SVMs)

    • Strong in high-dimensional spaces.

    • Works well with smaller datasets.

    • Example: Linear SVM on ResNet features often beats end-to-end training with limited data.

  2. Random Forests / Gradient Boosted Trees

    • Handle non-linear relationships.

    • Provide feature importance insights.

    • Often robust to noise.

  3. Logistic Regression / k-NN

    • Simple, fast baselines.

    • Logistic Regression is interpretable; k-NN is flexible but slower at scale.

  4. Ensemble Approaches

    • Combine multiple classifiers for improved accuracy.

    • Example: SVM + Random Forest voting scheme.
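Putting Step 1 and option 1 together, here is a sketch of training a Linear SVM on the extracted feature matrix; `X` and `y` below are random placeholders standing in for your real features and labels.

```python
# Sketch: Linear SVM on CNN-extracted features.
# X and y are random placeholders for a real (n_samples, 2048) matrix and labels.
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X = np.random.rand(500, 2048).astype("float32")   # placeholder CNN features
y = np.random.randint(0, 2, size=500)             # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0))   # scale, then linear SVM
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Swapping in a Random Forest or Logistic Regression is a one-line change to the pipeline.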


Step 3: When to Use This Hybrid Approach?

Small to Medium Datasets

  • Training a deep CNN end-to-end may overfit.

  • Pretrained CNN features + SVM generalize better.

Limited Compute

  • Extract features once, train lightweight ML models.

  • Much cheaper than GPU-heavy end-to-end fine-tuning.

Explainability Needs

  • Random Forests or Logistic Regression provide insights into classification decisions.

  • Useful in regulated domains (healthcare, finance).


Case Example

  • Dataset: 10,000 chest X-rays (binary classification: pneumonia vs normal).

  • CNN: Pretrained DenseNet-121 (feature extraction only).

  • ML Classifier: Linear SVM.

  • Result: Comparable accuracy to full fine-tuning, but trained in hours instead of days.


Pros and Cons

Pros:

  • Faster training and experimentation.

  • Works well with limited data.

  • Leverages both deep feature richness and ML simplicity.

Cons:

  • Might underperform compared to full fine-tuned CNNs on very large datasets.

  • Feature extraction step adds an extra pipeline stage.

  • Requires careful feature dimensionality reduction (PCA, t-SNE) for certain ML algorithms.
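If a distance-based model struggles with the raw 2048-dimensional features, a quick PCA pass is one option. A sketch with scikit-learn, using a random placeholder matrix:

```python
# Sketch: shrinking 2048-d CNN features with PCA before a distance-based model.
# X is a random placeholder for the real feature matrix.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 2048).astype("float32")
pca = PCA(n_components=256)                 # keep 256 components (tune per dataset)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape, round(pca.explained_variance_ratio_.sum(), 3))
```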


Conclusion

This hybrid strategy (CNN for feature extraction + traditional ML for classification) is a pragmatic, resource-efficient alternative to pure deep learning. It’s particularly suited for domains where data is scarce, compute is limited, or interpretability is valued.

In essence: let CNNs do what they’re best at (feature extraction), and let ML algorithms do what they’re best at (classification).



MCP & ADK: Foundations for Real-World AI Agents

Intro: Infrastructure Over Flash

Yes, LLMs grab the headlines. But it’s the quieter, less-talked-about systems that power real applications. That’s where MCP (Model Context Protocol) and ADK (Agent Development Kit) come in: practical, serious plumbing for AI’s real-world use.


MCP: The Protocol That Makes Models Play Nice

Think of the Model Context Protocol as a standardized language LLMs use to interface with tools, databases, and APIs. It’s not flashy, but it’s essential.

Core Strengths:

  • Interoperability: Enables developers to swap models without rebuilding connectors—few things matter more for scalability.

  • Security & Governance: Only approved tools or datasets get exposed; context is controlled, not wild.

  • Scalable Integration: No need for bespoke integrations—every model talks the same language.

MCP transforms LLMs from isolated responders into context-aware collaborators.
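For a feel of what "the same language" means: MCP messages are JSON-RPC 2.0 requests, such as a `tools/call` invocation. The tool name and arguments below are made up for illustration; real tools are whatever the connected server exposes.

```python
# Illustration of the request shape an MCP client sends (JSON-RPC 2.0).
# Tool name and arguments are hypothetical examples.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",                       # MCP method for invoking a tool
    "params": {
        "name": "search_docs",                    # hypothetical tool name
        "arguments": {"query": "refund policy"},  # hypothetical tool arguments
    },
}
print(json.dumps(request, indent=2))
```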


ADK (Agent Development Kit): Google’s Practical Toolkit for AI Agents

The Agent Development Kit (ADK) is Google’s open-source, code-centric framework for building AI agents that can collaborate, interact, and deploy reliably.

Key Features (based on official documentation):

  • Modular frameworks for multi-agent systems (like orchestrating sub-agents).

  • Model-agnostic design: optimized for Gemini, but also works with others via LiteLLM and Vertex AI.

  • Rich tool integration: pre-built tools (search, code execution), custom functions, and third-party libraries like LangChain.

  • Workflow orchestration: supports sequential, parallel, looping agents, plus LLM-guided routing.

  • Developer tools: CLI, Web UI, debugging, evaluation, streaming capabilities (text, audio, video).

  • Deployment flexibility: containerize anywhere or use Vertex AI Agent Builder for managed runs.

In short: ADK makes agentic architecture feel like software engineering—clean, testable, versioned, deployable.


Why This Combo Works

Individually, MCP and ADK are solid. Together, they’re formidable:

  • MCP ensures all models and tools speak the same language.

  • ADK equips developers with the tools to build, test, and deploy agent systems using that protocol.

Imagine the legacy synergy of TCP/IP and modern dev stacks, but for AI agents. It’s not glamorous, but it’s foundational. And that's what lasts.


Look Ahead: What’s Next

  • Cross-model compatibility: Swap GPT, Gemini, Claude with zero friction.

  • AI-native products: Applications designed from day one around agents that rely on MCP/ADK.

  • Broader ecosystem builds: Expect open-source SDKs, workflows, evaluation suites based on these standards.


Conclusion

AI’s future isn’t about bigger models; it’s about smarter infrastructure. MCP and ADK don’t make the headlines, but they’ll determine who builds the real workhorse agent systems over the next decade. Solid, repeatable, secure: just the way traditional engineers like us prefer.

Deep Techie — “Mixture-of-Recursions: Smart, Lean, Mean”

Intro

Forget brute-force Transformers. MoR is the covert agent that decides which tokens get a full interrogation and which just get a nod. It's efficiency with a spine.

What’s Broken in Transformers

  • All tokens endure the full 24-layer gauntlet, even “the”, “and”, “lol.” Pure waste.

  • Layers stack unique weights → bloated models and budget nightmares.

  • No VIP exits: even done tokens linger, sucking GPU juice and dragging latency.

  • KV cache explodes with every token at every layer.

  • Zero internal reasoning or dynamic depth.

MoR’s Three Smart Moves

  1. Parameter Sharing via Recursion
    Instead of a unique layer stack, reuse one block like a loop. Fewer weights, same brain.

  2. Adaptive Token Routing
    A tiny router judges each token’s IQ. Easy ones leave early; complex ones stick around. Token-level compute efficiency (toy sketch after this list).

  3. Selective KV Caching
    Only tokens still “in the game” clog up memory. Optionally, reuse the first pass’s KV to cut more slack (with some trade-offs).
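As referenced in move 2, here is a toy PyTorch sketch of "one shared block + a router" (moves 1 and 2 combined). Note this version still runs the block on every token, so it only illustrates the routing logic, not the real compute or KV-cache savings described in the MoR paper.

```python
# Toy sketch: a single shared Transformer block reused recursively, with a
# tiny router that marks tokens as "done" so they stop being refined.
import torch
import torch.nn as nn

class RecursiveBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, max_recursions: int = 3):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.router = nn.Linear(d_model, 1)        # scores: should this token continue?
        self.max_recursions = max_recursions

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, seq, d_model)
        active = torch.ones(x.shape[:2], dtype=torch.bool, device=x.device)
        for _ in range(self.max_recursions):
            refined = self.block(x)                            # shared weights, reused each pass
            x = torch.where(active.unsqueeze(-1), refined, x)  # exited tokens keep their state
            active = active & (torch.sigmoid(self.router(x)).squeeze(-1) > 0.5)
            if not active.any():                               # every token has exited early
                break
        return x

tokens = torch.randn(2, 16, 256)
print(RecursiveBlock()(tokens).shape)   # torch.Size([2, 16, 256])
```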

Benchmarks That Dazzle

  • At 1.7B parameters, MoR matches vanilla Transformer validation loss with 1/3 the unique weights.

  • On the Pareto frontier: at equal FLOPs, MoR hits lower perplexity and better few-shot accuracy while spiffing up throughput by ~2×.

Routing Modes: Subtle Differences

  • Expert-choice: Each recursion step decides which tokens continue; dynamic, but needs careful loss balancing.

  • Token-choice: Tokens pick their total recursion upfront—simple but less flexible.

Trade-Off Signals

  • Token-choice lacks in-flight flexibility.

  • KV reuse saves memory but slightly dents accuracy.

  • Routing is fixed post-training; you can’t tweak on the fly.

  • Not golden for teeny models (<135M parameters).

Takeaway

MoR isn’t a Transformer-smashing rebellion; it’s practical evolution. Smart compute, tight models, smarter exits. Finally, modern AI brains that stop overthinking.

Saturday, 23 August 2025

My First CodeChef ML Problem: “Hello World” for Data Science 😏

So today I decided to dip my toes into the world of CodeChef ML problems. Big dreams, high hopes… and what did I get? A nice little warm-up that felt suspiciously like scikit-learn 101 in disguise.

Here’s the “menu”:

  • Sentiment Analysis → vectorize text, train a model, predict. Easy-peasy.

  • Clustering → KMeans + silhouette score. Classic, safe, predictable.

  • Dimensionality Reduction → PCA + classifier comparison. Check, check, check.

Translation: I basically solved the “Hello World” for ML. No messy datasets, no weird metrics, no advanced tricks. Just the bread-and-butter stuff every ML newbie (and probably a lot of pros) has done a million times in tutorials.
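For the record, the whole "menu" fits in a handful of scikit-learn lines; a sketch with made-up toy data.

```python
# The "Hello World" in question, roughly: vectorize, fit, score.
# Texts, labels, and cluster count are toy placeholders.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline

texts = ["loved it", "terrible movie", "great acting", "total waste of time"]
labels = [1, 0, 1, 0]

# Sentiment: TF-IDF + Logistic Regression
sentiment = make_pipeline(TfidfVectorizer(), LogisticRegression())
sentiment.fit(texts, labels)
print(sentiment.predict(["absolutely loved the acting"]))

# Clustering: KMeans + silhouette score on the TF-IDF matrix
X = TfidfVectorizer().fit_transform(texts)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(silhouette_score(X, km.labels_))
```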

Now, don’t get me wrong, it’s a perfect intro if you’re just starting out. But if CodeChef wants to live up to the hype, I’m ready for the chaos:

  • Feature engineering nightmares

  • Noisy, messy datasets

  • Custom metrics and multi-label problems

  • Ensemble stacking and meta-learning

But for now… congrats to me, I survived Entry-Level ML Bootcamp. Next level, please, I’m ready.

TL;DR: CodeChef’s first ML problem = “Hello World” for data science. Cute, but let’s crank up the difficulty already. 😎 

Wednesday, 13 August 2025

Backend Glory, Frontend Shame: My 24-Hour Hackathon Story

Built a fully functional backend with a lame-ass frontend in 24 hours on 2 hours of sleep — proof that delirium is just accelerated creativity.

Every bug I fixed uncovered two more, pretty sure I’ve accidentally implemented mitosis in my software.

Commit #0 – The Overconfidence Push

When I signed up for this hackathon, I thought:

“We’ll keep it simple. Maybe a small web app, nice UI, minimal features. Easy win.”

Two hours in, “keep it simple” turned into designing a backend that could probably run half the internet.


Commit #5 – The All-Nighter Debug Session

Backend issues hit us like a freight train.
We spent the whole night wrestling with API endpoints that just wouldn’t behave.

One of my teammates had been stuck on a single bug for two hours straight.
Five cups of tea later, I swooped in and fixed it in just 15 minutes.

Right after that victory, a third person (not the poor guy stuck on the bug) casually dropped this gem:

“I helped you clear the 1st round. 2nd round is up to you guys.”

Which, translated from Hackathon-ese, basically means:

“I’m done, you’re on your own now.”


Commit #12 – Backend Nirvana, Frontend Tragedy

The backend was glorious: fast, secure, and rock solid.
The frontend? Looked like it was made by someone’s pet hamster walking on a keyboard.
We knew the imbalance was bad, but fixing the frontend felt like trying to paint the Mona Lisa with a broom at that point.


Commit #17 – The SE Reality Check

Before this hackathon, we thought Software Engineering was an “easy” subject. All the models looked the same in the notes: boxes, arrows, and buzzwords.
Turns out, SE is the difference between a clean, focused project and a caffeine-fueled mess.
We lacked any real strategic planning: no timelines, no role allocation, no clear architecture. Basically, we were building a house without a floor plan.


Commit #22 – The “We Already Lost” Hour

The final sixty minutes of the hackathon weren’t spent frantically coding, debugging, or racing to add features.
They were spent scrolling Instagram Reels, a silent, collective acknowledgment that our fate was sealed.
It wasn’t that our backend wasn’t good. No, it was critical-level flawless.
Every API endpoint was optimized to the point of smugness, database queries executed with the speed of a caffeine-fueled squirrel, and the architecture was so clean it could’ve been in a textbook.
But here’s the truth no one says out loud: in most hackathons, a perfect backend is like a genius hiding behind a curtain.
The judges rarely peek. They’re looking for flash, something shiny to click, swipe, or tap in the demo.
If your masterpiece lives behind the scenes, you might as well have built a particle accelerator in your garage. Impressive, sure… but if you can’t wrap it in pastel colors and rounded buttons, it’s invisible.
Our assigned “guides” (and I use that term generously) were essentially interns armed with a laminated cheat sheet.
They nodded in predictable intervals, delivering pre-approved motivational lines like NPCs in a badly written RPG.
If mentorship was supposed to be part of the hackathon experience, ours felt more like watching a YouTube tutorial in 144p, all filler, no insight.
By that last hour, stress gave way to a kind of zen resignation.
We weren’t going to slap together a panic-UI just to score points.
If we were going to lose, we’d lose with the backend untouched, a perfectly functional monument to code quality that no one in the judging room would truly appreciate.

It wasn’t that our backend wasn’t good. No, it was architecturally unassailable, engineered by our backend guy who was operating on a completely different plane of existence.

While the rest of us were still trying to remember which API did what, he was casually orchestrating database schemas like a symphony conductor.
Endpoints were not just functional — they were elegant, almost poetic in how they handled requests.
He wrote queries so efficient they could probably return results from a database that hadn’t even been created yet.

By the time he was done, the backend wasn’t just working — it felt alive, self-aware, and possibly judging our frontend for not living up to its standards.
If hackathons had a “Backend Hall of Fame,” this would have been inducted on day one, behind velvet ropes, with a sign saying “Do Not Touch — It Just Works.”

Postmortem

Hackathons sell themselves as innovation incubators, but more often, they’re presentation contests wearing hoodies.
The scoreboard doesn’t care if your backend could survive a DDoS attack from the gods themselves. If the judges can’t see it, it doesn’t exist.
And that’s the irony: the very events that are supposed to celebrate building great software often reward building great-looking software.
It’s a reminder that, in tech (and life), substance matters, but presentation decides the prize.
So yes, we walked away without a trophy, but we also walked away with the most bulletproof backend of the entire event.
And in the unspoken world of developers, that’s worth far more than a laminated “Winner” certificate signed by three people who couldn’t tell JSON from XML.
