LLM Routing and Load Balancing: Optimizing Cost and Performance Across Model Fleets

Introduction: LLM routing and load balancing are critical for building cost-effective, reliable AI systems at scale. Not every query needs GPT-4—many can be handled by smaller, faster, cheaper models with equivalent quality. Intelligent routing analyzes incoming requests and directs them to the most appropriate model based on complexity, cost constraints, latency requirements, and current system […]
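As a rough illustration of the idea in this excerpt, here is a minimal complexity-based router in Python. The model names, pricing figures, and the heuristic scorer are all assumptions made for the sketch, not details from the post:

```python
# Minimal sketch of complexity-based LLM routing (illustrative only).
# Model names, prices, thresholds, and the heuristic scorer are assumptions.

from dataclasses import dataclass

@dataclass
class Route:
    model: str
    cost_per_1k_tokens: float  # hypothetical pricing

ROUTES = [
    Route("small-fast-model", 0.0005),   # cheap default
    Route("large-capable-model", 0.03),  # reserved for hard queries
]

def complexity_score(query: str) -> float:
    """Crude heuristic: longer, multi-step queries score higher."""
    words = len(query.split())
    multi_step = sum(query.lower().count(k) for k in ("then", "compare", "analyze"))
    return min(1.0, words / 200 + 0.2 * multi_step)

def route(query: str, threshold: float = 0.5) -> Route:
    """Send easy queries to the cheap model, hard ones to the big one."""
    return ROUTES[1] if complexity_score(query) >= threshold else ROUTES[0]

print(route("What is 2 + 2?").model)  # small-fast-model
print(route("Compare these three architectures, then analyze trade-offs "
            "across cost, latency, and accuracy in detail.").model)
```

Production routers typically replace the heuristic with a trained classifier or embedding-based scorer, but the routing shape stays the same.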

Read more →

Retrieval Evaluation Metrics: Measuring What Matters in Search and RAG Systems

Introduction: Retrieval evaluation is the foundation of building effective RAG systems and search applications. Without proper metrics, you’re flying blind—unable to tell if your retrieval improvements actually help or hurt end-user experience. This guide covers the essential metrics for evaluating retrieval systems: precision and recall at various cutoffs, Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative […]
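For readers who want the definitions up front, here is a small Python sketch of precision@k, recall@k, and MRR over toy data; the example documents and rankings are invented for illustration:

```python
# Standard definitions of precision@k, recall@k, and MRR on toy data.
# Document IDs and rankings below are made up for the example.

def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents found in the top k."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)

def mrr(rankings: list[list[str]], relevants: list[set[str]]) -> float:
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit."""
    total = 0.0
    for ranked, relevant in zip(rankings, relevants):
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)

ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(precision_at_k(ranked, relevant, 2))  # 0.5 (only d1 in the top 2)
print(recall_at_k(ranked, relevant, 4))     # 1.0 (both found by rank 4)
print(mrr([ranked], [relevant]))            # 0.5 (first hit at rank 2)
```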

Read more →

Visual Studio 2015 Update 2 – Download

Today Microsoft has released Update 2 for Visual Studio 2015. Visual Studio 2015 Update 2 includes a variety of capability improvements and bug fixes. To find out what’s new, see the Visual Studio 2015 Update 2 Release Notes. For a list of fixed bugs and known issues, see the Visual Studio 2015 Update 2 MSDN […]

Read more →

Prompt Debugging Techniques: Systematic Approaches to Fixing LLM Failures

Introduction: Prompt debugging is an essential skill for building reliable LLM applications. When prompts fail—producing incorrect outputs, hallucinations, or inconsistent results—systematic debugging techniques help identify and fix the root cause. Unlike traditional software debugging where you can step through code, prompt debugging requires understanding how language models interpret instructions and where they commonly fail. This […]
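One concrete technique in this spirit is component ablation: rerun a fixed test set with individual prompt sections removed to localize which instruction drives a failure. The sketch below is a hedged example; call_llm is a hypothetical stand-in to replace with your provider's SDK, and the prompt sections and test case are invented:

```python
# Hedged sketch of prompt debugging via component ablation: drop each
# prompt section in turn and compare pass rates to localize failures.
# call_llm is a deterministic stub, not a real model call.

PROMPT_SECTIONS = {
    "role": "You are a precise financial assistant.",
    "format": "Answer with a single number, no commentary.",
    "guard": "If the question is ambiguous, say 'UNCLEAR'.",
}

TEST_CASES = [
    ("What is 15% of 200?", "30"),
]

def build_prompt(sections: dict[str, str], question: str) -> str:
    return "\n".join(sections.values()) + "\n\nQuestion: " + question

def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call; swap in your SDK here."""
    return "30" if "15%" in prompt else "UNCLEAR"

def ablate() -> None:
    """Remove one section at a time and report the pass rate per variant."""
    for removed in [None, *PROMPT_SECTIONS]:
        sections = {k: v for k, v in PROMPT_SECTIONS.items() if k != removed}
        passed = sum(
            call_llm(build_prompt(sections, q)).strip() == expected
            for q, expected in TEST_CASES
        )
        print(f"{removed or 'full prompt'}: {passed}/{len(TEST_CASES)} passed")

ablate()
```

A variant whose pass rate matches the full prompt's marks a section that is not implicated in the failure; a variant that improves on it points at the offending instruction.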

Read more →

Batch Inference Optimization: Maximizing Throughput and Minimizing Costs

Introduction: Batch inference optimization is critical for cost-effective LLM deployment at scale. Processing requests individually wastes GPU resources—the model loads weights once but processes only a single sequence. Batching multiple requests together amortizes this overhead, dramatically improving throughput and reducing per-request costs. This guide covers the techniques that make batch inference efficient: dynamic batching strategies, […]
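As a rough illustration of the dynamic batching strategy this excerpt names, the Python sketch below flushes a batch when it fills or when a small time budget expires, so each forward pass serves many requests; run_model is a hypothetical stand-in for a real batched inference call:

```python
# Hedged sketch of a dynamic batching loop. Requests queue up and are
# flushed when the batch fills or a time budget expires, amortizing
# per-pass overhead. run_model is a stand-in, not a real inference API.

import queue
import time

def run_model(batch: list[str]) -> list[str]:
    """Stand-in for one batched forward pass over all prompts."""
    return [f"response to: {p}" for p in batch]

def dynamic_batcher(requests: "queue.Queue[str]",
                    max_batch: int = 8,
                    max_wait_s: float = 0.05) -> None:
    """Group concurrent requests into batches of up to max_batch."""
    while True:
        batch = [requests.get()]  # block until at least one request arrives
        deadline = time.monotonic() + max_wait_s
        while len(batch) < max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(requests.get(timeout=remaining))
            except queue.Empty:
                break  # budget spent; ship a partial batch
        for response in run_model(batch):
            print(response)
```

The two knobs trade against each other: a larger max_batch raises throughput, while a smaller max_wait_s bounds the latency any single request pays waiting for company.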

Read more →