Nithin Mohan TK

LLM Testing Strategies: Unit Tests, Evaluation Metrics, and Regression Testing

Posted on May 10, 2022 by Nithin Mohan TK (18 min read)

Introduction: Testing LLM applications is fundamentally different from testing traditional software. Outputs are non-deterministic, quality is subjective, and edge cases are infinite. You can’t simply assert that the output equals an expected value; you need to evaluate whether outputs are good enough across multiple dimensions. Yet many teams skip testing entirely or rely solely on manual spot-checking. This guide […]

Agent Memory Patterns: Building Persistent Context for AI Agents

Posted on April 15, 2022 by Nithin Mohan TK (19 min read)

Introduction: Memory is what transforms a stateless LLM into a persistent, context-aware agent. Without memory, every interaction starts from scratch—the agent forgets previous conversations, learned preferences, and accumulated knowledge. But implementing memory for agents is more complex than simply storing chat history. You need short-term memory for the current task, long-term memory for persistent knowledge, […]

Structured Generation Techniques: Getting Reliable JSON from LLMs

Posted on March 10, 2022 by Nithin Mohan TK (15 min read)

Introduction: Getting LLMs to output valid JSON, XML, or other structured formats is surprisingly difficult. Models hallucinate extra fields, forget closing brackets, and produce malformed output that breaks downstream systems. Prompt engineering helps but doesn’t guarantee valid output. This guide covers techniques for reliable structured generation: using native JSON mode and structured outputs, constrained decoding […]

LLM Caching Strategies: Reducing Costs and Latency with Smart Response Caching

Posted on February 15, 2022 by Nithin Mohan TK (14 min read)

Introduction: LLM API calls are expensive and slow. A single GPT-4 request can cost $0.03-0.12 and take 2-10 seconds. When users ask similar questions repeatedly, you’re paying for the same computation over and over. Caching solves this by storing responses and returning them instantly for matching requests. But LLM caching is harder than traditional caching—users […]

Embedding Model Selection: Choosing the Right Model for Your RAG System

Posted on January 20, 2022 by Nithin Mohan TK (11 min read)

Introduction: Choosing the right embedding model is critical for RAG systems, semantic search, and similarity applications. The wrong choice leads to poor retrieval quality, high costs, or unacceptable latency. OpenAI’s text-embedding-3-small is cheap and fast but may miss nuanced similarities. Cohere’s embed-v3 excels at multilingual content. Open-source models like BGE and E5 offer privacy and […]

Supercharge Your Cloud Infrastructure with Amazon CDK v2: Python Power and Seamless Migration from CDK v1!

Posted on December 25, 2021 by Nithin Mohan TK (6 min read)

Imagine how efficient your cloud operations could be if you could define your cloud infrastructure in the programming languages you already know. The AWS Cloud Development Kit (CDK) makes this possible. Developers can use high-level components to define their infrastructure in code, which simplifies the process and gives them more control. This blog will delve into the […]
