Technology Engineering – Page 24 – C4: Container, Code, Cloud & Context

Structured Generation Techniques: Getting Reliable JSON from LLMs

Posted on March 10, 2022 by Nithin Mohan TK 15 min read

Introduction: Getting LLMs to output valid JSON, XML, or other structured formats is surprisingly difficult. Models hallucinate extra fields, forget closing brackets, and produce malformed output that breaks downstream systems. Prompt engineering helps but doesn’t guarantee valid output. This guide covers techniques for reliable structured generation: using native JSON mode and structured outputs, constrained decoding […]
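As a rough illustration of the failure modes mentioned above (malformed output, missing fields, hallucinated extra fields), here is a minimal validation sketch using only the standard library. The function name `validate_json_output` and the field names are hypothetical, and this is a post-hoc check, not one of the native JSON mode or constrained decoding techniques the post goes on to cover.

```python
import json
from typing import Optional, Set, Tuple


def validate_json_output(raw: str, required: Set[str]) -> Tuple[Optional[dict], Optional[str]]:
    """Return (data, None) when raw is a JSON object with exactly the required keys,
    otherwise (None, reason)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Typical case: the model forgot a closing bracket or added stray text.
        return None, f"malformed JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    keys = set(data)
    missing = required - keys
    extra = keys - required
    if missing:
        return None, f"missing fields: {sorted(missing)}"
    if extra:
        # Hallucinated fields that downstream code does not expect.
        return None, f"unexpected fields: {sorted(extra)}"
    return data, None
```

In practice the returned reason string could be fed back to the model in a retry prompt, though how many retries are worthwhile depends on the application.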

LLM Caching Strategies: Reducing Costs and Latency with Smart Response Caching

Posted on February 15, 2022 by Nithin Mohan TK 14 min read

Introduction: LLM API calls are expensive and slow. A single GPT-4 request can cost $0.03-0.12 and take 2-10 seconds. When users ask similar questions repeatedly, you’re paying for the same computation over and over. Caching solves this by storing responses and returning them instantly for matching requests. But LLM caching is harder than traditional caching—users […]
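The basic idea of storing responses and returning them for matching requests can be sketched as an exact-match cache. Everything here is an assumption for illustration: `ResponseCache` is a hypothetical class, `call_model` stands in for whatever function actually calls the LLM API, and the whitespace and case normalization is only one possible definition of "matching".

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """In-memory exact-match cache keyed on a normalized prompt (illustrative only)."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model  # hypothetical stand-in for the real API call
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Collapse whitespace and lowercase so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: pay for one model call and remember the response.
        self.misses += 1
        response = self._call_model(prompt)
        self._store[key] = response
        return response
```

An exact-match key only helps when prompts are identical after normalization. Differently worded questions with the same meaning will still miss.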

Embedding Model Selection: Choosing the Right Model for Your RAG System

Posted on January 20, 2022 by Nithin Mohan TK 11 min read

Introduction: Choosing the right embedding model is critical for RAG systems, semantic search, and similarity applications. The wrong choice leads to poor retrieval quality, high costs, or unacceptable latency. OpenAI’s text-embedding-3-small is cheap and fast but may miss nuanced similarities. Cohere’s embed-v3 excels at multilingual content. Open-source models like BGE and E5 offer privacy and […]

Chain-of-Thought Prompting: Unlocking LLM Reasoning with Step-by-Step Thinking

Posted on December 15, 2021 by Nithin Mohan TK 16 min read

Introduction: Chain-of-thought (CoT) prompting can dramatically improve LLM performance on complex reasoning tasks. Instead of asking for a direct answer, you prompt the model to show its reasoning step by step. This simple technique has been reported to raise accuracy on some math benchmarks from roughly 17% to 78%, and similar gains appear across logical reasoning, code generation, and multi-step analysis. […]
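The difference between a direct prompt and a step-by-step prompt can be shown with two small prompt builders. This is a sketch under assumptions: the function names are hypothetical, the "Let's think step by step" trigger is one common phrasing, and the `Answer:` marker is a convention invented here so the final answer can be pulled out of the reasoning text.

```python
from typing import Optional


def direct_prompt(question: str) -> str:
    """Ask for the answer with no visible reasoning."""
    return f"Q: {question}\nA:"


def cot_prompt(question: str) -> str:
    """Ask the model to reason step by step, then state the result on a marked line."""
    return (
        f"Q: {question}\n"
        "A: Let's think step by step. "
        "Finish with a line of the form 'Answer: <value>'."
    )


def extract_final_answer(completion: str, marker: str = "Answer:") -> Optional[str]:
    """Return the text after the last marker, or None if the model never wrote it."""
    idx = completion.rfind(marker)
    if idx == -1:
        return None
    rest = completion[idx + len(marker):].strip()
    return rest.splitlines()[0].strip() if rest else None
```

Parsing the last marked line keeps the reasoning text out of downstream code, at the cost of depending on the model following the output convention.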

Tool Use Patterns: Building LLM Agents That Can Take Action

Posted on November 10, 2021 by Nithin Mohan TK 15 min read

Introduction: Tool use transforms LLMs from text generators into capable agents that can search the web, query databases, execute code, and interact with APIs. But implementing tool use well is tricky: models hallucinate tool calls, pass invalid arguments, and struggle with multi-step tool chains. The difference between a demo and a production system lies in robust tool […]
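One way to guard against hallucinated tool calls and invalid arguments is to validate every call against a registry before executing it. The sketch below assumes the model emits a JSON object with `name` and `arguments` keys. That shape, the `TOOLS` registry, and the `add` tool are all hypothetical and not tied to any particular vendor's API.

```python
import inspect
import json
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so the dispatcher can find it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


def dispatch(call_json: str) -> Dict[str, Any]:
    """Validate a model-produced tool call and run it only if it checks out."""
    try:
        call = json.loads(call_json)
    except json.JSONDecodeError:
        return {"ok": False, "error": "tool call is not valid JSON"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "tool call must be a JSON object"}
    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        # Hallucinated tool: refuse instead of guessing.
        return {"ok": False, "error": f"unknown tool: {name}"}
    if not isinstance(args, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}
    fn = TOOLS[name]
    try:
        # bind() raises TypeError on missing or unexpected arguments.
        inspect.signature(fn).bind(**args)
    except TypeError as exc:
        return {"ok": False, "error": f"invalid arguments: {exc}"}
    return {"ok": True, "result": fn(**args)}
```

Returning a structured error rather than raising lets the agent loop pass the message back to the model for another attempt. Note that `bind()` checks argument names and counts only, not types.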

Retrieval Augmented Generation Patterns: Building RAG Systems That Actually Work

Posted on October 5, 2021 by Nithin Mohan TK 14 min read

Introduction: Retrieval Augmented Generation (RAG) grounds LLM responses in your actual data, reducing hallucinations and giving the model access to knowledge that wasn’t in its training set. But naive RAG (embed documents, retrieve top-k, stuff into the prompt) often disappoints. Retrieval misses relevant documents, context windows overflow, and the model ignores important information buried in long contexts. This guide covers advanced RAG […]
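The naive pipeline described above (embed, retrieve top-k, stuff into the prompt) can be sketched in a few lines. To stay self-contained this uses a toy bag-of-words vector in place of a real embedding model, so it illustrates the control flow only. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Rank documents by similarity to the query and keep the best k."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    """Stuff the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve_top_k(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Even in this toy form the weaknesses are visible: a relevant document that shares no words with the query scores zero, and nothing limits how much text ends up in the prompt.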
