Technology Engineering – Page 32 – C4: Container, Code, Cloud & Context

LLM Evaluation Metrics: Measuring Quality in Non-Deterministic Systems

Posted on February 1, 2018 by Nithin Mohan TK (18 min read)

Introduction: Evaluating LLM outputs is fundamentally different from evaluating traditional ML models. You can’t just compute accuracy when there’s no single correct answer, and human evaluation doesn’t scale. This guide covers the full spectrum of LLM evaluation: automated metrics like BLEU, ROUGE, and BERTScore for measuring similarity; semantic metrics that capture meaning beyond surface-level matching; LLM-as-judge […]


Vector Database Optimization: Scaling Semantic Search to Millions of Embeddings

Posted on January 1, 2018 by Nithin Mohan TK (18 min read)

Introduction: Vector databases are the backbone of modern AI applications, powering semantic search, RAG systems, and recommendation engines. But as your vector collection grows from thousands to millions of embeddings, naive approaches break down: query latency spikes, memory costs explode, and recall degrades. This guide covers practical optimization strategies: choosing the right index type for […]


RAG Patterns: Advanced Retrieval Augmented Generation Strategies

Posted on December 1, 2017 by Nithin Mohan TK (20 min read)

Introduction: Retrieval Augmented Generation (RAG) has become the standard pattern for grounding LLM responses in factual, up-to-date information. But basic RAG (retrieve chunks, stuff them into the prompt, generate) often falls short in production. Queries get misunderstood, irrelevant chunks pollute the context, and answers lack coherence. This guide covers advanced RAG patterns that address these challenges: query transformation to improve […]


Embedding Dimensionality Reduction: Compressing Vectors Without Losing Semantics

Posted on November 1, 2017 by Nithin Mohan TK (17 min read)

Introduction: High-dimensional embeddings from models like OpenAI’s text-embedding-3-large (3072 dimensions) or Cohere’s embed-v3 (1024 dimensions) deliver excellent semantic understanding but come with costs: more storage, slower similarity computations, and higher memory usage. For many applications, you can reduce dimensions significantly while preserving most of the semantic information. This guide covers practical dimensionality reduction techniques: PCA […]


LLM Latency Optimization: Techniques for Sub-Second Response Times

Posted on October 1, 2017 by Nithin Mohan TK (18 min read)

Introduction: LLM latency is the silent killer of user experience. Even the most accurate model becomes frustrating when users wait seconds for each response. The challenge is that LLM inference is inherently slow: generation is autoregressive, so each token depends on all previous tokens and the output must be produced one token at a time. This guide covers practical techniques for reducing perceived and actual latency: streaming responses […]


Agentic Workflow Patterns: Building Autonomous AI Systems That Plan, Act, and Learn

Posted on September 1, 2017 by Nithin Mohan TK (24 min read)

Introduction: Agentic workflows represent a paradigm shift from simple prompt-response patterns to autonomous, goal-directed AI systems. Unlike traditional LLM applications where the model responds once and stops, agentic systems can plan multi-step solutions, execute actions, observe results, and iterate until the goal is achieved. This guide covers the core patterns that make agentic systems work: […]

