Technology Engineering – Page 8 – C4: Container, Code, Cloud & Context

The Rise of GitOps: Automating Deployment and Improving Reliability

Posted on February 16, 2025 by Nithin Mohan TK 11 min read

GitOps is a relatively new approach to software delivery that has gained popularity in recent years. It is a set of practices for managing and deploying infrastructure and applications using Git as the single source of truth. In this blog post, we will explore the concept of GitOps, its key benefits, and some examples […]

Batch Inference Optimization: Maximizing Throughput and Minimizing Costs

Posted on February 8, 2025 by Nithin Mohan TK 18 min read

Introduction: Batch inference optimization is critical for cost-effective LLM deployment at scale. Processing requests individually wastes GPU resources—the model loads weights once but processes only a single sequence. Batching multiple requests together amortizes this overhead, dramatically improving throughput and reducing per-request costs. This guide covers the techniques that make batch inference efficient: dynamic batching strategies, […]

GitOps with a comparison between Flux and ArgoCD and which one is better for use in Azure AKS

Posted on February 6, 2025 by Nithin Mohan TK 4 min read

GitOps has emerged as a powerful paradigm for managing Kubernetes clusters and deploying applications. Two popular tools for implementing GitOps in Kubernetes are Flux and ArgoCD. Both offer similar functionality, but they differ in architecture, ease of use, and integration with cloud platforms such as Azure AKS. In this blog, we will […]

LLM Monitoring and Alerting: Building Observability for Production AI Systems

Posted on February 3, 2025 by Nithin Mohan TK 20 min read

Introduction: LLM monitoring is essential for maintaining reliable, cost-effective AI applications in production. Unlike traditional software where errors are obvious, LLM failures can be subtle—degraded output quality, increased hallucinations, or slowly rising costs that go unnoticed until the monthly bill arrives. Effective monitoring tracks latency, token usage, error rates, output quality, and cost metrics in […]

Structured Output from LLMs: JSON Mode, Function Calling, and Pydantic Patterns (Part 1 of 2)

Posted on February 2, 2025 by Nithin Mohan TK 12 min read

Introduction: Getting reliable, structured data from LLMs is one of the most practical challenges in building AI applications. Whether you’re extracting entities from text, generating API parameters, or building data pipelines, you need JSON that actually parses and validates against your schema. This guide covers the evolution of structured output techniques—from prompt engineering hacks to […]

Context Compression Techniques: Fitting More Information into Limited Token Budgets

Posted on January 28, 2025 by Nithin Mohan TK 3 min read

Introduction: Context window limits are one of the most frustrating constraints when building LLM applications. You have a 100-page document but only 8K tokens of context. You want to include conversation history but it’s eating into your prompt budget. Context compression techniques solve this by reducing the token count while preserving the information that matters. […]
