Introduction: Ollama has revolutionized how developers run large language models locally. With a simple command-line interface and seamless hardware acceleration, you can have Llama 3.2, Mistral, or CodeLlama running on your laptop in minutes—no cloud API keys, no usage costs, complete privacy. Built on llama.cpp, Ollama abstracts away the complexity of model quantization, memory management, […]
Author: Nithin Mohan TK
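Below is a minimal sketch of the workflow this post describes, using the `ollama` Python client; the model tag `llama3.2` and the prompt are placeholders, and it assumes the Ollama server is running locally with the model already pulled:

```python
# Minimal sketch: chat with a locally served model via the ollama
# Python client (pip install ollama). Assumes `ollama serve` is running
# and the model was fetched beforehand with `ollama pull llama3.2`.
import ollama

response = ollama.chat(
    model="llama3.2",  # any locally pulled model tag works here
    messages=[{"role": "user", "content": "Explain model quantization in one sentence."}],
)
print(response["message"]["content"])
```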
LLM Output Parsing: From Raw Text to Typed Objects
Introduction: LLMs generate text, but applications need structured data. Parsing LLM output reliably is surprisingly tricky—models don’t always follow instructions, JSON can be malformed, and edge cases abound. This guide covers robust output parsing strategies: using JSON mode for guaranteed valid JSON, Pydantic for type-safe parsing, handling partial and streaming outputs, implementing retry logic for […]
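As a taste of the patterns the guide covers, here is a hedged sketch of Pydantic-based type-safe parsing with simple retry-on-validation-error; `call_llm` and the `Ticket` schema are hypothetical stand-ins, not a specific library's API:

```python
# Hedged sketch: validate an LLM's JSON output against a Pydantic schema
# and retry with error feedback when the response is malformed.
from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    title: str
    priority: int

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM client call.
    return '{"title": "GPU quota exceeded", "priority": 2}'

def parse_with_retry(prompt: str, retries: int = 3) -> Ticket:
    last_error: ValidationError | None = None
    for _ in range(retries):
        raw = call_llm(prompt)
        try:
            return Ticket.model_validate_json(raw)
        except ValidationError as err:
            last_error = err
            # Feed the validation error back so the model can self-correct.
            prompt = f"{prompt}\nPrevious output was invalid: {err}. Return valid JSON only."
    raise last_error

print(parse_with_retry("Extract the ticket as JSON."))
```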
Cost Optimization for AI Workloads: Tracking and Reducing LLM Costs
Last quarter, our LLM costs hit $12,000. In a single month. We had no idea where the money was going. No tracking, no budgets, no alerts. That’s when I realized: cost optimization isn’t optional for AI workloads—it’s survival. Here’s how we cut costs by 65% without sacrificing quality.
Figure 1: Cost Optimization Architecture
The $12,000 […]
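A minimal sketch of the kind of per-request tracking that makes those numbers visible; the model name and per-token rates below are made-up placeholders, not real pricing:

```python
# Illustrative sketch of per-request LLM cost tracking by feature.
from collections import defaultdict

# Placeholder per-1K-token rates; substitute your provider's real pricing.
PRICE_PER_1K = {"example-model": {"input": 0.003, "output": 0.006}}

spend_by_feature = defaultdict(float)

def record_usage(feature: str, model: str, input_tokens: int, output_tokens: int) -> float:
    rates = PRICE_PER_1K[model]
    cost = input_tokens / 1000 * rates["input"] + output_tokens / 1000 * rates["output"]
    spend_by_feature[feature] += cost  # attribute spend per feature for budgets and alerts
    return cost

record_usage("support-bot", "example-model", input_tokens=1200, output_tokens=300)
print(dict(spend_by_feature))  # {'support-bot': 0.0054}
```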
Conversation State Management: Context Tracking, Slot Filling, and Dialog Flow
Introduction: Conversational AI applications need to track state across turns—remembering what users said, what information has been collected, and where they are in multi-step workflows. Unlike simple Q&A, task-oriented conversations require slot filling, context tracking, and flow control. This guide covers practical state management patterns: conversation context objects, slot-based information extraction, finite state machines for […]
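A minimal sketch of one pattern the guide covers, slot-based state with a tiny step machine; the booking slots are a hypothetical example:

```python
# Minimal sketch of slot filling for a task-oriented dialog (a booking
# flow used as a hypothetical example). State advances once all
# required slots are filled.
from dataclasses import dataclass, field

REQUIRED_SLOTS = ("city", "date", "guests")

@dataclass
class ConversationState:
    slots: dict = field(default_factory=dict)
    step: str = "collecting"  # collecting -> confirming -> done

    def fill(self, slot: str, value: str) -> None:
        self.slots[slot] = value
        if all(s in self.slots for s in REQUIRED_SLOTS):
            self.step = "confirming"

    def next_prompt(self) -> str:
        if self.step == "collecting":
            missing = next(s for s in REQUIRED_SLOTS if s not in self.slots)
            return f"What {missing} would you like?"
        return f"Confirm booking: {self.slots}?"

state = ConversationState()
state.fill("city", "Lisbon")
print(state.next_prompt())  # asks for the next missing slot ("date")
```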
Cloud-Native Machine Learning: Building Scalable Models for Production
The journey from experimental machine learning models to production-grade systems represents one of the most challenging transitions in modern software engineering. After spending two decades building distributed systems and watching countless ML projects struggle to move beyond proof-of-concept, I’ve developed a deep appreciation for cloud-native approaches that treat machine learning infrastructure with the same rigor […]
GPU Resource Management in Cloud: Optimizing AI Workloads
GPU resource management is critical for cost-effective AI workloads. After managing GPU resources for 40+ AI projects, I’ve learned what works. Here’s the complete guide to optimizing GPU resources in the cloud.
Figure 1: GPU Resource Management Architecture
Why GPU Resource Management Matters
GPU resources are expensive and limited:
Cost: GPUs are the most expensive […]
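As a starting point for that kind of optimization, here is a minimal sketch that snapshots GPU memory and utilization with NVML via the `nvidia-ml-py` bindings; it assumes an NVIDIA GPU and driver are present:

```python
# Minimal sketch: snapshot GPU memory and utilization with NVML
# (pip install nvidia-ml-py). This is the raw telemetry behind any
# packing or cost decision; assumes an NVIDIA GPU and driver.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        print(f"GPU {i}: {util.gpu}% busy, "
              f"{mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB used")
finally:
    pynvml.nvmlShutdown()
```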