Implement semantic search using text embeddings for more relevant results than keyword matching.
Tag: ETL
Tips and Tricks – Use dbt for Maintainable Data Transformations
Build modular, tested, documented data transformations with dbt.
Tips and Tricks – Partition Large Tables for Query Performance
Use table partitioning to dramatically speed up queries on large datasets.
Spark Isn’t Magic: What Twenty Years of Data Engineering Taught Me About Distributed Processing
Every few years, a technology emerges that fundamentally changes how we think about data processing. MapReduce did it in 2004. Apache Spark did it in 2014. And after spending two decades building data pipelines across enterprises of every size, I’ve learned that the difference between a successful Spark implementation and a failed one rarely comes…
Tips and Tricks – Use Span for Zero-Allocation String Parsing
Eliminate heap allocations when parsing strings by using Span.
Building the Modern Data Stack: How Spark, Kafka, and dbt Transformed Data Engineering
The data engineering landscape has undergone a fundamental transformation over the past decade. What once required massive Hadoop clusters and specialized MapReduce expertise has evolved into a sophisticated ecosystem of purpose-built tools that work together seamlessly. Having architected data platforms across multiple industries, I’ve witnessed this evolution firsthand and can attest that understanding how these…
Azure Data Factory: A Solutions Architect’s Guide to Enterprise Data Integration
Enterprise data integration has evolved from simple ETL batch jobs to sophisticated orchestration platforms that handle diverse data sources, complex transformations, and real-time processing requirements. Azure Data Factory represents Microsoft’s cloud-native answer to these challenges, providing a fully managed data integration service that scales from simple copy operations to enterprise-grade data pipelines. Having designed and…
Tips and Tricks – Apply Strangler Fig Pattern for Legacy Migration
Gradually replace legacy systems by routing traffic to new implementations incrementally.
Tips and Tricks – Implement Domain Events for Loose Coupling
Use domain events to decouple components and enable reactive architectures.