Lambda Architecture

2016. 5. 18. 14:41

A repository dedicated to the Lambda Architecture (LA). We collect and publish examples and good practices around the LA.

Updates

27 Aug 2014 » A RAD Stack: Kafka, Storm, Hadoop, and Druid by Druid Committers
24 Jul 2014 » Deploop: A Lambda Architecture Provisioning Tool by Javi Roman
01 Jul 2014 » Nathan Marz's Big Data book by Michael Hausenblas
30 Jun 2014 » Speed Components by Michael Hausenblas
30 Jun 2014 » Serving Components by Michael Hausenblas
30 Jun 2014 » Batch Components by Michael Hausenblas
22 Jun 2014 » Buildoop: A Lambda Architecture ecosystem builder by Javi Roman
20 Jan 2014 » Lambda Architecture: A state-of-the-art by Pere Ferrera
19 Jan 2014 » An example Lambda Architecture for real-time analysis of hashtags using Trident, Hadoop and Splout SQL by Pere Ferrera
25 Dec 2013 » Twitter Summingbird by Michael Hausenblas
25 Dec 2013 » Lambdoop by Michael Hausenblas
25 Dec 2013 » Issues in Combined Static and Dynamic Data Management by Michael Hausenblas
24 Dec 2013 » Where Polyglot Persistence meets the Lambda Architecture by Michael Hausenblas
11 Dec 2013 » A real-time architecture using Hadoop and Storm by Nathan Bijnens
10 Dec 2013 » Why are we doing this and why are we doing this now? by Michael Hausenblas

What is the Lambda Architecture?

Nathan Marz came up with the term Lambda Architecture (LA) for a generic, scalable and fault-tolerant data processing architecture, based on his experience working on distributed data processing systems at BackType and Twitter.

The LA aims to satisfy the need for a robust system that tolerates both hardware failures and human mistakes, serves a wide range of workloads and use cases, and supports low-latency reads and updates. The resulting system should be linearly scalable, and it should scale out rather than up.

Here is what it looks like from a high-level perspective:

[Figure: LA overview]

All data entering the system is dispatched to both the batch layer and the speed layer for processing.
The batch layer has two functions: (i) managing the master dataset (an immutable, append-only set of raw data), and (ii) pre-computing the batch views.
The serving layer indexes the batch views so that they can be queried in a low-latency, ad-hoc way.
The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.
Any incoming query can be answered by merging results from batch views and real-time views.
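
The five points above can be illustrated with a minimal in-memory sketch in Python. This is only a toy model under stated assumptions, not how any real LA deployment is built: the class `LambdaSketch`, its methods, and the hashtag-counting workload are all hypothetical names chosen for illustration, and a production system would use distributed components for each layer.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LambdaSketch:
    """Toy single-process model of the three layers (all names hypothetical)."""

    # Batch layer: immutable, append-only master dataset of raw events.
    master_dataset: list = field(default_factory=list)
    # Serving layer: batch view, here a plain key -> count index.
    batch_view: Counter = field(default_factory=Counter)
    # Speed layer: real-time view covering only data not yet in the batch view.
    realtime_view: Counter = field(default_factory=Counter)

    def ingest(self, key: str) -> None:
        # All incoming data is dispatched to both the batch and the speed layer.
        self.master_dataset.append(key)
        self.realtime_view[key] += 1

    def run_batch(self) -> None:
        # Recompute the batch view from the entire master dataset.
        self.batch_view = Counter(self.master_dataset)
        # Assumption: nothing arrives during the batch run, so the whole
        # real-time view is now covered by the batch view and can be dropped.
        self.realtime_view = Counter()

    def query(self, key: str) -> int:
        # Answer a query by merging the batch view and the real-time view.
        return self.batch_view[key] + self.realtime_view[key]


la = LambdaSketch()
for tag in ["#storm", "#hadoop", "#storm"]:
    la.ingest(tag)

la.run_batch()         # batch view now holds the counts for all three events
la.ingest("#storm")    # recent event, visible only through the speed layer

print(la.query("#storm"))   # 2 from the batch view + 1 from the real-time view
```

Because the batch view is always recomputed from the raw master dataset, a bug in the view logic can be fixed and the view rebuilt, which is one way to read the "tolerant of human mistakes" goal stated earlier.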

Resources

Big Data, book by Nathan Marz and James Warren
Applying the Big Data Lambda Architecture, Dr. Dobb’s article by Michael Hausenblas
The Lambda architecture: principles for architecting realtime Big Data systems, blog post by James Kinley
Lambda Architecture: Achieving Velocity and Volume with Big Data, article by Christian Prokopp
Lambda Architecture with Apache Spark by Michael Hausenblas

Who is behind this?

See the about us section for details.
