Yazılım Çorbası: Software Architecture - Lambda Architecture

Introduction

Note: See also the post on the Kappa Architecture.

An approach proposed by Nathan Marz in 2011. The explanation is as follows:

In 2011, Nathan Marz proposed an important approach to tackling the limitations of the CAP theorem in his blog, it called the Lambda architecture.

According to the CAP theorem, if I choose Consistency, I lose data while the DB is offline, because writes cannot be accepted. If I choose Availability, I cannot always read the latest data. The Lambda architecture, in turn, describes the problem behind the CAP theorem as follows:

... the use of mutable state in databases and the use of incremental algorithms to update that state. It is the interaction between these problems and the CAP theorem that causes complexity.

Diagrammatically, the architecture looks like this.

This architecture has three parts:

1. Batch Processing, i.e. Hadoop. Hadoop processes the data stored in its own system and writes the results to tables in the DB.

2. Stream Processing. Apache Storm or a similar system processes the data that has arrived since the last batch and writes the results to separate tables in the DB.

3. Interactive Queries. This part is not shown in the figure; it merges the two different sets of tables in the DB and exposes the combined result to the outside world.

1. Batch Layer

Data in the Batch Layer is read-only and append-only. The explanation is as follows:

Data in the master dataset must hold three properties as follows.
- Data is raw
- Data is immutable
- Data is eternally true

The master dataset is the source of truth. Even if you were to lose all your serving layer datasets and speed layer datasets, you could re-construct your application from the master dataset.
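As a minimal sketch, assuming a hypothetical page-view event schema (`PageView` with `user_id`, `url`, `timestamp`), the three properties above can be modelled in Python: a frozen dataclass keeps each raw fact immutable, the timestamp keeps it eternally true, and the store only offers `append`, never update or delete. `MasterDataset` is an illustrative in-memory stand-in, not how Hadoop/HDFS actually stores data.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class PageView:
    """A raw, immutable fact (hypothetical schema).

    The timestamp makes the fact eternally true: "u1 viewed /home at t=100"
    stays true forever, even if u1 never visits /home again.
    """
    user_id: str
    url: str
    timestamp: int  # seconds since epoch


class MasterDataset:
    """Append-only store of raw facts; existing facts are never updated or deleted."""

    def __init__(self) -> None:
        self._facts: list[PageView] = []

    def append(self, fact: PageView) -> None:
        # New data is only ever appended; old facts remain unchanged.
        self._facts.append(fact)

    def __iter__(self) -> Iterator[PageView]:
        # Iterate over a tuple snapshot so callers cannot mutate the stored list.
        return iter(tuple(self._facts))

    def __len__(self) -> int:
        return len(self._facts)
```

Because every view is derived from these facts, losing a view is recoverable: it can be recomputed by iterating over the `MasterDataset` again.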

This layer computes results called "batch views". The explanation is as follows:

The first layer (the batch layer) stores the entire data set and computes batch views. The stored data set is immutable and append-only. New data is continually streamed in and appended to the data set, but old data will always remain unchanged. The batch layer also computes batch views, which are queries or functions on the entire data set. These views can subsequently be queried for low-latency answers to questions of the entire data set. The drawback, however, is that it takes a lot of time to compute these batch views.
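A batch view is simply a function evaluated over the whole master dataset. The sketch below assumes a hypothetical view, "page views per URL", and represents raw events as `(url, timestamp)` pairs; `compute_batch_view` is an illustrative name standing in for what would be a Hadoop job in practice.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical raw event from the master dataset: (url, timestamp).
Event = Tuple[str, int]


def compute_batch_view(master_dataset: Iterable[Event]) -> dict[str, int]:
    """Batch view: page views per URL, computed over the ENTIRE master dataset.

    Rescanning everything is slow for a large dataset, but the resulting
    dictionary can afterwards be queried with low latency.
    """
    return dict(Counter(url for url, _timestamp in master_dataset))
```

The expensive part (the full scan) happens once per batch run; queries then only do a dictionary lookup on the precomputed result.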

Batch Processing can handle the continuously appended data in two ways. The explanation is as follows:

Because our master dataset is continually growing, we must have a strategy for managing our batch views when new data becomes available.
- Re-computation algorithms: throwing away the old batch views and re-computing functions over the entire master dataset.
- Incremental algorithms: updating the views directly when new data arrives.
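The two strategies can be contrasted with a small sketch, assuming the same hypothetical "page views per URL" view where each record is just a URL string. Re-computation rebuilds the view from all data; the incremental version adds only the new records to a copy of the old view. Both give the same answer here because counting is additive; for a metric such as distinct visitors, an incremental version is considerably harder to get right.

```python
from collections import Counter
from typing import Iterable


def recompute_view(master_dataset: Iterable[str]) -> Counter:
    """Re-computation: discard the old view and rebuild it from the full dataset."""
    return Counter(master_dataset)


def incremental_update(view: Counter, new_data: Iterable[str]) -> Counter:
    """Incremental: apply only the newly arrived records to the existing view."""
    updated = Counter(view)  # copy, so the previous view is not mutated in place
    updated.update(new_data)  # Counter.update adds counts rather than replacing them
    return updated
```

Re-computation is simpler and tolerates bugs in earlier views (just rerun it), while the incremental approach is cheaper per run but has to manage mutable state, which is exactly the complexity the earlier quote warns about.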

2. Stream Processing or Speed Layer

The important point here is that once the "batch views" have been recomputed, the data in the stream processing layer is cleared, so the speed layer starts its computation from scratch. As a result, we end up with "stream views" tables. The explanation is as follows:

The data that streams into the batch layer also streams into the speed layer. The difference is that while the batch layer keeps all of the data since the beginning of its time, the speed layer only cares about the data that has arrived since the last set of batch views completed. The speed layer makes up for the high latency in computing batch views by processing queries on the most recent data that the batch views have yet to take into account.
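A minimal sketch of this behaviour, assuming each event is a hypothetical `(url, timestamp)` pair and that the batch job reports the timestamp up to which it has processed data (`batch_cutoff`, an illustrative parameter). Instead of wiping everything, this version drops only the events the new batch views already cover, since events can keep arriving while the batch job is running.

```python
from collections import Counter


class SpeedLayer:
    """Realtime view over the events that the latest batch views do not cover yet."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()
        self._pending: list[tuple[str, int]] = []  # (url, timestamp) not yet in batch views

    def on_event(self, url: str, timestamp: int) -> None:
        # Update the low-latency view immediately as each event arrives.
        self.realtime_view[url] += 1
        self._pending.append((url, timestamp))

    def on_batch_complete(self, batch_cutoff: int) -> None:
        """Discard events the new batch views already include (timestamp <= cutoff)."""
        self._pending = [(u, t) for u, t in self._pending if t > batch_cutoff]
        # Rebuild the realtime view from only the still-uncovered events.
        self.realtime_view = Counter(u for u, _ in self._pending)
```

If no events arrive during the batch run, this reduces to the "start from scratch" behaviour described above.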

3. Serving Layer

This layer could actually be thought of as part of the Batch Layer. Its purpose is to expose the "batch views" tables to the outside world. The explanation is as follows:

The serving layer loads in the batch views and, much like a traditional database, allows for read-only querying on those batch views, providing low-latency responses. As soon as the batch layer has a new set of batch views ready, the serving layer swaps out the now-obsolete set of batch views for the current set.
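The "swap out the obsolete views" step can be sketched as replacing a single reference, so readers see either the old set of views or the new one, never a half-loaded mix. `ServingLayer`, `swap` and `query` are hypothetical names; a real serving database would do this with its own bulk-load mechanism.

```python
import threading
from types import MappingProxyType
from typing import Mapping


class ServingLayer:
    """Read-only, low-latency access to the current set of batch views."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._views: Mapping[str, int] = MappingProxyType({})

    def swap(self, new_views: dict[str, int]) -> None:
        # Copy the new views and wrap them in a read-only proxy first...
        frozen = MappingProxyType(dict(new_views))
        # ...then replace the old set in a single step.
        with self._lock:
            self._views = frozen

    def query(self, key: str) -> int:
        with self._lock:
            views = self._views  # grab the current set; a later swap will not affect it
        return views.get(key, 0)
```

Using `MappingProxyType` keeps the loaded views read-only, matching the read-only querying the quote describes.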

4. Interactive Queries Layer

It merges both the "batch views" and the "stream views" tables and exposes the combined result to the outside world.
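For the hypothetical "page views per URL" view, merging is a sum: the batch view covers everything up to the last batch and the stream view covers what has arrived since. This is a sketch under that assumption; `query_page_views` and `merged_view` are illustrative names, and non-additive metrics would need a different merge function.

```python
from collections import Counter
from typing import Mapping


def query_page_views(url: str,
                     batch_view: Mapping[str, int],
                     stream_view: Mapping[str, int]) -> int:
    """Answer = batch result (data up to the last batch) + stream delta since then."""
    return batch_view.get(url, 0) + stream_view.get(url, 0)


def merged_view(batch_view: Mapping[str, int],
                stream_view: Mapping[str, int]) -> dict[str, int]:
    """Merge the two tables key by key; keys present in only one side are kept."""
    merged = Counter(batch_view)
    merged.update(stream_view)  # adds counts for overlapping keys
    return dict(merged)
```

This only gives correct answers if the two views do not overlap, which is why the speed layer has to drop the data already covered by the latest batch views.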

Where the Lambda Architecture Is Used

It can be used in IoT and Machine Learning projects that work with Big Data.

Disadvantages of the Lambda Architecture

1. Two Separate Codebases

Naturally, this is reflected in the cost. The explanation is as follows:

... the challenge of maintaining two separate sets of code to compute views for the batch layer and the speed layer. Both layers operate on the same set — or, in the case of the speed layer, subset — of data, and the questions asked of both layers are similar. However, because the two layers are built on completely different systems (for example, Hadoop or Snowflake for the batch layer, but Storm or Spark for the speed layer), code maintenance for two separate systems can be complicated.
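The maintenance burden can be illustrated with a small sketch: the same view is written once in batch style and once in per-event style, and every business rule (here a hypothetical URL normalisation step) must be duplicated exactly in both. `batch_count`, `StreamCounter` and `parity_check` are illustrative names; in a real system the two halves would live in different frameworks, such as a Hadoop job and a Storm topology, which is what makes keeping them in sync costly.

```python
from collections import Counter
from typing import Iterable


def batch_count(events: Iterable[str]) -> dict[str, int]:
    """Batch-style code path: one pass over the full dataset."""
    return dict(Counter(e.strip().lower() for e in events))


class StreamCounter:
    """Speed-style code path: updates state one event at a time."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}

    def process(self, event: str) -> None:
        key = event.strip().lower()  # this normalisation must match batch_count exactly
        self.counts[key] = self.counts.get(key, 0) + 1


def parity_check(events: list[str]) -> bool:
    """Both code paths must agree on the same input, or merged results drift apart."""
    stream = StreamCounter()
    for e in events:
        stream.process(e)
    return batch_count(events) == stream.counts
```

If someone changes the normalisation in only one of the two places, `parity_check` starts returning `False`, and the merged query results silently change whenever a new batch view replaces stream data.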


Wednesday, 10 February 2021

