Tuesday, May 30, 2023

Apache Storm - The First Stream Processing System

Introduction
Instead of the Batch Processing model that Hadoop works with, Storm introduced the Stream Processing model. The explanation is as follows:
For many experienced engineers, Apache Storm may be the first stream processing system they have ever used. Apache Storm is a distributed stream computing engine written in Clojure, a niche JVM-based language many junior developers may not know. Storm was open-sourced by Twitter in 2011.

In the big-data era dominated by Hadoop, the emergence of Storm was an eye-opener for many developers working on data processing. Traditionally, users processed data by first importing a large amount of data into HDFS and then analyzing it with batch computing engines such as Hadoop. With Storm, data could be processed on the fly, immediately after it flowed into the system, so processing latency dropped drastically: Storm users could receive the latest results within a few seconds instead of waiting hours or even days.
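To make the batch versus stream contrast concrete, here is a minimal Python sketch (not Storm code) that counts events per key in two ways: a batch job that computes once after the whole dataset has been collected, and a streaming job that updates its result as each event arrives. The event list and the function names `batch_count` and `stream_count` are hypothetical illustrations.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_count(events: List[str]) -> Counter:
    """Batch style: wait until all events are stored, then compute once."""
    return Counter(events)


def stream_count(events: Iterable[str]) -> Iterator[Counter]:
    """Streaming style: update the result per event and emit it right away."""
    counts: Counter = Counter()
    for event in events:
        counts[event] += 1
        # A snapshot is yielded after every event, so a consumer sees
        # up-to-date results without waiting for the input to end.
        yield Counter(counts)


events = ["click", "view", "click", "buy"]
print(batch_count(events))        # available only after all events are in
for snapshot in stream_count(events):
    print(snapshot)               # available after each individual event
```

Both approaches end with the same totals; the difference is when a result first becomes visible, which is exactly the latency gap described above.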
Before Apache Storm
The first academic paper on stream processing appeared in 2002, and several commercial products followed. The explanation is as follows:
Just a few years after being studied in academia, stream processing technology was adopted by large enterprises. The top three database vendors, Oracle, IBM, and Microsoft, successively launched their stream processing solutions: Oracle CQL, IBM System S, and Microsoft SQL Server StreamInsight. Interestingly, instead of developing a standalone stream processing system, each of these vendors chose to integrate stream processing functionality into its existing systems.

Shortcomings of Apache Storm
Its most important shortcoming was the lack of a SQL interface. The explanation is as follows:
Apache Storm was groundbreaking in its time. However, its initial design was far from perfect. It lacked many basic features that modern stream processing systems are expected to provide by default: state management, exactly-once semantics, dynamic scaling, a SQL interface, and so on. Still, it inspired many talented engineers to build next-generation stream processing systems. Just a few years after Storm emerged, many new stream processing systems appeared: Apache Spark Streaming, Apache Flink, Apache Samza, Apache Heron, Apache S4, and many more. Spark Streaming and Flink eventually stood out and became legends in the stream processing field.
The Relationship Between Apache Kafka and Apache Storm
Kafka is only the message broker; Storm is the part that processes the messages. The flow is as follows:
Realtime application -> Kafka -> Storm -> NoSQL -> d3js
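Storm models the processing step of this flow as a topology: spouts read tuples from a source (here, a Kafka topic) and bolts transform them and write the results onward. The following Python sketch only imitates that shape with standard-library pieces; it does not use Storm's or Kafka's APIs. A `queue.Queue` stands in for the Kafka topic, a dict stands in for the NoSQL store, the d3js visualization step is left out, and the names `spout`, `word_count_bolt`, `sink_bolt` and `store` are hypothetical.

```python
import queue
from collections import defaultdict
from typing import Dict, Iterator, Tuple

# Hypothetical stand-in for a Kafka topic the real-time application writes to.
topic: queue.Queue = queue.Queue()

# Hypothetical stand-in for the NoSQL store a d3js dashboard would read from.
store: Dict[str, int] = defaultdict(int)


def spout(source: queue.Queue) -> Iterator[str]:
    """Emit messages from the source until a None sentinel arrives."""
    while True:
        message = source.get()
        if message is None:
            return
        yield message


def word_count_bolt(messages: Iterator[str]) -> Iterator[Tuple[str, int]]:
    """Split each message into lowercase words and emit (word, 1) pairs."""
    for message in messages:
        for word in message.split():
            yield word.lower(), 1


def sink_bolt(pairs: Iterator[Tuple[str, int]], db: Dict[str, int]) -> None:
    """Update running counts in the store as each pair arrives."""
    for word, n in pairs:
        db[word] += n


# The real-time application publishes messages, then a sentinel ends the demo.
for line in ["Storm processes streams", "Kafka stores streams"]:
    topic.put(line)
topic.put(None)

sink_bolt(word_count_bolt(spout(topic)), store)
print(dict(store))
```

In a real Storm cluster, spouts and bolts run as parallel tasks spread over many machines; the single-threaded generator chain here only illustrates how data moves from the broker, through processing, into storage.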
