Wednesday, March 3, 2021

Apache Kafka Streams API

Introduction
One description is as follows:
Kafka Streams is a client library that uses Kafka topics as sources and sinks for the data processing pipeline.
Another description:
Kafka Streams is a simple Java library that enables streaming application development within the Kafka framework.
A further description:
Kafka Streams is a JVM based library for performing streaming transformations on data sourced from Kafka. It is a very good default choice for data processing backed by Kafka due to its simple deployment model, horizontal scalability, resilience to failures and straightforward well documented public API.

The API lets you read data from Kafka then perform a number of transformations on the data including filtering, mapping, aggregation and joining data from multiple topics. The API supports both stateless and stateful transformations. Stateful transformations back up their state into changelog topics in Kafka and cache values local to the processing nodes, typically using RocksDB.
There are two ways to use the Streams API. They are described as follows:
There are two approaches to writing a Kafka Streams application:
- The high-level DSL,
- And the low-level Processor API.
I have moved the Java examples to the Kafka Streams API post.

Exactly once semantic support
It is described as follows:
Exactly once semantic support was added to Kafka Streams in the 0.11.0 Kafka release. Enabling exactly once is a simple configuration change, setting the streaming configuration parameter processing.guarantee to exactly_once (the default is at_least_once).

A typical workload for a Kafka Streams application is to read data from one or more partitions, perform some transformations on the data, update a state store (such as a count), then write the result to an output topic. When exactly once semantics is enabled, Kafka Streams atomically updates consumer offsets, local state stores, state store changelog topics and production to output topics all together. If any one of these steps fail, all of the changes are rolled back.
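The configuration change mentioned above can be sketched as follows. This is a minimal sketch: the application id and broker address are placeholders, and note that newer Kafka releases also accept the value exactly_once_v2.

```java
import java.util.Properties;

public class ExactlyOnceConfig {
    /** Builds a Kafka Streams configuration with exactly-once processing enabled. */
    public static Properties streamConfig() {
        Properties props = new Properties();
        // Placeholder values; replace with your application's id and brokers
        props.put("application.id", "demo-app");
        props.put("bootstrap.servers", "localhost:9092");
        // The default is "at_least_once"; newer Kafka versions also offer "exactly_once_v2"
        props.put("processing.guarantee", "exactly_once");
        return props;
    }
}
```

These properties are passed to the KafkaStreams constructor when the topology is started.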

Side effects
The Kafka Streams API lets you perform any action when processing input data, for instance you can write the data directly to a database, or fire off an email. These “side effect” operations are explicitly not covered by the exactly once guarantee. If a stream job fails after processing data, but just prior to writing it back to Kafka, it will be reprocessed the next time the Kafka Streams worker is restarted, re-running any side-effect operations such as emailing a customer.

It is best to keep your Kafka Streams transform “pure” with no side effects beyond updating state stores and writing back to Kafka. This way your application will be more resilient (it won’t fail if it can’t contact the email server) and will shorten the length of transactions.

Side effects can then be performed via Kafka Connect Sink connectors or custom consumers.
Enterprise Messaging vs Event Streaming
Enterprise Messaging and Event Streaming differ in the following respects:
1. Message processing style
2. Message consumption style
3. Access to message history
4. Fine-grained subscription to messages
5. Message delivery guarantees
1. Message Processing Style
Enterprise Messaging applications handle each message individually. Streaming applications can process messages as a whole, for example aggregated over a time window.

Example
An example illustrating the difference (a thermostat acting on temperature readings):
For example, if the reading is less than 20, the thermostat turns on the heater. It repeats the same logic the next time a message comes. Even though this is unnecessary, that is how the messaging works.

For example, the thermostat can calculate the average reading over the last minute and decides whether to turn on the heater. The average seems a realistic measure here.
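The windowed-average idea above can be sketched without any Kafka machinery. This is a plain-Java illustration; the class name, threshold, and window size are all illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Decides on the heater from a one-minute average, not from each reading. */
public class ThermostatWindow {
    private static final long WINDOW_MS = 60_000;   // one-minute window
    private static final double THRESHOLD = 20.0;   // heater threshold

    // Each entry: { timestampMillis, reading }
    private final Deque<double[]> readings = new ArrayDeque<>();

    /** Adds a reading, drops entries older than the window, returns the heater decision. */
    public boolean onReading(long timestampMillis, double value) {
        readings.addLast(new double[] { timestampMillis, value });
        // Evict readings that fell out of the one-minute window
        while (!readings.isEmpty() && readings.peekFirst()[0] < timestampMillis - WINDOW_MS) {
            readings.removeFirst();
        }
        double sum = 0;
        for (double[] r : readings) sum += r[1];
        double avg = sum / readings.size();
        return avg < THRESHOLD; // turn the heater on when the windowed average is low
    }
}
```

A messaging-style thermostat would instead test each reading against the threshold on its own; the windowed version reacts to the trend rather than a single sample.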
2. Message consumption style
Enterprise Messaging systems delete messages once they are consumed; Streaming systems retain them.

3. Access to message history
As a consequence of the consumption style, a message history either accumulates or it does not: because streaming systems retain messages, consumers can go back and replay them.

4. Fine-grained subscription to messages
Enterprise Messaging applications can apply filters to messages on the broker side; a Streaming application receives every message in a partition and does the filtering itself.
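Client-side filtering can be sketched in plain Java. The sketch below assumes a hypothetical convention in which records carry a type prefix; real Kafka Streams applications express the same idea with the DSL's filter operation:

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrates a consumer that receives the whole partition and filters locally. */
public class ClientSideFilter {
    /** Keeps only order records; the "order:" prefix is a hypothetical convention. */
    public static List<String> filterOrders(List<String> partitionRecords) {
        return partitionRecords.stream()
                .filter(r -> r.startsWith("order:"))
                .collect(Collectors.toList());
    }
}
```

A broker-side selector would prevent the payment records from ever reaching the client; here they arrive and are discarded by the consumer.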

5. Message delivery guarantees
Some delivery levels are as follows:
- At least once delivery
- Exactly once delivery
- Transactionally coordinated delivery
It is described as follows:
Messaging systems are good at handling exactly once and transactionally coordinated delivery use cases while streaming is good at handling at least once delivery scenarios.
Stream Analytics
What all of these differences really point to is Stream Analytics. It is described as follows:
But, capturing the data streams is only one part of the challenge. Some data processing has to be performed simultaneously with the incoming data to be able to use the results promptly for decision-making. For example, a selection of products in a shopping cart system can be the trigger for a recommendation system to be executed in parallel. This type of requirement creates another building block in the streaming architecture – called Stream Analytics.

Sometimes a single event from the data stream is enough to trigger a predefined business logic. However, it is often necessary to be able to recognize connections between different events in order to run a high-level business process generating real business value. Such a connection can be established between time-shifted similar events by accumulating them over a given period of time. For example, short-term increased demand for a certain product in an online shop system could trigger the start of an additional production line. In other cases, it may be necessary to correlate certain events of different types and merge the data to trigger the corresponding business process. These methods are also known as Windowing and Joining.
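The joining idea can be sketched as a buffered, key-based merge of two event types. The class and event names below are illustrative; in Kafka Streams this buffering is handled for you by windowed state stores behind the DSL's join operations:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Correlates two event types by key: buffers one side, merges when the other arrives. */
public class EventJoiner {
    private final Map<String, String> pendingClicks = new HashMap<>();

    /** Buffers a click event under its key until the matching order shows up. */
    public void onClick(String key, String click) {
        pendingClicks.put(key, click);
    }

    /** Joins an order with the buffered click for the same key, if one exists. */
    public Optional<String> onOrder(String key, String order) {
        String click = pendingClicks.remove(key);
        return click == null ? Optional.empty() : Optional.of(click + "+" + order);
    }
}
```

A production join would also expire buffered events after a window, which is exactly the role of the window in a stream-stream join.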
There are two basic techniques for Stream Analytics: Windowing and Joining.
Kafka Streams and CQRS
It is possible to use Kafka Streams as an Event Source.
