Yazılım Çorbası: Apache Beam - Batch and Streaming Data Processing

Tuesday, 7 March 2023

Apache Beam - Batch and Streaming Data Processing

Introduction

The overall picture is shown in the figure: Apache Beam pipelines can be written in different languages and executed on different Runners.
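As a rough illustration of the idea that one pipeline description can be handed to different execution engines, here is a toy model in plain Java. `ToyRunner`, `SequentialRunner` and `ParallelRunner` are hypothetical names invented for this sketch and are not Beam classes. In real Beam the runner is chosen through PipelineOptions, not by calling classes like these.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class RunnerSketch {
    // Hypothetical "runner" contract (not a Beam interface): it receives a
    // pipeline description (input + steps) and decides how to execute it.
    public interface ToyRunner {
        List<String> run(List<String> input, List<Function<String, String>> steps);
    }

    // Applies every step, in order, to a single element.
    static String applyAll(String element, List<Function<String, String>> steps) {
        String value = element;
        for (Function<String, String> step : steps) {
            value = step.apply(value);
        }
        return value;
    }

    // Processes elements one after another on the calling thread.
    public static class SequentialRunner implements ToyRunner {
        @Override
        public List<String> run(List<String> input, List<Function<String, String>> steps) {
            return input.stream().map(e -> applyAll(e, steps)).collect(Collectors.toList());
        }
    }

    // Processes elements on the common fork-join pool; collect() still keeps
    // the encounter order because the source list is ordered.
    public static class ParallelRunner implements ToyRunner {
        @Override
        public List<String> run(List<String> input, List<Function<String, String>> steps) {
            return input.parallelStream().map(e -> applyAll(e, steps)).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        // The same pipeline description...
        List<String> input = List.of("  apache ", "beam  ");
        List<Function<String, String>> steps = List.of(String::trim, String::toUpperCase);
        // ...executed by two different toy runners.
        List<ToyRunner> runners = List.of(new SequentialRunner(), new ParallelRunner());
        for (ToyRunner runner : runners) {
            // prints "SequentialRunner -> [APACHE, BEAM]" and "ParallelRunner -> [APACHE, BEAM]"
            System.out.println(runner.getClass().getSimpleName() + " -> " + runner.run(input, steps));
        }
    }
}
```

The point of the split is that the description (input and steps) says nothing about threads or machines, so the execution strategy can be swapped without touching it.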

Gradle

implementation("org.apache.beam:beam-sdks-java-core:2.45.0")
runtimeOnly("org.apache.beam:beam-runners-direct-java:2.45.0")

Example

This is how we do it:

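import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class App {
  public static void main(String[] args) {
    // Create default pipeline options
    PipelineOptions options = PipelineOptionsFactory.create();
    // Create the pipeline
    Pipeline p = Pipeline.create(options);
    // Read Sample.txt; each line becomes one String element of the PCollection
    PCollection<String> textData = p.apply(TextIO.read().from("Sample.txt"));
    // Write the lines unchanged to output files whose names start with "wordcounts"
    // (no word counting happens here; "wordcounts" is only the file name prefix)
    textData.apply(TextIO.write().to("wordcounts"));
    // Run the pipeline and block until it completes
    p.run().waitUntilFinish();
  }
}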

The output is written to the file wordcounts-00000-of-00001. The steps are explained as follows:

1. Create a PipelineOptions object.
2. Create a Pipeline with these options.
3. Add a step that reads data from Sample.txt to the pipeline; it returns a PCollection, Apache Beam's abstraction of a dataset.
4. Add another step that writes the PCollection returned by the previous step to output files whose names start with wordcounts.
5. Finally, run the pipeline and wait until it finishes.
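The output file name wordcounts-00000-of-00001 follows a sharding pattern: the prefix passed to TextIO.write().to(...), then the zero-padded shard index, then the total number of shards. The sketch below reproduces that pattern in plain Java, assuming Beam's default shard name template (-SSSSS-of-NNNNN). `shardFileName` is a hypothetical helper for illustration, not part of Beam.

```java
public class ShardNameSketch {
    // Hypothetical helper mirroring the assumed default "-SSSSS-of-NNNNN" template:
    // SSSSS = zero-based shard index, NNNNN = total number of shards, both padded to 5 digits.
    public static String shardFileName(String prefix, int shardIndex, int numShards) {
        if (numShards <= 0 || shardIndex < 0 || shardIndex >= numShards) {
            throw new IllegalArgumentException(
                    "shardIndex must be in [0, numShards) and numShards must be positive");
        }
        return String.format("%s-%05d-of-%05d", prefix, shardIndex, numShards);
    }

    public static void main(String[] args) {
        // Single shard, matching the file name in the example
        System.out.println(shardFileName("wordcounts", 0, 1)); // wordcounts-00000-of-00001
        // Three shards
        for (int i = 0; i < 3; i++) {
            System.out.println(shardFileName("wordcounts", i, 3));
        }
    }
}
```

Because a runner may split the output into several shards, a larger input could produce files such as wordcounts-00000-of-00003 through wordcounts-00002-of-00003. If a single file without the shard suffix is wanted, TextIO.Write offers withoutSharding(), which forces all output through one shard.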
