Tuesday, 7 March 2023

Apache Beam - Batch and Streaming Data Processing

Introduction
Apache Beam pipelines can be written in different languages and executed on different Runners.


Gradle
We do it like this:
implementation("org.apache.beam:beam-sdks-java-core:2.45.0")
runtimeOnly("org.apache.beam:beam-runners-direct-java:2.45.0")
Example
We do it like this:
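import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class App {
  public static void main(String[] args) {
    // Create default pipeline options
    PipelineOptions options = PipelineOptionsFactory.create();
    // Create the pipeline
    Pipeline p = Pipeline.create(options);
    // Read the lines of Sample.txt into a PCollection
    PCollection<String> textData = p.apply(TextIO.read().from("Sample.txt"));
    // Write the lines unchanged to output files whose names start with "wordcounts"
    textData.apply(TextIO.write().to("wordcounts"));
    // Run the pipeline and block until it finishes
    p.run().waitUntilFinish();
  }
}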
The output is written to the file wordcounts-00000-of-00001. The explanation is as follows:
1. Create a PipelineOptions object.
2. Create a Pipeline with those options.
3. Add a step that reads the data from Sample.txt and returns it as a PCollection, which is Apache Beam's abstraction of a dataset.
4. Add another step that writes the PCollection from the previous step to an output file whose name starts with wordcounts.
5. Lastly, run the pipeline and wait for it to finish.
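
The output file name wordcounts-00000-of-00001 combines the prefix passed to TextIO.write().to(...) with a shard index and the total number of shards. The following plain-Java sketch reproduces only that naming pattern. It does not call Beam, and it assumes the default zero-padded, five-digit shard template. The class ShardNames and the method shardName are hypothetical names used for illustration.

```java
import java.util.Locale;

// Hypothetical helper, not part of Apache Beam: it only mimics the
// "<prefix>-SSSSS-of-NNNNN" naming pattern seen in the output file name.
class ShardNames {

  // Builds the file name for one shard.
  // shard is the zero-based shard index, total is the number of shards.
  static String shardName(String prefix, int shard, int total) {
    if (total < 1 || shard < 0 || shard >= total) {
      throw new IllegalArgumentException("shard must be in [0, total)");
    }
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    return String.format(Locale.ROOT, "%s-%05d-of-%05d", prefix, shard, total);
  }

  public static void main(String[] args) {
    // A single shard, as in the example above
    System.out.println(shardName("wordcounts", 0, 1));
    // A hypothetical run that produced three shards
    for (int i = 0; i < 3; i++) {
      System.out.println(shardName("wordcounts", i, 3));
    }
  }
}
```

When a runner splits the output into several shards, each shard would get its own file following the same pattern, so the prefix alone is not the complete file name.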


