Thursday, August 6, 2020

Protobuf

Introduction
There are proto2 and proto3 versions, and there are quite a few syntax differences between them.

Protobuf can also be used for gRPC.

Why Not JSON?
Some passages from the book Designing Data Intensive Applications are as follows.
1. Sending byte arrays is a problem; byte arrays are usually sent as Base64.
2. JSON takes up a lot of space; codecs (binary encodings) have been developed for JSON just because of this.
JSON distinguishes strings and numbers, but it doesn't distinguish integers and floating-point numbers, and it doesn't specify a precision. This is a problem when dealing with large numbers; for example, integers greater than 2^53 cannot be exactly represented in an IEEE 754 double-precision floating-point number, so such numbers become inaccurate when parsed in a language that uses floating-point numbers (such as JavaScript). An example of numbers larger than 2^53 occurs on Twitter, which uses a 64-bit number to identify each tweet. The JSON returned by Twitter's API includes tweet IDs twice, once as a JSON number and once as a decimal string, to work around the fact that the numbers are not correctly parsed by JavaScript applications.
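A small Java sketch of the 2^53 limit (my own illustration, not from the book): a 64-bit ID just above 2^53 does not survive a round trip through an IEEE 754 double, which is what a JavaScript JSON parser uses for every number.
public class JsonNumberPrecision {
  public static void main(String[] args) {
    long tweetId = (1L << 53) + 1;        // 9007199254740993, a plausible 64-bit ID
    double asDouble = tweetId;            // what a double-based JSON parser would store
    System.out.println(tweetId);          // 9007199254740993
    System.out.println((long) asDouble);  // 9007199254740992 -- the last digit is gone
  }
}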

JSON and XML have good support for Unicode character strings (i.e., human-readable text), but they don't support binary strings (sequences of bytes without a character encoding). Binary strings are a useful feature, so people get around this limitation by encoding the binary data as text using Base64. The schema is then used to indicate that the value should be interpreted as Base64-encoded. This works, but it's somewhat hacky and increases the data size by 33%.
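A quick Java illustration of that 33% overhead (my own sketch, not from the book): every 3 raw bytes become 4 Base64 characters, so 300 bytes grow to 400 characters.
import java.util.Base64;

public class Base64Overhead {
  public static void main(String[] args) {
    byte[] payload = new byte[300];       // 300 raw bytes (all zeros, content doesn't matter)
    String encoded = Base64.getEncoder().encodeToString(payload);
    System.out.println(payload.length);   // 300
    System.out.println(encoded.length()); // 400 -> roughly 33% larger
  }
}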

JSON is less verbose than XML, but both still use a lot of space compared to binary formats. This observation led to the development of a profusion of binary encodings for JSON (MessagePack, BSON, BJSON, UBJSON, BISON, and Smile, to name a few)
and for XML (WBXML and Fast Infoset, for example). These formats have been adopted in various niches, but none of them are as widely adopted as the textual versions of JSON and XML.
The JSON alternatives are as follows:
1. Protocol Buffers (protobuf)
2. Apache Thrift
3. BSON (Binary JSON)
4. Apache Avro
Apache Thrift and Protocol Buffers
Some passages from the book Designing Data Intensive Applications are as follows.
Apache Thrift and Protocol Buffers (protobuf) are binary encoding libraries that are based on the same principle. Protocol Buffers was originally developed at Google, Thrift was originally developed at Facebook, and both were made open source in 2007–08.
Both Thrift and Protocol Buffers require a schema for any data that is encoded.

Thrift and Protocol Buffers each come with a code generation tool that takes a schema definition like the ones shown here, and produces classes that implement the schema in various programming languages. Your application code can call this generated code to encode or decode records of the schema.
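As a hedged illustration of what calling the generated code looks like (the Person class below is a hypothetical example of what protoc would generate from a small person.proto; it is not something shipped with the library):
// Hypothetical class that protoc would generate from:
//   message Person { string name = 1; int32 id = 2; }
Person person = Person.newBuilder()   // generated builder
    .setName("Ali")                   // setter for field tag 1
    .setId(42)                        // setter for field tag 2
    .build();                         // immutable message instance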
Field tags and schema evolution
Some passages from the book Designing Data Intensive Applications are as follows.
... an encoded record is just the concatenation of its encoded fields. Each field is identified by its tag number and annotated with a datatype (e.g., string or integer). If a field value is not set, it is simply omitted from the encoded record. From this you can see that field tags are critical to the meaning of the encoded data. You can change the
name of a field in the schema, since the encoded data never refers to field names, but you cannot change a field’s tag, since that would make all existing encoded data invalid.

You can add new fields to the schema, provided that you give each field a new tag number. If old code (which doesn’t know about the new tag numbers you added) tries to read data written by new code, including a new field with a tag number it
doesn’t recognize, it can simply ignore that field. The datatype annotation allows the parser to determine how many bytes it needs to skip. This maintains forward compatibility: old code can read records that were written by new code.

What about backward compatibility? As long as each field has a unique tag number, new code can always read old data, because the tag numbers still have the same meaning. The only detail is that if you add a new field, you cannot make it required.
If you were to add a field and make it required, that check would fail if new code read data written by old code, because the old code will not have written the new field that you added. Therefore, to maintain backward compatibility, every field you add after the initial deployment of the schema must be optional or have a default value. 
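To make that rule concrete, a hedged sketch of my own (Person is again the hypothetical generated class, now assumed to have gained a new string email = 3; field after the first release):
// oldBytes: a Person record serialized by code that predates the email field
Person p = Person.parseFrom(oldBytes);
String email = p.getEmail();  // returns the proto3 default "" instead of failing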

Removing a field is just like adding a field, with backward and forward compatibility concerns reversed. That means you can only remove a field that is optional (a required field can never be removed), and you can never use the same tag number
again (because you may still have data written somewhere that includes the old tag number, and that field must be ignored by new code).
In figure form it looks like this. Here you can see that the field name itself is not carried; only the tag number is sent. That is why nothing breaks even if the client and the server use different names for the field.
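A small runnable sketch of this, using the protobuf-java runtime directly (field number 1 and value 150 are just illustrative); the output shows that only the tag number plus a wire type reach the wire, never the field name:
import com.google.protobuf.CodedOutputStream;

public class WireTagDemo {
  public static void main(String[] args) throws Exception {
    byte[] buf = new byte[16];
    CodedOutputStream out = CodedOutputStream.newInstance(buf);
    out.writeInt32(1, 150);   // field tag 1, value 150 -- no field name anywhere
    out.flush();
    for (int i = 0; i < out.getTotalBytesWritten(); i++) {
      System.out.printf("%02x ", buf[i]);  // prints 08 96 01: 0x08 = (tag 1 << 3) | wire type 0
    }
  }
}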


Apache Avro
Moved to the Apache Avro post.

Maven
For the protobuf-java library we do it like this:
<dependencies>
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>3.12.2</version>
  </dependency>
</dependencies>
To generate code you need to use either the protoc command or a plugin.

protoc command
Moved to the protoc komutu post.

Maven plugin
Example - Java output
We do it like this. This is not the only plugin; another plugin is here. The plugin runs with the "mvn generate-sources" command, and it also runs as part of the default lifecycle, i.e. with "mvn compile".
<plugin>
  <groupId>com.github.os72</groupId>
  <artifactId>protoc-jar-maven-plugin</artifactId>
  <version>3.11.4</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>run</goal>
      </goals>
      <configuration>
        <inputDirectories>
          <include>${project.basedir}/src/main/protobuf</include>
        </inputDirectories>
        <outputTargets>
          <outputTarget>
            <type>java</type>
            <addSources>main</addSources>
            <outputDirectory>
              ${project.basedir}/target/generated-sources/protobuf
            </outputDirectory>
          </outputTarget>
        </outputTargets>
      </configuration>
    </execution>
  </executions>
</plugin>
The explanation is as follows.
The Protobuf classes will be generated during the generate-sources phase. The plugin will look for proto files in the src/main/protobuf folder and the generated code will be created in the target/generated-sources/protobuf folder.

To generate the class in the target folder run mvn clean generate-sources
Note:
The following methods of the generated code are important:
SerializeToString (in the Java-generated code the equivalents are writeTo and toByteArray)
parseFrom
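A hedged round-trip sketch with the hypothetical Person class from the earlier example:
// Person is the hypothetical generated class from the earlier sketch
Person person = Person.newBuilder().setName("Ali").setId(42).build();

byte[] bytes = person.toByteArray();       // serialize (Java's counterpart of SerializeToString)
Person decoded = Person.parseFrom(bytes);  // static parse method on the generated class
System.out.println(decoded.getName());     // prints Ali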

Protobuf Examples
Moved to the Protobuf Dosyası post.
