Tuesday, April 25, 2023

OpenTelemetry Collector - Sidecar

Example
sidecar.yaml is as follows
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: sidecar
spec:
  mode: sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
    exporters:
      logging:
      otlp:
        endpoint: "<path_to_central_collector>:4317"
    service:
      telemetry:
        logs:
          level: "debug"
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [logging, otlp]
instrumentation.yaml is as follows
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: java-instrumentation
spec:
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: always_on
  java:
deployment.yaml is as follows
apiVersion: apps/v1
kind: Deployment
metadata:
  name: petclinic
  labels:
    app: petclinic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: petclinic
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-java: 'true'
        sidecar.opentelemetry.io/inject: 'sidecar'
      labels:
        app: petclinic
    spec:
      containers:
      - name: petclinic
        image: <path_to_petclinic_image>
        ports:
        - containerPort: 8080
The explanation is as follows
To enable the instrumentation, we need to update the deployment file and add annotations to it. This way we tell the OpenTelemetry Operator to inject the sidecar and the java-instrumentation into our application.

petclinic-svc.yaml is as follows
apiVersion: v1
kind: Service
metadata:
  name: petclinic-service
spec:
  selector:
    app: petclinic
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
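To deploy all of this, the manifests can be applied with kubectl; a minimal sketch, assuming the files above sit in the current directory:
kubectl apply -f sidecar.yaml
kubectl apply -f instrumentation.yaml
kubectl apply -f deployment.yaml
kubectl apply -f petclinic-svc.yaml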

OpenTelemetry Backend

Introduction
The explanation is as follows
Even though OpenTelemetry does not provide its own backend, by using it we are not tied to any tool or vendor, since it is vendor agnostic. Not only can we use any programming language we want, but we can also pick and choose the storage backend, and easily switch to another backend/vendor by just configuring another exporter.
It can be any of many backends, such as Honeycomb, Lightstep, New Relic, or Tempo (Grafana Cloud).

Jaeger and Zipkin
The explanation is as follows
Jaeger and Zipkin predate OpenTelemetry, so each has its own trace transport format. They do provide integration with the OpenTelemetry format, though.
Jaeger
The explanation is as follows
Jaeger, inspired by Dapper and OpenZipkin, is a distributed tracing platform created by Uber Technologies that can be used for monitoring microservices-based distributed systems.
Example
To run Jaeger, we do the following
docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 5775:5775/udp \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.32
Example - Docker Compose and Jaeger
We do the following
version: "3"

services:
  jaeger:
    image: jaegertracing/all-in-one:1.37           #1
    environment:
      - COLLECTOR_OTLP_ENABLED=true                #2
    ports:
      - "16686:16686"                              #3
The explanation is as follows
1. Use the all-in-one image
2. Very important: enable the collector in OpenTelemetry format
3. Expose the UI port


OpenTelemetry Collector

Deployment
There are two methods
1. Sidecar
2. A Central (Gateway) OpenTelemetry Collector

1. Sidecar
The explanation is as follows
In this scenario, the OpenTelemetry instrumented application sends the data to a (collector) agent that resides together with the application. This agent will then offload responsibility and handle all the trace data from the instrumented application.

The collector can be deployed as an agent via a sidecar which can be configured to send data directly to the storage backend.
The figure is as follows
2. A Central (Gateway) OpenTelemetry Collector
The explanation is as follows
You can also decide to send the data to another OpenTelemetry collector and from the (central) collector send the data further to the storage backend. In this configuration, we have a central OpenTelemetry collector that is deployed using the deployment mode, which comes with many advantages like auto scaling.
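A minimal sketch of what such a central collector might look like — the central_collector.yaml referenced later follows this shape; the name is an assumption, and mode: deployment is what distinguishes it from the sidecar:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: central-collector   # assumed name
spec:
  mode: deployment          # runs as its own Deployment, so it can scale
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    exporters:
      logging:
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [logging]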
Collector Components
The figure is as follows

The OpenTelemetry Collector has three main components (plus optional extensions, covered below). These are
1. Receivers
2. Processors
3. Exporters

OpenTelemetry Protocol
The explanation is as follows
OpenTelemetry Protocol (OTLP) specification describes the encoding, transport, and delivery mechanism of telemetry data between telemetry sources, intermediate nodes such as collectors and telemetry backends. 

Each language SDK provides an OTLP exporter you can configure to export data over OTLP. The OpenTelemetry SDK then transforms events into OTLP data.
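For instance, with the Java SDK the OTLP exporter is wired into the tracer provider; a minimal sketch, assuming a collector listening on localhost:4317:
// Requires the opentelemetry-sdk and opentelemetry-exporter-otlp artifacts
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

public class OtlpExample {
    public static void main(String[] args) {
        // Exporter that sends spans to a collector over OTLP/gRPC
        OtlpGrpcSpanExporter exporter = OtlpGrpcSpanExporter.builder()
                .setEndpoint("http://localhost:4317") // the collector's gRPC receiver (assumed address)
                .build();

        // Batch spans before export, mirroring the collector's batch processor
        SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
                .addSpanProcessor(BatchSpanProcessor.builder(exporter).build())
                .build();

        OpenTelemetry otel = OpenTelemetrySdk.builder()
                .setTracerProvider(tracerProvider)
                .build();

        otel.getTracer("demo").spanBuilder("hello").startSpan().end();
        tracerProvider.shutdown(); // flush remaining spans
    }
}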
1. Receivers
The explanation is as follows
A receiver, which can be push or pull based, is how data gets into the collector. The OpenTelemetry collector can receive telemetry data in multiple formats.
Example - gRPC
We do the following
otlp:
  protocols:
    grpc:
      endpoint: "0.0.0.0:4317"
Example - gRPC + HTTP
We do the following
otlp:
  protocols:
    grpc:
    http:
2. Processors
The explanation is as follows
Processors are run on data between being received and being exported. Processors are optional, though some are recommended.
batch Processor
The explanation is as follows
The batch processor accepts spans, metrics, or logs and places them into batches. Batching helps better compress the data and reduce the number of outgoing connections required to transmit the data. This processor supports both size and time based batching.
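For example, both thresholds can be tuned in the processor's configuration; a minimal sketch with illustrative values:
processors:
  batch:
    send_batch_size: 8192   # flush once this many items have accumulated
    timeout: 5s             # ...or once this much time has passed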
Example
The explanation is as follows
Configuring a processor does not enable it. Processors are enabled via pipelines within the service section.
We do the following
processors:
  batch:

service:
  pipelines:
    traces:
      receivers: [opencensus, jaeger]
      processors: [batch]
      exporters: [opencensus, zipkin]
3. Exporters
The explanation is as follows
In order to visualise and analyse the telemetry you will need to use an exporter. An exporter is a component of OpenTelemetry and is how data gets sent to different systems/back-ends.
The explanation is as follows. In other words, it converts data in the OpenTelemetry Protocol (OTLP) format into another format.
Generally, an exporter translates the internal format into another defined format, so you can send different types of data to different backends. For example, you can send metrics (e.g. to Prometheus) to one backend and traces to another.
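For instance, a Prometheus metrics exporter can sit next to an OTLP traces exporter in the same exporters section; a minimal sketch (the port is an assumption):
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"   # Prometheus scrapes metrics from here
  otlp:
    endpoint: "<path_to_central_collector>:4317"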
The explanation is as follows
- The Jaeger exporter is used to send data to Jaeger.

The Logging exporter is very useful when troubleshooting as it exports data to the console

Console Exporter
The explanation is as follows
A common exporter to start with and that is very useful for development and debugging tasks is the console exporter.
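In the collector this is the logging exporter used in the examples above; while debugging, its verbosity can be raised — a minimal sketch:
exporters:
  logging:
    verbosity: detailed   # print full span/metric/log details to the console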
Example
The explanation is as follows
In the exporters section, you can add more destinations. For example, if you would also like to send trace data to Grafana Tempo, just add these lines to the central_collector.yaml file.
We do the following
exporters:
  logging:
  otlp:
    endpoint: "<tempo_endpoint>"
    headers:
      authorization: Basic <api_token>

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: []
      exporters: [logging, otlp]
4. Extensions
The explanation is as follows
Extensions are available primarily for tasks that do not involve processing telemetry data. Examples of extensions include health monitoring, service discovery, and data forwarding. Extensions are optional.
Example
We do the following
extensions:
  health_check:
  pprof:
  zpages:
  memory_ballast:
    size_mib: 512
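Like processors, configuring an extension does not enable it; it must also be listed in the service section:
service:
  extensions: [health_check, pprof, zpages, memory_ballast]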



Monday, April 24, 2023

RAID Disk - Redundant Array of Independent Disks

Introduction
RAID can serve many purposes; depending on the configuration, it can be used for redundancy. The explanation is as follows
In general, the purpose of a RAID, depending on the chosen RAID level, is to provide a different balance among the key goals: data redundancy, availability, performance, and capacity.
The explanation is as follows
RAID is not a backup mechanism; it's a redundancy mechanism
...
The main advantage of a redundant system is that it will not go down completely when a complete disk failure happens – the mirror allows you to continue using the NAS without interruption while the array is rebuilding.
There are RAID 5 and RAID 6, among other levels.

Hardware RAID and Software RAID
Manufacturers such as LSI, DELL, and HP provide hardware RAID. The explanation is as follows
Q : Is RAID with different drive types possible?
A : Hardware RAID controllers from LSI, DELL, HP etc. do not allow mixing disks with different interfaces (eg: SATA and SAS) in a single array. What you can do is to create two different arrays, each for a specific interface protocol - a SATA array and a SAS one, for example.

Software RAID does not share this limitation - basically any block device (even a loopback device) can be part of any array. However, mixing different disk technologies is generally discouraged to avoid an unbalanced array (performance wise). For cache drives, as ZFS L2ARC or LVM dm-cache, things are different - here you actually want a faster drive. So, for example, using an NVMe cache in front of a SATA array is perfectly fine.
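On Linux, such a software array can be built from arbitrary block devices with mdadm; a minimal sketch, with illustrative device names:
# mirror two disks (RAID 1) regardless of their interface type
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb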
RAID 6
If the red/blue light is on, a disk has fallen out of sync and may be undergoing a "Rebuild" operation.


Saturday, April 22, 2023

Software Architecture - The Strangler Pattern

Introduction
Note: OpenAPI, formerly known as Swagger, can be used for the Strangler pattern.

The explanation is as follows
The name refers to strangler vines that grow around trees, gradually building up a solid structure that eventually is able to completely replace the tree that they started growing around. The strangler pattern for microservices means to gradually and strategically build a "mesh" of microservices around an existing monolith, replacing certain functions as needed, and over time potentially replacing the monolithic application entirely.
The Strangler pattern can also prevent tests from breaking while the API changes. The explanation is as follows
build an anti-corruption layer, or a facade, or a proxy between your tests and the SUT, so you can change the API of the SUT without having to change too many parts of your tests. That will allow you to keep the tests as they are for now. Later, when you have some time for cleaning up, you may decide to migrate the tests to the new API one-by-one.

This approach is also known as strangler pattern and can often be used to gradually swap out legacy components by components with a new design, not only for tests.
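A minimal sketch of such a facade in Java — all names here are illustrative, not from the source:
// Both the legacy system and its replacement implement the same interface.
interface OrderService {
    String placeOrder(String item);
}

class LegacyOrderService implements OrderService {
    public String placeOrder(String item) { return "legacy:" + item; }
}

class NewOrderService implements OrderService {
    public String placeOrder(String item) { return "microservice:" + item; }
}

// The facade decides, call by call, which implementation handles the request.
// Routing ever more calls to the new service gradually "strangles" the old one.
class StranglerFacade implements OrderService {
    private final OrderService legacy = new LegacyOrderService();
    private final OrderService next = new NewOrderService();

    public String placeOrder(String item) {
        boolean migrated = item.startsWith("book"); // illustrative routing rule
        return (migrated ? next : legacy).placeOrder(item);
    }
}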
The figure is as follows. Here, in the first week, the Strangler pattern still routes requests to the old system. In the following weeks it routes them to the microservices.


Thursday, April 6, 2023

Google Cloud True Time

Introduction
Keeping clocks synchronized across a distributed network is hard. Google achieves this with TrueTime. Google TrueTime is used in databases such as Google Spanner. The explanation is as follows
Google created a distributed SQL database called Spanner, it relies on something called True Time for very strong consistency of transactions across nodes. Google knows that time is uncertain, so True Time defines a bounded and small uncertainty of time window where transactions can not be ordered definitely. True Time works as a Global Time across Google datacenters.

True Time is expressed as a time interval [earliest, latest]. It exposes an API called now() whose value lies in this interval. The uncertainty interval varies between 1 ms to 7 ms — note that the maximum uncertainty has a tight upper bound.

The APIs TT.before(t) or TT.earliest() and TT.after(t) or TT.latest() take a timestamp as input and answer whether the given timestamp is before or after the current uncertainty interval.

The relation between TT.earliest(), TT.latest() and absolute time of an event is:

TT.earliest ≤ Absolute Time of current event ≤ TT.latest
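A hypothetical Java sketch of the interval semantics described above — TrueTime itself is internal to Google, so every name here is illustrative:
// The uncertainty interval [earliest, latest] around "now".
record TTInterval(long earliestMicros, long latestMicros) {}

interface TrueTime {
    TTInterval now();

    // t is definitely in the past only if it precedes the whole interval.
    default boolean after(long t) { return t < now().earliestMicros(); }

    // t is definitely in the future only if it follows the whole interval.
    default boolean before(long t) { return t > now().latestMicros(); }
}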
How does Google achieve this?
The explanation is as follows
Google does this magic with a couple of tricks:

Optimized Infrastructure: Google infra runs on a specially designed private network. They have optimized the network over time; it has a lot of redundancy of connections across datacenters and failure handling mechanisms built in. It does not mean network partitions don't happen or things don't go wrong — however, the possibility of such incidents and the communication latency are reduced a lot.

Using own clocks: True Time does not rely on external NTP pools or servers. Rather, Google datacenters are equipped with GPS receivers and Atomic clocks. See the below picture of such an installation:
AWS Time Sync Service
The explanation is as follows
Inspired from Google True Time, AWS also manages its own fleet of Atomic clocks and GPS clock receivers. Any EC2 server can connect to these time references via NTP using Chrony daemon for more accurate time rather than connecting to external NTP pools or time servers over NTP. More details can be found here. Leap second smearing is also handled by Amazon Time Sync Service.
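On an EC2 instance this amounts to pointing chrony at the service's link-local address; a sketch of the relevant /etc/chrony.conf line:
# Amazon Time Sync Service, reachable from any EC2 instance
server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4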


Tuesday, April 4, 2023

Amazon Web Service (AWS) DynamoDB - Works Both as a Key-Value Store and a Document Store

Introduction
It is a NoSQL database, but it supports multiple models; that is, it can be used both as a key-value store and as a document store.
1. As a key-value store it competes with Cassandra, HBase, and Redis
2. As a document DB it competes with MongoDB.

DynamoDB was first released in 2012.

Its features are as follows

The explanation is as follows
DynamoDB is Amazon's answer to MongoDB, a NoSQL database that works on JSON documents. These databases rely heavily on nested data and do not enforce any strict schema unless the developer turns that option on. That means that DynamoDB is great for high-volume sites like a CMS or mobile apps with a lot of traffic. For example, both Major League Baseball and Duolingo make use of DynamoDB.
Pricing Model
1. throughput model : the expected usage is estimated up front and cannot be exceeded.
2. on-demand pricing model : priced according to actual usage

MongoDB vs DynamoDB
- MongoDB supports a maximum document size of 16 MB. In DynamoDB this is 400 KB
- MongoDB is written in C++. DynamoDB is written in Java

Primary Key 
The explanation is as follows
... it only allows three data types for primary keys: string, number, and binary. (It does support many different data types for other attributes within a table.)
The explanation is as follows
DynamoDB has a weird take on the concept of a primary key. You will have two keys to identify specific data:
Primary Key = Partition Key + Sort Key
The figure is as follows

Example
The table could be as follows
      PARTITION_KEY    SORT_KEY       OTHER_INFO
1.    ORDER#1234       PRODUCT#1      ProductName, Price, etc.
2.    ORDER#1234       INVOICE#1      InvoiceDate, PaymentInfo
3.    ORDER#1234       CUSTOMER#1     CustomerName, ShippingAddress
Value Type
The explanation is as follows
DynamoDB stores the value in a JSON-serialized format

Column Types
The explanation is as follows
DynamoDB supports many different data types for attributes within a table. They can be categorized as follows:

1. Scalar Types – A scalar type can represent exactly one value. The scalar types are number, string, binary, Boolean, and null.

2. Document Types – A document type can represent a complex structure with nested attributes, such as you would find in a JSON document. The document types are list and map.

3. Set Types – A set type can represent multiple scalar values. The set types are string set, number set, and binary set.

Global Secondary Index
This index is used when the data we want lies outside the Primary Key. The only problem is that each index is effectively a table of its own, as the sketch below shows.
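A minimal sketch of adding a GSI to the Music table used below, via update-table — the index name and throughput values are assumptions:
aws dynamodb update-table --endpoint-url=http://localhost:4566 \
    --table-name Music \
    --attribute-definitions AttributeName=AlbumTitle,AttributeType=S \
    --global-secondary-index-updates \
        '[{"Create": {"IndexName": "AlbumIndex",
                      "KeySchema": [{"AttributeName": "AlbumTitle", "KeyType": "HASH"}],
                      "Projection": {"ProjectionType": "ALL"},
                      "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}}}]'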

DynamoDB Disadvantages
The explanation is as follows
- Size limit — an item can only reach 400KB in size
- Limited querying options (limited number of indices)
- Throttling on burst throughput (and hot keys in certain situations)
create-table
Example
We do the following
aws dynamodb --endpoint-url=http://localhost:4566 create-table \
    --table-name Music \
    --attribute-definitions \
        AttributeName=Artist,AttributeType=S \
        AttributeName=SongTitle,AttributeType=S \
    --key-schema \
        AttributeName=Artist,KeyType=HASH \
        AttributeName=SongTitle,KeyType=RANGE \
    --provisioned-throughput \
        ReadCapacityUnits=10,WriteCapacityUnits=5
describe-table
We do the following
aws --endpoint-url=http://localhost:4566 dynamodb describe-table \
    --table-name Music | grep TableStatus
put-item
We do the following
aws --endpoint-url=http://localhost:4566 dynamodb put-item \
  --table-name Music  \
  --item \
  '{"Artist": {"S": "No One You Know"}, "SongTitle": {"S": "Call Me Today"},
"AlbumTitle": {"S": "Somewhat Famous"}, "Awards": {"N": "1"}}'
scan
We do the following
aws dynamodb scan --endpoint-url=http://localhost:4566 --table-name Music
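scan reads the whole table; to fetch items by key, query is used instead — a sketch against the same table, reusing the Artist value from the put-item example:
aws dynamodb query --endpoint-url=http://localhost:4566 \
    --table-name Music \
    --key-condition-expression "Artist = :a" \
    --expression-attribute-values '{":a": {"S": "No One You Know"}}'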
Example
The explanation is as follows
Because DynamoDB is not relational and does not enforce ACID by default, it must use a modified version of standard SQL. Amazon has developed a query language called PartiQL which uses many SQL concepts but is built for highly nested data. The query below takes advantage of the key-value underpinnings of DynamoDB in a relatively SQL standard way.
We do the following
UPDATE
    Music
SET
    AwardsWon = 1
SET
    AwardDetail = { 'Grammys': [ 2020, 2018 ] }
WHERE
    Artist = 'Acme Band'
    AND SongTitle = 'PartiQL Rocks'

Monday, April 3, 2023

Cache Strategies - Cache Access Patterns: Refresh-ahead

Introduction
The figure is as follows


The explanation is as follows
.. it refreshes the cache data before its expiration time; this is done for hot data, the data we expect to be requested in the near future.

Approach
1. Suppose the cached data's expiration time is 60 seconds and the refresh-ahead factor is 0.5.
2. If the cached object is accessed after 60 seconds, Coherence will perform a synchronous read from the cache store to refresh its value.
3. If the cached data is accessed after 30 seconds, say at the 35th second, the cache returns the data and asynchronously refreshes it. A sketch of this logic follows.
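A minimal Java sketch of this logic, assuming a 60-second TTL and a 0.5 refresh-ahead factor; all names are illustrative:
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

class RefreshAheadCache<K, V> {
    private record Entry<V>(V value, Instant loadedAt) {}

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final ExecutorService refresher = Executors.newSingleThreadExecutor();
    private final Duration ttl = Duration.ofSeconds(60);
    private final double refreshAheadFactor = 0.5;
    private final Function<K, V> loader; // reads from the backing store

    RefreshAheadCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        Instant now = Instant.now();
        Entry<V> e = cache.get(key);
        if (e == null || e.loadedAt().plus(ttl).isBefore(now)) {
            // Missing or expired (past 60 s): synchronous read-through
            V v = loader.apply(key);
            cache.put(key, new Entry<>(v, now));
            return v;
        }
        Duration age = Duration.between(e.loadedAt(), now);
        long thresholdMillis = (long) (ttl.toMillis() * refreshAheadFactor);
        if (age.toMillis() > thresholdMillis) {
            // Past the 30 s refresh-ahead threshold (e.g. the 35th second):
            // return the cached value now, refresh asynchronously
            refresher.submit(() ->
                cache.put(key, new Entry<>(loader.apply(key), Instant.now())));
        }
        return e.value();
    }
}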