Friday, 25 October 2024

Data Migration Strategy

One solution is the following four-step approach
Step 1: Dual Writes
Every write goes to both the old and the new database

Step 2: Migrate Old Data
The old data is migrated to the new database.

Step 3: Changing Read Paths
Reads can now be served from the new database

Step 4: Changing Write Paths
Writing to the old database is turned off
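The four steps above can be sketched roughly as follows. This is only an illustration, not a production design: Store, InMemoryStore and MigratingRepository are hypothetical names, and the in-memory maps stand in for the real old and new databases.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class DualWriteDemo {
    public static void main(String[] args) {
        InMemoryStore oldDb = new InMemoryStore();
        InMemoryStore newDb = new InMemoryStore();
        oldDb.put("legacy", "v0");                 // data that predates the migration

        MigratingRepository repo = new MigratingRepository(oldDb, newDb);
        repo.save("k1", "v1");                     // Step 1: written to both stores
        repo.backfill(oldDb.data);                 // Step 2: copy the old data
        repo.switchReadsToNew();                   // Step 3: read from the new store
        repo.stopWritingToOld();                   // Step 4: old store no longer written
        repo.save("k2", "v2");

        System.out.println(repo.find("legacy").orElse("missing"));
        System.out.println(oldDb.data.containsKey("k2"));
    }
}

/** Hypothetical key-value abstraction standing in for a database client. */
interface Store {
    void put(String key, String value);
    Optional<String> get(String key);
}

/** In-memory implementation, used only to keep the sketch runnable. */
class InMemoryStore implements Store {
    final Map<String, String> data = new HashMap<>();

    public void put(String key, String value) { data.put(key, value); }

    public Optional<String> get(String key) { return Optional.ofNullable(data.get(key)); }
}

/** Routes reads and writes according to the current migration step. */
class MigratingRepository {
    private final Store oldStore;
    private final Store newStore;
    private boolean readFromNew = false;  // flipped in step 3
    private boolean writeToOld = true;    // flipped in step 4

    MigratingRepository(Store oldStore, Store newStore) {
        this.oldStore = oldStore;
        this.newStore = newStore;
    }

    /** Step 1: while writeToOld is set, every write goes to both stores. */
    void save(String key, String value) {
        if (writeToOld) {
            oldStore.put(key, value);
        }
        newStore.put(key, value);
    }

    /** Step 2: copy old rows, skipping keys the new store already has from dual writes. */
    void backfill(Map<String, String> oldRows) {
        oldRows.forEach((key, value) -> {
            if (!newStore.get(key).isPresent()) {
                newStore.put(key, value);
            }
        });
    }

    Optional<String> find(String key) {
        return (readFromNew ? newStore : oldStore).get(key);
    }

    void switchReadsToNew() { readFromNew = true; }

    void stopWritingToOld() { writeToOld = false; }
}
```

In a real system the two flags would more likely be feature flags or configuration, and the backfill would run as a batch job. The sketch only shows the order in which the paths change.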

Thursday, 17 October 2024

Law of Demeter - A Managing Access to C

Introduction
In this approach, new methods are usually added to object A to provide access to C. The explanation is as follows
Instead, give clients restricted access to operations on the collection through messages that you implement. (Smalltalk Best Practice Patterns, Kent Beck)

Instead, offer methods that provide limited, meaningful access to the information in the collections. (Implementation Patterns, Kent Beck)
Example
Instead of this code
a.getB().getItems()
we do this.
class A {
  private B b;

  void doSomething() {
    b.update();
  }

  Items getItems() {
    return b.getItems();
  }
}
and the calling code becomes
a.getItems()
Example
Instead of reaching through the Reservation->Show->Rows->Seats classes
int selectedRow = ...;
int selectedSeat = ...;
if (show.getRows().get(selectedRow).getSeats().get(selectedSeat)
    .getReservationStatus()) {...}
we do this.
int selectedRow = ...;
int selectedSeat = ...;
if (show.isSeatReserved(selectedRow, selectedSeat)) {...}
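The original snippet does not show how isSeatReserved is implemented. One possible sketch is below, where each class only talks to its direct collaborator. The Seat, Row and Show internals and the reserveSeat helper are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ShowDemo {
    public static void main(String[] args) {
        Show show = new Show(2, 3);   // 2 rows with 3 seats each
        show.reserveSeat(1, 2);
        System.out.println(show.isSeatReserved(1, 2)); // true
        System.out.println(show.isSeatReserved(0, 0)); // false
    }
}

class Seat {
    private boolean reserved;

    boolean isReserved() { return reserved; }

    void reserve() { reserved = true; }
}

class Row {
    private final List<Seat> seats = new ArrayList<>();

    Row(int seatCount) {
        for (int i = 0; i < seatCount; i++) {
            seats.add(new Seat());
        }
    }

    // Row answers questions about its own seats; the list is never exposed.
    boolean isSeatReserved(int seat) { return seats.get(seat).isReserved(); }

    void reserveSeat(int seat) { seats.get(seat).reserve(); }
}

class Show {
    private final List<Row> rows = new ArrayList<>();

    Show(int rowCount, int seatsPerRow) {
        for (int i = 0; i < rowCount; i++) {
            rows.add(new Row(seatsPerRow));
        }
    }

    // Show delegates to Row, so callers never navigate Show -> Rows -> Seats themselves.
    boolean isSeatReserved(int row, int seat) { return rows.get(row).isSeatReserved(seat); }

    void reserveSeat(int row, int seat) { rows.get(row).reserveSeat(seat); }
}
```

With this shape, the seating layout can change (for example to a two-dimensional array) without touching any caller of Show.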
Example
We do it like this
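// Eclipse Collections types used by this example
import org.eclipse.collections.api.block.procedure.Procedure;
import org.eclipse.collections.api.factory.Lists;
import org.eclipse.collections.api.list.MutableList;

public class Order {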
  private final MutableList<LineItem> lineItems = Lists.mutable.empty();

  public Order addLineItem(String name, double value) {
    this.lineItems.add(new LineItem(name, value));
    return this;
  }

  public void forEachLineItem(Procedure<LineItem> procedure) {
    this.lineItems.forEach(procedure);
  }

  public int totalLineItemCount() {
    return this.lineItems.size();
  }

  public int countOfLineItem(String name) {
    return this.lineItems.count(lineItem -> lineItem.name().equals(name));
  }

  public double totalOrderValue() {
    return this.lineItems.sumOfDouble(LineItem::value);
  }
}

public record LineItem(String name, double value) {}
If we want to make this code thread-safe, we run into another problem. The options are
1. Write a new SynchronizedOrder
2. Make the lineItems object synchronized
3. Use the copy-on-write approach

With the copy-on-write approach, the code looks like this
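// Eclipse Collections types used by this example
import org.eclipse.collections.api.bag.ImmutableBag;
import org.eclipse.collections.api.factory.Bags;

public record Order(ImmutableBag<LineItem> lineItems) {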
  public Order() {
    this(Bags.immutable.empty());
  }

  public Order addLineItem(String name, double value) {
    return new Order(lineItems.newWith(new LineItem(name, value)));
  }

  public double totalOrderValue() {
    return this.lineItems.sumOfDouble(LineItem::value);
  }
}

public record LineItem(String name, double value) {}
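How an immutable Order like the one above might be shared between threads is not shown in the original. The sketch below is one possibility, using only the JDK instead of Eclipse Collections: SimpleOrder and SimpleLineItem are hypothetical stand-ins for Order and LineItem, and an AtomicReference holds the current version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class CopyOnWriteOrderDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicReference<SimpleOrder> current = new AtomicReference<>(new SimpleOrder());

        // Each thread swaps in a new immutable order; updateAndGet retries on contention.
        Runnable adder = () -> {
            for (int i = 0; i < 1000; i++) {
                current.updateAndGet(order -> order.addLineItem("item", 1.0));
            }
        };
        Thread first = new Thread(adder);
        Thread second = new Thread(adder);
        first.start();
        second.start();
        first.join();
        second.join();

        System.out.println(current.get().totalLineItemCount()); // 2000
    }
}

/** Hypothetical JDK-only stand-in for LineItem. */
final class SimpleLineItem {
    final String name;
    final double value;

    SimpleLineItem(String name, double value) {
        this.name = name;
        this.value = value;
    }
}

/** Hypothetical JDK-only stand-in for the immutable Order. */
final class SimpleOrder {
    private final List<SimpleLineItem> lineItems;

    SimpleOrder() {
        this(Collections.emptyList());
    }

    private SimpleOrder(List<SimpleLineItem> lineItems) {
        this.lineItems = lineItems;
    }

    /** Copy-on-write: returns a new order and leaves this one untouched. */
    SimpleOrder addLineItem(String name, double value) {
        List<SimpleLineItem> copy = new ArrayList<>(lineItems);
        copy.add(new SimpleLineItem(name, value));
        return new SimpleOrder(Collections.unmodifiableList(copy));
    }

    int totalLineItemCount() {
        return lineItems.size();
    }

    double totalOrderValue() {
        double total = 0.0;
        for (SimpleLineItem item : lineItems) {
            total += item.value;
        }
        return total;
    }
}
```

Copying the whole list on every add is linear in the number of items, so this trade-off mainly suits orders that are read far more often than they are modified.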




Tuesday, 15 October 2024

AWS Athena

Ad-hoc queries on S3, with some CTAS (CREATE TABLE AS SELECT).

The explanation is as follows. A detailed example is linked here
AWS Athena enabled the option to analyze the unstructured, semi-structured, and structured data stored in Amazon S3 using simple SQL queries. In addition, query results can be saved and used for further reference.

Friday, 11 October 2024

Table-Augmented Generation - TAG

RAG vs TAG
The explanation is as follows
Current AI methods for querying databases, such as Text2SQL and Retrieval-Augmented Generation (RAG), fall significantly short. These models are limited by their design, either only interpreting natural language as SQL queries or relying on simple lookups that fail to capture the complexity of real-world questions.

Why does this matter? Using Natural Language to query SQL databases is the new norm ever since LLMs started capturing the limelight! Businesses today are drowning in data but starving for insights. The inability of existing methods to effectively leverage both AI’s semantic reasoning and databases’ computational power is a major bottleneck in making data…

Azure Cosmos Database

Introduction
The explanation is as follows
Azure Cosmos DB is a multi-model database service that supports various data models, including key-value, column-family, document, and graph.
Graph model
The explanation is as follows
Its graph model is based on the Gremlin API, which allows Cosmos DB to store and query graph data. While Cosmos DB can handle graph data, it is not a native graph database like Neo4j.
Graph Query Language
The explanation is as follows
Cosmos DB uses the Gremlin query language for graph operations. While Gremlin is also a graph traversal language, it is more general and less expressive compared to Cypher, especially for complex graph queries. Neo4j’s Cypher language often allows for simpler, more readable graph queries.
ACID Transactions
The explanation is as follows
Cosmos DB offers multi-model ACID transactions at the partition level but may not provide the same level of fine-grained ACID transaction support for graph-specific operations as Neo4j does, especially when transactions span multiple partitions.


Friday, 4 October 2024

AWS RDS - Relational Database Service

Introduction
It provides relational databases such as MySQL and PostgreSQL. The explanation is as follows
Amazon RDS is a web service that makes it easier to set up, operate, and scale a relational database in the cloud.
A further explanation is as follows
Amazon RDS is one of the most basic AWS database services, used mainly for offloading your database management operations to a platform. Therefore, it is used for small or medium enterprises where the data volume is limited, and the functionalities required for company operations are not too complex.

Amazon RDS supports database engines such as MySQL, MariaDB, PostgreSQL, Oracle, and Microsoft SQL Server. It comes with workflows to secure your RDS instance using SSH and offers a straightforward cloud console for connecting.
Benefits
  1. Amazon RDS is the most inexpensive service, thanks to its ease of usage and lack of complexity.
  2. It is highly scalable and allows you to scale up to 32 vCPUs and 244 Gb of RAM.
  3. This service is also easy to use and pretty fast.
The supported relational databases are
1. Aurora
2. PostgreSQL
3. MySQL
4. SQL Server
5. Oracle
6. MariaDB
RDS Features
These are
- Multiple Availability Zones
- Optional Read Replicas
- Automatic Instance Backups
Replication - Postgres
The explanation is as follows
With RDS you can choose up to two replicas located in separate availability zones, providing one primary instance (writer) and the other two stand-by (reader) instances.

The communication between the primary and the stand-by instances is done synchronously to guarantee that no data is lost.

You have a specific reader endpoint that can help with the read latency, while directing the writes to the primary. If the primary is no longer fit to receive writes, a failover will take place and one of the stand-by instances will be promoted as the new primary within 35–60 seconds. During this time attempts to write are expected to fail.

By choosing this approach, you can achieve the redundancy needed, with expected uptimes of 99.95%.
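Because writes are expected to fail during the failover window described above, a client usually needs some retry logic. The following is a minimal sketch under stated assumptions: FailoverRetry and withRetry are hypothetical names, the write is modelled as a plain Supplier rather than a real JDBC call, and the attempt count and delay are illustrative values, not AWS recommendations.

```java
import java.util.function.Supplier;

class FailoverRetryDemo {
    public static void main(String[] args) {
        int[] attempts = {0};

        // Simulated write: fails twice (primary unavailable), then succeeds.
        String result = FailoverRetry.withRetry(() -> {
            if (++attempts[0] < 3) {
                throw new IllegalStateException("primary unavailable");
            }
            return "written";
        }, 5, 10L);

        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}

class FailoverRetry {
    /**
     * Runs the write, retrying with a fixed delay when it throws.
     * Rethrows the last failure once maxAttempts is exhausted.
     */
    static <T> T withRetry(Supplier<T> write, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return write.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

For a real failover of 35 to 60 seconds, the total retry budget (attempts times delay) would have to cover that window, and the write itself should be idempotent so that a retry cannot apply it twice.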

Scaling RDS
The options are
- Scale Manually 
- Scale Automatically
Performance Insights
Provided by AWS RDS; it shows the 10 SQL statements that generate the most load.

Example - PostgreSQL
"Publicly accessible" must be set to true for the IP address. If the inbound rule does not work, deleting and recreating it may be necessary. A video is linked here

Example - MySQL
"Publicly accessible" must be set to true for the IP address.

Wednesday, 10 January 2024

Google Cloud - gcloud Installation

Windows
It is installed with GoogleCloudSDKInstaller.exe

Google Cloud - gcloud components Option

components install
Example
We do it like this
gcloud components install gke-gcloud-auth-plugin

Google Cloud - gcloud projects Option

projects undelete
Example
We do it like this
gcloud projects undelete example-foo-bar-1
projects list
Lists the projects
Example
We do it like this
gcloud projects list
To select or switch the active project, we do this
gcloud config set project <project_id>

Google Cloud - gcloud auth Option

Introduction
The description of the "gcloud auth" command is as follows
manage oauth2 credentials for the Google Cloud CLI
auth activate-service-account option
Moved to the activate-service-account post

auth configure-docker option
Example
We do it like this
stage("Docker Image") {
  withEnv(['http_proxy=', 'https_proxy=']) {
    sh "gcloud auth activate-service-account --key-file=.gcp/product-foo.json"
    sh "gcloud auth configure-docker"

    // Build image and upload to GCP
    sh "docker build -t us.gcr.io/product-foo/oracle-db docker/uc-custdb/oracle-db"
    sh "docker push us.gcr.io/product-foo/oracle-db:latest"

    // Build image and upload to GCP
    int basetime = currentBuild.startTimeInMillis / 1000
    sh "docker build -t us.gcr.io/product-foo/uc-farm docker/uc-farm"
    sh "docker tag us.gcr.io/product-foo/uc-farm:latest us.gcr.io/product-foo/uc-farm:$basetime"
    sh "docker push us.gcr.io/product-foo/uc-farm:$basetime"
    sh "docker push us.gcr.io/product-foo/uc-farm:latest"
  }
}
auth list option
Shows the account names. The active account is marked with a * character
Example
We do it like this
gcloud auth list
      Credentialed Accounts
ACTIVE  ACCOUNT
*       myname.mylastname@foo.com

To set the active account, run:
    $ gcloud config set account `ACCOUNT`
auth login option
Used to log in with a new account
Example
We do it like this
gcloud auth login myaccount@gmail.com
Example
We do it like this
gcloud init
gcloud auth application-default login

Tuesday, 2 January 2024

AWS Redshift - For OLAP Data Warehousing

Introduction
The explanation is as follows. In other words, it targets larger and more complex analytical workloads than AWS Aurora.
Like Amazon Aurora, Amazon Redshift is used by large enterprises. However, Redshift is more complex, can handle more data, and is referred to as a data warehouse. This is because Redshift is built for OLAP (Online Analytical Processing).

Furthermore, Redshift can scale up to Petabytes of data and supports up to 60 user-defined databases per cluster. On the other hand, Aurora can only scale to terabytes and support up to 40. Besides this, the security and the maintenance of both the database services are pretty much the same.

A few use cases of Amazon Redshift are creating machine models for forecasting operations, optimizing your company's business intelligence, and increasing developer productivity.
Benefits
  1. Redshift possesses the highest scaling capabilities amongst the three options we've examined.
  2. Its performance is much faster and more durable.
  3. Amazon Redshift can also handle a more significant amount of data and analyze it within a shorter period.
Redshift vs PostgreSQL
The explanation is as follows. Under the hood it is built on PostgreSQL 8.
I mean, as much as I love AWS services, setting up Redshift as our data warehouse was a mistake and Postgres would have been a much better alternative.

Let’s be honest, unless you have massive amounts of data, more than hundreds of To’s of data, all these fancy data warehouses like Redshift just aren’t worth the cost. Redshift isn’t open source, so you can’t have a complete mini-data stack on your local computer for testing purposes. Plus, Redshift, being built on top of Postgres 8, sometimes lacks the cool features that the newer releases of Postgres have.

I know Postgres is a transactional database, but I think it’s a solid first approach for a data warehouse. If you’re dealing with tables with less than 50 million rows and under 10 terabytes of data (which is the case for most startups), Postgres might outperform Redshift. And the best part is, you can have it up and running on your local computer, making it incredibly convenient for quick iterations.


Example
The explanation is as follows
Redshift supports some SQL functions and queries which would generally only be necessary with large data warehouse applications. For example, PERCENTILE_CONT computes a linear interpolation to return a percentile.
We do it like this
SELECT
    TOP 10 salesid,
    sum(pricepaid),
    percentile_cont(0.6) WITHIN GROUP (
        ORDER BY 
            salesid
    ),
    median (salesid)
FROM
    sales
GROUP BY
    salesid,
    pricepaid;
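To make the "linear interpolation" behind PERCENTILE_CONT more concrete, here is a small sketch of the usual continuous-percentile calculation over an in-memory array. It is not Redshift's implementation: the PercentileCont class is a hypothetical helper that follows the common definition, where the target position is fraction * (n - 1) in the sorted values.

```java
import java.util.Arrays;

class PercentileContDemo {
    public static void main(String[] args) {
        double[] values = {5, 1, 4, 2, 3};
        // Position 0.6 * (5 - 1) = 2.4, so the result lies between the 3rd and 4th sorted values.
        System.out.println(PercentileCont.compute(values, 0.6));
        // With fraction 0.5 the result is the median.
        System.out.println(PercentileCont.compute(values, 0.5));
    }
}

class PercentileCont {
    /** Continuous percentile with linear interpolation between neighbouring sorted values. */
    static double compute(double[] values, double fraction) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be between 0 and 1");
        }
        double[] sorted = values.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);

        double position = fraction * (sorted.length - 1);
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        double weight = position - lower;   // how far we are between the two neighbours

        return sorted[lower] + weight * (sorted[upper] - sorted[lower]);
    }
}
```

For the five values above and a fraction of 0.6, the result is approximately 3.4, which is not one of the input values. That is the difference from a discrete percentile, which always returns an existing value.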