Schema etiketine sahip kayıtlar gösteriliyor. Tüm kayıtları göster
Schema etiketine sahip kayıtlar gösteriliyor. Tüm kayıtları göster

29 Mart 2021 Pazartesi

OLTP vs OLAP

Giriş
Veri tabanları iki farklı iş için tasarlanır. Açıklaması şöyle
As you probably know, databases are divided into two types: Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP).
Bu ikisinin farklılıklar şöyle
OLTP and OLAP describe two very different data processing methods, and, therefore, they have different database requirements.

Characteristics OLTP OLAP
Data queried in one request Small (a few rows) Large
Real-time update         Yes         No
Transactions         Yes         No
Concurrency         High         Low
Query pattern         Similar Varies a lot
Özetlersek açıklaması şöyle
OLTP systems are born to handle transaction data: they enable the real-time execution of large transactions with many concurrent queries and require fast response times. On the other hand, OLAP requires running complex queries on large numbers of records. OLAP systems have high throughput but do not offer low query latencies or high queries per second (QPS) like the OLTP systems.
1. OLTP
Açıklaması şöyle
Years ago, databases made little distinction between OLTP and OLAP. Instead, one database processed both types of requests. However, as the data volume grew, it became difficult to process two types of workloads in a single database. Most significantly, the different workload types interfered with each other.

Thus, to meet the special needs of OLAP workloads, people designed a separate database that only processed OLAP workloads. They exported data from OLTP databases to OLAP databases, and processed the OLAP workloads there. Separating the OLTP and OLAP workloads resolved the conflicts between the two workloads, but it also introduced external data replication. During the replication, it was hard to ensure that data was consistent and in real time.
2. OLAP - Eski/Arşiv Veri
OLAP eski ve arşiv veriyle uğraşır. Transaction miktarı azdır ve daha çok sorgu yapar. Sorgular karmaşıktır ve aggregation miktarı çoktur.

OLAP ismindeki online eskiden kalan bir kelime. Açıklaması şöyle
It is simply a remnant of olden times, when it was used in contrast to batch processing. "Online" here means "interactive", that is, requests to the database are processed as they come and responses are given more or less immediately, or at least as soon as they are available. Batch processing would collect requests into, well, batches, and execute them on schedule; responses would be given after the entire batch execution (e.g. next morning).
Örnek
Bir örnek şöyle
PostGre is usually used in OLTP scenarios with a bit of OLAP.
That is fine as long as you understand the difference between them.
For example, if a public-facing endpoint contains a very complex query that loads a lot of data and does complex operations, it’s probably a good indicator that it should be treated as OLAP and that endpoint refactored to adopt a different strategy.
Örnek
Açıklaması şöyle
Redshift supports some SQL functions and queries which would generally only be necessary with large data warehouse applications. For example, PERCENTILE_CONT computes a linear interpolation to return a percentile.
Şöyle yaparız
SELECT
    TOP 10 salesid,
    sum(pricepaid),
    percentile_cont(0.6) WITHIN GROUP (
        ORDER BY 
            salesid
    ),
    median (salesid)
FROM
    sales
GROUP BY
    salesid,
    pricepaid;
Örnek
OLAP veri tabanlarında one-to-many olması beklenen ilişkiler bile çok fazla normalize edilmiş olabilir.
Normalization Nedir yazısına bakabilirsiniz.

Aşağıdaki örnekte User, Department isimli iki farlı tablo var. Daha sonra bu iki farklı tabloyu birleştiren UserDepartmentTable tablosu var.

UsersTable
UserID  FirstName   LastName
234     John        Doe
516     Jane        Doe
123     Foo         Bar
DepartmentsTable
DepartmentID   Name
1              Sales
2              HR
3              IT
UserDepartmentTable
UserDepartmentID   UserID   Department
1                  234      2
2                  516      2
3                  123      1
RequestTable
RequestID   UserID   <...>
1           516      blah
2           516      blah
3           234      blah


11 Mart 2015 Çarşamba

Veri Tabanı Şeması Tasarımı

Giriş
Veri tabanı şeması (DB Schema) hazırlamak için aldığım notlar aşağıda.

Referential Integrity
Bazı projelerde hız için referential integrity 'den var geçiliyor ve hiç foreign key kullanılmıyor. Bu durumda ilişkileri uygulamanın yönetmesi gerekiyor.

Surrogate Key
Açıklaması şöyle. Yani doğal birincil anahtardan (primary key) emin değilsek veya bu değer çok uzunsa, doğal birincil anahtar yerine kullanılabilir
A surrogate key is a form of primary key. There are two competing ideas among database administrators about how primary keys should be structured.

One philosophy is that of "natural keys". This philosophy says that when your data already has an unique identifier, use it.

The other philosophy is that of "surrogate keys". Adherents to this philosophy believe that natural keys are often either not as unique and immutable as you assume them to be, or much longer than required. So you should use an additional ID column as primary key for each table which contains auto-generated values which are guaranteed to be unique, also known as a surrogate key.

Which of those philosophies is correct is besides the point. But fact is that a column which isn't the primary key isn't a surrogate key...
Oracle'da sequence ile yapılır. Örneğin tüm alanlar primary key ise kullanılabilir.

Örnek
Kötü bir surrogate key örneği şöyle
My workplace's database has a pattern that I've not seen before. Every column that is intended to be a key, whether primary or foreign, ends in _SK. This is shorthand for "surrogate key". It appears to be an informal way to tell the developer that said column is safe to use in joins and won't have any type mismatches or unexpected behavior. For example, our table of dates has many columns that represent the date. DATE_PLAIN is the typical SQL DATE variable that shows data ISO style (e.g. 2022-10-30), DATE_VENDOR puts the date in the style that our vendor uses (e.g. 44300... their epoch is weird), and DATE_SK is always an INT that uses the familiar 20221030 format. By reading these column names, the pattern immediately tells the developer that DATE_SK is the one that you want for joins. He who uses either of the other two options in joins will run in to type mismatches and trouble (I learned that the hard way, e.g. our vendor inconsistently stored their dates as INT and DECIMAL).

This strikes me as a remarkably good idea, which raises the question of why I've not seen it before. Is it a known anti-pattern?

Weak Entity
Varlığı bir başka nesneye bağlı olan satırdır. Tanımı şöyledir.
1.  It is existence-dependent on another entity, i.e., 
    it cannot exist without the entity with which it has a relationship.

2.  It inherits at least part of it's primary key from the entity to which 
    it is related. 


    i.e. -> A weak entity's primary key must be a composite key that includes 
       the primary key of the entity on which it is existence-dependent.
Örneğin OrderItem , Order olmadan mevcut olamaz.

OLAP
OLTP vs OLAP yazısına taşıdım