18 Şubat 2026 Çarşamba

Data Models

Giriş
Yazıyı (10 Data Models Every Data Engineer Must Know (Before They Break Production)) ilk olarak burada gördüm.

10. 10. Star Schema: The Legacy Workhorse (That Fails at Scale)
Açıklaması şöyle.
Star schemas are intuitive and analyst-friendly, but at scale they become a performance bottleneck, especially with massive fact tables, high-cardinality dimensions, and near-real-time workloads.
9. Snowflake Schema: Over-Engineered & Slow
Açıklaması şöyle.
Snowflake schemas optimize storage, not query performance. In modern analytics (cloud OLAP, dashboards, ad-hoc queries), compute is the bottleneck, not disk. Excessive normalization explodes join depth and kills latency.
8. Data Vault: The Enterprise Monster (When You Need Auditability)
Açıklaması şöyle.
Data Vault excels at auditability, lineage, and full historization, critical for regulated industries (banking, healthcare). But its multi-layer architecture makes it fundamentally unsuited for low-latency analytics.
7. Wide-Column Stores (Cassandra, Bigtable) for Time-Series Chaos
Açıklaması şöyle. 
Wide-column databases dominate high-velocity ingest (IoT, metrics, logs) where writes never stop. But they sacrifice query flexibility, no joins, limited filtering, and rigid access patterns. You win on writes, lose on exploration.
6. Graph Models (Neo4j, TigerGraph) for Hidden Relationships
Açıklaması şöyle.
When insight lives in relationships (fraud rings, social influence, network hops), relational joins collapse under recursive depth. Graph databases treat relationships as first-class citizens, making multi-hop traversals fast and natural.
5. Streaming Event Sourcing (Kafka + CDC)
Açıklaması şöyle.
Batch ETL is fundamentally incompatible with real-time systems. CDC turns database mutations into immutable events, enabling near-zero-latency pipelines, replayable state, and system-wide consistency across microservices.
4. Columnar Storage (Parquet, Delta Lake) for Cheap, Fast Analytics
Parquet bir örnek
Açıklaması şöyle.
Row-based databases are optimized for point lookups, not scans. Analytics workloads read a few columns across billions of rows, exactly what columnar storage is built for. The result: orders-of-magnitude faster queries at a fraction of the cost.
Örnek
Şöyle yaparız
CREATE TABLE sales_parquet (
    order_id BIGINT,
    region   STRING,
    amount   DECIMAL(10,2),
    order_ts TIMESTAMP
)
USING PARQUET
PARTITIONED BY (region, order_date);

SELECT
    region,
    SUM(amount) AS total_sales
FROM sales_parquet
WHERE order_date = '2025-12-25'
  AND region = 'US'
GROUP BY region;
Açıklaması şöyle. 
Why this is fast
- Only amount and region columns are read
- Only the order_date=2025-12-25 and US partitions are scanned
- All other files are skipped entirely
3. Multi-Model Hybrids (When SQL + NoSQL Collide)
Açıklaması şöyle. Burada veri tabanının JSONB sütunları desteklemesi önemli
Real-world data is rarely one shape. Modern apps mix relational facts, semi-structured JSON, and relationships. Multi-model databases let you query everything in one place, without forcing awkward ETL or duplicating data.
2. Reverse ETL (Operational Analytics) to Put Data Back in Apps

1. The Unified Serving Layer (The Future of Production Data)
One dataset. Many engines. Zero rewrites. Açıklaması şöyle
Modern data stacks fracture data across OLTP, OLAP, search, and streaming systems, creating sync lag and duplicated logic. A Unified Serving Layer uses one logical data layer (Iceberg/Hudi/Delta) with multiple access modes: SQL analytics, near-real-time reads, ML, and even graph/search workloads.



7 Temmuz 2025 Pazartesi

Hot Row Contention

Giriş
Açıklaması şöyle. Yani aynı satıra çok fazla istek gelmesi ve bu isteklerin mecburen beklemesi
At its core, hot row contention arises from how databases manage concurrent data modifications. In a typical relational database (like MySQL, PostgreSQL, or SQL Server), when a transaction needs to update a row, it acquires an exclusive lock on that row. This lock prevents other transactions from modifying the same row simultaneously, ensuring data integrity and consistency (the “I” and “C” in ACID).

When many concurrent transactions converge on the exact same row — our “hot row”— they are forced to queue up, waiting for the current lock holder to finish.
Bazı Çözümler
1. Append-Only Ledger Model: Prioritizing Writes
2. Internal Sharding of Hot Accounts: Divide and Conquer
3. (In-Memory) Buffers and Batching: Absorbing the Spikes
çıklaması şöyle. Yani aynı satıra çok fazla istek gelmesi ve bu isteklerin mecburen beklemesi
This technique involves intercepting incoming transactions and temporarily holding them in a fast in-memory buffer or a dedicated caching system (like Redis) instead of writing each one directly to the main database. These buffered transactions are then flushed to the persistent database in larger, consolidated batches.
4. Event-Driven Architecture (CQRS): Ultimate Separation of Concerns
Açıklaması şöyle. Yani aynı satıra çok fazla istek gelmesi ve bu isteklerin mecburen beklemesi
This architectural pattern addresses contention by making the write path highly optimized for appending events, which is inherently less contentious. Read paths query dedicated data models that don’t compete with write operations. This separation allows write and read workloads to be scaled independently to a very high degree.
5. Before overhauling your architecture — Optimistic Locking (OCC)

12 Haziran 2025 Perşembe

TCP Handshake - Maximum Segment Size (MSS)

Maximum Segment Size (MSS)
MSS iki taraf arasındaki bağlantıda, bir IP paketine sığdırılabilecek en büyük TCP paketi anlamına gelir. MSS sadece TCP'de vardır. UDP'de yoktur. Açıklaması şöyle
.. and then there's TCP MSS, which helps in case of TCP, but of course not with UDP nor ICMP.

Using the MSS field in the TCP header (only in the SYN and SYN-ACK packets of the initial 3-way handshake), hosts can signal to their peers how large a TCP payload is acceptable to receive.

TCP MSS negotiation can be a blessing, but also a nuisance, as it helps to hide MTU problems until something with large UDP packets comes along and fails at the "all hosts on the common L2 segment need to use the same MTU" criterium
Her iki taraf ta MSS değerini Bildirir ve Küçük Olanı Kullanılır
Açıklaması şöyle 
both hosts announce the MSS independently in the SYN and the SYN/ACK packets and the smaller of the two is chosen for all segments exchanged during the entire duration of the connection.
Otomatik MSS Hesaplama
Dynamic Path MTU Discovery özelliği etkinse, IP seviyesinde en büyük MTU değeri biliniyor demektir. MSS bu MTU değeri kullanılarak hesaplanır.

Örnek
Örneğin Maximum Transmission Unit 1500 byte kabul edilirse, 20 byte IP ve 20 byte TCP zarflaması çıkarılırsa MSS 1460 byte olur.
MSS = 1500 - 20 - 20
MSS = 1460 bytes of TCP data

Kendi arayüzlerimizin MTU değerlerini öğrenmek için şöyle yaparız

MSS Değerini Kodla Atamak
Şöyle yaparız
int mss = 1200; // Desired MSS value
// Set MSS for the socket
if (setsockopt(sockfd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss)) < 0) {
  perror("Setting MSS failed");
  close(sockfd);
  return 1;
}
Kodla Atanırsa MSS Her Zaman MTU'da Küçük Olmalı
Açıklaması şöyle. MSS değeri MTU'dan küçük olmalı. Eğer MTU'dan büyük TCP paketleri kullanılırsak çok fazla fragmentation olur ve verim düşer.
TCP itself uses the MSS to determine the segment size, which should fit the MTU, but I have seen people do stupid things like set the MSS to much larger than the MTU (thinking it will increase the speed, but the effect is the opposite). That forces IP to create fragments prior to sending.
MSS Olarak 1460
Açıklaması şöyle. 1460 eskidendi artık bu değer 1448 oldu
The value 1460 was only common in the late 20th century because Ethernet was common, Ethernet frames have a standard 1500 byte payload capacity (which becomes the IP MTU), and IP and TCP headers were both 20 bytes long in those days. However, around the turn of the 21st century, networks had gotten fast enough that TCP needed to add the 12-byte TCP Timestamp option to protect against wrapped TCP sequence numbers, so typical TCP headers are 32 bytes long now, resulting in a typical 1448 byte TCP MSS on a standard 1500 byte MTU Ethernet network.
Diğer MSS Değerleri
Açıklaması şöyle
On networks with higher path MTUs than 1500 (example: data center networks that use nonstandard 6k or 9k jumbo Ethernet frames), the MSS will be larger. On networks with lower path MTUs than 1500 (example: PPPoE, common on DSL, has 8 additional bytes of overhead for an MTU of 1492), the MSS will be lower.
En Büyük MSS Değeri
Açıklaması şöyle. IP zarfındaki alanını alabileceği en büyük değer 65,535.
The Total Length field in the IP header is 16 bit and thus an IP packet (and therefore TCP packet) can not be larger than 65535 bytes. The TCP payload is actually even smaller since you have to subtract the TCP header from maximum packet size of the IP packet. 
IP zarfındaki bu değer kullanılan TCP zarfına göre biraz daha küçülüp 
65,495 - 40 byte daha küçük
veya
65,483 - 52 byte daha küçük olabiliyor.  Açıklaması şöyle
IPv4's max datagram size (the largest MTU it can fill up) is 2^16 bytes (i.e. 64KiB or 65535 bytes). So the max TCP MSS by today's standards is 65,483 bytes with TCP timestamps on, or 65,495 with them disabled.

3 Haziran 2025 Salı

ISO 9001

Giriş
Bir çok firma ISO 9001:2000 belgesi alıyor. 

ISO 9001 sadece yapılan işi tarif eden bir süreç olmasını gerektiriyor. İş için kullanılan araçlarla ilgilenmiyor. CMMI ile kıyaslanınca daha yüzeysel.

Bir başka açıklama şöyle
While ISO 9001 and the CMMI for Development provide road maps of good quality practice, the IEEE software and systems engineering standards provide more detailed "how-to" information and guidance.

18 Nisan 2025 Cuma

Modular Multiplicative Inverse

Giriş
Açıklaması şöyle
1. M seçimi : Pick a modulus M, which should be one more than the maximum value the field can hold.
In this case, since the max is 255, we choose M = 256.
2. P seçimi : Pick a number P that’s coprime with M (i.e., they share no common factors except 1).
Let’s go with P = 9.
3. Q seçimi : Now, a number Q such that (P × Q) mod M = 1. Q = 57
Örnek
encoded_value = (original_value * P) % M
original_value = (encoded_value * Q) % M

195 değeri için
219= (195 * 9) % 256
195 = (219 * 57) % 256

7 Ocak 2025 Salı

aws ce - Cost Expolorer Seçeneği

Örnek
Şöyle yaparız
aws ce get-reservation-utilization
Açıklaması şöyle
In AWS, use Cost Explorer to view your reserved instance utilization and identify opportunities for optimization:

10 Aralık 2024 Salı

RocksDB - Embedded Database

WAL
Açıklaması şöyle
By default, RocksDB stores all the writes in a WAL along with the memtable. We turned the WAL off given that our use case was self-healing in nature and no data could have been lost.
multiGet
Açıklaması şöyle
RocksDB gives various ways to read data from the DB. You could either fire a get() command or a multiGet() command. multiGet() is more efficient than multiple get() calls in a loop for several reasons such as lesser thread contention on filter/index cache, lesser number of internal method calls, and better parallelization on the IO for different data blocks.
Autocloseable
Açıklaması şöyle
Every class of RocksDB implements an Autocloseable either directly or indirectly. You need to call the close() on RocksDB’s java objects explicitly (or use try-with-resources) whenever you are done using them to release the actual memory held by RocksDB’s C++ objects. Failure to do so can lead to memory leaks.