Thursday, April 9, 2026

Can a Distributed Lock Be the Source of Truth?

Introduction
The question is as follows
You have a distributed lock to prevent two users from booking the same hotel room.

Lock expires in 5 seconds. Your DB write takes 6 seconds under load.

Two users got confirmed bookings for the same room. How? What is the process to fix this issue?
What really needs attention here is this:
Lock ≠ correctness.
If your DB allows duplicates, your system will eventually produce them.
The real fix lives in atomic writes + constraints, not just distributed locks.
In other words, the lock is really there to avoid starting the operation in the first place. If two operations do start, one of them must fail.

The explanation is as follows
This is a correctness question. And at the Senior to Principal level, this is exactly what interviewers are testing for: do you understand the difference between coordination and actual data integrity?

If you are preparing for system design interviews right now, this is the kind of failure-mode thinking that matters a lot in strong loops.

Now, let us break this one down properly.

[1] How did both users get confirmed bookings?

The timeline usually looks like this:

- User A acquires the distributed lock for Room 101
- Lock lease is valid for 5 seconds
- User A starts the DB write to mark the room as booked
- Under load, that DB write takes 6 seconds
- At second 5, the lock expires before User A finishes
- User B now acquires the same lock because the lock service thinks it is free
- User B also starts a booking write
- Both flows eventually return success, and both users get confirmations

So what actually failed here? The system assumed the distributed lock was the source of truth. A lease-based lock only gives you temporary coordination.

If the critical section takes longer than the lease, another actor can enter while the first one is still working.

[2] The deeper bug is usually not the lock itself

A lot of candidates stop at “increase the lock timeout.” That is not the real fix. The deeper issue is that your final correctness guarantee is missing at the database layer.

Because even if the lock expires, the database should still protect the invariant: “Only one valid booking can exist for this room for this date range.”

If both writes succeeded, it usually means one of these is true:
- no proper uniqueness or exclusion constraint existed
- booking availability was checked outside the final transaction
- writes were not serialized with row-level locking
- confirmation was sent before durable conflict detection finished

The lock helped reduce contention.
But the DB failed to enforce correctness.

[3] What is the right process to fix it

I would fix this in 4 steps.

1. Reconstruct the exact race
Check lock acquire time, lock expiry time, DB commit time, and confirmation event time for both users.

2. Move the invariant to the database

For hotel booking, correctness should be enforced with transactional logic such as:
- row-level locking on the inventory row
- atomic reserve-if-available update
- or exclusion/uniqueness constraints depending on data model

3. Treat the distributed lock as an optimization.
It can reduce hot contention, but it should never be the only thing preventing double booking.

4. Fix the confirmation path
Only send “booking confirmed” after the transaction commits successfully and conflict checks have passed.
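
As a rough illustration of step 2, here is a minimal in-memory sketch of an atomic reserve-if-available update. The class and method names are hypothetical, and a ConcurrentHashMap stands in for the database table; in a real system the same idea would be a conditional UPDATE whose affected-row count decides who wins.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for a rooms table (hypothetical names, not a real schema).
class RoomInventory {
    enum Status { AVAILABLE, BOOKED }

    private final ConcurrentMap<String, Status> rooms = new ConcurrentHashMap<>();

    void addRoom(String roomId) {
        rooms.put(roomId, Status.AVAILABLE);
    }

    // Atomic reserve-if-available: at most one caller can flip AVAILABLE to BOOKED.
    // SQL analogue (assumed schema):
    //   UPDATE rooms SET status = 'BOOKED' WHERE id = ? AND status = 'AVAILABLE'
    //   and treat "1 row affected" as success.
    boolean reserve(String roomId) {
        return rooms.replace(roomId, Status.AVAILABLE, Status.BOOKED);
    }

    public static void main(String[] args) {
        RoomInventory inventory = new RoomInventory();
        inventory.addRoom("101");

        // User A's lock lease has expired, so User B is also inside the critical section.
        boolean userA = inventory.reserve("101");
        boolean userB = inventory.reserve("101");

        System.out.println("User A confirmed: " + userA); // true
        System.out.println("User B confirmed: " + userB); // false
    }
}
```

Even when both users get past the lock, only one reservation succeeds, so the confirmation can be tied to the return value rather than to lock ownership.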

[4] If you still want to use distributed locks, do it safely

If a distributed lock stays in the design, I would add:
- lease renewal or heartbeats for long critical sections
- fencing tokens so stale lock holders cannot keep writing
- alerts when p99 DB latency gets too close to lock TTL
- idempotency keys so retries do not create duplicate booking flows

A good rule of thumb is simple: If your lock TTL is 5 seconds and your write path can take 6 seconds under load, your design is already telling you it is unsafe.
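
The alerting idea and the rule of thumb above can be sketched as a simple check. The method name and the marginFactor parameter are assumptions for illustration; the right margin depends on the system.

```java
import java.time.Duration;

class LockTtlCheck {
    // Returns true when the lock TTL is shorter than p99 write latency times marginFactor.
    // marginFactor is an assumed tuning knob: 2.0 means "TTL should be at least 2x p99".
    static boolean isTtlUnsafe(Duration p99WriteLatency, Duration lockTtl, double marginFactor) {
        double requiredMillis = p99WriteLatency.toMillis() * marginFactor;
        return lockTtl.toMillis() < requiredMillis;
    }

    public static void main(String[] args) {
        // The scenario from the question: 5 second TTL, 6 second write under load.
        System.out.println(isTtlUnsafe(Duration.ofSeconds(6), Duration.ofSeconds(5), 2.0)); // true
        // A healthier ratio: 1 second write, 5 second TTL.
        System.out.println(isTtlUnsafe(Duration.ofSeconds(1), Duration.ofSeconds(5), 2.0)); // false
    }
}
```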

Wednesday, April 8, 2026

Correlation Id vs Trace Id

Introduction
The explanation is as follows
I often noticed that some developers do not really understand the difference between traceId and correlationId. I saw this so often that I decided to write this post.

At first they look similar.
Both are IDs.
Both appear in logs.
Both help during incidents.

But they answer different questions.

traceId answers:
"How did this specific execution path go through the system?"

correlationId answers:
"Which logs and events belong to the same business story?"

That difference becomes obvious once async enters the picture 

Example:

A user places an order.

The system does this:

1. Order Service creates the order
2. Payment Service charges the card
3. Kafka event is published
4. Billing Worker creates invoice
5. Email Service sends confirmation

Now imagine the logs:

Order created
correlationId=ORDER-8472
traceId=T1

Payment charged
correlationId=ORDER-8472
traceId=T1

Billing started from Kafka consumer
correlationId=ORDER-8472
traceId=T2

Email sending failed
correlationId=ORDER-8472
traceId=T3

This is the key point 

One correlationId
Multiple traceIds

Why?

Because the business flow is one.
But the technical executions are split.

The HTTP request is one execution.
Kafka consumer is another.
Retry later can be another.
Email worker can be another too.

So:

correlationId helps you reconstruct the whole story.
traceId helps you inspect one exact path in detail.

That is why using correlationId instead of tracing is a mistake.
You may connect logs, but you still do not get spans, timing hierarchy, or where exactly latency exploded.

And using only traceId is also not enough.
In distributed async systems, tracing often shows fragments. Correlation is what lets you stitch them back together 🧩

How I usually use them during incidents:

1. Start with correlationId
Find everything related to the same order, job, or user flow.

2. Then drill into traceId
Open the exact failing execution and inspect where it slowed down or broke.
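
The two incident steps can be sketched over structured log records. The LogEntry record and the storyByTrace helper are hypothetical; a real setup would run the equivalent query in a log search tool and then open the trace in a tracing backend.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class IncidentLogSearch {
    // Hypothetical structured log record carrying both IDs.
    record LogEntry(String message, String correlationId, String traceId) {}

    // Step 1: collect the whole business story by correlationId,
    // grouped by traceId so each technical execution stays separate.
    static Map<String, List<LogEntry>> storyByTrace(List<LogEntry> logs, String correlationId) {
        return logs.stream()
                .filter(e -> e.correlationId().equals(correlationId))
                .collect(Collectors.groupingBy(
                        LogEntry::traceId, LinkedHashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<LogEntry> logs = List.of(
                new LogEntry("Order created", "ORDER-8472", "T1"),
                new LogEntry("Payment charged", "ORDER-8472", "T1"),
                new LogEntry("Billing started from Kafka consumer", "ORDER-8472", "T2"),
                new LogEntry("Email sending failed", "ORDER-8472", "T3"),
                new LogEntry("Order created", "ORDER-9000", "T4"));

        Map<String, List<LogEntry>> story = storyByTrace(logs, "ORDER-8472");
        System.out.println(story.keySet()); // [T1, T2, T3]

        // Step 2: drill into the one execution that failed.
        System.out.println(story.get("T3"));
    }
}
```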

Simple version:

traceId = the path
correlationId = the story

Have you seen teams mix these two and then realize the difference only during a production incident? 

Fencing Tokens

Introduction
The explanation is as follows
Distributed systems concept: Fencing Tokens
You designed a fancy distributed locking algorithm just to find that an old primary is able to overwrite data!

The problem:
- Node A holds the lock, and is doing some work.
- Node A gets disconnected, becomes unresponsive, or crashes, and only resumes execution after its lease has expired (in "true" time).
- Node B, in the meantime, acquired the lock and wrote some data.
- Node A resumes execution, thinking its lock is still valid.
- Node A overwrites the data written by Node B, even though it doesn't hold the lock anymore.

That's where fencing tokens come in: when a node acquires the lock, it gets a token with a monotonically increasing number. When the node tries to write data, it must include the token. If the token is outdated (i.e., lower than the current token), the write is rejected, preventing stale nodes from overwriting newer data.

Fencing tokens are used in a variety of systems, like etcd.

The big takeaway is that you can't rely on just the client to know whether it still holds the lock. The target resource must have a gating mechanism to verify that the request makes sense.
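
A minimal sketch of that gating mechanism, with hypothetical LockService and FencedStore classes. The lock service here only hands out increasing tokens and does not model leases; the point is the check on the resource side.

```java
import java.util.concurrent.atomic.AtomicLong;

class FencingDemo {
    // Hypothetical lock service: every successful acquire returns a strictly larger token.
    static class LockService {
        private final AtomicLong counter = new AtomicLong();

        long acquire() {
            return counter.incrementAndGet();
        }
    }

    // The protected resource remembers the highest token it has seen
    // and rejects writes that carry an older one.
    static class FencedStore {
        private long highestToken = 0;
        private String value;

        synchronized boolean write(long token, String newValue) {
            if (token < highestToken) {
                return false; // stale lock holder, reject
            }
            highestToken = token;
            value = newValue;
            return true;
        }

        synchronized String read() {
            return value;
        }
    }

    public static void main(String[] args) {
        LockService locks = new LockService();
        FencedStore store = new FencedStore();

        long tokenA = locks.acquire(); // Node A acquires the lock, then stalls
        long tokenB = locks.acquire(); // lease expired, Node B acquires it

        System.out.println(store.write(tokenB, "written by B")); // true
        System.out.println(store.write(tokenA, "written by A")); // false
        System.out.println(store.read());                        // written by B
    }
}
```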


JSON Web Token - JWT and Immediate Logout

Introduction
If we run fully stateless, immediate logout is not possible. However, if we add a little state on the server side, some solutions become available.

1. Short-lived access tokens
- Keep access tokens valid for 5 to 15 minutes
- This limits the damage window
- Very common and simple

2. Refresh token revocation
- Store refresh tokens in DB or Redis
- On logout, delete or mark them revoked
- This is the most common real-world pattern

3. Token blacklist / denylist
- Store revoked JWT IDs or token hashes until they expire
- Check this list on every request
- Useful for high-risk logout or compromised accounts
- But now auth is no longer fully stateless

4. Token versioning
- Store a tokenVersion or sessionVersion on the user record
- Include that version in the JWT
- On logout-all-devices or password reset, increment the version
- Old tokens stop working once the version mismatches
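
Option 4 can be sketched without any JWT library. The Claims record below stands for claims that have already been decoded and signature-checked elsewhere, and the map stands in for the tokenVersion column on the user record; all names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TokenVersionCheck {
    // Hypothetical, already-verified JWT claims (signature and expiry checks assumed done).
    record Claims(String userId, int tokenVersion) {}

    // Stand-in for the tokenVersion column on the user record.
    private final Map<String, Integer> currentVersion = new ConcurrentHashMap<>();

    // Issue claims that embed the user's current version.
    Claims issue(String userId) {
        int version = currentVersion.computeIfAbsent(userId, id -> 0);
        return new Claims(userId, version);
    }

    // Logout-all-devices or password reset: bump the version.
    void logoutAllDevices(String userId) {
        currentVersion.merge(userId, 1, Integer::sum);
    }

    // A token is accepted only while its embedded version matches the stored one.
    boolean isAccepted(Claims claims) {
        Integer stored = currentVersion.get(claims.userId());
        return stored != null && stored == claims.tokenVersion();
    }

    public static void main(String[] args) {
        TokenVersionCheck auth = new TokenVersionCheck();
        Claims oldToken = auth.issue("alice");
        System.out.println(auth.isAccepted(oldToken)); // true

        auth.logoutAllDevices("alice");
        System.out.println(auth.isAccepted(oldToken)); // false

        Claims newToken = auth.issue("alice");
        System.out.println(auth.isAccepted(newToken)); // true
    }
}
```

The cost is one lookup of the stored version per request, which is the small piece of server-side state this approach trades for immediate revocation.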

Thursday, March 26, 2026

Software Architecture - Idempotency and Phantom Write

Introduction
The explanation is as follows
You typically implement idempotency like this:
  1. Check if request already processed (via key / timestamp / PK)
  2. If not → write data
  3. If yes → skip
If the check is not atomic, problems arise.

Failure Mode 1: The TTL Expiry Trap
The explanation is as follows
The most common idempotency implementation stores a request key with a time-to-live (TTL) — typically 24 or 48 hours. The assumption is that any duplicate will arrive within that window. In practice, this assumption frequently breaks.
The explanation is as follows
The fix: Never use TTL-only idempotency for operations with unbounded retry windows. Instead, use a database-backed idempotency store with a three-state model (IN_PROGRESS, COMPLETED, FAILED) where the expires_at column drives a cleanup job for storage management — not correctness. The cleanup window should be set significantly longer than your worst-case replay window (7 days minimum for Kafka-based systems).
Failure Mode 2: The Partial Execution Ghost
The explanation is as follows
A request arrives, the system writes the idempotency key with status IN_PROGRESS, begins processing, writes half the data, and crashes — JVM OOM, container eviction, network partition. The idempotency key is now in IN_PROGRESS state. When the retry arrives, the system faces an impossible decision: did the original operation complete or not?
The explanation is as follows
The fix: Wrap both the business logic and the idempotency state transition in a single database transaction. If the transaction rolls back, both the business data and the idempotency status roll back together. For stale IN_PROGRESS keys (where the original processor is likely dead), use a configurable timeout threshold to reclaim and re-execute safely.
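The stale-key reclaim rule could look roughly like the sketch below. The row shape, the updatedAt column, and the staleAfter threshold are assumptions; in a real store the reclaim itself would also need to be an atomic conditional update so that two retries cannot both take over the key.

```java
import java.time.Duration;
import java.time.Instant;

class StaleKeyPolicy {
    enum Status { IN_PROGRESS, COMPLETED, FAILED }

    // Hypothetical shape of one row in the idempotency store.
    record IdempotencyRow(String key, Status status, Instant updatedAt) {}

    // An IN_PROGRESS key is reclaimable only after it has been idle longer than staleAfter.
    static boolean canReclaim(IdempotencyRow row, Instant now, Duration staleAfter) {
        return row.status() == Status.IN_PROGRESS
                && Duration.between(row.updatedAt(), now).compareTo(staleAfter) > 0;
    }

    public static void main(String[] args) {
        Instant started = Instant.parse("2026-03-26T10:00:00Z");
        IdempotencyRow row = new IdempotencyRow("req-42", Status.IN_PROGRESS, started);
        Duration staleAfter = Duration.ofMinutes(5);

        System.out.println(canReclaim(row, started.plus(Duration.ofMinutes(2)), staleAfter));  // false
        System.out.println(canReclaim(row, started.plus(Duration.ofMinutes(10)), staleAfter)); // true
    }
}
```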
Failure Mode 3: The Concurrent Check Race
Here the check condition is not atomic. The explanation is as follows
The fix: Use INSERT ... ON CONFLICT DO NOTHING (PostgreSQL 9.5+) to make the check-and-claim atomic. If the RETURNING clause yields no rows, the key already existed — fetch its status with SELECT ... FOR UPDATE. For non-blocking behavior, SELECT ... FOR UPDATE SKIP LOCKED lets the second instance return 409 Conflict immediately rather than waiting.
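The same check-and-claim idea can be shown in memory, with ConcurrentHashMap.putIfAbsent playing the role of INSERT ... ON CONFLICT DO NOTHING. This is only an analogy with hypothetical names, not a replacement for the database barrier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class IdempotencyBarrier {
    private final ConcurrentMap<String, String> statusByKey = new ConcurrentHashMap<>();

    // Atomic check-and-claim: putIfAbsent returns null only for the single winner.
    boolean claim(String key) {
        return statusByKey.putIfAbsent(key, "IN_PROGRESS") == null;
    }

    // Fires N concurrent duplicates of the same request and counts how many win the claim.
    static int countWinners(int threads) {
        IdempotencyBarrier barrier = new IdempotencyBarrier();
        AtomicInteger winners = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                if (barrier.claim("req-42")) {
                    winners.incrementAndGet();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("winners: " + countWinners(8)); // winners: 1
    }
}
```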
Failure Mode 4: The Layer Mismatch
The explanation is as follows
The fix: Propagate a correlation ID from the original request as a Kafka header, and have every downstream consumer enforce its own idempotency barrier using that ID as the deduplication key.
Spring Boot + SQL Server
The code is as follows. Here:
- Partial Execution is solved with a single transaction.
- The Concurrent Check Race is solved with DuplicateKeyException. If we were using Postgres, we would check how many rows the SQL statement changed instead of relying on an exception.
- The Layer Mismatch problem is solved with the outbox pattern.
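import lombok.RequiredArgsConstructor;
import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service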
@RequiredArgsConstructor
public class IdempotentService {
  private final JdbcTemplate jdbc;
  public record Response(String result) {}

  @Transactional
  public Response handleRequest(String idempotencyKey, String payload) {
    try {
      // Attempt barrier insert (atomic)
      // SQL Server:
      // INSERT INTO idempotency_table (idempotency_key, status)
      // VALUES (?, 'IN_PROGRESS')
      jdbc.update(
        "INSERT INTO idempotency_table (idempotency_key, status) VALUES (?, 'IN_PROGRESS')",
        idempotencyKey
      );

      // First request owns the key → perform business logic
      String result = doBusinessLogic(payload);

      // Insert into outbox for async processing
      // SQL Server:
      // INSERT INTO outbox_table (idempotency_key, payload) VALUES (?, ?)
      jdbc.update(
        "INSERT INTO outbox_table (idempotency_key, payload) VALUES (?, ?)",
        idempotencyKey, result
      );

      // Mark barrier as completed and store result
      // SQL Server:
      // UPDATE idempotency_table SET status='COMPLETED', response=? WHERE idempotency_key=?
      jdbc.update(
        "UPDATE idempotency_table SET status='COMPLETED', response=? WHERE idempotency_key=?",
        result, idempotencyKey
      );
      return new Response(result);
    } catch (DuplicateKeyException ex) {
      // Barrier row already exists → handle duplicate
      // SQL Server:
      // SELECT * FROM idempotency_table WITH (UPDLOCK, ROWLOCK) WHERE idempotency_key=?
      IdempotencyRecord record = jdbc.queryForObject(
        "SELECT status, response FROM idempotency_table WITH (UPDLOCK, ROWLOCK) WHERE idempotency_key=?",
        (rs, rowNum) -> new IdempotencyRecord(rs.getString("status"), rs.getString("response")),
        idempotencyKey
      );

      switch (record.status) {
        case "COMPLETED":
          // Return cached result
          return new Response(record.response);
        case "IN_PROGRESS":
          // Someone else is working → can wait or throw 409
          throw new IllegalStateException("Request is already in progress");
        case "FAILED":
          // Previous attempt failed → allow retry
          throw new IllegalStateException("Previous attempt failed, safe to retry");
        default:
          throw new IllegalStateException("Unknown barrier state: " + record.status);
      }
    }
  }

  private String doBusinessLogic(String payload) {
    // your domain logic here
    return "processed:" + payload;
  }

  private static class IdempotencyRecord {
      final String status;
      final String response;
      IdempotencyRecord(String status, String response) {
        this.status = status;
        this.response = response;
      }
  }
}
If we want it to work for both SQL Server and Postgres, we do it like this
    
    
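import lombok.RequiredArgsConstructor;
import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service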
@RequiredArgsConstructor
public class IdempotentService {

    private final JdbcTemplate jdbc;

    public record Response(String result) {}

    @Transactional
    public Response handleRequest(String idempotencyKey, String payload) {
        boolean isWinner = false;

        try {
            // --------------------------
            // Attempt atomic barrier insert
            // --------------------------
            // Postgres:
            // INSERT INTO idempotency_table (idempotency_key, status)
            // VALUES (?, 'IN_PROGRESS')
            // ON CONFLICT DO NOTHING
            //
            // SQL Server:
            // INSERT INTO idempotency_table (idempotency_key, status)
            // VALUES (?, 'IN_PROGRESS')
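            // Postgres needs ON CONFLICT DO NOTHING: a failed INSERT would abort the transaction
            String insertSql =
                    "INSERT INTO idempotency_table (idempotency_key, status) VALUES (?, 'IN_PROGRESS')"
                            + (isPostgres() ? " ON CONFLICT DO NOTHING" : "");
            int rows = jdbc.update(insertSql, idempotencyKey);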

            // Postgres: rows == 1 → winner
            // SQL Server: INSERT succeeded → winner
            isWinner = rows == 1;

        } catch (DuplicateKeyException ex) {
            // SQL Server only: duplicate → loser
            isWinner = false;
        }

        if (isWinner) {
            // --------------------------
            // Winner executes business logic
            // --------------------------
            String result = doBusinessLogic(payload);

            // Insert into outbox (side effect)
            // INSERT INTO outbox_table (idempotency_key, payload) VALUES (?, ?)
            jdbc.update(
                    "INSERT INTO outbox_table (idempotency_key, payload) VALUES (?, ?)",
                    idempotencyKey, result
            );

            // Mark barrier as completed + store response
            // UPDATE idempotency_table SET status='COMPLETED', response=? WHERE idempotency_key=?
            jdbc.update(
                    "UPDATE idempotency_table SET status='COMPLETED', response=? WHERE idempotency_key=?",
                    result, idempotencyKey
            );

            return new Response(result);
        } else {
            // --------------------------
            // Loser reads existing row safely
            // --------------------------
            // SQL Server: SELECT ... WITH (UPDLOCK, ROWLOCK) WHERE idempotency_key=?
            // Postgres: SELECT * FROM idempotency_table WHERE idempotency_key=?
            IdempotencyRecord record = jdbc.queryForObject(
                    "SELECT status, response FROM idempotency_table " +
                            (isPostgres() ? "" : "WITH (UPDLOCK, ROWLOCK) ") +
                            "WHERE idempotency_key=?",
                    (rs, rowNum) -> new IdempotencyRecord(rs.getString("status"), rs.getString("response")),
                    idempotencyKey
            );

            switch (record.status) {
                case "COMPLETED":
                    return new Response(record.response);
                case "IN_PROGRESS":
                    throw new IllegalStateException("Request already in progress");
                case "FAILED":
                    throw new IllegalStateException("Previous attempt failed, safe to retry");
                default:
                    throw new IllegalStateException("Unknown barrier state: " + record.status);
            }
        }
    }

    private boolean isPostgres() {
        // Detect DB type from DataSource or JdbcTemplate if needed
        return true; // placeholder, implement detection
    }

    private String doBusinessLogic(String payload) {
        return "processed:" + payload;
    }

    private static class IdempotencyRecord {
        final String status;
        final String response;

        IdempotencyRecord(String status, String response) {
            this.status = status;
            this.response = response;
        }
    }
}


Wednesday, March 25, 2026

Claude

Introduction
There is an example here. Structurally it looks like this



1. The CLAUDE.md File
The main control file. For example:
- Never use the main branch

2. The CLAUDE.local.md File
The explanation is as follows.
CLAUDE.local.md is useful for notes you do not want to commit but still want to apply in the current project.

3. Subdirectories
The explanation is as follows
- CLAUDE.md files inside subdirectories are not all loaded up front, but only when Claude Code actually reads content from those directories
- When multiple CLAUDE.md files are active at the same time, a nearest-scope rule usually applies, meaning instructions closer to the current task and narrower in scope take priority
- Within the same layer, rules that are more explicit and more specific are also more likely to be followed consistently than vague general statements
4. The .claude Directory

4.1 .claude/commands
Automates repetitive tasks

4.2 .claude/rules
Project rules (tests, naming, etc.)

Commands
/init
Creates the initial CLAUDE.md file.

/reflection for Regular Retrospectives
The explanation is as follows
At the end of each session, you can ask Claude Code to summarize what from that round of collaboration is worth adding to CLAUDE.md, and then turn those points into more stable project rules.
/skill-creator
The explanation is as follows.
A skill isn't a prompt. You don't type it. You build it once, describe what it does and when to use it, and Claude recognises when to fire it on its own. The right context appears, the skill runs. You do nothing.
We use this command to set up a custom skill. The explanation is as follows.
You describe what you need, it helps you draft the skill, then runs a test (one session with the skill, one without) and opens a browser window so you can compare the results. Then it optimises automatically based on your feedback so the skill triggers when it should.

Monday, March 23, 2026

Cache Strategies Presentation

Summary

  • In real systems:
    • 80% → Cache-Aside + Eviction
    • High-scale → Add these:
      • Stampede protection
      • Two-level cache
      • Event invalidation
  • Spring mainly supports:
    • Cache-Aside (natively)
    • Partial Write-Through
    • Eviction patterns
  • @Cacheable, @CachePut, @CacheEvict are mainly Cache-Aside tools
  • Advanced patterns require custom logic or cache provider features
  • High-scale systems often combine:
    • Cache-Aside + Eviction
    • Two-Level Cache
    • Stampede Protection
    • Event-Driven Invalidation
  • Spring annotations alone are not enough for advanced caching—you end up:
    • Using Caffeine / Redis features directly
    • Or writing custom cache layers

Read-Heavy Strategies

  • Cache-Aside - Implemented by App
  • Read-Through - Implemented by Cache Provider
  • Refresh-Ahead - Implemented by Cache Provider

Write-Heavy Strategies

  • Write-Through - Implemented by Cache Provider
  • Write-Behind (aka Write-Back) - Implemented by Cache Provider
  • Write-Around - Implemented by App

1. Cache-Aside (Lazy Loading)

App reads from cache → if miss → load from DB → put in cache. Cache is not responsible for loading; application does it.

@Service
public class UserService {
    @Cacheable(value = "users", key = "#id")
    public User getUser(Long id) {
        return userRepository.findById(id)
                .orElseThrow();
    }
}

2. Write-Through

Write goes to cache and DB synchronously. Cache always up-to-date.

@CachePut(value = "users", key = "#user.id")
public User saveUser(User user) {
    return userRepository.save(user);
}

3. Read-Through

Cache itself loads data (app doesn’t call DB directly). App only talks to cache provider. Cache abstracts loading logic. Provider like Hazelcast / Redis with loader.

4. Write-Behind

Write goes to cache → DB updated asynchronously later. Very fast writes.

public void saveUser(User user) {
    cache.put(user.getId(), user);

    asyncExecutor.submit(() -> {
        userRepository.save(user);
    });
}

5. Refresh-Ahead

Cache refreshes entries before expiration to avoid cache miss spikes. Not supported via Spring annotations.

Caffeine.newBuilder()
    .refreshAfterWrite(Duration.ofMinutes(5))
    .build(key -> loadFromDb(key));

6. Cache Eviction / Invalidation

Explicitly remove/update cache when data changes.

@CacheEvict(value = "users", key = "#id")
public void deleteUser(Long id) {
    userRepository.deleteById(id);
}

7. Write-Around

Writes go directly to DB, cache updated only on read. Prevents cache from being updated on writes. Cache becomes stale by design. Relies on future reads to populate.

@Service
public class OrderService {

    @Autowired
    private OrderRepository orderRepository;

    @Autowired
    private CacheManager cacheManager;

    public void createOrder(Order order) {
        orderRepository.save(order); // cache not updated
    }

    @Cacheable(value = "userOrders", key = "#userId")
    public List<Order> getOrdersForUser(Long userId) {
        return orderRepository.findByUserId(userId);
    }
}

8. Negative Caching Control

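Cache “not found” results. Example: user not found → cache null. Prevents repeated DB hits. Key insight: unless="#result == null" is the switch that controls this. With it, null results are excluded from the cache, so the method below does not cache misses; remove the condition (and make sure the cache provider allows null values) to cache them.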

@Cacheable(value = "users", key = "#id", unless = "#result == null")
public User getUser(Long id) {
    return userRepository.findById(id).orElse(null);
}

9. Two-Level Cache

L1 (in-memory) + L2 (distributed like Redis). L1: Caffeine, L2: Redis. Must combine manually.
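
A minimal sketch of the manual combination, using plain maps as stand-ins: a per-instance map plays Caffeine (L1) and a shared map plays Redis (L2). The TwoLevelCache class and its loader are hypothetical, and TTLs and serialization are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class TwoLevelCache<K, V> {
    private final Map<K, V> l1 = new ConcurrentHashMap<>(); // stand-in for Caffeine (per instance)
    private final Map<K, V> l2;                             // stand-in for Redis (shared)
    private final Function<K, V> loader;                    // e.g. a DB lookup

    TwoLevelCache(Map<K, V> sharedL2, Function<K, V> loader) {
        this.l2 = sharedL2;
        this.loader = loader;
    }

    V get(K key) {
        V value = l1.get(key);          // 1. local memory
        if (value != null) {
            return value;
        }
        value = l2.get(key);            // 2. shared cache
        if (value == null) {
            value = loader.apply(key);  // 3. source of truth
            if (value == null) {
                return null;
            }
            l2.put(key, value);
        }
        l1.put(key, value);             // promote to L1 for the next read
        return value;
    }

    // Removes the entry from L2 and from this instance's L1 only;
    // other instances' L1 copies need their own invalidation signal.
    void evict(K key) {
        l2.remove(key);
        l1.remove(key);
    }

    public static void main(String[] args) {
        Map<Long, String> sharedL2 = new ConcurrentHashMap<>();
        AtomicInteger dbLoads = new AtomicInteger();
        Function<Long, String> db = id -> {
            dbLoads.incrementAndGet();
            return "user-" + id;
        };
        TwoLevelCache<Long, String> nodeA = new TwoLevelCache<>(sharedL2, db);
        TwoLevelCache<Long, String> nodeB = new TwoLevelCache<>(sharedL2, db);

        nodeA.get(7L); // miss in L1 and L2, loads from the DB
        nodeB.get(7L); // miss in L1, hit in L2
        System.out.println("DB loads: " + dbLoads.get()); // DB loads: 1
    }
}
```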

10. Cache Stampede Protection

Prevent many threads from hitting DB on same miss. Only one thread fetches DB; others wait or use cache.
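
One way to get this "only one thread loads" behavior in a single JVM is ConcurrentHashMap.computeIfAbsent, which applies the loader at most once per key while other callers for that key wait. The SingleFlightCache class below is a hypothetical sketch; with Spring annotations, @Cacheable(sync = true) is the closest built-in option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class SingleFlightCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SingleFlightCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // computeIfAbsent is atomic per key: the loader runs at most once for a missing key,
    // and concurrent callers for that key block until the value is present.
    // (If the loader returns null, nothing is cached and the next call loads again.)
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    // Simulates many threads missing on the same key and counts the DB calls.
    static int loadsUnderContention(int threads) {
        AtomicInteger dbCalls = new AtomicInteger();
        SingleFlightCache<Long, String> cache = new SingleFlightCache<>(id -> {
            dbCalls.incrementAndGet(); // pretend this is an expensive DB query
            return "user-" + id;
        });
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> cache.get(1L));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return dbCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("DB calls: " + loadsUnderContention(16)); // DB calls: 1
    }
}
```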

11. Read-Repair

If stale data detected → fix cache during read. Not supported via @Cacheable.

public User getUser(Long id) {
    User cached = cache.get(id);

    if (cached != null && isStale(cached)) {
        User fresh = userRepository.findById(id).orElse(null);
        cache.put(id, fresh); // repair
        return fresh;
    }

    if (cached != null) {
        return cached;
    }

    User fresh = userRepository.findById(id).orElse(null);
    cache.put(id, fresh);
    return fresh;
}

12. Event-Driven Cache Invalidation

Use events (Kafka, etc.) to invalidate/update cache entries.
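
A minimal sketch of the invalidation side, with the event transport left out. The UserChanged record and UserCache class are hypothetical; in a real system onUserChanged would be called from whatever consumes the event stream, for example a Kafka listener.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class EventInvalidation {
    // Hypothetical change event, e.g. the payload of a Kafka message.
    record UserChanged(long userId) {}

    static class UserCache {
        private final Map<Long, String> entries = new ConcurrentHashMap<>();

        void put(long id, String name) {
            entries.put(id, name);
        }

        String get(long id) {
            return entries.get(id);
        }

        // Event handler: drop the entry so the next read reloads fresh data.
        void onUserChanged(UserChanged event) {
            entries.remove(event.userId());
        }
    }

    public static void main(String[] args) {
        UserCache cache = new UserCache();
        cache.put(1L, "old name");

        // Another service updated user 1 and published an event.
        cache.onUserChanged(new UserChanged(1L));

        System.out.println(cache.get(1L)); // null
    }
}
```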