Monday, 13 April 2026

Troubleshooting Production Issues

Some example problems:
Here are 15 real production scenario-based questions:

1. Your Spring Boot service CPU suddenly spikes to 90% in production. How will you investigate and fix it?

2. After deployment, your service starts throwing intermittent 500 errors. How will you debug this issue?

3. One microservice goes down and causes a chain failure in other services. How will you prevent this in the future?

4. Your API response time increased from 200ms to 3 seconds after a new release. How will you identify the root cause?

5. Database connections are getting exhausted under load. What steps will you take to fix this?

6. A third-party service you depend on is timing out frequently. How will you handle this in your system?

7. You observe duplicate transactions happening in your system. How will you prevent this?

8. Logs are too large and distributed, making debugging difficult. How will you improve observability?

9. Memory usage keeps increasing and your service crashes after some time. How will you detect and fix memory leaks?

10. Your microservice works fine locally but fails in production. How will you approach debugging?

11. A new deployment breaks one feature but works for others. How will you safely roll back?

12. Traffic suddenly spikes 5x during peak hours and your service becomes slow. How will you scale?

13. Inter-service communication is failing due to network latency. How will you optimize it?

14. You need to trace a single request across multiple services during a failure. How will you implement tracing?

15. A bug in one service causes inconsistent data across multiple services. How will you handle data consistency?
Another example problem:
"Your Spring Boot service runs flawlessly in development, but crashes every night at 2am in production. Walk me through your debugging approach."

Most candidates respond:
‣ I would check the logs.
‣ I would restart the service.
‣ I would increase memory?
‣ Interview over.

Here is what interviewers are actually evaluating:

Step 1: Identify the pattern
2am is consistent. Not random. Not traffic-driven. This indicates a scheduled trigger or resource exhaustion. First question: what executes at 2am? Batch jobs? Scheduled tasks? Cron jobs?

Step 2: Analyze memory behavior before failure
Inspect JVM metrics and heap usage trends. If memory steadily increases from 10pm to 2am before crashing, it signals a memory leak, not a functional bug or infrastructure issue.

Step 3: Diagnose the leak
Enable GC logs. Capture heap dumps. Identify objects with abnormal growth: unclosed connections, static collections, or uncleared ThreadLocal variables. Even a single unclosed DB connection inside a loop can bring down the service.

Step 4: Validate connection pool utilization
HikariCP default pool size is 10. If a batch process consumes all connections without releasing them, subsequent requests block. By 2am, the pool is exhausted and the service becomes unresponsive.

Solution: enforce connection timeouts and use proper try-with-resources patterns.
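The try-with-resources part of that fix can be sketched in plain Java. The FakeConnection class below is a hypothetical stand-in for a pooled JDBC connection, so the demo is runnable without a database; the point is that close() runs even when the work throws:

```java
// Sketch: try-with-resources guarantees close() runs even on failure.
// FakeConnection is a hypothetical stand-in for a pooled JDBC Connection.
public class TryWithResourcesDemo {
    static int openCount = 0;

    static class FakeConnection implements AutoCloseable {
        FakeConnection() { openCount++; }
        void update() { throw new RuntimeException("query failed"); }
        @Override public void close() { openCount--; } // always returned to the pool
    }

    public static void main(String[] args) {
        try (FakeConnection conn = new FakeConnection()) {
            conn.update(); // throws, but close() still runs
        } catch (RuntimeException e) {
            // swallowed for the demo
        }
        System.out.println("open connections after failure: " + openCount); // 0
    }
}
```

Without the try-with-resources block, the exception would skip the close() call and the "connection" would stay checked out of the pool forever.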

Step 5: Monitor with APM tools
Use Prometheus & Grafana, New Relic, or Datadog. Configure proactive alerts instead of reactive fixes. If heap usage exceeds 80% at 1am, alerts should trigger before failure occurs. That is production-grade engineering.

The gap between 12 LPA and 35 LPA is not defined by frameworks. It is defined by understanding what breaks at 3am and why.
CPU Spike
Another example is here:

Database connections are getting exhausted under load
An example:
@Service
public class UserService {
  @Autowired
  private JdbcTemplate jdbcTemplate; // OK

  @Transactional
  public void updateUsers(List<User> users) {
    users.forEach(user ->
      jdbcTemplate.update(
        "UPDATE users SET last_login = ? WHERE id = ?",
        LocalDateTime.now(), user.getId()
      )
    );
  }

  @Async
  @Transactional
  public void asyncUpdateUser(User user) {
    jdbcTemplate.update(
      "UPDATE users SET last_login = ? WHERE id = ?",
      LocalDateTime.now(), user.getId()
    );
  }
}
The explanation:
Async threads can scale independently, but database connections cannot. This quickly overwhelms the connection pool.
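One common mitigation, sketched below under the assumption of a pool of 10 connections, is to bound concurrent async DB work with a Semaphore so threads queue for a permit instead of piling up inside the connection pool. The sleep stands in for the actual DB call:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: cap concurrent DB work at the pool size with a Semaphore.
// The "DB call" is a short sleep; maxInFlight must never exceed POOL_SIZE.
public class BoundedAsyncDemo {
    static final int POOL_SIZE = 10;
    static final Semaphore dbPermits = new Semaphore(POOL_SIZE);
    static final AtomicInteger inFlight = new AtomicInteger();
    static volatile int maxInFlight = 0;

    static void updateUser() throws InterruptedException {
        dbPermits.acquire();                   // wait for a "connection"
        try {
            int now = inFlight.incrementAndGet();
            synchronized (BoundedAsyncDemo.class) { maxInFlight = Math.max(maxInFlight, now); }
            Thread.sleep(5);                   // simulated DB call
            inFlight.decrementAndGet();
        } finally {
            dbPermits.release();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(50); // many async threads
        for (int i = 0; i < 200; i++) {
            pool.submit(() -> { try { updateUser(); } catch (InterruptedException ignored) {} });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println("max concurrent DB calls: " + maxInFlight);
    }
}
```

Fifty async threads submit 200 tasks, but at most 10 ever touch the "database" at once, which is the invariant the connection pool needs.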

Thursday, 9 April 2026

Can a Distributed Lock Be the Source of Truth?

Introduction
The question:
You have a distributed lock to prevent two users from booking the same hotel room.

Lock expires in 5 seconds. Your DB write takes 6 seconds under load.

Two users got confirmed bookings for the same room. How? And what is the process to fix this issue?
The key thing to note here:
Lock ≠ correctness.
If your DB allows duplicates, your system will eventually produce them.
The real fix lives in atomic writes + constraints, not just distributed locks.
In other words, the lock exists so the operation is not started twice in the first place. If two operations do start, one of them must fail.

The explanation:
This is a correctness question. And at the Senior to Principal level, this is exactly what interviewers are testing for: do you understand the difference between coordination and actual data integrity?

If you are preparing for system design interviews right now, this is the kind of failure-mode thinking that matters a lot in strong loops.

Now, let us break this one down properly.

[1] How did both users get confirmed bookings?

The timeline usually looks like this:

- User A acquires the distributed lock for Room 101
- Lock lease is valid for 5 seconds
- User A starts the DB write to mark the room as booked
- Under load, that DB write takes 6 seconds
- At second 5, the lock expires before User A finishes
- User B now acquires the same lock because the lock service thinks it is free
- User B also starts a booking write
- Both flows eventually return success, and both users get confirmations

So what actually failed here? The system assumed the distributed lock was the source of truth. A lease-based lock only gives you temporary coordination.

If the critical section takes longer than the lease, another actor can enter while the first one is still working.


[2] The deeper bug is usually not the lock itself

A lot of candidates stop at “increase the lock timeout.” That is not the real fix. The deeper issue is that your final correctness guarantee is missing at the database layer.

Because even if the lock expires, the database should still protect the invariant: “Only one valid booking can exist for this room for this date range.”

If both writes succeeded, it usually means one of these is true:
- no proper uniqueness or exclusion constraint existed
- booking availability was checked outside the final transaction
- writes were not serialized with row-level locking
- confirmation was sent before durable conflict detection finished

The lock helped reduce contention.
But the DB failed to enforce correctness.

[3] What is the right process to fix it

I would fix this in 4 steps.

1. Reconstruct the exact race
Check lock acquire time, lock expiry time, DB commit time, and confirmation event time for both users.

2. Move the invariant to the database

For hotel booking, correctness should be enforced with transactional logic such as:
- row-level locking on the inventory row
- atomic reserve-if-available update
- or exclusion/uniqueness constraints depending on data model

3. Treat the distributed lock as an optimization.
It can reduce hot contention, but it should never be the only thing preventing double booking.

4. Fix the confirmation path
Only send “booking confirmed” after the transaction commits successfully and conflict checks have passed.
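The atomic reserve-if-available idea from step 2 can be sketched in plain Java. ConcurrentHashMap.replace is a stand-in here for a conditional SQL UPDATE such as `UPDATE rooms SET status='BOOKED' WHERE id=? AND status='AVAILABLE'`; exactly one of two concurrent bookings can win, with or without a distributed lock:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: reserve-if-available as an atomic conditional update.
// replace(key, expected, newValue) plays the role of
// UPDATE rooms SET status = 'BOOKED' WHERE id = ? AND status = 'AVAILABLE'.
public class ReserveIfAvailableDemo {
    static final ConcurrentHashMap<String, String> rooms = new ConcurrentHashMap<>();

    static boolean book(String roomId) {
        // Succeeds only if the row is still AVAILABLE; atomic, lock-free.
        return rooms.replace(roomId, "AVAILABLE", "BOOKED");
    }

    public static void main(String[] args) {
        rooms.put("room-101", "AVAILABLE");
        boolean userA = book("room-101");
        boolean userB = book("room-101"); // second attempt must fail
        System.out.println("A=" + userA + " B=" + userB); // prints A=true B=false
    }
}
```

This is the "DB enforces the invariant" property: even if both users slipped past an expired lock, only one conditional write can succeed.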

[4] If you still want to use distributed locks, do it safely

If a distributed lock stays in the design, I would add:
- lease renewal or heartbeats for long critical sections
- fencing tokens so stale lock holders cannot keep writing
- alerts when p99 DB latency gets too close to lock TTL
- idempotency keys so retries do not create duplicate booking flows

A good rule of thumb is simple: If your lock TTL is 5 seconds and your write path can take 6 seconds under load, your design is already telling you it is unsafe.

Wednesday, 8 April 2026

Correlation Id vs Trace Id

Introduction
The explanation:
I often noticed that some developers do not really understand the difference between traceId and correlationId. I saw this so often that I decided to write this post.

At first they look similar.
Both are IDs.
Both appear in logs.
Both help during incidents.

But they answer different questions.

traceId answers:
"How did this specific execution path go through the system?"

correlationId answers:
"Which logs and events belong to the same business story?"

That difference becomes obvious once async enters the picture.

Example:

A user places an order.

The system does this:

1. Order Service creates the order
2. Payment Service charges the card
3. Kafka event is published
4. Billing Worker creates invoice
5. Email Service sends confirmation

Now imagine the logs:

Order created
correlationId=ORDER-8472
traceId=T1

Payment charged
correlationId=ORDER-8472
traceId=T1

Billing started from Kafka consumer
correlationId=ORDER-8472
traceId=T2

Email sending failed
correlationId=ORDER-8472
traceId=T3

This is the key point:

One correlationId
Multiple traceIds

Why?

Because the business flow is one.
But the technical executions are split.

The HTTP request is one execution.
Kafka consumer is another.
Retry later can be another.
Email worker can be another too.

So:

correlationId helps you reconstruct the whole story.
traceId helps you inspect one exact path in detail.

That is why using correlationId instead of tracing is a mistake.
You may connect logs, but you still do not get spans, timing hierarchy, or where exactly latency exploded.

And using only traceId is also not enough.
In distributed async systems, tracing often shows fragments. Correlation is what lets you stitch them back together 🧩

How I usually use them during incidents:

1. Start with correlationId
Find everything related to the same order, job, or user flow.

2. Then drill into traceId
Open the exact failing execution and inspect where it slowed down or broke.

Simple version:

traceId = the path
correlationId = the story

Have you seen teams mix these two and then realize the difference only during a production incident? 

Fencing Tokens

Introduction
The explanation:
Distributed systems concept: Fencing Tokens
You designed a fancy distributed locking algorithm just to find that an old primary is able to overwrite data!

The problem:
- Node A holds the lock and is doing some work.
- Node A gets disconnected/unresponsive/crashes, and resumes execution after its lease has expired (in "true" time).
- Node B, in the meantime, acquires the lock and writes some data.
- Node A resumes execution, thinking its lock is still valid.
- Node A overwrites the data written by Node B, even though it no longer holds the lock.

That's where fencing tokens come in: when a node acquires the lock, it gets a token with a monotonically increasing number. When the node tries to write data, it must include the token. If the token is outdated (i.e., lower than the current token), the write is rejected, preventing stale nodes from overwriting newer data.

Fencing tokens are used in a variety of systems, like etcd

The big takeaway is that you can't rely on just the client to know whether they are in their right. The target resource must have a gating mechanism to verify that the request makes sense.
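The gating mechanism can be sketched in plain Java: the protected resource tracks the highest token it has seen and rejects anything older. Token values here are illustrative:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: the protected resource remembers the highest fencing token it
// has accepted and rejects writes carrying an older (smaller) token.
public class FencedStore {
    private final AtomicLong highestToken = new AtomicLong(0);
    private volatile String data = "";

    // Returns true if the write was accepted.
    public boolean write(long token, String value) {
        long seen = highestToken.get();
        while (token > seen) {
            if (highestToken.compareAndSet(seen, token)) {
                data = value;
                return true;
            }
            seen = highestToken.get();         // lost the race; re-check
        }
        return false; // stale token: a newer lock holder already wrote
    }

    public String read() { return data; }

    public static void main(String[] args) {
        FencedStore store = new FencedStore();
        store.write(34, "B's data");                 // Node B, newer token
        boolean stale = store.write(33, "A's data"); // Node A, expired lock
        System.out.println("stale write accepted: " + stale + ", data: " + store.read());
    }
}
```

The decisive property: the check lives on the resource side, not the client side, exactly as the takeaway above says.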


JSON Web Token (JWT) and Immediate Logout

Introduction
If we operate in a fully stateless way, immediate logout is not possible. But if we add a little state on the server side, we get several options.

1. Short-lived access tokens
- Keep access tokens valid for 5 to 15 minutes
- This limits the damage window
- Very common and simple

2. Refresh token revocation
- Store refresh tokens in DB or Redis
- On logout, delete or mark them revoked
- This is the most common real-world pattern

3. Token blacklist / denylist
- Store revoked JWT IDs or token hashes until they expire
- Check this list on every request
- Useful for high-risk logout or compromised accounts
- But now auth is no longer fully stateless

4. Token versioning
- Store a tokenVersion or sessionVersion on the user record
- Include that version in the JWT
- On logout-all-devices or password reset, increment the version
- Old tokens stop working once the version mismatches
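Token versioning can be sketched in a few lines, with a HashMap standing in for the user record store; names like tokenVersion are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: token versioning. The user record stores a tokenVersion; each
// JWT carries the version it was issued with. Bumping the version on
// logout-all-devices invalidates every previously issued token at once.
public class TokenVersionDemo {
    static final Map<String, Integer> userTokenVersion = new HashMap<>();

    record Jwt(String userId, int tokenVersion) {}

    static boolean isValid(Jwt jwt) {
        return jwt.tokenVersion() == userTokenVersion.getOrDefault(jwt.userId(), 0);
    }

    static void logoutAllDevices(String userId) {
        userTokenVersion.merge(userId, 1, Integer::sum); // increment version
    }

    public static void main(String[] args) {
        userTokenVersion.put("alice", 7);
        Jwt token = new Jwt("alice", 7);
        System.out.println("before logout: " + isValid(token)); // true
        logoutAllDevices("alice");
        System.out.println("after logout: " + isValid(token));  // false
    }
}
```

The cost is one extra lookup of the user record per request, which is why this is often paired with short-lived access tokens rather than used alone.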

Thursday, 26 March 2026

Software Architecture - Idempotency and Phantom Writes

Introduction
The explanation:
You typically implement idempotency like this:
  1. Check if request already processed (via key / timestamp / PK)
  2. If not → write data
  3. If yes → skip
If the check is not atomic, problems arise.

Failure Mode 1: The TTL Expiry Trap
The explanation:
The most common idempotency implementation stores a request key with a time-to-live (TTL) — typically 24 or 48 hours. The assumption is that any duplicate will arrive within that window. In practice, this assumption frequently breaks.
The explanation:
The fix: Never use TTL-only idempotency for operations with unbounded retry windows. Instead, use a database-backed idempotency store with a three-state model (IN_PROGRESS, COMPLETED, FAILED) where the expires_at column drives a cleanup job for storage management — not correctness. The cleanup window should be set significantly longer than your worst-case replay window (7 days minimum for Kafka-based systems).
Failure Mode 2: The Partial Execution Ghost
The explanation:
A request arrives, the system writes the idempotency key with status IN_PROGRESS, begins processing, writes half the data, and crashes — JVM OOM, container eviction, network partition. The idempotency key is now in IN_PROGRESS state. When the retry arrives, the system faces an impossible decision: did the original operation complete or not?
The explanation:
The fix: Wrap both the business logic and the idempotency state transition in a single database transaction. If the transaction rolls back, both the business data and the idempotency status roll back together. For stale IN_PROGRESS keys (where the original processor is likely dead), use a configurable timeout threshold to reclaim and re-execute safely.
Failure Mode 3: The Concurrent Check Race
Here the check condition is not atomic. The explanation:
The fix: Use INSERT ... ON CONFLICT DO NOTHING (PostgreSQL 9.5+) to make the check-and-claim atomic. If the RETURNING clause yields no rows, the key already existed — fetch its status with SELECT ... FOR UPDATE. For non-blocking behavior, SELECT ... FOR UPDATE SKIP LOCKED lets the second instance return 409 Conflict immediately rather than waiting.
Failure Mode 4: The Layer Mismatch
The explanation:
The fix: Propagate a correlation ID from the original request as a Kafka header, and have every downstream consumer enforce its own idempotency barrier using that ID as the deduplication key.
Spring Boot + SQL Server
The code is below. In it:
- Partial Execution is solved with a single transaction.
- The Concurrent Check Race is solved via DuplicateKeyException. If we were using Postgres, instead of catching an exception we would check how many rows the SQL statement changed.
- The Layer Mismatch problem is solved with the outbox pattern.
@Service
@RequiredArgsConstructor
public class IdempotentService {
  private final JdbcTemplate jdbc;
  public record Response(String result) {}

  @Transactional
  public Response handleRequest(String idempotencyKey, String payload) {
    try {
      // Attempt barrier insert (atomic)
      // SQL Server:
      // INSERT INTO idempotency_table (idempotency_key, status)
      // VALUES (?, 'IN_PROGRESS')
      jdbc.update(
        "INSERT INTO idempotency_table (idempotency_key, status) VALUES (?, 'IN_PROGRESS')",
        idempotencyKey
      );

      // First request owns the key → perform business logic
      String result = doBusinessLogic(payload);

      // Insert into outbox for async processing
      // SQL Server:
      // INSERT INTO outbox_table (idempotency_key, payload) VALUES (?, ?)
      jdbc.update(
        "INSERT INTO outbox_table (idempotency_key, payload) VALUES (?, ?)",
        idempotencyKey, result
      );

      // Mark barrier as completed and store result
      // SQL Server:
      // UPDATE idempotency_table SET status='COMPLETED', response=? WHERE idempotency_key=?
      jdbc.update(
        "UPDATE idempotency_table SET status='COMPLETED', response=? WHERE idempotency_key=?",
        result, idempotencyKey
      );
      return new Response(result);
    } catch (DuplicateKeyException ex) {
      // Barrier row already exists → handle duplicate
      // SQL Server:
      // SELECT * FROM idempotency_table WITH (UPDLOCK, ROWLOCK) WHERE idempotency_key=?
      IdempotencyRecord record = jdbc.queryForObject(
        "SELECT status, response FROM idempotency_table WITH (UPDLOCK, ROWLOCK) WHERE idempotency_key=?",
        (rs, rowNum) -> new IdempotencyRecord(rs.getString("status"), rs.getString("response")),
        idempotencyKey
      );

      switch (record.status) {
        case "COMPLETED":
          // Return cached result
          return new Response(record.response);
        case "IN_PROGRESS":
          // Someone else is working → can wait or throw 409
          throw new IllegalStateException("Request is already in progress");
        case "FAILED":
          // Previous attempt failed → allow retry
          throw new IllegalStateException("Previous attempt failed, safe to retry");
        default:
          throw new IllegalStateException("Unknown barrier state: " + record.status);
      }
    }
  }

  private String doBusinessLogic(String payload) {
    // your domain logic here
    return "processed:" + payload;
  }

  private static class IdempotencyRecord {
      final String status;
      final String response;
      IdempotencyRecord(String status, String response) {
        this.status = status;
        this.response = response;
      }
  }
}
If we want the code to work for both SQL Server and Postgres, we do the following:
@Service
@RequiredArgsConstructor
public class IdempotentService {

    private final JdbcTemplate jdbc;

    public record Response(String result) {}

    @Transactional
    public Response handleRequest(String idempotencyKey, String payload) {
        boolean isWinner = false;

        try {
            // --------------------------
            // Attempt atomic barrier insert
            // --------------------------
            // Postgres: ON CONFLICT DO NOTHING turns a duplicate insert into
            // a no-op, so the losing caller sees rows == 0 instead of an exception.
            // SQL Server: a duplicate key raises DuplicateKeyException instead.
            String insertSql = isPostgres()
                    ? "INSERT INTO idempotency_table (idempotency_key, status) VALUES (?, 'IN_PROGRESS') ON CONFLICT DO NOTHING"
                    : "INSERT INTO idempotency_table (idempotency_key, status) VALUES (?, 'IN_PROGRESS')";
            int rows = jdbc.update(insertSql, idempotencyKey);

            // Postgres: rows == 1 → winner
            // SQL Server: INSERT succeeded → winner
            isWinner = rows == 1;

        } catch (DuplicateKeyException ex) {
            // SQL Server only: duplicate → loser
            isWinner = false;
        }

        if (isWinner) {
            // --------------------------
            // Winner executes business logic
            // --------------------------
            String result = doBusinessLogic(payload);

            // Insert into outbox (side effect)
            // INSERT INTO outbox_table (idempotency_key, payload) VALUES (?, ?)
            jdbc.update(
                    "INSERT INTO outbox_table (idempotency_key, payload) VALUES (?, ?)",
                    idempotencyKey, result
            );

            // Mark barrier as completed + store response
            // UPDATE idempotency_table SET status='COMPLETED', response=? WHERE idempotency_key=?
            jdbc.update(
                    "UPDATE idempotency_table SET status='COMPLETED', response=? WHERE idempotency_key=?",
                    result, idempotencyKey
            );

            return new Response(result);
        } else {
            // --------------------------
            // Loser reads existing row safely
            // --------------------------
            // SQL Server: SELECT ... WITH (UPDLOCK, ROWLOCK) WHERE idempotency_key=?
            // Postgres: SELECT * FROM idempotency_table WHERE idempotency_key=?
            IdempotencyRecord record = jdbc.queryForObject(
                    "SELECT status, response FROM idempotency_table " +
                            (isPostgres() ? "" : "WITH (UPDLOCK, ROWLOCK) ") +
                            "WHERE idempotency_key=?",
                    (rs, rowNum) -> new IdempotencyRecord(rs.getString("status"), rs.getString("response")),
                    idempotencyKey
            );

            switch (record.status) {
                case "COMPLETED":
                    return new Response(record.response);
                case "IN_PROGRESS":
                    throw new IllegalStateException("Request already in progress");
                case "FAILED":
                    throw new IllegalStateException("Previous attempt failed, safe to retry");
                default:
                    throw new IllegalStateException("Unknown barrier state: " + record.status);
            }
        }
    }

    private boolean isPostgres() {
        // Detect DB type from DataSource or JdbcTemplate if needed
        return true; // placeholder, implement detection
    }

    private String doBusinessLogic(String payload) {
        return "processed:" + payload;
    }

    private static class IdempotencyRecord {
        final String status;
        final String response;

        IdempotencyRecord(String status, String response) {
            this.status = status;
            this.response = response;
        }
    }
}


Wednesday, 25 March 2026

Claude

Giriş
An example is here.



1. The CLAUDE.md File
The main control file. For example:
- Never use the main branch

2. The CLAUDE.local.md File
The explanation:
CLAUDE.local.md is useful for notes you do not want to commit but still want to apply in the current project.

3. Subdirectories
The explanation:
- CLAUDE.md files inside subdirectories are not all loaded up front, but only when Claude Code actually reads content from those directories
- When multiple CLAUDE.md files are active at the same time, a nearest-scope rule usually applies, meaning instructions closer to the current task and narrower in scope take priority
- Within the same layer, rules that are more explicit and more specific are also more likely to be followed consistently than vague general statements
4. The .claude Directory

4.1 .claude/commands
Automating repetitive tasks.

4.2 .claude/rules
Project rules (tests, naming, etc.)

Commands
/init
Creates the initial CLAUDE.md file.

/reflection for Regular Retrospectives
The explanation:
At the end of each session, you can ask Claude Code to summarize what from that round of collaboration is worth adding to CLAUDE.md, and then turn those points into more stable project rules.
/skill-creator
The explanation:
A skill isn't a prompt. You don't type it. You build it once, describe what it does and when to use it, and Claude recognises when to fire it on its own. The right context appears, the skill runs. You do nothing.
We use this command to build a custom skill. The explanation:
You describe what you need, it helps you draft the skill, then runs a test (one session with the skill, one without) and opens a browser window so you can compare the results. Then it optimises automatically based on your feedback so the skill triggers when it should.

Monday, 23 March 2026

Cache Strategies Presentation

Summary

  • In real systems:
    • 80% → Cache-Aside + Eviction
    • High-scale → Add these:
      • Stampede protection
      • Two-level cache
      • Event invalidation
  • Spring mainly supports:
    • Cache-Aside (natively)
    • Partial Write-Through
    • Eviction patterns
  • @Cacheable, @CachePut, @CacheEvict are mainly Cache-Aside tools
  • Advanced patterns require custom logic or cache provider features
  • High-scale systems often combine:
    • Cache-Aside + Eviction
    • Two-Level Cache
    • Stampede Protection
    • Event-Driven Invalidation
  • Spring annotations alone are not enough for advanced caching—you end up:
    • Using Caffeine / Redis features directly
    • Or writing custom cache layers

Read-Heavy Strategies

  • Cache-Aside - Implemented by App
  • Read-Through - Implemented by Cache Provider
  • Refresh-Ahead - Implemented by Cache Provider

Write-Heavy Strategies

  • Write-Through - Implemented by Cache Provider
  • Write-Behind (aka Write-Back) - Implemented by Cache Provider
  • Write-Around - Implemented by App

1. Cache-Aside (Lazy Loading)

App reads from cache → if miss → load from DB → put in cache. Cache is not responsible for loading; application does it.

@Service
public class UserService {
    @Cacheable(value = "users", key = "#id")
    public User getUser(Long id) {
        return userRepository.findById(id)
                .orElseThrow();
    }
}

2. Write-Through

Write goes to cache and DB synchronously. Cache always up-to-date.

@CachePut(value = "users", key = "#user.id")
public User saveUser(User user) {
    return userRepository.save(user);
}

3. Read-Through

Cache itself loads data (app doesn’t call DB directly). App only talks to cache provider. Cache abstracts loading logic. Provider like Hazelcast / Redis with loader.

4. Write-Behind

Write goes to cache → DB updated asynchronously later. Very fast writes.

public void saveUser(User user) {
    cache.put(user.getId(), user);

    asyncExecutor.submit(() -> {
        userRepository.save(user);
    });
}

5. Refresh-Ahead

Cache refreshes entries before expiration to avoid cache miss spikes. Not supported via Spring annotations.

Caffeine.newBuilder()
    .refreshAfterWrite(Duration.ofMinutes(5))
    .build(key -> loadFromDb(key));

6. Cache Eviction / Invalidation

Explicitly remove/update cache when data changes.

@CacheEvict(value = "users", key = "#id")
public void deleteUser(Long id) {
    userRepository.deleteById(id);
}

7. Write-Around

Writes go directly to DB, cache updated only on read. Prevents cache from being updated on writes. Cache becomes stale by design. Relies on future reads to populate.

@Service
public class OrderService {

    @Autowired
    private OrderRepository orderRepository;

    @Autowired
    private CacheManager cacheManager;

    public void createOrder(Order order) {
        orderRepository.save(order); // cache not updated
    }

    @Cacheable(value = "userOrders", key = "#userId")
    public List<Order> getOrdersForUser(Long userId) {
        return orderRepository.findByUserId(userId);
    }
}

8. Negative Caching Control

Cache “not found” results. Example: user not found → cache null. Prevents repeated DB hits for keys that do not exist. Key insight: unless = "#result == null" does the opposite, skipping null results, so omit it when you actually want negative caching.

@Cacheable(value = "users", key = "#id", unless = "#result == null")
public User getUser(Long id) {
    return userRepository.findById(id).orElse(null);
}

9. Two-Level Cache

L1 (in-memory) + L2 (distributed like Redis). L1: Caffeine, L2: Redis. Must combine manually.
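The manual combination can be sketched with two maps standing in for Caffeine (L1) and Redis (L2) and a loader function standing in for the database; a real implementation would add TTLs and L1 invalidation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: a two-level read path. ConcurrentHashMaps stand in for
// Caffeine (L1, per-instance) and Redis (L2, shared); the loader stands
// in for the database. L1 miss -> L2 -> DB, back-filling on the way out.
public class TwoLevelCache<K, V> {
    private final Map<K, V> l1 = new ConcurrentHashMap<>();
    private final Map<K, V> l2 = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    int dbLoads = 0;

    TwoLevelCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        V v = l1.get(key);
        if (v != null) return v;          // L1 hit
        v = l2.get(key);
        if (v == null) {                  // L2 miss -> load from "DB"
            dbLoads++;
            v = loader.apply(key);
            l2.put(key, v);
        }
        l1.put(key, v);                   // back-fill L1
        return v;
    }

    public static void main(String[] args) {
        TwoLevelCache<Long, String> cache = new TwoLevelCache<>(id -> "user-" + id);
        cache.get(1L);                    // DB load
        cache.get(1L);                    // L1 hit, no extra load
        System.out.println("db loads: " + cache.dbLoads);
    }
}
```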

10. Cache Stampede Protection

Prevent many threads from hitting DB on same miss. Only one thread fetches DB; others wait or use cache.
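A minimal sketch of stampede protection: concurrent callers share a single in-flight CompletableFuture per key via computeIfAbsent, so only the first caller touches the DB. A real cache would also evict the future after expiry or on failure:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: stampede protection. On a miss, all concurrent callers share
// one in-flight CompletableFuture per key, so only one thread hits the DB.
public class StampedeDemo {
    static final ConcurrentHashMap<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();
    static final AtomicInteger dbHits = new AtomicInteger();

    static String get(String key) {
        return cache.computeIfAbsent(key, k -> CompletableFuture.supplyAsync(() -> {
            dbHits.incrementAndGet();          // only the first caller runs this
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
            return "value-for-" + k;
        })).join();                            // everyone else waits on the same future
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(20);
        for (int i = 0; i < 20; i++) pool.submit(() -> get("hot-key"));
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("db hits for 20 concurrent misses: " + dbHits.get()); // 1
    }
}
```

computeIfAbsent is atomic per key, which is what guarantees a single loader even under heavy concurrency.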

11. Read-Repair

If stale data detected → fix cache during read. Not supported via @Cacheable.

public User getUser(Long id) {
    User cached = cache.get(id);

    if (cached != null && isStale(cached)) {
        User fresh = userRepository.findById(id).orElse(null);
        cache.put(id, fresh); // repair
        return fresh;
    }

    if (cached != null) {
        return cached;
    }

    User fresh = userRepository.findById(id).orElse(null);
    cache.put(id, fresh);
    return fresh;
}

12. Event-Driven Cache Invalidation

Use events (Kafka, etc.) to invalidate/update cache entries.

Thursday, 19 March 2026

Amazon Web Services (AWS) EventBridge - “Kafka-lite, fully managed, rule-based event routing”

Introduction
The flow:
Webhooks → simple HTTP push (external trigger mechanism)
Amazon EventBridge → event router (central nervous system)
AWS Lambda → code runner (brain doing the work)
There is an example architecture here. In that architecture, instead of webhook calls triggering the microservices directly, they first go to AWS EventBridge and are routed onward from there. The important points of the architecture are:
The Patterns Nobody Documents
Here’s what I learned building this that you won’t find in AWS documentation.

Pattern 1: Event Normalization at the Edge
Don’t let raw external events onto your bus. Ever. Your webhook handler should transform vendor-specific payloads into domain events. When we integrated PayPal, our services didn’t care. They still received payment.completed events with the same schema.

Pattern 2: Event Versioning from Day One
We screwed this up initially. Six months in, we needed to change the event schema. Half our services were still consuming v1 events. Now every event includes a version field, and EventBridge rules route based on version. Services can migrate on their own schedule.
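The version field idea can be sketched as a consumer-side dispatch; the event shape here is illustrative, not the actual EventBridge payload:

```java
import java.util.Map;

// Sketch: version-aware event handling. Every event carries a "version"
// field; consumers dispatch on it so v1 and v2 schemas can coexist while
// services migrate on their own schedule.
public class VersionedEventDemo {
    static String handle(Map<String, Object> event) {
        int version = (int) event.getOrDefault("version", 1);
        switch (version) {
            case 1: return "handled v1: " + event.get("amount");
            case 2: return "handled v2: " + event.get("amountCents");
            default: return "unknown version " + version; // route to a DLQ in practice
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(Map.of("version", 1, "amount", "10.00")));
        System.out.println(handle(Map.of("version", 2, "amountCents", 1000)));
    }
}
```

In EventBridge itself the same split is done declaratively, with one rule per version matching on the version field of the event payload.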

Pattern 3: Dead Letter Queues for Everything
This saved us during Black Friday. A bug in the inventory service caused it to reject 15% of order.created events. Because we had DLQs configured, those events sat safely in a queue while we fixed the bug, then we replayed them. Zero lost orders.

Pattern 4: Archive Anything That Touches Money
EventBridge archiving is criminally underused. We archive every payment-related event for 90 days. When customers dispute charges, we have perfect audit trails. When the finance team needs transaction reports, we replay archived events. Cost? $47/month for 2.1M archived events.

Tuesday, 17 March 2026

The MIT License

MIT vs LGPL
The explanation:
LGPL says, “you can use this code, but if you change it, you must share your changes under the same terms.” MIT says, “Do whatever you want.” One protects the community. The other lets corporations take without giving back.


LGPL - GNU Lesser General Public License

LGPL
Because the GPL was considered too restrictive, the LGPL (GNU Lesser General Public License) was created.
The explanation:
LGPL says, “you can use this code, but if you change it, you must share your changes under the same terms.”
If We Use LGPL Code and Distribute Our Application
In my view, this is the most important point where GPL and LGPL diverge. If we use LGPL software and sell our own product, we are not obliged to open our source code. The explanation is below. If we do want to open our source code, our own code must also be LGPL licensed.
Yes, you can distribute your software without making the source code public and without giving recipients the right to make changes to your software.

The LGPL license explicitly allows such usages of libraries/packages released under that license.

Tuesday, 10 March 2026

Medallion Architecture

Introduction
The explanation:
Medallion architecture is a data design pattern that organizes data into three layers:

Bronze Layer (Raw):
  • Data ingested in its original format
  • Minimal transformation
  • Append-only historical record
  • No data quality enforcement
Silver Layer (Refined):
  • Cleaned and conformed data
  • Schema enforced
  • Deduplicated
  • Validated
  • Still fairly granular
Gold Layer (Curated):
  • Business-level aggregations
  • Denormalized for consumption
  • Optimized for specific use cases
  • Analytics-ready
Origin: Popularized by Databricks around 2019-2020 as part of the lakehouse pattern.

Monday, February 23, 2026

Source Code Comments and Decision Context

Introduction
This post grew out of the Code Review process, whose checklist included the following item.

1. Source code comments are sufficient:
The checklist sentences usually read as follows. This is where subjectivity comes to the fore.
  • If there is a comment, does it explain why the code does what it does?
  • Is each line of the code - in its context - either self-explanatory enough that it does not need a comment, or if not, is it accompanied by a comment which closes that gap?
  • Can the code be changed so it does not need a comment any more?
In some safety-critical projects, a comment is required on every line. That makes the job somewhat easier: it is enough to check each line. The code looks like this.
/* Display an error message */
function display_error_message( $error_message )
{
  /* Display the error message */
  echo $error_message;

  /* Exit the application */
  exit();
}

/* -------------------------------------------------------------------- */

/* Check if the configuration file does not exist, then display an error */
/* message */
if ( !file_exists( 'C:/xampp/htdocs/essentials/configuration.ini' ) ) {
  /* Display an error message */
  display_error_message( 'Error: ...');
}
Some examples of what needs to be done:
- Source code conforms to coding standard and is checked by automated tool
- Source code is checked manually by reviewer if automation is not possible
- Source code is checked for memory leaks by a dedicated tool
- Source code is compatible and traceable to SRS

2. The Pattern I Notice in Every High-Quality Codebase
High-quality codebases record the decision, i.e. the "why", behind the code. The explanation is as follows:
I've started noticing four types of decision context that great codebases maintain:
...
Without this context, all code looks equally arbitrary.
1. Business context — Why this business rule exists
An example:
// Stripe charges 2.9% + $0.30 per transaction
// We pass this through to users on transactions <$10
// For larger transactions, we absorb it (reduces churn by 8%)
const FEE_THRESHOLD = 1000; // in cents
2. Historical context — Why we chose this approach
An example:
// We tried async/await here but hit deadlocks under load
// See incident post-mortem: docs/incidents/2024-01-15-deadlock.md
// Synchronous approach is slower but reliable
fn process_batch_sync(items: Vec<Item>) -> Result<()> {
3. Constraint context — What limits our options
An example:
// API rate limited to 100 req/min per docs/api-limits.md
// We batch requests to stay under limit with 20% safety margin
const maxRequestsPerMinute = 80
4. Future context — What we plan to change
An example:
// TODO: Move to event-driven architecture
// Blocked on: Kafka cluster provisioning (INFRA-445)
// Timeline: Q2 2024
// This polling approach is temporary
pollForUpdates();




Wednesday, February 18, 2026

Data Models

Introduction
I first saw the article (10 Data Models Every Data Engineer Must Know (Before They Break Production)) here.

10. Star Schema: The Legacy Workhorse (That Fails at Scale)
The explanation is as follows.
Star schemas are intuitive and analyst-friendly, but at scale they become a performance bottleneck, especially with massive fact tables, high-cardinality dimensions, and near-real-time workloads.
9. Snowflake Schema: Over-Engineered & Slow
The explanation is as follows.
Snowflake schemas optimize storage, not query performance. In modern analytics (cloud OLAP, dashboards, ad-hoc queries), compute is the bottleneck, not disk. Excessive normalization explodes join depth and kills latency.
8. Data Vault: The Enterprise Monster (When You Need Auditability)
The explanation is as follows.
Data Vault excels at auditability, lineage, and full historization, critical for regulated industries (banking, healthcare). But its multi-layer architecture makes it fundamentally unsuited for low-latency analytics.
7. Wide-Column Stores (Cassandra, Bigtable) for Time-Series Chaos
The explanation is as follows.
Wide-column databases dominate high-velocity ingest (IoT, metrics, logs) where writes never stop. But they sacrifice query flexibility, no joins, limited filtering, and rigid access patterns. You win on writes, lose on exploration.
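The wide-column trade-off can be sketched in plain Python (names and data are hypothetical): rows are grouped under a partition key and kept ordered by a clustering key, so writes are cheap keyed inserts and reads are key-range scans within one partition, while ad-hoc filtering across partitions simply has no efficient query shape.

```python
from bisect import insort

# Toy wide-column model: partition key -> sorted (clustering key, value) cells.
# Mirrors the Cassandra/Bigtable idea: fast keyed writes and range reads,
# no joins, no cross-partition filtering.
table = {}

def write(partition_key, ts, value):
    """Keyed write: insert into one partition's sorted row."""
    insort(table.setdefault(partition_key, []), (ts, value))

def read_range(partition_key, start_ts, end_ts):
    """Read a time slice from a single partition; this is the only
    efficient query shape, so you must know the partition key."""
    return [(ts, v) for ts, v in table.get(partition_key, [])
            if start_ts <= ts <= end_ts]

write("sensor-42", 100, 21.5)
write("sensor-42", 160, 21.9)
write("sensor-42", 130, 21.7)
print(read_range("sensor-42", 100, 140))  # [(100, 21.5), (130, 21.7)]
```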
6. Graph Models (Neo4j, TigerGraph) for Hidden Relationships
The explanation is as follows.
When insight lives in relationships (fraud rings, social influence, network hops), relational joins collapse under recursive depth. Graph databases treat relationships as first-class citizens, making multi-hop traversals fast and natural.
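The multi-hop idea can be illustrated with a small breadth-first traversal in Python (a hypothetical account graph for a fraud-ring case). Each "hop" here is what a relational model would have to express as yet another self-join:

```python
from collections import deque

# Hypothetical fraud-ring graph: accounts linked by shared devices/cards.
edges = {
    "A": ["B"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
}

def within_hops(graph, start, max_hops):
    """Return all nodes reachable from start in at most max_hops hops."""
    seen = {start: 0}                 # node -> hop distance
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:    # do not expand past the hop limit
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return {n for n, d in seen.items() if d > 0}

print(within_hops(edges, "A", 2))  # {'B', 'C'}
```

A graph database runs this traversal natively over adjacency, which is why increasing `max_hops` stays cheap there, whereas the equivalent SQL grows one recursive join per hop.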
5. Streaming Event Sourcing (Kafka + CDC)
The explanation is as follows.
Batch ETL is fundamentally incompatible with real-time systems. CDC turns database mutations into immutable events, enabling near-zero-latency pipelines, replayable state, and system-wide consistency across microservices.
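The "replayable state" property can be sketched in Python: treat each row mutation as an immutable event and rebuild current state by folding the log from the beginning. The event shapes below are hypothetical; in a real pipeline they would arrive on Kafka topics fed by a CDC tool such as Debezium.

```python
# Each CDC event captures one database mutation; replaying the log
# deterministically reconstructs the table's current state.
events = [
    {"op": "insert", "id": 1, "row": {"name": "Alice", "balance": 100}},
    {"op": "insert", "id": 2, "row": {"name": "Bob", "balance": 50}},
    {"op": "update", "id": 1, "row": {"name": "Alice", "balance": 80}},
    {"op": "delete", "id": 2},
]

def replay(log):
    """Fold an immutable event log into the current table state."""
    state = {}
    for e in log:
        if e["op"] in ("insert", "update"):
            state[e["id"]] = e["row"]
        elif e["op"] == "delete":
            state.pop(e["id"], None)
    return state

print(replay(events))  # {1: {'name': 'Alice', 'balance': 80}}
```

Because the log is append-only, any consumer (a cache, a search index, another microservice) can replay it from offset zero and converge on the same state.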
4. Columnar Storage (Parquet, Delta Lake) for Cheap, Fast Analytics
Parquet is one example.
The explanation is as follows.
Row-based databases are optimized for point lookups, not scans. Analytics workloads read a few columns across billions of rows, exactly what columnar storage is built for. The result: orders-of-magnitude faster queries at a fraction of the cost.
An example:
We can do it as follows.
CREATE TABLE sales_parquet (
    order_id   BIGINT,
    region     STRING,
    amount     DECIMAL(10,2),
    order_ts   TIMESTAMP,
    order_date DATE
)
USING PARQUET
PARTITIONED BY (region, order_date);

SELECT
    region,
    SUM(amount) AS total_sales
FROM sales_parquet
WHERE order_date = '2025-12-25'
  AND region = 'US'
GROUP BY region;
The explanation is as follows.
Why this is fast
- Only amount and region columns are read
- Only the order_date=2025-12-25 and US partitions are scanned
- All other files are skipped entirely
3. Multi-Model Hybrids (When SQL + NoSQL Collide)
The explanation is as follows. What matters here is that the database supports JSONB columns.
Real-world data is rarely one shape. Modern apps mix relational facts, semi-structured JSON, and relationships. Multi-model databases let you query everything in one place, without forcing awkward ETL or duplicating data.
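A minimal illustration using SQLite's built-in JSON functions (the quote above likely has PostgreSQL JSONB in mind; the table and column names are hypothetical): relational columns and a semi-structured JSON document live in one table and are filtered together in a single query, with no ETL step.

```python
import sqlite3

# One table mixes relational columns (id, name) with a JSON document.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, profile TEXT)")
conn.execute(
    "INSERT INTO users VALUES (1, 'Alice', json('{\"city\": \"Ankara\", \"tags\": [\"admin\"]}'))"
)
conn.execute(
    "INSERT INTO users VALUES (2, 'Bob', json('{\"city\": \"Izmir\", \"tags\": []}'))"
)

# Filter on a JSON field alongside relational columns in one query
rows = conn.execute(
    "SELECT name, json_extract(profile, '$.city') FROM users "
    "WHERE json_extract(profile, '$.city') = 'Ankara'"
).fetchall()
print(rows)  # [('Alice', 'Ankara')]
```

PostgreSQL's JSONB goes further than this sketch (indexable operators such as `@>`), but the shape of the query is the same: one engine, both models.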
2. Reverse ETL (Operational Analytics) to Put Data Back in Apps

1. The Unified Serving Layer (The Future of Production Data)
One dataset. Many engines. Zero rewrites. The explanation is as follows:
Modern data stacks fracture data across OLTP, OLAP, search, and streaming systems, creating sync lag and duplicated logic. A Unified Serving Layer uses one logical data layer (Iceberg/Hudi/Delta) with multiple access modes: SQL analytics, near-real-time reads, ML, and even graph/search workloads.