Yazılım Çorbası: Fault Tolerance vs High Availability

Introduction

The explanation is as follows.

1. Highly available systems can crash.

2. Highly available systems may answer with "I could not do it" or "an error occurred".

3. Fault-tolerant systems find a way to work around the failure.

In other words, if, for instance, a web request is being processed by your highly available platform, and one of the nodes crashes, that user will probably get a 500 error back from the API, but the system will still be responsive for following requests. In the case of a fault-tolerant platform, the failure will somehow (more on this in a minute) be worked-around and the request will finish correctly, so the user can get a valid response. The second case will most likely take longer, due to the extra steps.
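The contrast described above can be sketched in a few lines of Python. Everything here is hypothetical: the `Node` class, the `handle_ha` and `handle_ft` functions, and the status codes stand in for a real platform, and the fault-tolerant variant assumes the request is safe to retry on another node.

```python
class NodeCrashed(Exception):
    """Raised when a (hypothetical) node dies while processing a request."""


class Node:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def process(self, request):
        if not self.healthy:
            raise NodeCrashed(self.name)
        return f"{request} handled by {self.name}"


def handle_ha(nodes, request):
    """Highly available: try one node and report the error if it crashes.

    The crashed node is taken out of rotation so later requests succeed.
    """
    if not nodes:
        return 503, "no healthy nodes left"
    node = nodes[0]
    try:
        return 200, node.process(request)
    except NodeCrashed:
        nodes.remove(node)
        return 500, "internal server error"


def handle_ft(nodes, request):
    """Fault tolerant: retry the same request on the remaining nodes."""
    for node in list(nodes):  # iterate over a copy, since we remove entries
        try:
            return 200, node.process(request)
        except NodeCrashed:
            nodes.remove(node)  # the extra step: fail over and try again
    return 503, "no healthy nodes left"
```

With a crashed node `a` and a healthy node `b`, `handle_ha` answers the first request with 500 and the next one with 200, while `handle_ft` answers the first request with 200 after retrying on `b`, at the cost of the extra attempt.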

Another, more detailed explanation is as follows.

Fault tolerance implies zero service interruptions. If there is a failure somewhere the system will instantly switch to the backup solution and service will continue without interruption. High availability, on the other hand, implies that services are, well, highly available but not always available. A system can be highly available but not fault tolerant. I generally consider high availability to be an aspect of fault tolerance. In that, it addresses a certain type of “fault” (availability), but doesn’t necessarily talk about other aspects.

This is somewhat of a contrived example, but basically everyone watches streaming content, so let’s consider a digital rights management service that determines whether a viewer can watch a particular video. The service could be configured to be highly available, in that it will always serve and return queries. However, it may not handle certain backend data correctly and get into a state where it returns errors, or denies all requests. In this case, it would be highly available (it is reachable and is returning an answer), but it is not fault tolerant, because something in the system has caused it to misbehave.

The caveat with this example is that there’s a fine line between a “bug” and fault tolerance. But the idea of fault tolerance is that the system can handle unexpected events gracefully and continue providing an excellent user experience. (If this example is interesting to you, I recommend taking a look at how Netflix’s chaos monkey randomly terminates instances in production to ensure that engineers implement their services to be resilient to instance failures).
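The digital rights management example can be made concrete with a small sketch. The names below (`load_entitlements`, `can_watch_ha`, `can_watch_ft`, the `last_good` snapshot) are invented for illustration, and falling back to a last known-good snapshot is only one possible strategy, assumed here because a slightly stale answer is taken to be acceptable.

```python
class LicenseBackendError(Exception):
    """Raised when the (hypothetical) license backend returns bad data."""


def load_entitlements(backend):
    """Return {viewer: set of video ids}; reject malformed backend data."""
    if not isinstance(backend, dict):
        raise LicenseBackendError("malformed entitlement data")
    return backend


def can_watch_ha(backend, viewer, video):
    """Always answers, but denies everyone once the backend data is bad."""
    try:
        return video in load_entitlements(backend).get(viewer, set())
    except LicenseBackendError:
        return False  # reachable, yet wrong for every entitled viewer


def can_watch_ft(backend, last_good, viewer, video):
    """Falls back to the last known-good snapshot when the backend is bad."""
    try:
        data = load_entitlements(backend)
    except LicenseBackendError:
        data = last_good  # assumption: a slightly stale answer is acceptable
    return video in data.get(viewer, set())
```

Both functions always return an answer, so both look "available" from the outside. Only the second one keeps giving entitled viewers the right answer while the backend data is broken.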

Cost

The explanation is as follows. Highly available systems cost less than fault-tolerant systems.

High Availability: Similar to fault tolerance, but more cost effective at the expense of comparatively more, but acceptable downtime. This works on the software side, and uses redundant systems and smart fault detection and correction strategies for it to function.
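To give "acceptable downtime" a rough scale, the arithmetic below converts an availability percentage into hours of downtime per year. The targets 99%, 99.9% and 99.99% are illustrative values chosen for this sketch, not figures from the quoted text.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours per year a system may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_hours_per_year(target):.2f} h/year")
```

At 99.9% availability this comes to about 8.76 hours of downtime per year. Each additional "nine" cuts the allowed downtime by a factor of ten, which is one way to see why approaching zero interruption costs more.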

Monday, 30 January 2023
