Thursday, December 17, 2020

Robustness and Fail Fast

Introduction
In this post, Robustness means that when the software hits an unexpected error, it either
- finds a solution automatically, or
- falls back to a default behavior.

Fail Fast can be preferred over this behavior. It is a particularly good approach when the consequences of the error cannot be predicted.

With this approach, the code contains no default behaviors. Either the application is terminated, or a feature, a screen, etc. is closed or disabled.

Solving the problem is then delegated to a supervisor. The supervisor can be a human or another application.

The explanation is as follows:
Some people recommend making your software robust by working around problems automatically. This results in the software “failing slowly.” The program continues working right after an error but fails in strange ways later on. A system that fails fast does exactly the opposite: when a problem occurs, it fails immediately and visibly. Failing fast is a nonintuitive technique: “failing immediately and visibly” sounds like it would make your software more fragile, but it actually makes it more robust. Bugs are easier to find and fix, so fewer go into production.
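The difference between "failing slowly" and failing fast can be sketched with a small configuration reader. This is only an illustration: the names `ConfigError`, `read_port_fail_slow`, `read_port_fail_fast`, the `port` setting, and the fallback value 8080 are all assumptions made up for the example.

```python
import json


class ConfigError(Exception):
    """Hypothetical error: a required setting is missing or invalid."""


def read_port_fail_slow(raw: str) -> int:
    # "Robust" variant: every problem is papered over with a default.
    # A typo in the config goes unnoticed until something breaks later.
    try:
        return int(json.loads(raw)["port"])
    except (KeyError, TypeError, ValueError):
        return 8080


def read_port_fail_fast(raw: str) -> int:
    # Fail-fast variant: no default; the problem is reported where it occurs.
    try:
        config = json.loads(raw)
    except ValueError as exc:
        raise ConfigError(f"config is not valid JSON: {exc}") from exc
    if not isinstance(config, dict) or "port" not in config:
        raise ConfigError("required setting 'port' is missing")
    port = config["port"]
    if type(port) is not int or not 0 < port < 65536:
        raise ConfigError(f"'port' must be an integer in 1..65535, got {port!r}")
    return port
```

With the misspelled input `{"prot": 9000}`, the first function quietly returns the default, while the second raises `ConfigError` naming the missing setting.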
Expected and Unexpected Errors
Example
I think the following is a good example:
The important part here is the kind of error you encountered. There are errors that are expected, and where you know what to do with them. Typical examples are network errors, e.g. in your web application you need to display an error if the server doesn't respond, and probably give the user a button to retry. You don't want to crash everything for this kind of error that you can cleanly handle.

Another type of error are those that simply make the current job impossible. For example if you need to read 100 different files for a specific job, if any of them fails you don't need to continue, it is impossible to complete the job. So you don't need a try/catch around every file access, you can let the whole thing either succeed completely, or let it fail on any error.

The most important error, and the one this statement is really about is an unexpected error that has put your application into an unknown state. Let's assume we're in an application with multiple threads and shared memory. We have a try/catch around the whole program in each thread that catches anything. Is it safe to just restart the thread if any kind of arbitrary exception is thrown?

The answer is no, because of the shared state. The error could have done anything to the shared memory, and put it into a corrupt state. What you need to do is to get the program into a defined, known good state again. In most programming languages this means crashing the entire program and restarting it. You can't recover from having your application in an unknown state. Any of your assumptions might be broken, there might simply be garbage data in some of your state.
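The first two kinds of error in the quote can be sketched as follows. This is a minimal sketch under assumptions: `ServerUnavailable`, `fetch_with_retry`, and `load_all` are hypothetical names, and `fetch` stands for any callable that talks to a server.

```python
from pathlib import Path


class ServerUnavailable(Exception):
    """Hypothetical expected error: the server did not respond."""


def fetch_with_retry(fetch, attempts: int = 3):
    # Expected error: we know what to do, so handle it locally and retry.
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except ServerUnavailable as exc:
            last_error = exc
    # Still failing after all retries: report it instead of guessing a result.
    raise last_error


def load_all(paths):
    # Job-level error: no try/except around each file. The first OSError
    # aborts the whole job, because a partial result is useless here.
    return [Path(p).read_text(encoding="utf-8") for p in paths]
```

Only `ServerUnavailable` is caught; any other exception is deliberately left to propagate, since it is not one of the errors the code knows how to handle.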
Pragmatic Programmer
This book uses the term Crash Early instead of Fail Fast. The book's definition is as follows:
One of the benefits of detecting problems as soon as you can is that you can crash earlier, and crashing is often the best thing you can do. The alternative may be to continue, writing corrupted data to some vital database or commanding the washing machine into its twentieth consecutive spin cycle.
What can happen as a result of Crash Early behavior is explained as follows:
In these environments, programs are designed to fail, but that failure is managed with supervisors. A supervisor is responsible for running code and knows what to do in case the code fails, which could include cleaning up after it, restarting it, and so on.
In fact, the term Crash Early is probably a poor choice of words and is often misunderstood. The explanation is as follows:
The recommendation is to let a program terminate its execution ASAP when there is an indication that it cannot safely continue (the term "crash" can also be replaced by "end gracefully", if one prefers this). The important word here is not "crash", but "early" - as soon as such an indication becomes aware in a certain part of the code, the program should not "hope" that later executed parts in the code might still work, but simply end execution, ideally with a full error report. And a common way of ending execution is using a specific exception for this, transport the information where the problem occurred to the outermost scope, where the program should be terminated.

Moreover, the recommendation is not against catching exceptions in general. The recommendation is against the abuse of catching unexpected exceptions to prevent the end of a program. Continuing a program though it is unclear whether this is safe or not can mask severe errors, makes it hard to find the root cause of a problem and has the risk of causing more damage than when the program suddenly stops.
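The idea of carrying a specific exception up to the outermost scope, and terminating there with an error report, might look roughly like this. The names `FatalError`, `apply_payment`, `run`, and `main`, as well as the balance invariant, are invented for the illustration.

```python
import logging


class FatalError(Exception):
    """Hypothetical exception meaning: the program cannot safely continue."""


def apply_payment(balance: int, amount: int) -> int:
    new_balance = balance - amount
    if new_balance < 0:
        # An invariant is broken; do not "hope" that later code copes with it.
        raise FatalError(f"negative balance {new_balance} after paying {amount}")
    return new_balance


def run(payments, balance: int = 100) -> int:
    # No catch-all here: intermediate layers let FatalError pass through.
    for amount in payments:
        balance = apply_payment(balance, amount)
    return balance


def main(payments) -> int:
    # Outermost scope: the only place that catches FatalError, and it does so
    # to report the problem and terminate, not to keep going.
    try:
        run(payments)
    except FatalError:
        logging.exception("terminating early")
        return 1
    return 0
```

A script would typically finish with `sys.exit(main(...))`, so that the non-zero exit code is visible to whoever started the program.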
What can a supervisor be? The explanation is as follows:
Such a supervisor is either a person, which will deal with the failure of a program, or another program running in a separate process, which monitors the activity of other, more complex programs, and can take appropriate actions when one of them "fails".

What this is precisely depends heavily on the kind of program, and the potential costs of a failure. Imagine the failure scenarios for

- a desktop application with some GUI for managing address data in a database

- a malware scanner on your PC

- the software which makes the regular backups for the Stack Exchange sites

- software which does automatic high speed stock trading

- software which runs your favorite search engine or social network

- the software in your newest smart TV or your smartphone

- controller software for an insulin pump

- controller software for steering of an airplane

- monitoring software for a nuclear power plant

I think you can imagine by yourself for which of these examples a human supervisor is enough, or where an "automatic" supervisor is required to keep the system stable even when one of its components fails.
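An "automatic" supervisor running in a separate process can be sketched in a few lines. This is a deliberately simplified assumption of what such a program does: the function name `supervise` and the restart limit are made up, and a real supervisor would also add delays, logging, and cleanup.

```python
import subprocess
import sys


def supervise(command, max_restarts: int = 3) -> bool:
    """Run `command`; restart it whenever it exits with a non-zero status.

    Returns True once the child exits cleanly, and False if it still fails
    after `max_restarts` restarts (the point to escalate to a human).
    """
    for attempt in range(1 + max_restarts):
        result = subprocess.run(command)
        if result.returncode == 0:
            return True
        print(
            f"attempt {attempt + 1} failed with exit code {result.returncode}",
            file=sys.stderr,
        )
    return False
```

For example, `supervise([sys.executable, "worker.py"])` would run a hypothetical `worker.py` and restart it up to three times before giving up.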
Example - Parser
In PHP code, which does not follow the fail fast approach, error output can get very long because the interpreter keeps running. The interpreter has in fact hit an unexpected error, and it would be better if it stopped at the first one. The explanation is as follows:
Please don't get hung up on the fact that I mentioned PHP. One of the features of PHP is that it keeps trying to work even when there are errors. Most other languages stop the process flow on an error. That's a big reason for a cascade of error messages.
Example - Reading a Formatted File
The application can stop at the first error it finds. If it continues, it may read incorrect data. The explanation is as follows:
If the first step of parsing a file fails, then stop with an error. Don't carry on passing bad data from one step to the next.
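Stopping at the first parsing error can be sketched as below. The `key=value` format, the `#` comment rule, and the names `ParseError` and `parse_settings` are assumptions chosen for the example, not a format taken from the quote.

```python
class ParseError(Exception):
    """Hypothetical error raised at the first malformed line."""


def parse_settings(text: str) -> dict:
    settings = {}
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are part of the assumed format
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            # Stop here: later steps must never see a half-parsed file.
            raise ParseError(f"line {number}: expected 'key=value', got {line!r}")
        settings[key.strip()] = value.strip()
    return settings
```

Because the function raises instead of skipping the bad line, the caller either gets a complete dictionary or an error that points at the exact line.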

