Yazılım Çorbası: Unicode Planes

Introduction
Unicode code points are organized into planes and blocks. There are 17 planes, and each plane holds 65,536 code points (0x0000 through 0xFFFF within the plane). At the time of writing, 11 of the planes have no characters assigned, so Unicode still has plenty of free space.

Planes, in turn, are made up of blocks. Block sizes are not fixed.
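The plane arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not part of any library: the names PLANE_SIZE, PLANE_COUNT, plane_of and plane_range are made up for this example. It relies only on the fact that each plane spans 0x10000 (65,536) code points, so the plane number is simply the code point shifted right by 16 bits.

```python
# Hypothetical helper names, chosen for this sketch only.
PLANE_SIZE = 0x10000                             # 65,536 code points per plane
PLANE_COUNT = 17                                 # planes 0 through 16
MAX_CODE_POINT = PLANE_SIZE * PLANE_COUNT - 1    # 0x10FFFF


def plane_of(code_point: int) -> int:
    """Return the plane number (0..16) that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16


def plane_range(plane: int) -> tuple[int, int]:
    """Return the first and last code point of the given plane."""
    if not 0 <= plane < PLANE_COUNT:
        raise ValueError(f"not a Unicode plane: {plane}")
    start = plane * PLANE_SIZE
    return start, start + PLANE_SIZE - 1


if __name__ == "__main__":
    for ch in ("A", "ğ", "\U0001F600"):
        print(f"U+{ord(ch):04X} is in plane {plane_of(ord(ch))}")
```

For example, plane_range(0) covers U+0000 through U+FFFF, which is the Basic Multilingual Plane.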

1. The Relationship Between Unicode and ISO/IEC 10646
The explanation is as follows. In short, UCS and Unicode assign the same characters to the same code points.

The Universal Coded Character Set (UCS) is a standard set of characters defined by the International Standard ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS) (plus amendments to that standard), which is the basis of many character encodings. The latest version contains over 136,000 abstract characters, each identified by an unambiguous name and an integer number called its code point. This ISO/IEC 10646 standard is maintained in conjunction with The Unicode Standard ("Unicode"), and they are code-for-code identical.

2. Basic Multilingual Plane (BMP)

I moved this section to the post titled "Basic Multilingual Plane (BMP)".

3. Private-Use Characters

The explanation is as follows.

U+E000 to U+F8FF is a private-use area. It's reserved to allow systems to store and display characters that are not present in Unicode.
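A range check for this private-use area might look like the sketch below. The constant and function names are hypothetical, and the range is the one given in the quote above. The standard library's unicodedata.category reports "Co" (Other, private use) for these code points, which gives a second way to detect them.

```python
import unicodedata

# Range taken from the quote above; names are made up for this sketch.
BMP_PUA_START = 0xE000
BMP_PUA_END = 0xF8FF


def is_bmp_private_use(ch: str) -> bool:
    """Return True if the single character ch lies in U+E000..U+F8FF."""
    return BMP_PUA_START <= ord(ch) <= BMP_PUA_END


if __name__ == "__main__":
    for ch in ("\ue000", "\uf8ff", "A"):
        print(f"U+{ord(ch):04X}", is_bmp_private_use(ch), unicodedata.category(ch))
```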

4. Emoji
The explanation of how emoji entered Unicode is as follows.

Emoji were first developed by major telcos in Japan as an extension to their own text encoding schemes and became popular with the customer base. Afterwards, when Apple invaded the Japanese market with iPhone, people were upset they couldn't use emoji. Apple decided to implement emoji into iPhone, forcing the Unicode Consortium to include it into Unicode as well.

Because of Unicode's philosophy (make sure we can encode all written text), they have included all the existing emoji. This caused a lot of "Japanese" things to be included which were never made for other countries. For example, for years the Japanese flag was the only country flag in Unicode as well.

5. Categories
Every code point belongs to a particular category. The explanation is as follows.

Each Unicode point also has a property called "General Category", that attempts to describe the role of the corresponding symbol in the languages or applications for whose sake it was included in the system. Examples of General Categories are "Lu" (meaning upper-case letter), "Nd" (decimal digit), "Pi" (open-quote punctuation), and "Mn" (non-spacing mark, i.e. a diacritic for the preceding glyph). This division is completely independent of code blocks: the code points with a given General Category generally span many blocks, and do not have to be consecutive, not even within each block.[3]
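The four categories named in the quote can be looked up with Python's standard unicodedata module. The sketch below is only an illustration: the sample characters and the SAMPLES and UPPERCASE_ACROSS_BLOCKS names are picked for this example. The second group shows that one category spans several blocks.

```python
import unicodedata

# One sample character per category mentioned in the quote.
SAMPLES = {
    "A": "Lu",        # upper-case letter
    "7": "Nd",        # decimal digit
    "\u00ab": "Pi",   # left-pointing double angle quotation mark
    "\u0301": "Mn",   # combining acute accent (non-spacing mark)
}

# Upper-case letters from three different blocks:
# Basic Latin, Latin-1 Supplement, Greek and Coptic.
UPPERCASE_ACROSS_BLOCKS = ["A", "\u00c9", "\u03a9"]

if __name__ == "__main__":
    for ch, expected in SAMPLES.items():
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), "expected", expected)
    for ch in UPPERCASE_ACROSS_BLOCKS:
        print(f"U+{ord(ch):04X}", unicodedata.category(ch))
```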

5.1 Number, Other [No]
The table is here. It contains characters such as the following.

"[¼½¾⅐⅑⅒⅓⅔⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞↉]"
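To check that these characters really carry the "No" category, something like the following could be used. It is a sketch built on the standard unicodedata module; FRACTIONS and is_other_number are names invented for this example, and the string is the character list above without the surrounding brackets. unicodedata.numeric additionally returns the numeric value of each character.

```python
import unicodedata

# The characters listed above, without the surrounding brackets.
FRACTIONS = "¼½¾⅐⅑⅒⅓⅔⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞↉"


def is_other_number(ch: str) -> bool:
    """Return True if ch has the General Category "No" (Number, Other)."""
    return unicodedata.category(ch) == "No"


if __name__ == "__main__":
    for ch in FRACTIONS:
        print(ch, f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.numeric(ch))
```

Ordinary decimal digits such as "5" are not in this category; they are "Nd".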

Thursday, February 6, 2020