Wednesday, October 7, 2020

Graph Database

Introduction
Neo4j is the name most often heard when graph databases come up. OrientDB can also be used as a graph database.

Supernodes (Celebrity Nodes) in a Graph Database
The explanation is as follows:
A supernode is a node in a graph dataset with unusually high amounts of incoming or outgoing edges. 
Because supernodes create bottlenecks, there are two basic methods for managing them more easily:
Option 1: Splitting Up Supernodes
Option 2: Vertex-Centric Indexes
Splitting Up Supernodes
This is similar to sharding: the edges of the supernode are distributed across several smaller nodes.
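As a rough illustration of the idea, here is a minimal in-memory sketch that spreads the edges of one supernode across several sub-nodes by hashing the neighbor id. The function names, the "celebrity#0" naming scheme, and the choice of crc32 are all assumptions made for this example; a real graph database may split supernodes quite differently.

```python
import zlib
from collections import defaultdict


def bucket_of(neighbor, num_parts):
    """Stable bucket number for a neighbor id (crc32 does not vary between runs)."""
    return zlib.crc32(str(neighbor).encode("utf-8")) % num_parts


def split_supernode(supernode_id, neighbor_ids, num_parts):
    """Distribute a supernode's edges across up to num_parts sub-nodes.

    Returns a dict mapping each hypothetical sub-node id (e.g. "celebrity#0")
    to the list of neighbors whose edges it now owns.
    """
    parts = defaultdict(list)
    for neighbor in neighbor_ids:
        parts[f"{supernode_id}#{bucket_of(neighbor, num_parts)}"].append(neighbor)
    return dict(parts)


def sub_node_for(supernode_id, neighbor, num_parts):
    """Find which sub-node holds the edge to a given neighbor."""
    return f"{supernode_id}#{bucket_of(neighbor, num_parts)}"


# Hypothetical data: one celebrity account with 1000 followers.
followers = [f"user{i}" for i in range(1000)]
parts = split_supernode("celebrity", followers, num_parts=4)

# A lookup for one follower now touches a single, smaller sub-node.
owner = sub_node_for("celebrity", "user42", 4)
print(owner, len(parts[owner]))
```

The trade-off in this sketch is that any query needing all neighbors of the original node has to visit every sub-node and merge the results.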

Vertex-Centric Indexes
Instead of walking through the vertices connected to a supernode one by one, an index local to that vertex makes them quickly accessible.
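The difference between scanning and indexed access can be sketched with a toy vertex class. This is only an in-memory illustration, assuming the index is keyed by edge label; the class and method names are made up for the example and do not correspond to any particular database API.

```python
from collections import defaultdict


class Vertex:
    """A vertex that keeps its outgoing edges plus a local index by edge label."""

    def __init__(self, vertex_id):
        self.vertex_id = vertex_id
        self.edges = []                          # (label, target_id) pairs
        self.index_by_label = defaultdict(list)  # label -> [target_id, ...]

    def add_edge(self, label, target_id):
        self.edges.append((label, target_id))
        self.index_by_label[label].append(target_id)

    def neighbors_scan(self, label):
        # Without an index: every incident edge is visited.
        return [target for (edge_label, target) in self.edges if edge_label == label]

    def neighbors_indexed(self, label):
        # With the vertex-centric index: only the matching edges are touched.
        return list(self.index_by_label.get(label, []))


# Hypothetical supernode: many "followed_by" edges and a handful of "manages" edges.
celebrity = Vertex("celebrity")
for i in range(100_000):
    celebrity.add_edge("followed_by", f"user{i}")
for name in ("agent", "publicist", "lawyer"):
    celebrity.add_edge("manages", name)

print(celebrity.neighbors_indexed("manages"))
```

Both methods return the same neighbors; the indexed one simply avoids touching the 100,000 unrelated edges.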

Handling Graph Work with a Relational Database
If we cannot use a graph database and only have a relational database available, it is still possible to emulate a graph structure. There are three ways to do this, explained below. For PostgreSQL, the LTREE extension can be used.
1. Adjacency list: each element has a reference to its immediate parent; root elements have a null parent. To find all descendants of a particular element, one needs to use "WITH RECURSION"-like clause. However, the clause is not supported in Spring Data JPQL. Other options, like @OneToMany self-reference or @NamedEntityGraphs annotations, either make too many separate SELECT calls or don't work recursively (see this post of Juan Vimberg for details). So this approach alone is not enough for our requirements, but let's keep it in mind.

2. Enumerated path (AKA materialized path): every element keeps the information about all its ancestors. For example, if the node with id=n3 has its parent id=n2 and its grandparent id=n1, then n3 has a string "n1.n2.n3", where "." is a delimiter. To naively find all descendants of a node n3, for example, we need to do a SELECT call with LIKE "n1.n2.n3.%". This is an extremely costly operation because we need to scan the whole table of comments and it takes a long time to compare strings. This approach will do, but we need a faster index. 

3. Nested sets & nested intervals: every node has two numbers - left and right; the numbers make up an interval. All descendants of this node have their intervals within their ancestor's intervals. This data structure allows us to find all descendants very quickly, but makes inserts and deletes extremely slow (see this post for details and this post for benchmarks). So, this approach on its own is not enough for us either, but it gives a valuable clue on how to index our data.
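
To make the first option more concrete, here is a small sketch of an adjacency list queried with a recursive common table expression. It runs against SQLite through Python's sqlite3 module purely for convenience; the `comments` table layout and the sample ids are assumptions for illustration, and the exact recursive syntax can differ slightly between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Adjacency list: each row points at its immediate parent; roots have NULL.
conn.execute("CREATE TABLE comments (id TEXT PRIMARY KEY, parent_id TEXT)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [("n1", None), ("n2", "n1"), ("n3", "n2"), ("n4", "n2"), ("n5", None)],
)


def descendants(conn, root_id):
    """Return the ids of all descendants of root_id using one recursive query."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM comments WHERE parent_id = ?      -- direct children
            UNION ALL
            SELECT c.id FROM comments c
            JOIN subtree s ON c.parent_id = s.id             -- their children, and so on
        )
        SELECT id FROM subtree ORDER BY id
        """,
        (root_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(descendants(conn, "n1"))  # ['n2', 'n3', 'n4']
```

The whole subtree comes back from a single statement, which is the behavior the quoted text says plain JPQL cannot express.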
Materialized Path
The explanation is as follows. It makes it possible to store hierarchical data in a relational database.
Materialized paths allow you to "project" three-dimensional datasets (graph objects) into a fundamentally two-dimensional data structure (tables) without losing data 
Example
Suppose we have a table whose rows look like this:
Name – humanoids
Path – /animals/mammals/primates/humanoids
IsGroup – false
The data kept in the Path column is hierarchical and looks like this:
/animals/mammals/primates/humanoids
/animals/mammals/primates/chimpanzees
/animals/birds/parrots
The explanation is as follows:
Now, if we need to count all species in our single table database that are of type mammals, we can easily accomplish that using the following SQL:
We do it like this:
select count(*) from species where path like '/animals/mammals/%' and IsGroup=false
The explanation is as follows:
Flipping the above IsGroup value to true returns all groups of species beneath mammals.
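The query can be tried out with a small script. The sketch below uses SQLite through Python's sqlite3 module, which is an assumption made for convenience. The three leaf rows come from the example, with paths in the /animals/mammals/... form that the query expects, while the group rows (animals, mammals, primates, birds) are added here only so that the IsGroup=true case returns something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (Name TEXT, Path TEXT, IsGroup INTEGER)")
conn.executemany(
    "INSERT INTO species VALUES (?, ?, ?)",
    [
        # Group rows are assumed for this example.
        ("animals", "/animals", 1),
        ("mammals", "/animals/mammals", 1),
        ("primates", "/animals/mammals/primates", 1),
        ("birds", "/animals/birds", 1),
        # Leaf rows from the example above.
        ("humanoids", "/animals/mammals/primates/humanoids", 0),
        ("chimpanzees", "/animals/mammals/primates/chimpanzees", 0),
        ("parrots", "/animals/birds/parrots", 0),
    ],
)


def count_under(conn, prefix, is_group):
    """Count rows strictly below the given path prefix, filtered by IsGroup."""
    row = conn.execute(
        "SELECT COUNT(*) FROM species WHERE Path LIKE ? AND IsGroup = ?",
        (prefix + "/%", int(is_group)),
    ).fetchone()
    return row[0]


print(count_under(conn, "/animals/mammals", False))  # 2 (humanoids, chimpanzees)
print(count_under(conn, "/animals/mammals", True))   # 1 (primates)
```

Note that the trailing "/%" excludes the mammals row itself, so only entries beneath it are counted.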
Example
Suppose we have a table like this:
The table has columns:

- name (varchar)
- location_type (int) enum values: (1,2,3)
- ancestry (varchar)

id  name  ancestry
1   root  null
5   node  '1'
12  node  '1/5'
22  leaf  '1/5/12'
To find the rows that are parents, that is, rows that have at least one child, we do the following:
SELECT * FROM geolocations
WHERE EXISTS (
   SELECT 1 FROM geolocations g2
   WHERE g2.ancestry = 
      CONCAT(geolocations.ancestry, '/', geolocations.id)
)
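
The query above can be exercised against the sample rows with a short script. This sketch uses SQLite through Python's sqlite3 module, which is an assumption for convenience, and leaves out the location_type column. One detail worth checking on your own database: for the root row, ancestry is NULL, and CONCAT either returns NULL (MySQL) or skips the NULL and yields '/1' (PostgreSQL), so the root would not match its children's ancestry of '1'. The CASE expression below is one possible way to cover that case, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE geolocations (id INTEGER PRIMARY KEY, name TEXT, ancestry TEXT)"
)
conn.executemany(
    "INSERT INTO geolocations VALUES (?, ?, ?)",
    [
        (1, "root", None),
        (5, "node", "1"),
        (12, "node", "1/5"),
        (22, "leaf", "1/5/12"),
    ],
)

# A row is a parent if some other row's ancestry equals "<own ancestry>/<own id>".
# For root rows (ancestry IS NULL) the expected child ancestry is just the id.
PARENTS_SQL = """
SELECT g.id FROM geolocations AS g
WHERE EXISTS (
    SELECT 1 FROM geolocations AS g2
    WHERE g2.ancestry = CASE
        WHEN g.ancestry IS NULL THEN CAST(g.id AS TEXT)
        ELSE g.ancestry || '/' || g.id
    END
)
ORDER BY g.id
"""

parent_ids = [row[0] for row in conn.execute(PARENTS_SQL)]
print(parent_ids)  # [1, 5, 12]
```

The leaf row (id 22) is correctly left out because no row has '1/5/12/22' as its ancestry.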
