Yazılım Çorbası: Graph Database

Introduction

Neo4j is one of the most frequently mentioned graph databases. OrientDB can also be used as a graph database.

Supernodes (Celebrity Nodes) in a Graph Database

The explanation is as follows:

A supernode is a node in a graph dataset with unusually high amounts of incoming or outgoing edges.
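The definition above can be illustrated with a small degree count. This is only a minimal sketch: the edge list, the `find_supernodes` helper and the threshold value are hypothetical, and what counts as "unusually high" depends on the dataset.

```python
from collections import Counter


def find_supernodes(edges, threshold):
    """Return {node: degree} for nodes whose total degree exceeds threshold."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1  # outgoing edge
        degree[dst] += 1  # incoming edge
    return {node: d for node, d in degree.items() if d > threshold}


# Hypothetical data: one celebrity followed by 1000 fans, plus two ordinary edges.
edges = [(f"fan{i}", "celebrity") for i in range(1000)]
edges += [("fan0", "fan1"), ("fan1", "fan2")]

supernodes = find_supernodes(edges, threshold=100)
print(supernodes)
```

Only the celebrity node crosses the threshold; the fans have a degree of at most three.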

Because supernodes create bottlenecks, there are two basic ways to make them easier to manage:

Option 1: Splitting Up Supernodes
Option 2: Vertex-Centric Indexes

Splitting Up Supernodes

This is similar to sharding.
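One way to picture the split, as a sketch under assumptions: the supernode is replaced by a fixed number of sub-nodes, and each edge is routed to one of them by a stable hash of the node at the other end, much like a shard key. The `split_supernode` helper and the `name#bucket` naming scheme are hypothetical, not taken from any particular graph database.

```python
import zlib
from collections import Counter


def split_supernode(edges, supernode, parts):
    """Rewrite edges so that `supernode` is replaced by `parts` sub-nodes."""

    def sub(other):
        # crc32 is stable across runs, unlike Python's salted hash() for str.
        bucket = zlib.crc32(other.encode("utf-8")) % parts
        return f"{supernode}#{bucket}"

    rewritten = []
    for src, dst in edges:
        new_src = sub(dst) if src == supernode else src
        new_dst = sub(src) if dst == supernode else dst
        rewritten.append((new_src, new_dst))
    return rewritten


# Hypothetical data: 1000 fans all pointing at the same celebrity node.
edges = [(f"fan{i}", "celebrity") for i in range(1000)]
split = split_supernode(edges, "celebrity", parts=4)

# Number of incoming edges per sub-node after the split.
load = Counter(dst for _, dst in split)
print(load)
```

The trade-off is that any query needing the complete neighbourhood of the original node has to visit every sub-node and merge the results.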

Vertex-Centric Indexes

Instead of visiting the vertices connected to a supernode one by one, an index is used to reach them quickly.
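The difference between scanning and indexed access can be sketched in a few lines. The `Vertex` class below is a hypothetical in-memory model that indexes edges by label only; real graph databases typically keep such indexes on disk and may also sort them by an edge property.

```python
from collections import defaultdict


class Vertex:
    def __init__(self, name):
        self.name = name
        self.edges = []                    # every incident edge, unindexed
        self.by_label = defaultdict(list)  # vertex-centric index: label -> neighbours

    def add_edge(self, label, neighbour):
        self.edges.append((label, neighbour))
        self.by_label[label].append(neighbour)

    def neighbours_scan(self, label):
        # Without the index: walk every edge of the vertex.
        return [n for lbl, n in self.edges if lbl == label]

    def neighbours_indexed(self, label):
        # With the index: jump straight to the edges carrying this label.
        return list(self.by_label.get(label, []))


# Hypothetical supernode: many "followed_by" edges and only three "managed_by" edges.
celebrity = Vertex("celebrity")
for i in range(100_000):
    celebrity.add_edge("followed_by", f"fan{i}")
for i in range(3):
    celebrity.add_edge("managed_by", f"agent{i}")

print(celebrity.neighbours_indexed("managed_by"))
```

Both methods return the same neighbours, but the scan touches all 100,003 edges while the indexed lookup touches only the three matching ones.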

Handling Graph Workloads with a Relational Database

If we cannot use a graph database and only have a relational database, it is still possible to emulate a graph structure. There are three ways to do this, explained below. For PostgreSQL, the LTREE extension can be used.

1. Adjacency list: each element has a reference to its immediate parent; root elements have a null parent. To find all descendants of a particular element, one needs to use a "WITH RECURSIVE"-like clause. However, the clause is not supported in Spring Data JPQL. Other options, like a @OneToMany self-reference or @NamedEntityGraphs annotations, either make too many separate SELECT calls or don't work recursively (see this post of Juan Vimberg for details). So this approach alone is not enough for our requirements, but let's keep it in mind.

2. Enumerated path (AKA materialized path): every element keeps the information about all its ancestors. For example, if the node with id=n3 has its parent id=n2 and its grandparent id=n1, then n3 has a string "n1.n2.n3", where "." is a delimiter. To naively find all descendants of a node n3, for example, we need to do a SELECT call with LIKE "n1.n2.n3.%". This is an extremely costly operation because we need to scan the whole table of comments and it takes a long time to compare strings. This approach will do, but we need a faster index.

3. Nested sets & nested intervals: every node has two numbers - left and right; the numbers make up an interval. All descendants of this node have their intervals within their ancestor's intervals. This data structure allows us to find all descendants very quickly, but makes inserts and deletes extremely slow (see this post for details and this post for benchmarks). So, this approach on its own is not enough for us either, but it gives a valuable clue on how to index our data.
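The three representations above can be compared side by side on a tiny tree. This is a sketch using SQLite through Python's `sqlite3` module; the `comments` table layout, the column names `parent_id`, `path`, `lft`, `rgt` and the sample rows are assumptions made for illustration, and keeping all three encodings in one table is done only to make the comparison compact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES comments(id),  -- adjacency list
        path      TEXT,                             -- enumerated path
        lft       INTEGER,                          -- nested set, left bound
        rgt       INTEGER                           -- nested set, right bound
    );
    -- Tree: 1 -> 2 -> 3, and 1 -> 4
    INSERT INTO comments VALUES
        (1, NULL, '1',     1, 8),
        (2, 1,    '1.2',   2, 5),
        (3, 2,    '1.2.3', 3, 4),
        (4, 1,    '1.4',   6, 7);
""")


def descendants_adjacency(node_id):
    # Adjacency list: follow parent_id links with a recursive CTE.
    rows = conn.execute("""
        WITH RECURSIVE sub(id) AS (
            SELECT id FROM comments WHERE parent_id = ?
            UNION ALL
            SELECT c.id FROM comments c JOIN sub ON c.parent_id = sub.id
        )
        SELECT id FROM sub ORDER BY id
    """, (node_id,)).fetchall()
    return [r[0] for r in rows]


def descendants_path(node_id):
    # Enumerated path: prefix match on the ancestor's path.
    rows = conn.execute("""
        SELECT c.id FROM comments c JOIN comments a ON a.id = ?
        WHERE c.path LIKE a.path || '.%'
        ORDER BY c.id
    """, (node_id,)).fetchall()
    return [r[0] for r in rows]


def descendants_nested(node_id):
    # Nested sets: a descendant's interval lies strictly inside the ancestor's.
    rows = conn.execute("""
        SELECT c.id FROM comments c JOIN comments a ON a.id = ?
        WHERE c.lft > a.lft AND c.rgt < a.rgt
        ORDER BY c.id
    """, (node_id,)).fetchall()
    return [r[0] for r in rows]


for fn in (descendants_adjacency, descendants_path, descendants_nested):
    print(fn.__name__, fn(1))
```

All three functions return the same descendants; they differ in how much work reads and writes cost, which is the trade-off the list above describes.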

Materialized Path

A materialized path makes it possible to store hierarchical data in a relational database. The explanation is as follows:

Materialized paths allow you to "project" three-dimensional datasets (graph objects) into a fundamentally two-dimensional data structure (tables) without losing data

Example

Suppose we have a table with a row like this:

Name – humanoids
Path – /animals/mammals/primates/humanoids
IsGroup – false

The data stored in the Path column is hierarchical and looks like this:

/animals/mammals/primates/humanoids
/animals/mammals/primates/chimpanzees
/animals/birds/parrots

The explanation is as follows:

Now, if we need to count all species in our single table database that are of type mammals, we can easily accomplish that using the following SQL:

We do it like this:

select count(*) from species where path like '/animals/mammals/%' and IsGroup=false

The explanation is as follows:

Flipping the above IsGroup value to true returns all groups of species beneath mammals.
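The two counts can be tried out with SQLite. This is a sketch: the rows for the group entries (animals, mammals, primates, birds) are assumptions added so that the IsGroup = true case has something to return, and IsGroup is stored as 0/1 because SQLite has no separate boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (Name TEXT, Path TEXT, IsGroup INTEGER)")
conn.executemany(
    "INSERT INTO species VALUES (?, ?, ?)",
    [
        ("animals", "/animals", 1),
        ("mammals", "/animals/mammals", 1),
        ("primates", "/animals/mammals/primates", 1),
        ("humanoids", "/animals/mammals/primates/humanoids", 0),
        ("chimpanzees", "/animals/mammals/primates/chimpanzees", 0),
        ("birds", "/animals/birds", 1),
        ("parrots", "/animals/birds/parrots", 0),
    ],
)


def count_under(prefix, is_group):
    """Count rows strictly beneath `prefix`, filtered by the IsGroup flag."""
    sql = "SELECT COUNT(*) FROM species WHERE Path LIKE ? AND IsGroup = ?"
    return conn.execute(sql, (prefix + "/%", int(is_group))).fetchone()[0]


species_count = count_under("/animals/mammals", is_group=False)
group_count = count_under("/animals/mammals", is_group=True)
print(species_count, group_count)
```

With this sample data there are two species beneath mammals (humanoids and chimpanzees) and one group (primates); the mammals row itself is not counted because its path has no trailing segment after the prefix.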

Example

Suppose we have a table like this:

The table has columns:

- name (varchar)
- location_type (int) enum values: (1,2,3)
- ancestry (varchar)

id   name   ancestry
1    root   null
5    node   '1'
12   node   '1/5'
22   leaf   '1/5/12'

To find the rows that are parents, that is, rows that have at least one child, we do this:

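SELECT * FROM geolocations
WHERE EXISTS (
   SELECT 1 FROM geolocations g2
   -- CONCAT_WS skips a NULL ancestry, so the root row yields '1'
   -- instead of NULL (MySQL CONCAT) or '/1' (PostgreSQL CONCAT)
   WHERE g2.ancestry = 
      CONCAT_WS('/', geolocations.ancestry, geolocations.id)
)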
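The same ancestry layout also supports descendant lookups. The sketch below uses SQLite, so string concatenation is written with `||` and the root's NULL ancestry is handled by an explicit CASE; the `CHILD_PATH` fragment and the `descendants` helper are hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE geolocations (id INTEGER PRIMARY KEY, name TEXT, ancestry TEXT)"
)
conn.executemany(
    "INSERT INTO geolocations VALUES (?, ?, ?)",
    [(1, "root", None), (5, "node", "1"), (12, "node", "1/5"), (22, "leaf", "1/5/12")],
)

# Ancestry value a direct child of row g carries:
# just the id for the root, otherwise "<ancestry>/<id>".
CHILD_PATH = (
    "(CASE WHEN g.ancestry IS NULL THEN CAST(g.id AS TEXT) "
    "ELSE g.ancestry || '/' || g.id END)"
)

# Rows that have at least one child.
parents = [
    row[0]
    for row in conn.execute(f"""
        SELECT g.id FROM geolocations g
        WHERE EXISTS (
            SELECT 1 FROM geolocations g2 WHERE g2.ancestry = {CHILD_PATH}
        )
        ORDER BY g.id
    """)
]


def descendants(node_id):
    """All rows beneath node_id: direct children plus deeper levels."""
    rows = conn.execute(f"""
        SELECT d.id FROM geolocations g JOIN geolocations d
          ON d.ancestry = {CHILD_PATH}
          OR d.ancestry LIKE {CHILD_PATH} || '/%'
        WHERE g.id = ?
        ORDER BY d.id
    """, (node_id,)).fetchall()
    return [r[0] for r in rows]


print(parents, descendants(5))
```

The '/' before the wildcard matters: it keeps a prefix such as '1' from also matching an unrelated ancestry that merely starts with the same digits, such as '12/...'.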


Wednesday, 7 October 2020
