Yazılım Çorbası: Elasticsearch

Wednesday, 7 December 2022

Docker Compose and Elasticsearch

Introduction

The Elasticsearch UI is described as follows:

Install this free browser plugin Elasticvue for the access to Elasticsearch with UI. The plugin connects to http://localhost:9200 by default. Otherwise, you will need to configure the connection.

In our application's log4j2.xml file, we do the following:

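<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
  <Properties>
    <!-- Kept on one line so that no stray newline or indentation ends up inside the pattern -->
    <Property name="defaultPattern">[%highlight{%-5level}] %d{DEFAULT} %c{1}.%M() - %msg%n%throwable{short.lineNumber}</Property>
  </Properties>
  <Appenders>
    <!-- Sends log events to Logstash over TCP; host and port can be overridden with system properties -->
    <Socket name="socket" host="${sys:logstash.host.name:-localhost}" 
      port="${sys:logstash.port.number:-9999}" reconnectionDelayMillis="5000">
      <PatternLayout pattern="${defaultPattern}" />
    </Socket>
  </Appenders>
  <Loggers>
    <Root level="info">
      <!-- Must reference an appender defined above; the only one here is "socket" -->
      <AppenderRef ref="socket"/>
    </Root>
  </Loggers>
</Configuration>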

Example

We do it like this:

elasticsearch:
    image: elasticsearch:8.7.1
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      discovery.type: single-node
      xpack.security.enabled: false
      ES_JAVA_OPTS: "-Xms1g -Xmx1g"

Example - Elasticsearch on Kubernetes

For the PersistentVolumeClaim, we do the following:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elastic-pvc
  namespace: default
  labels:
    app: elastic-pvc
spec:
  storageClassName: nfs-client
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi

For the Deployment, we do the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: elastic
  namespace: default
  labels:
    app: elastic
spec:
  selector:
    matchLabels:
      app: elastic
  replicas: 1
  template:
    metadata:
      labels:
        app: elastic
    spec:
      containers:
      - name: elastic
        image: docker.elastic.co/elasticsearch/elasticsearch:7.5.2
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 1000m
            memory: 1024Mi
          limits:
            cpu: 1000m
            memory: 2048Mi
        env:
        - name: discovery.type
          value: "single-node"
        ports:
        - containerPort: 9200
          name: elastic-port
        - containerPort: 9300
          name: elastic-intra
        volumeMounts:
        - name: elastic-data
          mountPath: /usr/share/elasticsearch/data
      volumes:
        - name: elastic-data
          persistentVolumeClaim:
            claimName: elastic-pvc 
      restartPolicy: Always

For the Service, we do the following:

apiVersion: v1
kind: Service
metadata:
  name: elastic-svc
  namespace: default
spec:
  selector:
    app: elastic
  clusterIP: None
  ports:
  - port: 9200
    name: rest
  - port: 9300
    name: intra

Example - Elasticsearch + Logstash + Kibana

We do it like this:

version: '3'

services:
  elasticsearch:
    image: elasticsearch:7.10.1
    container_name: elasticsearch
    volumes:
      - ./volumes/es/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      ES_JAVA_OPTS: "-Xmx256m -Xms256m"
  logstash:
    image: logstash:7.10.1
    container_name: logstash
    command: -f /etc/logstash/conf.d/
    volumes:
      - ./volumes/logstash/:/etc/logstash/conf.d/
    ports:
      - "9999:9999"
    environment:
      LS_JAVA_OPTS: "-Xmx256m -Xms256m"
    depends_on:
      - elasticsearch
  kibana:
    image: kibana:7.10.1
    container_name: kibana
    volumes:
      - ./volumes/kibana/:/usr/share/kibana/config/
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

Example

We do it like this. Filebeat is not included here because it was installed later:

version: '2.2'

services:

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.9.2
    container_name: elasticsearch
    environment:
      - node.name=elasticsearch
      - discovery.seed_hosts=elasticsearch
      - cluster.initial_master_nodes=elasticsearch
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200

  kibana:
    image: docker.elastic.co/kibana/kibana:7.9.2
    container_name: kibana
    environment:
      ELASTICSEARCH_HOSTS: "http://elasticsearch:9200"
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch

volumes:
  esdata1:
    driver: local

Friday, 30 September 2022

Docker and Elasticsearch

Example

We do it like this. Elasticsearch is then reachable at https://localhost:9200:

docker pull docker.elastic.co/elasticsearch/elasticsearch:8.3.2

docker network create elastic

docker run --name es01 \
           --net elastic \
           -p 9200:9200 -p 9300:9300 \
           -it docker.elastic.co/elasticsearch/elasticsearch:8.3.2

docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .

Then we query the cluster using the copied CA certificate:

curl --cacert http_ca.crt -u elastic https://localhost:9200

The discovery.type Setting

Example

We do it like this:

# Custom network
docker network create sat-elk-net

docker run -d --name sat-elasticsearch \
  --net sat-elk-net \
  -p 9200:9200 \
  -p 9300:9300 \
  -e "discovery.type=single-node" \
  elasticsearch:7.17.4

# ElasticHQ Management Tool
docker run -d --name sat-elastichq \
  --net sat-elk-net \
  -p 5000:5000 \
  elastichq/elasticsearch-hq

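To disable security inside the running container, we do the following. The container has to be restarted afterwards for the change to take effect: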

# Disable XPack in Elasticsearch
docker exec -it <container_id> bash
cd /usr/share/elasticsearch/config
echo "xpack.security.enabled: false" >> elasticsearch.yml

The xpack.security.enabled Setting

The explanation is as follows:

Elasticsearch 8 comes with SSL/TLS enabled by default, I disabled security with the environment variable “xpack.security.enabled=false”. If security remains enabled, configuring the Elasticsearch client will require setting up a proper SSL connection.

Example

We do it like this:

docker run -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.8.1

Monday, 18 October 2021

Elasticsearch URI API

Introduction

Elasticsearch can be accessed through URIs. The API groups are as follows:

Document API
Search API
Indices API
cat API
Ingest API
Cluster API

Search API

This search method is somewhat more limited than the REST + JSON method. The explanation is as follows:

Although the URI search is a simple and efficient way to query your cluster, you’ll quickly find that it doesn’t support all of the features ES offers. The full power of Elasticsearch is evident through Request Body Search. Using Request Body Search allows you to build a complex search request using various elements and query clauses that will match, filter, and order as well as manipulate documents depending on multiple criteria.

Example

We do it like this:

"localhost:9200/_search?q=name:john~1 AND (age:[30 TO 40} OR surname:K*) AND -city"
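The query above contains spaces, brackets, and other characters that have to be percent-encoded before they can travel in a URL. The following is a minimal sketch of that step using only the Python standard library; the helper name build_uri_search is hypothetical, the host is the localhost:9200 address used in the text, and no request is actually sent.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Lucene query string from the example above.
lucene_query = "name:john~1 AND (age:[30 TO 40} OR surname:K*) AND -city"


def build_uri_search(base_url: str, query: str) -> str:
    """Return a URI-search URL with the query percent-encoded (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"{base_url}/_search?" + urlencode({"q": query}, quote_via=quote)


url = build_uri_search("http://localhost:9200", lucene_query)
print(url)

# Decoding the URL gives back the original query unchanged.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded == lucene_query)
```

With curl, a similar effect can be had by passing the query through --data-urlencode together with -G.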

Document API

Example - Insert

We do it like this:

curl -XPUT -H "Content-Type: application/json" \
  "http://localhost:9200/employee/_doc/1000?pretty" -d '
{
"name": "Steve",
"age": 23,
"experienceInYears": 1
}'
{
  "_index" : "employee",
  "_type" : "_doc",
  "_id" : "1000",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Example - Update

We do it like this:

curl -XPOST -H "Content-Type: application/json" \
  "http://localhost:9200/employee/_update/1000?pretty" -d '
{
"doc" : {
   "name": "Smith"
  }
}'
{
  "_index" : "employee",
  "_type" : "_doc",
  "_id" : "1000",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

Example - Delete

We do it like this:

curl -XDELETE -H "Content-Type: application/json" \
  "http://localhost:9200/employee/_doc/1000?pretty"

{
  "_index" : "employee",
  "_type" : "_doc",
  "_id" : "1000",
  "_version" : 3,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}
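The same three calls can be prepared from code. The sketch below only builds the HTTP requests with the Python standard library and does not send them; the function names are hypothetical, the base URL is the localhost:9200 address used above, and the update uses the typeless /{index}/_update/{id} path found in Elasticsearch 7.x and later.

```python
import json
from urllib.request import Request

BASE = "http://localhost:9200"  # assumption: the local node from the examples above
HEADERS = {"Content-Type": "application/json"}


def index_request(index, doc_id, doc):
    """PUT /{index}/_doc/{id}: create or replace a whole document."""
    body = json.dumps(doc).encode("utf-8")
    return Request(f"{BASE}/{index}/_doc/{doc_id}", data=body, headers=HEADERS, method="PUT")


def update_request(index, doc_id, partial):
    """POST /{index}/_update/{id}: merge the given fields into the document."""
    body = json.dumps({"doc": partial}).encode("utf-8")
    return Request(f"{BASE}/{index}/_update/{doc_id}", data=body, headers=HEADERS, method="POST")


def delete_request(index, doc_id):
    """DELETE /{index}/_doc/{id}: remove the document."""
    return Request(f"{BASE}/{index}/_doc/{doc_id}", method="DELETE")


prepared = [
    index_request("employee", 1000, {"name": "Steve", "age": 23, "experienceInYears": 1}),
    update_request("employee", 1000, {"name": "Smith"}),
    delete_request("employee", 1000),
]
for req in prepared:
    print(req.get_method(), req.full_url)
```

Passing any of these objects to urllib.request.urlopen would send it to a running cluster.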

Elasticsearch Index

Introduction

The explanation is as follows. An index can be thought of as a namespace.

You can think of an index in Elasticsearch as a database in the world of relational databases. To add data to Elasticsearch, we need to create an index. In reality, an index is just a logical namespace, data actually divided and stored into many shards. All data-related operations like CRUD perform on shards instead of index, index acts as a representative for hiding complexity.

Example

We do it like this. Here we create an index named employee that consists of one primary shard with one replica:

curl -XPUT -H "Content-Type: application/json" "http://localhost:9200/employee?pretty" -d '
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "age": {
        "type": "long"
      },
      "experienceInYears": {
        "type": "long"
      },
      "name": {
        "type": "text"
      }
    }
  }
}'

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "employee"
}

Thursday, 11 February 2021

Elasticsearch terms Query

Introduction

These queries are not analyzed; an exact match is looked up. The explanation is as follows:

terms query
Returns documents that contain one or more exact terms in a provided field.

A further explanation:

The terms query is somewhat an alternative of SQL "select * from table_name where column_name in (...)"

Example

We do it like this:

GET /_search
{
  "query" : {
    "terms" : {
      "name" : ["Frigg", "Odin", "Balrd"]
    }
  }
}

Elasticsearch query_string Query - For Full-Text Search

Introduction

The explanation is as follows:

Supports the compact Lucene query string syntax, allowing you to specify AND|OR|NOT conditions and multi-field search within a single query string. For expert users only.

A further explanation:

The query string is parsed into a series of terms and operators. A term can be a single word — quick or brown — or a phrase, surrounded by double quotes — "quick brown" — which searches for all the words in the phrase, in the same order.

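Example

This example performs no boolean operations such as AND|OR. It actually behaves much like a match query.

Suppose the index holds three documents whose field values are "I just said hello world", "Hello world", and "World Hello", and we run the following query. Because word order does not matter here, it returns all of the documents.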

{
  "query": {
    "query_string": {
      "query": "hello World"
    }
  }
}

Now suppose the same query is written as follows. Because the text is quoted, word order matters, so only the documents that contain the words in that order ("I just said hello world" and "Hello world") are returned; "World Hello" is excluded.

{
  "query": {
    "query_string": {
      "query": "\"Hello World\""
    }
  }
}

Example

This example performs no boolean operations such as AND|OR. It actually behaves much like a match query.

To query several fields at once, we do the following. A wildcard is also used here:

{
  "query": {
    "query_string": {
      "query": "*mar*",
      "fields": ["user.name", "user.surname"]
    }
  }
}

Example

Suppose we have the following code:

public interface ProductRepository extends ElasticsearchRepository<Product, String> {
 
  List<Product> findByName(String name);
  
  List<Product> findByNameContaining(String name);
 
  List<Product> findByManufacturerAndCategory (String manufacturer, String category);
}

We get the following output. Note that "must" is used for the AND operation.

For findByName():
POST /productindex/_search? ..: 
Request body: {.."query":{
  "bool":{
    "must":[
      {"query_string":{"query":"apple","fields":["name^1.0"],..}

For findByManufacturerAndCategory():
POST /productindex/_search..: 
Request body: {..
  "query":{
    "bool":{
      "must":[
        {"query_string":{"query":"samsung","fields":["manufacturer^1.0"],..}},
        {"query_string":{"query":"laptop","fields":["category^1.0"],..}}],..}},
        "version":true}

Elasticsearch term Query

Introduction

These queries are not analyzed; an exact match is looked up. The explanation is as follows:

term query matches a single term as it is : the value is not analyzed. So, it doesn't have to be lowercased depending on what you have indexed.

The explanation is as follows. The keyword with the given term value is looked up. To search against a set of several terms, the terms query is used instead:

The term query is somewhat an alternative of SQL "select * from table_name where column_name = '...'"

Example

Let the query be as follows:

{
  "query": {
    "term" : { "user" : "bennett" }
  }
}

The explanation is as follows; "the following query" in the quote refers to the query shown above:

If you provided Bennett at index time and the value is not analyzed, the following query won't return anything :

Example

Because the term query runs in filter context here and therefore produces no score, we sort the results explicitly like this:

GET /_search
{
  "query" : {
    "bool" : {
      "filter" : {
        "term" : {
          "group_city" : "London"
        }
      }
    }
  },
  "sort" : {
    "venue.venue_name": {"order": "asc"}
  }
}

Elasticsearch match Query - For Full-Text Search

Introduction

It is one of the analyzed queries. The explanation is as follows:

match query
The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.

A further explanation:

Creates a boolean query that returns results if the search term is present in the field.

Term queries vs Match query

The explanation is as follows:

Term-level queries are not analyzed. The match queries that work on text fields, on the other hand, are analyzed. The same analyzers used during the indexing process (unless search queries were explicitly defined with different analyzers) process the search words in match queries. If a standard analyzer (default analyzer) is used during the indexing of our document, the search words are analyzed using the same standard analyzer before the search is executed.

Additionally, the standard analyzer applies the same lowercase token filter (remember, the lowercase token filter is applied during the indexing) to the search words. Thus, if you provide the search keywords as uppercased, they are converted to lowercase letters and searched against the inverted index. For example, if we change the title value to use uppercase criteria such as "title": "JAVA", for example, and rerun the query, the results are the same as the search query in listing 10.4. If you change the title value to lowercase or in any other way (e.g., java, jaVA, etc.), the query still returns the same results.

Standard analyzer

The explanation is as follows. In other words, it converts every word to lowercase:

Additionally, the standard analyzer applies the same lowercase token filter (remember, the lowercase token filter is applied during the indexing) to the search words. Thus, if you provide the search keywords as uppercased, they are converted to lowercase letters and searched against the inverted index. For example, if we change the title value to use uppercase criteria such as "title”: “JAVA”, for example, and rerun the query, the results are the same as the search query in listing 10.4. If you change the title value to lowercase or in any other way (e.g., java, jaVA, etc.), the query still returns the same results.
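The difference between a term query and a match query can be illustrated with a toy inverted index. This is only a sketch: standard_analyze is a simplified stand-in for the standard analyzer (split into words, then lowercase), the ToyIndex class is hypothetical, and the book titles are made-up sample data.

```python
import re
from collections import defaultdict


def standard_analyze(text):
    """Toy stand-in for the standard analyzer: split into words, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


class ToyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it

    def index(self, doc_id, title):
        # Text fields are analyzed at index time, so only lowercase terms are stored.
        for token in standard_analyze(title):
            self.postings[token].add(doc_id)

    def term(self, value):
        # term query: the value is looked up exactly as given, with no analysis.
        return set(self.postings.get(value, set()))

    def match(self, text):
        # match query: the search text is analyzed first, then the tokens are OR-ed.
        hits = set()
        for token in standard_analyze(text):
            hits |= self.postings.get(token, set())
        return hits


idx = ToyIndex()
idx.index(1, "Core Java Volume I")
idx.index(2, "Effective JAVA")
idx.index(3, "Learning Python")

print(idx.term("JAVA"))   # set(): the index only holds the lowercase term "java"
print(idx.term("java"))   # {1, 2}
print(idx.match("JAVA"))  # {1, 2}: the search text is lowercased before lookup
```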

Syntax

Short Form

The syntax is as follows:

GET books/_search
{
  "query": {
    "match": { 
      "FIELD": "SEARCH TEXT" 
    }
  }
}

Example

We do it like this:

GET books/_search
{
  "query": {
    "match": { 
      "title": "Java" 
    }
  }
}

Long Form

The syntax is as follows:

GET books/_search
{
  "query": {
    "match": {
      "FIELD": {
        "query": "<SEARCH TEXT>",
        "<parameter>": "<MY_PARAM>"
      }
    }
  }
}

The explanation is as follows:

As you can see in the snippet, the match query expects the search criteria to be defined in the form of a field value. The field can be any of the text fields present in a document, whose values are to be matched. The value can be a word or multiple words, given either as uppercase, lowercase, or camel case.

Multiple Indices

Example

We do it like this:

GET new_books,classics,top_sellers,crime*/_search
{
  ...
}

The explanation is as follows:

We can search across multiple indices by providing comma-separated indices in the search URL

As you can see, any number of indices can be provided when invoking the _search endpoint, including wildcards.

Note : If we omit the index (or indices) in the search request, we effectively search the entire index. For example, GET _search{ ... } searches across all the indices in the cluster.

A match Query Matches If Any of the Given Values Is Present

A match query can be thought of as an OR query. A document is included in the results if any one of the whole words in the query appears in the specified field.

Example

The explanation is as follows:

Keywords: “puerto baham”
It will look for countries that have “puerto” or “baham” in their name, so it will return users from Puerto Rico and Bahamas, which is exactly what we want.

Example

Suppose we have the following search:

GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Java Complete Guide"
      }
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

This search is actually equivalent to the following. That is, it returns all books whose title field contains Java or Complete or Guide:

GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Java Complete Guide",
        "operator": "OR" 
      }
    }
  }
}

To change this, we do the following:

GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Java Complete Guide",
        "operator": "AND" 
      }
    }
  }
}

Example - the minimum_should_match attribute

The explanation is as follows:

What if we want to find documents that match at least a few words from the given set of words? In the previous example, suppose we want at least two words out of three to match (say, Java and Guide, for example). This is where the minimum_should_match attribute comes in handy.

The minimum_should_match attribute indicates the minimum number of words that should be used to match the documents.

We do it like this:

GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Java Complete Guide",
        "operator": "OR",
        "minimum_should_match": 2 
      }
    }
  }
}
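How operator and minimum_should_match change the result set can be sketched with a small simulation. This is an illustration under simplifying assumptions: the analyze helper only splits and lowercases, the match function is hypothetical and only handles a positive integer for minimum_should_match (the real parameter accepts more forms), and the titles are made-up sample data.

```python
import re


def analyze(text):
    """Simplified analysis: split into words and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def match(title, query, operator="OR", minimum_should_match=1):
    """Return True if the title satisfies the toy match query."""
    doc_terms = set(analyze(title))
    query_terms = analyze(query)
    matched = sum(1 for term in query_terms if term in doc_terms)
    if operator == "AND":
        return matched == len(query_terms)        # every query word is required
    return matched >= minimum_should_match        # OR: enough query words are present


titles = [
    "Java Complete Guide",               # contains 3 of the 3 query words
    "The Complete Guide to Gardening",   # contains 2 of the 3 query words
    "Effective Java",                    # contains 1 of the 3 query words
]
query = "Java Complete Guide"

print([t for t in titles if match(t, query)])                          # all three titles
print([t for t in titles if match(t, query, operator="AND")])          # only the first title
print([t for t in titles if match(t, query, minimum_should_match=2)])  # the first two titles
```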

Fuzzy Search

The explanation is as follows:

Simply put, fuzziness is a mechanism to correct a user’s spelling mistakes in query criteria.

Fuzziness makes character changes to string input so that it is the same as the string that may exist in the index. It employs the Levenshtein distance algorithm to fix incorrect spellings.

A match query also allows us to add a fuzziness parameter to fix spelling mistakes. We can set it as a numeric value, where the expected values are 0, 1, or 2, meaning none, one, or two character changes (insertions, deletions, modifications), respectively. In addition to setting these values, we also use an AUTO setting; we let the engine deal with the changes by setting AUTO as its fuzziness parameter.

Example

We do it like this:

GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Kava",
        "fuzziness": 1 
      }
    }
  }
}
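The edit distance behind the fuzziness parameter can be computed with the classic dynamic-programming algorithm. The sketch below implements plain Levenshtein distance as described in the quote; the function names are hypothetical, and by default the engine additionally treats a swap of two adjacent characters as a single edit, which this sketch does not do.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]


def fuzzy_match(query_term, indexed_term, fuzziness):
    """True if the indexed term is within the allowed number of edits."""
    return levenshtein(query_term, indexed_term) <= fuzziness


# "Kava" is lowercased by analysis before it is compared with the indexed term "java".
print(levenshtein("kava", "java"))     # 1
print(fuzzy_match("kava", "java", 1))  # True
print(fuzzy_match("kawa", "java", 1))  # False: two substitutions are needed
```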

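Searching Across All Fields

Example

We do it like this. Note that the _all field was removed in Elasticsearch 7.0, so this applies to older versions:

{"query": {"match": {"_all": "meaning"}}}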

The explanation is as follows:

...looks for the term “meaning” in all of the fields in all of the documents in your cluster.

Specifying the Fields to Return

Example

We do it like this:

{
  "query": {
    "match": { "_all": "meaning" }
  },
  "fields": ["name", "surname", "age"],
  "from": 100, "size": 20
}

The explanation is as follows:

Here, we’re using the “fields” element to restrict which fields should be returned and the “from” and “size” elements to tell Elasticsearch we’re looking for documents 100 to 119 (starting at 100 and counting 20 documents).
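The arithmetic of "from" and "size" is the same as slicing a list of ranked hits. A minimal sketch, assuming a hypothetical page helper and a made-up list of 250 hit ids; the defaults of 0 and 10 mirror the defaults of the search API.

```python
def page(hits, from_=0, size=10):
    """Return the slice of hits selected by from/size."""
    return hits[from_:from_ + size]


# 250 made-up hit ids, already in ranked order.
hits = [f"doc-{i}" for i in range(250)]

result = page(hits, from_=100, size=20)
print(result[0], result[-1], len(result))  # doc-100 doc-119 20
```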

Example - score

Suppose we have the following query:

GET /_search
{
   "query" : {
     "match" : {
       "tweet" : "grow up"
     }
  }
}

We get the following output:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.9790175,
    "hits": [
      {
        "_index": "App3",
        "_type": "tweets",
        "_id": "2",
        "_score": 1.9790175,
        "_source": {
          "name": "Katrina Kaif",
          "age": 22,
          "tweet": "We never really grow up, we only learn how to act in public."
        }
      },
      {
        "_index": "App3",
        "_type": "tweets",
        "_id": "114",
        "_score": 0.30432263,
        "_source": {
          "name": "Ajay Devgn",
          "age": 62,
          "tweet": "Stress is when you wake up screaming and you realize you haven’t fallen asleep yet."
        }
      }
    ]
  }
}

The explanation is as follows; the line numbers refer to the lines of the output above:

Lines 2-8 show meta information, such as the 3 ms the query took to return the result and some information about the shards.
From line 9 onwards we see the actual query results.
Line 10 tells us that there are two matching results for the query.
Line 11 shows the max relevance _score value, 1.979. This is followed by the two matching objects, the first with a _score value of 1.979 and the second with a _score value of 0.304. The drastic score difference is likely due to the fact that the second tweet doesn’t have “grow up” as a phrase. It only has the word “up”.

Wednesday, 11 November 2020

Elasticsearch

Introduction

Elasticsearch is not the only tool for full-text search.

Other tools such as Couchbase, Splunk, and Apache Solr also have this capability.

Write Architecture and Data Updates

The explanation is as follows:

Elasticsearch prioritizes near-real-time search with document-by-document writes and frequent index refreshes. Data is ingested via REST APIs (e.g., Bulk), tokenized, and indexed, becoming searchable after periodic refreshes (default: 1 second). This ensures rapid log retrieval but incurs high write overhead, with CPU-intensive indexing limiting single-core throughput to ~2 MB/s, often causing bottlenecks during peaks.

Query Language

The explanation is as follows:

Elasticsearch, however, employs a proprietary JSON-based DSL, distinct from SQL, requiring nested structures for filtering and aggregation. This presents a steep learning curve for new users and complicates integration with traditional BI tools.

Keeping Elasticsearch and the Database in Sync

There are two basic methods:

1. Periodically scanning the database and transferring the updates to Elasticsearch

2. If we use Hibernate, adding Hibernate Search annotations to the project

Another method, if we use Spring Data Elasticsearch, is to repeat each JPA operation through Spring Data Elasticsearch as well. The explanation is as follows:

As far as I understand, Spring-Data-Elasticsearch is focused on accessing Elasticsearch and has no JPA integration whatsoever. That is to say, you can use Spring-Data-JPA, and you can use Spring-Data-Elasticsearch, but they won't communicate with each other. You will have two separate models, which you will update and query separately.
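The first method above, periodic scanning, can be sketched as a watermark query followed by a Bulk API payload. Everything specific here is an assumption made for illustration: the product table, its updated_at column, the productindex index name, and both helper functions are hypothetical, an in-memory SQLite database stands in for the real one, and the payload is only built, not sent.

```python
import json
import sqlite3


def changed_rows(conn, since):
    """Return the rows modified after the last synchronization point."""
    cur = conn.execute(
        "SELECT id, name, updated_at FROM product WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    )
    return cur.fetchall()


def to_bulk_body(index, rows):
    """Build a Bulk API body: one action line plus one source line per row."""
    lines = []
    for row_id, name, _updated_at in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row_id)}}))
        lines.append(json.dumps({"name": name}))
    return "\n".join(lines) + "\n"  # the bulk body has to end with a newline


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "laptop", 100), (2, "phone", 200), (3, "tablet", 300)],
)

last_sync = 150  # watermark stored by the previous run
rows = changed_rows(conn, last_sync)
body = to_bulk_body("productindex", rows)
print(body, end="")

# Advance the watermark so the next run only picks up newer changes.
last_sync = rows[-1][2] if rows else last_sync
```

The resulting body would be POSTed to the _bulk endpoint with the Content-Type application/x-ndjson. A scan like this does not notice deleted rows, so deletions need separate handling.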

Elasticsearch and Data

The explanation is as follows. In other words, because of shards, we cannot see all of the data from a single node.

As a distributed database, your data is partitioned into “shards” which are then allocated to one or more servers.

Because of this sharding, a read or write request to an Elasticsearch cluster requires coordinating between multiple nodes as there is no “global view” of your data on a single server. While this makes Elasticsearch highly scalable, it also makes it much more complex to set up and tune than other popular databases like MongoDB or PostgreSQL, which can run on a single server.

Elastic Stack - Log Management

I moved this to the Elastic Stack post.

Elastic APM

The explanation is as follows:

Elastic APM is an application performance monitoring tool that is built on top of Elastic Search and Kibana, the E and the K of the ELK stack. Implementing Elastic APM is super easy — all you need to do is add the agent jar to your service and set some basic properties. This is done once per service, and that enables distributed tracing for all requests of that service.

Maven

We do it like this:

<dependency>
  <groupId>co.elastic.apm</groupId>
  <artifactId>apm-agent-attach</artifactId>
  <!--version should be compatible with your elastic instance-->
  <version>${elastic-version}</version>
</dependency>

The explanation is as follows:

Properties can be set in one of the following ways:
1. elasticapm.properties in classpath
2. Java System properties
3. Environment variables

Docker

I moved this to the Docker and Elasticsearch post.

Docker Compose

I moved this to the Docker Compose and Elasticsearch post.

Cluster Structure

A cluster has three kinds of nodes: Master Node, Master-Eligible Node, and Data Node. The explanation is as follows:

Data nodes hold data and perform data-related operations such as CRUD, search, and aggregations.

A master node in charge of cluster-wide management and configuration actions such as add/remove nodes, create/update/delete index, … A cluster has only one master node at a time. If a master node fails, Master-Eligible Nodes in the cluster elect a new master node from the master-eligible node pool.

Master-eligible node which can be voted to become a new master node when disaster happens with the master node.

In a cluster with only one node, it’s both master node and data node

Index

I moved this to the Elasticsearch Index post.

Shard

The explanation is as follows:

Simply, the shard is a single instance of Lucene. It stores data and can perform any data-related operations. A shard can be a primary shard or replica shard. Any document in an index belongs to a single primary shard. A replica shard is simply just a copy of a primary shard. It provides redundant copies and helps protect data when problems happen with primary shards. Replica shard also improves read performance, because it can serve read requests like primary shard but you only can perform write requests on the primary shard.

When creating an index, you should specify the number of primary shards. This number is fixed after the index is created, but you can change the number of replica shards by changing index settings.
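One way to see why the number of primary shards is fixed: a document's shard is derived from a hash of its routing value (by default its id) taken modulo the number of primary shards. The sketch below uses a simplified form of that idea; shard_for is a hypothetical helper and zlib.crc32 is only a stand-in for the hash function the engine really uses.

```python
import zlib


def shard_for(routing_key, number_of_primary_shards):
    """Pick a shard from a hash of the routing key (simplified; crc32 is a stand-in hash)."""
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


doc_ids = [str(i) for i in range(1000, 1010)]

with_two = {doc_id: shard_for(doc_id, 2) for doc_id in doc_ids}
with_three = {doc_id: shard_for(doc_id, 3) for doc_id in doc_ids}

# Documents whose shard would change if the primary shard count went from 2 to 3.
moved = [doc_id for doc_id in doc_ids if with_two[doc_id] != with_three[doc_id]]
print(with_two)
print(moved)
```

Any id listed in moved would be looked up on the wrong shard after such a change, which is why changing the primary count means reindexing, while replicas, being plain copies, can be added or removed freely.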

Field Data Types

Every field must have a type. Field types are grouped under the following headings:

Common Types
Object and Relational Types
Structured data types
Aggregate data types
Text search types
Document ranking types
Spatial data types
Other types
Arrays
Multi-fields

Text vs Keyword

Among the field types, it is important to know the difference between "text", under the Text Search Types heading, and "keyword", under the Common Types heading. The explanation is as follows:

A String field can be either mapped to the text or the keyword type of Elasticsearch.

The primary difference between text and a keyword is that a text field will be tokenized while a keyword cannot.

We can use the keyword type when we want to perform filtering or sorting operations on the field.

For instance, let’s assume that we have a String field called body, and let’s say it has the value ‘Hibernate is fun’.

If we choose to treat body as text then we will be able to tokenize it [‘Hibernate’, ‘is’, ‘fun’] and we will be able to perform queries like body: Hibernate.

If we make it a keyword type, a match will only be found if we pass the complete text body: Hibernate is fun (wildcard will work, though: body: Hibernate*).

A further explanation:

If the field type is Text, Elasticsearch pre-processes raw data with an Analyzer before saving processed data to an Inverted Index (An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears.)

Analyzers and Normalizers

The explanation is as follows. In other words, an analyzer is used for a text field and a normalizer for a keyword field.

Analyzers and normalizers are text analysis operations that are performed on text and keyword respectively, before indexing them and searching for them.

When an analyzer is applied on text, it first tokenizes the text and then applies one or more filters such as a lowercase filter (which converts all the text to lowercase) or a stop word filter (which removes common English stop words such as ‘is’, ‘an’, ‘the’ etc).

Normalizers are similar to analyzers with the difference that normalizers don’t apply a tokenizer.

On a given field we can either apply an analyzer or a normalizer.

To summarize:

Text                            Keyword
Is tokenized                    Cannot be tokenized
Is analyzed                     Can be normalized
Can perform term-based search   Can only match exact text
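The two pipelines summarized above can be sketched side by side. This is a toy model, not the engine's implementation: the analyzer function tokenizes, lowercases, and drops the stop words named in the quote, while the normalizer function only lowercases and keeps the value as a single term. Both function names and the stop-word set are assumptions made for the illustration, and in a real mapping both the stop filter and a normalizer have to be configured explicitly.

```python
import re

STOP_WORDS = {"is", "an", "the"}  # the stop words mentioned in the quote


def analyzer(text):
    """Text field: tokenize, lowercase, then remove stop words."""
    tokens = [token.lower() for token in re.findall(r"\w+", text)]
    return [token for token in tokens if token not in STOP_WORDS]


def normalizer(text):
    """Keyword field: no tokenizer, the whole value stays one lowercase term."""
    return text.lower()


body = "Hibernate is fun"
print(analyzer(body))    # ['hibernate', 'fun']
print(normalizer(body))  # hibernate is fun

# A search for "hibernate" finds the text field but not the keyword field.
print("hibernate" in analyzer(body))    # True
print("hibernate" == normalizer(body))  # False
```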

URI API

I moved this to the Elasticsearch URI API post.

The Request Body Search

These are queries sent using REST + JSON.

Non-Analyzed Queries - Term-Level Queries

The explanation is as follows:

exists query
Returns documents that contain any indexed value for a field.

fuzzy query
Returns documents that contain terms similar to the search term. Elasticsearch measures similarity, or fuzziness, using a Levenshtein edit distance.

ids query
Returns documents based on their document IDs.

prefix query
Returns documents that contain a specific prefix in a provided field.

range query
Returns documents that contain terms within a provided range.

regexp query
Returns documents that contain terms matching a regular expression.

term query
Returns documents that contain an exact term in a provided field.

terms query
Returns documents that contain one or more exact terms in a provided field.

terms_set query
Returns documents that contain a minimum number of exact terms in a provided field. You can define the minimum number of matching terms using a field or script.

type query
Returns documents of the specified type.

wildcard query
Returns documents that contain terms matching a wildcard pattern.

1. Exists Query

The explanation is as follows:

Due to the fact that Elasticsearch is schemaless (or has no strict schema limitation), it is a fairly common situation when different documents have different fields. As a result, it is often useful to know whether a document has a certain field or not.

Example

We do it like this:

GET /_search
{
  "query" : {
    "exists" : {
      "field": "<your_field_name>"
    }
  }
}

2. Fuzzy Query

The explanation is as follows. It can be used when the input may contain typos. The quote also covers how it differs from the wildcard query.

Fuzzy search gives relevant results even if you have some typos in your query. It gives end-users some flexibility in terms of searching by allowing some degree of error. The threshold of the error to be allowed can be decided by us.

For instance, here we have set edit distance to 2 (default is also 2 by the way) which means Elasticsearch will match all the words with a maximum of 2 differences to the input. e.g., ‘jab’ will match ‘jane’.

While Fuzzy queries allow us to search even when we have misspelled words in your query, wildcard queries allow us to perform pattern-based searches. For instance, a search query with ‘s?ring*’ will match ‘spring’, ‘string’, ‘strings’ etc.

Here ‘*’ indicates zero or more characters and ‘?’ indicates a single character.
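The two wildcard characters behave like shell-style patterns, so the standard library's fnmatch module can illustrate them. This is only an analogy under an assumption: fnmatch also gives square brackets a special meaning, so it is a fair model only for patterns built from * and ?. The term list is made-up sample data.

```python
from fnmatch import fnmatchcase

pattern = "s?ring*"
terms = ["spring", "string", "strings", "sring", "ring"]

# fnmatchcase is case-sensitive: '?' matches exactly one character, '*' any run of characters.
matches = [term for term in terms if fnmatchcase(term, pattern)]
print(matches)  # ['spring', 'string', 'strings']
```

"sring" is rejected because ? has to consume exactly one character, and "ring" because the pattern requires a leading s.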

A further explanation:

Fuzzy searching uses the Damerau-Levenshtein Distance to match terms that are similar in spelling. This is great when your data set has misspelled words.

Use the tilde (~) to find similar terms:

blow~
This will return results like “blew,” “brow,” and “glow.”

Use the tilde (~) along with a number to specify how big the distance between words can be:

john~2
This will match, among other things: “jean,” “johns,” “jhon,” and “horn”

3. Term Query

I moved this to the term Query post.

4. Terms Query

I moved this to the terms Query post.

5. wildcard Query - Runs a Wildcard Query Against a Single Field

Example

We do it like this. An old Elasticsearch version is used here, as the filtered query shows; that query type was removed in Elasticsearch 5.0 in favor of the bool query.

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "should": [
            {"query": {"wildcard": {"user.name": {"value": "*mar*"}}}},
            {"query": {"wildcard": {"user.surname": {"value": "*mar*"}}}}
          ]
        }
      }
    }
  }
}

Analyzed Queries - Full Text Queries

The explanation is as follows:

intervals query
A full text query that allows fine-grained control of the ordering and proximity of matching terms.

match query
The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.

match_bool_prefix query
Creates a bool query that matches each term as a term query, except for the last term, which is matched as a prefix query.

match_phrase query
Like the match query but used for matching exact phrases or word proximity matches.

match_phrase_prefix query
Like the match_phrase query, but does a wildcard search on the final word.

multi_match query
The multi-field version of the match query.

common terms query
A more specialized query which gives more preference to uncommon words.

query_string query
Supports the compact Lucene query string syntax, allowing you to specify AND|OR|NOT conditions and multi-field search within a single query string. For expert users only.

simple_query_string query
A simpler, more robust version of the query_string syntax suitable for exposing directly to users.

match Query

I moved this to the match Query post.

match_phrase Query

The word "phrase" means the terms appear in the same order. That is, it matches when all of the whole words are present in the same order. The explanation is as follows

match_phrase query will analyze the input if analyzers are defined for the queried field and find documents matching the following criteria:

- all the terms must appear in the field
- they must have the same order as the input value

Example

Suppose our index contains the following documents

{ "foo":"I just said hello world" }

{ "foo":"Hello world" }

{ "foo":"World Hello" }

Let the query be as follows. As a result, we get only the first and second documents. The third document is not included because its terms appear in the reverse order.

{
  "query": {
    "match_phrase": {
      "foo": "Hello World"
    }
  }
}
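The result can be checked with a small local simulation. This is only a sketch of the idea, not how Elasticsearch implements it: `analyze` is a crude stand-in for an analyzer (lowercase and split on whitespace), and both function names are made up for this illustration.

```python
def analyze(text: str) -> list:
    # Crude stand-in for an analyzer: lowercase and split on whitespace.
    return text.lower().split()


def matches_phrase(field_value: str, phrase: str) -> bool:
    """True if all query terms appear in the field, adjacent and in order."""
    doc, query = analyze(field_value), analyze(phrase)
    n = len(query)
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


docs = ["I just said hello world", "Hello world", "World Hello"]
results = [matches_phrase(doc, "Hello World") for doc in docs]
print(results)  # [True, True, False]
```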

match_phrase_prefix Query - search as you type

Matches when all of the whole words are present and the final, partial word is a prefix of the word that follows

Example

The explanation is as follows

Keywords: “puerto r”
It considers “puerto” as exact word that needs to be in the country name, and “r” as prefix for any word after “puerto”. This will match “Puerto Rico”.
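A request body for this search could look like the sketch below. The field name `country_name` is an assumption, since the explanation only says "country name". The `matches_phrase_prefix` function is a made-up local simulation (lowercase and whitespace split instead of a real analyzer) that shows the rule: every word except the last must match exactly and in order, and the last one is treated as a prefix.

```python
import json

# Hypothetical field name; the explanation does not give the real one.
body = {"query": {"match_phrase_prefix": {"country_name": {"query": "puerto r"}}}}


def matches_phrase_prefix(field_value: str, text: str) -> bool:
    """Local simulation: exact words in order, last query word as a prefix."""
    doc = field_value.lower().split()
    query = text.lower().split()
    if not query:
        return False
    *whole, prefix = query
    n = len(query)
    for i in range(len(doc) - n + 1):
        window = doc[i:i + n]
        if window[:-1] == whole and window[-1].startswith(prefix):
            return True
    return False


print(json.dumps(body, indent=2))
print(matches_phrase_prefix("Puerto Rico", "puerto r"))
print(matches_phrase_prefix("Costa Rica", "puerto r"))
```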

multi_match Query - Queries Multiple Fields

The explanation is as follows.

Similar to match, but searches multiple fields.

Example
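A minimal sketch of a multi_match request body, written as a Python dict. The field names `user.name` and `user.surname` are borrowed from the wildcard example earlier in this post, and the search text "martin" is made up for illustration.

```python
import json

body = {
    "query": {
        "multi_match": {
            "query": "martin",                       # hypothetical search text
            "fields": ["user.name", "user.surname"]  # fields to search together
        }
    }
}

print(json.dumps(body, indent=2))
```

Compared with the wildcard example, a single clause covers both fields, and the text is analyzed the same way as in a match query.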

common terms query

The explanation is as follows.

A more specialized query which gives more preference to uncommon words.

Example
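A sketch of a common terms request body, written as a Python dict. As far as I know, this query was deprecated in Elasticsearch 7.3 and removed in 8.0, so it only applies to older clusters. The field name `body` and the search text are made up for illustration.

```python
import json

body = {
    "query": {
        "common": {
            "body": {  # hypothetical field name
                "query": "this is bonsai cool",
                # Terms more frequent than this threshold are treated as
                # common and only contribute to scoring.
                "cutoff_frequency": 0.001,
            }
        }
    }
}

print(json.dumps(body, indent=2))
```

The idea is that frequent words such as "this" and "is" do not decide which documents match, while rare words such as "bonsai" do.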

query_string Query - Queries Multiple Fields and Supports Criteria Such as AND and OR

I moved this to the query_string Query post

simple_query_string Query

The explanation is as follows.

A simpler, more robust version of the query_string syntax suitable for exposing directly to users.

Example
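A minimal sketch of a simple_query_string request body, written as a Python dict. The field names `title` and `body` and the search text are made up for illustration. In this syntax `+` means AND, `|` means OR, `-` negates a term, and double quotes wrap a phrase.

```python
import json

body = {
    "query": {
        "simple_query_string": {
            # Phrase "fried eggs" AND (eggplant OR potato) AND NOT frittata.
            "query": '"fried eggs" +(eggplant | potato) -frittata',
            # Hypothetical fields; ^5 boosts matches in the title.
            "fields": ["title^5", "body"],
            "default_operator": "and",
        }
    }
}

print(json.dumps(body, indent=2))
```

Unlike query_string, invalid parts of the query text are ignored instead of raising an error, which is what makes it suitable for text typed by end users.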

Compound Queries

The explanation is as follows

bool query
The default query for combining multiple leaf or compound query clauses, as must, should, must_not, or filter clauses. The must and should clauses have their scores combined — the more matching clauses, the better — while the must_not and filter clauses are executed in filter context.

boosting query
Return documents which match a positive query, but reduce the score of documents which also match a negative query.

constant_score query
A query which wraps another query, but executes it in filter context. All matching documents are given the same “constant” _score.

dis_max query
A query which accepts multiple queries, and returns any documents which match any of the query clauses. While the bool query combines the scores from all matching queries, the dis_max query uses the score of the single best-matching query clause.

function_score query
Modify the scores returned by the main query with functions to take into account factors like popularity, recency, distance, or custom algorithms implemented with scripting.
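The scoring difference between bool and dis_max can be illustrated with a little arithmetic. This is a simplified sketch, not the actual scoring code: it assumes that bool adds up the scores of the matching clauses, and that dis_max takes the best score plus `tie_breaker` times the scores of the other matching clauses. The function names and the clause scores are made up.

```python
def bool_score(clause_scores):
    """Simplified bool scoring: matching clause scores are added up."""
    return sum(clause_scores)


def dis_max_score(clause_scores, tie_breaker=0.0):
    """Simplified dis_max scoring: best clause wins, others count via tie_breaker."""
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others


# Hypothetical scores of two clauses that both match the same document.
scores = [1.5, 0.5]

print(bool_score(scores))                      # 2.0
print(dis_max_score(scores))                   # 1.5
print(dis_max_score(scores, tie_breaker=0.5))  # 1.75
```

With the default `tie_breaker` of 0, only the single best-matching clause determines the score.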

Clauses Used with the bool Query

The explanation is as follows

must
All queries within this clause must match a document in order for ES to return it. Think of this as your AND queries. The query that we used here is the fuzzy query, and it will match any documents that have a name field that matches “john” in a fuzzy way. The extra “fuzziness” parameter tells Elasticsearch that it should be using a Damerau-Levenshtein Distance of 2 to determine the fuzziness.

must_not
Any documents that match the query within this clause will be outside of the result set. This is the NOT or minus (-) operator of the query DSL. In this case, we do a simple match query, looking for documents that contain the term “city.” Using _all as the field name indicates that the term can appear in any of the document’s fields. This is the must_not clause, so matching documents will be excluded.

should
Up until now, we have been dealing with absolutes: must and must_not. Should is not absolute and is equivalent to the OR operator. Elasticsearch will return any documents that match one or more of the queries in the should clause. The first query that we provided looks for documents where the age field is between 30 and 40. The second query does a wildcard search on the surname field, looking for values that start with “K.”

The query contained three different clauses, so Elasticsearch will only return documents that match the criteria in all of them. These queries can be nested, so you can build up very complex queries by specifying a bool query as a must, must_not, should or filter query.
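The query that this explanation refers to is not included in the quote. The following is a possible reconstruction from the description, written as a Python dict. The inclusive bounds on `age`, the pattern `K*`, and the explicit `minimum_should_match` are my assumptions. Note also that the `_all` field only exists in older Elasticsearch versions.

```python
import json

body = {
    "query": {
        "bool": {
            # AND: name must fuzzily match "john" within 2 edits.
            "must": [{"fuzzy": {"name": {"value": "john", "fuzziness": 2}}}],
            # NOT: exclude documents containing "city" in any field
            # (_all is only available in older versions).
            "must_not": [{"match": {"_all": "city"}}],
            # OR: age between 30 and 40, or surname starting with "K".
            "should": [
                {"range": {"age": {"gte": 30, "lte": 40}}},
                {"wildcard": {"surname": {"value": "K*"}}},
            ],
            # Assumption: when a must clause is present, should clauses are
            # optional by default, so this makes at least one of them required.
            "minimum_should_match": 1,
        }
    }
}

print(json.dumps(body, indent=2))
```

Each entry in these lists can itself be a bool query, which is how the nesting mentioned above is built.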