Wednesday, 7 December 2022

Docker Compose and Elasticsearch

Introduction
A browser UI for Elasticsearch is described as follows:
Install this free browser plugin Elasticvue for the access to Elasticsearch with UI. The plugin connects to http://localhost:9200 by default. Otherwise, you will need to configure the connection.

In our application's log4j2.xml file we do the following:
<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
  <Properties>
    <Property name="defaultPattern">[%highlight{%-5level}] %d{DEFAULT} %c{1}.%M() - %msg%n%throwable{short.lineNumber}</Property>
  </Properties>
  <Appenders>
    <Socket name="socket" host="${sys:logstash.host.name:-localhost}" 
      port="${sys:logstash.port.number:-9999}" reconnectionDelayMillis="5000">
      <PatternLayout pattern="${defaultPattern}" />
    </Socket>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="socket"/>
    </Root>
  </Loggers>
</Configuration>
Example
We do the following:
elasticsearch:
    image: elasticsearch:8.7.1
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      discovery.type: single-node
      xpack.security.enabled: "false"
      ES_JAVA_OPTS: "-Xms1g -Xmx1g"

Example - Elasticsearch on Kubernetes
For the PersistentVolumeClaim we do the following:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elastic-pvc
  namespace: default
  labels:
    app: elastic-pvc
spec:
  storageClassName: nfs-client
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
For the Deployment we do the following:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elastic
  namespace: default
  labels:
    app: elastic
spec:
  selector:
    matchLabels:
      app: elastic
  replicas: 1
  template:
    metadata:
      labels:
        app: elastic
    spec:
      containers:
      - name: elastic
        image: docker.elastic.co/elasticsearch/elasticsearch:7.5.2
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 1000m
            memory: 1024Mi
          limits:
            cpu: 1000m
            memory: 2048Mi
        env:
        - name: discovery.type
          value: "single-node"
        ports:
        - containerPort: 9200
          name: elastic-port
        - containerPort: 9300
          name: elastic-intra
        volumeMounts:
        - name: elastic-data
          mountPath: /usr/share/elasticsearch/data
      volumes:
        - name: elastic-data
          persistentVolumeClaim:
            claimName: elastic-pvc 
      restartPolicy: Always
For the Service we do the following:
apiVersion: v1
kind: Service
metadata:
  name: elastic-svc
  namespace: default
spec:
  selector:
    app: elastic
  clusterIP: None
  ports:
  - port: 9200
    name: rest
  - port: 9300
    name: intra
Example - Elasticsearch + Logstash + Kibana
We do the following:
version: '3'

services:
  elasticsearch:
    image: elasticsearch:7.10.1
    container_name: elasticsearch
    volumes:
      - ./volumes/es/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      ES_JAVA_OPTS: "-Xmx256m -Xms256m"
  logstash:
    image: logstash:7.10.1
    container_name: logstash
    command: -f /etc/logstash/conf.d/
    volumes:
      - ./volumes/logstash/:/etc/logstash/conf.d/
    ports:
      - "9999:9999"
    environment:
      LS_JAVA_OPTS: "-Xmx256m -Xms256m"
    depends_on:
      - elasticsearch
  kibana:
    image: kibana:7.10.1
    container_name: kibana
    volumes:
      - ./volumes/kibana/:/usr/share/kibana/config/
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch
Example
We do the following. Filebeat is not included here because it is installed later:
version: '2.2'

services:

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.9.2
    container_name: elasticsearch
    environment:
      - node.name=elasticsearch
      - discovery.seed_hosts=elasticsearch
      - cluster.initial_master_nodes=elasticsearch
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200

  kibana:
    image: docker.elastic.co/kibana/kibana:7.9.2
    container_name: kibana
    environment:
      ELASTICSEARCH_HOSTS: "http://elasticsearch:9200"
    ports:
      - 5601:5601
    depends_on:
      - elasticsearch

volumes:
  esdata1:
    driver: local


Friday, 30 September 2022

Docker and Elasticsearch

Example
We do the following so that Elasticsearch is reachable at https://localhost:9200
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.3.2

docker network create elastic

docker run --name es01 \
           --net elastic \
           -p 9200:9200 -p 9300:9300 \
           -it docker.elastic.co/elasticsearch/elasticsearch:8.3.2

docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .
We then connect as follows:
curl --cacert http_ca.crt -u elastic https://localhost:9200
The discovery.type Setting

Example
We do the following:
# Custom network
docker network create sat-elk-net

docker run -d --name sat-elasticsearch \
  --net sat-elk-net \
  -p 9200:9200 \
  -p 9300:9300 \
  -e "discovery.type=single-node" \
  elasticsearch:7.17.4

# ElasticHQ Management Tool
docker run -d --name sat-elastichq \
  --net sat-elk-net \
  -p 5000:5000 \
  elastichq/elasticsearch-hq
We do the following:
# Disable XPack in Elasticsearch
docker exec -it <container_id> bash
cd /usr/share/elasticsearch/config
echo "xpack.security.enabled: false" >> elasticsearch.yml
The xpack.security.enabled Setting
The explanation is as follows:
Elasticsearch 8 comes with SSL/TLS enabled by default, I disabled security with the environment variable “xpack.security.enabled=false”. If security remains enabled, configuring the Elasticsearch client will require setting up a proper SSL connection.
Example
We do the following:
docker run -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.8.1




Monday, 18 October 2021

Elasticsearch URI API

Introduction
Elasticsearch can be accessed through URIs. The API categories are:
Document API
Search API
Indices API
cat API
Ingest API
Cluster API
Search API
This search method is somewhat more limited than the REST + JSON (request body) method. The explanation is as follows:
Although the URI search is a simple and efficient way to query your cluster, you’ll quickly find that it doesn’t support all of the features ES offers. The full power of Elasticsearch is evident through Request Body Search. Using Request Body Search allows you to build a complex search request using various elements and query clauses that will match, filter, and order as well as manipulate documents depending on multiple criteria.
Example
We do the following:
localhost:9200/_search?q=name:john~1 AND (age:[30 TO 40} OR surname:K*) AND -city
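When a query like this is sent from a script rather than typed into a browser, the spaces, colons and brackets in the q parameter need to be percent-encoded. The following is a minimal sketch using only the Python standard library; the helper name uri_search_url and the http://localhost:9200 base URL are assumptions made for illustration.

```python
from urllib.parse import quote


def uri_search_url(base_url, lucene_query):
    """Build a URI-search URL with the q parameter percent-encoded.

    uri_search_url is a hypothetical helper, not part of any client library.
    """
    return f"{base_url}/_search?q={quote(lucene_query, safe='')}"


# The Lucene query string from the example above.
query = "name:john~1 AND (age:[30 TO 40} OR surname:K*) AND -city"
url = uri_search_url("http://localhost:9200", query)
print(url)
```

The resulting URL can then be passed to curl or any HTTP client without further shell quoting problems caused by spaces.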
Ingest API
Example - Insert
We do the following:
curl -XPUT -H "Content-Type: application/json" \
  http://localhost:9200/employee/_doc/1000?pretty -d '
{
"name": "Steve",
"age": 23,
"experienceInYears": 1
}'
{
  "_index" : "employee",
  "_type" : "_doc",
  "_id" : "1000",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
Example - Update
We do the following:
curl -XPOST -H "Content-Type: application/json" \
  http://localhost:9200/employee/_doc/1000/_update?pretty -d '
{
"doc" : {
   "name": "Smith"
  }
}'
{
  "_index" : "employee",
  "_type" : "_doc",
  "_id" : "1000",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}
Example - Delete
We do the following:
curl -XDELETE -H "Content-Type: application/json" \
  http://localhost:9200/employee/_doc/1000?pretty

{
  "_index" : "employee",
  "_type" : "_doc",
  "_id" : "1000",
  "_version" : 3,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

Elasticsearch Index

Introduction
The explanation is as follows. An index can be thought of as a namespace.
You can think of an index in Elasticsearch as a database in the world of relational databases. To add data to Elasticsearch, we need to create an index. In reality, an index is just a logical namespace, data actually divided and stored into many shards. All data-related operations like CRUD perform on shards instead of index, index acts as a representative for hiding complexity.
Example
We do the following. Here we create an index named employee that consists of one primary shard with one replica.
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/employee?pretty -d '
{
"settings": {
   "index": {
         "number_of_shards": 1,
         "number_of_replicas": 1
         }
      },
   "mappings": {
       "properties": {
         "age": {
               "type": "long"
         },
         "experienceInYears": {
               "type": "long"      
         },
         "name": {
               "type": "text"
         }
     }
   }
}'

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "employee"
}

Thursday, 11 February 2021

Elasticsearch terms Query

Introduction
These queries are not analyzed; an exact match is looked up. The explanation is as follows:
terms query
Returns documents that contain one or more exact terms in a provided field.
The explanation is as follows:
The terms query is somewhat an alternative of SQL "select * from table_name where column_name in (...)"
Example
We do the following:
GET /_search
{
  "query" : {
    "terms" : {
      "name" : ["Frigg", "Odin", "Balrd"]
    }
}
}

Elasticsearch query_string Query - For Full Text Search

Introduction
The explanation is as follows:
Supports the compact Lucene query string syntax, allowing you to specify AND|OR|NOT conditions and multi-field search within a single query string. For expert users only.
The explanation is as follows:
The query string is parsed into a series of terms and operators. A term can be a single word — quick or brown — or a phrase, surrounded by double quotes — "quick brown" — which searches for all the words in the phrase, in the same order.
Example
No boolean operations such as AND|OR are used in this example; it behaves much like a match query.
Suppose we have the following query. Because word order does not matter, it returns all the documents.
{
  "query": {
    "query_string": {
      "query": "hello World"
    }
  }
}
Now suppose the same query is written as follows. Because of the quotes, word order matters, so only documents that contain the phrase "Hello World" in that order are returned; a document such as "World Hello" is not.
{
  "query": {
    "query_string": {
      "query": "\"Hello World\""
} } }
Example
No boolean operations such as AND|OR are used in this example; it behaves much like a match query.
To query multiple fields we do the following. A wildcard is also used here:
"query": {
    "query_string": {
        "query": "*mar*",
        "fields": ["user.name", "user.surname"]
    }
}
Example
Suppose we have the following code:
public interface ProductRepository extends ElasticsearchRepository<Product, String> {
 
  List<Product> findByName(String name);
  
  List<Product> findByNameContaining(String name);
 
  List<Product> findByManufacturerAndCategory (String manufacturer, String category);
}
The output is as follows. Note that "must" is used for the AND operation.
For findByName():
POST /productindex/_search? ..: 
Request body: {.."query":{
  "bool":{
    "must":[
      {"query_string":{"query":"apple","fields":["name^1.0"],..}

For findByManufacturerAndCategory():
POST /productindex/_search..: 
Request body: {..
  "query":{
    "bool":{
      "must":[
        {"query_string":{"query":"samsung","fields":["manufacturer^1.0"],..}},
        {"query_string":{"query":"laptop","fields":["category^1.0"],..}}],..}},
        "version":true}

Elasticsearch term Query

Introduction
These queries are not analyzed; an exact match is looked up. The explanation is as follows:
term query matches a single term as it is : the value is not analyzed. So, it doesn't have to be lowercased depending on what you have indexed.
The explanation is as follows. The keyword with the given term value is looked up. To search against a set of several terms, the terms query is used instead.
The term query is somewhat an alternative of  SQL "select * from table_name where column_name = "..."
Example
Suppose the query is as follows:
{
  "query": {
    "term" : { "user" : "bennett" }
  }
}
The explanation is as follows:
If you provided Bennett at index time and the value is not analyzed, the following query won't return anything :
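The behavior described above can be imitated with a small in-memory sketch. It is not how Elasticsearch is implemented; term_query is a hypothetical helper that simply compares the stored value with the query value without any analysis, which is the essential point of a term query.

```python
def term_query(docs, field, value):
    """Return the documents whose stored value equals `value` exactly.

    No lowercasing or tokenizing is applied, mirroring a non-analyzed field.
    """
    return [doc for doc in docs if doc.get(field) == value]


# "Bennett" was indexed as-is (not analyzed), as in the explanation above.
docs = [{"user": "Bennett"}]

print(term_query(docs, "user", "bennett"))  # no hit: case differs
print(term_query(docs, "user", "Bennett"))  # hit: exact term
```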
Example
Because a term query in filter context produces no score, we sort the result explicitly as follows:
GET /_search
{
  "query" : {
    "bool" : {
      "filter" : {
        "term" : {
          "group_city" : "London"
        }
      }
    }
  },
  "sort" : {
    "venue.venue_name": {"order": "asc"}
  }
}

Elasticsearch match Query - For Full Text Search

Introduction
It is one of the analyzed queries. The explanation is as follows:
match query
The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.
The explanation is as follows:
Creates a boolean query that returns results if the search term is present in the field.
Term queries vs Match query
The explanation is as follows:
Term-level queries are not analyzed. The match queries that work on text fields, on the other hand, are analyzed. The same analyzers used during the indexing process (unless search queries were explicitly defined with different analyzers) process the search words in match queries. If a standard analyzer (default analyzer) is used during the indexing of our document, the search words are analyzed using the same standard analyzer before the search is executed.

Standard analyzer
The explanation is as follows. In other words, it converts every word to lowercase.
Additionally, the standard analyzer applies the same lowercase token filter (remember, the lowercase token filter is applied during the indexing) to the search words. Thus, if you provide the search keywords as uppercased, they are converted to lowercase letters and searched against the inverted index. For example, if we change the title value to use uppercase criteria such as "title": "JAVA", for example, and rerun the query, the results are the same as the search query in listing 10.4. If you change the title value to lowercase or in any other way (e.g., java, jaVA, etc.), the query still returns the same results.
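The effect of the lowercase token filter can be sketched in a few lines. simple_analyze below is a simplified stand-in written for illustration only; the real standard analyzer uses Unicode text segmentation and is considerably more involved.

```python
import re


def simple_analyze(text):
    """Split text into word tokens and lowercase each one.

    A rough imitation of tokenizing plus the lowercase token filter.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


# Every spelling of the search word ends up as the same token.
for search_word in ["JAVA", "java", "jaVA"]:
    print(search_word, "->", simple_analyze(search_word))
```

Because the indexed text and the search words go through the same step, the case used in the query does not affect the result.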
Syntax
Short Form
The syntax is as follows:
GET books/_search
{
  "query": {
    "match": { 
      "FIELD": "SEARCH TEXT" 
    }
  }
}
Example
We do the following:
GET books/_search
{
  "query": {
    "match": { 
      "title": "Java" 
    }
  }
}
Long Form
The syntax is as follows:
GET books/_search
{
  "query": {
    "match": {
      "FIELD": {
        "query": "<SEARCH TEXT>",
        "<parameter>": "<MY_PARAM>"
      }
    }
  }
}
The explanation is as follows:
As you can see in the snippet, the match query expects the search criteria to be defined in the form of a field value. The field can be any of the text fields present in a document, whose values are to be matched. The value can be a word or multiple words, given either as uppercase, lowercase, or camel case.
Multiple Indices
Example
We do the following:
GET new_books,classics,top_sellers,crime*/_search
{
  ...
}
The explanation is as follows:
We can search across multiple indices by providing comma-separated indices in the search URL

As you can see, any number of indices can be provided when invoking the _search endpoint, including wildcards.

Note : If we omit the index (or indices) in the search request, we effectively search the entire index. For example, GET _search{ ... } searches across all the indices in the cluster.
A match Query Matches If Any of the Given Values Is Present
A match query can be thought of as an OR query. A document is included in the result if any of the whole words in the query occurs in the given field.

Example
The explanation is as follows:
Keywords: “puerto baham”
It will look for countries that have “puerto” or “baham” in their name, so it will return users from Puerto Rico and Bahamas, which is exactly what want.
Example
Suppose we have the following search:
GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Java Complete Guide"
      }
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}
This search is equivalent to the following one; that is, it returns every book whose title field contains Java or Complete or Guide:
GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Java Complete Guide",
        "operator": "OR" 
      }
    }
  }
}
To change this behavior we do the following:
GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Java Complete Guide",
        "operator": "AND" 
      }
    }
  }
}
Example - minimum_should_match attribute
The explanation is as follows:
What if we want to find documents that match at least a few words from the given set of words? In the previous example, suppose we want at least two words out of three to match (say, Java and Guide, for example). This is where the minimum_should_match attribute comes in handy.

The minimum_should_match attribute indicates the minimum number of words that should be used to match the documents.
We do the following:
GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Java Complete Guide",
        "operator": "OR",
        "minimum_should_match": 2 
      }
    }
  }
}
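The difference between the OR operator, the AND operator and minimum_should_match can be illustrated with a small in-memory imitation. This is only a sketch of the matching rule, not of Elasticsearch scoring; the match helper and the three book titles are hypothetical.

```python
import re


def tokens(text):
    """Lowercased word tokens, a rough stand-in for the standard analyzer."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def match(title, query, operator="OR", minimum_should_match=None):
    """Decide whether `title` matches `query` under the given rule."""
    title_terms = set(tokens(title))
    query_terms = tokens(query)
    hits = sum(1 for term in query_terms if term in title_terms)
    if operator == "AND":
        return hits == len(query_terms)
    required = 1 if minimum_should_match is None else minimum_should_match
    return hits >= required


# Hypothetical titles containing one, two and three of the query words.
titles = [
    "Core Java Volume I",
    "Java: The Complete Reference",
    "The Complete Guide to Java",
]
query = "Java Complete Guide"

for title in titles:
    print(
        title,
        match(title, query),                          # OR (default)
        match(title, query, operator="AND"),          # AND
        match(title, query, minimum_should_match=2),  # at least two words
    )
```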
Fuzzy Search
The explanation is as follows:
Simply put, fuzziness is a mechanism to correct a user’s spelling mistakes in query criteria.

Fuzziness makes character changes to string input so that it is the same as the string that may exist in the index. It employs the Levenshtein distance algorithm to fix incorrect spellings.

A match query also allows us to add a fuzziness parameter to fix spelling mistakes. We can set it as a numeric value, where the expected values are 0, 1, or 2, meaning none, one, or two character changes (insertions, deletions, modifications), respectively. In addition to setting these values, we also use an AUTO setting; we let the engine deal with the changes by setting AUTO as its fuzziness parameter.
Example
We do the following:
GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Kava",
        "fuzziness": 1 
      }
    }
  }
}
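The query above works because "Kava" is one character change away from "Java". A minimal sketch of the Levenshtein distance mentioned in the explanation, written as a plain dynamic-programming function (the function name is an assumption; Elasticsearch itself does not expose such a call):

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


# Terms are lowercased by the analyzer before they are compared.
print(levenshtein("kava", "java"))  # 1, so "fuzziness": 1 is enough
```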
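Searching Across All Fields
Example
We do the following. Note that the _all field exists only in older Elasticsearch versions; it was removed in 7.0.
{"query": { "match": { "_all": "meaning" } } }
The explanation is as follows: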
...looks for the term “meaning” in all of the fields in all of the documents in your cluster.
Specifying the Fields to Return
Example
We do the following:
{
  "query": {
    "match": { "_all": "meaning" }
  },
  "fields": ["name", "surname", "age"],
  "from": 100, "size": 20
}
The explanation is as follows:
Here, we’re using the “fields” element to restrict which fields should be returned and the “from” and “size” elements to tell Elasticsearch we’re looking for documents 100 to 119 (starting at 100 and counting 20 documents).
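The arithmetic behind "from" and "size" is simple enough to write down. The two helpers below are hypothetical and only show how a page window maps to zero-based hit positions, and how a 1-based page number could be turned into a "from" value.

```python
def page_window(from_, size):
    """Zero-based positions of the first and last hit in the requested page."""
    return from_, from_ + size - 1


def from_for_page(page_number, size):
    """The "from" value for a 1-based page number with `size` hits per page."""
    return (page_number - 1) * size


print(page_window(100, 20))   # (100, 119), as in the explanation above
print(from_for_page(6, 20))   # 100, i.e. the sixth page of 20 hits
```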
Example - score
Suppose we have the following query:
GET /_search
{
   "query" : {
     "match" : {
       "tweet" : "grow up"
     }
  }
}
The output is as follows:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.9790175,
    "hits": [
      {
        "_index": "App3",
        "_type": "tweets",
        "_id": "2",
        "_score": 1.9790175,
        "_source": {
          "name": "Katrina Kaif",
          "age": 22,
          "tweet": "We never really grow up, we only learn how to act in public."
        }
      },
      {
        "_index": "App3",
        "_type": "tweets",
        "_id": "114",
        "_score": 0.30432263,
        "_source": {
          "name": "Ajay Devgn",
          "age": 62,
          "tweet": "Stress is when you wake up screaming and you realize you haven’t fallen asleep yet."
        }
      }
    ]
  }
}
The explanation is as follows:
L2–8 shows meta information like it took 3ms for the query to return the result and some information about the shards.
L9 onwards we see the actual query results.
L10 We know that there are two matching results to the query.
L11: We see the max relevance _score value as 1.979. This is followed by the two matching objects, the first with a _score value of 1.979 and the second with a _score value of 0.304. The drastic score difference is likely due to the fact that the second tweet doesn’t have “grow up” as a phrase. It only has the word “up”.

Wednesday, 11 November 2020

Elasticsearch

Introduction
Elasticsearch is not the only tool for full text search.
Other tools such as Couchbase, Splunk and Apache Solr also have this capability.

Write Architecture and Data Updates
The explanation is as follows:
Elasticsearch prioritizes near-real-time search with document-by-document writes and frequent index refreshes. Data is ingested via REST APIs (e.g., Bulk), tokenized, and indexed, becoming searchable after periodic refreshes (default: 1 second). This ensures rapid log retrieval but incurs high write overhead, with CPU-intensive indexing limiting single-core throughput to ~2 MB/s, often causing bottlenecks during peaks.
Query Language
The explanation is as follows:
Elasticsearch, however, employs a proprietary JSON-based DSL, distinct from SQL, requiring nested structures for filtering and aggregation. This presents a steep learning curve for new users and complicates integration with traditional BI tools.
Keeping Elasticsearch and the Database in Sync
There are two basic methods:
1. Periodically scan the database and push the updates to Elasticsearch
2. If we use Hibernate, add the Hibernate Search annotations to the project
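For the first method, the rows that changed since the last scan are typically sent in one request through the Bulk API, which expects newline-delimited JSON: an action line followed by the document itself. The sketch below only builds that request body. The function name, the "id" column and the employee index are assumptions made for illustration, and reading the changed rows from the database is left out.

```python
import json


def to_bulk_ndjson(index, rows):
    """Turn changed database rows into a Bulk API request body (NDJSON).

    Each row is assumed to be a dict with an "id" key that becomes the
    document _id; the remaining keys become the document source.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index, "_id": str(row["id"])}}
        source = {key: value for key, value in row.items() if key != "id"}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


changed_rows = [{"id": 1000, "name": "Steve", "age": 23}]
print(to_bulk_ndjson("employee", changed_rows), end="")
```

The resulting text would be POSTed to the _bulk endpoint with the Content-Type application/x-ndjson.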

Another method, if we use Spring Data Elasticsearch, is to repeat each JPA operation through Spring Data Elasticsearch as well. The explanation is as follows:
As far as I understand, Spring-Data-Elasticsearch is focused on accessing Elasticsearch and has no JPA integration whatsoever. That is to say, you can use Spring-Data-JPA, and you can use Spring-Data-Elasticsearch, but they won't communicate with each other. You will have two separate models, which you will update and query separately.
Elasticsearch and Data
The explanation is as follows. In other words, because of the shards we cannot see all of the data from a single node.
As a distributed database, your data is partitioned into “shards” which are then allocated to one or more servers.

Because of this sharding, a read or write request to an Elasticsearch cluster requires coordinating between multiple nodes as there is no “global view” of your data on a single server. While this makes Elasticsearch highly scalable, it also makes it much more complex to setup and tune than other popular databases like MongoDB or PostgresSQL, which can run on a single server.

Elastic Stack - Log Management
Moved to the Elastic Stack post.

Elastic APM
The explanation is as follows:
Elastic APM is an application performance monitoring tool that is built on top of Elastic Search and Kibana, the E and the K of the ELK stack. Implementing Elastic APM is super easy — all you need to do is add the agent jar to your service and set some basic properties. This is done once per service, and that enables distributed tracing for all requests of that service.
Maven
We do the following:
<dependency>
  <groupId>co.elastic.apm</groupId>
  <artifactId>apm-agent-attach</artifactId>
  <!-- version should be compatible with your elastic instance -->
  <version>${elastic-version}</version>
</dependency>
The explanation is as follows:
Properties can be set in one of the following ways:
1. elasticapm.properties in classpath
2. Java System properties
3. Environment variables
Docker
Moved to the Docker and Elasticsearch post.

Docker Compose
Moved to the Docker Compose and Elasticsearch post.

Cluster Structure
A cluster has 3 kinds of nodes: Master Node, Master-Eligible Node and Data Node. The explanation is as follows:
Data nodes hold data and perform data-related operations such as CRUD, search, and aggregations.

A master node in charge of cluster-wide management and configuration actions such as add/remove nodes, create/update/delete index, … A cluster has only one master node at a time. If a master node fails, Master-Eligible Nodes in the cluster elect a new master node from the master-eligible node pool.

Master-eligible node which can be voted to become a new master node when disaster happens with the master node.

In a cluster with only one node, it’s both master node and data node
Index
Moved to the Elasticsearch Index post.

Shard
The explanation is as follows:
Simply, the shard is a single instance of Lucene. It stores data and can perform any data-related operations. A shard can be a primary shard or replica shard. Any document in an index belongs to a single primary shard. A replica shard is simply just a copy of a primary shard. It provides redundant copies and helps protect data when problems happen with primary shards. Replica shard also improves read performance, because it can serve read requests like primary shard but you only can perform write requests on the primary shard.

When creating an index, you should specify the number of primary shards. This number is fixed after the index is created, but you can change the number of replica shards by changing index settings.
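One way to see why the number of primary shards is fixed: in its simplest documented form, a document is routed with shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document id. The sketch below imitates that rule. Using zlib.crc32 as the hash is an assumption made to keep the example self-contained (Elasticsearch uses its own hash function), so the shard numbers printed here are not the ones a real cluster would choose.

```python
import zlib


def pick_shard(routing, num_primary_shards):
    """Pick a primary shard for a routing value (the document id by default).

    zlib.crc32 is only a stand-in for the hash Elasticsearch really uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [str(n) for n in range(1000, 1010)]
with_two_shards = [pick_shard(doc_id, 2) for doc_id in doc_ids]
with_three_shards = [pick_shard(doc_id, 3) for doc_id in doc_ids]

# The same ids can land on different shards once the shard count changes,
# so documents already indexed would no longer be found where expected.
print(with_two_shards)
print(with_three_shards)
```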
Field Data Types
Every field must have a type. The field types are grouped under the following headings:
Common Types
Object and Relational Types
Structured data types
Aggregate data types
Text search types
Document ranking types
Spatial data types
Other types
Arrays
Multi-fields
Text vs Keyword
Among the field types, it is important to know the difference between "text" under the Text search types heading and "keyword" under the Common Types heading. The explanation is as follows:
A String field can be either mapped to the text or the keyword type of Elasticsearch.

The primary difference between text and a keyword is that a text field will be tokenized while a keyword cannot.

We can use the keyword type when we want to perform filtering or sorting operations on the field.

For instance, let’s assume that we have a String field called body, and let’s say it has the value ‘Hibernate is fun’.

If we choose to treat body as text then we will be able to tokenize it [‘Hibernate’, ‘is’, ‘fun’] and we will be able to perform queries like body: Hibernate.

If we make it a keyword type, a match will only be found if we pass the complete text body: Hibernate is fun (wildcard will work, though: body: Hibernate*).
The explanation is as follows:
If the field type is Text, Elasticsearch pre-processes raw data with an Analyzer before saving processed data to an Inverted Index (An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears.)
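The inverted index described above can be sketched with a dictionary that maps each word to the set of documents containing it. This is a toy illustration under simplifying assumptions (whitespace and punctuation splitting, lowercasing only); the second document is made up for the example.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map every lowercased word to the ids of the documents it appears in."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


docs = {
    1: "Hibernate is fun",
    2: "Elasticsearch is fast",  # hypothetical second document
}
index = build_inverted_index(docs)

print(sorted(index["is"]))         # [1, 2]
print(sorted(index["hibernate"]))  # [1]
```

A full text query then becomes a lookup of the analyzed search words in this structure instead of a scan over every document.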
Analyzers and Normalizers
The explanation is as follows. In other words, an Analyzer is used for a Text field and a Normalizer for a Keyword field.
Analyzers and normalizers are text analysis operations that are performed on text and keyword respectively, before indexing them and searching for them.

When an analyzer is applied on text, it first tokenizes the text and then applies one or more filters such as a lowercase filter (which converts all the text to lowercase) or a stop word filter (which removes common English stop words such as ‘is’, ‘an’, ‘the’ etc).

Normalizers are similar to analyzers with the difference that normalizers don’t apply a tokenizer.

On a given field we can either apply an analyzer or a normalizer.

To summarize:

Text                            Keyword
Is tokenized                    Cannot be tokenized
Is analyzed                     Can be normalized
Can perform term-based search   Can only match exact text

URI API
Moved to the Elasticsearch URI API post.

The Request Body Search
These are queries sent using REST + JSON.

Non-Analyzed Queries - Term Level Queries
The explanation is as follows:
exists query
Returns documents that contain any indexed value for a field.

fuzzy query
Returns documents that contain terms similar to the search term. Elasticsearch measures similarity, or fuzziness, using a Levenshtein edit distance.

ids query
Returns documents based on their document IDs.

prefix query
Returns documents that contain a specific prefix in a provided field.

range query
Returns documents that contain terms within a provided range.

regexp query
Returns documents that contain terms matching a regular expression.

term query
Returns documents that contain an exact term in a provided field.

terms query
Returns documents that contain one or more exact terms in a provided field.

terms_set query
Returns documents that contain a minimum number of exact terms in a provided field. You can define the minimum number of matching terms using a field or script.

type query
Returns documents of the specified type.

wildcard query
Returns documents that contain terms matching a wildcard pattern.
1. Exists Query
The explanation is as follows:
Due to the fact that Elasticsearch is schemaless (or has no strict schema limitation), it is a fairly common situation when different documents have different fields. As a result, there is a lot of use to know whether a document has any certain field or not.
Example
We do the following:
GET /_search
{
  "query" : {
    "exists" : {
      "field": "<your_field_name>"
    }
  }
}
2. Fuzzy Query
The explanation is as follows. It can be used when the input may contain typos. The explanation also covers how it differs from the wildcard query.
Fuzzy search gives relevant results even if you have some typos in your query. It gives end-users some flexibility in terms of searching by allowing some degree of error. The threshold of the error to be allowed can be decided by us.

For instance, here we have set edit distance to 2 (default is also 2 by the way) which means Elasticsearch will match all the words with a maximum of 2 differences to the input. e.g., ‘jab’ will match ‘jane’.

While Fuzzy queries allow us to search even when we have misspelled words in your query, wildcard queries allow us to perform pattern-based searches. For instance, a search query with ‘s?ring*’ will match ‘spring’, ‘string’, ‘strings’, etc.

Here ‘*’ indicates zero or more characters and ‘?’ indicates a single character.
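The meaning of the two wildcard characters can be tried out locally with shell-style pattern matching from the Python standard library. fnmatchcase is used here only as a stand-in with the same meaning for * and ?; it is not what Elasticsearch runs internally, and the word list is made up for the example.

```python
from fnmatch import fnmatchcase

# Candidate terms; "sing" is included as a term that should not match.
words = ["spring", "string", "strings", "sing"]

# '?' stands for exactly one character, '*' for zero or more characters.
matches = [word for word in words if fnmatchcase(word, "s?ring*")]
print(matches)
```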
The explanation is as follows:
Fuzzy searching uses the Damerau-Levenshtein Distance to match terms that are similar in spelling. This is great when your data set has misspelled words.

Use the tilde (~) to find similar terms:

  blow~
This will return results like “blew,” “brow,” and “glow.”

Use the tilde (~) along with a number to specify the how big the distance between words can be:

  john~2
This will match, among other things: “jean,” “johns,” “jhon,” and “horn”
3. Term Query
Moved to the term Query post.

4. Terms Query
Moved to the terms Query post.

5. wildcard Query - Runs a Wildcard Query on a Single Field
Example
We do the following. An old Elasticsearch version is used here, which is why the query uses the filtered query with a filter clause.
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "should": [
            {"query": {"wildcard": {"user.name": {"value": "*mar*"}}}},
            {"query": {"wildcard": {"user.surname": {"value": "*mar*"}}}}
          ]
        }
      }
    }
  }
}
Analyzed Queries - Full Text Queries
The explanation is as follows:
intervals query
A full text query that allows fine-grained control of the ordering and proximity of matching terms.

match query
The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.

match_bool_prefix query
Creates a bool query that matches each term as a term query, except for the last term, which is matched as a prefix query.

match_phrase query
Like the match query but used for matching exact phrases or word proximity matches.

match_phrase_prefix query
Like the match_phrase query, but does a wildcard search on the final word.

multi_match query
The multi-field version of the match query.

common terms query
A more specialized query which gives more preference to uncommon words.

query_string query
Supports the compact Lucene query string syntax, allowing you to specify AND|OR|NOT conditions and multi-field search within a single query string. For expert users only.

simple_query_string query
A simpler, more robust version of the query_string syntax suitable for exposing directly to users.
match Query
I moved this to the match Query post.

match_phrase Query
The word "phrase" means that the terms appear in the same order. In other words, a document matches only if all of the whole words are present in the same order. The explanation is as follows
The match_phrase query will analyze the input, if analyzers are defined for the queried field, and find documents matching the following criteria:

- all the terms must appear in the field
- they must have the same order as the input value
Example
Suppose we have an index with the following documents
{ "foo":"I just said hello world" }

{ "foo":"Hello world" }

{ "foo":"World Hello" }
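Suppose the query is as follows. Assuming the field is analyzed with the standard (lowercasing) analyzer, we get only the first and second documents. The third document ("World Hello") is not included in the result because its terms are in the reverse order.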
{
  "query": {
    "match_phrase": {
      "foo": "Hello World"
    }
  }
}
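The two criteria (all terms present, same order) can be mimicked with a tiny sketch. It assumes a simplified analyzer that only lowercases and splits on whitespace, and it requires the terms to be adjacent, which corresponds to the default slop of 0. The function names are made up for this post.

```python
def analyze(text):
    """Very rough stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def phrase_matches(field_value, phrase):
    """True if the analyzed phrase occurs as consecutive tokens in the field."""
    tokens, wanted = analyze(field_value), analyze(phrase)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


docs = ["I just said hello world", "Hello world", "World Hello"]
hits = [doc for doc in docs if phrase_matches(doc, "Hello World")]
print(hits)
```

Under that simplification the first two sample documents match, and "World Hello" does not because the order is reversed.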
match_phrase_prefix Query - search as you type
Matches when all of the whole words are present and the last, partial word is a prefix of the next term
Example
The explanation is as follows
Keywords: “puerto r”
It considers “puerto” as exact word that needs to be in the country name, and “r” as prefix for any word after “puerto”. This will match “Puerto Rico”.
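As a rough sketch of that behaviour: every word except the last has to match exactly and in order, and the last word only has to be a prefix of the following term. The analyzer is again simplified to lowercase plus whitespace split, and the field name country_name in the request body is an assumption for illustration.

```python
def phrase_prefix_matches(field_value, text):
    """Exact, ordered match on all words but the last; prefix match on the last."""
    tokens = field_value.lower().split()
    wanted = text.lower().split()
    if not wanted:
        return False
    *exact, prefix = wanted
    n = len(wanted)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if window[:-1] == exact and window[-1].startswith(prefix):
            return True
    return False


# Request body one might send; "country_name" is a hypothetical field.
request_body = {
    "query": {"match_phrase_prefix": {"country_name": {"query": "puerto r"}}}
}

for name in ["Puerto Rico", "Puerto Vallarta", "Rico Puerto"]:
    print(name, phrase_prefix_matches(name, "puerto r"))
```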
multi_match Query - Queries Multiple Fields
The explanation is as follows.
Similar to match, but searches multiple fields.
TODO: add an example
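One possible example, as a hedged sketch: the helper below builds a multi_match request body as a Python dict. The field names title and description and the search text are assumptions; the `^2` suffix is the usual per-field boost syntax.

```python
import json


def multi_match_body(text, fields):
    """Build a multi_match request body searching several fields at once."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


# "title" and "description" are hypothetical fields; "^2" doubles the title boost.
body = multi_match_body("docker compose", ["title^2", "description"])
print(json.dumps(body, indent=2))
```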

common terms query
The explanation is as follows.
A more specialized query which gives more preference to uncommon words.
TODO: add an example
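One possible example, as a hedged sketch for older clusters only: as far as I know the common terms query was deprecated in 7.3 and is no longer available in 8.x, so this applies to versions such as the 7.5.2 image used earlier. The field name body and the cutoff value are assumptions.

```python
import json

# Terms more frequent than cutoff_frequency are treated as "common" and
# only influence scoring; the rarer terms decide which documents match.
# The field name "body" is hypothetical.
common_terms_body = {
    "query": {
        "common": {
            "body": {
                "query": "nelly the elephant as a cartoon",
                "cutoff_frequency": 0.001,
                "low_freq_operator": "and",
            }
        }
    }
}

print(json.dumps(common_terms_body, indent=2))
```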

query_string Query - Queries Multiple Fields and Supports Criteria Such as AND, OR
I moved this to the query_string Query post.

simple_query_string Query
The explanation is as follows.
A simpler, more robust version of the query_string syntax suitable for exposing directly to users.
TODO: add an example
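One possible example, as a hedged sketch: in the simple_query_string syntax `+` means AND, `|` means OR, `-` negates a term, and double quotes wrap a phrase. The field names title and body are assumptions.

```python
import json

# Phrase "fried eggs", plus (eggplant OR potato), minus frittata.
# "title" and "body" are hypothetical fields; "^5" boosts the title.
simple_query_body = {
    "query": {
        "simple_query_string": {
            "query": '"fried eggs" +(eggplant | potato) -frittata',
            "fields": ["title^5", "body"],
            "default_operator": "and",
        }
    }
}

print(json.dumps(simple_query_body, indent=2))
```

Unlike query_string, invalid parts of the input are ignored instead of raising an error, which is why it is the one suggested for exposing to users.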

Compound Queries
The explanation is as follows
bool query
The default query for combining multiple leaf or compound query clauses, as must, should, must_not, or filter clauses. The must and should clauses have their scores combined — the more matching clauses, the better — while the must_not and filter clauses are executed in filter context.

boosting query
Return documents which match a positive query, but reduce the score of documents which also match a negative query.

constant_score query
A query which wraps another query, but executes it in filter context. All matching documents are given the same “constant” _score.

dis_max query
A query which accepts multiple queries, and returns any documents which match any of the query clauses. While the bool query combines the scores from all matching queries, the dis_max query uses the score of the single best-matching query clause.

function_score query
Modify the scores returned by the main query with functions to take into account factors like popularity, recency, distance, or custom algorithms implemented with scripting.
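The scoring difference between bool and dis_max can be sketched with simple arithmetic. The clause scores below are invented numbers; the dis_max formula (best score plus tie_breaker times the other matching scores, with tie_breaker defaulting to 0) follows the documented behaviour, while real bool scoring has more detail than a plain sum.

```python
def bool_score(clause_scores):
    """bool: the scores of the matching must/should clauses are added up."""
    return sum(clause_scores)


def dis_max_score(clause_scores, tie_breaker=0.0):
    """dis_max: best clause score, plus tie_breaker times the other scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


# Hypothetical scores of two matching clauses for one document.
scores = [2.0, 1.0]
print(bool_score(scores))
print(dis_max_score(scores))
print(dis_max_score(scores, tie_breaker=0.5))
```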
Clauses Used with the bool Query
The explanation is as follows
must
All queries within this clause must match a document in order for ES to return it. Think of this as your AND queries. The query that we used here is the fuzzy query, and it will match any documents that have a name field that matches “john” in a fuzzy way. The extra “fuzziness” parameter tells Elasticsearch that it should be using a Damerau-Levenshtein Distance of 2 to determine the fuzziness.

must_not
Any documents that match the query within this clause will be outside of the result set. This is the NOT or minus (-) operator of the query DSL. In this case, we do a simple match query, looking for documents that contain the term “city.” Using _all as the field name indicates that the term can appear in any of the document’s fields. This is the must_not clause, so matching documents will be excluded.

should
Up until now, we have been dealing with absolutes: must and must_not. Should is not absolute and is equivalent to the OR operator. Elasticsearch will return any documents that match one or more of the queries in the should clause. The first query that we provided looks for documents where the age field is between 30 and 40. The second query does a wildcard search on the surname field, looking for values that start with “K.”

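The query contains three different clauses. A returned document has to satisfy the must clause and must not match the must_not clause. Because a must clause is present, the should clauses are optional by default: they only raise the score of the documents that match them, unless minimum_should_match is set. These queries can be nested, so you can build up very complex queries by specifying a bool query as a must, must_not, should or filter query.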
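The bool query being described is not shown here, so the following is only a reconstruction from the prose, built as a Python dict. The inclusive range bounds (gte/lte) are an assumption, and the `_all` field belongs to older Elasticsearch versions (it no longer exists in 7.x and later), so on a recent cluster that clause would need a concrete field.

```python
import json

bool_query = {
    "query": {
        "bool": {
            # must: name has to match "john" within 2 edits.
            "must": [{"fuzzy": {"name": {"value": "john", "fuzziness": 2}}}],
            # must_not: exclude documents containing "city" in any field
            # (relies on the legacy _all field).
            "must_not": [{"match": {"_all": "city"}}],
            # should: optional clauses that raise the score when they match.
            "should": [
                {"range": {"age": {"gte": 30, "lte": 40}}},  # bounds assumed inclusive
                {"wildcard": {"surname": {"value": "K*"}}},
            ],
        }
    }
}

print(json.dumps(bool_query, indent=2))
```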