Posts in the 'ELK/elasticsearch' category

elasticsearch match-query

ELK/elasticsearch 2020. 12. 23. 01:54

match query

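A match query takes input that may contain text, numbers, dates and so on, splits it into terms through analysis ( morphological analysis in the case of Korean ), and then runs the search using those terms.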

============================================================================

We use nori as the morphological analyzer.

1] Install nori

$ ./elasticsearch-plugin install analysis-nori

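Let's check that the installation succeeded ( the installed plugins can be listed with ./elasticsearch-plugin list ).

2] Configure the index mapping

( The mapping request kept failing, so I stopped the elasticsearch service and started it again. A newly installed plugin is only loaded when the node starts, which is the likely reason the restart was needed. )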
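
A minimal sketch of what the mapping body for this step could look like, assuming the `sample_2` index and the `col` field that appear in the query of step 4], and assuming the built-in `nori` analyzer that the analysis-nori plugin provides is sufficient. The helper name `build_nori_mapping` is made up for illustration.

```python
import json


def build_nori_mapping(field: str = "col") -> dict:
    """Return an index-creation body that analyzes `field` with nori.

    Assumption: the built-in "nori" analyzer from the analysis-nori
    plugin is used as-is, with no custom tokenizer options.
    """
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": "nori"}
            }
        }
    }


body = build_nori_mapping()
# Send this JSON as the body of `PUT sample_2`, for example from Kibana Dev Tools.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The dictionary is printed as JSON so it can be pasted next to the other DSL requests in this post.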

3] Insert data

( I inserted documents roughly like this. )

4] Write the match query

GET sample_2/_search
{
  "query": {
    "match": {
      "col": "LG"
    }
  }
}

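[ More content to be added ]

Conclusion] To use a match query as a similarity-style search, it looks like it has to be paired with a suitable morphological analyzer. Otherwise the DSL query has to be written so that it matches the stored value exactly.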

elasticsearch term query

ELK/elasticsearch 2020. 12. 22. 23:14

term query

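-> Returns documents that contain an exact term in the provided field.

-> You can use the term query to find documents based on a precise value such as a price, a product ID, or a username.

The key point: a document is a hit only when the query value matches the value of the field exactly.

============================================================================

Let's confirm this with a simple example.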

++++++++++++++++++++++++++++++++++++++++++++

sample data

POST sample_index/_doc
{
"col1": "삼성전자"
}

POST sample_index/_doc
{
"col1": "LG전자"
}

++++++++++++++++++++++++++++++++++++++++++++

Run the following query:

GET sample_index/_search
{
  "query": {
    "term": {
      "col1.keyword": {
        "value": "LG"
      }
    }
  }
}

The result says that no document matches.

Now run this query instead:

GET sample_index/_search
{
  "query": {
    "term": {
      "col1.keyword": {
        "value": "LG전자"
      }
    }
  }
}

This time the matching document comes back as a hit.
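
Why the first query misses can be pictured with a toy model in Python. This is not how Elasticsearch is implemented. It only assumes that a keyword field stores the whole string as a single term and that a term query compares the query value to that term without any analysis. `term_query` is a hypothetical helper, not an Elasticsearch client call.

```python
# Toy model: each document's keyword field is stored as one unanalyzed term.
docs = [{"col1": "삼성전자"}, {"col1": "LG전자"}]


def term_query(documents, field, value):
    """Return the documents whose stored keyword term equals `value` exactly."""
    return [d for d in documents if d.get(field) == value]


print(term_query(docs, "col1", "LG"))      # a partial value is not an exact match
print(term_query(docs, "col1", "LG전자"))  # the full value matches one document
```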

============================================================================

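Note: if you create an index by inserting data without defining a mapping first, Elasticsearch automatically maps every string field as both text and keyword ( a text field with a keyword sub-field, which is why the queries above use col1.keyword ).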

============================================================================

3-node Elasticsearch cluster ( elasticsearch 3node )

ELK/elasticsearch 2020. 5. 9. 12:10

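There was so little material on this setup that I remember learning it by trial and error on my own. I hope this helps someone.

( Running all the nodes on a single physical server is actually risky: if that server dies, you lose the whole cluster. )

Physical server : ubuntu 20.04

Open the ports before starting ( run as root )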

- centos

firewall-cmd --zone=public --permanent --add-port=9200/tcp
firewall-cmd --zone=public --permanent --add-port=9300/tcp
firewall-cmd --zone=public --permanent --add-port=5601/tcp

firewall-cmd --reload

- ubuntu

ufw allow 9200
ufw allow 9300
ufw allow 5601

elasticsearch directory layout

Check that the services came up correctly

Method 1) $ ps -ef | grep elasticsearch

Method 2) $ jps
4629 Elasticsearch
4935 Elasticsearch
7419 Jps
5087 Elasticsearch

[ Note that the jps command requires a JDK such as openjdk to be installed ]

Method 3) $ curl -XGET 'http://localhost:9200/_cluster/health?pretty'
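
Method 3 can also be scripted. The following is a sketch, assuming the cluster listens on localhost:9200 without authentication and that "healthy" means all three nodes have joined and the status is not red. The function names `cluster_ready` and `fetch_health` are made up for this example; `status` and `number_of_nodes` are fields of the cluster health response.

```python
import json
from urllib.request import urlopen


def cluster_ready(health: dict, expected_nodes: int = 3) -> bool:
    """True when every expected node has joined and the status is not red."""
    return (
        health.get("number_of_nodes") == expected_nodes
        and health.get("status") in ("green", "yellow")
    )


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    """Call GET /_cluster/health and decode the JSON response."""
    with urlopen(base_url + "/_cluster/health") as resp:
        return json.load(resp)


# Usage (needs a running cluster):
#   print(cluster_ready(fetch_health()))
```

Keeping the check separate from the HTTP call makes it easy to try the logic on a saved response first.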

kibana setup

In a browser, try to open http://xxx.xxx.xxx.xxx:5601.

Thoughts on elasticsearch index mapping

ELK/elasticsearch 2020. 4. 17. 01:22

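The usual advice is to always define the index mapping before creating an index, because a written mapping definition makes maintenance easier.

Let's check what happens when we insert a document that does not fit the mapped type.

1] Define the index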

#!/bin/bash

# Create the index "kim" with field x mapped as an integer
curl -X PUT 'http://:9200/kim?pretty' -H 'Content-Type: application/json' -d'
{
  "settings" : {
    "number_of_shards" : 3,
    "number_of_replicas" : 1
  },
  "mappings" : {
    "properties" : {
      "x" : {"type" : "integer"}
    }
  }
}'

2] Inserting a doc that matches the data type

#!/bin/bash

curl -X POST 'http://:9200/kim/_doc/1' -H 'Content-Type: application/json' -d'
{
  "x": 10
}'

{"_index":"kim","_type":"_doc","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":0,"_primary_term":1}

3] Inserting a doc that does not match the data type

#!/bin/bash

curl -X POST 'http://:9200/kim/_doc/2' -H 'Content-Type: application/json' -d'
{
  "x": "hello world"
}'

{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse field [x] of type [integer] in document with id '2'"}],"type":"mapper_parsing_exception","reason":"failed to parse field [x] of type [integer] in document with id '2'","caused_by":{"type":"number_format_exception","reason":"For input string: \"hello world\""}},"status":400}

4] Conclusion : defining the mapping before creating the index is better for your peace of mind.
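
When documents are loaded from a script, the two responses above can be told apart programmatically. A small sketch, assuming the response body has already been decoded from JSON; `is_mapping_error` is a hypothetical helper, and the two sample responses are trimmed copies of the ones shown in steps 2] and 3].

```python
import json


def is_mapping_error(response: dict) -> bool:
    """True when an index response reports a mapper_parsing_exception."""
    error = response.get("error")
    return isinstance(error, dict) and error.get("type") == "mapper_parsing_exception"


# Trimmed copies of the responses from steps 2] and 3].
ok = json.loads('{"_index":"kim","_id":"1","result":"created"}')
bad = json.loads(
    '{"error":{"type":"mapper_parsing_exception",'
    '"reason":"failed to parse field [x] of type [integer]"},"status":400}'
)
print(is_mapping_error(ok), is_mapping_error(bad))  # prints: False True
```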

logstash ruby syntax

ELK/elasticsearch 2020. 4. 14. 00:55

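Sometimes you need a conditional inside a logstash conf file, the way you would in a programming language. In that case you can write it with the conf file's Ruby-like if syntax.

[ logstash conf file ]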

input {
  stdin {
    codec => "json"
  }
}

filter {
  mutate {
    remove_field => ["host", "@timestamp", "@version"]
  }
}

output {
  if [name] == "kimjunhyeon" {
    stdout {}
  }
}

[ json file ]

{"name": "kimjunhyeon"}

{"name": "LeeJung"}

[ Run ]

$ logstash -f basic_02.conf < test.json
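
What the conf file does to each line of test.json can be mimicked in plain Python. This is a sketch of the logic only, not Logstash output: it assumes one JSON object per input line, and `run_pipeline` is a made-up name.

```python
import json

DROP_FIELDS = ("host", "@timestamp", "@version")


def run_pipeline(lines, wanted_name="kimjunhyeon"):
    """Mimic the config: decode JSON, drop fields, keep matching events."""
    emitted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)              # input: stdin with the json codec
        for name in DROP_FIELDS:              # filter: mutate remove_field
            event.pop(name, None)
        if event.get("name") == wanted_name:  # output: the if condition
            emitted.append(event)
    return emitted


sample = ['{"name": "kimjunhyeon"}', '{"name": "LeeJung"}']
print(run_pipeline(sample))
```

Only the event whose name equals "kimjunhyeon" reaches the output stage; the other one is dropped by the condition.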

[ Result ]

logstash file stdin

ELK/elasticsearch 2020. 4. 14. 00:41

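When you run logstash -f xxxx.conf, it is hard to tell when the job has finished. In that case, feed the file to stdin with a redirect: once the input has been processed, logstash shuts down on its own. The form is as follows.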

logstash -f xxx.conf < yyy.json

a] logstash conf file

input {
  stdin {
    codec => "json"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

b] json file

{"name": "kimjunhyeon"}

{"name": "LeeJung"}

c] Run

$ logstash -f basic_01.conf < test.json

Querying a security-enabled elasticsearch

ELK/elasticsearch 2020. 3. 22. 22:04

#!/bin/bash

# HTTPS endpoint ( -k skips TLS certificate verification )
$ curl -u elastic:password -k "https://xxx.xxx.xxx.xxx:xxxx/_cluster/health?pretty"

=====================================================

# HTTP endpoint
$ curl -u elastic:password "http://xxx.xxx.xxx.xxx:9200/_cluster/health?pretty"

{
  "cluster_name" : "hello_world",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 267,
  "active_shards" : 533,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
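
The same authenticated request can be prepared from Python with only the standard library. A sketch under the assumption that the cluster accepts HTTP Basic authentication, which is what curl -u sends; `health_request` is a hypothetical helper and the host and credentials are the same placeholders as in the curl commands.

```python
import base64
from urllib.request import Request


def health_request(base_url: str, user: str, password: str) -> Request:
    """Build a GET /_cluster/health request with an HTTP Basic auth header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = Request(base_url.rstrip("/") + "/_cluster/health")
    req.add_header("Authorization", "Basic " + token)
    return req


# Placeholder host and credentials, as in the curl commands above.
req = health_request("http://xxx.xxx.xxx.xxx:9200", "elastic", "password")
print(req.full_url)
# To send it: urllib.request.urlopen(req). For an HTTPS endpoint with a
# self-signed certificate, the equivalent of curl -k is an unverified
# ssl context, which should stay limited to testing.
```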

reindex query shellscript array

ELK/elasticsearch 2020. 2. 3. 00:30

#!/bin/bash

test_array=("sk_poc_news_itscience_sciencecomputer" "sk_poc_news_itscience_sciencecomputer_test" "sk_poc_news_itscience_sciencemobile" "sk_poc_news_itscience_sciencemobile_test" "sk_poc_news_itscience_sciencegeneralscience" "sk_poc_news_itscience_sciencegeneralscience_test" "sk_poc_news_itscience_sciencecommunicationandnewmedia" "sk_poc_news_itscience_sciencecommunicationandnewmedia_test" "sk_poc_news_social_accidnet" "sk_poc_news_social_accidnet_test" "sk_poc_news_social_labor" "sk_poc_news_social_labor_test" "sk_poc_news_politics_administ" "sk_poc_news_politics_administ_test" "sk_poc_news_itscience_scienceinternetandsns" "sk_poc_news_itscience_scienceinternetandsns_test")

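# test_array holds ( source, destination ) pairs: even index = source, odd index = destination
for (( i=0 ; i < ${#test_array[@]} ; i=i+2 )); do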
        echo "${test_array[i]}"
        curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -u "elastic:ezfarm123"  -d"
        {
                \"source\": {
                        \"index\": \"${test_array[i]}\"
                },
                \"dest\": {
                        \"index\": \"${test_array[i+1]}\"
                }
        }"
done
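
The pairing logic of the loop, where even positions are sources and odd positions are destinations, can be sketched in Python as follows. `reindex_bodies` is a made-up helper; it only builds the request bodies and leaves the POST to /_reindex to whichever HTTP client is at hand.

```python
import json


def reindex_bodies(names):
    """Pair up a flat [source, dest, source, dest, ...] list into _reindex bodies."""
    if len(names) % 2 != 0:
        raise ValueError("expected an even number of index names")
    return [
        {"source": {"index": src}, "dest": {"index": dst}}
        for src, dst in zip(names[0::2], names[1::2])
    ]


# Two of the names from the script above.
names = ["sk_poc_news_social_labor", "sk_poc_news_social_labor_test"]
for body in reindex_bodies(names):
    print(json.dumps(body))  # each body would be POSTed to /_reindex
```

Rejecting an odd-length list up front avoids silently reindexing into a wrong destination when a name is missing.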

Reference

https://stackoverflow.com/questions/37202122/how-can-i-put-parameters-in-elasticsearch-curl-post

How to set fielddata to true for every field of a target index

ELK/elasticsearch 2020. 1. 17. 08:03

case1) Fields of a specific index

PUT test_sk_poc/_mapping
{
  "properties": {
    "*": {
      "type": "text",
      "fielddata": true
    }
  }
}

case2) All indices

PUT */_mapping
{
  "properties": {
    "*": {
      "type": "text",
      "fielddata": true
    }
  }
}
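
It is not certain that a "*" key under properties is expanded as a wildcard, since the put-mapping API as generally documented takes explicit field names. As a hedged alternative, the sketch below builds an explicit update body from an existing mapping. It assumes the "properties" object was fetched with GET <index>/_mapping; `fielddata_update` and the sample mapping are hypothetical.

```python
def fielddata_update(properties: dict) -> dict:
    """Build a put-mapping body that turns on fielddata for every text field.

    Existing settings of each field (analyzer, sub-fields, ...) are copied
    so that the update repeats the current definition and only adds fielddata.
    """
    updated = {}
    for name, definition in properties.items():
        if definition.get("type") == "text":
            updated[name] = dict(definition, fielddata=True)
        elif "properties" in definition:  # object or nested field: recurse
            nested = fielddata_update(definition["properties"])["properties"]
            if nested:
                child = {"properties": nested}
                if "type" in definition:  # keep e.g. "nested" as it was
                    child["type"] = definition["type"]
                updated[name] = child
    return {"properties": updated}


# Hypothetical mapping used only to show the shape of the result.
existing = {
    "title": {"type": "text", "analyzer": "nori"},
    "views": {"type": "integer"},
    "author": {"properties": {"name": {"type": "text"}}},
}
print(fielddata_update(existing))
```

The returned dictionary would be sent as the body of PUT <index>/_mapping; non-text fields such as views are left out of the update.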

app-search

ELK/elasticsearch 2020. 1. 6. 11:30

from pdflib import Document
import os
import base64
import yaml

import time
from time import strftime

# app-search
from swiftype_app_search import Client

from ela_dir.Ela import Ela

#
# Read pdf files and insert them into app-search
#

class AppSearch():


    def __init__(self):

        ARGS = AppSearch.getAppObj() 
        self._appClient = ARGS.get("client")
        self._appEngine = ARGS.get("engine_name")

    @classmethod
    def getAppObj(cls):
        
        # Load the App Search connection settings from the YAML file.
        try:
            with open("./app_search_info/app_info.yml", "r", encoding="utf-8") as f:
                appArgs = yaml.safe_load(f)
        except FileNotFoundError as E:
            # A missing file raises FileNotFoundError, not FileExistsError.
            print(E)
            raise SystemExit(1)

        client = Client(
            api_key       = appArgs.get("api_key"),
            base_endpoint = appArgs.get("base_endpoint"),
            use_https     = appArgs.get("use_https")
        )

        # Return the client together with the engine name it should write to.
        return {"client": client, "engine_name": appArgs.get("engine_name")}


class PDFObj(AppSearch):


    def __init__(self):

        AppSearch.__init__(self)
        self._targetPath = PDFObj.getFilePath()
        self._fileTypeList = [".pdf"]
        self._timeObj = strftime("%Y%m%d", time.localtime())

    # Walk through the files that arrived via ftp
    def dirSearch(self):

        os.chdir(self._targetPath)
        cur = os.listdir()

        for f in cur:
            
            fname, fext = os.path.splitext(f)

            if fext in self._fileTypeList:

                doc = Document(f)
                #text = []
                for page, content in enumerate(doc):
                    
                    print("Processing page {} ...".format(page+1))

                    strData = " ".join(content.lines).strip()
                    #text.append(strData)
                
                    element = {"metadata": doc.metadata, "fileName": fname, "content": strData, "cllctTime": self._timeObj,
                               "filepath": os.path.abspath(f)}
                    self._appClient.index_document(self._appEngine, element)

                #resultContent = "".join(text) 

    @classmethod
    def getFilePath(cls):

        # Read the directory to scan from the YAML config.
        try:
            with open("./conf/info.yml", "r", encoding="utf-8") as f:
                filePath = yaml.safe_load(f)
        except FileNotFoundError as E:
            print(E)
            raise SystemExit(1)

        return filePath.get("target_path")

if __name__ == "__main__":

    o = PDFObj()
    o.dirSearch()
