Result

Uses the OpenAPI of the Cultural Heritage Administration (문화재청)

Loads the data into Elasticsearch

Visualizes it with Python Folium

-----------------

Code excerpt

 

from xml.etree import ElementTree
import requests
import re

from CH.chConfigFile import Conf
from Elastic.Elasrv import Elasrv

class CttrList:
    """Cultural heritage list: fetched from the OpenAPI and inserted into Elasticsearch."""

    getConfigure = Conf.chListYamlReturn()
    fetchCount = 0
    totalCount = 0

    def __init__(self):

        self.chListIndex  = CttrList.getConfigure.get("index-name")
        self.url       = CttrList.getConfigure.get("url")
        self.pageUnit  = CttrList.getConfigure.get("pageUnit")
        self.pageIndex = CttrList.getConfigure.get("pageIndex")
        self.elementsJson = []

    def urlRequest(self):

        e = dict(Conf.chListMappingReturn())
        keyList = e.keys()

        while True:

            requestUrl = "{0}?pageIndex={1}&pageUnit={2}".format(self.url, self.pageIndex, self.pageUnit)
            print("Request URL: {}".format(requestUrl))
            html = requests.get(requestUrl)

            if html.status_code == 200:

                xmlString = re.sub(pattern="\n", repl="", string=html.content.decode("utf-8"))

                # Parse only if the body looks like XML (startswith is also safe on an empty body)
                if xmlString.startswith("<"):

                    print("xml file")

                    xmlDoc = ElementTree.fromstring(xmlString)

                    for t in xmlDoc:

                        if t.tag == "totalCnt":
                            CttrList.totalCount = int(t.text, base=10)

                        if t.tag == "item":
                            # Use a fresh copy per item so values from the previous item do not leak in
                            doc = dict(e)
                            for s in t.iter():
                                if s.tag in keyList:
                                    doc[s.tag] = s.text

                            Elasrv.insert(index=self.chListIndex, document=doc)

                            #self.elementsJson.append(e)

                    CttrList.fetchCount += self.pageUnit

            if self.headers():
                break
            self.pageIndex += 1


    def headers(self):
        # True once every record reported by totalCnt has been fetched
        return CttrList.fetchCount >= CttrList.totalCount


def main():

    ctList = CttrList()
    ctList.urlRequest()

if __name__ == "__main__":
    main()
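
The paging loop above depends on the author's own Conf and Elasrv modules, so it cannot be run as posted. Below is a minimal stand-alone sketch of the same idea: read totalCnt, build one fresh dict per item, and stop once enough records have been fetched. The names parse_page, fetch_all, fake_fetch and RECORDS, the max_pages guard, and the sample data are all made up for illustration. Only the totalCnt and item tags come from the code above.

```python
from xml.etree import ElementTree


def parse_page(xml_text, template):
    """Return (total_count, documents) for one XML response page."""
    root = ElementTree.fromstring(xml_text)
    total = 0
    docs = []
    for child in root:
        if child.tag == "totalCnt":
            total = int(child.text)
        elif child.tag == "item":
            doc = dict(template)  # fresh copy, so fields never carry over
            for field in child:
                if field.tag in doc:
                    doc[field.tag] = field.text
            docs.append(doc)
    return total, docs


def fetch_all(fetch_page, template, page_unit, max_pages=1000):
    """Collect items page by page until totalCnt records have been covered."""
    docs = []
    page_index = 1
    fetched = 0
    while page_index <= max_pages:  # hypothetical guard against endless loops
        xml_text = fetch_page(page_index, page_unit)
        total, page_docs = parse_page(xml_text, template)
        docs.extend(page_docs)
        fetched += page_unit
        if fetched >= total:
            break
        page_index += 1
    return docs


# Hypothetical in-memory "API" with three records, used only for this demo.
RECORDS = [("A", "1"), ("B", None), ("C", "3")]


def fake_fetch(page_index, page_unit):
    start = (page_index - 1) * page_unit
    items = ""
    for name, no in RECORDS[start:start + page_unit]:
        items += "<item><name>{}</name>".format(name)
        if no is not None:
            items += "<no>{}</no>".format(no)
        items += "</item>"
    return "<result><totalCnt>{}</totalCnt>{}</result>".format(len(RECORDS), items)


all_docs = fetch_all(fake_fetch, {"name": None, "no": None}, page_unit=2)
print(all_docs)
```

In real use, fetch_page would wrap requests.get and return the decoded response body. The second record has no "no" tag, and because each item starts from a copy of the template its value stays None instead of inheriting "1" from the first record.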

-------------------------------------------------------
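
The excerpt above only covers collection and loading, so here is a rough sketch of what the Folium step might look like. The field names name, latitude and longitude, the sample documents, the output file name, and the rule that (0, 0) means "no location" are all assumptions, not taken from the actual API or index. folium is a third-party package and is only imported inside build_map.

```python
def to_markers(documents):
    """Keep only documents with usable coordinates, as (lat, lon, name) tuples."""
    markers = []
    for doc in documents:
        try:
            lat = float(doc.get("latitude"))
            lon = float(doc.get("longitude"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric coordinates
        if lat == 0 and lon == 0:
            continue  # assumption: (0, 0) means the record has no location
        markers.append((lat, lon, doc.get("name", "")))
    return markers


def build_map(markers, out_path="heritage_map.html"):
    """Draw one Folium marker per point and save the map as an HTML file."""
    import folium  # imported here so to_markers works without folium installed

    center = [
        sum(m[0] for m in markers) / len(markers),
        sum(m[1] for m in markers) / len(markers),
    ]
    fmap = folium.Map(location=center, zoom_start=7)
    for lat, lon, name in markers:
        folium.Marker(location=[lat, lon], popup=name).add_to(fmap)
    fmap.save(out_path)
    return out_path


# Made-up sample documents, shaped like what a search result might contain.
sample_docs = [
    {"name": "Site A", "latitude": "37.56", "longitude": "126.97"},
    {"name": "Site B", "latitude": "0", "longitude": "0"},
    {"name": "Site C", "latitude": None, "longitude": "129.0"},
]
markers = to_markers(sample_docs)
print(markers)
```

Calling build_map(markers) with folium installed writes an HTML file that can be opened in a browser. Filtering first keeps records without coordinates from ending up as markers at latitude 0, longitude 0.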


Temporary

Big Data, 2018. 12. 8. 19:29

Collection
    . collection (openapi, tibero)
    . oozie
    . logstash
    . kafka
    . elasticsearch
        - index definition
    - spark
    - hive loading

Search (jboss)
    . ela search api
        - RESTful api
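
For the "index definition" item under elasticsearch, a request body along these lines is one possibility. This is only a sketch: the field names name, description and location are hypothetical, and the typeless mappings layout assumes Elasticsearch 7 or later. The body would be sent with PUT to the index name.

```python
import json


def build_index_body(shards=1, replicas=0):
    """Build a create-index request body with a geo_point field for map display."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "properties": {
                "name": {"type": "keyword"},       # exact-match lookups
                "description": {"type": "text"},   # full-text search
                "location": {"type": "geo_point"}, # latitude/longitude pair
            }
        },
    }


body = build_index_body()
print(json.dumps(body, indent=2))
```

Defining the coordinates as geo_point up front matters because Elasticsearch would otherwise map them dynamically as plain numbers or text, which rules out geo queries later.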
1. git
    - master
    - develop

2. jenkins
   (main uses)
    - compile    : 1.8  Spark app
                        collection app
                   1.7  search
    - deployment : fat jar  (java app) => remote server (ssh)

    jenkins => port : 9090
            => user : forebig
               openjdk
               plugin



git init . --bare : creates the server-side (bare) repository
git clone -b <branch> <server location>
kafka

  


JDBC + ELASTIC

Big Data, 2018. 12. 8. 18:59

