Bulk insert from MongoDB to Elasticsearch in Python

asked Sep 12, 2017 by jasdeep-singh-chhabr

Background

I have data in a MongoDB collection (not a lot, about 100 MB) consisting of several documents. PyMongo queries the collection with some user input, and the appropriate document needs to be found and returned for that query. The data won't be updated regularly.

I have decided to use Elasticsearch to search for and return the relevant document. My workflow, run on every query, is:

Retrieve all documents from MongoDB --> bulk insert them into Elasticsearch --> search using Elasticsearch --> return the top document.

I am using pymongo and elasticsearch-py.

Problem

Because I bulk insert on every query, the documents keep accumulating (the count variable increases each time by the number of documents in the collection). As I understand it, I need to bulk insert only once, but the workaround I thought of (running bulk only if count == 0) seems a bit hacky.

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from pymongo import MongoClient


es = Elasticsearch()



INDEX_NAME = "documents"
TYPE = "document"
stop_words=[]


client=MongoClient("URI")

db=client["database"]

collection=db["collection"]



def make_documents():
    for faq in collection.find():
        doc = {
            '_op_type': 'create',
            '_index': INDEX_NAME,
            '_type': TYPE,
            '_source': {'text': faq['answer']}
        }
        yield doc

# put documents in index in bulk
bulk(es, make_documents())

# count the matches
count = es.count(index=INDEX_NAME, doc_type=TYPE, body={"query": {"match_all": {}}})

# now we can do searches.
print("Ok. I've got an index of {0} documents. Let's do some searches...".format(count['count']))
while True:
    try:
        query = input("Enter a search: ")
        result = es.search(index=INDEX_NAME, doc_type=TYPE, body={"query": {"match": {"text": query.strip()}}})
        if result.get('hits') is not None and result['hits'].get('hits') is not None:
            print(result['hits']['hits'])
        else:
            print({})
    except KeyboardInterrupt:
        break
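
One possible reason the count grows is that each action uses the 'create' op type without an explicit '_id', so Elasticsearch generates a fresh id on every run. A minimal sketch of one way around that, assuming each MongoDB document has the usual '_id' field: reuse it as the Elasticsearch '_id' and switch to the 'index' op type, so reloading overwrites documents instead of duplicating them. The names make_actions and sample are hypothetical, and doc_type is optional because newer Elasticsearch versions no longer use mapping types.

```python
def make_actions(docs, index_name, doc_type=None):
    """Yield bulk actions whose _id mirrors the MongoDB _id."""
    for faq in docs:
        action = {
            "_op_type": "index",       # index = create, or overwrite the same _id
            "_index": index_name,
            "_id": str(faq["_id"]),    # ObjectId (or any id) converted to a string
            "_source": {"text": faq["answer"]},
        }
        if doc_type is not None:       # only needed on older clusters with types
            action["_type"] = doc_type
        yield action


# Hypothetical stand-in for collection.find(); real documents come from MongoDB.
sample = [
    {"_id": "a1", "answer": "first answer"},
    {"_id": "a2", "answer": "second answer"},
]
actions = list(make_actions(sample, "documents"))

# With a live cluster this would be passed to the helper from the question:
#   bulk(es, make_actions(collection.find(), INDEX_NAME))
```

With stable ids, running the load twice should leave the document count unchanged, since the second pass replaces each document with an identical copy.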

I just need to load all the data into Elasticsearch once and then query it many times without updating it. My approach may be wildly incorrect, as I am really new to Elasticsearch, and I feel there has to be a better way.
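
For illustration, the "load once, query many times" idea could be sketched as a guard that checks whether the index already exists instead of checking count == 0. This is only a sketch under assumptions: load_once is a hypothetical helper, bulk_fn is expected to be elasticsearch.helpers.bulk (passed in so the function has no hard import), and actions is any iterable of bulk actions such as make_documents() from the question.

```python
def load_once(es, index_name, actions, bulk_fn):
    """Bulk-load `actions` only if `index_name` does not exist yet.

    Returns True if a load was performed, False if it was skipped.
    """
    if es.indices.exists(index=index_name):
        return False                      # already loaded on an earlier run
    es.indices.create(index=index_name)
    bulk_fn(es, actions)
    es.indices.refresh(index=index_name)  # make the new documents searchable now
    return True


# Intended usage with the objects from the question (needs a running cluster):
#   from elasticsearch.helpers import bulk
#   load_once(es, INDEX_NAME, make_documents(), bulk)
```

One caveat of this design: if the bulk call fails partway, the index still exists and later runs will skip the load, so deleting the index before retrying would be needed in that case.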
