Elasticsearch Python Client

Hareesh Pallathoor Balakrishnan
4 min read · Nov 15, 2021


Introduction

Elasticsearch is an open-source, distributed search and analytics engine built on Apache Lucene, often used as a NoSQL document database. It stores data as JSON documents and lets you store, search, and analyze huge volumes of data quickly, in near real time.

Environment Setup

A trial of cloud-hosted Elasticsearch (Elastic Cloud) can be used as a test environment: sign up and create an Elasticsearch deployment.

The username and password are shown in a popup when the deployment is created. The deployment's page also shows the Cloud ID, which the client needs in order to connect.

Python Client

The client can be installed with pip:

pip install elasticsearch

Once the client is connected to the cloud deployment, we can confirm the connection by requesting basic cluster information with es.info(), which returns:

{'cluster_name': '2ad3dad5b81a41f5a5e499d9980b0f8f',
'cluster_uuid': 'Bs5M_U7yRPWVMp8OJDY7UA',
'name': 'instance-0000000000',
'tagline': 'You Know, for Search',
'version': {'build_date': '2021-11-04T14:04:42.515624022Z',
'build_flavor': 'default',
'build_hash': '93d5a7f6192e8a1a12e154a2b81bf6fa7309da0c',
'build_snapshot': False,
'build_type': 'docker',
'lucene_version': '8.9.0',
'minimum_index_compatibility_version': '6.0.0-beta1',
'minimum_wire_compatibility_version': '6.8.0',
'number': '7.15.2'}}
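Below is a minimal sketch of the connection step. It assumes the 8.x Python client. On a 7.x client, which pairs with a 7.x cluster like the one above, the keyword is http_auth instead of basic_auth. connect_cloud is a hypothetical helper, and the Cloud ID and credentials are placeholders for the values from your deployment.

```python
def connect_cloud(cloud_id, username, password):
    """Connect to an Elastic Cloud deployment and print basic cluster info."""
    # Requires `pip install elasticsearch` (8.x client assumed).
    from elasticsearch import Elasticsearch

    # The Cloud ID encodes the deployment's endpoint, so no host or port is needed.
    es = Elasticsearch(cloud_id=cloud_id, basic_auth=(username, password))
    print(es.info())  # cluster name, UUID and version details
    return es
```

Usage: es = connect_cloud("<cloud-id>", "<username>", "<password>").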

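For a self-managed cluster, the generic syntax with the 7.x client (matching the 7.15.2 server above) is

es = Elasticsearch(["localhost"], http_auth=("username", "pass"), scheme="https", port=9200)

With the 8.x client, pass the full URL and use basic_auth instead:

es = Elasticsearch("https://localhost:9200", basic_auth=("username", "pass"))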

Index

An index is roughly the equivalent of a table: a collection of JSON documents. All the indices in the cluster, together with their aliases, can be listed with

es.indices.get_alias(index="*")

Create an Index

A new index can be created with es.indices.create(index="test_index"), which returns:

{'acknowledged': True, 'index': 'test_index', 'shards_acknowledged': True}
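Index settings can also be passed when the index is created. This is a minimal sketch that assumes the 8.x client's settings= argument; the 7.x client takes body={"settings": ...} instead. create_index is a hypothetical helper, and es is a connected client.

```python
def create_index(es, name, shards=1, replicas=1):
    """Create an index with explicit primary-shard and replica counts."""
    return es.indices.create(
        index=name,
        settings={"number_of_shards": shards, "number_of_replicas": replicas},
    )
```

Creating an index that already exists is rejected with an HTTP 400 (resource_already_exists_exception).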

Deleting an Index

An existing index can be deleted with es.indices.delete(index="test_index"), which returns:

{'acknowledged': True}
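Deleting an index that does not exist raises a not-found (HTTP 404) error. The sketch below tolerates that case. It assumes the 8.x client's options(ignore_status=...); the 7.x client instead accepts ignore=404 on the call itself. delete_index is a hypothetical helper.

```python
def delete_index(es, name):
    """Delete an index, treating "index not found" (HTTP 404) as acceptable."""
    # options() returns a copy of the client with per-request settings applied.
    return es.options(ignore_status=404).indices.delete(index=name)
```

When the index was missing, the returned body describes the 404 error instead of an exception being raised.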

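Check if an index exists

es.indices.exists(index="test_index") reports whether an index exists. After the deletion above, it returns: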

False
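A common use of the existence check is to create an index only when it is missing. This is a minimal sketch with a hypothetical ensure_index helper. The 8.x client's exists() returns a response object that is truthy when the index exists, while 7.x returns a plain bool, so the if statement works with either.

```python
def ensure_index(es, name):
    """Create `name` only if it is missing; return True if it was created."""
    if es.indices.exists(index=name):  # HEAD request: truthy when the index exists
        return False
    es.indices.create(index=name)
    return True
```

Another client could still create the index between the two calls, in which case create() fails with an HTTP 400.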

Documents & Fields

Documents are the equivalent of rows, and fields the equivalent of columns. In Elasticsearch, a document is a JSON object and its fields are the key-value pairs inside it.

Inserting data

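Data can be inserted with es.index(). Each document is like a row, and its id is a unique identifier rather than a row number: indexing again with an existing id replaces that document. The index is created automatically if it does not exist. Indexing a document with id 3 into the capitals index returns: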

{'_id': '3',  
'_index': 'capitals',
'_primary_term': 1,
'_seq_no': 11,
'_shards': {'failed': 0, 'successful': 2, 'total': 2},
'_type': '_doc', '_version': 1, 'result': 'created'}
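This is a minimal sketch of indexing a few documents. It assumes the 8.x client's document= argument; 7.x uses body= instead. The sample documents are based on the capitals that appear in the search results below, and the ids "1" to "4" are assigned in list order. index_capitals is a hypothetical helper.

```python
CAPITALS = [
    {"Country": "India", "Capital": "Delhi"},
    {"Country": "Oman", "Capital": "Muscat"},
    {"Country": "England", "Capital": "London"},
    {"Country": "UAE", "Capital": "AbuDhabi"},
]


def index_capitals(es, docs, index="capitals"):
    """Index each document under ids "1", "2", ... and return each result."""
    results = []
    for doc_id, doc in enumerate(docs, start=1):
        # index() creates the document, or replaces it if the id already exists.
        resp = es.index(index=index, id=str(doc_id), document=doc)
        results.append(resp["result"])  # "created" or "updated"
    return results
```

For many documents, the bulk helper (elasticsearch.helpers.bulk) avoids sending one request per document.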

Updating Data

Updating an existing document returns the result 'updated' and increments its _version:

{'_id': '3',
'_index': 'capitals',
'_primary_term': 1,
'_seq_no': 10,
'_shards': {'failed': 0, 'successful': 2, 'total': 2},
'_type': '_doc',
'_version': 3,
'result': 'updated'}
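One way to produce an update like this is a partial update with es.update(). Re-running es.index() with the same id also reports 'updated', but it replaces the whole document. The sketch below assumes the 8.x client's doc= argument; 7.x uses body={"doc": ...}. update_capital is a hypothetical helper.

```python
def update_capital(es, doc_id, capital, index="capitals"):
    """Change only the Capital field of an existing document."""
    # Partial update: the fields in `doc` are merged into the stored _source.
    resp = es.update(index=index, id=doc_id, doc={"Capital": capital})
    return resp["_version"]  # incremented each time the document changes
```

If the new values equal the stored ones, the result is 'noop' and the version is not incremented.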

Fetching Data

An entry can be fetched by its id with es.get(index="capitals", id="1"):

{'_id': '1',  
'_index': 'capitals',
'_primary_term': 1,
'_seq_no': 7,
'_source': {'Capital': 'Delhi', 'Country': 'India'},
'_type': '_doc', '_version': 6, 'found': True}

or you can fetch just the stored data (the document's _source), for example with es.get_source(index="capitals", id="2"):

{'Capital': 'Muscat', 'Country': 'Oman'}
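es.get() raises a not-found error (NotFoundError) when the id does not exist. The minimal sketch below returns None instead, by checking es.exists() first. It assumes the 8.x client's keyword-only calls, and get_capital is a hypothetical helper.

```python
def get_capital(es, doc_id, index="capitals"):
    """Return the Capital stored under doc_id, or None if there is no such document."""
    if not es.exists(index=index, id=doc_id):  # HEAD request, no body transferred
        return None
    source = es.get_source(index=index, id=doc_id)
    return source["Capital"]
```

This costs two round trips; catching NotFoundError around a single get() is the one-request alternative.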

Querying

We can query with es.search(). With no query, or an explicit match_all query, every document in the index is returned with a score of 1.0:

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},  'hits': {'hits': [{'_id': '1',     
'_index': 'capitals',
'_score': 1.0,
'_source': {'Capital': 'Delhi', 'Country': 'India'},
'_type': '_doc'},
{'_id': '2',
'_index': 'capitals',
'_score': 1.0,
'_source': {'Capital': 'Muscat', 'Country': 'Oman'},
'_type': '_doc'},
{'_id': '3',
'_index': 'capitals',
'_score': 1.0,
'_source': {'Capital': 'London1', 'Country': 'England'}, '_type': '_doc'},
{'_id': '4',
'_index': 'capitals',
'_score': 1.0,
'_source': {'Capital': 'AbuDhabi', 'Country': 'UAE'},
'_type': '_doc'}],
'max_score': 1.0,
'total': {'relation': 'eq', 'value': 4}},
'timed_out': False,
'took': 1}
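The matching documents are nested under resp["hits"]["hits"]. The minimal sketch below flattens them into (id, source) pairs. It assumes the 8.x client's query= argument; 7.x uses body={"query": ...}. search_all is a hypothetical helper.

```python
def search_all(es, index="capitals", size=10):
    """Return (id, _source) pairs for up to `size` documents in the index."""
    resp = es.search(index=index, query={"match_all": {}}, size=size)
    return [(hit["_id"], hit["_source"]) for hit in resp["hits"]["hits"]]
```

Search returns at most 10 hits by default, so size must be raised (or results paged) for larger indices.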

We can also query a particular field with a match query. Here only the Delhi/India document matches:

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},  
'hits': {'hits': [{'_id': '1',
'_index': 'capitals',
'_score': 1.2039728,
'_source': {'Capital': 'Delhi', 'Country': 'India'},
'_type': '_doc'}],
'max_score': 1.2039728,
'total': {'relation': 'eq', 'value': 1}},
'timed_out': False,
'took': 2}
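A match query is built as {"match": {field: text}}. The minimal sketch below wraps it, assuming the 8.x client's query= argument. match_field is a hypothetical helper, and the usage values are illustrative.

```python
def match_field(es, field, text, index="capitals"):
    """Full-text match on one field; returns (score, _source) pairs, best first."""
    resp = es.search(index=index, query={"match": {field: text}})
    return [(hit["_score"], hit["_source"]) for hit in resp["hits"]["hits"]]
```

Usage: match_field(es, "Country", "India").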

match vs match_phrase

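match_phrase is more restrictive than match. match analyzes the query text into terms and returns documents containing any of them, while match_phrase requires all the terms to appear together, in the same order. To demonstrate this, let's add one more entry to capitals, Muscat-City (country Oman2), and query with both match and match_phrase.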

#1 match returns

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},  
'hits': {'hits': [{'_id': '5',
'_index': 'capitals',
'_score': 1.3808408,
'_source': {'Capital': 'Muscat-City', 'Country': 'Oman2'}, '_type': '_doc'},
{'_id': '2',
'_index': 'capitals',
'_score': 0.7801935,
'_source': {'Capital': 'Muscat', 'Country': 'Oman'},
'_type': '_doc'}],
'max_score': 1.3808408,
'total': {'relation': 'eq', 'value': 2}},
'timed_out': False, 'took': 4}

#2 match_phrase returns

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},  
'hits': {'hits': [{'_id': '5',
'_index': 'capitals',
'_score': 1.2048804,
'_source': {'Capital': 'Muscat-City', 'Country': 'Oman2'}, '_type': '_doc'}],
'max_score': 1.2048804,
'total': {'relation': 'eq', 'value': 1}},
'timed_out': False, 'took': 13}
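The sketch below runs the same text through both query types. It assumes the 8.x client and a query text such as "Muscat City" on the Capital field; the exact query used above is not shown, so this is an assumption. compare_match_types is a hypothetical helper. The standard analyzer splits "Muscat-City" into the terms muscat and city. As a result, match hits both Muscat documents, but match_phrase needs muscat immediately followed by city.

```python
def compare_match_types(es, text, field="Capital", index="capitals"):
    """Run the same text as match and match_phrase; return matching ids for each."""
    results = {}
    for query_type in ("match", "match_phrase"):
        resp = es.search(index=index, query={query_type: {field: text}})
        results[query_type] = [hit["_id"] for hit in resp["hits"]["hits"]]
    return results
```

Usage, after indexing the extra entry with es.index(index="capitals", id="5", document={"Capital": "Muscat-City", "Country": "Oman2"}): compare_match_types(es, "Muscat City").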

Combining multiple queries

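We can combine multiple queries with a bool query, which joins clauses such as must, should, must_not, and filter with boolean logic. To demonstrate this, let's add a new entry, London-City (country England2). The regular query matches both "-City" capitals, while the combined query adds a second condition and keeps only London-City.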

#1 Regular query returns

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},  'hits': {'hits': [{'_id': '5',     
'_index': 'capitals',
'_score': 0.7104268,
'_source': {'Capital': 'Muscat-City', 'Country': 'Oman2'}, '_type': '_doc'},
{'_id': '6',
'_index': 'capitals',
'_score': 0.7104268,
'_source': {'Capital': 'London-City', 'Country': 'England2'}, '_type': '_doc'}],
'max_score': 0.7104268,
'total': {'relation': 'eq', 'value': 2}}, 'timed_out': False, 'took': 0}

#2 Combined query returns

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},  
'hits': {'hits': [{'_id': '6',
'_index': 'capitals',
'_score': 0.7104268,
'_source': {'Capital': 'London-City', 'Country': 'England2'}, '_type': '_doc'}],
'max_score': 0.7104268,
'total': {'relation': 'eq', 'value': 1}},
'timed_out': False, 'took': 1}
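The queries themselves are not listed, so the following is one bool query consistent with these results, offered as an assumption. The combined query scores London-City 0.7104268, exactly as the regular query did. That fits a second condition placed in a non-scoring filter (or must_not) clause, since such clauses do not add to _score. The sketch assumes the 8.x client, and match_with_filter is a hypothetical helper.

```python
def match_with_filter(es, capital_text, country, index="capitals"):
    """Score on a Capital match, but keep only documents whose Country matches."""
    query = {
        "bool": {
            # Scored clause: full-text match on the capital name.
            "must": [{"match": {"Capital": capital_text}}],
            # Non-scoring clause: restricts the hits without changing _score.
            "filter": [{"match": {"Country": country}}],
        }
    }
    resp = es.search(index=index, query=query)
    return [hit["_source"] for hit in resp["hits"]["hits"]]
```

Usage: match_with_filter(es, "City", "England2").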

Asynchronous Client

To use the asynchronous client, install the client with the async extra, which also installs async dependencies such as aiohttp:

pip install elasticsearch[async]

An asynchronous program uses AsyncElasticsearch, whose API methods are coroutines that must be awaited.
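Below is a minimal sketch that assumes the 8.x AsyncElasticsearch, which accepts cloud_id and basic_auth like the synchronous client. fetch_sources and main are hypothetical names. The sketch fetches several documents concurrently with asyncio.gather, then closes the client to release its HTTP session.

```python
import asyncio


async def fetch_sources(es, index, ids):
    """Fetch several documents concurrently; return their _source dicts in order."""
    responses = await asyncio.gather(*(es.get(index=index, id=i) for i in ids))
    return [resp["_source"] for resp in responses]


async def main(cloud_id, username, password):
    # Requires `pip install elasticsearch[async]`.
    from elasticsearch import AsyncElasticsearch

    es = AsyncElasticsearch(cloud_id=cloud_id, basic_auth=(username, password))
    try:
        return await fetch_sources(es, "capitals", ["1", "2"])
    finally:
        await es.close()  # always close the async client's session
```

Usage: asyncio.run(main("<cloud-id>", "<username>", "<password>")).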
