Elasticsearch Python Client

Introduction
Elasticsearch is an open-source, distributed search and analytics engine built on Apache Lucene, and it is often used as a NoSQL database. It stores data as JSON documents and lets you store, search, and analyze huge volumes of data quickly and in near real time.
Environment Setup
A trial of cloud-hosted Elasticsearch can be used as a test environment. The link below can be used to sign up and create an Elasticsearch instance.
The username and password are shown in a popup when the deployment is created. Visiting the link below will give the cloud ID that needs to be used in the code.
Python Client
The client can be installed with pip:
pip install elasticsearch
The following code establishes a connection with the cloud server. We can confirm the connection by retrieving basic information about the cluster.
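A minimal sketch of such a connection, assuming the 7.x Python client (the output below reports server version 7.15.2). The helper names connect and summarize are illustrative, and the credentials are placeholders. In the 8.x client, http_auth is deprecated in favor of basic_auth.

```python
def connect(cloud_id, username, password):
    """Return a client for an Elastic Cloud deployment (7.x-style arguments)."""
    # Imported inside the function so the helpers can be defined
    # even before the elasticsearch package is installed.
    from elasticsearch import Elasticsearch
    return Elasticsearch(cloud_id=cloud_id, http_auth=(username, password))


def summarize(info):
    """Pick the cluster name and server version out of an es.info() response."""
    return {"cluster": info["cluster_name"], "version": info["version"]["number"]}


# Usage, with placeholder credentials:
# es = connect("<cloud-id>", "<username>", "<password>")
# print(es.info())
```

Calling es.info() returns a dictionary like the one shown next.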
{'cluster_name': '2ad3dad5b81a41f5a5e499d9980b0f8f',
'cluster_uuid': 'Bs5M_U7yRPWVMp8OJDY7UA',
'name': 'instance-0000000000',
'tagline': 'You Know, for Search',
'version': {'build_date': '2021-11-04T14:04:42.515624022Z',
'build_flavor': 'default',
'build_hash': '93d5a7f6192e8a1a12e154a2b81bf6fa7309da0c',
'build_snapshot': False,
'build_type': 'docker',
'lucene_version': '8.9.0',
'minimum_index_compatibility_version': '6.0.0-beta1',
'minimum_wire_compatibility_version': '6.8.0',
'number': '7.15.2'}}
The generic syntax (for the 7.x client) is
from elasticsearch import Elasticsearch
es = Elasticsearch(host='localhost', http_auth=('username', 'pass'), scheme='https', port=9200)
Index
An index is roughly equivalent to a table in a relational database: it is a collection of JSON documents. We can list all the indices in the cluster using
es.indices.get_alias(index="*")
Create an Index
A new index can be created with es.indices.create, which returns an acknowledgement.
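As a small sketch, assuming a connected 7.x client named es; the wrapper name create_index is illustrative only:

```python
def create_index(es, name):
    """Create an index with default settings and return the server's response."""
    return es.indices.create(index=name)


# Usage: create_index(es, "test_index")
```

Creating an index that already exists raises an error, so checking for existence first can be useful.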
{'acknowledged': True, 'index': 'test_index', 'shards_acknowledged': True}
Deleting an Index
An existing index can be deleted with es.indices.delete, which also returns an acknowledgement.
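A sketch of the deletion call, again assuming a connected 7.x client es; delete_index is a hypothetical wrapper name:

```python
def delete_index(es, name):
    """Delete an index, including all of its documents, and return the response."""
    return es.indices.delete(index=name)


# Usage: delete_index(es, "test_index")
```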
{'acknowledged': True}
Check if an index exists
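The existence check can be sketched as follows, assuming a connected client es. The wrapper name index_exists is illustrative. The result is passed through bool() because newer client versions return a response object rather than a plain boolean.

```python
def index_exists(es, name):
    """Return True if the named index exists, otherwise False."""
    return bool(es.indices.exists(index=name))


# Usage: print(index_exists(es, "test_index"))
```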
False
Document & Fields
Documents are the equivalent of rows, and fields are the equivalent of columns. In Elasticsearch, documents are JSON objects and fields are the key-value pairs in them.
Inserting data
Data can be inserted with es.index. Each document is like a row, and its id plays the role of the row number. The index is created automatically if it does not exist.
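A minimal sketch, assuming the 7.x client, where the document is passed as body (the 8.x client names this argument document). The helper insert_capital and the example values are assumptions chosen to fit the capitals index used in this article.

```python
def insert_capital(es, doc_id, capital, country):
    """Index one document into 'capitals' under the given id."""
    doc = {"Capital": capital, "Country": country}
    return es.index(index="capitals", id=doc_id, body=doc)


# Usage (example values are assumptions):
# insert_capital(es, 3, "London", "England")
```

Indexing a document under an id that already exists overwrites it and increments its version.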
{'_id': '3',
'_index': 'capitals',
'_primary_term': 1,
'_seq_no': 11,
'_shards': {'failed': 0, 'successful': 2, 'total': 2},
'_type': '_doc', '_version': 1, 'result': 'created'}
Updating Data
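A partial update can be sketched as below, assuming the 7.x client, where the changed fields are wrapped in a "doc" object inside body (the 8.x client accepts them as a doc argument). The helper name and the example value are assumptions.

```python
def update_capital(es, doc_id, **changed_fields):
    """Merge the given fields into an existing document in 'capitals'."""
    return es.update(index="capitals", id=doc_id, body={"doc": changed_fields})


# Usage (example value is an assumption):
# update_capital(es, 3, Capital="London1")
```

Only the listed fields change; the rest of the document is left as it was.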
{'_id': '3',
'_index': 'capitals',
'_primary_term': 1,
'_seq_no': 10,
'_shards': {'failed': 0, 'successful': 2, 'total': 2},
'_type': '_doc',
'_version': 3,
'result': 'updated'}
Fetching Data
A document can be fetched by its id with es.get, which returns the document together with its metadata:
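As a sketch, assuming a connected client es; fetch_capital is an illustrative wrapper name:

```python
def fetch_capital(es, doc_id):
    """Return the stored document plus metadata (_index, _version, found, ...)."""
    return es.get(index="capitals", id=doc_id)


# Usage: fetch_capital(es, 1)
```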
{'_id': '1',
'_index': 'capitals',
'_primary_term': 1,
'_seq_no': 7,
'_source': {'Capital': 'Delhi', 'Country': 'India'},
'_type': '_doc', '_version': 6, 'found': True}
To fetch only the document body, without the metadata:
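One way to do this, sketched here under the assumption that the client's get_source method is used (reading the '_source' key of an es.get response would work equally well):

```python
def fetch_capital_source(es, doc_id):
    """Return only the stored JSON body of the document, without metadata."""
    return es.get_source(index="capitals", id=doc_id)


# Usage: fetch_capital_source(es, 2)
```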
{'Capital': 'Muscat', 'Country': 'Oman'}
Querying
We can query an index using the search method.
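A minimal sketch of a search that returns every document, assuming a match_all query and the 7.x client's body argument. The helper names are illustrative.

```python
def match_all_query():
    """Build a request body that matches every document in the index."""
    return {"query": {"match_all": {}}}


def search_all(es, index):
    """Run the match_all query against the given index."""
    return es.search(index=index, body=match_all_query())


# Usage: search_all(es, "capitals")
```

With match_all, every hit gets the same score of 1.0, as in the output that follows.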
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1}, 'hits': {'hits': [{'_id': '1',
'_index': 'capitals',
'_score': 1.0,
'_source': {'Capital': 'Delhi', 'Country': 'India'},
'_type': '_doc'},
{'_id': '2',
'_index': 'capitals',
'_score': 1.0,
'_source': {'Capital': 'Muscat', 'Country': 'Oman'},
'_type': '_doc'},
{'_id': '3',
'_index': 'capitals',
'_score': 1.0,
'_source': {'Capital': 'London1', 'Country': 'England'}, '_type': '_doc'},
{'_id': '4',
'_index': 'capitals',
'_score': 1.0,
'_source': {'Capital': 'AbuDhabi', 'Country': 'UAE'},
'_type': '_doc'}],
'max_score': 1.0,
'total': {'relation': 'eq', 'value': 4}},
'timed_out': False,
'took': 1}
We can also match on a particular field:
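A sketch of a field-level query using match. The field and value in the usage line are assumptions picked to fit the single hit shown in the output; the helper names are illustrative.

```python
def field_query(field, value):
    """Build a match query restricted to one field."""
    return {"query": {"match": {field: value}}}


def search_field(es, index, field, value):
    """Search the index for documents whose field matches the value."""
    return es.search(index=index, body=field_query(field, value))


# Usage (field and value are assumptions):
# search_field(es, "capitals", "Country", "India")
```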
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
'hits': {'hits': [{'_id': '1',
'_index': 'capitals',
'_score': 1.2039728,
'_source': {'Capital': 'Delhi', 'Country': 'India'},
'_type': '_doc'}],
'max_score': 1.2039728,
'total': {'relation': 'eq', 'value': 1}},
'timed_out': False,
'took': 2}
match vs match_phrase
match_phrase is a more restrictive query than match. To demonstrate this, let's add one more entry to the capitals index, Muscat-City, and query it using both match and match_phrase.
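The two request bodies can be sketched as follows. The search text in the usage line is an assumption chosen to be consistent with the results below, and the helper names are illustrative.

```python
def match_query(field, text):
    """match: by default accepts documents containing ANY analyzed term of the text."""
    return {"query": {"match": {field: text}}}


def match_phrase_query(field, text):
    """match_phrase: the terms must all appear, adjacent and in the same order."""
    return {"query": {"match_phrase": {field: text}}}


def compare(es, index, field, text):
    """Run both queries and return (match result, match_phrase result)."""
    loose = es.search(index=index, body=match_query(field, text))
    strict = es.search(index=index, body=match_phrase_query(field, text))
    return loose, strict


# Usage (search text is an assumption):
# compare(es, "capitals", "Capital", "Muscat City")
```

With a two-word search text, match can return documents that contain only one of the words, while match_phrase needs both words next to each other.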
#1 match returns
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
'hits': {'hits': [{'_id': '5',
'_index': 'capitals',
'_score': 1.3808408,
'_source': {'Capital': 'Muscat-City', 'Country': 'Oman2'}, '_type': '_doc'},
{'_id': '2',
'_index': 'capitals',
'_score': 0.7801935,
'_source': {'Capital': 'Muscat', 'Country': 'Oman'},
'_type': '_doc'}],
'max_score': 1.3808408,
'total': {'relation': 'eq', 'value': 2}},
'timed_out': False, 'took': 4}
#2 match_phrase returns
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
'hits': {'hits': [{'_id': '5',
'_index': 'capitals',
'_score': 1.2048804,
'_source': {'Capital': 'Muscat-City', 'Country': 'Oman2'}, '_type': '_doc'}],
'max_score': 1.2048804,
'total': {'relation': 'eq', 'value': 1}},
'timed_out': False, 'took': 13}
Combining multiple queries
We can combine multiple queries using a bool query. To demonstrate this, let's add a new entry, London-City.
#1 Regular query returns
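A sketch of a combined query, assuming a must clause paired with a must_not clause. The exact clauses used for the results below are not shown, so the field values here are assumptions; bool_query is an illustrative helper.

```python
def bool_query(must, must_not=()):
    """Combine clauses: every `must` clause has to match, no `must_not` clause may."""
    return {"query": {"bool": {"must": list(must), "must_not": list(must_not)}}}


# Regular query: any capital containing the term "City" (assumed search text).
regular = {"query": {"match": {"Capital": "City"}}}

# Combined query: the same match, but excluding one country (assumed value).
combined = bool_query(
    must=[{"match": {"Capital": "City"}}],
    must_not=[{"match": {"Country": "Oman2"}}],
)

# With a connected client `es`:
# es.search(index="capitals", body=regular)
# es.search(index="capitals", body=combined)
```

A must_not clause filters documents out without contributing to the score, which is why a remaining hit keeps the score it had in the regular query.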
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1}, 'hits': {'hits': [{'_id': '5',
'_index': 'capitals',
'_score': 0.7104268,
'_source': {'Capital': 'Muscat-City', 'Country': 'Oman2'}, '_type': '_doc'},
{'_id': '6',
'_index': 'capitals',
'_score': 0.7104268,
'_source': {'Capital': 'London-City', 'Country': 'England2'}, '_type': '_doc'}],
'max_score': 0.7104268,
'total': {'relation': 'eq', 'value': 2}}, 'timed_out': False, 'took': 0}
#2 Combined query returns
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
'hits': {'hits': [{'_id': '6',
'_index': 'capitals',
'_score': 0.7104268,
'_source': {'Capital': 'London-City', 'Country': 'England2'}, '_type': '_doc'}],
'max_score': 0.7104268,
'total': {'relation': 'eq', 'value': 1}},
'timed_out': False, 'took': 1}
Asynchronous Client
To use the asynchronous client, install the package with the async extra, which also installs async dependencies such as aiohttp:
pip install elasticsearch[async]
An asynchronous program would then look as follows:
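A minimal sketch, assuming the 7.x client's AsyncElasticsearch class and placeholder Elastic Cloud credentials. The function names cluster_info and main are illustrative.

```python
import asyncio


async def cluster_info(es):
    """Await the async client's info() call and return its response."""
    return await es.info()


async def main(cloud_id, username, password):
    # Needs `pip install elasticsearch[async]`; imported here so the
    # helper above can be defined without the package installed.
    from elasticsearch import AsyncElasticsearch
    es = AsyncElasticsearch(cloud_id=cloud_id, http_auth=(username, password))
    try:
        print(await cluster_info(es))
    finally:
        # Close the underlying HTTP session explicitly.
        await es.close()


# Usage, with placeholder credentials:
# asyncio.run(main("<cloud-id>", "<username>", "<password>"))
```

The async client mirrors the synchronous API, so the calls shown earlier can be used the same way with await in front.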