In this tutorial we are going to learn about indexing. We can create different indexes with different purposes:
- Create an index to improve query performance
- Create a search index to search for keywords
- Create an index to get a Data Guide, or get the JSON structure
Let’s go with the first index. As usual, first we are going to connect to the Oracle Database 23c and open our collection “hotel_reservations”:
import oracledbimport json#connect with thick-clientoracledb.init_oracle_client()connection = oracledb.connect(user="myapp",password="PassworD123##",dsn="localhost:1521/FREEPDB1")#auto commitconnection.autocommit = True#Open collection for hotel reservationssoda = connection.getSodaDatabase()collection = soda.openCollection("hotel_reservations")import oracledb import json #connect with thick-client oracledb.init_oracle_client() connection = oracledb.connect(user="myapp", password="PassworD123##", dsn="localhost:1521/FREEPDB1") #auto commit connection.autocommit = True #Open collection for hotel reservations soda = connection.getSodaDatabase() collection = soda.openCollection("hotel_reservations")import oracledb import json #connect with thick-client oracledb.init_oracle_client() connection = oracledb.connect(user="myapp", password="PassworD123##", dsn="localhost:1521/FREEPDB1") #auto commit connection.autocommit = True #Open collection for hotel reservations soda = connection.getSodaDatabase() collection = soda.openCollection("hotel_reservations")
Enter fullscreen mode Exit fullscreen mode
Now we have to define the index. In this case I want to create a composite index, as I would like to search by the reservation_id and the room_id. We are going to call the index “reservation_index”. Then we will call the function “createIndex”:
index_def = {'name': 'RESERVATION_INDEX','fields': [{'path': 'reservation_id','datatype': 'string','order': 'asc'},{'path': 'room_id','datatype': 'string','order': 'asc'},]}collection.createIndex(index_def)index_def = { 'name': 'RESERVATION_INDEX', 'fields': [ { 'path': 'reservation_id', 'datatype': 'string', 'order': 'asc' }, { 'path': 'room_id', 'datatype': 'string', 'order': 'asc' }, ] } collection.createIndex(index_def)index_def = { 'name': 'RESERVATION_INDEX', 'fields': [ { 'path': 'reservation_id', 'datatype': 'string', 'order': 'asc' }, { 'path': 'room_id', 'datatype': 'string', 'order': 'asc' }, ] } collection.createIndex(index_def)
Enter fullscreen mode Exit fullscreen mode
Easy, isn’t it? Now let’s create our second index. Let me show you the code first:
index_search_def ={"name" : "SEARCH_AND_DATA_GUIDE_IDX","dataguide" : "on","search_on" : "text" }collection.createIndex(index_search_def)index_search_def ={ "name" : "SEARCH_AND_DATA_GUIDE_IDX", "dataguide" : "on", "search_on" : "text" } collection.createIndex(index_search_def)index_search_def ={ "name" : "SEARCH_AND_DATA_GUIDE_IDX", "dataguide" : "on", "search_on" : "text" } collection.createIndex(index_search_def)
Enter fullscreen mode Exit fullscreen mode
In this case, the index is called “search_and_data_guide_idx”. As the name describes, it has double purpose. There is a first parameter which is called “dataguide”. Data guide provides a summary of the structural and type information about our JSON documents. Later on we will have a look into it. This parameter can be on or off.
The second parameter is “search_on”. This parameter can have two values: text or text_value. If your search index involve only text search or string-equality search you can use text. If you are interested in numeric ranges as well as text, you have to use text_value.
Now that we have defined the index, let’s do a query where we are looking for bookings that are talking about checking, because the customer had added a comment into the booking:
documents = collection.find().filter({"$textContains" : "check%"}).getDocuments()print('\n Found the following documents talking about check:')for d in documents:content = d.getContent()print(content)documents = collection.find().filter({"$textContains" : "check%"}).getDocuments() print('\n Found the following documents talking about check:') for d in documents: content = d.getContent() print(content)documents = collection.find().filter({"$textContains" : "check%"}).getDocuments() print('\n Found the following documents talking about check:') for d in documents: content = d.getContent() print(content)
Enter fullscreen mode Exit fullscreen mode
We can see we have a reservation with the following comment: Please arrange for a 2pm checkout.
As promised before, let’s have a look at the Data Guide. One of the great benefits of JSON is the flexibility, and could be also one of the problems! If we want to know if there are any document which is not following the expected structure or if I want to know how it is, we can get this data guide. Let’s have a look:
data_guide= collection.getDataGuide().getContent()print(json.dumps(data_guide, indent=1))data_guide= collection.getDataGuide().getContent() print(json.dumps(data_guide, indent=1))data_guide= collection.getDataGuide().getContent() print(json.dumps(data_guide, indent=1))
Enter fullscreen mode Exit fullscreen mode
You should get the following JSON document. We can get all the info about number of fields, length and preferred column name.
>>> print(json.dumps(data_guide, indent=1)){"type": "object","o:length": 1,"properties": {"_id": {"type": "string","o:length": 32,"o:preferred_column_name": "DATA$_id"},"room_id": {"type": "string","o:length": 4,"o:preferred_column_name": "DATA$room_id"},"hotel_id": {"type": "string","o:length": 4,"o:preferred_column_name": "DATA$hotel_id"},"guest_name": {"type": "object","o:length": 1,"o:preferred_column_name": "DATA$guest_name","properties": {"last_name": {"type": "string","o:length": 16,"o:preferred_column_name": "DATA$last_name"},"first_name": {"type": "string","o:length": 8,"o:preferred_column_name": "DATA$first_name"}}},"num_adults": {"type": "number","o:length": 1,"o:preferred_column_name": "DATA$num_adults"},"total_cost": {"type": "number","o:length": 4,"o:preferred_column_name": "DATA$total_cost"},"checkin_date": {"type": "string","o:length": 16,"o:preferred_column_name": "DATA$checkin_date"},"num_children": {"type": "number","o:length": 1,"o:preferred_column_name": "DATA$num_children"},"checkout_date": {"type": "string","o:length": 16,"o:preferred_column_name": "DATA$checkout_date"},"payment_status": {"type": "string","o:length": 4,"o:preferred_column_name": "DATA$payment_status"},"reservation_id": {"type": "string","o:length": 1,"o:preferred_column_name": "DATA$reservation_id"},"guest_contact_info": {"type": "object","o:length": 1,"o:preferred_column_name": "DATA$guest_contact_info","properties": {"email": {"type": "string","o:length": 32,"o:preferred_column_name": "DATA$email"},"phone": {"type": "string","o:length": 8,"o:preferred_column_name": "DATA$phone"},"address": {"type": "object","o:length": 1,"o:preferred_column_name": "DATA$address","properties": {"city": {"type": "string","o:length": 16,"o:preferred_column_name": "DATA$city"},"country": {"type": "string","o:length": 8,"o:preferred_column_name": "DATA$country"}}}}},"additional_requests": {"type": "array","o:length": 1,"o:preferred_column_name": "DATA$additional_requests","items": {"properties": {"type": {"type": "string","o:length": 16,"o:preferred_column_name": "DATA$type"},"details": {"type": "string","o:length": 64,"o:preferred_column_name": "DATA$details"},"quantity": {"type": "number","o:length": 1,"o:preferred_column_name": "DATA$quantity"}}}}}}>>>>>> print(json.dumps(data_guide, indent=1)) { "type": "object", "o:length": 1, "properties": { "_id": { "type": "string", "o:length": 32, "o:preferred_column_name": "DATA$_id" }, "room_id": { "type": "string", "o:length": 4, "o:preferred_column_name": "DATA$room_id" }, "hotel_id": { "type": "string", "o:length": 4, "o:preferred_column_name": "DATA$hotel_id" }, "guest_name": { "type": "object", "o:length": 1, "o:preferred_column_name": "DATA$guest_name", "properties": { "last_name": { "type": "string", "o:length": 16, "o:preferred_column_name": "DATA$last_name" }, "first_name": { "type": "string", "o:length": 8, "o:preferred_column_name": "DATA$first_name" } } }, "num_adults": { "type": "number", "o:length": 1, "o:preferred_column_name": "DATA$num_adults" }, "total_cost": { "type": "number", "o:length": 4, "o:preferred_column_name": "DATA$total_cost" }, "checkin_date": { "type": "string", "o:length": 16, "o:preferred_column_name": "DATA$checkin_date" }, "num_children": { "type": "number", "o:length": 1, "o:preferred_column_name": "DATA$num_children" }, "checkout_date": { "type": "string", "o:length": 16, "o:preferred_column_name": "DATA$checkout_date" }, "payment_status": { "type": "string", "o:length": 4, "o:preferred_column_name": "DATA$payment_status" }, "reservation_id": { "type": "string", "o:length": 1, "o:preferred_column_name": "DATA$reservation_id" }, "guest_contact_info": { "type": "object", "o:length": 1, "o:preferred_column_name": "DATA$guest_contact_info", "properties": { "email": { "type": "string", "o:length": 32, "o:preferred_column_name": "DATA$email" }, "phone": { "type": "string", "o:length": 8, "o:preferred_column_name": "DATA$phone" }, "address": { "type": "object", "o:length": 1, "o:preferred_column_name": "DATA$address", "properties": { "city": { "type": "string", "o:length": 16, "o:preferred_column_name": "DATA$city" }, "country": { "type": "string", "o:length": 8, "o:preferred_column_name": "DATA$country" } } } } }, "additional_requests": { "type": "array", "o:length": 1, "o:preferred_column_name": "DATA$additional_requests", "items": { "properties": { "type": { "type": "string", "o:length": 16, "o:preferred_column_name": "DATA$type" }, "details": { "type": "string", "o:length": 64, "o:preferred_column_name": "DATA$details" }, "quantity": { "type": "number", "o:length": 1, "o:preferred_column_name": "DATA$quantity" } } } } } } >>>>>> print(json.dumps(data_guide, indent=1)) { "type": "object", "o:length": 1, "properties": { "_id": { "type": "string", "o:length": 32, "o:preferred_column_name": "DATA$_id" }, "room_id": { "type": "string", "o:length": 4, "o:preferred_column_name": "DATA$room_id" }, "hotel_id": { "type": "string", "o:length": 4, "o:preferred_column_name": "DATA$hotel_id" }, "guest_name": { "type": "object", "o:length": 1, "o:preferred_column_name": "DATA$guest_name", "properties": { "last_name": { "type": "string", "o:length": 16, "o:preferred_column_name": "DATA$last_name" }, "first_name": { "type": "string", "o:length": 8, "o:preferred_column_name": "DATA$first_name" } } }, "num_adults": { "type": "number", "o:length": 1, "o:preferred_column_name": "DATA$num_adults" }, "total_cost": { "type": "number", "o:length": 4, "o:preferred_column_name": "DATA$total_cost" }, "checkin_date": { "type": "string", "o:length": 16, "o:preferred_column_name": "DATA$checkin_date" }, "num_children": { "type": "number", "o:length": 1, "o:preferred_column_name": "DATA$num_children" }, "checkout_date": { "type": "string", "o:length": 16, "o:preferred_column_name": "DATA$checkout_date" }, "payment_status": { "type": "string", "o:length": 4, "o:preferred_column_name": "DATA$payment_status" }, "reservation_id": { "type": "string", "o:length": 1, "o:preferred_column_name": "DATA$reservation_id" }, "guest_contact_info": { "type": "object", "o:length": 1, "o:preferred_column_name": "DATA$guest_contact_info", "properties": { "email": { "type": "string", "o:length": 32, "o:preferred_column_name": "DATA$email" }, "phone": { "type": "string", "o:length": 8, "o:preferred_column_name": "DATA$phone" }, "address": { "type": "object", "o:length": 1, "o:preferred_column_name": "DATA$address", "properties": { "city": { "type": "string", "o:length": 16, "o:preferred_column_name": "DATA$city" }, "country": { "type": "string", "o:length": 8, "o:preferred_column_name": "DATA$country" } } } } }, "additional_requests": { "type": "array", "o:length": 1, "o:preferred_column_name": "DATA$additional_requests", "items": { "properties": { "type": { "type": "string", "o:length": 16, "o:preferred_column_name": "DATA$type" }, "details": { "type": "string", "o:length": 64, "o:preferred_column_name": "DATA$details" }, "quantity": { "type": "number", "o:length": 1, "o:preferred_column_name": "DATA$quantity" } } } } } } >>>
Enter fullscreen mode Exit fullscreen mode
We can use this Data Guide for creating a table to make it very easy to run SQL over it, but this will be in a different chapter 😉
原文链接:Python Developer Track for Oracle JSON and Duality Views – Part 5: Indexing
暂无评论内容