What is a Vector?
A Vector is an object that represents a real-world item as an array of floating-point numbers.
Each item in the real world is represented in Vector format (as an array) and has many dimensions (attributes) based on its characteristics.
For example, if we want to represent colours in Vector format, we can create an array of attribute values, such as [R, G, B].
Each colour in an RGB image is represented by three values: the amount of red, green, and blue light present. These values typically range from 0 to 255, indicating the intensity of each colour component.
Pure Red = [255, 0, 0]
Where:
- R represents the intensity of red (in this case, maximum intensity, 255),
- G represents the intensity of green (in this case, 0, so no green),
- B represents the intensity of blue (in this case, 0, so no blue).
Similarly, you can represent any colour using this RGB vector format, with values ranging from 0 to 255 for each colour channel.
If we want to find close matches for the colour red, we can compare the attribute values of other colours with those of red. Colours with a high first (red) value and low green and blue values are the closest matches.
Real-world objects usually have many more attributes to represent, so a Vector representing a real-world object is typically a much larger array of 512, 1024, 1536 or 2048 attribute values.
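One way to make "close match" concrete is to measure the straight-line (Euclidean) distance between two colour vectors. The sketch below is illustrative only: the colour names and values are examples chosen for this post rather than values taken from the sample data, and `euclidean_distance` is a hypothetical helper.

```python
import math

# Illustrative RGB vectors (example values, not taken from rgb.json).
RED = [255, 0, 0]
CRIMSON = [220, 20, 60]
YELLOW = [255, 255, 0]


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Crimson differs a little in every channel, yet stays close to red.
print(round(euclidean_distance(RED, CRIMSON), 1))

# Yellow has the same red value as pure red, but its green channel
# puts it much further away.
print(round(euclidean_distance(RED, YELLOW), 1))
```

Yellow and pure red share the same first value, yet crimson is the closer match once all three channels are taken into account.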
What is Vector Search?
Vector search is a method of finding items based on their vector representation. In vector search, each item is represented in multidimensional space where each dimension represents the value of the attribute of the item.
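As a rough illustration of the idea, the sketch below performs a brute-force nearest-neighbour search over a handful of colour vectors. It is a minimal example under stated assumptions: the `COLOURS` dictionary and the `knn_search` helper are hypothetical, and production vector search engines generally rely on specialised indexes rather than comparing the query against every item.

```python
import heapq
import math


def knn_search(items, query, k):
    """Return the k (name, distance) pairs closest to `query` by Euclidean distance."""
    scored = ((name, math.dist(vector, query)) for name, vector in items.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hypothetical items: each colour is a point in three-dimensional RGB space.
COLOURS = {
    "red": [255, 0, 0],
    "green": [0, 128, 0],
    "blue": [0, 0, 255],
    "navy": [0, 0, 128],
    "yellow": [255, 255, 0],
}

# Find the two colours nearest to a dark blue query vector.
for name, distance in knn_search(COLOURS, [0, 0, 128], k=2):
    print(name, distance)
```

With this data the two nearest items are navy (an exact match) followed by blue.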
More details can be found at:
Industry-wide Use Cases
Vector search can be used across industries for a variety of use cases. Here are a few of them:
- Content Generation
- Anomaly detection
- Hybrid Search
- AI-powered chatbots
Vector Search vs. Full Text Search?
Vector search and full-text search are both methods used for searching through collections of data, but they operate in different ways and are suited to different types of data and use cases.
Full text search is a technique used in information retrieval to search and analyse textual content within documents or databases. Unlike simple methods that only match exact strings, full text search engines analyse the text of documents or records (for example, by breaking it into terms and reducing words to their root forms) so that a query can match exact terms, word variants, or close misspellings.
No. | Comparison area | Full text search | Vector search |
1 | Representation of data | Data is represented as documents of text or strings | Data is represented as vectors in multidimensional space |
2 | Matching criteria | Exact or fuzzy match | Nearest-neighbour match |
3 | Search | Textual search or comparison | Contextual search or comparison based on the attributes of the object |
4 | Use case | Searching through a document, web page, email content, etc. | Searching through audio, video, image, text, etc. |
Why Couchbase for Vector Search?
- Vector across our products: First in the industry to announce support for all 3 deployments: cloud, on-prem, mobile.
- Broad Capabilities: Integrated Cache, Full Text Search, Analytical Search, Time Series, Key-Value, Eventing and other features are combined with Vector search in a single platform.
- Ecosystem integration: LangChain and LlamaIndex integration.
- Proven Speed and Flexibility: In-memory architecture, flexible JSON format and powerful indexing.
More details can be found in our vector search release announcement.
Prerequisites
- Couchbase Capella or Couchbase Server 7.6 EE
- Have already created a database
- Sample data:
- Download file color_data_2vectors.zip
- For this example, we will use the rgb.json file
- Index file: color-index.json
- color-index.json:
{
  "type": "fulltext-index",
  "name": "color-index",
  "sourceType": "gocbcore",
  "sourceName": "vector-sample",
  "sourceUUID": "789365cccdf940ee2814a5dd2752040a",
  "planParams": {
    "maxPartitionsPerPIndex": 512,
    "indexPartitions": 1
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "scope.collection.type_field",
      "type_field": "type"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": false,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": false,
      "index_dynamic": false,
      "store_dynamic": false,
      "type_field": "_type",
      "types": {
        "color.rgb": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "brightness": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "brightness",
                  "store": true,
                  "type": "number"
                }
              ]
            },
            "color": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "index": true,
                  "name": "color",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "colorvect_dot": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "dims": 3,
                  "index": true,
                  "name": "colorvect_dot",
                  "similarity": "dot_product",
                  "type": "vector"
                }
              ]
            },
            "colorvect_l2": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "dims": 3,
                  "index": true,
                  "name": "colorvect_l2",
                  "similarity": "l2_norm",
                  "type": "vector"
                }
              ]
            },
            "description": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "en",
                  "index": true,
                  "name": "description",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "embedding_vector_dot": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "dims": 1536,
                  "index": true,
                  "name": "embedding_vector_dot",
                  "similarity": "dot_product",
                  "type": "vector"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 16
    }
  },
  "sourceParams": {}
}
- Sample search definition:
{ "fields": ["*"], "query": { "match_none": "" }, "knn": [ { "k": 2, "field": "colorvect_l2", "vector": [ 0, 0, 128 ] } ] }
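The index definition above declares two similarity settings for its vector fields: l2_norm and dot_product. The sketch below is a plain-Python illustration of how those two measures can rank the same candidates differently. It does not show how Couchbase computes scores internally; the helper functions and the navy and blue vectors are hypothetical examples.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a, b):
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def normalise(vector):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


query = [0, 0, 128]
navy = [0, 0, 128]
blue = [0, 0, 255]

# By L2 distance, navy is the better match: it is identical to the query.
print(l2_distance(query, navy), l2_distance(query, blue))

# By raw dot product, blue scores higher, because it has a larger magnitude.
print(dot_product(query, navy), dot_product(query, blue))

# After normalisation only direction matters, so both score the same.
print(dot_product(normalise(query), normalise(navy)),
      dot_product(normalise(query), normalise(blue)))
```

Because the dot product is sensitive to vector length, it is commonly used with vectors that have been normalised first, whereas L2 distance can be applied directly to raw values such as RGB intensities.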
Steps
Create Sample Data
Open the Capella UI, go to your database, and use the import option under Data Tools to load the data from your browser:
- Use sample rgb.json data file provided in prerequisites.
- Choose option load from browser.
- Select file: rgb.json
- Specify new bucket with name: vector-sample
- Specify new scope with name: color
- Specify new collection with name: rgb
- In step 3, preview your data, then choose how Capella creates identifiers for each of your documents: select the Field option and specify the field Id as the identifier.
- Click Import.
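Before importing, it can help to confirm that every document has the identifier field and a vector of the expected length. The sketch below is a hypothetical pre-import check, not part of Capella. The sample documents are invented for illustration, and the field name `id` is an assumption, so adjust `id_field` to match the identifier field in your copy of rgb.json. The `colorvect_l2` field name and its three dimensions come from the index definition in the prerequisites.

```python
def find_import_problems(docs, id_field="id", vector_field="colorvect_l2", dims=3):
    """Return a list of human-readable problems found in `docs`."""
    problems = []
    seen_ids = set()
    for position, doc in enumerate(docs):
        doc_id = doc.get(id_field)
        if doc_id is None:
            problems.append(f"document {position}: missing '{id_field}'")
        elif doc_id in seen_ids:
            problems.append(f"document {position}: duplicate id {doc_id!r}")
        else:
            seen_ids.add(doc_id)

        vector = doc.get(vector_field)
        if not isinstance(vector, list) or len(vector) != dims:
            problems.append(
                f"document {position}: '{vector_field}' is not a {dims}-value array"
            )
    return problems


# Invented sample documents; the last one is deliberately broken.
SAMPLE_DOCS = [
    {"id": "#FF0000", "color": "red", "colorvect_l2": [255, 0, 0]},
    {"id": "#000080", "color": "navy", "colorvect_l2": [0, 0, 128]},
    {"color": "broken", "colorvect_l2": [0, 0]},
]

for problem in find_import_problems(SAMPLE_DOCS):
    print(problem)
```

To check a real file, the documents could be loaded with `json.load` first, assuming the file contains a JSON array of documents.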
Create Vector Search Index
In Search Options under Data Tools:
- Create Search Index
- Select advanced mode
- Click on Index Definition on the right side of the UI
- Select option Import from file
- Choose file color-index.json specified in prerequisites
- Specify Index Name as: color-index
- Choose bucket: vector-sample
- Scope will be auto-populated as color
- Click on Create Index
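When importing an index definition from a file, it is worth checking which vector fields it declares, since the dimensions in the index must match the length of the vectors in the documents. The sketch below walks a definition and lists its vector fields. The `vector_fields` helper is hypothetical, and `INDEX_DEFINITION` is a trimmed copy of color-index.json kept inline so the example is self-contained; in practice the full file could be read with `json.load`.

```python
# Trimmed copy of color-index.json: only the parts needed for this check.
INDEX_DEFINITION = {
    "name": "color-index",
    "params": {
        "mapping": {
            "types": {
                "color.rgb": {
                    "properties": {
                        "color": {
                            "fields": [{"name": "color", "type": "text"}]
                        },
                        "colorvect_l2": {
                            "fields": [{
                                "dims": 3,
                                "name": "colorvect_l2",
                                "similarity": "l2_norm",
                                "type": "vector",
                            }]
                        },
                        "embedding_vector_dot": {
                            "fields": [{
                                "dims": 1536,
                                "name": "embedding_vector_dot",
                                "similarity": "dot_product",
                                "type": "vector",
                            }]
                        },
                    }
                }
            }
        }
    },
}


def vector_fields(index_definition):
    """Return (type name, field name, dims, similarity) for each vector field."""
    found = []
    types = index_definition["params"]["mapping"]["types"]
    for type_name, type_mapping in types.items():
        for prop in type_mapping.get("properties", {}).values():
            for field in prop.get("fields", []):
                if field.get("type") == "vector":
                    found.append(
                        (type_name, field["name"], field["dims"], field["similarity"])
                    )
    return found


for entry in vector_fields(INDEX_DEFINITION):
    print(entry)
```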
Perform a Vector Search
Select the Search option in the color-index row (button near far right)
Paste the search text from the prerequisite step into the Search window.
Click on Search to get a result (shows in window below the search text).
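The same search could also be prepared programmatically. The sketch below only builds the HTTP request and does not send it. It assumes the Search service exposes a scoped query endpoint of the form `/api/bucket/{bucket}/scope/{scope}/index/{index}/query`; verify this path against the Couchbase documentation for your version. The host name and port in the example are placeholders, and `build_knn_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_knn_request(base_url, bucket, scope, index, field, vector, k):
    """Build (but do not send) a POST request carrying a kNN search body."""
    body = {
        "fields": ["*"],
        "query": {"match_none": ""},
        "knn": [{"k": k, "field": field, "vector": vector}],
    }
    # Assumed endpoint layout; check it against your server's documentation.
    url = f"{base_url}/api/bucket/{bucket}/scope/{scope}/index/{index}/query"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_knn_request(
    "https://cb.example.com:18094",  # placeholder host and port
    bucket="vector-sample",
    scope="color",
    index="color-index",
    field="colorvect_l2",
    vector=[0, 0, 128],
    k=2,
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request would additionally require credentials for a database user with access to the bucket, which are deliberately left out of this sketch.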
Conclusion
In this post, we have gone through the basics of what vector search is and how to get started quickly with vector search in Couchbase.
After running a basic vector search, you can combine SQL queries with vector search in Couchbase. This helps consolidate your database stack and avoids writing multiple queries to get a single meaningful result for an application.
Start for free
- Start your 30-day trial account for Capella to run your first experiment today!