declare
l_search_text varchar2(100);
l_search_float vector(*, float32);
l_search_binary vector(*, binary);
l_results sys.odcivarchar2list;
begin
l_search_text := 'A wickedly delectable, delicious and decadent dessert.';
select
vector_embedding(
ALL_MINILM_L12_V2
using l_search_text as data)
, to_binary_vector(
vector_embedding(
MXBAI_EMBED_LARGE_V1
using l_search_text as data)
)
into l_search_float, l_search_binary;
with fast_binary_search as (
select
i.item_name
, v.embedding3 as embedding_float
from
menu_items i
join menu_vectors v on i.item_id = v.item_id
order by
vector_distance(
v.embedding1_binary
, l_search_binary
, hamming)
fetch first 25 rows only
), precise_float_search as (
select item_name
from fast_binary_search b
order by
vector_distance(
b.embedding_float
, l_search_float
, cosine)
fetch first 5 rows only
)
select item_name
bulk collect into l_results
from precise_float_search;
for v in values of l_results loop
dbms_output.put_line(v);
end loop;
end;
/
Turtle Cheesecake
Chocolate Soufflé
Chocolate Mousse
Peach Cobbler
Molten Lava Cake
In the previous post (Binary Vectors: Quantization) we created a scalar macro to convert vectors to binary vectors. This gives us a tool for working with binary vectors in the database even though we cannot yet load models that generate binary vectors directly. With binary vectors loaded into the tables used for semantic searches, we open up the possibility of highly optimized reranking searches that combine multiple models in the same search.
In this post we will explore persisting these converted binary vectors and using them for semantic searches. After some basic searches, we will take a look at the concept of a reranking search that uses multiple models in the same semantic search.
To accomplish this we will quickly review the process of loading LLMs into the database to generate vector embeddings. Then we will set up test data, generate embeddings and convert them to binary vectors with the macro. This will let us compare semantic searches using binary vectors against the same searches using float vectors.
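Before diving in, the shape of a reranking search is worth sketching in isolation: a cheap, coarse distance trims the candidate list, then a precise, more expensive distance reorders only the survivors. A minimal illustration (plain Python with toy stand-in keys and distances, not database code):

```python
def rerank_search(items, coarse_dist, fine_dist, k_coarse=25, k_final=5):
    # items: list of (name, coarse_vec, fine_vec) tuples
    # stage 1: cheap distance over everything, keep the top k_coarse
    stage1 = sorted(items, key=lambda it: coarse_dist(it[1]))[:k_coarse]
    # stage 2: precise distance over the survivors only
    stage2 = sorted(stage1, key=lambda it: fine_dist(it[2]))[:k_final]
    return [name for name, _, _ in stage2]

# toy example: "distances" are plain absolute values
items = [("cake", 1, 9), ("pie", 2, 1), ("soup", 10, 0)]
print(rerank_search(items, abs, abs, k_coarse=2, k_final=2))  # ['pie', 'cake']
```

The SQL version of this pattern uses a hamming search over binary vectors for stage 1 and a cosine search over float vectors for stage 2, as in the query at the top of this post.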
- Getting Started
- Load Large Language Models
- Set Up Sample Data and Vector Tables
- Create The Scalar Macro TO_BINARY_VECTOR
- Set up Menu Vectors Test Data
- Persisting the Binary Vectors
- Semantic Search with Binary Vectors
- Reranking Search With Binary Vectors
- Mixed Model Reranking Searches
- Conclusion
Getting Started
Scripts for this post are located in vector-search/binary-vector-search. These scripts require a minimum of Oracle 23ai Release 7 to support binary vectors. All scripts have been tested in OCI Autonomous databases and the Oracle 23ai Free database VirtualBox appliance. At the time of writing, both the OCI Autonomous Database and the VirtualBox appliance are at Oracle 23ai Release 9.
The source code for all articles here on practicalplsql.org is available in the practicalplsql public repository on github in the scripts/articles directory organized by concept and article. You can use create.user.practicalplsql.sql to create the practicalplsql schema. Sample data for articles can be found in the scripts/examples directory organized by example name. This article will be using the examples in scripts/examples/sample_foods as a starting point for test data.
The code for this article depends on loading some of the example models from Hugging Face into the database after exporting the models in ONNX format. The Quickstart article on OML4PY 2.1 has instructions on setting up and using Python with OML4PY 2.1 to generate properly prepared LLM mining models, examples for loading models to the database, and links to ready-to-use models in ONNX format stored in my Google Drive. All scripts needed to load the models used in this article are included in the source code for this article.
Load Large Language Models
Rendering a picture of a sunset in black and white would essentially remove the concept of sunset from the picture. The same is true for scalar binary quantization. If a model generates vectors whose dimension values are all positive, the resulting binary vectors would be all ones in every dimension, and all meaning would be lost. The LLMs from the MixedBread developers on Hugging Face were designed for binary quantization with minimal accuracy loss. In this post I will be working with the MixedBread models mxbai-embed-large-v1 and mxbai-embed-xsmall-v1 to produce vectors that are intended for binary quantization. Both of these LLMs are among the sample models that can be prepared using OML4PY 2.1. I have provided ready-to-load models in ONNX format here: mxbai-embed-large-v1, mxbai-embed-xsmall-v1.
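The macro we will reuse later does this quantization in SQL, but the underlying idea is simple enough to sketch in a few lines of Python (an illustration of sign-based quantization, not the database macro): each dimension keeps only its sign, and eight sign bits pack into one byte. It also shows why an all-positive vector is the worst case.

```python
def to_binary_bytes(vec):
    # keep only the sign of each dimension: positive -> 1, otherwise 0
    bits = [1 if x > 0 else 0 for x in vec]
    # pack 8 sign bits per byte, first dimension in the most significant bit
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[n:n + 8]))
        for n in range(0, len(bits), 8)
    )

mixed_signs  = [0.12, -0.40, 0.03, -0.91, 0.55, -0.10, 0.20, -0.33]
all_positive = [0.12, 0.40, 0.03, 0.91, 0.55, 0.10, 0.20, 0.33]
print(to_binary_bytes(mixed_signs).hex())   # 'aa' - the sign pattern survives
print(to_binary_bytes(all_positive).hex())  # 'ff' - every bit is 1, nothing survives
```

A model that produces mostly positive dimension values would collapse to near-identical binary vectors, which is exactly what the MixedBread models were designed to avoid.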
Once the models are in a file system that is mapped to a database directory, they can be easily loaded into the database. We can create a procedure to load them and show what kind of embeddings they generate. For now we will set this procedure up to use a local database with a directory mapping; I am using the Oracle 23ai Free (23.9) VirtualBox database. Note: an alternate form of the procedure could be created to load models into an OCI Autonomous Database from an object storage bucket.
create.procedure.load_onnx_model.local.sql
create or replace procedure load_onnx_model(f in varchar2, m in varchar2)
is
l_info varchar2(100);
begin
dbms_vector.load_onnx_model('ML_MODELS_DIR', f, m);
dbms_output.put_line('Loaded ONNX Model ' || m);
select a.vector_info into l_info
from user_mining_model_attributes a
where
a.model_name = upper(m)
and a.attribute_type = 'VECTOR';
dbms_output.put_line('Model generates ' || l_info);
exception
when others then
dbms_output.put_line(sqlerrm);
end load_onnx_model;
/
Run the script to load the MixedBread models.
load.models.mxbai.sql
set serveroutput on;
begin
load_onnx_model('mxbai-embed-large-v1.onnx', 'mxbai_embed_large_v1');
load_onnx_model('mxbai-embed-xsmall-v1.onnx', 'mxbai_embed_xsmall_v1');
end;
/
Loaded ONNX Model mxbai_embed_large_v1
Model generates VECTOR(1024,FLOAT32)
Loaded ONNX Model mxbai_embed_xsmall_v1
Model generates VECTOR(384,FLOAT32)
We can also load the popular all-MiniLM models for comparison. The source models on Hugging Face are sentence-transformers/all-MiniLM-L6-v2 and sentence-transformers/all-MiniLM-L12-v2. You can download models already exported to ONNX format with OML4PY 2.1 here: all-MiniLM-L6.onnx and all-MiniLM-L12-v2.onnx. After making these models available in a directory exposed to the database, run the script to load them.
load.models.all_miniLM.sql
set serveroutput on;
begin
load_onnx_model('all-MiniLM-L6-v2.onnx', 'all_MiniLM_L6_v2');
load_onnx_model('all-MiniLM-L12-v2.onnx', 'all_MiniLM_L12_v2');
end;
/
Loaded ONNX Model all_MiniLM_L6_v2
Model generates VECTOR(384,FLOAT32)
Loaded ONNX Model all_MiniLM_L12_v2
Model generates VECTOR(384,FLOAT32)
Set Up Sample Data and Vector Tables
We will need test data so that we can generate some vectors and work with converting them to binary format. Run the script in examples/sample-foods to create the recipes, menu_categories and menu_items tables and load them with test data.
--sample-foods/create.examples.all.sql
prompt creating all sample foods examples
@create.example.recipes.sql
@create.example.menus.sql
prompt created all sample foods examples
Table RECIPES created.
Created 15 simple recipes
Table MENU_CATEGORIES created.
Table MENU_ITEMS created.
Created 14 menu categories
Created 352 menu items
Menu Category Appetizers: 14 menu items created.
Menu Category Beverages: 10 menu items created.
Menu Category Breakfast: 13 menu items created.
Menu Category Desserts: 136 menu items created.
Menu Category Gluten-Free: 6 menu items created.
Menu Category Main Courses: 8 menu items created.
Menu Category Pastas: 13 menu items created.
Menu Category Pizzas: 11 menu items created.
Menu Category Salads: 56 menu items created.
Menu Category Sandwiches: 18 menu items created.
Menu Category Seafood: 8 menu items created.
Menu Category Soups: 11 menu items created.
Menu Category Vegan: 24 menu items created.
Menu Category Vegetarian: 24 menu items created.
Using this data as a starting point, we can create some tables to store vectors based on the descriptive text in the sample data.
For basic testing, create a copy of the recipes table with added columns for vector embeddings. Also create a column named embedding_model to keep track of what model was used to generate the stored vectors.
create.table.recipe_vectors.sql
--requires loaded recipes table from examples/sample_foods recipes example
drop table if exists recipe_vectors purge
/
create table recipe_vectors as
select name, doc
from recipes
/
alter table recipe_vectors add(
embedding vector(*,*)
, embedding_model varchar2(50)
, embedding_binary vector(*, binary)
)
/
describe recipe_vectors;
Table RECIPE_VECTORS created.
Table RECIPE_VECTORS altered.
Name Null? Type
---------------- -------- ----------------------
NAME NOT NULL VARCHAR2(100 CHAR)
DOC VARCHAR2(4000 CHAR)
EMBEDDING VECTOR(*,*,DENSE)
EMBEDDING_MODEL VARCHAR2(50)
EMBEDDING_BINARY VECTOR(*,BINARY,DENSE)
Generate float32 vectors for the rows in the recipe_vectors table using the vector_embedding function with the xsmall model from MixedBread.
generate.vectors.recipe_vectors.mxbai_xsmall.sql
set serveroutput on;
declare
l_start timestamp := localtimestamp;
l_model varchar2(100);
begin
update recipe_vectors
set
embedding = vector_embedding(MXBAI_EMBED_XSMALL_V1 using doc as data),
embedding_model = 'MXBAI_EMBED_XSMALL_V1';
commit;
select
embedding_model || ' ('
|| vector_dimension_count(embedding) || ' dimensions, '
|| vector_dimension_format(embedding) || ') '
into l_model
from recipe_vectors
where rownum = 1;
dbms_output.put_line('Created embeddings using ' || l_model);
dbms_output.put_line('Total Duration: ' || to_char(localtimestamp - l_start));
end;
/
update recipe_vectors.embeddings using MXBAI_EMBED_XSMALL_V1
Created embeddings using MXBAI_EMBED_XSMALL_V1 (384 dimensions, FLOAT32)
Total Duration: +000000000 00:00:00.121922000
The basic syntax for a semantic search uses the vector_distance function with a given metric and a search vector that can be generated using the vector_embedding function with a model that matches the vectors being compared.
--semantic_search.basic_syntax.sql
--current embeddings loaded should use embedding model MXBAI_EMBED_XSMALL_V1
select name
from recipe_vectors
order by
vector_distance(
embedding
, vector_embedding(MXBAI_EMBED_XSMALL_V1 using 'tasty dessert' as data)
, cosine)
fetch first 3 rows only
/
NAME
--------------------
Strawberry Pie
Chocolate Cake
Raspberry Tarts
Notice that the model argument to vector_embedding is an identifier created when the model was loaded and the metric used is also an identifier. This makes it necessary to write specialized queries to use different models or metrics.
Placing the call to vector_embedding that converts the search term to a vector inside the order by clause can cause the function to be executed for every row. It is better to convert the search term to a vector first so that the conversion is done only once per search.
Instead of writing a lot of similar queries, we can create a procedure to dynamically create the necessary queries using a given model, metric and search term. The procedure can also convert the search text to a vector and pass it as a bind variable to the actual search query. This procedure is a flexible tool to run a variety of vector searches easily.
create.procedure.search_recipe_vectors.sql
create or replace procedure search_recipe_vectors(
p_search in varchar2,
p_rows in number default 3,
p_model in varchar2 default null,
p_metric in varchar2 default 'cosine')
is
l_search_vector vector;
type t_row is record (ranking number, name recipes.name%type);
type t_rows is table of t_row;
l_rows t_rows;
l_model varchar2(100);
l_start timestamp := localtimestamp;
l_search_vector_sql varchar2(4000);
l_sql varchar2(4000);
cv sys_refcursor;
c_dividing_line constant varchar2(80) := lpad('-', 80, '-');
begin
--validate the vectors currently loaded
select embedding_model
into l_model
from recipe_vectors
fetch first row only;
if p_model is not null and l_model <> p_model then
raise_application_error(-20100, l_model || ' embeddings found, expecting ' || p_model);
end if;
--generate the vector for the search text
l_search_vector_sql :=
'select vector_embedding(' || l_model || ' using :search_term as data)';
execute immediate l_search_vector_sql
into l_search_vector
using p_search;
l_sql :=
'select
rownum as ranking, name
from
(
select name
from recipe_vectors g
order by
vector_distance(
g.embedding
, :search_vector
, ' || p_metric || ')
fetch first :p_rows rows only
)';
dbms_output.put_line(c_dividing_line);
dbms_output.put_line('Run Test Vector Search For [' || p_search || ']');
open cv for l_sql
using l_search_vector, p_rows;
fetch cv bulk collect into l_rows;
close cv;
dbms_output.put_line('Search completed using ' || l_model);
dbms_output.put_line('Vector Distance Metric ' || p_metric);
dbms_output.put_line('Search duration: ' || to_char(localtimestamp - l_start));
dbms_output.put_line('Search text = [' || p_search || ']');
dbms_output.put_line('Top k=' || p_rows || ' results:');
for v in values of l_rows loop
dbms_output.put_line(v.ranking || ') ' || v.name);
end loop;
dbms_output.put_line(c_dividing_line);
end search_recipe_vectors;
/
Use this procedure to run some test vector searches with the small MixedBread model using float32 vector embeddings, a variety of metrics and a couple of different search terms.
search.vectors.recipe_vectors.mxbai_xsmall.sql
--generate the embeddings using the mxbai_xsmall model
@generate.vectors.recipe_vectors.mxbai_xsmall.sql
declare
type t_list is table of varchar2(100);
l_model varchar2(100) := 'MXBAI_EMBED_XSMALL_V1';
l_metrics t_list := t_list('cosine', 'euclidean', 'manhattan');
l_searches t_list := t_list('healthy dinner', 'yummy dessert');
begin
for s in values of l_searches loop
for m in values of l_metrics loop
search_recipe_vectors(s, 3, l_model, m);
end loop;
end loop;
end;
/
Reviewing the results on a search for ‘healthy dinner’ in the recipe_vectors table shows that all searches returned the same results. The timings for searching with different metrics showed some minor differences.
Run Test Vector Search For [healthy dinner]
Search completed using MXBAI_EMBED_XSMALL_V1
Top k=3 results:
1) Shepherd's Pie
2) Grilled Cheese Sandwiches
3) Curried Tofu
Vector Distance Metric cosine
Search duration: +000000000 00:00:00.004073000
Vector Distance Metric euclidean
Search duration: +000000000 00:00:00.002777000
Vector Distance Metric manhattan
Search duration: +000000000 00:00:00.003189000
Searching for a ‘yummy dessert’ came up with a different 3rd place option using the manhattan distance metric with all searches executed in approximately the same time.
Run Test Vector Search For [yummy dessert]
Search completed using MXBAI_EMBED_XSMALL_V1
Top k=3 results:
1) Strawberry Pie
2) Chocolate Cake
3) Raspberry Tarts [cosine and euclidean]
3) Banana, Mango and Blueberry Smoothie [manhattan]
Vector Distance Metric cosine
Search duration: +000000000 00:00:00.003149000
Vector Distance Metric euclidean
Search duration: +000000000 00:00:00.002980000
Vector Distance Metric manhattan
Search duration: +000000000 00:00:00.002708000
There are scripts in the repository to test all of the loaded models with these two searches using three different metrics: cosine, euclidean and manhattan. You can run them to get a feel for the results using different models and metrics. Each script updates the embeddings in the recipe_vectors table and then runs the same test searches with the appropriate model. Run the mxbai_xsmall script again last to leave the table with 384 dimension float32 vectors from the small MixedBread model. That will be the starting point for looking at binary vector conversions.
search.vectors.recipe_vectors.mxbai_xsmall.sql
search.vectors.recipe_vectors.mxbai_large.sql
search.vectors.recipe_vectors.all_miniLM_L6.sql
search.vectors.recipe_vectors.all_miniLM_L12.sql
Create The Scalar Macro TO_BINARY_VECTOR
In the previous post we designed a scalar macro function to convert vectors to binary vectors using SQL JSON functionality. For review, this is the final code for the macro; make sure it is compiled into the same schema the models were loaded into.
create.function.to_binary_vector.sql
create or replace function to_binary_vector(
p_input_vector in vector
) return varchar2
sql_macro (scalar)
is
begin
return q'~
select
to_vector(
json_serialize(
json_arrayagg(pb.uint_byte order by pb.byte#)
returning clob)
, *, binary)
from
(
select
p.byte#
, bin_to_num(p.b#1, p.b#2, p.b#3, p.b#4, p.b#5, p.b#6, p.b#7, p.b#8) as uint_byte
from
(
select
case sign(jt.dimval) when 1 then 1 else 0 end as dim_bitval
, mod(jt.dim#, 8) as bit#
, ceil(jt.dim#/8) as byte#
from
json_table (
json(vector_serialize(p_input_vector returning clob))
, '$[*]'
columns(
dim# for ordinality
, dimval number path '$'
)
) jt
) q
pivot(
max(q.dim_bitval) for bit# in (
1 as b#1, 2 as b#2, 3 as b#3, 4 as b#4
, 5 as b#5, 6 as b#6, 7 as b#7, 0 as b#8)
) p
) pb
~';
end to_binary_vector;
/
We can use the macro with any table that has a vector column. Try the test with the recipe_vectors table.
test.function.to_binary_vector.recipe_vectors.sql
select
v_model
, v_fmt, v_dims
, vector_dimension_format(v_binary) as v_bin_fmt
, vector_dimension_count(v_binary) as v_bin_dims
from
(
select
r.embedding_model as v_model
, vector_dimension_format(r.embedding) as v_fmt
, vector_dimension_count(r.embedding) as v_dims
, to_binary_vector(r.embedding) as v_binary
from recipe_vectors r
)
fetch first row only
/
V_MODEL V_FMT V_DIMS V_BIN_FMT V_BIN_DIMS
---------------------- --------- ---------- --------- ----------
MXBAI_EMBED_XSMALL_V1 FLOAT32 384 BINARY 384
The macro is working as expected, converting the stored float32 vectors to binary dimension format.
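Dimension count is preserved by the conversion, but each dimension shrinks from 32 bits of float down to a single bit. A quick back-of-the-envelope calculation of the raw storage (plain Python arithmetic, ignoring whatever per-vector overhead the database adds):

```python
def raw_vector_bytes(dims, bits_per_dim):
    # raw dimension data only; actual storage includes header overhead
    return dims * bits_per_dim // 8

for dims in (384, 1024):
    print(f"{dims} dims: float32 {raw_vector_bytes(dims, 32)} bytes"
          f" -> binary {raw_vector_bytes(dims, 1)} bytes")
```

That 32x reduction in raw dimension data is a big part of what makes binary vectors attractive for a fast first-pass search.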
Set up Menu Vectors Test Data
To get more data, let's use the menu_items table for a dataset of a few hundred vectors. If you created the full sample_foods example earlier, the menu_items table has already been created and loaded with about 350 rows of test data.
Create the menu_vectors table with several vector columns to allow comparison of float32 vectors created with all of the loaded models and the converted binary vectors. We will keep only the vectors in this table, using a join to menu_items on item_id to access the item_name and item_description columns.
create.table.menu_vectors.sql
create table menu_vectors as
select item_id
from menu_items
/
alter table menu_vectors add(
embedding vector(*,*)
, embedding_model varchar2(50)
, embedding_binary vector(*, binary)
, embedding1 vector(*,*)
, embedding1_model varchar2(50)
, embedding1_binary vector(*, binary)
, embedding2 vector(*,*)
, embedding2_model varchar2(50)
, embedding2_binary vector(*, binary)
, embedding3 vector(*,*)
, embedding3_model varchar2(50)
, embedding3_binary vector(*, binary)
)
/
Run the script to generate vector embeddings using the item_description from the menu_items table and each of the four loaded models. The MXBAI_EMBED_LARGE_V1 model takes the longest time to generate embeddings.
generate.vectors.menu_vectors.sql
Updated 352 embeddings using MXBAI_EMBED_XSMALL_V1
Total duration: +000000000 00:00:04.851623000
Updated 352 embeddings using MXBAI_EMBED_LARGE_V1
Total duration: +000000000 00:02:02.573723000
Updated 352 embeddings using ALL_MINILM_L6_V2
Total duration: +000000000 00:00:07.123483000
Updated 352 embeddings using ALL_MINILM_L12_V2
Total duration: +000000000 00:00:13.843263000
We can use the scalar macro to convert any of these vectors to binary format dynamically in a query. We can try the embeddings generated by the MixedBread xsmall model, which have 384 float32 dimensions; these are stored in the menu_vectors.embedding column.
test.function.to_binary_vector.menu_vectors.sql
select
count(*) items
, any_value(llm) as llm
, any_value(v_dim_cnt) || ' '
|| any_value(v_dim_fmt)
|| ' dimensions' as source_vector
, any_value(v_binary_dim_cnt) || ' '
|| any_value(v_binary_dim_fmt)
|| ' dimensions' as converted_vector
from
(
select
item_id, llm, v_dim_cnt, v_dim_fmt
, vector_dimension_count(v_binary) as v_binary_dim_cnt
, vector_dimension_format(v_binary) as v_binary_dim_fmt
from
(
select
item_id, embedding_model as llm
, vector_dimension_count(embedding) as v_dim_cnt
, vector_dimension_format(embedding) as v_dim_fmt
, to_binary_vector(embedding) as v_binary
from menu_vectors
)
)
/
ITEMS LLM SOURCE_VECTOR CONVERTED_VECTOR
---------- ---------------------- ------------------------ ------------------------
352 MXBAI_EMBED_XSMALL_V1 384 FLOAT32 dimensions 384 BINARY dimensions
It takes about twice as much time to convert the 1024 dimension float32 vectors generated by the MixedBread large model.
select
count(*) items
, any_value(llm) as llm
, any_value(v_dim_cnt) || ' '
|| any_value(v_dim_fmt)
|| ' dimensions' as source_vector
, any_value(v_binary_dim_cnt) || ' '
|| any_value(v_binary_dim_fmt)
|| ' dimensions' as converted_vector
from
(
select
item_id, llm, v_dim_cnt, v_dim_fmt
, vector_dimension_count(v_binary) as v_binary_dim_cnt
, vector_dimension_format(v_binary) as v_binary_dim_fmt
from
(
select
item_id, embedding1_model as llm
, vector_dimension_count(embedding1) as v_dim_cnt
, vector_dimension_format(embedding1) as v_dim_fmt
, to_binary_vector(embedding1) as v_binary
from menu_vectors
)
)
/
ITEMS LLM SOURCE_VECTOR CONVERTED_VECTOR
---------- ---------------------- ------------------------ ------------------------
352 MXBAI_EMBED_LARGE_V1 1024 FLOAT32 dimensions 1024 BINARY dimensions
Persisting the Binary Vectors
With the source vectors in the table, we can use the macro to update a column with the converted binary format vectors instead of dynamically doing the quantization in a query.
quantize_vectors.recipe_vectors.sql
begin
update recipe_vectors r
set r.embedding_binary = to_binary_vector(r.embedding);
dbms_output.put_line('Converted ' || sql%rowcount || ' vectors to binary format');
commit;
end;
/
Converted 15 vectors to binary format
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.026
While the MixedBread models are intended to be quantized to binary format, the other models we have loaded are not necessarily optimized for binary quantization. We can still use the macro to convert their vectors in a single update statement. Later we can compare binary search results from the MiniLM models against the MixedBread models.
quantize_vectors.menu_vectors.sql
begin
update menu_vectors m
set
--quantize the mixed bread embeddings
m.embedding_binary = to_binary_vector(m.embedding)
, m.embedding1_binary = to_binary_vector(m.embedding1)
--quantize the all_minilm embeddings for comparison
, m.embedding2_binary = to_binary_vector(m.embedding2)
, m.embedding3_binary = to_binary_vector(m.embedding3);
dbms_output.put_line('Converted ' || (sql%rowcount * 4) || ' vectors to binary format');
commit;
end;
/
use the macro to quantize all of the vectors in the menu_vectors table
Converted 1408 vectors to binary format
Elapsed: 00:00:00.753
The full SQL JSON based binary conversion of 1,408 vectors did not take long at all. Now our test tables are set up with plenty of vectors to work with.
Semantic Search with Binary Vectors
Create a procedure to run various semantic searches against the recipe_vectors table with parameters for search term, distance metric and a switch to use binary embeddings or regular embeddings. If a binary search is requested, we will use the macro to convert the search term vector to a binary vector and we will use the embedding_binary column for the vector_distance function.
create.procedure.search_compare_recipe_vectors.sql
create or replace procedure search_compare_recipe_vectors(
p_search_text in varchar2,
p_rows in number default 3,
p_metric in varchar2 default 'cosine',
p_use_binary in boolean default false
)
is
l_model varchar2(100);
l_format varchar2(100);
l_dim_count number;
l_vector_column varchar2(100);
l_model_sql varchar2(1000);
l_search_v vector;
l_search_v_sql varchar2(4000);
l_sql varchar2(4000);
cv sys_refcursor;
l_start timestamp;
l_results sys.odcivarchar2list;
begin
--get the model, format, dimensions currently loaded to recipe_vectors
l_vector_column := 'embedding' || case when p_use_binary then '_binary' end;
l_model_sql := '
select
embedding_model
, vector_dimension_format(##VECTOR_COLUMN##)
, vector_dimension_count(##VECTOR_COLUMN##)
from recipe_vectors
where rownum = 1';
l_model_sql := replace(l_model_sql, '##VECTOR_COLUMN##', l_vector_column);
execute immediate l_model_sql
into l_model, l_format, l_dim_count;
dbms_output.put_line(
'Semantic search for [' || p_search_text || '] top k=' || p_rows);
dbms_output.put_line(
'Model ' || l_model || ', metric ' || p_metric);
dbms_output.put_line(
'Use Column ' || l_vector_column
|| ', dimensions ' || l_dim_count || ', format ' || l_format);
--convert the search string to a vector using the correct model and format
--wrap the call to vector_embeddings with the macro for a binary search
if p_use_binary then
l_search_v_sql := '
select
to_binary_vector(
vector_embedding(##MODEL## using :search_text as data))
';
else
l_search_v_sql := '
select
vector_embedding(##MODEL## using :search_text as data)
';
end if;
l_search_v_sql := replace(l_search_v_sql, '##MODEL##', l_model);
l_start := localtimestamp;
execute immediate l_search_v_sql
into l_search_v using p_search_text;
dbms_output.put_line('Time to get search vector: ' || to_char(localtimestamp - l_start));
--build the semantic search query
l_sql := '
select name
from recipe_vectors
order by vector_distance(##VECTOR_COLUMN##, :l_search_vector, ##METRIC##)
fetch first :results rows only
';
l_sql := replace(l_sql, '##METRIC##', p_metric);
l_sql := replace(l_sql, '##VECTOR_COLUMN##', l_vector_column);
l_start := localtimestamp;
open cv for l_sql using l_search_v, p_rows;
fetch cv bulk collect into l_results;
close cv;
dbms_output.put_line('Time to run search: ' || to_char(localtimestamp - l_start));
for v in values of l_results loop
dbms_output.put_line(v);
end loop;
dbms_output.put_line(lpad('-', 50, '-'));
end search_compare_recipe_vectors;
/
We can use this procedure to compare searches with different metrics for binary vectors and float vectors. For binary searches, we have to use either the hamming or jaccard distance metrics.
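Both binary metrics operate on the packed bits: hamming counts the bit positions that differ, while jaccard compares the set bits (1 minus intersection over union). Conceptually they look like this (a plain Python sketch with hypothetical helper names; Oracle computes both metrics natively inside vector_distance):

```python
def hamming_distance(a, b):
    # count bit positions where the packed byte strings differ
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def jaccard_distance(a, b):
    # 1 - |set bits in both| / |set bits in either|
    inter = sum(bin(x & y).count("1") for x, y in zip(a, b))
    union = sum(bin(x | y).count("1") for x, y in zip(a, b))
    return 1 - inter / union if union else 0.0

a = bytes([0b10110100])
b = bytes([0b10011100])
print(hamming_distance(a, b))  # 2 bits differ
print(jaccard_distance(a, b))  # 1 - 3/5 = 0.4
```

Cheap bitwise operations over far fewer bits per dimension are why these binary metrics are so fast compared to float distance calculations.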
--search.compare_binary.recipe_vectors.sql
declare
l_search_text varchar2(100);
begin
l_search_text := 'a deliciously decadent dessert sounds good';
search_compare_recipe_vectors(l_search_text, 3, 'cosine', p_use_binary => false);
search_compare_recipe_vectors(l_search_text, 3, 'euclidean', p_use_binary => false);
search_compare_recipe_vectors(l_search_text, 3, 'manhattan', p_use_binary => false);
search_compare_recipe_vectors(l_search_text, 3, 'hamming', p_use_binary => true);
search_compare_recipe_vectors(l_search_text, 3, 'jaccard', p_use_binary => true);
end;
/
All of the searches produced the same results for binary and non-binary searches based on vectors from the MixedBread xsmall model.
Semantic search for [a deliciously decadent dessert sounds good] top k=3
Strawberry Pie
Raspberry Tarts
Chocolate Cake
Create a similar procedure to test semantic searches against the various vectors produced by the different models and stored in the menu_vectors table.
--create.procedure.search_compare_menu_vectors.sql
create or replace procedure search_compare_menu_vectors(
p_search_text in varchar2,
p_rows in number default 3,
p_metric in varchar2 default 'cosine',
p_vector_col in varchar2 default 'embedding',
p_use_binary in boolean default false
)
is
l_model varchar2(100);
l_format varchar2(100);
l_dim_count number;
l_vector_column varchar2(100);
l_search_v vector;
l_model_sql varchar2(1000);
l_search_v_sql varchar2(4000);
l_sql varchar2(4000);
cv sys_refcursor;
l_start timestamp;
l_results sys.odcivarchar2list;
begin
l_vector_column := p_vector_col || case when p_use_binary then '_binary' end;
--get the model and format currently loaded to menu_vectors. embedding column specified
l_model_sql := '
select
##COLUMN_BASE##_model
, vector_dimension_format(##VECTOR_COLUMN##)
, vector_dimension_count(##VECTOR_COLUMN##)
from menu_vectors
where rownum = 1';
l_model_sql := replace(l_model_sql, '##COLUMN_BASE##', p_vector_col);
l_model_sql := replace(l_model_sql, '##VECTOR_COLUMN##', l_vector_column);
execute immediate l_model_sql
into l_model, l_format, l_dim_count;
dbms_output.put_line('Semantic search for [' || p_search_text
|| '] top k=' || p_rows);
dbms_output.put_line('Model ' || l_model || ', metric ' || p_metric);
dbms_output.put_line('Use Column ' || l_vector_column
|| ', dimensions ' || l_dim_count
|| ', format ' || l_format);
--convert the search string to a vector using the correct model and format
if p_use_binary then
l_search_v_sql := '
select
to_binary_vector(
vector_embedding(##MODEL## using :search_text as data))
';
else
l_search_v_sql := '
select
vector_embedding(##MODEL## using :search_text as data)
';
end if;
l_search_v_sql := replace(l_search_v_sql, '##MODEL##', l_model);
l_start := localtimestamp;
execute immediate l_search_v_sql
into l_search_v using p_search_text;
dbms_output.put_line('Time to get search vector: ' || to_char(localtimestamp - l_start));
--build the semantic search query
l_sql := '
select mi.item_name
from
menu_items mi
join menu_vectors v on mi.item_id = v.item_id
order by vector_distance(##VECTOR_COLUMN##, :l_search_vector, ##METRIC##)
fetch first :results rows only
';
l_sql := replace(l_sql, '##METRIC##', p_metric);
l_sql := replace(l_sql, '##VECTOR_COLUMN##', l_vector_column);
--restart the timer so the reported time covers only the search
l_start := localtimestamp;
open cv for l_sql using l_search_v, p_rows;
fetch cv bulk collect into l_results;
close cv;
dbms_output.put_line('Time to run search: ' || to_char(localtimestamp - l_start));
for v in values of l_results loop
dbms_output.put_line(v);
end loop;
dbms_output.put_line(lpad('-', 50, '-'));
end search_compare_menu_vectors;
/
Now we can run a variety of searches to compare different models easily.
search.compare_binary.menu_vectors.sql
declare
type t_list is table of varchar2(100);
l_searches t_list;
l_columns t_list;
l_metrics t_list;
begin
l_searches := t_list(
'a decadent dessert would be tasty'
, 'what kinds of healthy lunch options are there?'
);
l_columns := t_list('embedding', 'embedding1', 'embedding2', 'embedding3');
l_metrics := t_list('cosine','euclidean','manhattan');
<<searches>>
for s in values of l_searches loop
<<models>>
for c in values of l_columns loop
<<float_metrics>>
for m in values of l_metrics loop
search_compare_menu_vectors(s, 3, m, c, p_use_binary => false);
end loop float_metrics;
search_compare_menu_vectors(s, 3, 'hamming', c, p_use_binary => true);
search_compare_menu_vectors(s, 3, 'jaccard', c, p_use_binary => true);
end loop models;
end loop searches;
end;
/
For the small MixedBread model, all of the dessert searches took about the same amount of time, and the returned results are nearly identical in most cases, indicating that this binary search would be a good candidate for a fast initial search.
Semantic search for [a decadent dessert would be tasty] top k=3
Model MXBAI_EMBED_XSMALL_V1
Use Column embedding, dimensions 384, format FLOAT32
Cosine and euclidean float searches produced the same results:
Turtle Cheesecake, Molten Lava Cake, Crème Brûlée
cosine search: +000000000 00:00:00.055934000
euclidean search: +000000000 00:00:00.007555000
manhattan search: +000000000 00:00:00.008049000
manhattan results: Turtle Cheesecake, Crème Brûlée, Molten Lava Cake
Use Column embedding_binary, dimensions 384, format BINARY
hamming search: +000000000 00:00:00.013352000
Turtle Cheesecake, Molten Lava Cake, Crème Brûlée
jaccard search: +000000000 00:00:00.007535000
Turtle Cheesecake, Molten Lava Cake, Chocolate Marble Cheesecake
Based on the performance of the cosine float search, the large MixedBread model is definitely not a good fit for cosine searches here. All of the other searches showed consistent timings whether binary or float. The results were also very consistent, although churro fans may lean towards the binary searches here.
Semantic search for [a decadent dessert would be tasty] top k=3
Model MXBAI_EMBED_LARGE_V1
Use Column embedding1, dimensions 1024, format FLOAT32
All float searches returned the same results:
Chocolate Soufflé, Turtle Cheesecake, Molten Lava Cake
cosine search: +000000000 00:00:04.004011000
euclidean search: +000000000 00:00:00.953499000
manhattan search: +000000000 00:00:00.941320000
Use Column embedding1_binary, dimensions 1024, format BINARY
Both binary searches returned the same results:
Turtle Cheesecake, Chocolate Soufflé, Churros
hamming search: +000000000 00:00:01.073347000
jaccard search: +000000000 00:00:00.941120000
With MiniLM L6, the cosine search also turned out to be the slowest performer. Surprisingly, the vectors converted to binary format returned results that are close to the float vector searches for this model. The vector space defined by this model may be satisfactory for binary conversions, at least for preliminary reranking searches.
Semantic search for [a decadent dessert would be tasty] top k=3
Model ALL_MINILM_L6_V2
Use Column embedding2, dimensions 384, format FLOAT32
cosine search: +000000000 00:00:00.362923000
Turtle Cheesecake, Molten Lava Cake, Crème Brûlée
euclidean search: +000000000 00:00:00.009617000
Turtle Cheesecake, Molten Lava Cake, Crème Brûlée
manhattan search: +000000000 00:00:00.009927000
Crème Brûlée, Turtle Cheesecake, Molten Lava Cake
Use Column embedding2_binary, dimensions 384, format BINARY
hamming search: +000000000 00:00:00.023431000
Molten Lava Cake, Hot Chocolate, Crème Brûlée
jaccard search: +000000000 00:00:00.009421000
Molten Lava Cake, Hot Chocolate, Coconut Cream Pie
The larger MiniLM L12 model also produced reasonable results with the converted binary searches for desserts.
Semantic search for [a decadent dessert would be tasty] top k=3
Model ALL_MINILM_L12_V2
Use Column embedding3, dimensions 384, format FLOAT32
Top 3 results are the same for float searches, with slight variance in ordering.
cosine search: +000000000 00:00:00.467303000
Turtle Cheesecake, Chocolate Marble Cheesecake, Clafoutis
euclidean search: +000000000 00:00:00.018389000
Turtle Cheesecake, Chocolate Marble Cheesecake, Clafoutis
manhattan search: +000000000 00:00:00.018366000
Turtle Cheesecake, Clafoutis, Chocolate Marble Cheesecake
Use Column embedding3_binary, dimensions 384, format BINARY
Both binary searches returned the same results
Clafoutis, Turtle Cheesecake, Qatayef
hamming search: +000000000 00:00:00.025727000
jaccard search: +000000000 00:00:00.017236000
The results for the lunch searches showed the same general results as the dessert searches. Here are the outcomes for the small mixed bread model:
Semantic search for [what kinds of healthy lunch options are there?] top k=3
Model MXBAI_EMBED_XSMALL_V1
Use Column embedding, dimensions 384, format FLOAT32
All float searches returned the same results:
Quinoa Salad, Egg Salad Sandwich, Cobb Salad
cosine search: +000000000 00:00:00.110066000
euclidean search: +000000000 00:00:00.008086000
manhattan search: +000000000 00:00:00.007773000
Use Column embedding_binary, dimensions 384, format BINARY
hamming search: +000000000 00:00:00.010658000
Quinoa Salad, Tofu Scramble, Apple Walnut Salad
jaccard search: +000000000 00:00:00.007808000
Quinoa Salad, Tofu Lasagna, Vegan Lasagna
Looking at the lunch searches, the large mixed bread model shows almost the same results for float searches as it does for binary searches. The timing on a binary hamming search with this model makes it less suitable for an initial search.
Semantic search for [what kinds of healthy lunch options are there?] top k=3
Model MXBAI_EMBED_LARGE_V1
Use Column embedding1, dimensions 1024, format FLOAT32
All float searches produced the same results:
Quinoa Salad, Egg Salad Sandwich, Lentil Salad
cosine search: +000000000 00:00:01.261661000
euclidean search: +000000000 00:00:00.958576000
manhattan search: +000000000 00:00:00.940072000
Use Column embedding1_binary, dimensions 1024, format BINARY
Both binary searches produced the same results:
Quinoa Salad, Egg Salad Sandwich, Asparagus Pea Salad
hamming search: +000000000 00:00:05.450606000
jaccard search: +000000000 00:00:00.867010000
A jaccard search on a binary vector offers fast performance with about the same search accuracy. A binary column could be a good supplement to a table of float embeddings, at minimal storage cost. And if we do store multiple vector columns for a variety of searches, we also open up the possibility of reranking searches.
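Under the hood, both binary metrics are simple bit-set operations. Here is a small Python sketch (the bit vectors are made up for illustration, and this is not the database's implementation) of what hamming and jaccard distance compute over binary vectors:

```python
def hamming(a, b):
    # number of bit positions where the two vectors differ
    return sum(x != y for x, y in zip(a, b))

def jaccard(a, b):
    # 1 - |intersection| / |union|, taken over the set bits
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return 1 - inter / union

a = [1, 0, 1, 1, 0, 0, 1, 0]
b = [1, 1, 1, 0, 0, 0, 1, 1]
print(hamming(a, b))  # 3
print(jaccard(a, b))  # 1 - 3/6 = 0.5
```

Hamming counts differing bit positions, while jaccard measures overlap of the set bits, which is why the two metrics can order near-ties differently, as in some of the results above.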
Reranking Search With Binary Vectors
One approach to a reranking search obtains a set of preliminary results with one model, then runs a second search over those results with a different model. This can offset the slower performance of large models: a fast initial search over a small model's quantized vectors produces the candidate set, and the larger model refines the final results.
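The two-stage shape of this flow can be sketched outside the database as well. In this hedged Python sketch, the items, distance values, and function names are all placeholders, not the database API:

```python
def rerank(items, coarse_dist, fine_dist, k_coarse=25, k_final=5):
    # stage 1: cheap coarse ranking (e.g. hamming over binary vectors)
    shortlist = sorted(items, key=coarse_dist)[:k_coarse]
    # stage 2: expensive precise ranking, run only over the shortlist
    return sorted(shortlist, key=fine_dist)[:k_final]

# toy rows: (name, coarse distance, fine distance) stand in for the
# vector_distance values a real query would compute
menu = [("cheesecake", 0.1, 0.05), ("salad", 0.9, 0.8),
        ("lava cake", 0.2, 0.02), ("soup", 0.8, 0.9)]
top = rerank(menu, coarse_dist=lambda r: r[1], fine_dist=lambda r: r[2],
             k_coarse=3, k_final=2)
print([name for name, *_ in top])  # ['lava cake', 'cheesecake']
```

The expensive metric only ever touches `k_coarse` rows, which is the entire source of the speedup.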
We can run a reranking search in a single query using inline views to nest the preliminary search as the source of the final search. Setting aside the need to pass the search terms as a bind variable, the general syntax looks like this.
--reranking_search.basic_syntax.sql
select rownum as float_rank, binary_rank, item_name
from
(
select binary_rank, item_name
from
(
select rownum as binary_rank, item_name, embedding
from
(
select i.item_name, v.embedding
from
menu_items i
join menu_vectors v on i.item_id = v.item_id
order by
vector_distance(
v.embedding_binary
, to_binary_vector(
vector_embedding(
MXBAI_EMBED_XSMALL_V1
using 'A really sweet treat would be good.' as data))
, hamming)
fetch first 25 rows only
)
)
order by
vector_distance(
embedding
, vector_embedding(
MXBAI_EMBED_XSMALL_V1
using 'A really sweet treat would be good.' as data)
, manhattan)
fetch first 5 rows only
)
/
FLOAT_RANK BINARY_RANK ITEM_NAME
---------- ----------- ------------------------------
1 1 Rocky Road
2 8 Snickerdoodle
3 22 Oatmeal Raisin Cookie
4 3 Fruit Salad
5 4 Sugar Cookie
Using a PL/SQL block, we can pass the search vectors to the reranking search query. We can also compare the performance and results to a search that only uses the float32 vectors.
--reranking_search.binary_and_float.same_model.mxbai_xsmall.sql
declare
l_start timestamp;
l_search_text varchar2(100);
v_float vector;
v_binary vector;
cursor c is
select rownum as float_rank, binary_rank,item_name
from
(
select binary_rank, item_name
from
(
select rownum as binary_rank, item_name, embedding
from
(
select i.item_name, i.item_description, v.embedding
from
menu_items i
join menu_vectors v on i.item_id = v.item_id
order by
vector_distance(
v.embedding_binary
, v_binary
, jaccard)
fetch first 20 rows only
)
)
order by
vector_distance(
embedding
, v_float
, manhattan)
fetch first 5 rows only);
begin
l_search_text := 'A really sweet and decadent dessert.';
l_start := localtimestamp;
--Generate search vectors
select vector_embedding(MXBAI_EMBED_XSMALL_V1 using l_search_text as data)
into v_float;
select to_binary_vector(v_float)
into v_binary;
dbms_output.put_line(
'Time to generate search vectors: ' || to_char(localtimestamp - l_start));
l_start := localtimestamp;
for r in c loop
dbms_output.put_line(
'Rank (Float ' || r.float_rank
|| ', Binary ' || rpad(r.binary_rank,2,' ')
|| ') ' || r.item_name);
end loop;
dbms_output.put_line(
'Nested Reranking Search Duration: ' || to_char(localtimestamp - l_start));
dbms_output.put_line(lpad('-',50,'-'));
dbms_output.put_line('Compare to search using only mxbai xsmall:');
search_compare_menu_vectors(l_search_text, 5, 'manhattan', 'embedding', p_use_binary => false);
end;
/
The mxbai xsmall model is already fast; even so, the reranking approach to the same search, starting from the binary vectors, took only about half the time.
The results of both searches are identical.
Rank (Float 1, Binary 1 ) Turtle Cheesecake
Rank (Float 2, Binary 2 ) Molten Lava Cake
Rank (Float 3, Binary 7 ) Crème Brûlée
Rank (Float 4, Binary 6 ) Hot Chocolate
Rank (Float 5, Binary 3 ) Clafoutis
Nested Reranking Search Duration: +000000000 00:00:00.004549000
Time to run search: +000000000 00:00:00.008192000
Turtle Cheesecake
Molten Lava Cake
Crème Brûlée
Hot Chocolate
Clafoutis
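The timing gap makes sense when we consider the arithmetic: a hamming distance over packed bits is one XOR plus a population count, while a float distance needs hundreds of multiplications and additions per vector. A rough Python illustration (the random bit vectors are for demonstration only, and this is not the database's internal implementation):

```python
import random

random.seed(1)
# two 384-bit vectors packed into Python ints, standing in for
# binary embeddings of the same dimension count as mxbai xsmall
a = random.getrandbits(384)
b = random.getrandbits(384)

# hamming distance: XOR the packed words, then count the set bits;
# one cheap bitwise pass replaces 384 floating point operations
dist = bin(a ^ b).count("1")
print(dist)
```

Hardware popcount instructions make this even cheaper in practice, which is why the binary metrics scale so well as dimensions grow.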
The difference is more dramatic with the mxbai large model.
--reranking_search.binary_and_float.same_model.mxbai_large.sql
prompt run the initial search by binary vectors for top 20 hits
prompt use the mxbai large model with jaccard metric
prompt then rerank the binary search results with search on float32 vectors
prompt use the mxbai large model float vectors with cosine metric
declare
l_start timestamp;
l_search_text varchar2(100);
v_float vector;
v_binary vector;
cursor c is
select rownum as float_rank, binary_rank,item_name
from
(
select binary_rank, item_name
from
(
select rownum as binary_rank, item_name, embedding
from
(
select i.item_name, i.item_description, v.embedding1 as embedding
from
menu_items i
join menu_vectors v on i.item_id = v.item_id
order by
vector_distance(
v.embedding1_binary
, v_binary
, jaccard)
fetch first 20 rows only
)
)
order by
vector_distance(
embedding
, v_float
, cosine)
fetch first 5 rows only);
begin
l_search_text := 'A healthy and nutritious dinner.';
l_start := localtimestamp;
--Generate search vectors
select vector_embedding(MXBAI_EMBED_LARGE_V1 using l_search_text as data)
into v_float;
select to_binary_vector(v_float)
into v_binary;
dbms_output.put_line(
'Time to generate search vectors: ' || to_char(localtimestamp - l_start));
l_start := localtimestamp;
for r in c loop
dbms_output.put_line(
'Rank (Float ' || r.float_rank
|| ', Binary ' || rpad(r.binary_rank,2,' ')
|| ') ' || r.item_name);
end loop;
dbms_output.put_line(
'Nested Reranking Search Duration: ' || to_char(localtimestamp - l_start));
dbms_output.put_line(lpad('-',50,'-'));
dbms_output.put_line('Compare to search using only mxbai large:');
search_compare_menu_vectors(l_search_text, 5, 'cosine', 'embedding1', p_use_binary => false);
end;
/
Using the larger model with 1024 dimensions, the binary reranking search took 0.00648 seconds, while the simple search using only the float32 vectors took 0.9784 seconds: about two orders of magnitude difference in search performance.
The results are almost the same for the two searches.
Rank (Float 1, Binary 6 ) Lentil Salad
Rank (Float 2, Binary 3 ) Tofu and Kale Power Bowl
Rank (Float 3, Binary 5 ) Quinoa Salad
Rank (Float 4, Binary 8 ) Tofu and Mango Salad
Rank (Float 5, Binary 7 ) Farro Vegetable Salad
Nested Reranking Search Duration: +000000000 00:00:00.006480000
Time to run search: +000000000 00:00:00.978401000
Lentil Salad
Tofu and Kale Power Bowl
Tofu Buddha Bowl
Quinoa Salad
Tofu and Mango Salad
Both mixed bread models show clear performance gains from the reranking approach, where the initial search runs over the binary vector space derived from that same model's float vectors.
Mixed Model Reranking Searches
We don’t have to use vectors from the same model for both stages of the reranking search if we have stored vectors from different models in the same table. We just need to make sure that each search vector matches the model of the column being searched. This allows us to leverage the performance of a small quantized model to optimize a search with a large, precise model.
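The one rule in a mixed model rerank is consistency: each stage's search vector must come from the same model (and format) as the column it is compared against. A small Python sketch of that bookkeeping, where the column and model names echo this post's schema but the embedder functions are made-up placeholders:

```python
# each stored vector column is tied to the model that produced it
column_model = {
    "embedding_binary": "MXBAI_EMBED_XSMALL_V1",  # coarse binary stage
    "embedding3":       "ALL_MINILM_L12_V2",      # precise float stage
}

def query_vectors(search_text, embedders):
    # one query vector per stage, each generated by the model that
    # produced the column it will be compared against
    return {col: embedders[model](search_text)
            for col, model in column_model.items()}

# placeholder embedders; the real ones are the database's
# vector_embedding(<model> using <text> as data) calls
fake = {
    "MXBAI_EMBED_XSMALL_V1": lambda t: [len(w) % 2 for w in t.split()],
    "ALL_MINILM_L12_V2":     lambda t: [len(w) / 10 for w in t.split()],
}
qv = query_vectors("A healthy and nutritious dinner.", fake)
print(sorted(qv))  # ['embedding3', 'embedding_binary']
```

Keeping this column-to-model mapping explicit is what prevents the silent mistake of comparing a query vector against a column built by a different model.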
Set up a reranking search with binary vectors converted from mxbai xsmall embeddings and float32 vectors generated by the miniLM_12 model. The inner search will get the top 20 results using the small binary vectors with a hamming distance search. The final search will rank these results using the float32 vectors and cosine distance.
After executing the reranking search, run a top 5 search using just the miniLM_12 model float vectors and cosine distance for comparison.
--reranking_search.binary_and_float.mixed_models.sql
prompt run the initial search by binary vectors for top 20 hits
prompt use the mxbai_xsmall binary vectors in menu_vectors.embedding_binary
prompt then rerank the binary search results with search on float32 vectors
prompt use the miniLM_12 float vectors in menu_vectors.embedding3
declare
l_start timestamp;
l_search_text varchar2(100);
v_float vector;
v_binary vector;
cursor c is
select rownum as float_rank, binary_rank,item_name
from
(
select binary_rank, item_name
from
(
select rownum as binary_rank, item_name, embedding
from
(
select i.item_name, i.item_description, v.embedding3 as embedding
from
menu_items i
join menu_vectors v on i.item_id = v.item_id
order by
vector_distance(
v.embedding_binary
, v_binary
, hamming)
fetch first 20 rows only
)
)
order by
vector_distance(
embedding
, v_float
, cosine)
fetch first 5 rows only);
begin
l_search_text := 'A healthy and nutritious dinner.';
l_start := localtimestamp;
dbms_output.put_line('Generate final search vector using ALL_MINILM_L12_V2');
select vector_embedding(ALL_MINILM_L12_V2 using l_search_text as data)
into v_float;
dbms_output.put_line('Generate binary search vector using MXBAI_EMBED_XSMALL_V1');
select
to_binary_vector(
vector_embedding(MXBAI_EMBED_XSMALL_V1 using l_search_text as data))
into v_binary;
dbms_output.put_line(
'Time to generate search vectors: ' || to_char(localtimestamp - l_start));
l_start := localtimestamp;
for r in c loop
dbms_output.put_line(
'Rank (Float ' || r.float_rank
|| ', Binary ' || rpad(r.binary_rank,2,' ')
|| ') ' || r.item_name);
end loop;
dbms_output.put_line(
'Nested Reranking Search Duration: ' || to_char(localtimestamp - l_start));
dbms_output.put_line(lpad('-',50,'-'));
dbms_output.put_line('Compare to search using only MiniLM12:');
search_compare_menu_vectors(l_search_text, 5, 'cosine', 'embedding3', p_use_binary => false);
end;
/
With mixed model reranking searches, we still get a dramatic speedup (roughly 40 times faster here) with about the same search results returned by the final query.
Rank (Float 1, Binary 19) Tofu and Lentil Shepherd's Pie
Rank (Float 2, Binary 4 ) Quinoa Salad
Rank (Float 3, Binary 1 ) Tofu and Kale Power Bowl
Rank (Float 4, Binary 6 ) Tofu Buddha Bowl
Rank (Float 5, Binary 7 ) Seitan and Spinach Lasagna
Nested Reranking Search Duration: +000000000 00:00:00.002750000
Same search without mixed model reranking: +000000000 00:00:00.110360000
Tofu and Lentil Shepherd's Pie
Tofu and Peanut Noodles
Chicken Caesar Wrap
Quinoa Salad
Tofu and Kale Power Bowl
Conclusion
Binary vectors may have a critical role to play in optimizing vector search performance, especially if we use them for the initial results phase of reranking searches. The additional storage required for the binary vectors is minimal, and it is more than balanced by reranking search gains measurable in orders of magnitude.
It could also be reasonable to store only binary vectors and run initial searches against these to gather a candidate set of results quickly. For the final phase of the reranking query, we could then dynamically generate vectors for just those candidates from a larger, more precise model. This approach would all but eliminate the storage overhead of the float vectors, at the cost of calculating ten or twenty vectors at run time.
As we saw in this post, once we have the option to convert vectors to binary format, there are few limits to the explorations we can make to understand the usefulness and limitations of this compact vector datatype.
–Anthony Harper