declare
  l_row_differences boolean := true;
  l_next_step varchar2(100);
begin
  if l_row_differences then
    l_next_step := 'unpivot to show column differences';
    dbms_output.put_line(l_next_step);
  end if;
end;
- Getting Started
- Using Unpivot To Find Column Differences
- Use Row Compare Macro with Unpivoted Row Data
- Convert Datatypes Using A Macro
- Creating The Dynamic Unpivot Macro
- Combining Macro Calls: The Problem
- The Column Differences Macro
- A Pipelined Table Function Alternative For Unpivoting
In Table Data Comparison – Row Differences we looked at comparing sets of rows from two sources to see which rows did not match. This is a very useful technique if you only want to know which rows are different, but it doesn’t identify which columns in these rows are different.
In this article we will look at approaches to determine the exact column differences between rows that are different. This query pattern requires a lot of manual SQL that is much easier to implement with SQL table macros. Defining these macros is a great workout in using the polymorphic table function types to access column information from query expressions dynamically.
Getting Started
The source code for all articles here on practicalplsql.org is available in the practicalplsql public repository on github in the scripts/articles directory organized by concept and article. You can use create.user.practicalplsql.sql to create the practicalplsql schema. Sample data for articles can be found in the scripts/examples directory organized by example name.
Scripts for this post are located in column-differences. I will be building on the sql queries, functions and macros from row-differences and dynamic-unpivot-dataguide; they have been included in the source for this post with some modifications.
For the examples I will be using a simple table populated with products, replicated to two tables with some differences. This example is located at examples/table-data-compare. The example can be created by running the create_example.sql script. Larger datasets to compare can be found in the external tables at examples/geodata-ext-tables.
All scripts should be executed with serveroutput on; this setting is omitted from the listings for clarity. Some queries and code blocks build on previous examples and omit the repeated code for brevity; the source code has no omissions. Finally, when several scripts produce the same correct output, the output is shown only once to avoid repetition.
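For reference, the setting the scripts assume is the standard SQL*Plus/SQLcl toggle:
set serveroutput on size unlimited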
Using Unpivot To Find Column Differences
If we unpivot the columns in each row, keeping row identifiers, the same approaches that we used to determine row differences can be used to identify columns that are different for each row.
The columns to be unpivoted must all have the same datatype and must all be explicitly specified. Use include nulls in the unpivot so that columns with null values show up in the final results. For simplicity, these first examples unpivot only the varchar2 columns in the test tables.
Once the rows are unpivoted to column value combinations, we can use the full outer join method to compare these decomposed rows by joining on the row identifiers, column names and column values. This example joins the column values using decode to make the join null aware.
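As a quick aside on the decode trick: the = operator never evaluates to true when both sides are null, so matching null column values would otherwise be reported as false differences. Wrapping the comparison in decode makes two nulls compare as equal. A minimal sanity check (my own illustration, not one of the article’s scripts):
select
decode(null, null, 1, 0) as decode_match -- 1: decode treats null/null as a match
, case when null = null then 1 else 0 end as equals_match -- 0: null = null is unknown, never true
from dual
/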
prompt 1.1-query-unpivot-for-column-differences.sql
prompt unpivot to decompose rows and use full outer join comparison with decode
with unpivot_source as (
select product_id, code, column_name, column_value
from
products_source
unpivot include nulls (
column_value for column_name in (
name, description, style)
)
), unpivot_target as (
select product_id, code, column_name, column_value
from
products_target
unpivot include nulls (
column_value for column_name in (
name, description, style)
)
)
select
case when s.product_id is null then 'target' else 'source' end as row_source
, coalesce(s.product_id, t.product_id) as product_id
, coalesce(s.code, t.code) as code
, coalesce(s.column_name, t.column_name) as column_name
, coalesce(s.column_value, t.column_value) as column_value
from
unpivot_source s
full outer join unpivot_target t
on s.product_id = t.product_id
and s.code = t.code
and s.column_name = t.column_name
and decode(s.column_value, t.column_value, 1, 0) = 1
where
s.product_id is null or t.product_id is null
order by product_id, column_name, row_source
/
Now we see only the actual column differences between the rows in each table.
ROW_SOURCE PRODUCT_ID CODE COLUMN_NAME COLUMN_VALUE
---------- ---------- ------ ------------ ------------------------------
source 1 P-ES DESCRIPTION Mt. Everest Summit
target 1 P-ES DESCRIPTION Mount Everest Summit
source 2 P-EB DESCRIPTION Mt. Everest Basecamp
target 2 P-EB DESCRIPTION Mount Everest Basecamp
source 3 P-FD NAME Fujiyama Dawn
target 3 P-FD NAME Fuji Dawn
source 4 P-FS NAME Fujiyama Sunset
target 4 P-FS NAME Fuji Sunset
source 6 PC-ES DESCRIPTION Mt. Everest postcards
target 6 PC-ES DESCRIPTION Mount Everest postcards
source 6 PC-ES STYLE 5x7
target 6 PC-ES STYLE Monochrome
source 8 PC-K2 STYLE Color
target 8 PC-K2 STYLE (null)
source 9 PC-S DESCRIPTION Mount Shasta postcards
source 9 PC-S NAME Shasta Postcards
source 9 PC-S STYLE 5x7
17 rows selected.
We could also use the json_equal approach for the comparison of the decomposed rows.
prompt 1.2-query-unpivot-for-column-differences.sql
prompt unpivot to decompose rows and use full outer join comparison with json_equal
with unpivot_source as (
select product_id, code, column_name, column_value
from
products_source
unpivot include nulls (
column_value for column_name in (
name, description, style)
)
), unpivot_target as (
select product_id, code, column_name, column_value
from
products_target
unpivot include nulls (
column_value for column_name in (
name, description, style)
)
)
select
coalesce(s.row_source, t.row_source) as row_source
, coalesce(s.product_id, t.product_id) as product_id
, coalesce(s.code, t.code) as code
, coalesce(s.column_name, t.column_name) as column_name
, coalesce(s.column_value, t.column_value) as column_value
from
(
select 'source' as row_source, json_object(b.*) as jdoc, b.* from unpivot_source b
) s
full outer join (
select 'target' as row_source, json_object(b.*) as jdoc, b.* from unpivot_target b
) t
on s.product_id = t.product_id
and s.code = t.code
and s.column_name = t.column_name
and json_equal(s.jdoc, t.jdoc)
where
s.product_id is null or t.product_id is null
order by product_id, column_name, row_source
/
Use Row Compare Macro with Unpivoted Row Data
We have already seen how a macro can hide the implementation of a row comparison; the same approach can also simplify column comparisons. Let’s create a row compare macro and see how we can use it to help compare columns.
prompt 1.3-macro-row-differences.sql
prompt create a macro to find row differences using full outer join and json_equal
prompt don't include the json rows in the results
create or replace function find_row_differences(
source_data in dbms_tf.table_t,
target_data in dbms_tf.table_t,
id_columns in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_sql varchar2(4000);
l_id_columns varchar2(4000);
l_coalesce varchar2(4000);
l_all_columns varchar2(4000);
l_join_cols varchar2(4000);
l_column varchar2(128);
c_lf constant varchar2(1) := chr(10);
begin
select listagg(column_value,',')
into l_id_columns
from table(id_columns);
for c in 1..source_data.column.count loop
l_column := source_data.column(c).description.name;
l_coalesce := l_coalesce
|| ', coalesce(s.' || l_column
|| ', t.' || l_column
|| ') as ' || l_column || c_lf;
l_all_columns := l_all_columns || ', ' || l_column || c_lf;
end loop;
for i in 1..id_columns.count loop
l_join_cols := l_join_cols
|| case when l_join_cols is not null then ' and ' end
|| 's.' || id_columns(i) || ' = t.' || id_columns(i);
end loop;
l_sql := q'!
select
coalesce(s.row_source, t.row_source) as row_source
##COALESCE_COLUMNS##
from
(
select
'source' as row_source
##ALL_COLUMNS##
, json_object(*) as jdoc
from source_data
) s full outer join
(
select
'target' as row_source
##ALL_COLUMNS##
, json_object(*) as jdoc
from target_data
) t
on ##JOIN_ID_COLUMNS##
and json_equal(s.jdoc, t.jdoc)
where
s.##FIRST_ID_COLUMN## is null
or t.##FIRST_ID_COLUMN## is null
order by ##ID_COLUMNS##, row_source
!';
l_sql := replace(l_sql,'##ALL_COLUMNS##', l_all_columns);
l_sql := replace(l_sql,'##ID_COLUMNS##', l_id_columns);
l_sql := replace(l_sql,'##COALESCE_COLUMNS##', l_coalesce);
l_sql := replace(l_sql,'##JOIN_ID_COLUMNS##', l_join_cols);
l_sql := replace(l_sql,'##FIRST_ID_COLUMN##', id_columns(1));
return l_sql;
end find_row_differences;
/
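As a debugging aid, dbms_utility.expand_sql_text returns a statement with views and sql macros expanded, so you can inspect the final query text a macro produces. A quick sketch (the expanded text is verbose and its formatting varies by version):
declare
l_expanded clob;
begin
dbms_utility.expand_sql_text(
input_sql_text => 'select * from find_row_differences(products_source, products_target, columns(product_id, code))',
output_sql_text => l_expanded);
dbms_output.put_line(l_expanded);
end;
/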
Using the macro makes it clear that a column-level comparison decomposes the rows with an unpivot and then applies our standard row comparison logic.
prompt 1.4-query-unpivot-with-row-compare-macro.sql
prompt unpivot to decompose rows and then use row compare macro
with unpivot_source as (
select product_id, code, column_name, column_value
from
products_source
unpivot include nulls (
column_value for column_name in (
name, description, style)
)
), unpivot_target as (
select product_id, code, column_name, column_value
from
products_target
unpivot include nulls (
column_value for column_name in (
name, description, style)
)
)
select
row_source
, product_id
, code
, column_name
, column_value
from find_row_differences(
unpivot_source
, unpivot_target
, columns(product_id, code, column_name))
/
Even though the final comparison logic is now wrapped in the row compare macro, there is still a bit of code required to perform the initial unpivot. As the number of columns to be unpivoted increases, the amount of custom code required goes up for both unpivot expressions.
If we include columns that have varying datatypes, the unpivot would become more complex because all columns to be unpivoted need to be converted to the same datatype.
prompt 1.5-query-unpivot-different-datatypes-with-row-compare-macro.sql
prompt datatypes need to be the same for unpivot
prompt unpivot to decompose rows and then use row compare macro
with unpivot_source as (
select product_id, code, column_name, column_value
from
(
select
product_id
, code
, name
, description
, style
, to_char(msrp) as msrp --convert numbers explicitly to varchar
from products_source
)
unpivot include nulls (
column_value for column_name in (
name, description, style, msrp)
)
), unpivot_target as (
select product_id, code, column_name, column_value
from
(
select
product_id
, code
, name
, description
, style
, to_char(msrp) as msrp --convert numbers explicitly to varchar
from products_target
)
unpivot include nulls (
column_value for column_name in (
name, description, style, msrp)
)
)
select
row_source
, product_id
, code
, column_name
, column_value
from find_row_differences(
unpivot_source
, unpivot_target
, columns(product_id, code, column_name))
/
Coercing the datatypes prior to unpivoting allows the column differences to be found for mixed datatype columns. The resulting values will have to be in the same common datatype used for the unpivot.
...
source 3 P-FD MSRP 20
target 3 P-FD MSRP 19
source 3 P-FD NAME Fujiyama Dawn
target 3 P-FD NAME Fuji Dawn
...
Convert Datatypes Using A Macro
To get to a final macro that simplifies comparing column differences, we need to create an unpivot macro. To handle mixed datatypes we will also need a way to easily consolidate datatypes for the unpivot. The macro should be able to exclude columns that don’t need to be converted for the unpivot.
Instead of converting datatypes and unpivoting in a single macro, we can separate these functionalities into two macros. The first will be responsible for coercing datatypes to varchar. The second will just perform an unpivot.
The simplest form of converting column datatypes would be to simply use to_char(col) on all columns to be converted.
prompt 2.1-macro-convert-column-datatypes.sql
prompt create a macro to coerce datatypes for unpivot
create or replace function convert_columns_to_varchar(
data in dbms_tf.table_t,
exclude_cols in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_column varchar2(128);
l_all_columns varchar2(4000);
l_sql varchar2(4000);
begin
for i in 1..data.column.count loop
l_column := data.column(i).description.name;
if l_column not member of exclude_cols then
l_all_columns := l_all_columns || ','
|| 'to_char(' || l_column || ') as ' || l_column;
else
l_all_columns := l_all_columns || ','
|| l_column;
end if;
end loop;
l_all_columns := trim(leading ',' from l_all_columns);
l_sql := q'[
select ##ALL_COLUMNS## from data
]';
l_sql := replace(l_sql, '##ALL_COLUMNS##', l_all_columns);
return l_sql;
end convert_columns_to_varchar;
/
Because we used to_char for all conversions, this approach relies on the session’s nls settings for implicitly converting dates and timestamps. If the session had the nls_date_format set to a format that did not include times, all date columns would be compared without the time component.
alter session set nls_date_format = 'YYYY-MM-DD';
prompt Test the macro with different data types
prompt using a simple to_char for conversion makes datetime conversions dependent on nls settings
with base (id, str, num, dt) as (
select
level
, chr(level + 64)
, level + 64
, sysdate + level
from dual
connect by level <= 5
)
select id, str, num, dt
from convert_columns_to_varchar(base, columns(id))
/
ID STR NUM DT
--- --- --- ----------
1 A 65 2024-11-30
2 B 66 2024-12-01
3 C 67 2024-12-02
4 D 68 2024-12-03
5 E 69 2024-12-04
This approach may be too simple if we are using it to find exact differences between columns. Using the type information from the dbms_tf.table_t column collection, we can create more precise conversions for the various datatypes, based on constants for the format models.
Experimenting with some different types shows that the type values are not quite the same as the constants in the dbms_types package. Since the column_type_name function is part of dbms_tf, it looks like the most reliable way to interpret the column types reported by the column descriptors.
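To see exactly what column_type_name reports for a given source, a small throwaway diagnostic macro (my own sketch, not part of the article’s scripts) can echo each column name with its interpreted type:
create or replace function show_column_types(
data in dbms_tf.table_t
) return varchar2
sql_macro(table)
is
l_sql varchar2(4000);
begin
--build one row of literals per column: the name and its interpreted type
--note: description.name may include double quotes around the identifier
for i in 1..data.column.count loop
l_sql := l_sql
|| case when l_sql is not null then ' union all ' end
|| 'select ''' || data.column(i).description.name
|| ''' as column_name, '''
|| dbms_tf.column_type_name(data.column(i).description)
|| ''' as type_name from dual';
end loop;
return l_sql;
end show_column_types;
/
select * from show_column_types(products_source)
/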
prompt 2.2-macro-convert-column-datatypes-based-on-type.sql
prompt create a macro to coerce datatypes for unpivot
prompt customize conversion for source datatypes
create or replace function convert_columns_to_varchar(
data in dbms_tf.table_t,
exclude_cols in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_column varchar2(128);
l_all_columns varchar2(4000);
l_sql varchar2(4000);
l_type_name varchar2(100);
c_fmt_number constant varchar2(100) := '''tm9''';
c_fmt_date constant varchar2(100) := '''yyyy-mm-dd hh24:mi:ss''';
c_fmt_ts constant varchar2(50) := '''yyyy-mm-dd hh24:mi:ss.ff9''';
c_fmt_tstz constant varchar2(50) := '''yyyy-mm-dd hh24:mi:ss.ff9 tzh:tzm''';
c_fmt_tsltz constant varchar2(50) := '''yyyy-mm-dd hh24:mi:ss.ff9 tzr''';
begin
for i in 1..data.column.count loop
l_column := data.column(i).description.name;
if l_column not member of exclude_cols then
l_type_name := dbms_tf.column_type_name(data.column(i).description);
l_all_columns := l_all_columns || ','
|| case
when l_type_name = 'NUMBER'
then 'to_char(' || l_column || ',' || c_fmt_number || ')'
when l_type_name in ('DATE','TYPE-CODE: 13')
then 'to_char(' || l_column || ',' || c_fmt_date || ')'
when l_type_name in ('TIMESTAMP')
then 'to_char(' || l_column || ',' || c_fmt_ts || ')'
when l_type_name in ('TIMESTAMP WITH TIMEZONE', 'TYPE-CODE: 188')
then 'to_char(' || l_column || ',' || c_fmt_tstz || ')'
when l_type_name in ('TIMESTAMP WITH LOCAL TIMEZONE')
then 'to_char(' || l_column || ',' || c_fmt_tsltz || ')'
when l_type_name in ('INTERVAL DAY TO SECOND','INTERVAL YEAR TO MONTH')
then 'to_char(' || l_column || ')'
when l_type_name in ('CLOB')
then 'to_char(' || l_column || ')'
else l_column
end
|| ' as ' || l_column;
else
l_all_columns := l_all_columns || ',' || l_column;
end if;
end loop;
l_all_columns := trim(leading ',' from l_all_columns);
l_sql := q'[
select ##ALL_COLS## from data
]';
l_sql := replace(l_sql, '##ALL_COLS##', l_all_columns);
return l_sql;
end convert_columns_to_varchar;
/
Testing this macro, we see that the detail of date values is no longer dependent on nls settings:
prompt Test the macro with different data types
prompt use explicit datetime formats to preserve detail
alter session set nls_date_format = 'YYYY-MM-DD';
with base (id, str, num, dt) as (
select
level
, chr(level + 64)
, level + 64
, sysdate + level + power(level,level)/(24*60*60)
from dual
connect by level <= 5
)
select id, str, num, dt
from convert_columns_to_varchar(base, columns(id))
/
ID STR NUM DT
--- --- --- --------------------
1 A 65 2024-11-30 15:01:36
2 B 66 2024-12-01 15:01:39
3 C 67 2024-12-02 15:02:02
4 D 68 2024-12-03 15:05:51
5 E 69 2024-12-04 15:53:40
Creating The Dynamic Unpivot Macro
We can build a macro to dynamically unpivot columns using the column collection of the dbms_tf.table_t type that is used to pass data sources to table macros. A second parameter of type dbms_tf.columns_t can be used to exclude columns from the unpivot because we don’t want to unpivot the columns used to identify rows.
Because the data source to unpivot may actually include columns called column_name and column_value, I will use column#name and column#value to represent the extra columns created by the unpivot.
prompt 3.1-simple-unpivot-macro.sql
prompt create a simple macro to dynamically unpivot columns
prompt don't convert datatypes before unpivoting
create or replace function dynamic_unpivot_columns(
data in dbms_tf.table_t,
exclude_cols in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_column varchar2(128);
l_sql varchar2(4000);
l_unpivot_columns varchar2(4000);
l_exclude varchar2(4000);
begin
select listagg(column_value, ',') into l_exclude
from table(exclude_cols);
for i in 1..data.column.count loop
l_column := data.column(i).description.name;
if l_column not member of exclude_cols then
l_unpivot_columns := l_unpivot_columns || ',' || l_column;
end if;
end loop;
l_unpivot_columns := trim(leading ',' from l_unpivot_columns);
l_sql := q'[
select ##EXCLUDE_COLS##, column#name, column#value
from
data
unpivot include nulls(column#value for column#name in (##UNPIVOT_COLS##))
]';
l_sql := replace(l_sql, '##EXCLUDE_COLS##', l_exclude);
l_sql := replace(l_sql, '##UNPIVOT_COLS##', l_unpivot_columns);
return l_sql;
end dynamic_unpivot_columns;
/
Testing the macro, we see that any column not in the exclude_cols collection has been unpivoted correctly.
with base as (
select product_id, code, name, description
from products_source
order by product_id
fetch first 2 rows only
)
select *
from dynamic_unpivot_columns(base, columns(product_id, code))
/
PRODUCT_ID CODE COLUMN#NAME COLUMN#VALUE
---------- ------ --------------- --------------------
1 P-ES NAME Everest Summit
1 P-ES DESCRIPTION Mt. Everest Summit
2 P-EB NAME Everest Basecamp
2 P-EB DESCRIPTION Mt. Everest Basecamp
Combining Macro Calls: The Problem
At this point we have created table macros for the three pieces of functionality needed for a dynamic row and column comparison: find_row_differences, convert_columns_to_varchar and dynamic_unpivot_columns. These modular elements need to be combined for the final solution, but this becomes a bit problematic.
The primary problem is that table macro calls cannot actually be nested when they are invoked. A macro has to be called within a FROM clause; it cannot be passed as an argument to another macro.
Polymorphic table functions can be nested by wrapping the nested ptf in a CTE expression. Table macros are a form of polymorphic table function, but they are not supported inside a with query clause. This restriction means that we have no option to nest calls to sql table macros.
prompt 3.2-test-combining-macro.sql
prompt there is no syntax option to combine table macros
prompt restriction of macros inside of CTE expressions prevents simple nested calls
with base as (
select product_id, code, name, msrp
from products_source
order by product_id
fetch first 2 rows only
), converted_datatypes as (
select product_id, code, name, msrp
from convert_columns_to_varchar(base, columns(product_id, code))
)
select *
from dynamic_unpivot_columns(converted_datatypes, columns(product_id, code))
/
--ORA-64630: unsupported use of SQL macro: use of SQL macro inside WITH clause is not supported
prompt table macros can only accept a table name or CTE expression name
prompt nesting the macro calls will not work
with base as (
select product_id, code, name, msrp
from products_source
order by product_id
fetch first 2 rows only
)
select *
from dynamic_unpivot_columns(
convert_columns_to_varchar(base, columns(product_id, code))
, columns(product_id, code))
/
--ORA-64629: table SQL macro can only appear in FROM clause of a SQL statement
The only solution to this is to nest the macro calls within the actual macros. In the unpivot macro this works because we want the dynamic unpivot to always flatten the data types of columns to be unpivoted.
Parameter passing for the data source using a dbms_tf.table_t type is simple: just include the parameter name in the generated sql, and it will be replaced with the actual data source when the final sql is generated. The correct replacement also occurs for the nested macro call.
Passing the columns collection for columns to exclude is harder. Instead of just using the parameter name, we have to convert it back to the sql syntax for passing these collections of columns: columns(n, m).
prompt 3.3-nested-unpivot-macro.sql
prompt create a nested macro to dynamically unpivot columns with different datatypes
prompt nest the call to convert datatypes to varchar inside the macro
prompt cannot pass dbms_tf.columns_t directly, convert back to columns(m, n) for the nested call
create or replace function dynamic_unpivot_columns(
data in dbms_tf.table_t,
exclude_cols in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_column varchar2(128);
l_sql varchar2(4000);
l_unpivot_columns varchar2(4000);
l_exclude varchar2(4000);
begin
select listagg(column_value, ',') into l_exclude
from table(exclude_cols);
for i in 1..data.column.count loop
l_column := data.column(i).description.name;
if l_column not member of exclude_cols then
l_unpivot_columns := l_unpivot_columns || ',' || l_column;
end if;
end loop;
l_unpivot_columns := trim(leading ',' from l_unpivot_columns);
l_sql := q'[
select ##EXCLUDE_COLS##, column#name, column#value
from
( select * from convert_columns_to_varchar(data, columns(##EXCLUDE_COLS##)))
unpivot include nulls(column#value for column#name in (##UNPIVOT_COLS##))
]';
l_sql := replace(l_sql, '##EXCLUDE_COLS##', l_exclude);
l_sql := replace(l_sql, '##UNPIVOT_COLS##', l_unpivot_columns);
return l_sql;
end dynamic_unpivot_columns;
/
Testing the new form of the dynamic_unpivot_columns macro shows that the nested call to the convert_columns_to_varchar macro is working correctly.
with base as (
select product_id, code, name, description, msrp
from products_source
order by product_id
fetch first 2 rows only
)
select *
from dynamic_unpivot_columns(base, columns(product_id, code))
/
PRODUCT_ID CODE COLUMN#NAME COLUMN#VALUE
---------- ------ --------------- --------------------
1 P-ES NAME Everest Summit
1 P-ES DESCRIPTION Mt. Everest Summit
1 P-ES MSRP 30
2 P-EB NAME Everest Basecamp
2 P-EB DESCRIPTION Mt. Everest Basecamp
2 P-EB MSRP 30
The Column Differences Macro
Since it worked to nest the convert_columns_to_varchar macro inside the dynamic_unpivot_columns macro, can we reuse the find_row_differences macro? The first nesting only worked because we could write the inner macro call as select * from convert_columns_to_varchar(data, columns(…)). To reuse the find_row_differences macro we would have to pass the results of dynamic_unpivot_columns for each data source to find_row_differences, and even inside a macro we still cannot call two nested macros.
prompt 4.1-macro-row-column-differences.sql
prompt create a simple macro to compare row and column differences
prompt no syntax options for passing results of one macro call to another macro
prompt cannot reuse find_row_differences macro
create or replace function find_row_column_differences(
source_data in dbms_tf.table_t
, target_data in dbms_tf.table_t
, id_columns in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_sql varchar2(4000);
l_id_columns varchar2(4000);
begin
select listagg(column_value,',')
into l_id_columns
from table(id_columns);
l_sql := q'!
select *
from find_row_differences(
dynamic_unpivot_columns(source_data, columns(##ID_COLUMNS##))
, dynamic_unpivot_columns(target_data, columns(##ID_COLUMNS##))
, columns(##ID_COLUMNS##))
!';
l_sql := replace(l_sql,'##ID_COLUMNS##', l_id_columns);
return l_sql;
end find_row_column_differences;
/
prompt test the macro to show row and column differences
select * from find_row_column_differences(products_source, products_target, columns(product_id, code))
/
--SQL Error: ORA-64629: table SQL macro can only appear in FROM clause of a SQL statement
The only way around this is to reproduce the logic for find_row_differences so that there is only a single layer of nesting within the macro.
prompt 4.2-nested-macro-row-column-differences.sql
prompt create a nested macro to compare row and column differences
prompt use nested invocation of dynamic_unpivot_columns macro
prompt cannot reuse find_row_differences macro
prompt the only option is to reproduce the comparison logic from find_row_differences
create or replace function find_row_column_differences(
source_data in dbms_tf.table_t
, target_data in dbms_tf.table_t
, id_columns in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_sql varchar2(4000);
l_id_columns varchar2(4000);
l_coalesce_id_columns varchar2(4000);
l_join_id_columns varchar2(4000);
l_column varchar2(128);
begin
select listagg(column_value,',')
into l_id_columns
from table(id_columns);
for i in 1..id_columns.count loop
l_column := id_columns(i);
l_coalesce_id_columns := l_coalesce_id_columns
|| ', coalesce(s.' || l_column
|| ', t.' || l_column
|| ') as ' || l_column;
l_join_id_columns := l_join_id_columns
|| case when l_join_id_columns is not null then ' and ' end
|| 's.' || l_column || ' = t.' || l_column;
end loop;
l_sql := q'!
select
coalesce(s.row#source, t.row#source) as row#source
##COALESCE_ID_COLUMNS##
, coalesce(s.column#name, t.column#name) as column#name
, coalesce(s.column#value, t.column#value) as column#value
from
(
select
'source' as row#source
, ##ID_COLUMNS##
, column#name
, column#value
, json_object(column#value) as json#column
from dynamic_unpivot_columns(source_data, columns(##ID_COLUMNS##))
) s full outer join
(
select
'target' as row#source
, ##ID_COLUMNS##
, column#name
, column#value
, json_object(column#value) as json#column
from dynamic_unpivot_columns(target_data, columns(##ID_COLUMNS##))
) t
on ##JOIN_COLUMNS##
and s.column#name = t.column#name
and json_equal(s.json#column, t.json#column)
where
s.##FIRST_ID_COLUMN## is null
or t.##FIRST_ID_COLUMN## is null
order by ##ID_COLUMNS##, column#name, row#source
!';
l_sql := replace(l_sql,'##ID_COLUMNS##', l_id_columns);
l_sql := replace(l_sql,'##COALESCE_ID_COLUMNS##', l_coalesce_id_columns);
l_sql := replace(l_sql,'##JOIN_COLUMNS##', l_join_id_columns);
l_sql := replace(l_sql,'##FIRST_ID_COLUMN##', id_columns(1));
return l_sql;
end find_row_column_differences;
/
Testing this macro yields the same results as the first query in this post, without having to write either the unpivot or the comparison logic.
select * from find_row_column_differences(products_source, products_target, columns(product_id, code))
/
ROW#SOURCE PRODUCT_ID CODE COLUMN#NAME COLUMN#VALUE
---------- ---------- ------ ------------ ------------------------------
source 1 P-ES DESCRIPTION Mt. Everest Summit
target 1 P-ES DESCRIPTION Mount Everest Summit
source 2 P-EB DESCRIPTION Mt. Everest Basecamp
target 2 P-EB DESCRIPTION Mount Everest Basecamp
source 3 P-FD MSRP 20
target 3 P-FD MSRP 19
source 3 P-FD NAME Fujiyama Dawn
target 3 P-FD NAME Fuji Dawn
source 4 P-FS NAME Fujiyama Sunset
target 4 P-FS NAME Fuji Sunset
source 6 PC-ES DESCRIPTION Mt. Everest postcards
target 6 PC-ES DESCRIPTION Mount Everest postcards
source 6 PC-ES STYLE 5x7
target 6 PC-ES STYLE Monochrome
source 8 PC-K2 STYLE Color
target 8 PC-K2 STYLE (null)
source 9 PC-S DESCRIPTION Mount Shasta postcards
source 9 PC-S MSRP 9
source 9 PC-S NAME Shasta Postcards
source 9 PC-S STYLE 5x7
Table macros make it very simple to find row or column differences without writing any specific SQL code at all. The restriction on using sql macros within a CTE expression is a bit problematic, as is the inability to nest macro calls directly. Since macros are actually polymorphic table functions, it seems likely that at least one of these restrictions will eventually be removed.
The macro approach is not only efficient in terms of writing the code; since everything is executed in SQL, it is just as fast as writing the entire query explicitly.
A Pipelined Table Function Alternative For Unpivoting
The first draft of this article started with the idea that we could convert a row to json and dynamically unpivot it using a json_dataguide inside a pipelined function. This approach sidesteps the complexity of dealing with the polymorphic table function types for dynamic table and column information by converting to json and using the dynamic json information that is already available.
Compile the dynamic_json package included in the source for the article (package.dynamic_json.spec and package.dynamic_json.body) to try the following example.
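The package body is too long to reproduce here, but the following minimal sketch shows the shape of a pipelined unpivot over a json document. It simply walks the keys of a json_object_t rather than using a data guide, and the package, type and function names are illustrative; the actual dynamic_json package in the repository differs:
create or replace package json_unpivot_sketch as
type unpivot_rec_t is record (
column#name varchar2(128)
, column#value varchar2(4000)
);
type unpivot_tab_t is table of unpivot_rec_t;
function unpivot_json_row(p_jdoc in varchar2)
return unpivot_tab_t pipelined;
end json_unpivot_sketch;
/
create or replace package body json_unpivot_sketch as
function unpivot_json_row(p_jdoc in varchar2)
return unpivot_tab_t pipelined
is
l_obj json_object_t := json_object_t.parse(p_jdoc);
l_keys json_key_list := l_obj.get_keys;
l_elem json_element_t;
l_rec unpivot_rec_t;
begin
--emit one (column#name, column#value) row per key in the document
for i in 1..l_keys.count loop
l_elem := l_obj.get(l_keys(i));
l_rec.column#name := l_keys(i);
--to_string() serializes the element, so string values keep their
--surrounding double quotes in this simple sketch
l_rec.column#value := l_elem.to_string();
pipe row (l_rec);
end loop;
return;
end unpivot_json_row;
end json_unpivot_sketch;
/
The queries below invoke the repository version (dynamic_json.unpivot_json_row) as a laterally correlated table function; the sketch above would be called the same way.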
prompt 5.1-compare-using-pipelined-unpivot.sql
prompt without using table macros we can use a pipelined function to unpivot the results
with unpivot_source as (
select 'source' as row_source, s.product_id, s.code, u.*
from
(
select product_id, code, json_object(*) as jdoc
from products_source
) s,
dynamic_json.unpivot_json_row(s.jdoc) u
), unpivot_target as (
select 'target' as row_source, s.product_id, s.code, u.*
from
(
select product_id, code, json_object(*) as jdoc
from products_target
) s,
dynamic_json.unpivot_json_row(s.jdoc) u
)
select
coalesce(s.row_source, t.row_source) as row_source
, coalesce(s.product_id, t.product_id) as product_id
, coalesce(s.code, t.code) as code
, coalesce(s.column#name, t.column#name) as column#name
, coalesce(s.column#value, t.column#value) as column#value
from
unpivot_source s
full outer join unpivot_target t
on s.product_id = t.product_id
and s.code = t.code
and s.column#name = t.column#name
and decode(s.column#value, t.column#value, 1, 0) = 1
where
s.product_id is null
or t.product_id is null
order by product_id, code, column#name, row_source
/
This approach can also be combined with the macro approach, but it’s not as efficient for this case of comparing row and column differences.
It is still good to know that pipelined functions can be combined with macros, because we may encounter cases where some of the program logic cannot be migrated to a macro.
prompt 5.2-compare-macro-and-pipelined-unpivot.sql
prompt macros can also be combined with pipelined functions when the pipelined logic cannot be converted to a macro
with unpivot_source as (
select s.product_id, s.code, u.*
from
(
select product_id, code, json_object(*) as jdoc
from products_source
) s,
dynamic_json.unpivot_json_row(s.jdoc) u
), unpivot_target as (
select s.product_id, s.code, u.*
from
(
select product_id, code, json_object(*) as jdoc
from products_target
) s,
dynamic_json.unpivot_json_row(s.jdoc) u
)
select
row_source
, product_id
, code
, column#name
, column#value
from
find_row_differences(unpivot_source, unpivot_target, columns(product_id, code, column#name))
order by product_id, code, column#name, row_source
/
The main advantage of this pipelined approach is that it will work with older versions of Oracle. The main disadvantage is the cost of calling a PL/SQL based table function from SQL, even with the pipelining optimization.
The real advantage of the macro approach is that everything is actually running in pure SQL with no context switching. The sql macro function is only used to help the SQL engine come up with the final query during parsing. This advantage is so significant that any use case for PL/SQL embedded in SQL should be reexamined for the possibility of using macros.
Best Regards,
Anthony Harper