declare
  l_row_differences boolean := true;
  l_next_step varchar2(100);
begin
  if l_row_differences then
    l_next_step := 'unpivot to show column differences';
    dbms_output.put_line(l_next_step);
  end if;
end;
- Getting Started
- Using Unpivot To Find Column Differences
- Use Row Compare Macro with Unpivoted Row Data
- Convert Datatypes Using A Macro
- Creating The Dynamic Unpivot Macro
- Combining Macro Calls: The Problem
- The Column Differences Macro
- A Pipelined Table Function Alternative For Unpivoting
In Table Data Comparison – Row Differences we looked at comparing sets of rows from two sources to see which rows did not match. This is a very useful technique if you only want to know which rows are different, but it doesn’t identify which columns in these rows are different.
In this article we will look at approaches to determine the exact column differences between rows that are different. This query pattern requires a lot of manual SQL that is much easier to implement with SQL table macros. Defining these macros is a great workout in using the polymorphic table function types to access column information from query expressions dynamically.
Getting Started
The source code for all articles here on practicalplsql.org is available in the practicalplsql public repository on github in the scripts/articles directory organized by concept and article. You can use create.user.practicalplsql.sql to create the practicalplsql schema. Sample data for articles can be found in the scripts/examples directory organized by example name.
Scripts for this post are located in column-differences. I will be building on the sql queries, functions and macros from row-differences and dynamic-unpivot-dataguide; they have been included in the source for this post with some modifications.
For the examples I will be using a simple table populated with products, replicated to two tables with some differences. This example is located at examples/table-data-compare. The example can be created by running the create_example.sql script. Larger datasets to compare can be found in the external tables at examples/geodata-ext-tables.
All scripts should be executed with serveroutput on; this setting is omitted from the listings for clarity. Some queries and code blocks build on previous examples and omit the repeated code for brevity; the source code has no omissions. Finally, when several scripts produce the same correct output, the output is shown only once to avoid repetition.
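For reference, the setting the scripts assume is the standard SQL*Plus/SQLcl toggle:
set serveroutput on size unlimited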
Using Unpivot To Find Column Differences
If we unpivot the columns in each row, keeping row identifiers, the same approaches that we used to determine row differences can be used to identify columns that are different for each row.
The columns to be unpivoted must all have the same datatype and must all be explicitly specified. Use include nulls in the unpivot so that columns with null values show up in the final results. For simplicity, these first examples unpivot only the varchar2 columns in the test tables.
Once the rows are unpivoted to column value combinations, we can use the full outer join method to compare these decomposed rows by joining on the row identifiers, column names and column values. This example joins the column values using decode to make the join null aware.
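As a quick aside on the decode trick: the = operator never evaluates to true when both sides are null, so matching null column values would otherwise be reported as false differences. Wrapping the comparison in decode makes two nulls compare as equal. A minimal sanity check (my own illustration, not one of the article’s scripts):
select
decode(null, null, 1, 0) as decode_match -- 1: decode treats null/null as a match
, case when null = null then 1 else 0 end as equals_match -- 0: null = null is unknown, never true
from dual
/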
prompt 1.1-query-unpivot-for-column-differences.sql
prompt unpivot to decompose rows and use full outer join comparison with decode
with unpivot_source as (
select product_id, code, column_name, column_value
from
products_source
unpivot include nulls (
column_value for column_name in (
name, description, style)
)
), unpivot_target as (
select product_id, code, column_name, column_value
from
products_target
unpivot include nulls (
column_value for column_name in (
name, description, style)
)
)
select
case when s.product_id is null then 'target' else 'source' end as row_source
, coalesce(s.product_id, t.product_id) as product_id
, coalesce(s.code, t.code) as code
, coalesce(s.column_name, t.column_name) as column_name
, coalesce(s.column_value, t.column_value) as column_value
from
unpivot_source s
full outer join unpivot_target t
on s.product_id = t.product_id
and s.code = t.code
and s.column_name = t.column_name
and decode(s.column_value, t.column_value, 1, 0) = 1
where
s.product_id is null or t.product_id is null
order by product_id, column_name, row_source
/
Now we see only the actual column differences between the rows in each table.
ROW_SOURCE PRODUCT_ID CODE COLUMN_NAME COLUMN_VALUE
---------- ---------- ------ ------------ ------------------------------
source 1 P-ES DESCRIPTION Mt. Everest Summit
target 1 P-ES DESCRIPTION Mount Everest Summit
source 2 P-EB DESCRIPTION Mt. Everest Basecamp
target 2 P-EB DESCRIPTION Mount Everest Basecamp
source 3 P-FD NAME Fujiyama Dawn
target 3 P-FD NAME Fuji Dawn
source 4 P-FS NAME Fujiyama Sunset
target 4 P-FS NAME Fuji Sunset
source 6 PC-ES DESCRIPTION Mt. Everest postcards
target 6 PC-ES DESCRIPTION Mount Everest postcards
source 6 PC-ES STYLE 5x7
target 6 PC-ES STYLE Monochrome
source 8 PC-K2 STYLE Color
target 8 PC-K2 STYLE (null)
source 9 PC-S DESCRIPTION Mount Shasta postcards
source 9 PC-S NAME Shasta Postcards
source 9 PC-S STYLE 5x7
17 rows selected.
We could also use the json_equal approach for the comparison of the decomposed rows.
prompt 1.2-query-unpivot-for-column-differences.sql
prompt unpivot to decompose rows and use full outer join comparison with json_equal
with unpivot_source as (
select product_id, code, column_name, column_value
from
products_source
unpivot include nulls (
column_value for column_name in (
name, description, style)
)
), unpivot_target as (
select product_id, code, column_name, column_value
from
products_target
unpivot include nulls (
column_value for column_name in (
name, description, style)
)
)
select
coalesce(s.row_source, t.row_source) as row_source
, coalesce(s.product_id, t.product_id) as product_id
, coalesce(s.code, t.code) as code
, coalesce(s.column_name, t.column_name) as column_name
, coalesce(s.column_value, t.column_value) as column_value
from
(
select 'source' as row_source, json_object(b.*) as jdoc, b.* from unpivot_source b
) s
full outer join (
select 'target' as row_source, json_object(b.*) as jdoc, b.* from unpivot_target b
) t
on s.product_id = t.product_id
and s.code = t.code
and s.column_name = t.column_name
and json_equal(s.jdoc, t.jdoc)
where
s.product_id is null or t.product_id is null
order by product_id, column_name, row_source
/
Use Row Compare Macro with Unpivoted Row Data
We have already seen how a macro can hide the implementation of a row comparison; the same approach can also simplify column comparisons. Let’s create a row compare macro and see how we can use it to help compare columns.
prompt 1.3-macro-row-differences.sql
prompt create a macro to find row differences using full outer join and json_equal
prompt don't include the json rows in the results
create or replace function find_row_differences(
source_data in dbms_tf.table_t,
target_data in dbms_tf.table_t,
id_columns in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_sql varchar2(4000);
l_id_columns varchar2(4000);
l_coalesce varchar2(4000);
l_all_columns varchar2(4000);
l_join_cols varchar2(4000);
l_column varchar2(128);
c_lf constant varchar2(1) := chr(10);
begin
select listagg(column_value,',')
into l_id_columns
from table(id_columns);
for c in 1..source_data.column.count loop
l_column := source_data.column(c).description.name;
l_coalesce := l_coalesce
|| ', coalesce(s.' || l_column
|| ', t.' || l_column
|| ') as ' || l_column || c_lf;
l_all_columns := l_all_columns || ', ' || l_column || c_lf;
end loop;
for i in 1..id_columns.count loop
l_join_cols := l_join_cols
|| case when l_join_cols is not null then ' and ' end
|| 's.' || id_columns(i) || ' = t.' || id_columns(i);
end loop;
l_sql := q'!
select
coalesce(s.row_source, t.row_source) as row_source
##COALESCE_COLUMNS##
from
(
select
'source' as row_source
##ALL_COLUMNS##
, json_object(*) as jdoc
from source_data
) s full outer join
(
select
'target' as row_source
##ALL_COLUMNS##
, json_object(*) as jdoc
from target_data
) t
on ##JOIN_ID_COLUMNS##
and json_equal(s.jdoc, t.jdoc)
where
s.##FIRST_ID_COLUMN## is null
or t.##FIRST_ID_COLUMN## is null
order by ##ID_COLUMNS##, row_source
!';
l_sql := replace(l_sql,'##ALL_COLUMNS##', l_all_columns);
l_sql := replace(l_sql,'##ID_COLUMNS##', l_id_columns);
l_sql := replace(l_sql,'##COALESCE_COLUMNS##', l_coalesce);
l_sql := replace(l_sql,'##JOIN_ID_COLUMNS##', l_join_cols);
l_sql := replace(l_sql,'##FIRST_ID_COLUMN##', id_columns(1));
return l_sql;
end find_row_differences;
/
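As a debugging aid, dbms_utility.expand_sql_text returns a statement with views and sql macros expanded, so you can inspect the final query text a macro produces. A quick sketch (the expanded text is verbose and its formatting varies by version):
declare
l_expanded clob;
begin
dbms_utility.expand_sql_text(
input_sql_text => 'select * from find_row_differences(products_source, products_target, columns(product_id, code))',
output_sql_text => l_expanded);
dbms_output.put_line(l_expanded);
end;
/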
Using the macro makes it clear that a column-level comparison decomposes the rows with an unpivot and then applies our standard row comparison logic.
prompt 1.4-query-unpivot-with-row-compare-macro.sql
prompt unpivot to decompose rows and then use row compare macro
with unpivot_source as (
select product_id, code, column_name, column_value
from
products_source
unpivot include nulls (
column_value for column_name in (
name, description, style)
)
), unpivot_target as (
select product_id, code, column_name, column_value
from
products_target
unpivot include nulls (
column_value for column_name in (
name, description, style)
)
)
select
row_source
, product_id
, code
, column_name
, column_value
from find_row_differences(
unpivot_source
, unpivot_target
, columns(product_id, code, column_name))
/
Even though the final comparison logic is now wrapped in the row compare macro, there is still a bit of code required to perform the initial unpivot. As the number of columns to be unpivoted increases, the amount of custom code required goes up for both unpivot expressions.
If we include columns that have varying datatypes, the unpivot would become more complex because all columns to be unpivoted need to be converted to the same datatype.
prompt 1.5-query-unpivot-different-datatypes-with-row-compare-macro.sql
prompt datatypes need to be the same for unpivot
prompt unpivot to decompose rows and then use row compare macro
with unpivot_source as (
select product_id, code, column_name, column_value
from
(
select
product_id
, code
, name
, description
, style
, to_char(msrp) as msrp --convert numbers explicitly to varchar
from products_source
)
unpivot include nulls (
column_value for column_name in (
name, description, style, msrp)
)
), unpivot_target as (
select product_id, code, column_name, column_value
from
(
select
product_id
, code
, name
, description
, style
, to_char(msrp) as msrp --convert numbers explicitly to varchar
from products_target
)
unpivot include nulls (
column_value for column_name in (
name, description, style, msrp)
)
)
select
row_source
, product_id
, code
, column_name
, column_value
from find_row_differences(
unpivot_source
, unpivot_target
, columns(product_id, code, column_name))
/
Coercing the datatypes prior to unpivoting allows the column differences to be found for mixed datatype columns. The resulting values will have to be in the same common datatype used for the unpivot.
...
source 3 P-FD MSRP 20
target 3 P-FD MSRP 19
source 3 P-FD NAME Fujiyama Dawn
target 3 P-FD NAME Fuji Dawn
...
Convert Datatypes Using A Macro
To get to a final macro that simplifies comparing column differences, we need to create an unpivot macro. To handle mixed datatypes we will also need a way to easily consolidate datatypes for the unpivot. The macro should be able to exclude columns that don’t need to be converted for the unpivot.
Instead of converting datatypes and unpivoting in a single macro, we can separate these functionalities into two macros. The first will be responsible for coercing datatypes to varchar. The second will just perform an unpivot.
The simplest form of converting column datatypes would be to simply use to_char(col) on all columns to be converted.
prompt 2.1-macro-convert-column-datatypes.sql
prompt create a macro to coerce datatypes for unpivot
create or replace function convert_columns_to_varchar(
data in dbms_tf.table_t,
exclude_cols in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_column varchar2(128);
l_all_columns varchar2(4000);
l_sql varchar2(4000);
begin
for i in 1..data.column.count loop
l_column := data.column(i).description.name;
if l_column not member of exclude_cols then
l_all_columns := l_all_columns || ','
|| 'to_char(' || l_column || ') as ' || l_column;
else
l_all_columns := l_all_columns || ','
|| l_column;
end if;
end loop;
l_all_columns := trim(leading ',' from l_all_columns);
l_sql := q'[
select ##ALL_COLUMNS## from data
]';
l_sql := replace(l_sql, '##ALL_COLUMNS##', l_all_columns);
return l_sql;
end convert_columns_to_varchar;
/
Because we used to_char for all conversions, this approach relies on the session’s nls settings for implicitly converting dates and timestamps. If the session had the nls_date_format set to a format that did not include times, all date columns would be compared without the time component.
alter session set nls_date_format = 'YYYY-MM-DD';
prompt Test the macro with different data types
prompt using a simple to_char for conversion makes datetime conversions dependent on nls settings
with base (id, str, num, dt) as (
select
level
, chr(level + 64)
, level + 64
, sysdate + level
from dual
connect by level <= 5
)
select id, str, num, dt
from convert_columns_to_varchar(base, columns(id))
/
ID STR NUM DT
--- --- --- ----------
1 A 65 2024-11-30
2 B 66 2024-12-01
3 C 67 2024-12-02
4 D 68 2024-12-03
5 E 69 2024-12-04
This approach may be too simple if we are using it to find exact differences between columns. Using the type information from the dbms_tf.table_t column collection, we can create more precise conversions for the various datatypes, based on constants for the format models.
Experimenting with some different types shows that the type values are not quite the same as the constants in the dbms_types package. Since the column_type_name function is part of dbms_tf, it looks like the most reliable way to interpret the column types reported by the column descriptors.
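To see exactly what column_type_name reports for a given source, a small throwaway diagnostic macro (my own sketch, not part of the article’s scripts) can echo each column name with its interpreted type:
create or replace function show_column_types(
data in dbms_tf.table_t
) return varchar2
sql_macro(table)
is
l_sql varchar2(4000);
begin
--build one row of literals per column: the name and its interpreted type
--note: description.name may include double quotes around the identifier
for i in 1..data.column.count loop
l_sql := l_sql
|| case when l_sql is not null then ' union all ' end
|| 'select ''' || data.column(i).description.name
|| ''' as column_name, '''
|| dbms_tf.column_type_name(data.column(i).description)
|| ''' as type_name from dual';
end loop;
return l_sql;
end show_column_types;
/
select * from show_column_types(products_source)
/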
prompt 2.2-macro-convert-column-datatypes-based-on-type.sql
prompt create a macro to coerce datatypes for unpivot
prompt customize conversion for source datatypes
create or replace function convert_columns_to_varchar(
data in dbms_tf.table_t,
exclude_cols in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_column varchar2(128);
l_all_columns varchar2(4000);
l_sql varchar2(4000);
l_type_name varchar2(100);
c_fmt_number constant varchar2(100) := '''tm9''';
c_fmt_date constant varchar2(100) := '''yyyy-mm-dd hh24:mi:ss''';
c_fmt_ts constant varchar2(50) := '''yyyy-mm-dd hh24:mi:ss.ff9''';
c_fmt_tstz constant varchar2(50) := '''yyyy-mm-dd hh24:mi:ss.ff9 tzh:tzm''';
c_fmt_tsltz constant varchar2(50) := '''yyyy-mm-dd hh24:mi:ss.ff9 tzr''';
begin
for i in 1..data.column.count loop
l_column := data.column(i).description.name;
if l_column not member of exclude_cols then
l_type_name := dbms_tf.column_type_name(data.column(i).description);
l_all_columns := l_all_columns || ','
|| case
when l_type_name = 'NUMBER'
then 'to_char(' || l_column || ',' || c_fmt_number || ')'
when l_type_name in ('DATE','TYPE-CODE: 13')
then 'to_char(' || l_column || ',' || c_fmt_date || ')'
when l_type_name in ('TIMESTAMP')
then 'to_char(' || l_column || ',' || c_fmt_ts || ')'
when l_type_name in ('TIMESTAMP WITH TIMEZONE', 'TYPE-CODE: 188')
then 'to_char(' || l_column || ',' || c_fmt_tstz || ')'
when l_type_name in ('TIMESTAMP WITH LOCAL TIMEZONE')
then 'to_char(' || l_column || ',' || c_fmt_tsltz || ')'
when l_type_name in ('INTERVAL DAY TO SECOND','INTERVAL YEAR TO MONTH')
then 'to_char(' || l_column || ')'
when l_type_name in ('CLOB')
then 'to_char(' || l_column || ')'
else l_column
end
|| ' as ' || l_column;
else
l_all_columns := l_all_columns || ',' || l_column;
end if;
end loop;
l_all_columns := trim(leading ',' from l_all_columns);
l_sql := q'[
select ##ALL_COLS## from data
]';
l_sql := replace(l_sql, '##ALL_COLS##', l_all_columns);
return l_sql;
end convert_columns_to_varchar;
/
Testing this macro, we see that the detail of date values is no longer dependent on nls settings:
prompt Test the macro with different data types
prompt use explicit datetime formats to preserve detail
alter session set nls_date_format = 'YYYY-MM-DD';
with base (id, str, num, dt) as (
select
level
, chr(level + 64)
, level + 64
, sysdate + level + power(level,level)/(24*60*60)
from dual
connect by level <= 5
)
select id, str, num, dt
from convert_columns_to_varchar(base, columns(id))
/
ID STR NUM DT
--- --- --- --------------------
1 A 65 2024-11-30 15:01:36
2 B 66 2024-12-01 15:01:39
3 C 67 2024-12-02 15:02:02
4 D 68 2024-12-03 15:05:51
5 E 69 2024-12-04 15:53:40
Creating The Dynamic Unpivot Macro
We can build a macro to dynamically unpivot columns using the column collection of the dbms_tf.table_t type that is used to pass data sources to table macros. A second parameter of type dbms_tf.columns_t can be used to exclude columns from the unpivot because we don’t want to unpivot the columns used to identify rows.
Because the data source to unpivot may actually include columns called column_name and column_value, I will use column#name and column#value to represent the extra columns created by the unpivot.
prompt 3.1-simple-unpivot-macro.sql
prompt create a simple macro to dynamically unpivot columns
prompt don't convert datatypes before unpivoting
create or replace function dynamic_unpivot_columns(
data in dbms_tf.table_t,
exclude_cols in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_column varchar2(128);
l_sql varchar2(4000);
l_unpivot_columns varchar2(4000);
l_exclude varchar2(4000);
begin
select listagg(column_value, ',') into l_exclude
from table(exclude_cols);
for i in 1..data.column.count loop
l_column := data.column(i).description.name;
if l_column not member of exclude_cols then
l_unpivot_columns := l_unpivot_columns || ',' || l_column;
end if;
end loop;
l_unpivot_columns := trim(leading ',' from l_unpivot_columns);
l_sql := q'[
select ##EXCLUDE_COLS##, column#name, column#value
from
data
unpivot include nulls(column#value for column#name in (##UNPIVOT_COLS##))
]';
l_sql := replace(l_sql, '##EXCLUDE_COLS##', l_exclude);
l_sql := replace(l_sql, '##UNPIVOT_COLS##', l_unpivot_columns);
return l_sql;
end dynamic_unpivot_columns;
/
Testing the macro, we see that any column not in the exclude_cols collection has been unpivoted correctly.
with base as (
select product_id, code, name, description
from products_source
order by product_id
fetch first 2 rows only
)
select *
from dynamic_unpivot_columns(base, columns(product_id, code))
/
PRODUCT_ID CODE COLUMN#NAME COLUMN#VALUE
---------- ------ --------------- --------------------
1 P-ES NAME Everest Summit
1 P-ES DESCRIPTION Mt. Everest Summit
2 P-EB NAME Everest Basecamp
2 P-EB DESCRIPTION Mt. Everest Basecamp
Combining Macro Calls: The Problem
At this point we have created table macros for the three pieces of functionality needed for a dynamic row and column comparison: find_row_differences, convert_columns_to_varchar and dynamic_unpivot_columns. These modular elements need to be combined for the final solution, but this becomes a bit problematic.
The primary problem is that table macro calls cannot actually be nested when they are invoked. A macro has to be called within a FROM clause; it cannot be passed as an argument to another macro.
Polymorphic table functions can be nested by wrapping the nested ptf in a CTE expression. Table macros are a form of polymorphic table function, but they are not supported inside a with query clause. This restriction means that we have no option to nest calls to sql table macros.
prompt 3.2-test-combining-macro.sql
prompt there is no syntax option to combine table macros
prompt restriction of macros inside of CTE expressions prevents simple nested calls
with base as (
select product_id, code, name, msrp
from products_source
order by product_id
fetch first 2 rows only
), converted_datatypes as (
select product_id, code, name, msrp
from convert_columns_to_varchar(base, columns(product_id, code))
)
select *
from dynamic_unpivot_columns(converted_datatypes, columns(product_id, code))
/
--ORA-64630: unsupported use of SQL macro: use of SQL macro inside WITH clause is not supported
prompt table macros can only accept a table name or CTE expression name
prompt nesting the macro calls will not work
with base as (
select product_id, code, name, msrp
from products_source
order by product_id
fetch first 2 rows only
)
select *
from dynamic_unpivot_columns(
convert_columns_to_varchar(base, columns(product_id, code))
, columns(product_id, code))
/
--ORA-64629: table SQL macro can only appear in FROM clause of a SQL statement
The only solution to this is to nest the macro calls within the actual macros. In the unpivot macro this works because we want the dynamic unpivot to always flatten the data types of columns to be unpivoted.
Parameter passing for the data source using a dbms_tf.table_t type is simple: just include the parameter name in the generated sql, and it will be replaced with the actual data source when the final sql is generated. The correct replacement also occurs for the nested macro call.
Passing the columns collection for columns to exclude is harder. Instead of just using the parameter name, we have to convert it back to the sql syntax for passing these collections of columns: columns(n, m).
prompt 3.3-nested-unpivot-macro.sql
prompt create a nested macro to dynamically unpivot columns with different datatypes
prompt nest the call to convert datatypes to varchar inside the macro
prompt cannot pass dbms_tf.columns_t directly, convert back to columns(m, n) for the nested call
create or replace function dynamic_unpivot_columns(
data in dbms_tf.table_t,
exclude_cols in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_column varchar2(128);
l_sql varchar2(4000);
l_unpivot_columns varchar2(4000);
l_exclude varchar2(4000);
begin
select listagg(column_value, ',') into l_exclude
from table(exclude_cols);
for i in 1..data.column.count loop
l_column := data.column(i).description.name;
if l_column not member of exclude_cols then
l_unpivot_columns := l_unpivot_columns || ',' || l_column;
end if;
end loop;
l_unpivot_columns := trim(leading ',' from l_unpivot_columns);
l_sql := q'[
select ##EXCLUDE_COLS##, column#name, column#value
from
( select * from convert_columns_to_varchar(data, columns(##EXCLUDE_COLS##)))
unpivot include nulls(column#value for column#name in (##UNPIVOT_COLS##))
]';
l_sql := replace(l_sql, '##EXCLUDE_COLS##', l_exclude);
l_sql := replace(l_sql, '##UNPIVOT_COLS##', l_unpivot_columns);
return l_sql;
end dynamic_unpivot_columns;
/
Testing the new form of the dynamic_unpivot_columns macro shows that the nested call to the convert_columns_to_varchar macro is working correctly.
with base as (
select product_id, code, name, description, msrp
from products_source
order by product_id
fetch first 2 rows only
)
select *
from dynamic_unpivot_columns(base, columns(product_id, code))
/
PRODUCT_ID CODE COLUMN#NAME COLUMN#VALUE
---------- ------ --------------- --------------------
1 P-ES NAME Everest Summit
1 P-ES DESCRIPTION Mt. Everest Summit
1 P-ES MSRP 30
2 P-EB NAME Everest Basecamp
2 P-EB DESCRIPTION Mt. Everest Basecamp
2 P-EB MSRP 30
The Column Differences Macro
Since it worked to nest the convert_columns_to_varchar macro inside the dynamic_unpivot_columns macro, can we reuse the find_row_differences macro? The first nesting only worked because we could write the inner macro call as select * from convert_columns_to_varchar(data, columns(…)). To reuse the find_row_differences macro we would have to pass the results of dynamic_unpivot_columns for each data source to find_row_differences, and even inside a macro we still cannot call two nested macros.
prompt 4.1-macro-row-column-differences.sql
prompt create a simple macro to compare row and column differences
prompt no syntax options for passing results of one macro call to another macro
prompt cannot reuse find_row_differences macro
create or replace function find_row_column_differences(
source_data in dbms_tf.table_t
, target_data in dbms_tf.table_t
, id_columns in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_sql varchar2(4000);
l_id_columns varchar2(4000);
begin
select listagg(column_value,',')
into l_id_columns
from table(id_columns);
l_sql := q'!
select *
from find_row_differences(
dynamic_unpivot_columns(source_data, columns(##ID_COLUMNS##))
, dynamic_unpivot_columns(target_data, columns(##ID_COLUMNS##))
, columns(##ID_COLUMNS##))
!';
l_sql := replace(l_sql,'##ID_COLUMNS##', l_id_columns);
return l_sql;
end find_row_column_differences;
/
prompt test the macro to show row and column differences
select * from find_row_column_differences(products_source, products_target, columns(product_id, code))
/
--SQL Error: ORA-64629: table SQL macro can only appear in FROM clause of a SQL statement
The only way around this is to reproduce the logic for find_row_differences so that there is only a single layer of nesting within the macro.
prompt 4.2-nested-macro-row-column-differences.sql
prompt create a nested macro to compare row and column differences
prompt use nested invocation of dynamic_unpivot_columns macro
prompt cannot reuse find_row_differences macro
prompt the only option is to reproduce the comparison logic from find_row_differences
create or replace function find_row_column_differences(
source_data in dbms_tf.table_t
, target_data in dbms_tf.table_t
, id_columns in dbms_tf.columns_t
) return varchar2
sql_macro(table)
is
l_sql varchar2(4000);
l_id_columns varchar2(4000);
l_coalesce_id_columns varchar2(4000);
l_join_id_columns varchar2(4000);
l_column varchar2(128);
begin
select listagg(column_value,',')
into l_id_columns
from table(id_columns);
for i in 1..id_columns.count loop
l_column := id_columns(i);
l_coalesce_id_columns := l_coalesce_id_columns
|| ', coalesce(s.' || l_column
|| ', t.' || l_column
|| ') as ' || l_column;
l_join_id_columns := l_join_id_columns
|| case when l_join_id_columns is not null then ' and ' end
|| 's.' || l_column || ' = t.' || l_column;
end loop;
l_sql := q'!
select
coalesce(s.row#source, t.row#source) as row#source
##COALESCE_ID_COLUMNS##
, coalesce(s.column#name, t.column#name) as column#name
, coalesce(s.column#value, t.column#value) as column#value
from
(
select
'source' as row#source
, ##ID_COLUMNS##
, column#name
, column#value
, json_object(column#value) as json#column
from dynamic_unpivot_columns(source_data, columns(##ID_COLUMNS##))
) s full outer join
(
select
'target' as row#source
, ##ID_COLUMNS##
, column#name
, column#value
, json_object(column#value) as json#column
from dynamic_unpivot_columns(target_data, columns(##ID_COLUMNS##))
) t
on ##JOIN_COLUMNS##
and s.column#name = t.column#name
and json_equal(s.json#column, t.json#column)
where
s.##FIRST_ID_COLUMN## is null
or t.##FIRST_ID_COLUMN## is null
order by ##ID_COLUMNS##, column#name, row#source
!';
l_sql := replace(l_sql,'##ID_COLUMNS##', l_id_columns);
l_sql := replace(l_sql,'##COALESCE_ID_COLUMNS##', l_coalesce_id_columns);
l_sql := replace(l_sql,'##JOIN_COLUMNS##', l_join_id_columns);
l_sql := replace(l_sql,'##FIRST_ID_COLUMN##', id_columns(1));
return l_sql;
end find_row_column_differences;
/
Testing this macro yields the same results as the first query in this post, without having to write either the unpivot or the comparison logic.
select * from find_row_column_differences(products_source, products_target, columns(product_id, code))
/
ROW#SOURCE PRODUCT_ID CODE COLUMN#NAME COLUMN#VALUE
---------- ---------- ------ ------------ ------------------------------
source 1 P-ES DESCRIPTION Mt. Everest Summit
target 1 P-ES DESCRIPTION Mount Everest Summit
source 2 P-EB DESCRIPTION Mt. Everest Basecamp
target 2 P-EB DESCRIPTION Mount Everest Basecamp
source 3 P-FD MSRP 20
target 3 P-FD MSRP 19
source 3 P-FD NAME Fujiyama Dawn
target 3 P-FD NAME Fuji Dawn
source 4 P-FS NAME Fujiyama Sunset
target 4 P-FS NAME Fuji Sunset
source 6 PC-ES DESCRIPTION Mt. Everest postcards
target 6 PC-ES DESCRIPTION Mount Everest postcards
source 6 PC-ES STYLE 5x7
target 6 PC-ES STYLE Monochrome
source 8 PC-K2 STYLE Color
target 8 PC-K2 STYLE (null)
source 9 PC-S DESCRIPTION Mount Shasta postcards
source 9 PC-S MSRP 9
source 9 PC-S NAME Shasta Postcards
source 9 PC-S STYLE 5x7
Table macros make it very simple to find row or column differences without writing any specific SQL code at all. The restriction on using sql macros within a CTE expression is a bit problematic, as is the inability to nest macro calls directly. Since macros are actually polymorphic table functions, it seems likely that at least one of these restrictions will eventually be removed.
The macro approach is not only efficient in terms of writing the code; since everything is executed in SQL, it is just as fast as writing the entire query explicitly.
A Pipelined Table Function Alternative For Unpivoting
The first draft of this article started with the idea that we could convert a row to json and dynamically unpivot it using a json_dataguide inside a pipelined function. This approach sidesteps the complexity of dealing with the polymorphic table function types for dynamic table and column information by converting to json and using the dynamic json information that is already available.
Compile the dynamic_json package included in the source for the article (package.dynamic_json.spec and package.dynamic_json.body) to try the following example.
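The package body is too long to reproduce here, but the following minimal sketch shows the shape of a pipelined unpivot over a json document. It simply walks the keys of a json_object_t rather than using a data guide, and the package, type and function names are illustrative; the actual dynamic_json package in the repository differs:
create or replace package json_unpivot_sketch as
type unpivot_rec_t is record (
column#name varchar2(128)
, column#value varchar2(4000)
);
type unpivot_tab_t is table of unpivot_rec_t;
function unpivot_json_row(p_jdoc in varchar2)
return unpivot_tab_t pipelined;
end json_unpivot_sketch;
/
create or replace package body json_unpivot_sketch as
function unpivot_json_row(p_jdoc in varchar2)
return unpivot_tab_t pipelined
is
l_obj json_object_t := json_object_t.parse(p_jdoc);
l_keys json_key_list := l_obj.get_keys;
l_elem json_element_t;
l_rec unpivot_rec_t;
begin
--emit one (column#name, column#value) row per key in the document
for i in 1..l_keys.count loop
l_elem := l_obj.get(l_keys(i));
l_rec.column#name := l_keys(i);
--to_string() serializes the element, so string values keep their
--surrounding double quotes in this simple sketch
l_rec.column#value := l_elem.to_string();
pipe row (l_rec);
end loop;
return;
end unpivot_json_row;
end json_unpivot_sketch;
/
The queries below invoke the repository version (dynamic_json.unpivot_json_row) as a laterally correlated table function; the sketch above would be called the same way.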
prompt 5.1-compare-using-pipelined-unpivot.sql
prompt without using table macros we can use a pipelined function to unpivot the results
with unpivot_source as (
select 'source' as row_source, s.product_id, s.code, u.*
from
(
select product_id, code, json_object(*) as jdoc
from products_source
) s,
dynamic_json.unpivot_json_row(s.jdoc) u
), unpivot_target as (
select 'target' as row_source, s.product_id, s.code, u.*
from
(
select product_id, code, json_object(*) as jdoc
from products_target
) s,
dynamic_json.unpivot_json_row(s.jdoc) u
)
select
coalesce(s.row_source, t.row_source) as row_source
, coalesce(s.product_id, t.product_id) as product_id
, coalesce(s.code, t.code) as code
, coalesce(s.column#name, t.column#name) as column#name
, coalesce(s.column#value, t.column#value) as column#value
from
unpivot_source s
full outer join unpivot_target t
on s.product_id = t.product_id
and s.code = t.code
and s.column#name = t.column#name
and decode(s.column#value, t.column#value, 1, 0) = 1
where
s.product_id is null
or t.product_id is null
order by product_id, code, column#name, row_source
/
This approach can also be combined with the macro approach, but it’s not as efficient for this case of comparing row and column differences.
It is still good to know that pipelined functions can be combined with macros, because we may encounter cases where some of the program logic cannot be migrated to a macro.
prompt 5.2-compare-macro-and-pipelined-unpivot.sql
prompt macros can also be combined with pipelined functions when the pipelined logic cannot be converted to a macro
with unpivot_source as (
select s.product_id, s.code, u.*
from
(
select product_id, code, json_object(*) as jdoc
from products_source
) s,
dynamic_json.unpivot_json_row(s.jdoc) u
), unpivot_target as (
select s.product_id, s.code, u.*
from
(
select product_id, code, json_object(*) as jdoc
from products_target
) s,
dynamic_json.unpivot_json_row(s.jdoc) u
)
select
row_source
, product_id
, code
, column#name
, column#value
from
find_row_differences(unpivot_source, unpivot_target, columns(product_id, code, column#name))
order by product_id, code, column#name, row_source
/
The main advantage of this pipelined approach is that it will work with older versions of Oracle. The main disadvantage is the cost of calling a PL/SQL based table function from SQL, even with the pipelining optimization.
The real advantage of the macro approach is that everything is actually running in pure SQL with no context switching. The sql macro function is only used to help the SQL engine come up with the final query during parsing. This advantage is so significant that any use case for PL/SQL embedded in SQL should be reexamined for the possibility of using macros.
Best Regards,
Anthony Harper