Channel: Philipp Salvisberg's Blog

Using UTL_XML.PARSEQUERY for SQL Dependency Analysis


Last week I gave a talk at Oracle’s OpenWorld 2011 titled Modern PL/SQL Code Checking and Dependency Analysis.

The problem I described in chapter 4 was to find all view columns using the column UNIT_COST of the table COSTS in the SH schema. Other usages of this column (e.g. in WHERE or ORDER BY clauses) have to be ignored. Solving this problem within the Oracle Database Server 11.2 requires a parser (at least I’m not aware of another solution). Even a DBA_DEPENDENCY_COLUMNS view as described in Rob van Wijk’s post is not enough to solve this problem.
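To illustrate why the data dictionary alone falls short, here is a minimal sketch of an object-level lookup. It assumes the standard ALL_DEPENDENCIES view and a session that can see the SH objects. It only tells which views reference the table COSTS, not which of their columns are derived from UNIT_COST.

```sql
-- Object-level dependencies only: which views reference SH.COSTS?
-- No column-level information is available from this view.
SELECT d.owner, d.name, d.type
  FROM all_dependencies d
 WHERE d.referenced_owner = 'SH'
   AND d.referenced_name  = 'COSTS'
   AND d.referenced_type  = 'TABLE'
   AND d.type             = 'VIEW'
 ORDER BY d.owner, d.name;
```

A view found this way may use UNIT_COST in its select list, only in a WHERE or ORDER BY clause, or not at all. Telling these cases apart is what requires parsing the view's query text.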

However, in this particular case no custom or 3rd party parser is necessary. Oracle provides a procedure named PARSEQUERY in the PL/SQL package UTL_XML which is in fact well suited to solve this problem, as I will show later. First, I’d like to explain which columns should be found by a dependency analysis procedure, based on some sample views.

Oracle’s sales history demo schema SH provides a view named PROFITS, which is defined as follows:

CREATE OR REPLACE VIEW PROFITS AS
SELECT s.channel_id,
       s.cust_id,
       s.prod_id,
       s.promo_id,
       s.time_id,
       c.unit_cost,
       c.unit_price,
       s.amount_sold,
       s.quantity_sold,
       c.unit_cost * s.quantity_sold TOTAL_COST
  FROM costs c, sales s
 WHERE c.prod_id = s.prod_id
   AND c.time_id = s.time_id
   AND c.channel_id = s.channel_id
   AND c.promo_id = s.promo_id;

The columns using COSTS.UNIT_COST are UNIT_COST and TOTAL_COST.

The following view uses the column TOTAL_COST in GROSS_MARGIN (line 14) and GROSS_MARGIN_PERCENT (lines 15 and 16). The usage is not evident at first glance since it is based on the column GROSS_MARGIN (line 4) of the named query GM and the column COST (line 8) in GM’s subquery. Dependencies of this kind need to be identified.

CREATE OR REPLACE VIEW GROSS_MARGINS AS
WITH 
   gm AS (
      SELECT time_id, revenue, revenue - cost AS gross_margin
        FROM (
           SELECT time_id,
                  unit_price * quantity_sold AS revenue,
                  total_cost AS cost
             FROM profits
        )
   )
SELECT t.fiscal_year,
       SUM(revenue) AS revenue,
       SUM(gross_margin) AS gross_margin,
       round(100 * SUM(gross_margin) / SUM(revenue), 2) 
          AS gross_margin_percent
  FROM gm
 INNER JOIN times t ON t.time_id = gm.time_id
 GROUP BY t.fiscal_year
 ORDER BY t.fiscal_year;

The next view does not present the data of COSTS.UNIT_COST as a column, even though the view depends on the table COSTS:

CREATE OR REPLACE VIEW REVENUES AS
SELECT fiscal_year, revenue
  FROM gross_margins;

The last view uses COSTS.UNIT_COST, but not as part of a column expression, and therefore does not have to be reported. The usage in the ORDER BY clause is considered safe.

CREATE OR REPLACE VIEW SALES_ORDERED_BY_GM AS
SELECT channel_id,
       cust_id,
       prod_id,
       promo_id,
       time_id,
       amount_sold,
       quantity_sold
  FROM profits
 ORDER BY (unit_price - unit_cost) DESC;

So, the following result of the dependency analysis is expected:

SCHEMA  VIEW           COLUMN
SH      PROFITS        UNIT_COST
SH      PROFITS        TOTAL_COST
SH      GROSS_MARGINS  GROSS_MARGIN
SH      GROSS_MARGINS  GROSS_MARGIN_PERCENT

Exactly this result is produced by the following query:

SELECT *
  FROM TABLE(coldep_pkg.get_dep('sh', 'costs', 'unit_cost'));

Below I list all the code I’ve written to create this result. Please note that this is just proof-of-concept code to show how UTL_XML.PARSEQUERY can be used for SQL dependency analysis in conjunction with Oracle dictionary views; it is not a complete implementation. For example, wildcards (*) are not handled, which may lead to missing dependencies. Additionally, table/view sources are not checked, which may lead to false positives (in case a column name occurs in multiple view/table sources). Please feel free to complete the code. However, an update is highly appreciated ;-)

GRANT EXECUTE ON SYS.UTL_XML TO SH;

CREATE OR REPLACE TYPE "SH"."COLDEP_TYP" AS 
OBJECT (schema_name VARCHAR2(30), 
        view_name varchar2(30), 
        column_name VARCHAR2(30))
/
CREATE OR REPLACE TYPE "SH"."COLDEP_L" IS TABLE OF coldep_typ
/

CREATE OR REPLACE PACKAGE "SH"."COLDEP_PKG" IS
   FUNCTION parse_query(p_query IN VARCHAR2) RETURN xmltype;

   FUNCTION get_dep(p_schema_name IN VARCHAR2,
                    p_object_name IN VARCHAR2,
                    p_column_name IN VARCHAR2) RETURN coldep_l
      PIPELINED;

   FUNCTION process_view(p_schema_name IN VARCHAR2,
                         p_view_name   IN VARCHAR2,
                         p_column_name IN VARCHAR2,
                         p_query       IN CLOB) RETURN coldep_l;
END coldep_pkg;
/
CREATE OR REPLACE PACKAGE BODY "SH"."COLDEP_PKG" IS
   FUNCTION parse_query(p_query IN VARCHAR2) RETURN xmltype IS
      v_clob CLOB;
      v_xml  xmltype;
   BEGIN
      dbms_lob.createtemporary(v_clob, TRUE);
      -- parse query and get XML as CLOB
      sys.utl_xml.parsequery(USER, p_query, v_clob);
      -- create XMLTYPE from CLOB 
      v_xml := xmltype.createxml(v_clob);
      dbms_lob.freetemporary(v_clob);
      RETURN v_xml;
   END parse_query;

   FUNCTION get_dep(p_schema_name IN VARCHAR2,
                    p_object_name IN VARCHAR2,
                    p_column_name IN VARCHAR2) RETURN coldep_l
      PIPELINED IS
   BEGIN
      -- query dictionary dependencies
      FOR v_dep IN (SELECT d.owner AS schema_name,
                           d.name  AS view_name,
                           v.text  AS query_text
                      FROM all_dependencies d
                     INNER JOIN all_views v
                        ON v.owner = d.owner
                           AND v.view_name = d.name
                     WHERE d.referenced_owner = upper(p_schema_name)
                           AND d.referenced_name = upper(p_object_name)
                           AND d.type = 'VIEW')
      LOOP
         -- process every fetched view
         FOR v_views IN (
            SELECT VALUE(pv) coldep
              FROM TABLE(process_view(v_dep.schema_name,
                                      v_dep.view_name,
                                      p_column_name,
                                      v_dep.query_text)) pv)
         LOOP
            -- return column usages in v_dep.view_name
            PIPE ROW(v_views.coldep);
            -- get column usages of views using v_dep.view_name (recursive calls)
            FOR v_recursive IN (
               SELECT VALUE(dep) coldep
                 FROM TABLE(get_dep(v_views.coldep.schema_name,
                                    v_views.coldep.view_name,
                                    v_views.coldep.column_name)) dep)
            LOOP
               -- return column usages of recursive call
               PIPE ROW(v_recursive.coldep);
            END LOOP;
         END LOOP;
      END LOOP;
   END get_dep;

   FUNCTION process_view(p_schema_name IN VARCHAR2,
                         p_view_name   IN VARCHAR2,
                         p_column_name IN VARCHAR2,
                         p_query       IN CLOB) RETURN coldep_l IS
      v_search_l       coldep_l := coldep_l(coldep_typ(NULL,
                                                       NULL,
                                                       p_column_name));
      v_xml            xmltype;
      v_previous_count INTEGER := 0;
      v_coldep_l       coldep_l := coldep_l();
   BEGIN
      -- parse view query
      v_xml := parse_query(p_query);
      -- get inline dependencies from secondary select lists
      -- TODO: handle table/view source and wildcard properly 
      WHILE v_previous_count < v_search_l.count
      LOOP
         v_previous_count := v_search_l.count;
         FOR v_secondary IN (
            SELECT nvl(x.alias_name, x.column_reference) AS alias_name
              FROM (SELECT t.select_list_item,
                           t.alias_name,
                           extractvalue(VALUE(c), 'COLUMN') AS column_reference
                      FROM xmltable('//SELECT_LIST_ITEM[ancestor::FROM or ancestor::WITH]'
                              passing v_xml 
                              columns select_list_item xmltype path '//SELECT_LIST_ITEM',
                                      alias_name VARCHAR2(30) path '//COLUMN_ALIAS') t,
                           TABLE(xmlsequence(extract(select_list_item, '//COLUMN'))) c) x
             WHERE upper(x.column_reference) IN (SELECT upper(column_name) 
                                                   FROM TABLE(v_search_l))
               AND upper(alias_name) NOT IN (SELECT upper(column_name)
                                              FROM TABLE(v_search_l)))
         LOOP
            -- add internal column usage
            v_search_l.extend;
            v_search_l(v_search_l.count) := coldep_typ(NULL,
                                                       NULL,
                                                       v_secondary.alias_name);
         END LOOP;
      END LOOP;
      -- analyze primary select list
      -- TODO: handle table/view source and wildcard properly 
      FOR v_primary IN (
         SELECT x.column_id, atc.column_name
           FROM (SELECT t.select_list_item,
                        t.column_id,
                        extractvalue(VALUE(c), 'COLUMN') AS column_reference
                   FROM xmltable('//SELECT_LIST_ITEM[not (ancestor::FROM) and not (ancestor::WITH)]'
                           passing v_xml 
                           columns column_id FOR ordinality,
                                   select_list_item xmltype path '//SELECT_LIST_ITEM') t,
                        TABLE(xmlsequence(extract(select_list_item, '//COLUMN'))) c) x
                  INNER JOIN all_tab_columns atc
                     ON atc.owner = p_schema_name
                    AND atc.table_name = p_view_name
                    AND atc.column_id = x.column_id
                  WHERE upper(x.column_reference) IN (SELECT upper(column_name)
                                                        FROM TABLE(v_search_l))
                  ORDER BY x.column_id)
      LOOP
         -- add external column usage
         v_coldep_l.extend;
         v_coldep_l(v_coldep_l.count) := coldep_typ(p_schema_name,
                                                    p_view_name,
                                                    v_primary.column_name);
      END LOOP;
      -- return column dependencies   
      RETURN v_coldep_l;
   END process_view;
END coldep_pkg;
/

Below you find the XML parser output of the query defined in the view GROSS_MARGINS. The model becomes quite clear, even if I could not find a schema description.

<QUERY>
  <WITH>
    <WITH_ITEM>
      <QUERY_ALIAS>GM</QUERY_ALIAS>
      <QUERY>
        <SELECT>
          <SELECT_LIST>
            <SELECT_LIST_ITEM>
              <COLUMN_REF>
                <COLUMN>TIME_ID</COLUMN>
              </COLUMN_REF>
            </SELECT_LIST_ITEM>
            <SELECT_LIST_ITEM>
              <COLUMN_REF>
                <COLUMN>REVENUE</COLUMN>
              </COLUMN_REF>
            </SELECT_LIST_ITEM>
            <SELECT_LIST_ITEM>
              <SUB>
                <COLUMN_REF>
                  <COLUMN>REVENUE</COLUMN>
                </COLUMN_REF>
                <COLUMN_REF>
                  <COLUMN>COST</COLUMN>
                </COLUMN_REF>
              </SUB>
              <COLUMN_ALIAS>GROSS_MARGIN</COLUMN_ALIAS>
            </SELECT_LIST_ITEM>
          </SELECT_LIST>
        </SELECT>
        <FROM>
          <FROM_ITEM>
            <QUERY>
              <SELECT>
                <SELECT_LIST>
                  <SELECT_LIST_ITEM>
                    <COLUMN_REF>
                      <TABLE>PROFITS</TABLE>
                      <COLUMN>TIME_ID</COLUMN>
                    </COLUMN_REF>
                  </SELECT_LIST_ITEM>
                  <SELECT_LIST_ITEM>
                    <MUL>
                      <COLUMN_REF>
                        <TABLE>PROFITS</TABLE>
                        <COLUMN>UNIT_PRICE</COLUMN>
                      </COLUMN_REF>
                      <COLUMN_REF>
                        <TABLE>PROFITS</TABLE>
                        <COLUMN>QUANTITY_SOLD</COLUMN>
                      </COLUMN_REF>
                    </MUL>
                    <COLUMN_ALIAS>REVENUE</COLUMN_ALIAS>
                  </SELECT_LIST_ITEM>
                  <SELECT_LIST_ITEM>
                    <COLUMN_REF>
                      <TABLE>PROFITS</TABLE>
                      <COLUMN>TOTAL_COST</COLUMN>
                    </COLUMN_REF>
                    <COLUMN_ALIAS>COST</COLUMN_ALIAS>
                  </SELECT_LIST_ITEM>
                </SELECT_LIST>
              </SELECT>
              <FROM>
                <FROM_ITEM>
                  <TABLE>PROFITS</TABLE>
                </FROM_ITEM>
              </FROM>
            </QUERY>
          </FROM_ITEM>
        </FROM>
      </QUERY>
    </WITH_ITEM>
  </WITH>
  <SELECT>
    <SELECT_LIST>
      <SELECT_LIST_ITEM>
        <COLUMN_REF>
          <TABLE_ALIAS>T</TABLE_ALIAS>
          <COLUMN>FISCAL_YEAR</COLUMN>
        </COLUMN_REF>
      </SELECT_LIST_ITEM>
      <SELECT_LIST_ITEM>
        <SUM>
          <COLUMN_REF>
            <COLUMN>REVENUE</COLUMN>
          </COLUMN_REF>
        </SUM>
        <COLUMN_ALIAS>REVENUE</COLUMN_ALIAS>
      </SELECT_LIST_ITEM>
      <SELECT_LIST_ITEM>
        <SUM>
          <COLUMN_REF>
            <COLUMN>GROSS_MARGIN</COLUMN>
          </COLUMN_REF>
        </SUM>
        <COLUMN_ALIAS>GROSS_MARGIN</COLUMN_ALIAS>
      </SELECT_LIST_ITEM>
      <SELECT_LIST_ITEM>
        <ROUND>
          <DIV>
            <MUL>
              <LITERAL>100</LITERAL>
              <SUM>
                <COLUMN_REF>
                  <COLUMN>GROSS_MARGIN</COLUMN>
                </COLUMN_REF>
              </SUM>
            </MUL>
            <SUM>
              <COLUMN_REF>
                <COLUMN>REVENUE</COLUMN>
              </COLUMN_REF>
            </SUM>
          </DIV>
          <LITERAL>2</LITERAL>
        </ROUND>
        <COLUMN_ALIAS>GROSS_MARGIN_PERCENT</COLUMN_ALIAS>
      </SELECT_LIST_ITEM>
    </SELECT_LIST>
  </SELECT>
  <FROM>
    <FROM_ITEM>
      <JOIN>
        <INNER/>
        <JOIN_TABLE_1>
          <QUERY_ALIAS>GM</QUERY_ALIAS>
        </JOIN_TABLE_1>
        <JOIN_TABLE_2>
          <TABLE>TIMES</TABLE>
          <TABLE_ALIAS>T</TABLE_ALIAS>
        </JOIN_TABLE_2>
        <ON>
          <EQ>
            <COLUMN_REF>
              <TABLE>TIMES</TABLE>
              <TABLE_ALIAS>T</TABLE_ALIAS>
              <COLUMN>TIME_ID</COLUMN>
            </COLUMN_REF>
            <COLUMN_REF>
              <TABLE_ALIAS>GM</TABLE_ALIAS>
              <COLUMN>TIME_ID</COLUMN>
            </COLUMN_REF>
          </EQ>
        </ON>
      </JOIN>
    </FROM_ITEM>
  </FROM>
  <GROUP_BY>
    <EXPRESSION_LIST>
      <EXPRESSION_LIST_ITEM>
        <COLUMN_REF>
          <TABLE_ALIAS>T</TABLE_ALIAS>
          <COLUMN>FISCAL_YEAR</COLUMN>
        </COLUMN_REF>
      </EXPRESSION_LIST_ITEM>
    </EXPRESSION_LIST>
  </GROUP_BY>
  <ORDER_BY>
    <ORDER_BY_LIST>
      <ORDER_BY_LIST_ITEM>
        <COLUMN_REF>
          <TABLE_ALIAS>T</TABLE_ALIAS>
          <COLUMN>FISCAL_YEAR</COLUMN>
        </COLUMN_REF>
      </ORDER_BY_LIST_ITEM>
    </ORDER_BY_LIST>
  </ORDER_BY>
</QUERY>
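
The model shown above can also be explored interactively. The following is a minimal sketch, assuming a session connected as SH with COLDEP_PKG from this post installed. The query text passed to parse_query is an illustrative example of mine, not taken from a view. The first statement pretty-prints the parse tree, the second lists the columns referenced by each item of the outermost select list, using the element names seen in the output above.

```sql
-- Assumption: run as SH, COLDEP_PKG installed as listed in this post.
-- The query text is an illustrative example.

-- 1) Pretty-print the parse tree of a short query.
SELECT XMLSERIALIZE(
          DOCUMENT coldep_pkg.parse_query(
             'SELECT unit_cost * quantity_sold AS total_cost FROM profits')
          AS CLOB INDENT) AS parse_tree
  FROM dual;

-- 2) List the columns referenced by each item of the outermost select list.
--    Items without an alias yield a NULL alias_name.
SELECT i.column_id, i.alias_name, c.column_name
  FROM XMLTABLE('/QUERY/SELECT/SELECT_LIST/SELECT_LIST_ITEM'
          PASSING coldep_pkg.parse_query(
             'SELECT unit_cost * quantity_sold AS total_cost FROM profits')
          COLUMNS column_id  FOR ORDINALITY,
                  alias_name VARCHAR2(30) PATH 'COLUMN_ALIAS',
                  item       XMLTYPE      PATH '.') i,
       XMLTABLE('//COLUMN'
          PASSING i.item
          COLUMNS column_name VARCHAR2(30) PATH '.') c
 ORDER BY i.column_id;
```

Restricting the XPath to /QUERY/SELECT/SELECT_LIST keeps column references in WHERE, GROUP BY and ORDER BY clauses out of the result, which is the distinction the dependency analysis relies on.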

Please note that UTL_XML.PARSEQUERY is suited for extended query dependency analysis only. DML may be parsed, but the resulting model is incomplete in 11.2.0.2: clauses that do not occur in a SELECT statement, such as the SET clause of an UPDATE statement, are not included in the model. If you need to analyze PL/SQL beyond PL/Scope, you may still need a 3rd party parser.
