Channel: Philipp Salvisberg's Blog

Column-less Table Access


While writing some JUnit tests after fixing bugs in dependency analysis views, I came up with the following query:

SELECT owner, object_type, object_name, operation, table_name
  FROM tvd_object_usage_v
MINUS
SELECT owner, object_type, object_name, operation, table_name
  FROM tvd_object_col_usage_v

The first view tvd_object_usage_v contains all table/view usages per object. The second view tvd_object_col_usage_v contains all column usages per object.

The idea was to check the completeness of the second view tvd_object_col_usage_v. I believed that there cannot be an object usage without one or more corresponding column usages. Therefore I assumed the query above should return no rows, but obviously I was plain wrong.

Here are some examples of column-less table accesses:

SELECT sys_guid() 
  FROM dual;

SELECT COUNT(*) 
  FROM bonus;

SELECT rownum AS row_num
  FROM dual
CONNECT BY rownum <= 1000;

SELECT e.empno, e.ename
  FROM emp e, dept d;

Based on that, I built the test case as follows:

INSERT INTO tvd_captured_sql_t
   (cap_id, cap_source)
VALUES
   (-1007,
    'SELECT sys_guid() FROM dual;
     SELECT COUNT(*) FROM bonus;
     SELECT rownum AS row_num FROM dual CONNECT BY rownum <= 1000;  
     SELECT e.empno, e.ename FROM emp e, dept d;');
COMMIT;

tvdca.sh user=tvdca password=tvdca host=groemitz sid=phs112

SQL> SELECT operation, table_name
  2    FROM tvd_sql_usage_v
  3   WHERE cap_id = -1007;

OPERAT TABLE_NAME
------ ------------------------------
SELECT DUAL
SELECT BONUS
SELECT DUAL
SELECT EMP
SELECT DEPT

SQL> SELECT operation, table_name, column_name
  2    FROM tvd_sql_col_usage_v
  3   WHERE cap_id = -1007;

OPERAT TABLE_NAME                     COLUMN_NAME
------ ------------------------------ ------------------------------
SELECT EMP                            EMPNO
SELECT EMP                            ENAME

SQL> SELECT operation, table_name
  2    FROM tvd_sql_usage_v
  3   WHERE cap_id = -1007
  4  MINUS
  5  SELECT operation, table_name
  6    FROM tvd_sql_col_usage_v
  7   WHERE cap_id = -1007;

OPERAT TABLE_NAME
------ ------------------------------
SELECT BONUS
SELECT DEPT
SELECT DUAL

These tests are now part of my TVDCA test suite to ensure column-less table access is handled appropriately ;-) 

BTW, here is an excerpt of my JUnit test:

@Test
public void testColumnLessTableAccess() {
	String tabSql = "SELECT COUNT(*) FROM tvd_sql_usage_v WHERE cap_id = -1007 AND table_name LIKE :table_name";
	String colSql = "SELECT COUNT(*) FROM tvd_sql_col_usage_v WHERE cap_id = -1007 AND table_name LIKE :table_name and column_name LIKE :column_name";
	int count;
	Map<String, String> namedParameters = new HashMap<String, String>();
	// all tables
	namedParameters.put("table_name", "%");
	namedParameters.put("column_name", "%");
	count = jdbcTemplate.queryForObject(tabSql, namedParameters,
			Integer.class);
	Assert.assertEquals(5, count);
	count = jdbcTemplate.queryForObject(colSql, namedParameters,
			Integer.class);
	Assert.assertEquals(2, count);
}


Multi-temporal Features in Oracle 12c


Oracle 12c has a feature called Temporal Validity. With Temporal Validity you can add one or more valid time dimensions to a table using existing columns, or using columns automatically created by the database. This means that, combined with Flashback Data Archive, Oracle offers native bi-temporal and even multi-temporal historization features. This blog post explains the different types of historization, when and how to use them, and positions the most recent Oracle 12c database features.

Semantics and Granularity of Periods

In Flashback Data Archive Oracle defines periods with a half-open interval. This means that a point in time x is part of a period if x >= the start of the period and x < the end of the period. It is no surprise that Oracle also uses half-open intervals for Temporal Validity. The following figure visualizes the principle:

Fig. 1: Semantics and Granularity of Periods

The advantage of a half-open interval is that the end of a preceding period is identical with the start of the subsequent period. Thus there is no gap and the granularity of a period (year, month, day, second, millisecond, nanosecond, etc.) is irrelevant. The disadvantage is that querying data at a point in time using a traditional WHERE clause is a bit more verbose compared to closed intervals since BETWEEN conditions are not applicable.

Furthermore, Oracle uses NULL for -∞ and +∞. Considering this information the WHERE clause to filter the currently valid periods looks as follows:

WHERE (vt_start IS NULL OR vt_start <= SYSTIMESTAMP)
  AND (vt_end IS NULL OR vt_end > SYSTIMESTAMP)
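
The same pattern applies to any other point in time. For example, the periods valid on a given date could be retrieved as follows (a sketch, assuming a table t with the period columns vt_start and vt_end):

SELECT *
  FROM t
 WHERE (vt_start IS NULL OR vt_start <= DATE '2014-01-01')
   AND (vt_end IS NULL OR vt_end > DATE '2014-01-01');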

Use of Temporal Periods

In an entity-relationship model temporal periods may be used for master or reference data. For transactions or positions we do not need temporal periods since the data itself contains one or more timestamps. Corrections may be done through a reversal or difference posting logic, similar to bookkeeping transactions.

The situation is similar in a dimensional model. Dimensions correspond to master and reference data and may have a temporal period (e.g. slowly changing dimensions type 2). Facts do not have temporal periods. Instead they are modeled with one or more relationships to the time dimension. A fact is immutable. Changes are applied through new facts using a reversal or difference posting logic.

Transaction Time – TT

A flight data recorder collects and records various metrics during a flight to allow the reconstruction of the past. The transaction or system time in a data model is comparable to the functionality of such a flight data recorder. A table with a transaction time axis allows querying the current and the past state, but changes in the past or in the future are not possible.

Example: Scott becomes a manager. The change of the job description from “Analyst” to “Manager” is entered into the system on April 15, 2013 at 15:42:42. The previous description Analyst is terminated at this point in time and the new description Manager becomes current at exactly the same point in time.

Oracle supports the transaction time with Flashback Data Archive (formerly known as Total Recall). Using Flashback Data Archive you may query a consistent state of the past.

SCN  Session A                                   Session B
---  ------------------------------------------  -------------------------------
  1  INSERT INTO emp
        (empno, ename, job, sal, deptno)
     VALUES
        (4242, 'CARTER', 'CLERK', '2400', 20);
  2  SELECT COUNT(*)
       FROM emp; -- 15 rows
  3                                               SELECT COUNT(*)
                                                    FROM emp; -- 14 rows
  4  COMMIT;

Tab. 1: Consistent View of the Past

What is the result of the query “SELECT COUNT(*) FROM emp AS OF SCN 3” based on Tab. 1 above? – 14 rows. This is a good and reasonable representation of the past. However, it also shows that the consistent representation of the past is a matter of definition and in this case it does not represent the situation of session A.

Valid Time – VT

The valid time describes the period  during which something in the real world is considered valid. This period is independent of the entry into the system and therefore needs to be maintained explicitly. Changes and queries are supported in the past as well as in the future.

Example: Scott becomes a manager. The change of the job description from “Analyst” to “Manager” is valid from January 1, 2014. The previous description Analyst is terminated at this point in time and the new description Manager becomes valid at exactly the same point in time. It is irrelevant when this change is entered into the system.

Decision Time – DT

The decision time describes the date and time a decision has been made. This point in time is independent of an entry into the system and is not directly related to the valid time period. Future changes are not possible.

Example: Scott becomes a manager. The decision to change the job description from “Analyst” to “Manager” has been made on March 24, 2013. The previous job description Analyst is terminated on the decision time axis at this point in time and the new description Manager becomes current at exactly the same point in time on the decision time axis. It is irrelevant when this change is entered into the system and it is irrelevant when Scott may officially call himself a manager.

Historization Types

The historization types are based on the time dimensions visualized in figure 2 and are categorized by the combination of these time dimensions. In this post only the most popular and generic time periods are covered. However, depending on the requirements, additional, specific time periods are conceivable.

Fig. 2: Historization Types

Non-temporal models do not have any time dimensions (e.g. EMP and DEPT in Schema SCOTT).

Uni-temporal models use just one time dimension (e.g. transaction time or valid time).

Bi-temporal models use exactly two time dimensions (e.g. transaction time and valid time).

Multi-temporal models use at least three time dimensions.

Tri-temporal models are based on exactly three time dimensions.

Temporal Validity

The feature Temporal Validity covers the DDL and DML enhancements in Oracle 12c concerning temporal data management. The statements CREATE TABLE, ALTER TABLE and DROP TABLE have been extended by a new PERIOD FOR clause. Here is an example:

SQL> ALTER TABLE dept ADD (
  2     vt_start DATE,
  3     vt_end   DATE,
  4     PERIOD FOR vt (vt_start, vt_end)
  5  );

SQL> SELECT * FROM dept;	

    DEPTNO DNAME          LOC           VT_START   VT_END
---------- -------------- ------------- ---------- ----------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON

VT names the period and is a hidden column. The association of the VT period to the VT_START and VT_END column is stored in the Oracle Data Dictionary in the table SYS_FBA_PERIOD. You need a dedicated ALTER TABLE call for every additional period.
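
For example, adding a second period for the decision time could look like this (a sketch; the column and period names are illustrative and simply follow the pattern above):

SQL> ALTER TABLE dept ADD (
  2     dt_start DATE,
  3     dt_end   DATE,
  4     PERIOD FOR dt (dt_start, dt_end)
  5  );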

For every period a constraint is created to enforce positive time periods (VT_START < VT_END). But it is not possible to define temporal constraints, e.g. to prohibit overlapping periods, gaps, or orphaned parent/child periods.

Oracle 12c does not deliver support for temporal DML. Desirable would be, for example:

  • insert, update, delete for a given period
  • update a subset of columns for a given period
  • merge of connected and identical periods

Hence temporal changes have to be implemented as a series of conventional DML. Here is an example:

SQL> UPDATE dept SET vt_end = DATE '2014-01-01' WHERE deptno = 30;

SQL> INSERT INTO dept (deptno, dname, loc, vt_start)
  2       VALUES (30, 'SALES', 'SAN FRANCISCO', DATE '2014-01-01');

SQL> SELECT * FROM dept WHERE deptno = 30 ORDER BY vt_start NULLS FIRST;

    DEPTNO DNAME          LOC           VT_START   VT_END
---------- -------------- ------------- ---------- ----------
        30 SALES          CHICAGO                  2014-01-01
        30 SALES          SAN FRANCISCO 2014-01-01

Temporal Flashback Query

The feature Temporal Flashback Query covers query enhancements in Oracle 12c concerning temporal data. Oracle extended the existing Flashback Query interfaces. The FLASHBACK_QUERY_CLAUSE of the SELECT statement has been extended by a PERIOD FOR clause. Here is an example:

SQL> SELECT *
  2    FROM dept AS OF PERIOD FOR vt DATE '2015-01-01'
  3   ORDER BY deptno;

    DEPTNO DNAME          LOC           VT_START   VT_END
---------- -------------- ------------- ---------- ----------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          SAN FRANCISCO 2014-01-01
        40 OPERATIONS     BOSTON

Instead of “AS OF PERIOD FOR” you may also use “VERSIONS PERIOD FOR”. However, it is important to notice that you may not define multiple PERIOD FOR clauses. Hence you need to filter additional temporal periods in the WHERE clause.
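
A VERSIONS query over the VT period might look like this (a sketch against the DEPT table used above; the date range is arbitrary):

SELECT *
  FROM dept VERSIONS PERIOD FOR vt
       BETWEEN DATE '2013-01-01' AND DATE '2015-01-01'
 ORDER BY deptno, vt_start NULLS FIRST;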

The PERIOD FOR clause is not applicable to views. For views the enhancements in the PL/SQL package DBMS_FLASHBACK_ARCHIVE are interesting, especially the procedures ENABLE_AT_VALID_TIME and DISABLE_ASOF_VALID_TIME to manage a temporal context. Here is an example:

SQL> BEGIN
  2     dbms_flashback_archive.enable_at_valid_time(
  3        level      => 'ASOF', 
  4        query_time => DATE '2015-01-01'
  5     );
  6  END;
  7  /

SQL> SELECT * FROM dept ORDER BY deptno;

    DEPTNO DNAME          LOC           VT_START    VT_END
---------- -------------- ------------- ---------- ----------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          SAN FRANCISCO 2014-01-01
        40 OPERATIONS     BOSTON

Currently it is not possible to specify a particular temporal period, therefore the context is applied to every temporal period. In such cases you have to apply the filter via the WHERE clause instead.

A limitation of Oracle 12.1.0.1 is that Temporal Flashback Query predicates are not applied in a multitenant configuration. The PERIOD FOR clause in the SELECT statement and the DBMS_FLASHBACK_ARCHIVE.ENABLE_AT_VALID_TIME calls are simply ignored. This limitation has been lifted with Oracle 12.1.0.2.

Another limitation is that Oracle 12c does not provide support for temporal joins and temporal aggregations.

Tri-temporal Data Model

The following data model is based on the EMP/DEPT model in the schema SCOTT. The table EMPV implements three temporal dimensions:

  • Transaction time (TT) with Flashback Data Archive
  • Valid time (VT) with Temporal Validity
  • Decision time (DT) with Temporal Validity

Fig. 3: Tri-temporal Data Model

The table EMP is reduced to the primary key (EMPNO), which is not temporal. This makes it possible to define and enable the foreign key constraint EMPV_EMP_MGR_FK.

The following six events will be represented with this model.

No  Transaction Time (TT)  Valid Time (VT)  Decision Time (DT)  Action
--  ---------------------  ---------------  ------------------  ----------------------------------------
#1  1                                                           Initial load from SCOTT.EMP table
#2  2                      1990-01-01                           Change name from SCOTT to Scott
#3  3                      1991-04-01                           Scott leaves the company
#4  4                      1991-10-01                           Scott rejoins
#5  5                      1989-01-01                           Change job from ANALYST to Analyst
#6  6                      2014-01-01       2013-03-24          Change job to Manager and double salary

Tab. 2: Events

After processing all six events, the periods for the employee 7788 (Scott) in the table EMPV may be queried as follows. The transaction time is represented as the System Change Number SCN.

SQL> SELECT dense_rank() OVER(ORDER BY versions_startscn) event_no, empno, ename, job,
  2         sal, versions_startscn tt_start, versions_endscn tt_end,
  3         to_char(vt_start,'YYYY-MM-DD') vt_start, to_char(vt_end,'YYYY-MM-DD') vt_end,
  4         to_CHAR(dt_start,'YYYY-MM-DD') dt_start, to_char(dt_end,'YYYY-MM-DD') dt_end
  5    FROM empv VERSIONS BETWEEN SCN MINVALUE AND MAXVALUE
  6   WHERE empno = 7788 AND versions_operation IN ('I','U')
  7   ORDER BY tt_start, vt_start NULLS FIRST, dt_start NULLS FIRST;

# EMPNO ENAME JOB       SAL TT_START   TT_END VT_START   VT_END     DT_START   DT_END
-- ----- ----- ------- ----- -------- -------- ---------- ---------- ---------- ----------
 1  7788 SCOTT ANALYST  3000  2366310  2366356
 2  7788 SCOTT ANALYST  3000  2366356  2366559            1990-01-01
 2  7788 Scott ANALYST  3000  2366356  2366408 1990-01-01
 3  7788 Scott ANALYST  3000  2366408  2366559 1990-01-01 1991-04-01
 4  7788 Scott ANALYST  3000  2366424  2366559 1991-10-01
 5  7788 SCOTT ANALYST  3000  2366559                     1989-01-01
 5  7788 SCOTT Analyst  3000  2366559          1989-01-01 1990-01-01
 5  7788 Scott Analyst  3000  2366559          1990-01-01 1991-04-01
 5  7788 Scott Analyst  3000  2366559  2366670 1991-10-01
 6  7788 Scott Analyst  3000  2366670          1991-10-01                       2013-03-24
 6  7788 Scott Analyst  3000  2366670          1991-10-01 2014-01-01 2013-03-24
 6  7788 Scott Manager  6000  2366670          2014-01-01            2013-03-24

Seven rows have been changed or added by event #5 at transaction time 2366559. This clearly shows that DML operations in a temporal model are not trivial, which makes the missing DML support for VT and DT all the more regrettable.

The next query filters the data for Scott on the transaction time (SYSDATE, the default), the valid time (2014-01-01) and the decision time (2013-04-01). This way the result is reduced to exactly one row.

SQL> SELECT empno, ename, job, sal,
  2         to_char(vt_start,'YYYY-MM-DD') AS vt_start,
  3         to_char(vt_end,'YYYY-MM-DD') AS vt_end,
  4         to_CHAR(dt_start,'YYYY-MM-DD') AS dt_start,
  5         to_char(dt_end,'YYYY-MM-DD') AS dt_end
  6    FROM empv AS OF period FOR dt DATE '2013-04-01'
  7   WHERE empno = 7788 AND
  8         (vt_start <= DATE '2014-01-01' OR vt_start IS NULL) AND
  9         (vt_end > DATE '2014-01-01' OR vt_end IS NULL)
 10   ORDER BY vt_start NULLS FIRST, dt_start NULLS FIRST;

EMPNO ENAME JOB       SAL VT_START   VT_END     DT_START   DT_END
----- ----- ------- ----- ---------- ---------- ---------- ----------
 7788 Scott Manager  6000 2014-01-01            2013-03-24

Queries on multi-temporal data are relatively simple if all time periods are filtered at a point in time. The AS OF PERIOD clause (for DT) simplifies the query, but the complexity of a traditional WHERE condition (for VT) is not much higher.

Conclusion

The support for temporal data management in Oracle 12c is based on sound concepts, but the implementation is currently incomplete. I mainly miss a temporal DML API, temporal integrity constraints, temporal joins and temporal aggregations. I recommend using Oracle’s semantics for periods (half-open intervals, NULL for +/- infinity) in existing models to simplify a later migration to Temporal Validity.

In the real world we use a lot of temporal dimensions at the same time, consciously or unconsciously. However, in data models every additional temporal dimension increases the complexity significantly. Data models are simplifications of the real world, based on requirements and a limited budget. I do not recommend using bi-temporality or even multi-temporality as a universal design pattern. Quite the contrary: I recommend determining and documenting the reason for each temporal dimension per entity to ensure that temporal dimensions are used consciously and not modeled unnecessarily.

Oracle’s Flashback Data Archive is a good, transparent and, since Oracle 11.2.0.4, also cost-free option to implement requirements regarding the transaction time. For all other time dimensions, such as the valid time and the decision time, I recommend using standardized tooling to apply DML on temporal data.

Last update on 2015-10-24, amendments to match limitations of Oracle version 12.1.0.2.4.

Trivadis PL/SQL & SQL CodeChecker for SQL Developer Released


Half a year ago Trivadis released a command line utility to scan code within a directory tree for violations of the Trivadis PL/SQL & SQL Coding Guidelines Version 2.0. This tool is perfectly suited to process millions of lines of code, but an integration into Oracle SQL Developer was missing until now.

This SQL Developer extension checks the editor content per mouse click or keyboard shortcut. Simply navigate through the issues using the cursor keys to highlight the linked code sections in the corresponding editor.

tvdcc_sqldev_issues

Additionally, a detailed HTML report tab is populated, containing all the metrics you know from the command line tool, such as McCabe’s cyclomatic complexity, Halstead’s volume, the maintainability index or the number of statements.

If you do not like all guideline checks you may configure a whitelist and blacklist in the SQL Developer preferences to shape the output according to your needs.

Trivadis PL/SQL & SQL CodeChecker for SQL Developer is available for free and licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. The full functionality is provided and is not limited in time or volume.

See Download for more information or simply register the TVDCC update center https://www.salvis.com/update/tvdcc in SQL Developer.

Ready for Oracle 12c


The Oracle 12c grammar is now supported in the new versions of the Trivadis CodeChecker, CodeChecker for SQL Developer and CodeAnalyzer. The following example code, copied from a colleague at Trivadis, shows how to insert rows while querying a view. This might not be the most appropriate way to implement auditing, but it shows in a few lines of code the power of plsql_declarations within a 12c SELECT statement.

tvdcc_sqldev_12c

The Trivadis CodeChecker for SQL Developer processes this example flawlessly. It works with the new row_pattern_clause, row_limiting_clause, cross_outer_apply_clause or LATERAL clause as well. However, TVDCC might be a bit picky about the coding style.

Get ready for 12c and grab your copy of the command line tools or the SQL Developer extension from the download area.

Cannot Install Extensions in SQL Developer 4 on Mac OS X


Today I could not install any SQL Developer extension on my Mac OS X machine. I did not get an error message during the installation. After a restart of SQL Developer the extension was simply missing. When I tried to re-install it – selecting “Check for updates…” in the “Help” menu – I got the following message:

sqldev4.1.19.07

Restarting SQL Developer did not help. This message was shown again and no extension was installed. I’ve tried to remove the $HOME/.sqldeveloper directory and reinstalled SQL Developer, but the problem persisted. I’ve tried SQL Developer version 4.0.3.16.84 and the brand new version 4.1.0.19.07. Same result.

What was the problem?

After some analysis I found the root cause. SQL Developer creates a file named jdeveloper-deferred-updates.txt in the directory $HOME/.sqldeveloper (e.g. /Users/phs/.sqldeveloper). This file is read and copied into a temporary directory as part of the installation process. On non-Windows platforms the name of the temporary directory is $TMPDIR/$USER (e.g. /var/folders/lf/8g3r0ts900gfdfn2xxkn9yz00000gn/T/phs). If a file with such a name already exists, the directory cannot be created and the whole installation of the extension fails.

What is the solution (workaround)?

Open a terminal window (e.g. type terminal in the spotlight window) and execute the following command to delete the existing temporary file, which is causing the name conflict:

rm $TMPDIR/$USER

Afterwards restart SQL Developer and install the extension. Restart SQL Developer once again to complete the installation.

Introducing PL/SQL Unwrapper for SQL Developer


From time to time I use the free service Unwrap it! or Niels Teusink’s Python script unwrap.py to unwrap PL/SQL code. Recently I have been confronted with more wrapped code, since a customer is about to migrate to a new banking platform which uses wrapped PL/SQL code extensively. While investigating migration errors we found that unwrapping the called PL/SQL packages helped us a lot to identify the root cause faster. But since the unwrapping and debugging process is still a bit cumbersome for a series of PL/SQL packages, a colleague asked me: “Wouldn’t it be nice if we could unwrap PL/SQL packages directly in SQL Developer?” and I answered “This should be simple. I’ve already written an extension for SQL Developer and the code in unwrap.py does not look too complicated.”

And on a rainy weekend I analyzed Niels Teusink’s public domain Python script unwrap.py and used it as a starting point for the development of a PL/SQL Unwrapper for SQL Developer.

#!/usr/bin/python
#
# This script unwraps Oracle wrapped plb packages, does not support 9g
# Contact: niels at teusink net / blog.teusink.net
#
# License: Public domain
#
import re
import base64
import zlib
import sys

# simple substitution table
charmap = [0x3d, 0x65, 0x85, 0xb3, 0x18, 0xdb, 0xe2, 0x87, 0xf1, 0x52, 0xab, 0x63, 0x4b, 0xb5, 0xa0, 0x5f, 0x7d, 0x68, 0x7b, 0x9b, 0x24, 0xc2, 0x28, 0x67, 0x8a, 0xde, 0xa4, 0x26, 0x1e, 0x03, 0xeb, 0x17, 0x6f, 0x34, 0x3e, 0x7a, 0x3f, 0xd2, 0xa9, 0x6a, 0x0f, 0xe9, 0x35, 0x56, 0x1f, 0xb1, 0x4d, 0x10, 0x78, 0xd9, 0x75, 0xf6, 0xbc, 0x41, 0x04, 0x81, 0x61, 0x06, 0xf9, 0xad, 0xd6, 0xd5, 0x29, 0x7e, 0x86, 0x9e, 0x79, 0xe5, 0x05, 0xba, 0x84, 0xcc, 0x6e, 0x27, 0x8e, 0xb0, 0x5d, 0xa8, 0xf3, 0x9f, 0xd0, 0xa2, 0x71, 0xb8, 0x58, 0xdd, 0x2c, 0x38, 0x99, 0x4c, 0x48, 0x07, 0x55, 0xe4, 0x53, 0x8c, 0x46, 0xb6, 0x2d, 0xa5, 0xaf, 0x32, 0x22, 0x40, 0xdc, 0x50, 0xc3, 0xa1, 0x25, 0x8b, 0x9c, 0x16, 0x60, 0x5c, 0xcf, 0xfd, 0x0c, 0x98, 0x1c, 0xd4, 0x37, 0x6d, 0x3c, 0x3a, 0x30, 0xe8, 0x6c, 0x31, 0x47, 0xf5, 0x33, 0xda, 0x43, 0xc8, 0xe3, 0x5e, 0x19, 0x94, 0xec, 0xe6, 0xa3, 0x95, 0x14, 0xe0, 0x9d, 0x64, 0xfa, 0x59, 0x15, 0xc5, 0x2f, 0xca, 0xbb, 0x0b, 0xdf, 0xf2, 0x97, 0xbf, 0x0a, 0x76, 0xb4, 0x49, 0x44, 0x5a, 0x1d, 0xf0, 0x00, 0x96, 0x21, 0x80, 0x7f, 0x1a, 0x82, 0x39, 0x4f, 0xc1, 0xa7, 0xd7, 0x0d, 0xd1, 0xd8, 0xff, 0x13, 0x93, 0x70, 0xee, 0x5b, 0xef, 0xbe, 0x09, 0xb9, 0x77, 0x72, 0xe7, 0xb2, 0x54, 0xb7, 0x2a, 0xc7, 0x73, 0x90, 0x66, 0x20, 0x0e, 0x51, 0xed, 0xf8, 0x7c, 0x8f, 0x2e, 0xf4, 0x12, 0xc6, 0x2b, 0x83, 0xcd, 0xac, 0xcb, 0x3b, 0xc4, 0x4e, 0xc0, 0x69, 0x36, 0x62, 0x02, 0xae, 0x88, 0xfc, 0xaa, 0x42, 0x08, 0xa6, 0x45, 0x57, 0xd3, 0x9a, 0xbd, 0xe1, 0x23, 0x8d, 0x92, 0x4a, 0x11, 0x89, 0x74, 0x6b, 0x91, 0xfb, 0xfe, 0xc9, 0x01, 0xea, 0x1b, 0xf7, 0xce]

def decode_base64_package(base64str):
	base64dec = base64.decodestring(base64str)[20:] # we strip the first 20 chars (SHA1 hash, I don't bother checking it at the moment)
	decoded = ''
	for byte in range(0, len(base64dec)):
		decoded += chr(charmap[ord(base64dec[byte])])
	return zlib.decompress(decoded)
	

sys.stderr.write("=== Oracle 10g/11g PL/SQL unwrapper 0.2 - by Niels Teusink - blog.teusink.net ===\n\n" )
if len(sys.argv) < 2:
	sys.stderr.write("Usage: %s infile.plb [outfile]\n" % sys.argv[0])
	sys.exit(1)

infile = open(sys.argv[1])
outfile = None
if len(sys.argv) == 3:
	outfile = open(sys.argv[2], 'w')

lines = infile.readlines()
for i in range(0, len(lines)):
	# this is really naive parsing, but works on every package I've thrown at it
	matches = re.compile(r"^[0-9a-f]+ ([0-9a-f]+)$").match(lines[i])
	if matches:
		base64len = int(matches.groups()[0], 16)
		base64str = ''
		j = 0
		while len(base64str) < base64len:
			j+=1
			base64str += lines[i+j]
		base64str = base64str.replace("\n","")
		if outfile:
			outfile.write(decode_base64_package(base64str) + "\n")
		else:
			print decode_base64_package(base64str)

Even though this code looked straightforward at first sight, it took me a moment or two to understand it. In fact, I googled and found the following information helpful:

After flipping through all these pages I had some second thoughts about publishing an unwrapper, especially since David, Pete and Anton were a bit secretive about certain details such as the substitution table. Obviously I decided to publish it nonetheless. Is this really harmful? There are already a couple of other 10g unwrappers available, such as:

In the end this is just another PL/SQL Unwrapper. However, I believe it delivers some additional value if Oracle’s SQL Developer is the IDE of your choice. This is how it looks on Windows:

unwrapper-windows

The wrapped code will be replaced in the editor by the unwrapped code…

unwrapper-windows-2

…so you have to pay attention not to save the unwrapped code by accident.

Grab your copy of Trivadis PL/SQL Unwrapper from the download area. I hope it is useful.

Update for PL/SQL Cop and PL/SQL Analyzer


Some people asked me to announce the availability of new versions of products on my web site. I guess a blog entry and a Twitter announcement should do the job. Today I’ve released the following three updates:

These products are always affected by a grammar change to SQL*Plus, SQL or PL/SQL. The goal is to process all valid SQL*Plus, SQL and PL/SQL code; however, some limitations are documented here (e.g. a table alias named “inner” is not supported).

The links on the products above will show the associated changelog. The latest entries are mostly about bug fixing. If you are using the trial/preview version of PL/SQL Cop or PL/SQL Analyzer you might be glad to hear that the included license is valid through April 30, 2016.

Download the newest version from here.

Outer Join Operator (+) Restrictions in 12.1.0.2?


I’m currently reviewing a draft of Roger Troller’s updated PL/SQL and SQL Coding Guidelines version 3.0. One guideline recommends using ANSI join syntax. The mentioned reasons are:

ANSI join syntax does not have as many restrictions as the ORACLE join syntax has. Furthermore ANSI join syntax supports the full outer join. A third advantage of the ANSI join syntax is the separation of the join condition from the query filters.

While reading this I wondered which restrictions still exist for the Oracle join syntax nowadays. I searched for “(+)” in the current Error Messages documentation (E49325-06) and found the following error messages:

  • ORA-01417: a table may be outer joined to at most one other table
  • ORA-01719: outer join operator (+) not allowed in operand of OR or IN
  • ORA-01799: a column may not be outer-joined to a subquery
  • ORA-25156: old style outer join (+) cannot be used with ANSI joins
  • ORA-30563: outer join operator (+) is not allowed here

In the 9.2 documentation (A96525-01) I found the following additional messages:

  • ORA-01416: two tables cannot be outer-joined to each other
  • ORA-01468: a predicate may reference only one outer-joined table

I’ve written SQL statements to produce the error messages listed above on a 9.2.0.8 Oracle database and ran them on a 12.1.0.2 database as well to see which restrictions still exist for the outer join operator (+), as a basis for my feedback to Roger. While writing the queries I thought this might be an interesting topic to blog about.

Examples

SELECT s.*, p.*
  FROM sh.sales s, sh.products p
 WHERE p.prod_id = s.prod_id(+)
       AND p.supplier_id(+) = s.channel_id;

An ORA-01416 is thrown in 9.2.0.8 and in 12.1.0.2. You cannot formulate such a query using ANSI join. Doing something like that does not make sense, so it is not a relevant restriction. But it is interesting to see that an ORA-01416 is thrown in Oracle 12.1.0.2, even though this error message is not documented anymore.

SELECT s.*, c.*, p.*
  FROM sh.sales s, sh.customers c, sh.products p
 WHERE p.prod_id = s.prod_id(+)
       AND c.cust_id = s.cust_id(+);

An ORA-01417 is thrown in 9.2.0.8 but not in 12.1.0.2.

SELECT s.*, p.*
  FROM sh.sales s, sh.products p
 WHERE p.prod_id(+) = s.prod_id(+);

An ORA-01468 is thrown in 9.2.0.8 and in 12.1.0.2. You cannot formulate such a query using ANSI join. It could have been a way to formulate a full outer join, but something like that is not supported with Oracle join syntax. ORA-01468 is not documented in Oracle 12.1.0.2, but nonetheless this error is thrown. I do not consider this a relevant restriction for Oracle join syntax.

SELECT s.*, p.*
  FROM sh.sales s, sh.products p
 WHERE p.prod_id(+) = s.prod_id
   AND p.prod_category(+) IN ('Boys', 'Girls');

An ORA-01719 is thrown in 9.2.0.8 but not in 12.1.0.2.
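
In ANSI join syntax the same intent is expressed by moving the IN condition into the join clause. Here is a sketch based on the same SH sample schema tables:

SELECT s.*, p.*
  FROM sh.sales s
  LEFT OUTER JOIN sh.products p
    ON p.prod_id = s.prod_id
   AND p.prod_category IN ('Boys', 'Girls');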

SELECT s.*
  FROM sh.sales s
 WHERE s.time_id(+) = (SELECT MAX(t.time_id)
                         FROM sh.times t);

An ORA-01799 is thrown in 9.2.0.8 and in 12.1.0.2. You cannot formulate such a query using ANSI join. Of course you may rewrite this to a valid Oracle join or ANSI join query. Here’s an example:

SELECT s.*, t.max_time_id
  FROM sh.sales s,
       (SELECT MAX(t.time_id) AS max_time_id
          FROM sh.times t) t
 WHERE s.time_id(+) = t.max_time_id;

Because the restriction applies to ANSI join as well, I do not consider this a relevant restriction for Oracle join syntax.

SELECT s.*, c.*, p.*
  FROM sh.sales s, sh.customers c
  JOIN sh.products p
    ON (p.prod_id = s.prod_id)
 WHERE c.cust_id = s.cust_id(+);

An ORA-25156 is thrown in 9.2.0.8 and in 12.1.0.2. This is not really a restriction for Oracle join syntax. The grammar simply does not support mixing join syntax variants.

SELECT lpad(' ', (LEVEL - 1) * 3) || to_char(e.empno) || ' ' || 
       e.ename(+) || 
       ' ' || d.dname AS emp_name
  FROM scott.emp e, scott.dept d
 WHERE e.deptno(+) = d.deptno
CONNECT BY PRIOR e.empno(+) = e.mgr
 START WITH e.ename(+) = 'KING'
 ORDER BY rownum, e.empno(+);

An ORA-30563 is thrown in 9.2.0.8 and 12.1.0.2. Interestingly, if you remove the (+) on the second line of the query (e.ename(+)), the query works on 9.2.0.8 but not on 12.1.0.2. Using the (+) in a CONNECT BY clause, START WITH clause, or ORDER BY clause does not make sense, and it is not possible using ANSI join either. The important part is the join itself in the WHERE clause, and this works in conjunction with a CONNECT BY. Therefore I consider this an irrelevant restriction for the Oracle join syntax.

Summary

The results of the example statements are summarized in the following table.

Error message by test SQL                                               Relevant outer join restriction?  Result in 9.2.0.8  Result in 12.1.0.2
----------------------------------------------------------------------  --------------------------------  -----------------  ------------------
ORA-01416: two tables cannot be outer-joined to each other              No                                Error              Error
ORA-01417: a table may be outer joined to at most one other table       Yes                               Error              OK
ORA-01468: a predicate may reference only one outer-joined table        No                                Error              Error
ORA-01719: outer join operator (+) not allowed in operand of OR or IN   Yes                               Error              OK
ORA-01799: a column may not be outer-joined to a subquery               No                                Error              Error
ORA-25156: old style outer join (+) cannot be used with ANSI joins      No                                Error              Error
ORA-30563: outer join operator (+) is not allowed here                  No                                Error              Error

Table 1: Outer join operator (+) restrictions in 9.2.0.8 and 12.1.0.2


In the most current Oracle version no relevant limitations exist regarding the Oracle join syntax. Hence choosing ANSI join syntax just because some limitations existed in the past is doing the right thing for the wrong reasons… I favor the ANSI join syntax because filter and join conditions are clearly separated. For full outer joins, there is simply no better performance option than to use ANSI join syntax. See also Chris Antognini’s post about native full outer join.

Monitoring PL/SQL Code Evolution With PL/SQL Cop for SonarQube


Last week I presented the PL/SQL Cop tool suite to a customer in Germany. While preparing the demo I took my first deeper look at the PL/SQL Cop SonarQube plugin, written by Peter Rohner, a fellow Trivadian. I was impressed by how well the additional PL/SQL Cop metrics integrate into SonarQube and how easy it is to monitor the code evolution.

Before I show the code evolution I will go through the metric definitions based on a fairly simple example. If you are not interested in the math, then feel free to skip reading the metric sections.

Password_Check, Version 0.1

As a starting point I use the following simplified password verification procedure, which ensures that every password contains a digit. I know this procedure is not a candidate for “good PL/SQL code”, but nonetheless it is based on a real-life example. The goal of this piece of code is to explain some metrics, before starting to improve the code.

CREATE OR REPLACE PROCEDURE PASSWORD_CHECK (in_password IN VARCHAR2) IS -- NOSONAR
   co_digitarray CONSTANT STRING(10)     := '0123456789';
   co_one        CONSTANT SIMPLE_INTEGER := 1;
   co_errno      CONSTANT SIMPLE_INTEGER := -20501;
   co_errmsg     CONSTANT STRING(100)    := 'Password must contain a digit.';
   l_isdigit     BOOLEAN;
   l_len_pw      PLS_INTEGER;
   l_len_array   PLS_INTEGER;
BEGIN
   -- initialize variables
   l_isdigit := FALSE;
   l_len_pw := LENGTH(in_password);
   l_len_array := LENGTH(co_digitarray);
   <<check_digit>>
   FOR i IN co_one .. l_len_array
   LOOP
      <<check_pw_char>>
      FOR j IN co_one .. l_len_pw
      LOOP
         IF SUBSTR(in_password, j, co_one) = SUBSTR(co_digitarray, i, co_one) THEN
            l_isdigit := TRUE;
            GOTO check_other_things;
         END IF;
      END LOOP check_pw_char;
   END LOOP check_digit;
   <<check_other_things>>
   NULL;
   
   IF NOT l_isdigit THEN
      raise_application_error(co_errno, co_errmsg);
   END IF;
END password_check;
/

After running this code through PL/SQL Cop I get the following metrics. I show here just the SonarQube output, but the results are the same for the command line utility and the SQL Developer extension of PL/SQL Cop.

password_check_v0.1

Simple Metrics

Here are the definitions of the simple metrics shown above.

  • Bytes – the number of bytes (1039)
  • Lines – the number of physical lines – lines separated by OS specific line separator (33)
  • Comment Lines – the number of comment lines – see line 10 (1)
  • Blank Lines – the number of empty lines – see line 28 (1)
  • Lines Of Code  – Lines minus comment lines minus blank lines (31)
  • Commands – the number of commands from a SQL*Plus point of view – see CREATE OR REPLACE PROCEDURE (1)
  • Functions – the number of program units – the password_check procedure (1)
  • Statements – the number of PL/SQL statements – 4 assignments, 2 FOR loops, 2 IF statements, 1 GOTO statement, 1 NULL statement, 1 procedure call (11)
  • Files – the number of files processed (1)
  • Directories – the number of directories processed (1)
  • Issues – the number of Trivadis PL/SQL & SQL Coding Guideline violations – Guideline 39: Never use GOTO statements in your code (1)

Simple metrics such as Lines of Code are an easy way to categorise the programs in a project. But the program with the most lines of code does not necessarily have to be the most complex one. Other metrics are better suited to identify the complex parts of a project. But it is important to have a good idea how such metrics are calculated, because no single metric is perfect. I typically identify programs worth a closer look by a combination of metrics such as lines of code, statements, cyclomatic complexity and the number of severe issues.

See the SonarQube documentation for further metric definitions. Please note that PL/SQL Cop does not calculate all metrics, and some metrics are calculated a bit differently, e.g. Comment Lines.

SQALE Rating

SonarQube rates a project using the SQALE Rating, which is based on the Technical Debt Ratio and calculated as follows:

$$\text'tdr' = 100⋅{\text'Technical Debt'}/{\text'Development Cost'}$$

where

  • $\text'tdr'$ is defined as the technical debt ratio (1.6%)
  • $\text'Technical Debt'$ is defined as the estimated time to fix the issues, PL/SQL Cop defines the time to fix per issue type (0.25 hours)
  • $\text'Development Cost'$ is defined as the estimated time to develop the source code from scratch, the SonarQube default configuration is 30 minutes per Line of Code, you may amend the value on the Technical Debt page (15.5 hours)
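
Plugging the values from this example into the formula:

$$\text'tdr' = 100⋅{0.25}/{15.5} ≈ 1.6%$$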

The ranges for the SQALE Rating values A (very good) to E (very bad) are based on the SQALE Method Definition Document. SonarQube uses the default rating scheme “0.1,0.2,0.5,1” which may be amended on the Technical Debt page. The rating scheme defines the rating thresholds for A, B, C and D. Higher values lead to an E rating. Here is another way to represent the default rating scheme:

  • A: $\text'tdr'$ <= 10%
  • B: $\text'tdr'$ > 10% and $\text'tdr'$ <= 20%
  • C: $\text'tdr'$ > 20% and $\text'tdr'$ <= 50%
  • D: $\text'tdr'$ > 50% and $\text'tdr'$ <= 100%
  • E: $\text'tdr'$ > 100%

Based on the default SQALE Rating scheme, a project rated as “E” should be rewritten from scratch, since fixing all issues would take more time than developing the code anew.

McCabe’s Cyclomatic Complexity

Thomas J. McCabe introduced the Cyclomatic Complexity metric in 1976; it counts the number of paths in the source code. SonarQube uses this metric to represent the complexity of a program. PL/SQL Cop calculates the cyclomatic complexity as follows:

$$M=E-N+2P$$

where

  • $M$ is defined as the cyclomatic complexity (6)
  • $E$ is defined as the number of edges (15)
  • $N$ is defined as the number of nodes (11)
  • $P$ is defined as the number of connected components/programs (1)
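
With the values of the example procedure this gives:

$$M = 15 − 11 + 2⋅1 = 6$$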

The higher the cyclomatic complexity, the more difficult it is to maintain the code.

Please note that PL/SQL Cop V1.0.16 adds an additional edge for ELSE branches in IF/CASE statements, for PL/SQL blocks and for GOTO statements. I consider this a bug. However, Toad Code Analysis (Xpert) calculates the Cyclomatic Complexity the very same way.

PL/SQL Cop calculates the Cyclomatic Complexity per program unit and provides the aggregated Max. Cyclomatic Complexity on file level.

Halstead Volume

Maurice H. Halstead introduced the Halstead Volume metric in 1977; it defines the complexity based on the vocabulary and the total number of words/elements used within a program. In his work Halstead also showed how to express the complexity of academic abstracts using his metrics. PL/SQL Cop calculates the Halstead volume as follows:

$$V=N⋅log_2n$$

where

  • $V$ is defined as the Halstead Volume. (489.7)
  • $N$ is defined as the program length. $N=N_1+N_2$ (94)
  • $n$ is defined as the program vocabulary. $n=n_1+n_2$ (37)
  • $N_1$ is defined as the total number of operators (42)
  • $N_2$ is defined as the total number of operands (52)
  • $n_1$ is defined as the number of distinct operators (11)
  • $n_2$ is defined as the number of distinct operands (26)

using

  • the following operators: if, then, elsif, case, when, else, loop, for-loop, forall-loop, while-loop, exit, exit-when, goto, return, close, fetch, open, open-for, open-for-using, pragma, exception, procedure-call, assignment, function-call, sub-block, parenthesis, and, or, not, eq, ne, gt, lt, ge, le, semicolon, comma, colon, dot, like, between, minus, plus, star, slash, percent
  • the following operands: identifier, string, number
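
Plugging the numbers of the example procedure into the formula:

$$V = 94⋅log_2 37 ≈ 94⋅5.21 ≈ 489.7$$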

The higher the Halstead volume, the more difficult it is to maintain the code.

PL/SQL Cop calculates the Halstead Volume per program unit and provides the aggregated Max. Halstead Volume on file level.

Maintainability Index

Paul Oman and Jack Hagemeister introduced the Maintainability Index metric in 1991; it weighs comments and combines them with the Halstead Volume and the Cyclomatic Complexity. PL/SQL Cop calculates the maintainability index as follows:

$$\text'MI'=\text'MI'woc+\text'MI'cw$$

where

  • $\text'MI'$ is defined as the Maintainability Index (102.2)
  • $\text'MI'woc$ is defined as the $\text'MI'$ without comments. $\text'MI'woc=171−5.2⋅log_eaveV−0.23⋅aveM−16.2⋅log_eaveLOC$ (86.617)
  • $\text'MI'cw$ is defined as the $\text'MI'$ comment weight. $\text'MI'cw=50⋅sin(√{{2.4⋅aveC}/{aveLOC}})$ (15.549)
  • $aveV$ is defined as the average Halstead volume. $aveV={∑unitLOC⋅V}/{fileLOC}$ (489.7)
  • $aveM$ is defined as the average cyclomatic complexity. $aveM={∑unitLOC⋅M}/{fileLOC}$ (6)
  • $aveLOC$ is defined as the average lines of code including comments. $aveLOC={∑unitLOC}/{units}$ (24)
  • $aveC$ is defined as the average lines of comment. $aveC={∑unitC}/{units}$ (1)
  • $unitLOC$ is defined as the number of lines in a PL/SQL unit, without declare section (24)
  • $fileLOC$ is defined as the number of lines in source file (33)
  • $units$ is defined as the number of PL/SQL units in a file (1)
  • $unitC$ is defined as the number of comment lines in a PL/SQL unit (1)
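
As a cross-check with the values above:

$$\text'MI' = 86.617 + 15.549 ≈ 102.2$$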

The lower the maintainability index, the more difficult it is to maintain the code.

PL/SQL Cop calculates the Maintainability Index per program unit and provides the aggregated Min. Maintainability Index on file level.

Password_Check, Version 0.2 – Better

To get rid of the GOTO I’ve rewritten the procedure to use regular expressions to look for digits within the password. The code now looks as follows:

CREATE OR REPLACE PROCEDURE PASSWORD_CHECK (in_password IN VARCHAR2) IS
BEGIN
   IF NOT REGEXP_LIKE(in_password, '\d') THEN
      raise_application_error(-20501, 'Password must contain a digit.');
   END IF;
END;
/

After loading the new version into SonarQube, the dashboard looks as follows:

password_check_v0.2

Almost all metrics look better now. But instead of 1 major issue I now have 5 minor ones. This leads to a higher Technical Debt Ratio and a bad trend in this area. So let’s see what these minor issues are.

issues_password_check_v0.2

I consider all these guideline violations not worth fixing and marked them as “won’t fix”. After reloading the unchanged password_check.sql the SonarQube dashboard looks as follows:

password_check_v0.2e

 

The differences/improvements compared to the previous version are shown in parentheses.

Password_Check, Version 0.3 – Even Better?

The version 0.2 code looks really good. No technical debt, no issues. A complexity of 2 and just 7 lines of code. But is it possible to improve this code further? Technically yes, especially since we know how the Maintainability Index is calculated. We could simply reduce the Lines of Code as follows:

CREATE OR REPLACE PROCEDURE PASSWORD_CHECK(in_password IN VARCHAR2)IS BEGIN IF NOT REGEXP_LIKE(in_password,'\d')THEN raise_application_error(-20501,'Password must contain a digit.');END IF;END;
/

And after loading the new version into SonarQube the dashboard looks as follows:

password_check_v0.3

Reducing the number of lines from 7 to 2 leads to a better Maintainability Index, but the number of statements, the Cyclomatic Complexity and the Halstead Volume are still the same. The change from version 0.2 to 0.3 reduces the readability of the code and has negative value. That clearly shows that the Maintainability Index has its flaws (see also https://avandeursen.com/2014/08/29/think-twice-before-using-the-maintainability-index/). There are various ways to discourage such changes in a project. Using a formatter/beautifier with agreed settings is my favourite.

Code Evolution

SonarQube shows the metrics of the latest two versions in the dashboard. Use the Time Machine page to show metrics of more than two versions of a project.

timemachine2

Or use the Compare page to compare metrics between versions or projects.

compare

Conclusion

Every metric has its flaws, for example

  • Lines of Code does not account for the code complexity
  • Cyclomatic Complexity does not account for the length of a program and the complexity of a statement
  • Halstead Volume does not account for the number of paths in the program
  • Maintainability index cannot distinguish between useful and useless comments and does not account for code formatting

But these metrics are still useful to identify complex programs, to measure code evolution (improvements, degradations) and to help you write better PL/SQL – as long as you do not trust metrics blindly.

PL/SQL Cop Meets oddgen


Until August 2015 it never occurred to me that one could use non-PL/SQL code within conditional compilation blocks. Back then we discussed various template engine options as a foundation for oddgen – the Oracle community’s dictionary-driven code generator.

Nowadays oddgen supports the in-database template engines FTLDB and tePLSQL. Both tools may access templates stored in PL/SQL packages using a selection directive. Here’s the package body of a generator using FTLDB:

CREATE OR REPLACE PACKAGE BODY ftldb_hello_world IS

$IF FALSE $THEN
--%begin generate_ftl
<#assign object_type = template_args[0]/>
<#assign object_name = template_args[1]/>
BEGIN
   sys.dbms_output.put_line('Hello ${object_type} ${object_name}!');
END;
${"/"}
--%end generate_ftl
$END

   FUNCTION generate(in_object_type IN VARCHAR2,
                     in_object_name IN VARCHAR2) RETURN CLOB IS
      l_result CLOB;
      l_args varchar2_nt;
   BEGIN
      l_args := NEW varchar2_nt(in_object_type, in_object_name);
      l_result := ftldb_api.process_to_clob(in_templ_name => $$PLSQL_UNIT || '%generate_ftl',
                                            in_templ_args => l_args);
      RETURN l_result;
   END generate;
END ftldb_hello_world;
/

The template is stored within lines 4 to 11. It’s easy to see that the target code is PL/SQL, but the template itself contains various parts which do not comply with the PL/SQL language. The $IF on line 3 ensures that the template is compiled only when the condition is met. Never, in this case. You may be surprised, but yes, this trick really works.

However, if I check this code with PL/SQL Cop for SQL Developer 1.0.12 I get the following result:

plsqlcop1.12

Bad. This version of PL/SQL Cop cannot parse this code successfully, since it expects valid PL/SQL code within the conditional compilation blocks. While it has some advantages to include conditional PL/SQL code in a code analysis, it is simply worthless if the code cannot be parsed at all.

Therefore I released new versions of all PL/SQL parser based products today, supporting non-PL/SQL code within conditional compilation blocks. And the result in PL/SQL Cop for SQL Developer 1.0.13 is:

plsqlcop1.13

Good. You see, this version parses such code without problems. There are still some limitations regarding the support of conditional compilation in DECLARE sections, but I’m glad that the parser is becoming more and more complete.

So it is time to update PL/SQL Analyzer, PL/SQL Cop and PL/SQL Cop for SQL Developer.

Thanks oddgen for driving this improvement.

How to Integrate Your PL/SQL Generators in SQL Developer


About three weeks ago Steven Feuerstein tweeted in his tip #501 a link to a generator for the WHEN clause in DML triggers on Oracle Live SQL. Back then I refactored the generator for oddgen – the Oracle community’s dictionary-driven code generator – and published the result on Oracle Live SQL as well. Some days ago Steven tweeted in tip #514 about generating a standardised table DDL and I thought for a short moment about refactoring this generator as well, but decided against it. There are a lot of generators around which write their result to DBMS_OUTPUT or into intermediate/helper tables and I believe that it would be more helpful to show how such generators could be integrated into oddgen for SQL Developer. If you are overwhelmed by the length of this blog post (as I was) then I suggest that you scroll down to the bottom and look at the 39 seconds of audio-less video to see a generator in action.

1. Install oddgen for SQL Developer

I assume that you are already using SQL Developer 4.x. If not then it is about time that you grab the latest version from here and install it. It’s important to note that oddgen requires version 4 of SQL Developer and won’t run on older versions 3.x, 2.x and 1.x.

SQL Developer comes with a lot of “internal” extensions, but third party extensions need to be installed explicitly. To install oddgen for SQL Developer I recommend to follow the steps in installation via update center on oddgen.org. If this is not feasible because your company’s network restricts the internet access then download the latest version and install it from file.

To enable the oddgen window, select “Generators” from the “View” menu as shown in the following picture:

menu_view_generators

You are ready for the next steps, when the Generators window appears in the lower left corner within SQL Developer.

oddgen_generators_window

2. Install the Original Generator

If you are going to integrate your existing generator into SQL Developer this step sounds irrelevant. However, I find it useful to install and try generators in a fresh environment to ensure I have not missed some dependencies. I’ve got Steven Feuerstein’s permission to use his generator for this post. It’s a standalone PL/SQL procedure without dependencies. We install the generator “as is” in our database. I will use a schema named oddgen, but you may use another user/schema of course. See the create user DDL on Github if you are interested to know how I’ve set up the oddgen user.

-- 1:1 from https://livesql.oracle.com/apex/livesql/file/content_DBEO1MIGH5ZQOILUVV85I1UQC.html
CREATE OR REPLACE PROCEDURE gen_table_ddl ( 
   entity_in     IN VARCHAR2, 
   entities_in   IN VARCHAR2 DEFAULT NULL, 
   add_fky_in    IN BOOLEAN DEFAULT TRUE, 
   prefix_in     IN VARCHAR2 DEFAULT NULL, 
   in_apex_in    IN BOOLEAN DEFAULT FALSE) 
IS 
   c_table_name    CONSTANT VARCHAR2 (100) 
      := prefix_in || NVL (entities_in, entity_in || 's') ; 
 
   c_pkycol_name   CONSTANT VARCHAR2 (100) := entity_in || '_ID'; 
 
   c_user_code     CONSTANT VARCHAR2 (100) 
      := CASE 
            WHEN in_apex_in THEN 'NVL (v (''APP_USER''), USER)' 
            ELSE 'USER' 
         END ; 
 
   PROCEDURE pl (str_in                   IN VARCHAR2, 
                 indent_in                IN INTEGER DEFAULT 3, 
                 num_newlines_before_in   IN INTEGER DEFAULT 0) 
   IS 
   BEGIN 
      FOR indx IN 1 .. num_newlines_before_in 
      LOOP 
         DBMS_OUTPUT.put_line (''); 
      END LOOP; 
 
      DBMS_OUTPUT.put_line (LPAD (' ', indent_in) || str_in); 
   END; 
BEGIN 
   pl ('CREATE TABLE ' || c_table_name || '(', 0); 
   pl (c_pkycol_name || ' INTEGER NOT NULL,'); 
   pl ('created_by VARCHAR2 (132 BYTE) NOT NULL,'); 
   pl ('changed_by VARCHAR2 (132 BYTE) NOT NULL,'); 
   pl ('created_on DATE NOT NULL,'); 
   pl ('changed_on DATE NOT NULL'); 
   pl (');'); 
 
   pl ('CREATE SEQUENCE ' || c_table_name || '_SEQ;', 0, 1); 
   pl ( 
         'CREATE UNIQUE INDEX ' 
      || c_table_name 
      || ' ON ' 
      || c_table_name 
      || '(' 
      || c_pkycol_name 
      || ');', 
      0, 
      1); 
   pl ( 
         'CREATE OR REPLACE TRIGGER ' 
      || c_table_name 
      || '_bir  
      BEFORE INSERT ON ' 
      || c_table_name, 
      0, 
      1); 
   pl ('FOR EACH ROW DECLARE', 3); 
   pl ('BEGIN', 3); 
   pl ('IF :new.' || c_pkycol_name || ' IS NULL', 6); 
   pl ( 
         'THEN :new.' 
      || c_pkycol_name 
      || ' := ' 
      || c_table_name 
      || '_seq.NEXTVAL; END IF;', 
      6); 
 
   pl (':new.created_on := SYSDATE;', 6); 
   pl (':new.created_by := ' || c_user_code || ';', 6); 
   pl (':new.changed_on := SYSDATE;', 6); 
   pl (':new.changed_by := ' || c_user_code || ';', 6); 
   pl ('END ' || c_table_name || '_bir;', 3); 
 
   pl ('CREATE OR REPLACE TRIGGER ' || c_table_name || '_bur', 0, 1); 
   pl ('BEFORE UPDATE ON ' || c_table_name || ' FOR EACH ROW', 3); 
   pl ('DECLARE', 3); 
   pl ('BEGIN', 3); 
   pl (':new.changed_on := SYSDATE;', 6); 
   pl (':new.changed_by := ' || c_user_code || ';', 6); 
   pl ('END ' || c_table_name || '_bur;', 3); 
 
   pl ('ALTER TABLE ' || c_table_name || ' ADD  
      (CONSTRAINT ' || c_table_name, 
       0, 
       1); 
   pl ( 
         'PRIMARY KEY (' 
      || c_pkycol_name 
      || ')  
       USING INDEX ' 
      || c_table_name 
      || ' ENABLE VALIDATE);', 
      3); 
 
   IF add_fky_in 
   THEN 
      pl ( 
            'ALTER TABLE ' 
         || c_table_name 
         || ' ADD (CONSTRAINT fk_' 
         || c_table_name, 
         0, 
         1); 
      pl ('FOREIGN KEY (REPLACE_id)  
     REFERENCES qdb_REPLACE (REPLACE_id)', 3); 
      pl ('ON DELETE CASCADE ENABLE VALIDATE);', 3); 
   END IF; 
END;
/

3. Understand the Input and Output of the Original Generator

Before we start writing a wrapper for the original generator we need to understand its API. The purpose of the generator is described on Oracle Live SQL by Steven Feuerstein as follows:

I follow a few standards for table definitions, including: table name is plural; four standard audit columns (created by/when, updated by/when) with associated triggers; primary key name is [entity]_id, and more. This procedure (refactored from PL/SQL Challenge, the quiz website plsqlchallenge.oracle.com) gives me a consisting start point, from which I then add entity-specific columns, additional foreign keys, etc. Hopefully you will find it useful, too!

The procedure gen_table_ddl expects the following 5 input parameters (see highlighted lines 3 to 7 above):

Parameter Name  Datatype  Optional?  Default           Comments
--------------  --------  ---------  ----------------  --------------------------------------------------------------
entity_in       varchar2  No                           used to name the primary key column
entities_in     varchar2  Yes        entity_in || 's'  used to name table, sequence, index, triggers and constraints
add_fky_in      boolean   Yes        true              true: generates a template for a foreign key constraint
                                                       false: does not generate a foreign key constraint template
prefix_in       varchar2  Yes        null              prefix for all object names (named by entities_in)
in_apex_in      boolean   Yes        false             true: uses the APEX built-in variable APP_USER to populate
                                                       created_by and changed_by; uses the pseudo column USER only
                                                       if APP_USER is empty
                                                       false: always uses the pseudo column USER to populate
                                                       created_by and changed_by

By now we should have a decent understanding of the procedure input. But how is the output generated? It’s a procedure after all and there are no output parameters defined. Line 30 reveals the output mechanism. Every line is produced by the nested procedure pl which writes the result to the server output using the DBMS_OUTPUT package.
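
To turn this server output into a CLOB that an oddgen wrapper can return, the output has to be captured, for example via DBMS_OUTPUT.GET_LINES. The following is a minimal sketch, not the actual wrapper implementation; the entity name is made up:

DECLARE
   l_lines  sys.dbms_output.chararr;
   l_count  INTEGER := 32767; -- upper bound of lines to fetch
   l_result CLOB;
BEGIN
   sys.dbms_output.enable(NULL);                 -- unlimited buffer
   gen_table_ddl(entity_in => 'employee');       -- the generator writes via DBMS_OUTPUT
   sys.dbms_output.get_lines(l_lines, l_count);  -- fetch the buffered lines
   FOR i IN 1 .. l_count
   LOOP
      l_result := l_result || l_lines(i) || chr(10);
   END LOOP;
   -- l_result now contains the generated DDL as a CLOB
END;
/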

4. Understand the Basics of the oddgen PL/SQL Interface

When selecting a database connection, the oddgen extension searches the data dictionary for PL/SQL packages implementing the oddgen PL/SQL interface. Basically it looks for package functions with the following signature:

FUNCTION generate(in_object_type IN VARCHAR2,
                  in_object_name IN VARCHAR2,
                  in_params      IN t_param) RETURN CLOB;

The interface is designed for generators based on existing database object types such as tables. Therefore it expects the object_type and the object_name as parameter one and two. For our generator the third parameter is the most interesting one. It allows us to pass additional parameters to the generator. The data type t_param is an associative array and based on the following definition:

SUBTYPE string_type IS VARCHAR2(1000 CHAR);
SUBTYPE param_type IS VARCHAR2(60 CHAR);
TYPE t_param IS TABLE OF string_type INDEX BY param_type;

Through in_params we may pass an unlimited number of key-value pairs to an oddgen generator.

But the oddgen interface is also responsible for defining the representation in the GUI. Let’s look at an example of another generator named “Dropall”:

oddgen_dropall_generator

The “Dropall” node is selected and in the status bar its description is displayed. Under this node you find the object types “Indexes” and “Tables” but also an artificial object type named “All”. Under the object type nodes you find the list of all associated object names. This structure supports the following features:

  1. Generate code through a simple double-click on an object name node
  2. Select multiple object name nodes of the same object type to generate code via context-menu
  3. Show a dialog via context menu for selected object name nodes to change generator parameters

When a generator is called, the selected object name and its associated object type are passed to the generator. Always, without exception. However, for artificial object types and object names it might be okay to ignore these parameters in the generator implementation.

See the oddgen PL/SQL interface documentation on oddgen.org if you are interested in the details.

For the next steps it’s just important to know that we have to define the default behaviour of a generator and that a generator provides some information for the GUI only.

5. Write the Wrapper

The following screenshot shows our generator in SQL Developer after selecting “Generate…” from the context menu on the node “Snippet”:

oddgen_table_ddl_generator

The package specification for this generator looks as follows:

CREATE OR REPLACE PACKAGE gen_table_ddl_oddgen_wrapper IS
   SUBTYPE string_type IS VARCHAR2(1000 CHAR);
   SUBTYPE param_type IS VARCHAR2(60 CHAR);
   TYPE t_string IS TABLE OF string_type;
   TYPE t_param IS TABLE OF string_type INDEX BY param_type;
   TYPE t_lov IS TABLE OF t_string INDEX BY param_type;

   FUNCTION get_name RETURN VARCHAR2;

   FUNCTION get_description RETURN VARCHAR2;

   FUNCTION get_object_types RETURN t_string;

   FUNCTION get_object_names(in_object_type IN VARCHAR2) RETURN t_string;

   FUNCTION get_params RETURN t_param;

   FUNCTION get_ordered_params RETURN t_string;

   FUNCTION get_lov RETURN t_lov;

   FUNCTION generate(in_object_type IN VARCHAR2,
                     in_object_name IN VARCHAR2,
                     in_params      IN t_param) RETURN CLOB;
END gen_table_ddl_oddgen_wrapper;
/

I’m going to explain some parts of the wrapper implementation based on the package body for this oddgen wrapper:

CREATE OR REPLACE PACKAGE BODY gen_table_ddl_oddgen_wrapper IS
   co_entity   CONSTANT param_type := 'Entity name (singular, for PK column)';
   co_entities CONSTANT param_type := 'Entity name (plural, for object names)';
   co_add_fky  CONSTANT param_type := 'Add foreign key?';
   co_prefix   CONSTANT param_type := 'Object prefix';
   co_in_apex  CONSTANT param_type := 'Data populated through APEX?';

   FUNCTION get_name RETURN VARCHAR2 IS
   BEGIN
      RETURN 'Table DDL snippet';
   END get_name;

   FUNCTION get_description RETURN VARCHAR2 IS
   BEGIN
      RETURN 'Steven Feuerstein''s starting point, from which he adds entity-specific columns, additional foreign keys, etc.';
   END get_description;

   FUNCTION get_object_types RETURN t_string IS
   BEGIN
      RETURN NEW t_string('TABLE');
   END get_object_types;

   FUNCTION get_object_names(in_object_type IN VARCHAR2) RETURN t_string IS
   BEGIN
      RETURN NEW t_string('Snippet');
   END get_object_names;

   FUNCTION get_params RETURN t_param IS
      l_params t_param;
   BEGIN
      l_params(co_entity) := 'employee';
      l_params(co_entities) := NULL;
      l_params(co_add_fky) := 'Yes';
      l_params(co_prefix) := NULL;
      l_params(co_in_apex) := 'No';
      RETURN l_params;
   END get_params;

   FUNCTION get_ordered_params RETURN t_string IS
   BEGIN
      RETURN NEW t_string(co_entity, co_entities, co_add_fky, co_prefix);
   END get_ordered_params;

   FUNCTION get_lov RETURN t_lov IS
      l_lov t_lov;
   BEGIN
      l_lov(co_add_fky) := NEW t_string('Yes', 'No');
      l_lov(co_in_apex) := NEW t_string('Yes', 'No');
      RETURN l_lov;
   END get_lov;

   FUNCTION generate(in_object_type IN VARCHAR2,
                     in_object_name IN VARCHAR2,
                     in_params      IN t_param) RETURN CLOB IS
      l_lines    sys.dbms_output.chararr;
      l_numlines INTEGER := 10; -- buffer size
      l_result   CLOB;
   
      PROCEDURE enable_output IS
      BEGIN
         sys.dbms_output.enable(buffer_size => NULL); -- unlimited size
      END enable_output;
   
      PROCEDURE disable_output IS
      BEGIN
         sys.dbms_output.disable;
      END disable_output;
   
      PROCEDURE call_generator IS
      BEGIN
         gen_table_ddl(entity_in   => in_params(co_entity),
                       entities_in => in_params(co_entities),
                       add_fky_in  => CASE
                                         WHEN in_params(co_add_fky) = 'Yes' THEN
                                          TRUE
                                         ELSE
                                          FALSE
                                      END,
                       prefix_in   => in_params(co_prefix),
                       in_apex_in  => CASE
                                         WHEN in_params(co_in_apex) = 'Yes' THEN
                                          TRUE
                                         ELSE
                                          FALSE
                                      END);
      END call_generator;
   
      PROCEDURE copy_dbms_output_to_result IS
      BEGIN
         sys.dbms_lob.createtemporary(l_result, TRUE);
         <<read_dbms_output_into_buffer>>
         WHILE l_numlines > 0
         LOOP
            sys.dbms_output.get_lines(l_lines, l_numlines);
            <<copy_buffer_to_clob>>
            FOR i IN 1 .. l_numlines
            LOOP
               sys.dbms_lob.append(l_result, l_lines(i) || chr(10));
            END LOOP copy_buffer_to_clob;
         END LOOP read_dbms_output_into_buffer;
      END copy_dbms_output_to_result;
   BEGIN
      enable_output;
      call_generator;
      copy_dbms_output_to_result;
      disable_output;
      RETURN l_result;
   END generate;
END gen_table_ddl_oddgen_wrapper;
/

On line 1 to 6 constants for every parameter are defined. The values are used as labels in the GUI.

The function get_name (line 10) defines the name used in the GUI for this generator.

The function get_description (line 15) returns a description of the generator. The description is shown in the status bar, as tool-tip and in the generator dialog.

The function get_object_types (line 20) defines the valid object types. We’ve chosen the object type “TABLE” because it represents the target code quite well. Using a known object type also leads to a nice icon representation.

The function get_object_names (line 25) defines the valid object names for an object type.  The parameter in_object_type is not used since our list is static and contains just one value “Snippet”.

The function get_params (line 31-35) defines the list of input parameters for the generator (beside object type and object name). Here we define our five parameters with their default values. The default values are important to generate meaningful code when double clicking on the object name node “Snippet”. So, by default a table named “employees” with a foreign key template and triggers for non-APEX usage is generated.

The function get_ordered_params (line 41) defines the order of the parameters in the generator dialog. Such a definition is necessary since the original order is lost. That’s expected behaviour for an associative array indexed by string. The default order by name is not very intuitive in this case.

The function get_lov (lines 47-48) defines the list-of-values per input parameter. We use “Yes” and “No” for the boolean parameters co_add_fky and co_in_apex since oddgen supports string parameters only. However, in the GUI typical boolean value pairs such as “Yes-No”, “1-0”, “true-false” are recognised based on the list-of-values definition and are represented as checkboxes. Hence, for the user it does not matter that technically no boolean parameters are used.

The function generate (line 103-107) defines the steps to produce the generated code. Each step is represented by a nested procedure call:

  • enable_output – enables dbms_output with unlimited buffer size
  • call_generator – calls the original generator code using the parameters passed by oddgen
  • copy_dbms_output_to_result – copies the output of the original generator into the result CLOB
  • disable_output – disables dbms_output

Finally the generated code is returned as CLOB.
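If you want to verify the wrapper outside of SQL Developer, you may call the generate function directly. Here’s a minimal, hypothetical test script; the keys of the associative array must match the GUI labels defined as constants in the package body:

SET SERVEROUTPUT ON SIZE UNLIMITED
DECLARE
   l_params gen_table_ddl_oddgen_wrapper.t_param;
   l_result CLOB;
BEGIN
   l_params('Entity name (singular, for PK column)') := 'department';
   l_params('Entity name (plural, for object names)') := 'departments';
   l_params('Add foreign key?') := 'No';
   l_params('Object prefix') := NULL;
   l_params('Data populated through APEX?') := 'No';
   l_result := gen_table_ddl_oddgen_wrapper.generate(
                  in_object_type => 'TABLE',
                  in_object_name => 'Snippet',
                  in_params      => l_params
               );
   -- print the first 4000 characters of the generated DDL
   sys.dbms_output.put_line(sys.dbms_lob.substr(l_result, 4000, 1));
END;
/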

6. Grant Access

To ensure that the generator is available for every user connecting to the instance, we have to grant access rights on the PL/SQL wrapper package. Granting execute on the package to PUBLIC is probably the easiest way.

GRANT EXECUTE ON gen_table_ddl_oddgen_wrapper TO PUBLIC;

 

7. Run in SQL Developer

Now we may run the generator in SQL Developer. The following video is 39 seconds long, contains no audio signal and shows how to generate the DDLs  with default parameters and how to run the generator with amended parameters to generate the DDLs for a table to be used in an APEX application.

 

8. Conclusion

Every PL/SQL based code generator producing a document (CLOB, XMLTYPE, JSON), messages via DBMS_OUTPUT or records in tables can be integrated into SQL Developer using the oddgen extension. The effort depends on the number of parameters and their valid values. For simple generators this won’t take more than a few minutes, especially if you are an experienced oddgen user.

I hope you found this post useful. Your comments and feedback are very much appreciated.

PL/SQL Bulk Unwrap


406 days ago I released PL/SQL Unwrapper for SQL Developer version 0.1.1 and blogged about it. With this extension you can unwrap the content of a SQL Developer window. Time for an update. With the new version 1.0 you can unwrap multiple selected objects with a few mouse clicks. In this blog post I show how.

1. Install Extensions

I assume that you are already using SQL Developer 4.0.2 or higher. If not then it is about time that you grab the latest version from here and install it. It’s important to note that the extensions won’t run in older versions of SQL Developer.

Configure the update centers http://update.salvis.com/ and http://update.oddgen.org/ to install the extensions for SQL Developer:

updates_oddgen_unwrapper

If you cannot use the update centers because your company’s network restricts internet access, then download the latest versions and install them from file.

Why download oddgen for SQL Developer? Because the bulk unwrap feature is implemented as an oddgen plugin. Unwrapping editor content works without oddgen, but for bulk unwrap you need oddgen.

2. Setup Test Environment

If you have a schema in your Oracle database with wrapped code you may skip this step and use this schema for bulk unwrap.

For the test environment I’ve used Morten Braten’s Alexandria PL/SQL Utility Library. Clone or download the library from GitHub. To install the library you need a dedicated user. Create such a user as SYS on your Oracle database instance as follows:

CREATE USER ax IDENTIFIED BY ax 
DEFAULT TABLESPACE users
TEMPORARY TABLESPACE temp;

ALTER USER ax QUOTA UNLIMITED ON users;

GRANT connect, resource TO ax;
GRANT execute ON dbms_crypto TO ax;

Then run the install.sql script in the setup directory of the Alexandria PL/SQL Utility Library as user AX.

@install.sql

Wrap the PL/SQL code except package and type specifications in schema AX by running the script wrap_schema.sql:
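-- assuming wrap_schema.sql is available in the current working directory
@wrap_schema.sql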

3. Bulk Unwrap

Start SQL Developer and open a connection as user AX on your database.

If the oddgen window is not visible then select “Generators” from the “View” menu as shown in the following picture:

menu_view_generators

Afterwards the Generators window appears in the lower left corner within SQL Developer.

generators

Select the open connection in the combo box of the Generators window. Open the “PL/SQL Unwrapper” node and the “Package Bodies” node to show all wrapped package body names.

generators2

Select some or all package body nodes and press Return to generate the unwrapped code in a new worksheet. Afterwards you may just execute the generated code. Add “SET DEFINE OFF” at the start of the script to ensure unwrapped code containing ampersand (&) characters is processed correctly. Another option is to configure a connection startup script (login.sql) to change the default behaviour.

The following audioless video shows in just 56 seconds the whole bulk unwrapping process in detail.

 

I hope you find this new feature useful.

Trivadis PL/SQL & SQL Coding Guidelines Version 3.1


The latest version 3.1 of the Trivadis PL/SQL & SQL Coding Guidelines has 150 pages. More than 90 additional pages compared to version 2.0. Roger Troller did a tremendous job in updating and extending an already comprehensive document while making it simpler to read and easier to understand. In this post I will emphasise some changes I consider relevant.

New Guideline Categorisation Scheme

In version 2.0 coding guidelines are categorised by icons for information, caution, performance relevance, maintainability and readability. A guideline is associated exactly with one icon. Here’s an example:

guideline_12

In version 3.1 the characteristics for changeability, efficiency, maintainability, portability, reliability, reusability, security and testability as defined by the Software Quality Assessment based on Lifecycle Expectations (SQALE) methodology are used to categorise guidelines. A guideline is associated with one or more SQALE characteristics. Additionally a guideline is assigned to a severity (blocker, critical, major, minor, info). So guidelines are categorised in two dimensions: SQALE characteristics and severity. These categorisations are used to filter guidelines in SonarQube or PL/SQL Cop to be enabled or disabled. It’s not by chance that SonarQube is using exactly these categorisations.

Here’s the same example as above using this new guideline categorisation scheme:

guideline_2150

In this excerpt you see other changes as well. The reference to the CodeXpert rule is gone, the guideline 12 got a new identifier 2150 and there is a good and bad example.

Good and Bad Examples for Every Guideline

In version 2.0 some guidelines had no examples, some just an excerpt of an example, some just a good and some just a bad example. Now in version 3.1 almost every guideline has a complete bad and a complete good example. By complete I mean that they are executable in SQL*Plus, SQLcl or within your IDE of choice. Why “almost”? For example, there is guideline 65/7210 which says “Try to keep your packages small. Include only few procedures and functions that are used in the same context”. So, in some cases it is just not feasible/helpful to include a complete example.

For me as the guy who is responsible for writing the rules to check compliance with the guidelines, good and bad examples are essential for unit testing. Such examples also help the developer to understand guidelines. That’s why we include these examples in PL/SQL Cop.

New Guidelines

Beside some changes in categorisation and presentation of the guidelines, there are some new guidelines which I’d like to mention here:

ID   | Guideline                                                                                                                | Severity | SQALE Characteristics
2230 | Try to use SIMPLE_INTEGER datatype when appropriate.                                                                    | Minor    | Efficiency
3150 | Try to use identity columns for surrogate keys.                                                                         | Minor    | Maintainability, Reliability
3160 | Avoid virtual columns to be visible.                                                                                    | Major    | Maintainability, Reliability
3170 | Always use DEFAULT ON NULL declarations to assign default values to table columns if you refuse to store NULL values.   | Major    | Reliability
3180 | Always specify column names instead of positional references in ORDER BY clauses.                                       | Major    | Changeability, Reliability
3190 | Avoid using NATURAL JOIN.                                                                                                | Major    | Changeability, Reliability
5010 | Try to use an error/logging framework for your application.                                                             | Critical | Reliability, Reusability, Testability
7460 | Try to define your packaged/standalone function to be deterministic if appropriate.                                     | Major    | Efficiency
7810 | Do not use SQL inside PL/SQL to read sequence numbers (or SYSDATE).                                                     | Major    | Efficiency, Maintainability
8120 | Never check existence of a row to decide whether to create it or not.                                                   | Major    | Efficiency, Reliability
8310 | Always validate input parameter size by assigning the parameter to a size limited variable in the declaration section of program unit. | Minor | Maintainability, Reliability, Reusability, Testability
8410 | Always use application locks to ensure a program unit only running once at a given time.                                | Minor    | Efficiency, Reliability
8510 | Always use dbms_application_info to track program process transiently.                                                  | Minor    | Efficiency, Reliability
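
To give an impression of the bad/good example pairs mentioned above, here is my own sketch in the spirit of the new guideline 3190 (Avoid using NATURAL JOIN); it is an illustration and not copied from the guidelines document:

-- bad: NATURAL JOIN joins implicitly on all columns with matching names
SELECT ename, dname
  FROM emp
NATURAL JOIN dept;

-- good: the join columns are stated explicitly
SELECT e.ename, d.dname
  FROM emp e
  JOIN dept d ON d.deptno = e.deptno;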

Deprecated Guidelines

The guideline 54 “Avoid use of EXCEPTION_INIT pragma for a -20,NNN error” is not part of the document anymore.

New Guideline Identifiers

All guidelines got a new identifier. The first digit identifies the chapter of the document, e.g. “1” for “4.1 General”, “2” for “4.2 Variables & Types”, etc. The second digit is reserved for the sub-chapters and the remaining digits are just for ordering purposes. The gaps in the numbering scheme should allow adding future guidelines at the right place without renumbering everything (again).

There is an appendix to map old guideline identifiers to new ones. This should simplify the change to version 3.1. Here’s an excerpt:

appendix_a

Tool Support

PL/SQL Cop is mentioned in the guidelines. However, currently only the Trivadis PL/SQL & SQL Guidelines Version 2.0 are supported. But sometime in Q4 of 2016 an update supporting version 3.1 should be available.

Download

Get your copy of the Trivadis PL/SQL & SQL Guidelines Version 3.1 from here.

Bitemp Remodeler v0.1.0 Released


I’ve been working on a flexible table API generator for Oracle Databases for several months. A TAPI generator doesn’t sound like a real innovation. But this one contains some features you probably have not seen before in a TAPI generator, and hopefully you will like them as much as I do.

In this post I will not explain the feature set thoroughly. Instead I will more or less focus on one of my favourite features.

Four models

The generator knows the following four data models.

four_models

If your table is based on one of these four models you may

  1. simply generate a table API for it or
  2. switch to another model and optionally generate a table API as well.

Option 2) is extraordinary, since it will preserve the existing data. E.g. it will preserve the content of the flashback data archive when you switch your model from uni-temporal transaction-time to a bi-temporal model, even if the flashback archive tables need to be moved to another table. Furthermore it will keep the interface for the latest table the same. No application change required. Everything with just a few mouse clicks. If this sounds interesting to you, then have a look at https://github.com/oddgen/bitemp/blob/master/README.md where the concept is briefly explained or join my session “oddgen – Bi-temporal Table API in Action” at the More than just – Performance Days 2016. Remote participation is still possible.

Option 1) is what we have had for years. It was part of Oracle Designer, it’s part of SQL Developer in a simplified way and there are some more or less simple table API generators around. So no big deal. However, when you choose option 1), there is one part which is really cool: the hook API package concept.

The Hook API

The problem with a lot of table API solutions is that there is typically no developer-friendly way to include the business logic. I’ve seen the following:

  • Manual changes of the generated code, which is for various reason not a good solution.
  • External hooks, e.g. in XML files, INI files, relational tables, etc., merged at generation time into the final code. Oracle Designer worked that way.
  • Code which is dynamically executed by the generator at runtime, e.g. code snippets stored in a pre-defined way in relational tables.

But what I’ve never seen, was business logic implemented in manually crafted PL/SQL packages, separated from the PL/SQL generated code. That’s strange, because this is a common practice in Java based projects.

In Java you typically define an interface for that and configure the right implementation at runtime. In PL/SQL we may do something similar: a PL/SQL package specification is an interface definition. The fact that just one implementation may exist per specification is not a limiting factor in this case.

Bitemp Remodeler generates the following hook API package specification for the famous EMP table in schema SCOTT:

CREATE OR REPLACE PACKAGE emp_hook AS
   /** 
   * Hooks called by non-temporal API for table emp_lt (see package body of emp_api)
   * generated by Bitemp Remodeler for SQL Developer.
   * The body of this package is not generated. It has to be crafted and maintained manually. 
   * Since the API for table emp_lt ignores errors caused by a missing hook package body, the implementation is optional.
   *
   * @headcom
   */

   /**
   * Hook called before insert into non-temporal table emp_lt.
   *
   * @param io_new_row new Row to be inserted
   */
   PROCEDURE pre_ins (
      io_new_row IN OUT emp_ot
   );

   /**
   * Hook called after insert into non-temporal table emp_lt.
   *
   * @param in_new_row new Row to be inserted
   */
   PROCEDURE post_ins (
      in_new_row IN emp_ot
   );

   /**
   * Hook called before update non-temporal table emp_lt.
   *
   * @param io_new_row Row with updated column values
   * @param in_old_row Row with original column values
   */
   PROCEDURE pre_upd (
      io_new_row IN OUT emp_ot,
      in_old_row IN emp_ot
   );

   /**
   * Hook called after update non-temporal table emp_lt.
   *
   * @param in_new_row Row with updated column values
   * @param in_old_row Row with original column values
   */
   PROCEDURE post_upd (
      in_new_row IN emp_ot,
      in_old_row IN emp_ot
   );

   /**
   * Hook called before delete from non-temporal table emp_lt.
   *
   * @param in_old_row Row with original column values
   */
   PROCEDURE pre_del (
      in_old_row IN emp_ot
   );

   /**
   * Hook called after delete from non-temporal table emp_lt.
   *
   * @param in_old_row Row with original column values
   */
   PROCEDURE post_del (
      in_old_row IN emp_ot
   );

END emp_hook;
/

The generated table API calls the pre_ins procedure before an INSERT and the post_ins procedure after the INSERT. For DELETE and UPDATE this works the same way. The highlighted lines 5 and 6 point out two interesting things: the body is not generated, and the body does not need to be implemented since the API ignores errors caused by a missing PL/SQL hook package body.

Technically this is solved as follows in the API package body:

CREATE OR REPLACE PACKAGE BODY emp_api AS
   --
   -- Note: SQL Developer 4.1.3 cannot produce a complete outline of this package body, because it cannot handle
   --       the complete flashback_query_clause. The following expression breaks SQL Developer:
   --
   --          VERSIONS PERIOD FOR vt$ BETWEEN MINVALUE AND MAXVALUE
   --
   --       It's expected that future versions will be able to handle the flashback_query_clause accordingly.
   --       See "Bug 24608738 - OUTLINE OF PL/SQL PACKAGE BODY BREAKS WHEN USING PERIOD FOR OF FLASHBACK_QUERY_"
   --       on MOS for details.
   --

   --
   -- Declarations to handle 'ORA-06508: PL/SQL: could not find program unit being called: "SCOTT.EMP_HOOK"'
   --
   e_hook_body_missing EXCEPTION;
   PRAGMA exception_init(e_hook_body_missing, -6508);

   --
   -- Debugging output level
   --
   g_debug_output_level dbms_output_level_type := co_off;

   --
   -- print_line
   --
   PROCEDURE print_line (
      in_proc  IN VARCHAR2,
      in_level IN dbms_output_level_type,
      in_line  IN VARCHAR2
   ) IS
   BEGIN
      IF in_level <= g_debug_output_level THEN
         sys.dbms_output.put(to_char(systimestamp, 'HH24:MI:SS.FF6'));
         CASE in_level
            WHEN co_info THEN
               sys.dbms_output.put(' INFO  ');
            WHEN co_debug THEN
               sys.dbms_output.put(' DEBUG ');
            ELSE
               sys.dbms_output.put(' TRACE ');
         END CASE;
         sys.dbms_output.put(substr(rpad(in_proc,27), 1, 27) || ' ');
         sys.dbms_output.put_line(substr(in_line, 1, 250));
      END IF;
   END print_line;

   --
   -- print_lines
   --
   PROCEDURE print_lines (
      in_proc  IN VARCHAR2,
      in_level IN dbms_output_level_type,
      in_lines IN CLOB
   ) IS
   BEGIN
      IF in_level <= g_debug_output_level THEN
         <<all_lines>>
         FOR r_line IN (
            SELECT regexp_substr(in_lines, '[^' || chr(10) || ']+', 1, level) AS line       
              FROM dual
           CONNECT BY instr(in_lines, chr(10), 1, level - 1) BETWEEN 1 AND length(in_lines) - 1
         ) LOOP
            print_line(in_proc => in_proc, in_level => in_level, in_line => r_line.line);
         END LOOP all_lines;
      END IF;
   END print_lines;


   --
   -- do_ins
   --
   PROCEDURE do_ins (
      io_row IN OUT emp_ot
   ) IS
   BEGIN
      INSERT INTO emp_lt (
                     empno,
                     ename,
                     job,
                     mgr,
                     hiredate,
                     sal,
                     comm,
                     deptno
                  )
           VALUES (
                     io_row.empno,
                     io_row.ename,
                     io_row.job,
                     io_row.mgr,
                     io_row.hiredate,
                     io_row.sal,
                     io_row.comm,
                     io_row.deptno
                  )
        RETURNING empno
             INTO io_row.empno;
      print_line(
         in_proc  => 'do_ins', 
         in_level => co_debug, 
         in_line  => SQL%ROWCOUNT || ' rows inserted.'
      );
   END do_ins;

   --
   -- do_upd
   --
   PROCEDURE do_upd (
      io_new_row IN OUT emp_ot,
      in_old_row IN emp_ot
   ) IS
   BEGIN
      UPDATE emp_lt
         SET empno = io_new_row.empno, 
             ename = io_new_row.ename, 
             job = io_new_row.job, 
             mgr = io_new_row.mgr, 
             hiredate = io_new_row.hiredate, 
             sal = io_new_row.sal, 
             comm = io_new_row.comm, 
             deptno = io_new_row.deptno
       WHERE empno = in_old_row.empno
         AND (
                 (ename != io_new_row.ename OR ename IS NULL AND io_new_row.ename IS NOT NULL OR ename IS NOT NULL AND io_new_row.ename IS NULL) OR
                 (job != io_new_row.job OR job IS NULL AND io_new_row.job IS NOT NULL OR job IS NOT NULL AND io_new_row.job IS NULL) OR
                 (mgr != io_new_row.mgr OR mgr IS NULL AND io_new_row.mgr IS NOT NULL OR mgr IS NOT NULL AND io_new_row.mgr IS NULL) OR
                 (hiredate != io_new_row.hiredate OR hiredate IS NULL AND io_new_row.hiredate IS NOT NULL OR hiredate IS NOT NULL AND io_new_row.hiredate IS NULL) OR
                 (sal != io_new_row.sal OR sal IS NULL AND io_new_row.sal IS NOT NULL OR sal IS NOT NULL AND io_new_row.sal IS NULL) OR
                 (comm != io_new_row.comm OR comm IS NULL AND io_new_row.comm IS NOT NULL OR comm IS NOT NULL AND io_new_row.comm IS NULL) OR
                 (deptno != io_new_row.deptno OR deptno IS NULL AND io_new_row.deptno IS NOT NULL OR deptno IS NOT NULL AND io_new_row.deptno IS NULL)
             );
      print_line(
         in_proc  => 'do_upd', 
         in_level => co_debug, 
         in_line  => SQL%ROWCOUNT || ' rows updated.'
      );
   END do_upd;

   --
   -- do_del
   --
   PROCEDURE do_del (
      in_row IN emp_ot
   ) IS
   BEGIN
      DELETE 
        FROM emp_lt
       WHERE empno = in_row.empno;
      print_line(
         in_proc  => 'do_del', 
         in_level => co_debug, 
         in_line  => SQL%ROWCOUNT || ' rows deleted.'
      );
   END do_del;

   --
   -- ins
   --
   PROCEDURE ins (
      in_new_row IN emp_ot
   ) IS
      l_new_row emp_ot;
   BEGIN
      print_line(in_proc => 'ins', in_level => co_info, in_line => 'started.');
      l_new_row := in_new_row;
      <<pre_ins>>
      BEGIN
         emp_hook.pre_ins(io_new_row => l_new_row);
      EXCEPTION
         WHEN e_hook_body_missing THEN
            NULL;
      END pre_ins;
      do_ins(io_row => l_new_row);
      <<post_ins>>
      BEGIN
         emp_hook.post_ins(in_new_row => l_new_row);
      EXCEPTION
         WHEN e_hook_body_missing THEN
            NULL;
      END post_ins;
      print_line(in_proc => 'ins', in_level => co_info, in_line => 'completed.');
   END ins;

   --
   -- upd
   --
   PROCEDURE upd (
      in_new_row IN emp_ot,
      in_old_row IN emp_ot
   ) IS
      l_new_row emp_ot;
   BEGIN
      print_line(in_proc => 'upd', in_level => co_info, in_line => 'started.');
      l_new_row := in_new_row;
      <<pre_upd>>
      BEGIN
         emp_hook.pre_upd(io_new_row => l_new_row, in_old_row => in_new_row);
      EXCEPTION
         WHEN e_hook_body_missing THEN
            NULL;
      END pre_upd;
      do_upd(io_new_row => l_new_row, in_old_row => in_old_row);
      <<post_upd>>
      BEGIN
         emp_hook.post_upd(in_new_row => l_new_row, in_old_row => in_old_row);
      EXCEPTION
         WHEN e_hook_body_missing THEN
            NULL;
      END post_upd;
      print_line(in_proc => 'upd', in_level => co_info, in_line => 'completed.');
   END upd;

   --
   -- del
   --
   PROCEDURE del (
      in_old_row IN emp_ot
   ) IS
   BEGIN
      print_line(in_proc => 'del', in_level => co_info, in_line => 'started.');
      <<pre_del>>
      BEGIN
         emp_hook.pre_del(in_old_row => in_old_row);
      EXCEPTION
         WHEN e_hook_body_missing THEN
            NULL;
      END pre_del;
      do_del(in_row => in_old_row);
      <<post_del>>
      BEGIN
         emp_hook.post_del(in_old_row => in_old_row);
      EXCEPTION
         WHEN e_hook_body_missing THEN
            NULL;
      END post_del;
      print_line(in_proc => 'del', in_level => co_info, in_line => 'completed.');
   END del;

   --
   -- set_debug_output
   --
   PROCEDURE set_debug_output (
      in_level IN dbms_output_level_type DEFAULT co_off
   ) IS
   BEGIN
      g_debug_output_level := in_level;
   END set_debug_output;

END emp_api;
/


Now you may ask what the performance impact of these e_hook_body_missing exceptions is. I’ve done a small test and called a procedure without and with an implemented body 1 million times. The overhead of the missing body exception is about 7 microseconds per call. Here’s the test output from SQL Developer, the relevant lines 51 and 89 are highlighted.

SQL> SET FEEDBACK ON
SQL> SET ECHO ON
SQL> SET TIMING ON
SQL> DROP PACKAGE dummy_api;

Package DUMMY_API dropped.

Elapsed: 00:00:00.027
SQL> DROP PACKAGE dummy_hook;

Package DUMMY_HOOK dropped.

Elapsed: 00:00:00.030
SQL> CREATE OR REPLACE PACKAGE dummy_hook AS
   PROCEDURE pre_ins;
END dummy_hook;
/

Package DUMMY_HOOK compiled

Elapsed: 00:00:00.023
SQL> CREATE OR REPLACE PACKAGE dummy_api AS
   PROCEDURE ins;
END dummy_api;
/

Package DUMMY_API compiled

Elapsed: 00:00:00.034
SQL> CREATE OR REPLACE PACKAGE BODY dummy_api AS
   e_hook_body_missing EXCEPTION;
   PRAGMA exception_init(e_hook_body_missing, -6508);  
   PROCEDURE ins IS
   BEGIN
      BEGIN
         dummy_hook.pre_ins;
      EXCEPTION
         WHEN e_hook_body_missing THEN
            NULL;
      END pre_ins;
      dbms_output.put('.');
   END ins;
END dummy_api;
/

Package body DUMMY_API compiled

Elapsed: 00:00:00.040
SQL> -- without hook body
SQL> BEGIN
   FOR i IN 1..1E6 LOOP
      dummy_api.ins;
   END LOOP;
END;
/

PL/SQL procedure successfully completed.

Elapsed: 00:00:07.878
SQL> CREATE OR REPLACE PACKAGE BODY dummy_hook AS
   PROCEDURE pre_ins IS
   BEGIN
      dbms_output.put('-');
   END pre_ins;
END dummy_hook;
/

Package body DUMMY_HOOK compiled

Elapsed: 00:00:00.029
SQL> -- with hook body
SQL> BEGIN
   FOR i IN 1..1E6 LOOP
      dummy_api.ins;
   END LOOP;
END;
/

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.632

It makes sense to provide a body with a NULL implementation to avoid the small overhead of handling the missing body exception.
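
Such a stub for the emp_hook package shown above could look as follows – a sketch with empty implementations into which the actual business logic can be added later:

CREATE OR REPLACE PACKAGE BODY emp_hook AS
   PROCEDURE pre_ins (
      io_new_row IN OUT emp_ot
   ) IS
   BEGIN
      NULL; -- place business logic to be executed before an INSERT here
   END pre_ins;

   PROCEDURE post_ins (
      in_new_row IN emp_ot
   ) IS
   BEGIN
      NULL; -- place business logic to be executed after an INSERT here
   END post_ins;

   PROCEDURE pre_upd (
      io_new_row IN OUT emp_ot,
      in_old_row IN emp_ot
   ) IS
   BEGIN
      NULL; -- place business logic to be executed before an UPDATE here
   END pre_upd;

   PROCEDURE post_upd (
      in_new_row IN emp_ot,
      in_old_row IN emp_ot
   ) IS
   BEGIN
      NULL; -- place business logic to be executed after an UPDATE here
   END post_upd;

   PROCEDURE pre_del (
      in_old_row IN emp_ot
   ) IS
   BEGIN
      NULL; -- place business logic to be executed before a DELETE here
   END pre_del;

   PROCEDURE post_del (
      in_old_row IN emp_ot
   ) IS
   BEGIN
      NULL; -- place business logic to be executed after a DELETE here
   END post_del;
END emp_hook;
/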

Nonetheless, the way the business logic is separated from the generated code is one of the many things I like about Bitemp Remodeler.

Download Bitemp Remodeler from the Download section on my blog or install it directly via the SQL Developer update site http://update.oddgen.org/

Why and How Using the accessible_by_clause


The accessible_by_clause was introduced in Oracle Database 12c Release 1 and extended in Release 2. If you do not know this feature, I suggest having a look at the documentation or reading Steven Feuerstein’s blog post.

In this blog post I talk about how to use this feature properly.

Consider you have a schema the_api and there’s a package math with the following signature:

CREATE OR REPLACE PACKAGE the_api.math AS
   /**
   * Calculates the sum of all integers found in a string.
   *
   * @param in_integers string containing integers to be summarized
   * @returns sum, NULL if no integers are found
   */
   FUNCTION get_sum(
      in_integers IN VARCHAR2
   ) RETURN INTEGER DETERMINISTIC;

   /**
   * Calculates the digit sum of an integer.
   *
   * @param in_integer input integer to calculate cross sum from 
   * @returns cross sum, NULL if input is NULL
   */
   FUNCTION get_cross_sum(
      in_integer IN INTEGER
   ) RETURN INTEGER DETERMINISTIC;
END math;
/

The next query uses the provided functions get_sum and get_cross_sum:

SELECT the_api.math.get_sum('What is the sum of 5, 7, 13 and 17?') AS the_sum,
       the_api.math.get_cross_sum(3456789) AS the_cross_sum
  FROM dual;

   THE_SUM THE_CROSS_SUM
---------- -------------
        42            42

No accessible_by_clause

In an Oracle Database 11g the package body might be implemented like this:

CREATE OR REPLACE PACKAGE BODY the_api.math AS
   FUNCTION get_sum(
      in_integers IN sys.ora_mining_number_nt
   ) RETURN INTEGER DETERMINISTIC IS
      l_result INTEGER;
   BEGIN
      SELECT sum(column_value)
        INTO l_result
        FROM table(in_integers);
      RETURN l_result;
   END get_sum;

   FUNCTION to_int_table(
      in_integers IN VARCHAR2,
      in_pattern  IN VARCHAR2 DEFAULT '[0-9]+'
   ) RETURN sys.ora_mining_number_nt DETERMINISTIC IS
      l_result sys.ora_mining_number_nt := sys.ora_mining_number_nt();
      l_pos    INTEGER := 1;
      l_int    INTEGER;
   BEGIN
      <<integer_tokens>>
      LOOP
         l_int := to_number(regexp_substr(in_integers, in_pattern, 1, l_pos));
         EXIT integer_tokens WHEN l_int IS NULL;
         l_result.EXTEND;
         l_result(l_pos) := l_int;
         l_pos := l_pos + 1;
      END LOOP integer_tokens;
      RETURN l_result;
   END to_int_table;

   FUNCTION get_sum(
      in_integers IN VARCHAR2
   ) RETURN INTEGER DETERMINISTIC IS
   BEGIN
      RETURN get_sum(to_int_table(in_integers));
   END get_sum;

   FUNCTION get_cross_sum(
      in_integer IN INTEGER
   ) RETURN INTEGER DETERMINISTIC IS
   BEGIN
      RETURN get_sum(to_int_table(to_char(in_integer), '[0-9]'));       
   END get_cross_sum;
END math;
/

The private functions get_sum and to_int_table are doing the real work. Here are some issues with this code:

1. Use of undocumented collection type sys.ora_mining_number_nt

The private functions are avoiding the use of an own type, something like CREATE TYPE t_integer_type IS TABLE OF INTEGER. This shortcut is hidden and not part of the API. It is not super elegant, but quite common and easy to fix, when Oracle decides to remove this collection type in a future release or to protect it by an accessible_by_clause. So I do not consider this a real problem and will not deal with it in this blog post.

2. Private function definitions must be ordered according to their usage

The private functions are listed at the top of the package body, hence no forward declarations are necessary. Forward declarations lead to some confusion, since IDEs do not distinguish between declarations and definitions in the outline window and you often end up selecting the wrong one. However, without forward declarations you have to order your private functions according to their usage, which might break your domain-specific ordering logic.

3. Private functions are not documented

I usually document the signature of a PL/SQL unit in the package specification only; that’s supported by PLDoc. Hence the private functions are treated like second-class citizens and left undocumented.

4. Private functions cannot be unit tested

I’m not really a testing advocate. But as a developer I’d like to know if my code works. I have to run it somehow. Usually more than once to get a working result. Hence I create scripts or unit tests. It is not possible to unit test the private functions directly. They have to be tested through the public functions get_sum and get_cross_sum. In this case I’d like to test the private function to_int_table directly.

The accessible_by_clause can address issues 2, 3 and 4 without implicitly extending the API.

Package-Level accessible_by_clause

In Oracle Database 12c Release 1 the accessible_by_clause was introduced on package level. This allows us to move the private functions from package math into a dedicated package math_internal with restricted access. Here’s the refactoring result:

CREATE OR REPLACE PACKAGE the_api.math_internal  
   ACCESSIBLE BY (PACKAGE the_api.math, PACKAGE the_api.test_math_internal) 
AS
   /**
   * Calculates the sum of all integers in a collection.
   *
   * @param in_integers collection of integers to be summarized
   * @returns sum, NULL if collection is empty
   */
   FUNCTION get_sum(
     in_integers IN sys.ora_mining_number_nt
   ) RETURN INTEGER DETERMINISTIC;

   /**
   * Finds integer tokens in string.
   *
   * @param in_integers string containing integers to be tokenized
   * @param in_pattern regular expression for integers
   * @returns table of integers
   */
    FUNCTION to_int_table(
      in_integers IN VARCHAR2,
      in_pattern  IN VARCHAR2 DEFAULT '[0-9]+'
   ) RETURN sys.ora_mining_number_nt DETERMINISTIC;
END math_internal;
/

CREATE OR REPLACE PACKAGE the_api.math AS
   /**
   * Calculates the sum of all integers found in a string.
   *
   * @param in_integers string containing integers to be summarized
   * @returns sum, NULL if no integers are found
   */
   FUNCTION get_sum(
      in_integers IN VARCHAR2
   ) RETURN INTEGER DETERMINISTIC;

   /**
   * Calculates the digit sum of an integer.
   *
   * @param in_integer input integer to calculate cross sum from 
   * @returns cross sum, NULL if input is NULL
   */
   FUNCTION get_cross_sum(
      in_integer IN INTEGER
   ) RETURN INTEGER DETERMINISTIC;
END math;
/

The accessible_by_clause defined on line 2 restricts the access to the package math and the package test_math_internal. It is important to note that the units referenced in the accessible_by_clause are not checked when compiling PL/SQL definitions, hence it is perfectly fine to list PL/SQL units in the accessible_by_clause which might not exist in a production environment, such as the utPLSQL unit test package test_math_internal.

With this change I address the previously mentioned issues 2, 3 and 4.

  • There are no private functions anymore, hence the order in the code is irrelevant
  • All functions are documented
  • All functions can be unit tested

Looks good, right? Yes and no. This change created some new issues.

5. Splitting code that belongs together

The original math package was reasonable small and contained the whole processing logic. Now the package is divided into two packages and the code is spread into 4 files in the VCS (2 package specification files and 2 package body files).  The accessiblity_clause is driving my PL/SQL code structure. This might be good in some cases, but in this case I do not like it.

6. Accessibility per package lead to more code splitting

Remember, I just wanted to unit test the function to_int_table. Now I can also unit test the function get_sum since it is defined in the same package. It would look incomplete if my unit tests did not cover get_sum, right? So, if I want to express that get_sum does not need an explicit unit test, I have to split the package math_internal further, for example into math_internal1 and math_internal2, where only the one containing the function to_int_table has an accessor for the test package. This clearly shows that the granularity of the package-level accessible_by_clause is too coarse-grained.

 We can address these issues with an accessible_by_clause on unit level.

Unit-Level accessible_by_clause

Since Oracle Database 12c Release 2 the accessible_by_clause can be defined per package subprogram. This allows us to keep all subprograms in one package while addressing all previously described issues. Here’s the refactoring result:

CREATE OR REPLACE PACKAGE the_api.math AS
   /**
   * Calculates the sum of all integers found in a string.
   *
   * @param in_integers string containing integers to be summarized
   * @returns sum, NULL if no integers are found
   */
   FUNCTION get_sum(
      in_integers IN VARCHAR2
   ) RETURN INTEGER DETERMINISTIC;

   /**
   * Calculates the digit sum of an integer.
   *
   * @param in_integer input integer to calculate cross sum from 
   * @returns cross sum, NULL if input is NULL
   */
   FUNCTION get_cross_sum(
      in_integer IN INTEGER
   ) RETURN INTEGER DETERMINISTIC;

   /**
   * Calculates the sum of all integers in a collection.
   *
   * @param in_integers collection of integers to be summarized
   * @returns sum, NULL if collection is empty
   */
   FUNCTION get_sum(
     in_integers IN sys.ora_mining_number_nt
   ) RETURN INTEGER DETERMINISTIC 
   ACCESSIBLE BY (PACKAGE the_api.math);

   /**
   * Finds integer tokens in string.
   *
   * @param in_integers string containing integers to be tokenized
   * @param in_pattern regular expression for integers
   * @returns table of integers
   */
    FUNCTION to_int_table(
      in_integers IN VARCHAR2,
      in_pattern  IN VARCHAR2 DEFAULT '[0-9]+'
   ) RETURN sys.ora_mining_number_nt DETERMINISTIC
   ACCESSIBLE BY (PACKAGE the_api.math, PACKAGE the_api.test_math);
END math;
/

On line 31 the access to the overloaded function get_sum is restricted to this package. It’s semantically clear that this function cannot be unit tested. On line 44 the access to the function to_int_table is restricted to this package and the package test_math. Hence it is possible to unit test this function in the package test_math. The package math is not split up and the access to the original private functions is properly protected.

The package body looks quite similar to the original one. I’ve just put the access-restricted units at the bottom, to match the order and the signatures in the specification.

CREATE OR REPLACE PACKAGE BODY the_api.math AS
   FUNCTION get_sum(in_integers IN VARCHAR2) RETURN INTEGER DETERMINISTIC IS
   BEGIN
      RETURN math.get_sum(math.to_int_table(in_integers));
   END get_sum;

   FUNCTION get_cross_sum(in_integer IN INTEGER) RETURN INTEGER DETERMINISTIC IS
   BEGIN
      RETURN math.get_sum(math.to_int_table(to_char(in_integer), '[0-9]'));       
   END get_cross_sum;

   FUNCTION get_sum(
      in_integers IN sys.ora_mining_number_nt
   ) RETURN INTEGER DETERMINISTIC
      ACCESSIBLE BY (PACKAGE the_api.math)
   IS
      l_result INTEGER;
   BEGIN
      SELECT sum(column_value)
        INTO l_result
        FROM table(in_integers);
      RETURN l_result;
   END get_sum;

   FUNCTION to_int_table(
      in_integers IN VARCHAR2,
      in_pattern  IN VARCHAR2 DEFAULT '[0-9]+'
   ) RETURN sys.ora_mining_number_nt DETERMINISTIC 
      ACCESSIBLE BY (PACKAGE the_api.math, PACKAGE the_api.test_math)
   IS
      l_result sys.ora_mining_number_nt := sys.ora_mining_number_nt();
      l_pos    INTEGER := 1;
      l_int    INTEGER;
   BEGIN
      <<integer_tokens>>
      LOOP
         l_int := to_number(regexp_substr(in_integers, in_pattern, 1, l_pos));
         EXIT integer_tokens WHEN l_int IS NULL;
         l_result.EXTEND;
         l_result(l_pos) := l_int;
         l_pos := l_pos + 1;
      END LOOP integer_tokens;
      RETURN l_result;
   END to_int_table;
END math;
/
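
To see the restriction in action: calling to_int_table from a unit that is not listed as an accessor – for example a plain anonymous block – should be rejected at compile time with PLS-00904 (insufficient privilege to access object). A hypothetical negative test:

DECLARE
   l_ints sys.ora_mining_number_nt;
BEGIN
   -- an anonymous block is not listed in the accessible_by_clause of to_int_table
   l_ints := the_api.math.to_int_table('1, 2, 3');
END;
/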

Talking about unit testing without showing a unit test is a bit inauthentic. So, here is the utPLSQL test package:

CREATE OR REPLACE PACKAGE the_api.test_math IS
   --%suite

   --%test
   PROCEDURE get_sum_1;

   --%test
   PROCEDURE get_sum_2;
 
   --%test
   PROCEDURE get_cross_sum_1;

   --%test
   PROCEDURE get_cross_sum_2;

   --%test
   PROCEDURE to_int_table_1;

   --%test
   PROCEDURE to_int_table_2;
END test_math;
/

CREATE OR REPLACE PACKAGE BODY the_api.test_math IS
   PROCEDURE get_sum_1 IS
   BEGIN
      ut.expect(42).to_equal(the_api.math.get_sum('What is the sum of 5, 7, 13 and 17?'));
   END get_sum_1;
   
   PROCEDURE get_sum_2 IS
   BEGIN
      ut.expect(CAST(NULL AS INTEGER)).to_equal(the_api.math.get_sum('What is the sum?'));
   END get_sum_2; 

   PROCEDURE get_cross_sum_1 IS
   BEGIN
      ut.expect(42).to_equal(the_api.math.get_cross_sum(3456789));
   END get_cross_sum_1;
   
   PROCEDURE get_cross_sum_2 IS
   BEGIN
      ut.expect(CAST(NULL AS INTEGER)).to_equal(the_api.math.get_cross_sum(NULL));
   END get_cross_sum_2;

   PROCEDURE to_int_table_1 IS
      l_expected sys.ora_mining_number_nt;
      l_actual   sys.ora_mining_number_nt;
   BEGIN
      l_expected := sys.ora_mining_number_nt(5, 7, 13, 17);
      l_actual := math.to_int_table('What is the sum of 5, 7, 13 and 17?');
      ut.expect(anydata.convertCollection(l_expected)).to_equal(anydata.convertCollection(l_actual));
   END to_int_table_1;

   PROCEDURE to_int_table_2 IS
      l_expected sys.ora_mining_number_nt;
      l_actual   sys.ora_mining_number_nt;
   BEGIN
      l_expected := sys.ora_mining_number_nt();
      l_actual := math.to_int_table(NULL);
      ut.expect(anydata.convertCollection(l_expected)).to_equal(anydata.convertCollection(l_actual));
   END to_int_table_2;
END test_math;
/

Running utPLSQL tests is easy, see:

SET SERVEROUTPUT ON SIZE UNLIMITED
EXECUTE ut.run('THE_API.TEST_MATH');

test_math
  get_sum_1 [.004 sec]
  get_sum_2 [.004 sec]
  get_cross_sum_1 [.004 sec]
  get_cross_sum_2 [.004 sec]
  to_int_table_1 [.009 sec]
  to_int_table_2 [.007 sec]
 
Finished in .034975 seconds
6 tests, 0 failed, 0 errored, 0 disabled, 0 warning(s)
 


PL/SQL procedure successfully completed.

Conclusion

Start using the accessible_by_clause. But using the accessible_by_clause should not drive the way you structure your PL/SQL code. It certainly should not lead to a code splitting avalanche. Hence I favour the definition of the accessible_by_clause on subprogram level.

 


MemOptimized RowStore in Oracle Database 18c


The MemOptimized RowStore introduced in Oracle Database 18c is designed to improve performance of simple queries accessing data via primary key columns only. An example of such a query is SELECT value FROM t WHERE key = :key where key is the only primary key column of table t. This feature is available for the following Oracle Database offerings only (see Licensing Information User Manual):

  • Oracle Database Enterprise Edition on Engineered Systems (EE-ES)
  • Oracle Database Cloud Service Enterprise Edition – Extreme Performance (DBCS EE-EP)
  • Oracle Database Exadata Cloud Service (ExaCS)

For this blog post I’ve used a Docker container running an Oracle Database 18c version 18.2.0.0.180417 on my MacBook Pro (Late 2016). Setting the initialization parameter _exadata_feature_on=TRUE technically enabled the MemOptimized RowStore. This means that I expect the feature to work, but with different performance metrics than on one of the officially supported environments.

Concept

The MemOptimized RowStore is conceptually best documented in Database Concepts. The idea is to store a heap-organized table completely in memory within a subarea of the SGA. This subarea is named Memoptimized Pool and consists of the following two parts:

  • Memoptimize Buffer Area

This is a dedicated buffer cache for table blocks. 75% of the memoptimized pool is reserved for this buffer cache.

  • Hash Index

A hash index is a hash table/map as we know it from Java and other programming languages (associative array in PL/SQL). The primary key columns are used as key and a pointer to the block in the memoptimize buffer area is used as value. The hash index uses the other 25% of the memoptimized pool.

The size of the memoptimized pool is set by the initialization parameter MEMOPTIMIZE_POOL_SIZE. The default size is 0. Changing the value requires a database restart. The minimum size is 100M.

The following conditions must be met to use the MemOptimized RowStore:

  1. The table is marked as MEMOPTIMIZE FOR READ. See the memoptimize_read_clause for CREATE TABLE and ALTER TABLE statements.
  2. The table is heap-organized.
  3. The table has a primary key.
  4. The primary key is not an identity column.
  5. The table is not compressed.
  6. The table is not reference-partitioned.
  7. The table has at least one segment (use SEGMENT CREATION IMMEDIATE when creating tables).
  8. The table has been loaded in the memoptimized pool using dbms_memoptimize.populate.
  9. The table fits completely in the memoptimized pool.
  10. The query must be in the format SELECT <column_list> FROM <table> WHERE <primary_key_column> = <value>. The result columns must derive from the underlying table. Multiple primary key columns are supported, in this case all primary key columns have to be defined in the where_clause. Additional predicates are not allowed.
  11. The initialization parameter STATISTICS_LEVEL must not be set to ALL.
  12. The optimizer hint GATHER_PLAN_STATISTICS must not be used.
  13. SQL trace must not be enabled.
  14. The query must not be executed within PL/SQL (neither static nor dynamic SQL are supported).
  15. The query must not be executed using the default database connection in a Java stored procedure.

If all conditions are met, then the row is fetched without a single logical I/O. In all other cases you get either an error message or the query is executed the conventional way, using at least 3 logical I/Os (1 I/O for the root index block, 1 I/O for the index leaf block, 1 I/O for the table block).

Existing applications do not need to change their code to use the MemOptimized RowStore (besides some DDL).
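
For an existing table the DDL boils down to something like the following sketch (my_table is a made-up table name; it must already meet the conditions listed above):

ALTER TABLE my_table MEMOPTIMIZE FOR READ;

BEGIN
   -- request population of the hash index and the memoptimize buffer area
   dbms_memoptimize.populate(schema_name => USER, table_name => 'MY_TABLE');
END;
/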

But how can a row be fetched without an I/O? Technically it is a new kind of access which is no longer accounted for in the statistic consistent gets. Instead, these operations are reported with new statistics:

  • memopt r lookups – counting every hash index lookup (regardless of the result)
  • memopt r hits – counting every successful hash index lookup (primary key found)
  • memopt r misses – counting every unsuccessful hash index lookup (primary key not found)

There are 65 statistics in v$statname for the MemOptimized RowStore. You find some descriptions in the Database Reference as well.
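
You can list them yourself with a query like this (the exact number may vary per release and patch level):

SELECT name
  FROM v$statname
 WHERE name LIKE 'memopt%'
 ORDER BY name;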

But why is this supposed to be faster than a single block access in the keep pool using a single-table hash cluster? The answer is given in the Introducing Oracle Database 18c whitepaper:

Key-value lookups then bypass the SQL execution layer and execute directly in the data access layer via an in-memory hash index.

And what are the expected performance gains? – I have not found any numbers in the documentation. Unfortunately my tests are not conclusive in this area, since I’m running them in an unsupported environment. However, I’ve found an answer on the Ask TOM website, where Maria Colgan states the following:

the rowstore can be approximately 25% faster than a single-table hash cluster

Configure Database

Before we can use the MemOptimized RowStore we have to set the size of the memoptimized pool. In this case I set the minimum size and restart the database.

ALTER SYSTEM SET memoptimize_pool_size = 100M SCOPE=SPFILE;
SHUTDOWN IMMEDIATE
STARTUP

Now the database has reserved 75 megabytes for the memoptimize buffer area and 25 megabytes for the hash index.
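
If you want to double-check the configured size after the restart, a simple query like this will do:

SELECT name, value
  FROM v$parameter
 WHERE name = 'memoptimize_pool_size';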

Create and Populate Table

Let’s create an empty table t4 with a memoptimize_read_clause.

CREATE TABLE t4 (
   key    INTEGER            NOT NULL,
   value  VARCHAR2(30 CHAR)  NOT NULL,
   CONSTRAINT t4_pk PRIMARY KEY (key)
) 
SEGMENT CREATION IMMEDIATE
MEMOPTIMIZE FOR READ;

Please note that the primary key definition (CONSTRAINT t4_pk) is required to avoid an ORA-62142: MEMOPTIMIZE FOR READ feature requires NOT DEFERRABLE PRIMARY KEY constraint on the table. And without the SEGMENT CREATION IMMEDIATE clause we’d get an ORA-62156: MEMOPTIMIZE FOR READ feature not allowed on segment with deferred storage.

I use the following anonymous PL/SQL block to populate the table t4 with 100,000 rows and gather table statistics.

BEGIN
   dbms_random.seed(0);
   INSERT INTO t4 (key, value)
   SELECT rownum AS key, 
          dbms_random.string('x', round(dbms_random.value(5, 30), 0)) AS value
     FROM xmltable('1 to 100000');
   COMMIT;
   dbms_stats.gather_table_stats(ownname=>USER, tabname=>'T4');
END;
/

We may now run a query in a SQL client. I’ve used SQLcl because of the comprehensive statistics when using autotrace. The output is the result of the second execution.

SET LINESIZE 100
SET AUTOTRACE ON
SELECT * FROM t4 WHERE key = 42;

       KEY VALUE                         
---------- ------------------------------
        42 UKPBW05FQ1                    

Explain Plan
-----------------------------------------------------------

PLAN_TABLE_OUTPUT                                                                                   
----------------------------------------------------------------------------------------------------
Plan hash value: 1143490106                                                                         
                                                                                                    
------------------------------------------------------------------------------------------------    
| Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |    
------------------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT                       |       |     1 |    24 |     2   (0)| 00:00:01 |    
|   1 |  TABLE ACCESS BY INDEX ROWID READ OPTIM| T4    |     1 |    24 |     2   (0)| 00:00:01 |    
|*  2 |   INDEX UNIQUE SCAN READ OPTIM         | T4_PK |     1 |       |     1   (0)| 00:00:01 |    
------------------------------------------------------------------------------------------------    
                                                                                                    
Predicate Information (identified by operation id):                                                 
---------------------------------------------------                                                 
                                                                                                    
   2 - access("KEY"=42)                                                                             

Statistics
-----------------------------------------------------------
               1  CPU used by this session
               1  CPU used when call started
               2  DB time
              42  Requests to/from client
              42  SQL*Net roundtrips to/from client
               3  buffer is not pinned count
             598  bytes received via SQL*Net from client
           83338  bytes sent via SQL*Net to client
               3  calls to get snapshot scn: kcmgss
               2  calls to kcmgcs
               3  consistent gets
               3  consistent gets examination
               3  consistent gets examination (fastpath)
               3  consistent gets from cache
               2  execute count
               1  index fetch by key
           24576  logical read bytes from cache
               1  memopt r lookups
               1  memopt r misses
              43  non-idle wait count
               2  opened cursors cumulative
               1  opened cursors current
               2  parse count (total)
               1  rows fetched via callback
               1  session cursor cache hits
               3  session logical reads
               1  sorts (memory)
            2011  sorts (rows)
               1  table fetch by rowid
              45  user calls

The execution plan operations TABLE ACCESS BY INDEX ROWID READ OPTIM and INDEX UNIQUE SCAN READ OPTIM show the intention to use the MemOptimized RowStore. But the 3 consistent gets in the statistics indicate that a conventional index access has been used. The memopt statistics deliver the proof: there was an access to the hash index (1 memopt r lookups), but no key with the value 42 has been found (1 memopt r misses). Hence the fallback to the conventional unique index access.

Populate MemOptimized RowStore

The following anonymous PL/SQL block populates the memoptimized pool for table t4.

BEGIN
   dbms_memoptimize.populate(schema_name=>USER, table_name=>'T4');
END;
/

It is important to note that the memoptimized pool is populated in the background by a space management slave process. The call is just a request, tracked with its own statistic memopt r populate tasks accepted. Usually this is pretty fast, but to be sure you can check the relevant statistic before and after calling dbms_memoptimize.populate. Here’s an example:

SELECT n.name, s.value
  FROM v$sysstat s 
  JOIN v$statname n
    ON n.statistic# = s.statistic#
 WHERE n.name = 'memopt r rows populated';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
memopt r rows populated                                              100000

100,000 rows are now in the memoptimized pool. Let’s query table t4 again.

SET LINESIZE 100
SET AUTOTRACE ON
SELECT * FROM t4 WHERE key = 42;

       KEY VALUE                         
---------- ------------------------------
        42 UKPBW05FQ1                    

Explain Plan
-----------------------------------------------------------

PLAN_TABLE_OUTPUT                                                                                   
----------------------------------------------------------------------------------------------------
Plan hash value: 1143490106                                                                         
                                                                                                    
------------------------------------------------------------------------------------------------    
| Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |    
------------------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT                       |       |     1 |    24 |     2   (0)| 00:00:01 |    
|   1 |  TABLE ACCESS BY INDEX ROWID READ OPTIM| T4    |     1 |    24 |     2   (0)| 00:00:01 |    
|*  2 |   INDEX UNIQUE SCAN READ OPTIM         | T4_PK |     1 |       |     1   (0)| 00:00:01 |    
------------------------------------------------------------------------------------------------    
                                                                                                    
Predicate Information (identified by operation id):                                                 
---------------------------------------------------                                                 
                                                                                                    
   2 - access("KEY"=42)                                                                             

Statistics
-----------------------------------------------------------
              43  Requests to/from client
              43  SQL*Net roundtrips to/from client
             605  bytes received via SQL*Net from client
           83447  bytes sent via SQL*Net to client
               3  calls to get snapshot scn: kcmgss
               2  calls to kcmgcs
               2  execute count
               1  memopt r hits
               1  memopt r lookups
              44  non-idle wait count
               2  opened cursors cumulative
               1  opened cursors current
               2  parse count (total)
               1  session cursor cache count
               1  sorts (memory)
            2011  sorts (rows)
              46  user calls

As before, the execution plan shows the intention to use the MemOptimized RowStore (READ OPTIM). But in this case there are no consistent gets. And the statistics show a successful hash index lookup (1 memopt r hits). A SQL query without logical I/Os, made possible by the MemOptimized RowStore.

Alternatives

What are your options when your database does not provide a MemOptimized RowStore? I see primarily the following alternatives:

  • Heap-organized table
  • Index-organized table
  • Single-table hash cluster

Let’s elaborate on them.

1. Heap-Organized Table

CREATE TABLE t1 (
   key    INTEGER            NOT NULL,
   value  VARCHAR2(30 CHAR)  NOT NULL,
   CONSTRAINT t1_pk PRIMARY KEY (key)
)
STORAGE (BUFFER_POOL KEEP);
ALTER INDEX t1_pk STORAGE (BUFFER_POOL KEEP);

This is very similar to table t4. The storage_clauses in the CREATE TABLE and ALTER INDEX statements ensure that the table and index blocks are stored in the KEEP buffer pool. This reduces the physical I/Os when querying the table.

Accessing a single row requires 3 consistent gets as shown below.

SET LINESIZE 100
SET AUTOTRACE ON
SELECT * FROM t1 WHERE key = 42;

       KEY VALUE                         
---------- ------------------------------
        42 UKPBW05FQ1                    

Explain Plan
-----------------------------------------------------------

PLAN_TABLE_OUTPUT                                                                                   
----------------------------------------------------------------------------------------------------
Plan hash value: 2347959165                                                                         
                                                                                                    
-------------------------------------------------------------------------------------               
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |               
-------------------------------------------------------------------------------------               
|   0 | SELECT STATEMENT            |       |     1 |    24 |     2   (0)| 00:00:01 |               
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |    24 |     2   (0)| 00:00:01 |               
|*  2 |   INDEX UNIQUE SCAN         | T1_PK |     1 |       |     1   (0)| 00:00:01 |               
-------------------------------------------------------------------------------------               
                                                                                                    
Predicate Information (identified by operation id):                                                 
---------------------------------------------------                                                 
                                                                                                    
   2 - access("KEY"=42)                                                                             

Statistics
-----------------------------------------------------------
              42  Requests to/from client
              42  SQL*Net roundtrips to/from client
               3  buffer is not pinned count
             598  bytes received via SQL*Net from client
           83426  bytes sent via SQL*Net to client
               2  calls to get snapshot scn: kcmgss
               2  calls to kcmgcs
               3  consistent gets
               3  consistent gets examination
               3  consistent gets examination (fastpath)
               3  consistent gets from cache
               1  cursor authentications
               2  execute count
               1  index fetch by key
           24576  logical read bytes from cache
              42  non-idle wait count
               2  opened cursors cumulative
               1  opened cursors current
               2  parse count (total)
               1  rows fetched via callback
               3  session logical reads
               1  sorts (memory)
            2011  sorts (rows)
               1  table fetch by rowid
              45  user calls

2. Index-Organized Table

CREATE TABLE t2 (
   key    INTEGER            NOT NULL,
   value  VARCHAR2(30 CHAR)  NOT NULL,
   CONSTRAINT t2_pk PRIMARY KEY (key)
)
ORGANIZATION INDEX
STORAGE (BUFFER_POOL KEEP);

An index-organized table stores all its data within the index structure. This reduces the logical I/Os by one when accessing a single row via primary key. We also use the KEEP buffer pool to minimize physical I/Os.

Accessing a single row requires 2 consistent gets as shown below.

SET LINESIZE 100
SET AUTOTRACE ON
SELECT * FROM t2 WHERE key = 42;

       KEY VALUE                         
---------- ------------------------------
        42 UKPBW05FQ1                    

Explain Plan
-----------------------------------------------------------

PLAN_TABLE_OUTPUT                                                                                   
----------------------------------------------------------------------------------------------------
Plan hash value: 2827726509                                                                         
                                                                                                    
---------------------------------------------------------------------------                         
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |                         
---------------------------------------------------------------------------                         
|   0 | SELECT STATEMENT  |       |     1 |    24 |     1   (0)| 00:00:01 |                         
|*  1 |  INDEX UNIQUE SCAN| T2_PK |     1 |    24 |     1   (0)| 00:00:01 |                         
---------------------------------------------------------------------------                         
                                                                                                    
Predicate Information (identified by operation id):                                                 
---------------------------------------------------                                                 
                                                                                                    
   1 - access("KEY"=42)                                                                             

Statistics
-----------------------------------------------------------
              42  Requests to/from client
              42  SQL*Net roundtrips to/from client
               1  buffer is not pinned count
             598  bytes received via SQL*Net from client
           83426  bytes sent via SQL*Net to client
               2  calls to get snapshot scn: kcmgss
               2  calls to kcmgcs
               2  consistent gets
               2  consistent gets examination
               2  consistent gets examination (fastpath)
               2  consistent gets from cache
               1  cursor authentications
               2  execute count
               1  index fetch by key
           16384  logical read bytes from cache
              42  non-idle wait count
               2  opened cursors cumulative
               1  opened cursors current
               2  parse count (total)
               2  session logical reads
               1  sorts (memory)
            2011  sorts (rows)
              45  user calls

3. Single-Table Hash Cluster

CREATE CLUSTER c3 (key INTEGER) 
   SIZE 256
   SINGLE TABLE HASHKEYS 100000
   STORAGE (BUFFER_POOL KEEP);

CREATE TABLE t3 (
   key    INTEGER            NOT NULL,
   value  VARCHAR2(30 CHAR)  NOT NULL,
   CONSTRAINT t3_pk PRIMARY KEY (key) -- to check uniqueness only
)
CLUSTER c3 (key);

A hash cluster is quite an old Oracle feature. I do not remember when it was introduced. It’s like it has always been around. The best option for primary key based data retrieval, but a bit tricky to size. For sizing a hash cluster two parameters are important:

  • HASHKEYS

The HASHKEYS parameter defines the number of target buckets for the hash function. In this case I chose 100,000. Without hash collisions every key would be stored in its own bucket. But with this dataset there are up to 5 keys which get stored in the same target bucket.

  • SIZE

The SIZE parameter defines the number of bytes initially reserved for a target bucket of the hash function. Since I know that there are up to 5 rows within a bucket and I want a bucket to be stored completely in a single block, I chose a size large enough for 5 rows. This leads to 256 bytes, so that I can store 32 buckets in a single 8K block.

With these parameters a cluster with 3,125 blocks will be created, probably a bit more, depending on the extent management configuration of the tablespace. This is optimal for our use case. But it is not optimal for full table scans, since we use 6-7 times more blocks than a heap-organized table would need.
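
To see how many blocks were actually allocated for the cluster, you can check its segment, for example like this:

SELECT blocks
  FROM user_segments
 WHERE segment_name = 'C3'
   AND segment_type = 'CLUSTER';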

Accessing a single row in a correctly sized single-table hash cluster requires just 1 consistent get as shown below.

SET LINESIZE 100
SET AUTOTRACE ON
SELECT * FROM t3 WHERE key = 42;

       KEY VALUE                         
---------- ------------------------------
        42 UKPBW05FQ1                    

Explain Plan
-----------------------------------------------------------

PLAN_TABLE_OUTPUT                                                                                   
----------------------------------------------------------------------------------------------------
Plan hash value: 180373899                                                                          
                                                                                                    
--------------------------------------------------------------------------                          
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |                          
--------------------------------------------------------------------------                          
|   0 | SELECT STATEMENT  |      |     1 |    24 |     1   (0)| 00:00:01 |                          
|*  1 |  TABLE ACCESS HASH| T3   |     1 |    24 |     1   (0)| 00:00:01 |                          
--------------------------------------------------------------------------                          
                                                                                                    
Predicate Information (identified by operation id):                                                 
---------------------------------------------------                                                 
                                                                                                    
   1 - access("KEY"=42)                                                                             

Statistics
-----------------------------------------------------------
               1  DB time
              42  Requests to/from client
              42  SQL*Net roundtrips to/from client
               1  buffer is not pinned count
             598  bytes received via SQL*Net from client
           83491  bytes sent via SQL*Net to client
               2  calls to get snapshot scn: kcmgss
               2  calls to kcmgcs
               1  cluster key scan block gets
               1  cluster key scans
               1  consistent gets
               1  consistent gets from cache
               1  consistent gets pin
               1  consistent gets pin (fastpath)
               2  execute count
            8192  logical read bytes from cache
              42  non-idle wait count
               2  opened cursors cumulative
               1  opened cursors current
               2  parse count (total)
               1  session cursor cache hits
               1  session logical reads
               1  sorts (memory)
            2011  sorts (rows)
              45  user calls

If you size the single-table hash cluster wrong, e.g. by using SIZE 64 HASHKEYS 500, you end up with more than 150 consistent gets to access a single row, resulting in bad performance.

Sizing a single-table hash cluster is really the key for best performance. However, for mixed workloads (PK access and other accesses to retrieve many rows) sizing becomes challenging and leads to a compromise. In such scenarios a heap-organized or index-organized table is easier to apply and may even be the better option.

Performance

Now let’s compare these four options using a PL/SQL and a Java program reading the table fully via 100,000 queries. Not a smart way to do it, but it should show the performance impact of the different data structures.

PL/SQL procedure

CREATE OR REPLACE PROCEDURE p (in_table_name VARCHAR2) IS
   l_query VARCHAR2(1000 CHAR);
   l_value VARCHAR2(30 CHAR);
   l_start INTEGER;
   l_end   INTEGER;
BEGIN
   l_start := dbms_utility.get_time();
   l_query := 'SELECT value FROM ' || in_table_name || ' WHERE key = :key';
   FOR i IN 1..100000 LOOP
      EXECUTE IMMEDIATE l_query INTO l_value USING i;
   END LOOP;
   l_end := dbms_utility.get_time();
   dbms_output.put_line('read 100000 rows from ' || in_table_name || ' in ' ||
      to_char((l_end - l_start) / 100) || ' seconds.');
END p;
/

Java program

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import oracle.jdbc.driver.OracleDriver;
public class J {
   private static boolean isRunningInDatabase() {
      return System.getProperty("oracle.jserver.version") != null;
   }
   public static void m(String tableName) throws SQLException {
      Connection conn;
      if (isRunningInDatabase()) {
         conn = new OracleDriver().defaultConnection();
      } else {
         conn = DriverManager.getConnection(
            "jdbc:oracle:thin:@//localhost:1521/odb.docker", "tvdca", "tvdca");
      }
      conn.setAutoCommit(false);
      long start = System.currentTimeMillis();
      String query = "SELECT value FROM " + tableName + " WHERE key = ?";      
      PreparedStatement ps = conn.prepareStatement(query);
      for (long i = 1; i <= 100000; i++) {
         ps.setLong(1, i);
         ResultSet rs = ps.executeQuery();
         while (rs.next()) {
            rs.getString("value");
         }
         rs.close();
      }
      ps.close();
      if (!isRunningInDatabase()) {
         conn.close();
      }
      long end = System.currentTimeMillis();
      System.out.println("read 100000 rows from " + tableName + " in " +
         String.valueOf((double) (end-start)/1000) + " seconds.");      
   }
   public static void main(String[] args) throws SQLException {
      m(args[0]);
   }
}

Both programs are doing the same work. They get a table name as parameter and retrieve every row in the table via primary key access. The PL/SQL procedure runs within the database and the Java program outside of the database. The Java program needs to do 100,000 network round trips. For the PL/SQL program these are just context switches between the PL/SQL and SQL engine. Therefore, the PL/SQL procedure calls are expected to be faster than the Java program executions.

Each program has been called five times for every table. The slowest and the fastest runtimes have been ignored. The average of the remaining three runtimes is used for the following chart.

The results look plausible for “t1 – heap-organized”, “t2 – index-organized” and “t3 – hash cluster”. But the runtimes for “t4 – memoptimized” are strange, for PL/SQL as well as for Java. This requires further analysis.

Analyzing PL/SQL Runtime for “t4 – memoptimized”

Let’s execute the PL/SQL procedure in a fresh session again.

SQL> connect tvdca/tvdca@odb
Connected.
SQL> set serveroutput on
SQL> exec p('t4')
read 100000 rows from t4 in 6.72 seconds.


PL/SQL procedure successfully completed.

SQL> SELECT n.name, s.sid, s.value
  2    FROM v$sesstat s 
  3    JOIN v$statname n
  4      ON n.statistic# = s.statistic#
  5   WHERE n.name in ('consistent gets','memopt r lookups', 'memopt r hits')
  6     AND s.value > 0
  7     AND s.sid = sys_context ('USERENV', 'SID')
  8   ORDER BY s.value desc;

NAME                                                                    SID      VALUE
---------------------------------------------------------------- ---------- ----------
consistent gets                                                         273     300025

The consistent gets figure is interesting: 300,025. These are 300,000 more than expected. And there are no values for the statistics memopt r lookups and memopt r hits. This means Oracle uses a conventional access path instead of the MemOptimized RowStore. It’s the same execution plan as for “t1 – heap-organized” and the execution times are similar as well. As mentioned in the Concept chapter in the beginning, the MemOptimized RowStore cannot be used from PL/SQL.

Analyzing Java Runtime for “t4 – memoptimized”

To analyze the problem I’ve enabled SQL trace for the Java session. I was surprised by the runtime. Significantly faster with SQL trace enabled? The tkprof output revealed the reason. Here is an excerpt:

SQL ID: 03z4487kpgfv3 Plan Hash: 1143490106

SELECT value 
FROM
 t4 WHERE key = :1 


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 100000      0.65       7.10          0          0          0           0
Fetch   100000      0.95       7.36          0     300000          0      100000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   200001      1.60      14.47          0     300000          0      100000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 152  
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  TABLE ACCESS BY INDEX ROWID T4 (cr=3 pr=0 pw=0 time=599 us starts=1 cost=2 size=24 card=1)
         1          1          1   INDEX UNIQUE SCAN T4_PK (cr=2 pr=0 pw=0 time=562 us starts=1 cost=1 size=0 card=1)(object id 87514)

See the query column of the Fetch call (300,000 consistent gets) and the row source operations without READ OPTIM. We are back to a conventional access path as soon as we enable SQL trace. The runtime is similar to “t1 – heap-organized” plus some SQL trace overhead. SQL trace is a dead end.

Let’s try flame graphs. Luca Canali wrote an excellent blog post about flame graphs for Oracle. I’ve followed Luca’s instructions to produce some flame graphs. The PNGs are shown below. You may open the SVG variant via link in a new browser tab.

The average runtime of “t4 – memoptimized” was 77.33 seconds and the average runtime of “t1 – heap-organized” was 48.21 seconds. That’s a difference of about 30 seconds. How can we find the functions in “t4 – memoptimized” which are contributing the most to this difference?

First, we assume the amount of sampled data is good enough to represent the load pattern. Second, we assume that we can calculate the runtime of a function based on the percentage shown in the flame graph. This allows the findings to be presented as follows:

Function  t1 Percent  t4 Percent  t1 Time  t4 Time  Difference
opitsk    99.53%      99.37%      47.98    76.84    -28.86
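
For example, applying the sampled percentages to the average runtimes gives 0.9953 * 48.21 = 47.98 seconds for “t1 – heap-organized” and 0.9937 * 77.33 = 76.84 seconds for “t4 – memoptimized”, hence the difference of -28.86 seconds shown above.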

Third, we are looking for functions on a reasonable level in the call stack. In this case it is not helpful to state that the opitsk function is slower in “t4 – memoptimized”. Reasonable means that the functions we identify are not sampled in the same stack. This avoids double counting.

The following table lists 6 functions that account for more than two thirds of the total runtime of “t4 – memoptimized”. In total, they consume 36.66 seconds more than in “t1 – heap-organized”. In the tab pane above, there are flame graph variants named “marked”, highlighting these functions.

Function                     t1 Percent  t4 Percent  t1 Time  t4 Time  Difference  Notes
Total                        32.98%      67.97%      15.90    52.56    -36.66
opikndf2 (from opitsk)       17.64%      30.22%       8.50    23.37    -14.86
kksumc (from SELECT FETCH:)   6.10%      19.18%       2.94    14.83    -11.89
kpoxihFetch                   0.00%       4.86%       0.00     3.76     -3.76     Fetch from memoptimize buffer area, n/a in t1
ksupop                        6.24%       7.40%       3.01     5.72     -2.71
ksupucg                       3.00%       4.24%       1.45     3.28     -1.83
kpoxihLookup                  0.00%       2.07%       0.00     1.60     -1.60     Hash index lookup, n/a in t1

Someone with access to the source code of these functions could dig deeper, but I can’t. So I have to stop here. I don’t know how much my unsupported environment contributes to this bad performance. I can just hope it is a lot. If you have access to an Oracle Database offering supporting the MemOptimized RowStore, please share your performance experiences, e.g. with a comment on this blog post. Thank you.

Conclusion

If you access Oracle Databases with #NoPlsql applications, the MemOptimized RowStore might be an interesting feature to improve the performance when querying single rows from single tables via primary key. If you access Oracle Databases with #SmartDB applications, you probably do not need this feature. Therefore it is no problem that the MemOptimized RowStore does not work from PL/SQL. However, it is disturbing that activating SQL trace or setting STATISTICS_LEVEL = 'ALL' deactivates the MemOptimized RowStore. I hope this will be fixed in a future release.

MemOptimized RowStore in Oracle Database 18c with OCI


On June 10, 2018 I blogged about the MemOptimized RowStore in Oracle Database 18c. If you haven’t read this post, it is a good idea to catch up now. I showed that accessing a memoptimized table t4 via the MemOptimized RowStore was around 60% slower than accessing a heap-organized table t1. I suspected that this disappointing result was related to my unsupported Docker environment. But I was wrong.

The next day, Chris Antognini contacted me, since he planned to talk about this feature at the AOUG Anwenderkonferenz 2018. We exchanged our thoughts and shared our findings. Chris did his tests in the Oracle cloud and could also reproduce my test results. That’s interesting. Even more interesting is, that Chris translated my Java program to C and proved that the MemOptimized RowStore can be fast. That’s cool. But why didn’t it work in Java? It’s the same after all, right? No. The Java program used the JDBC thin driver and the C program OCI.

In this blog post I will show that OCI is a prerequisite for getting good performance out of the MemOptimized RowStore.

The Program

I use the Java program from my previous post. I only added three parameters: the JDBC URL, the username and the password. Here’s the program.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import oracle.jdbc.driver.OracleDriver;
public class J {
   private static boolean isRunningInDatabase() {
      return System.getProperty("oracle.jserver.version") != null;
   }
   public static void m(String tableName, String url, String userName, String password) throws SQLException {
      Connection conn;
      if (isRunningInDatabase()) {
         conn = new OracleDriver().defaultConnection();
      } else {
         conn = DriverManager.getConnection(
           url, userName, password);
      }
      conn.setAutoCommit(false);
      long start = System.currentTimeMillis();
      String query = "SELECT value FROM " + tableName + " WHERE key = ?";      
      PreparedStatement ps = conn.prepareStatement(query);
      for (long i = 1; i <= 100000; i++) {
         ps.setLong(1, i);
         ResultSet rs = ps.executeQuery();
         while (rs.next()) {
            rs.getString("value");
         }
         rs.close();
      }
      ps.close();
      if (!isRunningInDatabase()) {
         conn.close();
      }
      long end = System.currentTimeMillis();
      System.out.println("read 100000 rows from " + tableName + " in " +
         String.valueOf((double) (end-start)/1000) + " seconds via " + url + ".");      
   }
   public static void main(String[] args) throws SQLException {
      m(args[0], args[1], args[2], args[3]);
   }
}

I copied this program to my Docker container into the directory $ORACLE_HOME/jdbc/lib and compiled it with the following script:

export CLASSPATH=.:./ojdbc8.jar
javac J.java

The Test Script

I’ve run the tests for my previous post from the Eclipse IDE because it was convenient for me to set a break point on line 20 to identify the Oracle process for perf. Now, I do not need to produce flame graphs. Running the test script on the server directly will also reduce the network overhead, especially when running without Oracle Net.

Here’s the test script:

#!/bin/bash

run(){
    TABLE=${1}
    URL=${2}
    for ((i=1;i<=5;i++));
    do
        echo -n "run #${i}: "
        java J ${TABLE} ${URL} ${USERNAME} ${PASSWORD}
    done
    echo ""
} 

export CLASSPATH=.:./ojdbc8.jar
export USERNAME=tvdca
export PASSWORD=tvdca

# Thin driver
run t1 "jdbc:oracle:thin:@//localhost:1521/odb.docker"
run t2 "jdbc:oracle:thin:@//localhost:1521/odb.docker"
run t3 "jdbc:oracle:thin:@//localhost:1521/odb.docker"
run t4 "jdbc:oracle:thin:@//localhost:1521/odb.docker"

# OCI driver with Oracle*Net
run t1 "jdbc:oracle:oci:@odb"
run t2 "jdbc:oracle:oci:@odb"
run t3 "jdbc:oracle:oci:@odb"
run t4 "jdbc:oracle:oci:@odb"

# OCI driver without Oracle*Net
run t1 "jdbc:oracle:oci:@"
run t2 "jdbc:oracle:oci:@"
run t3 "jdbc:oracle:oci:@"
run t4 "jdbc:oracle:oci:@"

The Result

Here’s the output of the run.sh call:

[oracle@odb180 lib]$ ./run.sh
run #1: read 100000 rows from t1 in 11.918 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #2: read 100000 rows from t1 in 11.625 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #3: read 100000 rows from t1 in 11.662 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #4: read 100000 rows from t1 in 11.574 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #5: read 100000 rows from t1 in 11.729 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.

run #1: read 100000 rows from t2 in 11.786 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #2: read 100000 rows from t2 in 12.071 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #3: read 100000 rows from t2 in 12.621 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #4: read 100000 rows from t2 in 11.913 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #5: read 100000 rows from t2 in 11.972 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.

run #1: read 100000 rows from t3 in 11.397 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #2: read 100000 rows from t3 in 11.429 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #3: read 100000 rows from t3 in 11.308 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #4: read 100000 rows from t3 in 11.793 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #5: read 100000 rows from t3 in 11.903 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.

run #1: read 100000 rows from t4 in 19.789 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #2: read 100000 rows from t4 in 19.461 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #3: read 100000 rows from t4 in 19.181 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #4: read 100000 rows from t4 in 19.211 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.
run #5: read 100000 rows from t4 in 19.242 seconds via jdbc:oracle:thin:@//localhost:1521/odb.docker.

run #1: read 100000 rows from t1 in 13.145 seconds via jdbc:oracle:oci:@odb.
run #2: read 100000 rows from t1 in 12.698 seconds via jdbc:oracle:oci:@odb.
run #3: read 100000 rows from t1 in 13.14 seconds via jdbc:oracle:oci:@odb.
run #4: read 100000 rows from t1 in 12.842 seconds via jdbc:oracle:oci:@odb.
run #5: read 100000 rows from t1 in 12.978 seconds via jdbc:oracle:oci:@odb.

run #1: read 100000 rows from t2 in 13.049 seconds via jdbc:oracle:oci:@odb.
run #2: read 100000 rows from t2 in 12.581 seconds via jdbc:oracle:oci:@odb.
run #3: read 100000 rows from t2 in 12.44 seconds via jdbc:oracle:oci:@odb.
run #4: read 100000 rows from t2 in 12.787 seconds via jdbc:oracle:oci:@odb.
run #5: read 100000 rows from t2 in 12.727 seconds via jdbc:oracle:oci:@odb.

run #1: read 100000 rows from t3 in 12.402 seconds via jdbc:oracle:oci:@odb.
run #2: read 100000 rows from t3 in 12.479 seconds via jdbc:oracle:oci:@odb.
run #3: read 100000 rows from t3 in 12.483 seconds via jdbc:oracle:oci:@odb.
run #4: read 100000 rows from t3 in 12.346 seconds via jdbc:oracle:oci:@odb.
run #5: read 100000 rows from t3 in 12.528 seconds via jdbc:oracle:oci:@odb.

run #1: read 100000 rows from t4 in 11.452 seconds via jdbc:oracle:oci:@odb.
run #2: read 100000 rows from t4 in 10.945 seconds via jdbc:oracle:oci:@odb.
run #3: read 100000 rows from t4 in 11.597 seconds via jdbc:oracle:oci:@odb.
run #4: read 100000 rows from t4 in 11.295 seconds via jdbc:oracle:oci:@odb.
run #5: read 100000 rows from t4 in 11.746 seconds via jdbc:oracle:oci:@odb.

run #1: read 100000 rows from t1 in 10.508 seconds via jdbc:oracle:oci:@.
run #2: read 100000 rows from t1 in 10.662 seconds via jdbc:oracle:oci:@.
run #3: read 100000 rows from t1 in 10.105 seconds via jdbc:oracle:oci:@.
run #4: read 100000 rows from t1 in 10.44 seconds via jdbc:oracle:oci:@.
run #5: read 100000 rows from t1 in 10.415 seconds via jdbc:oracle:oci:@.

run #1: read 100000 rows from t2 in 10.29 seconds via jdbc:oracle:oci:@.
run #2: read 100000 rows from t2 in 10.15 seconds via jdbc:oracle:oci:@.
run #3: read 100000 rows from t2 in 10.266 seconds via jdbc:oracle:oci:@.
run #4: read 100000 rows from t2 in 10.351 seconds via jdbc:oracle:oci:@.
run #5: read 100000 rows from t2 in 10.259 seconds via jdbc:oracle:oci:@.

run #1: read 100000 rows from t3 in 9.95 seconds via jdbc:oracle:oci:@.
run #2: read 100000 rows from t3 in 9.756 seconds via jdbc:oracle:oci:@.
run #3: read 100000 rows from t3 in 10.325 seconds via jdbc:oracle:oci:@.
run #4: read 100000 rows from t3 in 9.517 seconds via jdbc:oracle:oci:@.
run #5: read 100000 rows from t3 in 9.951 seconds via jdbc:oracle:oci:@.

run #1: read 100000 rows from t4 in 9.182 seconds via jdbc:oracle:oci:@.
run #2: read 100000 rows from t4 in 8.996 seconds via jdbc:oracle:oci:@.
run #3: read 100000 rows from t4 in 8.977 seconds via jdbc:oracle:oci:@.
run #4: read 100000 rows from t4 in 9.024 seconds via jdbc:oracle:oci:@.
run #5: read 100000 rows from t4 in 9.082 seconds via jdbc:oracle:oci:@.

As in my previous post, I ignore the slowest and fastest run and take the average of the remaining three results per test variant to produce a chart.
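
For example, for t4 via jdbc:oracle:oci:@ the slowest run (9.182 seconds) and the fastest run (8.977 seconds) are dropped, and the average of 8.996, 9.024 and 9.082 is roughly 9.03 seconds.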

We see that using the JDBC OCI driver delivers the fastest results when skipping Oracle Net. Furthermore, the MemOptimized RowStore delivers the fastest results via OCI. Accessing the MemOptimized RowStore via the JDBC thin driver leads to bad performance. This looks like a bug.

Conclusion

To benefit from the MemOptimized RowStore you have to access the database via OCI.

Is Your Application SmartDB?


I recently had a few discussions regarding the Smart Database Paradigm (SmartDB) with long-standing customers, new customers, partners, competitors and colleagues. Some people think that using APEX and PL/SQL in their database application is SmartDB. But it is not that simple. Bryn Llewelyn defined the term “Smart Database Paradigm” (SmartDB) in his talk Guarding your data behind a hard shell PL/SQL API. Based on his definition a SmartDB application must have the following five properties:

  1. The connect user does not own database objects
  2. The connect user can execute PL/SQL API units only
  3. PL/SQL API units handle transactions
  4. SQL statements are written by human hand
  5. SQL statements exploit the full power of set-based SQL

These five properties are not a set of recommendations. They are the bare minimum. Either your application has these properties or not. It’s binary. There is (almost) no room for interpretation. Here’s an excerpt of a longer Twitter thread, making my and especially Bryn Llewelyn’s view a bit clearer.

In this blog post I show how to check the compliance with the first three SmartDB properties by querying the Oracle data dictionary. The remaining two SmartDB properties have to be evaluated manually using reviews. The goal is to show that some of these properties are easily violated (sometimes for good reasons), and that this makes your database centric application something else than SmartDB (but not necessarily a curate’s egg).

In How to Prove That Your SmartDB App Is Secure I’ve crafted a good, a bad and an ugly demo application. I installed these applications using this script in my Oracle Database 18c instance.

The anonymous PL/SQL block and the SQL queries in this blog post require DBA privileges. The required minimum database version is mentioned in the title of the code block, e.g. (>=9.2), (>=12.1) or (>=12.2).

Now let’s look at the five SmartDB properties.

1. The connect user does not own database objects

The connect user is used by application components outside of the database to interact with the database. It is configured for example in the connection pool of the middle tier application.

The connect user must access only the APIs of the underlying database applications and therefore does not need to own database objects.

Checking the compliance of this property is simple.

SELECT username
  FROM dba_users
 WHERE username NOT IN (
         SELECT owner
           FROM dba_objects
       )
 ORDER BY username;

USERNAME                
------------------------
ANONYMOUS
APEX_INSTANCE_ADMIN_USER
APEX_PUBLIC_USER
APEX_REST_PUBLIC_USER
DIP
GGSYS
GSMCATUSER
GSMUSER
MDDATA
ORDS_PUBLIC_USER
SYS$UMF
SYSBACKUP
SYSDG
SYSKM
SYSRAC
THE_BAD_USER
THE_GOOD_USER
THE_UGLY_USER
XS$NULL

19 rows selected.

If you are using a connect user that is not listed in the result, then your application is not SmartDB.

The result also contains users that do not have the CREATE SESSION privilege and therefore cannot be used as connect users. The queries to check SmartDB properties 2 and 3 will address this issue.
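
If you want to filter out such users right away, you can extend the query, for example like this (a sketch that covers direct grants and the CONNECT role only, not privileges inherited via nested roles):

SELECT username
  FROM dba_users
 WHERE username NOT IN (
          SELECT owner
            FROM dba_objects
       )
   AND username IN (
          SELECT grantee
            FROM dba_sys_privs
           WHERE privilege = 'CREATE SESSION'
          UNION
          SELECT grantee
            FROM dba_role_privs
           WHERE granted_role = 'CONNECT'
       )
 ORDER BY username;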

2. The connect user can execute PL/SQL API units only

Database views and tables are guarded behind a hard shell PL/SQL API. Only the following database objects may be part of the API:

  • Packages
  • Types
  • Functions
  • Procedures

So we just have to check if the connect user has access to objects with the predicate object_type NOT IN ('PACKAGE', 'TYPE', 'FUNCTION', 'PROCEDURE'), right? Yes, but the result would not be helpful. Why? Because every user with just the CREATE SESSION privilege has access to some thousand tables and views via the PUBLIC role. For example DUAL, ALL_VIEWS or NLS_SESSION_PARAMETERS. Strictly speaking it is not possible to create an Oracle user that can execute PL/SQL units only. Some might argue that this alone makes SmartDB applications a fantasy. However, I’m not in that camp. I think we just have to focus on our own objects and exclude all Oracle maintained users along with some common utility users from the analysis.
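
To get a feeling for the size of this hard-wired surface, you can count the object grants to PUBLIC, for example like this:

SELECT privilege, count(*) AS grant_count
  FROM dba_tab_privs
 WHERE grantee = 'PUBLIC'
 GROUP BY privilege
 ORDER BY grant_count DESC;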

Furthermore the connect user should only have the CONNECT role (no more and no less). This way we ensure/know that no access is granted to internal objects via ANY privileges.
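
A quick way to verify this for the connect users of my demo applications is to list their granted roles:

SELECT grantee, granted_role
  FROM dba_role_privs
 WHERE grantee IN ('THE_GOOD_USER', 'THE_BAD_USER', 'THE_UGLY_USER')
 ORDER BY grantee, granted_role;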

For this check we can reuse the query for rule 1 from my previous blog post How to Prove That Your SmartDB App Is Secure.

Query to check SmartDB property 2

WITH
   -- roles as recursive structure
   role_base AS (
      -- roles without parent (=roots)
      SELECT r.role, NULL AS parent_role
        FROM dba_roles r
       WHERE r.role NOT IN (
                SELECT p.granted_role
                  FROM role_role_privs p
             )
      UNION ALL
      -- roles with parent (=children)
      SELECT granted_role AS role, role AS parent_role
        FROM role_role_privs
   ),
   -- roles tree, calculate role_path for every hierarchy level
   role_tree AS (
      SELECT role,
             parent_role,
             sys_connect_by_path(ROLE, '/') AS role_path
        FROM role_base
      CONNECT BY PRIOR role = parent_role
   ),
   -- roles graph, child added to all ancestors including self
   -- allows simple join to parent_role to find all descendants
   role_graph AS (
      SELECT DISTINCT
             role,
             regexp_substr(role_path, '(/)(\w+)', 1, 1, 'i', 2) AS parent_role
        FROM role_tree
   ),
   -- application users in scope of the analysis
   -- other users are treated as if they were not installed
   app_user AS (
      SELECT username
        FROM dba_users
       WHERE oracle_maintained = 'N' -- SYS, SYSTEM, SYSAUX, ...
         AND username NOT IN ('FTLDB', 'PLSCOPE', 'UT3')
         -- OR username LIKE 'APEX%' -- APEX
         -- OR username LIKE 'ORD%' -- ORDS
   ),
   -- user system privileges
   sys_priv AS (
      -- system privileges granted directly to users
      SELECT u.username, p.privilege
        FROM dba_sys_privs p
        JOIN app_user u ON u.username = p.grantee
      UNION
      -- system privileges granted directly to PUBLIC
      SELECT u.username, p.privilege
        FROM dba_sys_privs p
       CROSS JOIN app_user u
       WHERE p.grantee = 'PUBLIC'
         AND p.privilege NOT IN (
                SELECT r.role
                  FROM dba_roles r
             )
      UNION
      -- system privileges granted to users via roles
      SELECT u.username, p.privilege
        FROM dba_role_privs r
        JOIN app_user u ON u.username = r.grantee
        JOIN role_graph g ON g.parent_role = r.granted_role
        JOIN dba_sys_privs p ON p.grantee = g.role
      UNION
      -- system privileges granted to PUBLIC via roles
      SELECT u.username, p.privilege
        FROM dba_role_privs r
        JOIN role_graph g ON g.parent_role = r.granted_role
        JOIN dba_sys_privs p ON p.grantee = g.role
        CROSS JOIN app_user u
       WHERE r.grantee = 'PUBLIC'
   ),
   -- user object privileges
   obj_priv AS (
      -- objects granted directly to users
      SELECT u.username, p.owner, p.type AS object_type, p.table_name AS object_name
        FROM dba_tab_privs p
        JOIN app_user u ON u.username = p.grantee
       WHERE p.owner IN (
                SELECT u2.username
                  FROM app_user u2
             )
      UNION
      -- objects granted to users via roles
      SELECT u.username, p.owner, p.type AS object_type, p.table_name AS object_name
        FROM dba_role_privs r
        JOIN app_user u ON u.username = r.grantee
        JOIN role_graph g ON g.parent_role = r.granted_role
        JOIN dba_tab_privs p ON p.grantee = g.role
       WHERE p.owner IN (
                SELECT u2.username
                  FROM app_user u2
             )
      -- objects granted to PUBLIC
      UNION
      SELECT u.username, p.owner, p.type AS object_type, p.table_name AS object_name
        FROM dba_tab_privs p
       CROSS JOIN app_user u
       WHERE p.owner IN (
                SELECT u2.username
                  FROM app_user u2
             )
         AND p.grantee = 'PUBLIC'
   ),
   -- issues if user is configured in the connection pool of a middle tier
   issues AS (
      -- privileges not part of CONNECT role
      SELECT username,
             'SYS' AS owner,
             'PRIVILEGE' AS object_type,
             privilege AS object_name,
             'Privilege is not part of the CONNECT role' AS issue
        FROM sys_priv
       WHERE privilege NOT IN ('CREATE SESSION', 'SET CONTAINER')
      -- access to non PL/SQL units
      UNION ALL
      SELECT username,
             owner,
             object_type,
             object_name,
             'Access to non-PL/SQL unit'
        FROM obj_priv
       WHERE object_type NOT IN ('PACKAGE', 'TYPE', 'FUNCTION', 'PROCEDURE')
      -- own objects
      UNION ALL
      SELECT u.username,
             o.owner,
             o.object_type,
             o.object_name,
             'Connect user must not own any object'
        FROM app_user u
        JOIN dba_objects o ON o.owner = u.username
      -- missing CREATE SESSION privilege
      UNION ALL
      SELECT u.username,
             'SYS',
             'PRIVILEGE',
             'CREATE SESSION',
             'Privilege is missing, but required'
        FROM app_user u
       WHERE u.username NOT IN (
                SELECT username
                  FROM sys_priv
                 WHERE privilege = 'CREATE SESSION' 
             )
   ),
   -- aggregate issues per user
   issue_aggr AS (
      SELECT u.username, COUNT(i.username) issue_count
        FROM app_user u
        LEFT JOIN issues i ON i.username = u.username
       GROUP BY u.username
   ),
   -- user summary (calculate is_smartdb_property_2_met)
   summary AS (
      SELECT username,
             CASE
                WHEN issue_count = 0 THEN
                   'YES'
                ELSE
                   'NO'
             END AS is_smartdb_property_2_met,
             issue_count
        FROM issue_aggr
       ORDER BY is_smartdb_property_2_met DESC, username
   )
-- main
SELECT * 
  FROM summary
 WHERE issue_count = 0;

USERNAME                 IS_SMARTDB_PROPERTY_2_MET ISSUE_COUNT
------------------------ ------------------------- -----------
APEX_REST_PUBLIC_USER    YES                                 0
THE_BAD_USER             YES                                 0
THE_GOOD_USER            YES                                 0

If you are using a connect user that is not listed in the result, then your application is not SmartDB.

In this case the APEX_REST_PUBLIC_USER is a false positive. The named subquery app_user excludes the APEX_180100 user, which grants various views and sequences to PUBLIC. Hence APEX 18.1 is not a SmartDB application.

3. PL/SQL API units handle transactions

A SmartDB application holds the complete business logic in the database. A PL/SQL API call handles a transaction completely. The API must not contain units for partial transaction work. Such units may exist, but must not be part of the PL/SQL API exposed to the connect user.

For write operations a COMMIT is called on success and a ROLLBACK is called on failure at the end of the operation.
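
As a minimal sketch of such an API unit (the procedure and the orders table are made up for illustration, they are not part of the demo applications):

CREATE OR REPLACE PROCEDURE register_order (
   in_customer_id IN INTEGER,
   in_product_id  IN INTEGER
) IS
BEGIN
   INSERT INTO orders (customer_id, product_id, ordered_at)
   VALUES (in_customer_id, in_product_id, SYSTIMESTAMP);
   COMMIT; -- the API unit completes the transaction on success
EXCEPTION
   WHEN OTHERS THEN
      ROLLBACK; -- undo partial work on failure
      RAISE;
END register_order;
/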

For read operations the PL/SQL API is responsible for the read consistency.

Distributed transactions are supported via database links only. Other data sources cannot participate in the same database transaction. If this is a mandatory requirement, then SmartDB is the wrong approach. However, Oracle AQ can be a good alternative to propagate data consistently in upstream or downstream transactions.

To check if an application has this SmartDB property, we have to do something like this:

  • Find all PL/SQL API units (as we’ve done it for the SmartDB property 2).
  • Produce a call tree for PL/SQL API units. On object level this could be achieved by querying DBA_DEPENDENCIES. For a more accurate result on sub-object level PL/Scope could be used by querying DBA_IDENTIFIERS.
  • Find INSERT, UPDATE, DELETE, MERGE, COMMIT and ROLLBACK statements in PL/SQL units. Static statements can be found via PL/Scope in the DBA_STATEMENTS view. But executions in dynamic statements are a challenge, since the DML may be stored outside of the PL/SQL unit (e.g. in tables). It’s virtually impossible to get a complete result using static code analysis.
  • Bring these results together and check if DML statements are followed by a transaction control statement. This is another challenge. Without a parser (and some semantic analysis) it is not possible to find out if a statement is really executed (e.g. PL/Scope does not provide information about control structures).

For this blog post we use a naïve static code analysis approach. We do the analysis on object level and consider static SQL statements only. Furthermore we assume that DML statements (INSERT, UPDATE, DELETE, MERGE) and transaction control statements (COMMIT, ROLLBACK) found in the call hierarchy are all executed, with the transaction control statement at the very end.

As long as the transaction control statements are not executed as dynamic SQL, the result should be good enough. This means that if the query produces no result for an application, it is certainly not a SmartDB application; but if a result is produced, this does not guarantee that the application really follows the rules and issues a COMMIT or a ROLLBACK at the end of a write transaction.

Compile all application users with PL/Scope

SET SERVEROUTPUT ON SIZE UNLIMITED
DECLARE
   PROCEDURE exec_sql (in_sql_stmt IN VARCHAR2) IS
   BEGIN
      dbms_output.put_line('executing: ' || in_sql_stmt);
      EXECUTE IMMEDIATE in_sql_stmt;
   END exec_sql;
   --
   PROCEDURE enable_plscope IS
   BEGIN
      exec_sql(q'[ALTER SESSION SET plscope_settings='IDENTIFIERS:ALL, STATEMENTS:ALL']');
   END enable_plscope;
   --
   PROCEDURE compile_private_synonyms(in_user IN VARCHAR2) IS
   BEGIN
      <<synonyms>>
      FOR r IN (
         SELECT synonym_name
           FROM dba_synonyms
          WHERE owner = in_user
      ) LOOP
         exec_sql('ALTER SYNONYM "' || in_user || '"."' || r.synonym_name || '" COMPILE');
      END LOOP synonyms;
   END compile_private_synonyms;
   --
   PROCEDURE compile_public_synonyms(in_user IN VARCHAR2) IS
   BEGIN
      <<public_synonyms>>
      FOR r IN (
         SELECT synonym_name
           FROM dba_synonyms
          WHERE owner = 'PUBLIC'
            AND table_owner = in_user
      ) LOOP
         exec_sql('ALTER PUBLIC SYNONYM "' || r.synonym_name || '" COMPILE');
      END LOOP public_synonyms;
   END compile_public_synonyms;
   --
   PROCEDURE compile_types(in_user IN VARCHAR2) IS
      e_has_table_deps EXCEPTION;
      e_is_not_udt     EXCEPTION;
      e_compile_error  EXCEPTION;
      PRAGMA exception_init(e_has_table_deps, -2311);
      PRAGMA exception_init(e_is_not_udt, -22307);
      PRAGMA exception_init(e_compile_error, -24344);
   BEGIN
      <<types>>
      FOR r IN (
         SELECT o.object_type, o.object_name, count(d.name) AS priority 
           FROM dba_objects o
           LEFT JOIN dba_dependencies d
             ON d.owner = o.owner
                AND d.type = o.object_type
                AND d.name = o.object_name
          WHERE o.owner = in_user
            AND o.object_type in ('TYPE', 'TYPE BODY')
          GROUP BY o.object_type, o.object_name
          ORDER BY priority
      ) LOOP
         <<compile_type>>
         BEGIN
             IF r.object_type = 'TYPE' THEN
                exec_sql('ALTER TYPE "' || in_user || '"."' || r.object_name || '" COMPILE');
             ELSE
                exec_sql('ALTER TYPE "' || in_user || '"."' || r.object_name || '" COMPILE BODY');
             END IF;
         EXCEPTION
            WHEN e_has_table_deps OR e_is_not_udt OR e_compile_error THEN
               NULL;
         END compile_type;
      END LOOP types;
   END compile_types;
   --
   PROCEDURE compile_schema(in_user IN VARCHAR2) IS
   BEGIN
      -- synonyms and types are not covered by dbms_utility.compile_schema
      compile_private_synonyms(in_user);
      compile_public_synonyms(in_user);
      compile_types(in_user);
      dbms_utility.compile_schema(
         schema         => in_user,
         compile_all    => TRUE,
         reuse_settings => FALSE
      );
   END compile_schema;
BEGIN
   enable_plscope;
   <<app_user>>
   FOR r IN (
      SELECT username
        FROM dba_users
       WHERE oracle_maintained = 'N' 
         AND username NOT IN ('FTLDB', 'PLSCOPE', 'UT3')      
   ) LOOP
      compile_schema(r.username);
   END LOOP app_user;
END;
/
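
Before running the check, you can verify that PL/Scope metadata is now available, for example with a query like this (a sketch; it simply lists the captured DML and transaction control statements per application user):

SELECT owner, object_type, object_name, type, line
  FROM dba_statements
 WHERE type IN ('INSERT', 'UPDATE', 'DELETE', 'MERGE', 'COMMIT', 'ROLLBACK')
   AND owner IN (
          SELECT username
            FROM dba_users
           WHERE oracle_maintained = 'N'
       )
 ORDER BY owner, object_name, line;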

Query to check SmartDB property 3

WITH 
   -- calculate object dependencies recursively
   -- using PL/SQL to handle cycles (expected on object level)
   -- SQL variant using NOCYCLE did not work (runs forever)
   FUNCTION get_dep (
      in_xml IN XMLTYPE
   ) RETURN XMLTYPE IS
      l_deps     sys.ora_mining_varchar2_nt;
      l_result   XMLTYPE := XMLTYPE('<xml/>');
      l_element  XMLTYPE;
      --
      PROCEDURE add_child(
         io_deps  IN OUT sys.ora_mining_varchar2_nt,
         in_owner           IN VARCHAR2,
         in_type            IN VARCHAR2,
         in_name            IN VARCHAR2,
         in_has_dml         IN INTEGER,
         in_has_transaction IN INTEGER
      ) IS
      BEGIN
         io_deps.extend;
         io_deps(io_deps.count) := in_owner || '.' || in_type || '.' || in_name 
            || '.' || in_has_dml || '.' || in_has_transaction;         
      END add_child;
      --
      FUNCTION exists_child(
         in_deps            IN sys.ora_mining_varchar2_nt,
         in_owner           IN VARCHAR2,
         in_type            IN VARCHAR2,
         in_name            IN VARCHAR2,
         in_has_dml         IN INTEGER,
         in_has_transaction IN INTEGER
      ) RETURN BOOLEAN is
         l_found INTEGER;
      BEGIN
         SELECT COUNT(*)
           INTO l_found
           FROM table(in_deps)
          WHERE column_value = in_owner || '.' || in_type || '.' || in_name 
                   || '.' || in_has_dml || '.' || in_has_transaction
            AND rownum = 1;
         RETURN l_found > 0;
      END exists_child;
      --
      PROCEDURE add_children(
         io_deps  IN OUT sys.ora_mining_varchar2_nt,
         in_xml   IN     XMLTYPE,
         in_owner IN     VARCHAR2,
         in_type  IN     VARCHAR2,
         in_name  IN     VARCHAR2
      ) IS
      BEGIN
         FOR r IN (
            SELECT owner, type, name, has_dml, has_transaction
              FROM XMLTABLE(
                      'xml/row/value/dependency[../../key/owner=$owner and ../../key/type=$type and ../../key/name=$name]'
                      PASSING in_xml, in_owner AS "owner", in_type AS "type", in_name AS "name"
                      COLUMNS owner           VARCHAR2(128) PATH 'referenced_owner',
                              type            VARCHAR2(128) PATH 'referenced_type',
                              name            VARCHAR2(128) PATH 'referenced_name',
                              has_dml         INTEGER       PATH 'referenced_has_dml',
                              has_transaction INTEGER       PATH 'referenced_has_transaction'
                   )
         ) LOOP
            IF NOT exists_child(io_deps, r.owner, r.type, r.name, r.has_dml, r.has_transaction) THEN
               add_child(io_deps, r.owner, r.type, r.name, r.has_dml, r.has_transaction);
               add_children(io_deps, in_xml, r.owner, r.type, r.name);
            END IF;
         END LOOP;
      END add_children;
      ---
      FUNCTION get_fragment(
         in_deps  IN sys.ora_mining_varchar2_nt,
         in_owner IN VARCHAR2,
         in_type  IN VARCHAR2,
         in_name  IN VARCHAR2
      ) RETURN XMLTYPE IS
         l_xml XMLTYPE;
      BEGIN
         SELECT XMLELEMENT("xml",
                   XMLAGG(
                      XMLELEMENT("row",
                         XMLELEMENT("owner", in_owner),
                         XMLELEMENT("type", in_type),
                         XMLELEMENT("name", in_name),
                         XMLELEMENT("referenced_owner", regexp_substr(column_value, '[^\.]+', 1, 1)),
                         XMLELEMENT("referenced_type", regexp_substr(column_value, '[^\.]+', 1, 2)),
                         XMLELEMENT("referenced_name", regexp_substr(column_value, '[^\.]+', 1, 3)),
                         XMLELEMENT("referenced_has_dml", regexp_substr(column_value, '[^\.]+', 1, 4)),
                         XMLELEMENT("referenced_has_transaction", regexp_substr(column_value, '[^\.]+', 1, 5))
                      )
                   )
                )   
           INTO l_xml
           FROM table(in_deps);
          RETURN l_xml;
      END get_fragment;
      ---
      PROCEDURE add_to_result(
         io_result    IN OUT XMLTYPE,
         in_fragment  IN     XMLTYPE
      ) IS
      BEGIN
         SELECT xmlquery('
                   copy $i := $p1 modify
                   (
                      for $j in $i/xml 
                      return insert node $p2 into $j
                   )
                   return $i'
                   PASSING io_result AS "p1", in_fragment.extract('/xml/row') AS "p2"
                   RETURNING CONTENT
                )
           INTO io_result
           FROM dual;  
      END add_to_result;
   BEGIN
      FOR r IN (
         SELECT owner, type, name
           FROM XMLTABLE (
                   '/xml/row/key'
                   PASSING in_xml
                   COLUMNS owner VARCHAR2(128) PATH 'owner',
                           type  VARCHAR2(128) PATH 'type',
                           name  VARCHAR2(128) PATH 'name'                           
                )
      ) 
      LOOP
         l_deps := sys.ora_mining_varchar2_nt();
         add_children(l_deps, in_xml, r.owner, r.type, r.name);
         add_to_result(l_result, get_fragment(l_deps, r.owner, r.type, r.name));
      END LOOP;
      RETURN l_result;
   END get_dep;
   -- application users in scope of the analysis
   -- other users are treated as if they were not installed
   app_user AS (
      SELECT username
        FROM dba_users
       WHERE oracle_maintained = 'N' -- SYS, SYSTEM, SYSAUX, ...
         AND username NOT IN ('FTLDB', 'PLSCOPE', 'UT3')
   ),
   -- materialize relevant PL/Scope identifiers to avoid very bad execution plans
   identifiers AS (
      SELECT --+ materialize
             owner,
             object_type, 
             object_name
        FROM dba_identifiers i
       WHERE usage_context_id = 0
         AND object_type IN ('PACKAGE BODY', 'TYPE BODY', 'FUNCTION', 'PROCEDURE', 'TRIGGER')
   ),
   -- PL/SQL objects without PL/Scope metadata
   missing_plscope_obj AS (
      SELECT o.owner, o.object_type, o.object_name
        FROM dba_objects o
        JOIN app_user u ON u.username = o.owner
        LEFT JOIN identifiers i
          ON i.owner = o.owner
             AND i.object_type = o.object_type
             AND i.object_name = o.object_name
       WHERE o.object_type IN ('PACKAGE BODY', 'TYPE BODY', 'FUNCTION', 'PROCEDURE', 'TRIGGER')
         AND i.object_name IS NULL
   ),   
    -- PL/SQL bodies extended by has_dml and has_transaction columns using PL/Scope
   plscope_obj AS (
      SELECT s.owner, s.object_type, s.object_name,
             MAX (
                CASE
                   WHEN s.type IN ('INSERT', 'UPDATE', 'DELETE', 'MERGE') THEN
                      1
                   ELSE
                      0
                END
             ) AS has_dml,
             MAX (
                CASE
                   WHEN s.type IN ('COMMIT', 'ROLLBACK') THEN
                      1
                   ELSE
                      0
                END
             ) AS has_transaction
        FROM dba_statements s
        JOIN app_user u ON u.username = s.owner
       WHERE s.object_type IN ('PACKAGE BODY', 'TYPE BODY', 'FUNCTION', 'PROCEDURE', 'TRIGGER')
       GROUP BY s.owner, s.object_type, s.object_name
   ),
    -- dba_dependencies reduced to PL/SQL bodies
   dep_base AS (
      SELECT owner,
             type, 
             name, 
             referenced_owner,
             CASE referenced_type
                WHEN 'PACKAGE' THEN 'PACKAGE BODY'
                WHEN 'TYPE' THEN 'TYPE BODY'
                ELSE referenced_type
             END AS referenced_type,
             referenced_name
        FROM dba_dependencies d
       WHERE referenced_type IN ('PACKAGE', 'PACKAGE BODY', 'TYPE', 'TYPE BODY', 'FUNCTION', 'PROCEDURE', 'SYNONYM')
         and (owner = 'PUBLIC' OR owner IN (SELECT username FROM app_user))
         and (referenced_owner = 'PUBLIC' OR referenced_owner IN (SELECT username FROM app_user))
   ), 
   -- extend dependencies by columns has_dml and has_transaction
   dep AS (
      select d.owner,
             d.type,
             d.name, 
             d.referenced_owner, 
             d.referenced_type, 
             d.referenced_name, 
             nvl(p.has_dml, 0) AS referenced_has_dml,
             nvl(p.has_transaction, 0) AS referenced_has_transaction
        FROM dep_base d
        LEFT JOIN plscope_obj p
          ON p.owner = d.referenced_owner
             AND p.object_type = d.referenced_type
             AND p.object_name = d.referenced_name
   ),
   -- XML because JSON values are still restricted to 4000/32767 bytes
   -- see Bug 27199654 : ORA-40459 WHEN GENERATING JSON DATA
   xml_dep AS (
      SELECT XMLELEMENT("xml",
                XMLAGG(
                   XMLELEMENT("row",
                      XMLELEMENT("key",
                         XMLELEMENT("owner", d.owner),
                         XMLELEMENT("type", d.type),
                         XMLELEMENT("name", d.name)
                      ),
                      XMLELEMENT("value",
                         XMLAGG(
                            XMLELEMENT("dependency",
                               XMLELEMENT("referenced_owner", d.referenced_owner),
                               XMLELEMENT("referenced_type", d.referenced_type),
                               XMLELEMENT("referenced_name", d.referenced_name),
                               XMLELEMENT("referenced_has_dml", d.referenced_has_dml),
                               XMLELEMENT("referenced_has_transaction", d.referenced_has_transaction)
                            )
                         )
                      )
                   )
                )
             ) AS xmldoc
        FROM dep d
        JOIN dba_objects o
          ON d.owner = o.owner
         AND d.type = o.object_type
         AND d.name = o.object_name
       WHERE o.owner IN (SELECT username FROM app_user)
         AND o.object_type IN ('PACKAGE BODY', 'TYPE BODY', 'FUNCTION', 'PROCEDURE')
         AND (d.owner, replace(d.type, ' BODY'), d.name) IN (
                SELECT owner, type, table_name 
                 FROM dba_tab_privs
             )
       GROUP BY d.owner, d.type, d.name               
   ),
   -- get the object dependencies via PL/SQL function
    -- passing data as XML because the PL/SQL function doesn't have access to named subqueries
   dep_hier AS (
      SELECT owner, type, name, referenced_owner, referenced_type, referenced_name, 
             referenced_has_dml, referenced_has_transaction
        FROM XMLTABLE(
               '/xml/row'
               PASSING get_dep((SELECT xmldoc from xml_dep))
                   COLUMNS owner                      VARCHAR2(128) PATH 'owner',
                           type                       VARCHAR2(128) PATH 'type',
                           name                       VARCHAR2(128) PATH 'name',
                           referenced_owner           VARCHAR2(128) PATH 'referenced_owner',
                           referenced_type            VARCHAR2(128) PATH 'referenced_type',
                           referenced_name            VARCHAR2(128) PATH 'referenced_name',
                           referenced_has_dml         INTEGER       PATH 'referenced_has_dml',
                           referenced_has_transaction INTEGER       PATH 'referenced_has_transaction'
             )
   ),
   -- aggregate columns has_dml and has_transaction per root PL/SQL body
   app_plsql AS (
      SELECT owner, type AS object_type, name AS object_name,
             MAX(referenced_has_dml) AS has_dml,
             MAX(referenced_has_transaction) AS has_transaction
        FROM dep_hier
       GROUP by owner, type, name
   ),
   -- roles as recursive structure
   role_base AS (
      -- roles without parent (=roots)
      SELECT r.role, NULL AS parent_role
        FROM dba_roles r
       WHERE r.role NOT IN (
                SELECT p.granted_role
                  FROM role_role_privs p
             )
      UNION ALL
      -- roles with parent (=children)
      SELECT granted_role AS role, role AS parent_role
        FROM role_role_privs
   ),
   -- roles tree, calculate role_path for every hierarchy level
   role_tree AS (
      SELECT role,
             parent_role,
             sys_connect_by_path(ROLE, '/') AS role_path
        FROM role_base
      CONNECT BY PRIOR role = parent_role
   ),
   -- roles graph, child added to all ancestors including self
   -- allows simple join to parent_role to find all descendants
   role_graph AS (
      SELECT DISTINCT
             role,
             regexp_substr(role_path, '(/)(\w+)', 1, 1, 'i', 2) AS parent_role
        FROM role_tree
   ),
   -- user system privileges
   sys_priv AS (
      -- system privileges granted directly to users
      SELECT u.username, p.privilege
        FROM dba_sys_privs p
        JOIN app_user u ON u.username = p.grantee
      UNION
      -- system privileges granted directly to PUBLIC
      SELECT u.username, p.privilege
        FROM dba_sys_privs p
       CROSS JOIN app_user u
       WHERE p.grantee = 'PUBLIC'
         AND p.privilege NOT IN (
                SELECT r.role
                  FROM dba_roles r
             )
      UNION
      -- system privileges granted to users via roles
      SELECT u.username, p.privilege
        FROM dba_role_privs r
        JOIN app_user u ON u.username = r.grantee
        JOIN role_graph g ON g.parent_role = r.granted_role
        JOIN dba_sys_privs p ON p.grantee = g.role
      UNION
      -- system privileges granted to PUBLIC via roles
      SELECT u.username, p.privilege
        FROM dba_role_privs r
        JOIN role_graph g ON g.parent_role = r.granted_role
        JOIN dba_sys_privs p ON p.grantee = g.role
        CROSS JOIN app_user u
       WHERE r.grantee = 'PUBLIC'
   ),
   -- user object privileges
   obj_priv AS (
      -- objects granted directly to users
      SELECT u.username, p.owner, p.type AS object_type, p.table_name AS object_name
        FROM dba_tab_privs p
        JOIN app_user u ON u.username = p.grantee
       WHERE p.owner IN (
                SELECT u2.username
                  FROM app_user u2
             )
      UNION
      -- objects granted to users via roles
      SELECT u.username, p.owner, p.type AS object_type, p.table_name AS object_name
        FROM dba_role_privs r
        JOIN app_user u ON u.username = r.grantee
        JOIN role_graph g ON g.parent_role = r.granted_role
        JOIN dba_tab_privs p ON p.grantee = g.role
       WHERE p.owner IN (
                SELECT u2.username
                  FROM app_user u2
             )
      -- objects granted to PUBLIC
      UNION
      SELECT u.username, p.owner, p.type AS object_type, p.table_name AS object_name
        FROM dba_tab_privs p
       CROSS JOIN app_user u
       WHERE p.owner IN (
                SELECT u2.username
                  FROM app_user u2
             )
         AND p.grantee = 'PUBLIC'
   ),
   -- issues if user is configured in the connection pool of a middle tier
   issues AS (
     -- privileges not part of CONNECT role
      SELECT username,
             'SYS' AS owner,
             'PRIVILEGE' AS object_type,
             privilege AS object_name,
             'Privilege is not part of the CONNECT role' AS issue
        FROM sys_priv
       WHERE privilege NOT IN ('CREATE SESSION', 'SET CONTAINER')
      -- access to non PL/SQL units
      UNION ALL
      SELECT username,
             owner,
             object_type,
             object_name,
             'Access to non-PL/SQL unit'
        FROM obj_priv
       WHERE object_type NOT IN ('PACKAGE', 'TYPE', 'FUNCTION', 'PROCEDURE')
       -- own objects
      UNION ALL
      SELECT u.username,
             o.owner,
             o.object_type,
             o.object_name,
             'Connect user must not own any object'
        FROM app_user u
        JOIN dba_objects o ON o.owner = u.username
      -- missing CREATE SESSION privilege
      UNION ALL
      SELECT u.username,
             'SYS',
             'PRIVILEGE',
             'CREATE SESSION',
             'Privilege is missing, but required'
        FROM app_user u
       WHERE u.username NOT IN (
                SELECT username
                  FROM sys_priv
                 WHERE privilege = 'CREATE SESSION' 
             )
      -- missing PL/Scope metadata leads to wrong results
      UNION ALL 
      SELECT p.username,
             p.owner, 
             p.object_type, 
             p.object_name, 
             'PL/Scope metadata is missing, required for analysis'
        FROM obj_priv p
        JOIN missing_plscope_obj s
          ON s.owner = p.owner
             AND replace(s.object_type, ' BODY') = p.object_type
             AND s.object_name = p.object_name
      -- access to PL/SQL units updating database state without COMMIT/ROLLBACK
      UNION ALL 
      SELECT p.username,
             p.owner,
             p.object_type,
             p.object_name,
             'INSERT/UPDATE/DELETE/MERGE without COMMIT/ROLLBACK'
        FROM obj_priv p
        JOIN app_plsql a
          ON a.owner = p.owner
             AND replace(a.object_type, ' BODY') = p.object_type
             AND a.object_name = p.object_name
       WHERE p.object_type IN ('PACKAGE', 'TYPE', 'FUNCTION', 'PROCEDURE') 
         AND a.has_dml = 1 AND a.has_transaction = 0
   ),
   -- aggregate issues per user
   issue_aggr AS (
      SELECT u.username, COUNT(i.username) issue_count
        FROM app_user u
        LEFT JOIN issues i ON i.username = u.username
       GROUP BY u.username
   ),
   -- user summary (calculate is_smartdb_property_3_met)
   summary AS (
      SELECT username,
             CASE
                WHEN issue_count = 0 THEN
                   'YES'
                ELSE
                   'NO'
             END AS is_smartdb_property_3_met,
             issue_count
        FROM issue_aggr
       ORDER BY is_smartdb_property_3_met DESC, username
   )
-- main
SELECT * 
  FROM summary
 WHERE issue_count = 0;
/

USERNAME                 IS_SMARTDB_PROPERTY_3_MET ISSUE_COUNT
------------------------ ------------------------- -----------
APEX_REST_PUBLIC_USER    YES                                 0
THE_BAD_USER             YES                                 0
THE_GOOD_USER            YES                                 0

If you are using a connect user that is not listed in the result, then your application is not SmartDB.

The query also checks SmartDB properties 1 and 2. However, the query only produces a result if the PL/SQL bodies in the application users are compiled with PL/Scope (see the script above). Change the main part of the query to SELECT * FROM issues if you want to know why a connect user is not shown in the result.

BTW: the check results for the SmartDB properties 2 and 3 are identical because THE_BAD_USER and THE_GOOD_USER do not have access to write operations.

4. SQL statements are written by human hand

If you generate SELECT, INSERT, UPDATE, DELETE or MERGE statements, then your application is not SmartDB.

I see the following reasons to generate code (including SQL statements):

  • Don’t repeat yourself (DRY principle). Striving for DRYness leads to better data models, better designs and better code. In some cases code generators are required to achieve the goal.
  • Reduce the overall complexity by using a domain-specific language (DSL). Enforce rules and conventions in the DSL and the code templates. This leads to a smaller code base and improves productivity.

A generator creates code at design/build time or at runtime. Both approaches have pros and cons. Code generators producing code at runtime are easier to deploy, but may produce more runtime errors, are harder to debug and come with a performance penalty. Code generators producing code at design/build time are more involved and more costly to deploy, but are much easier to debug, have better runtime performance and produce errors at install time rather than at runtime.

Even if generated code should look as if it were written by human hand, it should never become part of your code base. Generated code is derived from something else. This “something else” (generator, code templates, generator input) is part of your code base. Do not amend generated code and keep your code base as small as possible. It’s okay to also keep the generated code in the version control system, but you should separate it from the “real code base”. It should be absolutely clear that generated code is completely replaced by a subsequent generator run.

Code generators offer a high value when used correctly. Therefore, the general ban on the use of generators is simply ignorant.

5. SQL statements exploit the full power of set-based SQL

If you use row-by-row processing when set-based SQL is feasible and noticeably faster, then your application is not SmartDB.

This is the most important SmartDB property. Set-based SQL is the key to good performance. It means that you are using the database as a processing engine and not as a data store only.

You should make it a habit to minimize the total number of executed SQL statements to get a job done. Fewer loops, more set-based SQL. In the end it is simpler. You tell the database what you want and the optimizer figures out how to do it efficiently.
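
As an illustration, here is a sketch contrasting the two styles (the orders table and its columns are made-up names):

-- row-by-row: one execution and one context switch per row
BEGIN
   FOR r IN (SELECT order_id FROM orders WHERE status = 'OPEN') LOOP
      UPDATE orders
         SET status = 'CLOSED'
       WHERE order_id = r.order_id;
   END LOOP;
   COMMIT;
END;
/

-- set-based: a single statement, the optimizer figures out the "how"
UPDATE orders
   SET status = 'CLOSED'
 WHERE status = 'OPEN';

COMMIT;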

Conclusion

If you use set-based SQL in your application and manually craft your SQL statements, then you can check with a single SQL statement whether your application is really SmartDB. Don’t be disappointed if it is not. Besides some demo applications I haven’t seen a SmartDB application, and I do not expect to see one soon.

The SmartDB idea is based on sound analysis and some good advice (see Toon Koppelaar’s excellent video and slide deck). But the resulting SmartDB definition overshoots the mark. It focuses too much on PL/SQL and ignores the capabilities of database-aware tools. These tools support SQL (or MDX) as the primary interface to the database (especially for queries). Using another path to the database is usually possible, but less efficient from a development cost and time-to-market perspective.

It looks like there is currently no way to refine the SmartDB definition to make this approach broadly usable. The recommended alternative is to come up with your own definition. My next post deals with that topic. However, I’d still like to see a SmartDB 2.0 definition that tolerates views as part of the API, generated code, and transaction control statements issued by the API caller.

The post Is Your Application SmartDB? appeared first on Philipp Salvisberg's Blog.

The Pink Database Paradigm (PinkDB)


1. Introduction

The Pink Database paradigm (PinkDB) is an application architecture for database-centric applications. It focuses on relational database systems and is vendor neutral. The principles are based on the ideas of SmartDB, with some adaptations that make PinkDB easier to apply in existing development environments. An important feature of a PinkDB application is that it uses set-based SQL to conserve resources and deliver best performance.

Connect User


The connect user does not own objects. No tables. No views. No synonyms. No stored objects. It follows the principle of least privileges.

API Schema


The API schema owns the API to the data. Access is granted on bases of the principle of least privileges. The API consists of stored objects and views, but no tables.

Data


The data is stored in a data model using the features of the underlying database system for consistency. It is protected by the API and processed by set-based SQL for best performance.

2. Features

An application implementing PinkDB has following features:

  1. The connect user does not own database objects
  2. The connect user has access to API objects only
  3. The API consists of stored objects and views
  4. Data is processed by set-based operations
  5. Exceptions are documented

2.1. The connect user does not own database objects

The connect user is used by application components outside of the database to interact with the database. It is configured for example in the connection pool of the middle tier application.

The connect user must access only the APIs of the underlying database applications. It must not own database objects such as tables, views, synonyms or stored objects.

The principle of least privileges is followed.

This is 100 percent identical to SmartDB.

2.2. The connect user has access to API objects only

Database tables are guarded behind an API. The connect user must not have privileges to access objects that are not part of the API, e.g. via SELECT ANY TABLE privileges or similar.

The principle of least privileges is followed.

2.3 The API consists of stored objects and views

The API schema owns the API to the data. Access is granted on bases of the principle of least privileges. The API consists of stored objects and views, but no tables.
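
A minimal sketch of such grants (api_schema, order_api, order_v and connect_user are made-up names for illustration):

-- the connect user gets access to the API objects only
GRANT EXECUTE ON api_schema.order_api TO connect_user;
GRANT SELECT  ON api_schema.order_v   TO connect_user;
-- no grants on the underlying tables of the data schema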

2.4 Data is processed by set-based operations

The data is stored in a data model using the features of the underlying database system for consistency. It is not necessary to store tables, indexes, etc. in a dedicated schema. But it is mandatory that the data is protected by the API.

Set-based SQL is the key to good performance. It means that you are using the database as a processing engine and not as a data store only. You should avoid row-by-row processing when set-based SQL is feasible and noticeably faster. This means that row-by-row processing is acceptable, e.g. to update a few rows via a GUI, but not for batch processing, where set-based operations are faster by factors. Using stored objects for batch processing simplifies the work. Set-based processing becomes natural.

You should make it a habit to minimize the total number of executed SQL statements to get a job done. Fewer loops, more set-based SQL. In the end it is simpler. You tell the database what you want and the optimizer figures out how to do it efficiently.

2.5 Exceptions are documented

All these features are understood as recommendations. They should be followed. Without exceptions. However, in real projects we have to deal with limitations and bugs and sometimes it is necessary to break rules. Document the reason for the exception and make sure that the exception does not become the rule.

3. Differences to SmartDB

SmartDB is targeting PL/SQL and therefore focusing on Oracle Databases. PinkDB is vendor agnostic and can be applied to SQL Server, Db2, Teradata, EnterpriseDB, PostgreSQL, MySQL, MariaDB, HSQL, etc. This does not mean that only the features common to all database systems should be used, quite the contrary. Use the features of the underlying systems to get the best value, even if they are vendor specific.

The API in SmartDB consists of PL/SQL units only. No exceptions. PinkDB allows views. In fact they are an excellent API for various use cases. For example reporting tools using SQL to access star schemas, or using an MDX adapter to access logical cubes based on analytic views. APEX is another example. You develop efficiently with APEX when your reports and screens are based on views (or tables). Using stored objects only to access Oracle database sources is working against the tool. However, you have to be careful. Using views only can be dangerous and will most probably violate the “data is processed by set-based operations” feature sooner or later, if you do not pay attention. Other examples are applications built with JOOQ. JOOQ makes static SQL possible within Java. The productivity is comparable to PL/SQL. It’s natural to write set-based SQL. These examples show that defining NoPlsql (NoStoredObjects) as the opposite of SmartDB is misleading, since it describes something bad. NoPlsql is not bad per se. It really depends on how you use the database. If you use it as a processing engine, then this cannot be bad. In fact it is excellent. This is probably the biggest difference between SmartDB and PinkDB.

SmartDB has this weird requirement that all SELECT, INSERT, UPDATE, DELETE and MERGE statements must be written by human hand (within PL/SQL). No generators are allowed. PinkDB welcomes generators to increase the productivity and consistency of the result.

The last difference concerns transaction control statements. SmartDB requires them to be part of the PL/SQL API. PinkDB allows the use of COMMIT and ROLLBACK outside of the database. However, if a stored object call covers the complete transaction, it should also take responsibility for the final COMMIT.

SmartDB and PinkDB have the same ancestors. I see PinkDB as the understanding sister of her wise, but sometimes a bit stubborn brother SmartDB.

4. Related Resources

As I said, PinkDB and SmartDB are related. That’s why all SmartDB documents are also interesting for PinkDB. Steven Feuerstein is maintaining a SmartDB Resource Center. You find a lot of useful information and links there. I highly recommend looking at Toon Koppelaar’s excellent video and slide deck. Toon really knows what he is talking about. Would you like to know if your database application is SmartDB compliant? Then see my previous blog post. There’s a script you can run to find out.

The post The Pink Database Paradigm (PinkDB) appeared first on Philipp Salvisberg's Blog.

Why Pink?


PinkDB is an acronym for “processing in knowing database”. In this blog post I tell you how I came up with the acronym and its meaning.

When I start writing a blog post, I usually have only a vague idea of the title, and I usually change it more than once. The Pink Database Paradigm (PinkDB) was no different. An early version of the title was based on the acronym “uDBasPE” for “use the database as processing engine”. I liked the meaning, but I knew the acronym was unfit. I had difficulties remembering it myself, and the first three letters pointed in the wrong direction.

In a customer project we did it the other way around and named a code generator after a female first name. A few days later we had a meaning for the acronym. Nobody in the project remembered the meaning, but everyone remembered the name and most didn’t even know it was an acronym. That was an inspiring experience.

I came up with “Pink” when I thought about the coloring of the circles in the figure of the architectural layers. Pink is the color of my wife’s car and it can be found in many places in our home. I wouldn’t wear pink clothes at conferences if I didn’t like the color. The partial meaning “processing in… database” for PinkDB was obvious. I googled for “PinkDB” and searched for “#PinkDB” on Twitter. Of course I found something, but nothing conflicting. Hence “PinkDB”. I just had to find a reasonably appropriate adjective beginning with “k”. I grabbed my “Langenscheidts Handwörterbuch Englisch-Deutsch / Deutsch-English”, which I hadn’t touched in years, and looked for English words starting with “k”. Only 5 pages… not many candidates. And “knowing” was my choice.

Knowing as an adjective means intelligent, clever, shrewd, smart. So, what makes the database knowing? The optimizer. It’s the core of a good processing engine. Using the database as a processing engine is a key feature of PinkDB. Pink it is.

The post Why Pink? appeared first on Philipp Salvisberg's Blog.
