
IslandSQL Episode 3: Lock Table

Introduction

In the last episode we extended the IslandSQL grammar to cover all DML statements as single lexer tokens. Now it’s time to handle the complete grammar for one DML statement. The simplest one is lock table, which is a good reason to start with it and lay the foundation for the other DML commands.

The full source code is available on GitHub and the binaries on Maven Central.

Grammars in SQL Scripts

When using SQL scripts, we work with several grammars. There is always more than one grammar involved; how many more depends on your use case.

The candidates when working with an Oracle Database 21c are:

  • SQL*Plus
  • SQLcl
  • PGQL
  • SQL
  • PL/SQL
  • Java
  • more hidden in strings and LOBs such as XML, XSLT, JSON, …

The number of grammars is growing. For example, we expect JavaScript stored procedures in Oracle Database 23c.

In the previous episodes, we have primarily dealt with SQL*Plus and SQL as a whole. Before we deal with a specific SQL statement such as lock table, we need to know where a SQL statement starts and where it ends. The start seems obvious, but the end? Does the fragment SQL_END correctly describe the end of a SQL statement?

fragment SQL_END:
      EOF
    | (';' [ \t]* SINGLE_NL?)
    | SLASH_END
;

Where Does a SQL Statement End?

A common misconception is that a SQL statement ends with a semicolon. This seems to be true when you only look at the syntax per statement in the Oracle Database documentation, but it is not. Here’s an example:

begin
   execute immediate '
      lock table dept in exclusive mode
   ';
end;
/

The anonymous PL/SQL block executes a dynamic lock table statement on line 3. Please note that the lock table statement starts and ends with whitespace. We do not pass a semicolon as part of the execute immediate statement. This anonymous PL/SQL block completes successfully when the connected user can lock dept.

However, when we add a semicolon at the end of the lock table statement we get the following error:

Error starting at line : 1 in command -
begin
   execute immediate '
      lock table dept in exclusive mode;
   ';
end;
Error report -
ORA-00933: SQL command not properly ended
ORA-06512: at line 2
00933. 00000 -  "SQL command not properly ended"
*Cause:    
*Action:

It is not allowed to terminate a SQL statement with a semicolon in dynamic SQL, and the same applies when executing SQL via JDBC or ODBC.
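For illustration, here is a minimal JDBC sketch. The connection URL, user, password and the dept table are placeholders (adjust them to your environment, the Oracle JDBC driver must be on the classpath); the point is that the statement text is passed without a trailing semicolon:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LockTableViaJdbc {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521/freepdb1", "scott", "tiger")) {
            con.setAutoCommit(false);
            try (Statement stmt = con.createStatement()) {
                // no trailing semicolon; appending one would raise ORA-00933
                stmt.execute("lock table dept in exclusive mode");
            }
            con.rollback(); // release the lock again
        }
    }
}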

But what is the semicolon for? Well, it terminates a SQL statement within a SQL*Plus or SQLcl script. In SQL*Plus you can change the behaviour using the set sqlterminator command.

The following script works in SQL*Plus (but not in SQLcl, it’s a documented limitation):

set sqlterminator $
lock table dept in exclusive mode$

The semicolon is the default terminator of SQL statements in SQL scripts. It is part of the SQL*Plus grammar but not part of the SQL grammar. However, it is more than that: it is also the only supported statement terminator in PL/SQL, as the following SQL*Plus script shows.

set sqlterminator $
begin
   lock table dept in exclusive mode;
end;
/

The semicolon on line 4 terminates the anonymous PL/SQL block. It’s part of the PL/SQL grammar. The final slash is not part of the anonymous PL/SQL block. It is an alternative to the SQL*Plus run command. It sends the buffer (the anonymous PL/SQL block) to the database server.

By the way, the Oracle Database documentation explains why it uses the semicolon in its grammar. Here is the corresponding quote:

Note: SQL statements are terminated differently in different programming environments. This documentation set uses the default SQL*Plus character, the semicolon (;).
Lexical Conventions, SQL Language Reference, Oracle Database 21c

Yes, the semicolon is part of the SQL*Plus grammar. There is no common sequence of characters for identifying the end of an SQL statement.

What Are We Going to Do Now?

I initially wanted to use ANTLR modes to handle the complete grammar of chosen statements. However, ANTLR modes require that you can identify the start and the end of a mode. For the lock table statement, the start is the lock keyword and the end is the SQL_END fragment we used before. We could also use just the semicolon to determine the end. While this works for the lock table statement, it will cause some problems when trying to integrate the PL/SQL grammar.

How do we find out whether a semicolon belongs to a PL/SQL statement or to a SQL statement? Is this possible in the lexer? Well, I think it’s possible by doing some semantic predicate acrobatics, but I don’t think it’s sensible.

Another approach is to use two lexers. The first one extracts the statements in scope of the IslandSQL grammar, and the second one processes only the extracted statements. The parser uses the token stream from the second lexer. Perfect. However, we want to keep the original positions (line/column) of the tokens in scope. They are important for navigating to the right place in the code. How do we do that?

Keep Hidden Tokens as Whitespace

The idea is to replace all non-whitespace characters in hidden tokens with a space. This way the number of lines and the position of all relevant tokens within a line stay the same. The total number of characters is also the same (the number of bytes might change when multibyte characters are replaced).

Here’s an example.

/* ===========================================
 * ignore multiline comment
 * =========================================== */
select * from dual;

rem ignore remark: select * from dual;

-- ignore single line comment
lock table dept in exclusive mode;

After the transformation the script should look like this.

..............................................
...........................
.................................................
select * from dual;

......................................

.............................
lock table dept in exclusive mode;

Please note that a dot (.) represents a replaced character. Read a dot as a space.

We can use this converted script as input for the second lexer.

The implementation is relatively easy. I renamed the original lexer to IslandSqlScopeLexer and used this code for the transformation:

static public String getScopeText(CommonTokenStream tokenStream) {
    TokenStreamRewriter rewriter = new TokenStreamRewriter(tokenStream);
    tokenStream.fill();
    tokenStream.getTokens().stream()
            .filter(token -> token.getChannel() == Token.HIDDEN_CHANNEL
                    && token.getType() != IslandSqlScopeLexer.WS)
            .forEach(token -> {
                        StringBuilder sb = new StringBuilder();
                        token.getText().codePoints().mapToObj(c -> (char) c)
                                .forEach(c -> sb.append(c == '\t' || c == '\r' || c == '\n' ? c : ' '));
                        rewriter.replace(token, sb.toString());
                    }
            );
    return rewriter.getText();
}

The method gets a token stream as input and returns the transformed text (SQL script).

The ANTLR runtime comes with a TokenStreamRewriter that helps to add, delete or change tokens. We only change hidden tokens that are not of type whitespace. Tabs, carriage returns and line feeds are kept. Other characters are replaced by a space.
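Here is a minimal sketch of how the two lexers could be wired together. It uses CharStreams and CommonTokenStream from the ANTLR runtime and assumes that this method lives in the same class as the getScopeText method shown above; class and rule names match the grammars in this post:

static public IslandSqlParser.FileContext parse(String script) {
    // first lexer: tokenize the original script, statements in scope become single tokens
    IslandSqlScopeLexer scopeLexer = new IslandSqlScopeLexer(CharStreams.fromString(script));
    CommonTokenStream scopeTokens = new CommonTokenStream(scopeLexer);

    // replace the non-whitespace characters of hidden tokens with spaces
    String scopeText = getScopeText(scopeTokens);

    // second lexer and parser: process the converted script, token positions are preserved
    IslandSqlLexer lexer = new IslandSqlLexer(CharStreams.fromString(scopeText));
    IslandSqlParser parser = new IslandSqlParser(new CommonTokenStream(lexer));
    return parser.file();
}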

The New Lexer

After the preprocessing of the original input we can concentrate on the islands. The sea is represented as whitespace. This simplifies the logic of the lexer.

lexer grammar IslandSqlLexer;

options {
    superClass=IslandSqlLexerBase;
    caseInsensitive = true;
}

/*----------------------------------------------------------------------------*/
// Fragments to name expressions and reduce code duplication
/*----------------------------------------------------------------------------*/

fragment SINGLE_NL: '\r'? '\n';
fragment COMMENT_OR_WS: ML_COMMENT|SL_COMMENT|WS;
fragment SQL_TEXT: (ML_COMMENT|SL_COMMENT|STRING|.);
fragment SLASH_END: SINGLE_NL WS* '/' [ \t]* (EOF|SINGLE_NL);
fragment PLSQL_DECLARATION_END: ';'? [ \t]* (EOF|SLASH_END);
fragment SQL_END:
      EOF
    | (';' [ \t]* SINGLE_NL?)
    | SLASH_END
;

/*----------------------------------------------------------------------------*/
// Hidden tokens
/*----------------------------------------------------------------------------*/

WS: [ \t\r\n]+ -> channel(HIDDEN);
ML_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
SL_COMMENT: '--' .*? (EOF|SINGLE_NL) -> channel(HIDDEN);
CONDITIONAL_COMPILATION_DIRECTIVE: '$if' .*? '$end' -> channel(HIDDEN);

/*----------------------------------------------------------------------------*/
// Keywords
/*----------------------------------------------------------------------------*/

K_EXCLUSIVE: 'exclusive';
K_FOR: 'for';
K_IN: 'in';
K_LOCK: 'lock';
K_MODE: 'mode';
K_NOWAIT: 'nowait';
K_PARTITION: 'partition';
K_ROW: 'row';
K_SHARE: 'share';
K_SUBPARTITION: 'subpartition';
K_TABLE: 'table';
K_UPDATE: 'update';
K_WAIT: 'wait';

/*----------------------------------------------------------------------------*/
// Special characters
/*----------------------------------------------------------------------------*/

AT_SIGN: '@';
CLOSE_PAREN: ')';
COMMA: ',';
DOT: '.';
OPEN_PAREN: '(';
SEMI: ';';
SLASH: '/';

/*----------------------------------------------------------------------------*/
// Data types
/*----------------------------------------------------------------------------*/

STRING:
    'n'?
    (
          (['] .*? ['])+
        | ('q' ['] '[' .*? ']' ['])
        | ('q' ['] '(' .*? ')' ['])
        | ('q' ['] '{' .*? '}' ['])
        | ('q' ['] '<' .*? '>' ['])
        | ('q' ['] . {saveQuoteDelimiter1()}? .+? . ['] {checkQuoteDelimiter2()}?)
    )
;

INT: [0-9]+;

/*----------------------------------------------------------------------------*/
// Identifier
/*----------------------------------------------------------------------------*/

QUOTED_ID: '"' .*? '"' ('"' .*? '"')*;
ID: [\p{Alpha}] [_$#0-9\p{Alpha}]*;

/*----------------------------------------------------------------------------*/
// Islands of interest as single tokens
/*----------------------------------------------------------------------------*/

CALL:
    'call' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

DELETE:
    'delete' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

EXPLAIN_PLAN:
    'explain' COMMENT_OR_WS+ 'plan' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

INSERT:
    'insert' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

MERGE:
    'merge' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

UPDATE:
    'update' COMMENT_OR_WS+ SQL_TEXT+? 'set' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

SELECT:
    (
          ('with' COMMENT_OR_WS+ ('function'|'procedure') SQL_TEXT+? PLSQL_DECLARATION_END)
        | ('with' COMMENT_OR_WS+ SQL_TEXT+? SQL_END)
        | (('(' COMMENT_OR_WS*)* 'select' COMMENT_OR_WS SQL_TEXT+? SQL_END)
    )
;

/*----------------------------------------------------------------------------*/
// Any other token
/*----------------------------------------------------------------------------*/

ANY_OTHER: . -> channel(HIDDEN);

Options

The options are the same as in IslandSqlScopeLexer. We need the superclass IslandSqlLexerBase only in the STRING rule to handle all quote delimiters supported by the Oracle Database.
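For illustration, the two predicates referenced in the STRING rule could be implemented along these lines in a base lexer. This is a simplified sketch, not the actual IslandSqlLexerBase:

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.Lexer;

public abstract class QuoteDelimiterLexerBase extends Lexer {
    private char quoteDelimiter;

    public QuoteDelimiterLexerBase(CharStream input) {
        super(input);
    }

    // remembers the quote delimiter of a q'...' literal, i.e. the character just consumed
    public boolean saveQuoteDelimiter1() {
        quoteDelimiter = (char) _input.LA(-1);
        return true;
    }

    // true if the character before the closing quote matches the saved delimiter
    public boolean checkQuoteDelimiter2() {
        return (char) _input.LA(-2) == quoteDelimiter;
    }
}

The bracket delimiters do not need this treatment because the STRING rule covers them with dedicated alternatives.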

Fragments

We moved the fragments section to the top. The fragments CONTINUE_LINE, SQLPLUS_TEXT and SQLPLUS_END are not required in this lexer. They are used in IslandSqlScopeLexer to identify SQL*Plus commands as hidden tokens.

Since we replaced all SQL*Plus commands with whitespace, there is no need to handle them in this lexer.

Hidden Tokens

In this section we define the tokens that we do not need in the parser. Therefore we place them on the hidden channel.

Wait, weren’t the hidden tokens replaced by whitespace? Yes, but only those that were not part of other tokens. However, the statements in scope (represented as a single token in IslandSqlScopeLexer) contain whitespace characters and maybe also comments or even conditional compilation directives (e.g. in plsql_declarations of a select statement). At the current stage of the grammar the CONDITIONAL_COMPILATION_DIRECTIVE token is de facto unused. We will need it (or an adequate replacement) once we implement the select statement or other statements containing PL/SQL code.

Keywords

It’s a good practice to define a rule for each token. This way we can control the names of the constants generated by ANTLR. We prefix the keywords with a K_ to distinguish them from other rules/tokens. These keywords can also be used as identifiers in various contexts. At the current stage of the grammar this section contains only the keywords used in the lock table statement of the Oracle Database 21c.

Special Characters

The lock table statement uses these special characters.

Data Types

The STRING rule is the same as in IslandSqlScopeLexer. It is a complete definition of a text literal. The INT rule is new. It defines an unsigned integer.

In a future version of the grammar we will need to support all numeric literals. And of course also date literals and interval literals. We will split the implementation between the lexer and the parser. That will be a bit tricky. For now, let’s keep it simple. – An unsigned integer works in most cases.

Identifier

In the lexer we define two types of identifiers: quoted and nonquoted identifiers. See also Database Object Naming Rules and my blog post regarding unnecessary quoted identifiers.

Islands of Interest as Single Token

This is the same list of rules as in IslandSqlScopeLexer. The only rule that is missing is LOCK_TABLE since we are tokenizing this SQL statement completely.

Please note that the UPDATE rule includes a set keyword. This is necessary because the keyword update is also part of the lock table statement. Without this change a lock table emp in share update mode nowait; statement would be partly identified as an update statement (update mode nowait;).

The final version of the lexer will not contain statements as single tokens.

Any Other Token

As in IslandSqlScopeLexer we put any other character on the hidden channel. This suppresses some errors in the parser. For example, we can insert a euro sign (€) or a pound sign (£) almost anywhere in the code without causing an error.

In future versions of the lexer, we will put the ANY_OTHER token on the DEFAULT_CHANNEL to avoid this kind of error suppression.

Parser Changes

The changes to the previous version of the parser are highlighted.

parser grammar IslandSqlParser;

options {
    tokenVocab=IslandSqlLexer;
}

/*----------------------------------------------------------------------------*/
// Start rule
/*----------------------------------------------------------------------------*/

file: dmlStatement* EOF;

/*----------------------------------------------------------------------------*/
// Data Manipulation Language
/*----------------------------------------------------------------------------*/

dmlStatement:
      callStatement
    | deleteStatement
    | explainPlanStatement
    | insertStatement
    | lockTableStatement
    | mergeStatement
    | selectStatement
    | updateStatement
;

callStatement: CALL;
deleteStatement: DELETE;
explainPlanStatement: EXPLAIN_PLAN;
insertStatement: INSERT;
mergeStatement: MERGE;
updateStatement: UPDATE;
selectStatement: SELECT;

/*----------------------------------------------------------------------------*/
// Lock table
/*----------------------------------------------------------------------------*/

lockTableStatement:
    stmt=lockTableStatementUnterminated sqlEnd
;

lockTableStatementUnterminated:
    K_LOCK K_TABLE objects+=lockTableObject (COMMA objects+=lockTableObject)*
        K_IN lockmode=lockMode K_MODE waitOption=lockTableWaitOption?
;

lockTableObject:
    (schema=sqlName DOT)? table=sqlName
        (
              partitionExtensionClause
            | (AT_SIGN dblink=qualifiedName)
        )?
;

partitionExtensionClause:
      (K_PARTITION OPEN_PAREN name=sqlName CLOSE_PAREN)             # partition
    | (K_PARTITION K_FOR OPEN_PAREN
        (keys+=expression (COMMA keys+=expression)*) CLOSE_PAREN)   # partitionKeys
    | (K_SUBPARTITION OPEN_PAREN name=sqlName CLOSE_PAREN)          # subpartition
    | (K_SUBPARTITION K_FOR OPEN_PAREN
        (keys+=expression (COMMA keys+=expression)*) CLOSE_PAREN)   # subpartitionKeys
;

// TODO: complete according https://github.com/IslandSQL/IslandSQL/issues/11
expression:
      STRING        # stringLiteral
    | INT           # integerLiteral
    | sqlName       # sqlNameExpression
;

lockMode:
      (K_ROW K_SHARE)               # rowShare
    | (K_ROW K_EXCLUSIVE)           # rowExclusive
    | (K_SHARE K_UPDATE)            # shareUpdate
    | (K_SHARE)                     # share
    | (K_SHARE K_ROW K_EXCLUSIVE)   # shareRowExclusive
    | (K_EXCLUSIVE)                 # exclusive
;

lockTableWaitOption:
      K_NOWAIT                  # nowait
    | K_WAIT waitSeconds=INT    # wait
;

/*----------------------------------------------------------------------------*/
// Identifiers
/*----------------------------------------------------------------------------*/

keywordAsId:
      K_EXCLUSIVE
    | K_FOR
    | K_IN
    | K_LOCK
    | K_MODE
    | K_NOWAIT
    | K_PARTITION
    | K_ROW
    | K_SHARE
    | K_SUBPARTITION
    | K_TABLE
    | K_UPDATE
    | K_WAIT
;

unquotedId:
      ID
    | keywordAsId
;

sqlName:
      unquotedId
    | QUOTED_ID
;

qualifiedName:
	sqlName (DOT sqlName)*
;

/*----------------------------------------------------------------------------*/
// SQL statement end, slash accepted without preceding newline
/*----------------------------------------------------------------------------*/

sqlEnd: EOF | SEMI | SLASH;

Data Manipulation Language

The only visible change in this section is the title. However, there is an important change regarding lockTableStatement.  It’s not a simple rule referring to a lexer token anymore.

Lock Table

lockTableStatement

On line 40-42 we define the lockTableStatement. It starts with a lockTableStatementUnterminated and ends on sqlEnd. It covers the same characters as in the previous parser version. As a result the extension for Visual Studio Code finds the same lock table statements as before.

lockTableStatementUnterminated

On line 44-47 we define the lockTableStatementUnterminated according to the Oracle Database SQL Language Reference 21c with the following three fields:

  • objects as an array of lockTableObject contexts with at least one entry
  • lockmode that refers to a mandatory instance of lockMode
  • waitOption that refers to an optional instance of lockTableWaitOption

Based on that, ANTLR generates an IslandSqlParser class with a nested class LockTableStatementUnterminatedContext.

public class IslandSqlParser extends Parser {
    ...
    public static class LockTableStatementUnterminatedContext extends ParserRuleContext {
        public LockTableObjectContext lockTableObject;
        public List<LockTableObjectContext> objects = new ArrayList<LockTableObjectContext>();
        public LockModeContext lockmode;
        public LockTableWaitOptionContext waitOption;
        ...
    }
    ...
}

The parser populates an instance of LockTableStatementUnterminatedContext according to the input. Interestingly, there is a redundancy between objects and lockTableObject. The former contains all objects to be locked and the latter just the last one.

Please note that the lock table statement ends on the mode keyword or on the lockTableWaitOption, which in turn ends on the nowait keyword or on an integer value.
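To illustrate how these generated context classes can be consumed, here is a sketch of a listener that collects the names of all locked tables. It assumes the default listener classes generated by ANTLR for the IslandSqlParser grammar:

import java.util.ArrayList;
import java.util.List;

public class LockedTablesListener extends IslandSqlParserBaseListener {
    private final List<String> tables = new ArrayList<>();

    @Override
    public void enterLockTableStatementUnterminated(
            IslandSqlParser.LockTableStatementUnterminatedContext ctx) {
        // ctx.objects holds all locked objects, ctx.lockTableObject only the last one
        for (IslandSqlParser.LockTableObjectContext object : ctx.objects) {
            tables.add(object.table.getText());
        }
    }

    public List<String> getTables() {
        return tables;
    }
}

Such a listener can be attached to a ParseTreeWalker after parsing a script.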

lockTableObject

The lockTableObject on line 49-55 defines the following fields:

  • schema, an optional field that refers to a sqlName identifier
  • table, a mandatory field that refers to a sqlName identifier
  • dblink, an optional field that refers to a qualifiedName identifier

For the optional partitionExtensionClause no field is defined. I think this is wrong and should be fixed in a future version. Nonetheless it’s possible to find it in the generic children field.

partitionExtensionClause

The partitionExtensionClause on line 57-64 defines four partition variants. Each variant has a label – the token after the hash sign (#). Based on these labels ANTLR generates the following subclasses of the class PartitionExtensionClauseContext:

  • PartitionContext
  • PartitionKeysContext
  • SubpartitionContext
  • SubpartitionKeysContext

It’s another good practice to define a label for an alternative. It simplifies finding classes in the parse tree using listeners or visitors and makes the parse tree more expressive. The next screenshot highlights the partition alternative in the ANTLR IntelliJ plugin. The ANTLR interpreter does not generate classes. Instead, it shows the alternative after a colon: either the ordinal number or the label, if available. However, it’s still a good representation of what you can expect at runtime of the parser generated by ANTLR.


The alternatives for partitionKeys and subpartitionKeys define a field named keys with an array of expression.

expression

When working on a grammar you feel more than once like Hal fixing a light bulb. An expression is probably the most extensive part of the SQL grammar. It’s huge. It contains subqueries and a subquery is basically a select statement and a select statement uses conditions… Once we’ve done that, implementing the rest of the IslandSQL grammar is a piece of cake.

Therefore I decided to postpone the complete implementation and define just the bare minimum on line 66-71, making the lock table statement work for partition keys based on integers, strings and variable names. No datetime expressions yet.

lockMode

The Oracle Database allows 6 different lock modes. You find the valid alternatives on line 74-79.

lockTableWaitOption

By default the Oracle Database waits indefinitely for the lock. You can override this behaviour with one of the alternatives defined on line 83-84.

The grammar defines waitSeconds as an INT. That matches the definition in the SQL Language Reference of the Oracle Database 21c.


However, what is the meaning of integer in this case? Can we use an integer variable in PL/SQL? Can we use a decimal literal that can be converted to an integer such as 10.? Or can we use scientific notations such as 1e2 or even 1e2d? To know that, we have to try it out.

SQL> declare
  2     co_wait_in_seconds constant integer := 10;
  3  begin
  4     lock table emp in exclusive mode wait co_wait_in_seconds;
  5  end;
  6  /
   lock table emp in exclusive mode wait co_wait_in_seconds;
                                         *
ERROR at line 4:
ORA-06550: line 4, column 42:
PL/SQL: ORA-30005: missing or invalid WAIT interval
ORA-06550: line 4, column 4:
PL/SQL: SQL Statement ignored

SQL> lock table emp in exclusive mode wait 10.;

Table(s) Locked.

SQL> lock table emp in exclusive mode wait 1e2;

Table(s) Locked.

SQL> lock table emp in exclusive mode wait 1e2d;
lock table emp in exclusive mode wait 1e2d
                                      *
ERROR at line 1:
ORA-30005: missing or invalid WAIT interval

So, we cannot use a variable/constant. But the scientific notation works, and a decimal literal can be converted to an integer, as long as we do not use the d suffix of a binary_double literal (as in 1e2d).

I consider it a bug that we cannot use a variable/constant for the time to wait in PL/SQL. Especially since we can use static expressions in PL/SQL in various places, e.g. to define the size of a varchar2 variable (since 12.2). It does not make sense to enforce the use of dynamic SQL to handle dynamic wait times in PL/SQL.

I can imagine that a future version of the Oracle Database will lift this restriction in the lock table statement. It would be a small change. Maybe not even documented. Therefore it might be a good idea to change this part of the grammar and support a bit more.

Identifiers

The Oracle Database allows the use of keywords as identifiers in a lot of places. Therefore we should allow the use of keywords such as lock in the lock table statement’s identifiers schema, table and dblink. For that we created a rule named keywordAsId on line 91-105 that covers all keywords.

We defined ID in the lexer. It covers all identifiers. However, keywords have a higher priority in the lexer. Therefore we defined a new rule unquotedId that combines ID with keywordAsId.

The rule sqlName on line 112-115 combines unquotedId with the QUOTED_ID which we defined in the lexer.

And finally the rule qualifiedName on line 117-119 covers the unbounded concatenation of sqlName with a dot. The concatenation is optional. So a qualifiedName could look 100% the same as a sqlName. We could remove the schema in the rule lockTableObject and use qualifiedName for table like this:

lockTableObject:
    table=qualifiedName
        (
              partitionExtensionClause
            | (AT_SIGN dblink=qualifiedName)
        )?
;

This works and is a valid representation of the grammar. However, it’s less expressive. For dblink we must use qualifiedName.  There is no predefined, binding naming scheme that covers the number of segments for a database link name.

SQL Statement End

The sqlEnd rule on the last line 125 defines the end of a SQL statement in SQL*Plus/SQLcl. We do not handle whitespace here as in the IslandSqlScopeLexer. As a result a lock table statement could be terminated with a slash on the same line. This might need some rework in a future version of the grammar.

Syntax Errors

Let’s look at a lock table statement that uses an invalid lock mode.

SQL> lock table dept in access exclusive mode;
lock table dept in access exclusive mode
                          *
ERROR at line 1:
ORA-01737: valid modes: [ROW] SHARE, [[SHARE] ROW] EXCLUSIVE, SHARE UPDATE

This is a valid lock mode in PostgreSQL 15. Besides the lock mode, the syntax of the lock table statement differs from the one in the Oracle Database 21c in various places. However, it should not be too complicated to define a grammar that can handle both syntaxes. We put this on the to-do list and focus on supporting the Oracle Database grammar first.

However, how does our grammar deal with invalid SQL statements? – Here’s a screenshot of the extension for Visual Studio Code showing some lock table statements.


You see the word access underlined with a red wavy line. On mouse over you get the details displayed in the problems panel. Furthermore you see that lock_table.sql is displayed in red with the number 3, indicating that this file has three problems. And the outline view indicates problems by showing symbols in red.

That’s the cool thing when using an IDE supporting Microsoft’s Language Server Protocol. We just have to provide the syntax errors and the visualization is handled automatically by the IDE in a standardized manner.

Right now the parser provides only syntax errors. However, it is relatively easy to implement a linter based on this grammar and provide the results as warnings, for example for lock table statements without a waitOption.
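A sketch of such a check, again assuming the default listener classes generated by ANTLR for this grammar, could look like this:

import org.antlr.v4.runtime.Token;

public class MissingWaitOptionCheck extends IslandSqlParserBaseListener {
    @Override
    public void exitLockTableStatementUnterminated(
            IslandSqlParser.LockTableStatementUnterminatedContext ctx) {
        if (ctx.waitOption == null) {
            Token start = ctx.getStart();
            // in the real extension the warning would be reported as an LSP diagnostic
            System.out.printf("warning at [%d:%d]: lock table statement without wait option%n",
                    start.getLine(), start.getCharPositionInLine() + 1);
        }
    }
}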

Outlook

We have not yet succeeded in fully supporting the lock table statement. There are cases that cannot yet be parsed successfully. I would like to address that. To do this, we need to look a bit more at literals and expressions before we deal with more DML statements.

Another topic is the support of more SQL dialects.  I’d like to support PostgreSQL. Maybe it is a good time to start as soon as a DML statement is fully covered.

Stay tuned.

The post IslandSQL Episode 3: Lock Table appeared first on Philipp Salvisberg's Blog.


IslandSQL Episode 4: Expressions

Introduction

In the last episode, we extended the IslandSQL grammar to cover the complete lock table statement. However, the support for expressions was very limited. It was not possible to use a date literal or the to_date function to determine a partition to be locked. Time to fix that. In this episode, we will have a closer look at some SQL expressions and how to deal with the complexity of the SQL language.

The full source code is available on GitHub and the binaries are on Maven Central.

Lock Table Partition

To lock a table partition we use the partition name like this:

lock table sales partition (sales_q2_2000) in exclusive mode nowait;

Or we let the Oracle Database determine the partition by passing the values of a partition key like this:

lock table sales partition for (date '2000-04-01') in exclusive mode nowait;

Both statements will lock the same partition of the table sales. The latter is IMO better since it works also with system-generated partition names, which you might use with interval partitioning.

There are a lot of possibilities to specify the date. Here is a selection of sensible and less sensible alternatives:

lock table sales partition for (to_date('2000-04', 'YYYY-MM')) in exclusive mode nowait;
lock table sales partition for (to_date('2000-092', 'YYYY-DDD')) in exclusive mode nowait;
lock table sales partition for (timestamp '2000-04-01 08:42:42') in exclusive mode nowait;
lock table sales partition for (add_months(trunc(to_date('2000-12-31', 'YYYY-MM-DD'), 'YYYY'), 3)) in exclusive mode nowait;
lock table sales partition for (date '2000-01-01' + interval '3' month) in exclusive mode nowait;
lock table sales partition for (timestamp '2000-12-31 18:00:00' + 1/4 + -9 * interval '1' month) in exclusive mode nowait;

All these alternatives lock the partition in the sales table where the data for 1st April 2000 is stored. Using different SQL expressions, of course.

Expressions

I mentioned in my last post that expressions are the most extensive part of the SQL grammar. However, there are ways to optimise the extent of the grammar. ANTLR 4 helps because it allows defining left recursive parts of a grammar naturally. See the expressions labelled with binaryExpression in the excerpt of the parser in version 0.4.0 of the grammar below.

expression:
      expr=STRING                                               # simpleExpressionStringLiteral
    | expr=NUMBER                                               # simpleExpressionNumberLiteral
    | K_DATE expr=STRING                                        # dateLiteral
    | K_TIMESTAMP expr=STRING                                   # timestampLiteral
    | expr=intervalExpression                                   # intervalLiteral
    | expr=sqlName                                              # simpleExpressionName
    | LPAR exprs+=expression (COMMA exprs+=expression)* RPAR    # expressionList
    | expr=caseExpression                                       # caseExpr
    | operator=unaryOperator expr=expression                    # unaryExpression
    | expr=functionExpression                                   # functionExpr
    | expr=AST                                                  # allColumnWildcardExpression
    | left=expression operator=(AST|SOL) right=expression       # binaryExpression
    | left=expression
        (
              operator=PLUS
            | operator=MINUS
            | operator=VERBAR VERBAR
        )
      right=expression                                          # binaryExpression
    | left=expression operator=K_COLLATE right=expression       # binaryExpression
    | left=expression operator=PERIOD right=expression          # binaryExpression
;

In line 13 we deal with multiplication and division. Both sides of the operation allow an expression. Unlike grammars based on ANTLR 3, it is no longer necessary to left-factor this left-recursion in ANTLR 4.

ANTLR 4 solves the ambiguity by prioritizing the alternatives in the order of the definition. As a result, multiplication/division has a higher priority than addition/subtraction and a lower priority than the function expression on line 11.

Left-factoring still might be helpful to optimize the runtime performance of the parser. However, it’s not necessary anymore. And I’m happy with a simpler grammar.

Function Expressions

It would be a bad idea to handle each function like to_date or to_char separately in the grammar. Why? It would be more work and we would still need a solution for custom functions. As a result, we implement functions generically. The most naïve grammar definition would look like this:

functionExpression:
    name=sqlName LPAR (params+=expression (COMMA params+=expression)*)? RPAR
;

Parameterless functions that do not allow parentheses, like sysdate, are treated as simpleExpressionName. This rule handles parameterless functions that require parentheses, like sys_guid(). And it can handle functions with an unbounded number of parameters.

So what’s the problem? Why do I call this definition “naïve”? Because the following cases are not covered:

  • named parameters, e.g. dbms_utility.get_hash_value(name => 'text1', base => 0, hash_size => 16)
  • parameter prefixes, e.g. distinct or all in any_value
  • parameter suffixes, e.g. deterministic in approx_median or partition_by_clause/order_by_clause in approx_rank
  • partial analytic clauses, e.g. within group in approx_percentile
  • analytic clauses, e.g. over in avg

In most cases, it probably makes sense to handle them generically. However, there are cases where it’s probably better to define dedicated grammar rules for very special functions such as json_table or xml_table.

You find a less naïve implementation on GitHub. However, it is still incomplete. IMO a good way to assess completeness is to write tests. For function expressions, this means tests for every single function according to the SQL Language Reference. My tests currently cover abs through cardinality. The next function on the to-do list is cast, which contains the following uncovered grammar constructs:

  • parameter prefix multiset for a subquery expression, which is also not yet covered
  • parameter suffix as type_name
  • parameter suffix DEFAULT return_value ON CONVERSION ERROR

Shall we define a dedicated grammar rule for the cast function? I guess yes. However, I’d probably implement multiset as a unary operator and the subquery as part of the expression rule. For that, we need to implement the grammar for the complete select statement.

Knowing the Scope and the Limitations

A typical question I get is “Why are you writing a SQL grammar? There are already some available for ANTLR 4, right?”. That’s an excellent question. The ANTLR organisation on GitHub manages a repository with example grammars. You find the ones for SQL here.

Most of the 3rd party grammars cover just a (large) subset of the underlying languages. They define what they cover in ANTLR or by EBNF. However, they often do not define which versions they cover and they do not define what they don’t cover. As a result, you have to try out whether the grammar is sufficient for your use case. Furthermore, you have to assess whether it is sufficient for future use cases and whether it will cover the changes in newer versions. And of course, you have to decide if you are ready to extend/fix the grammar in areas where it does not meet your expectations. You will have to maintain a fork. This can become painful, especially if the grammar contains a lot of stuff you are not interested in and if the existing test suites are incomplete.

Furthermore, I’m not aware of an open-sourced grammar that covers the relevant portions for the Oracle Database and PostgreSQL. Yes, the goal is that IslandSQL supports the recent versions of both dialects as if they were a single dialect.

We develop the grammar iteratively. As a result, there are a lot of interim limitations like the one mentioned regarding the cast function.

SQL*Plus Substitution Variables

However, some limitations are supposed to be permanent. Like the one for substitution variables. Substitution variables can contain arbitrary text. They are replaced before the execution of a script. The IslandSQL grammar provides limited support for substitution variables. They can be used in places where a sqlName is valid. This is basically everywhere you can use an expression.

Here’s an example of a supported usage:

lock table &table_name in exclusive mode wait &seconds;

And here’s an example of an unsupported usage:

lock table dept in &lock_mode mode nowait;

The grammar expects certain keywords at the position of &lock_mode. Here’s the excerpt of the grammar that should make that clear:

lockTableStatementUnterminated:
    K_LOCK K_TABLE objects+=lockTableObject (COMMA objects+=lockTableObject)*
        K_IN lockmode=lockMode K_MODE waitOption=lockTableWaitOption?
;

lockMode:
      K_ROW K_SHARE                 # rowShareLockMode
    | K_ROW K_EXCLUSIVE             # rowExclusiveLockMode
    | K_SHARE K_UPDATE              # shareUpdateLockMode
    | K_SHARE                       # shareLockMode
    | K_SHARE K_ROW K_EXCLUSIVE     # shareRowExclusiveLockMode
    | K_EXCLUSIVE                   # exclusiveLockMode
;

And the next excerpt shows how substitution variables are defined in the grammar.

sqlName:
      unquotedId
    | QUOTED_ID
    | substitionVariable
;

substitionVariable:
    AMP AMP? name=substitionVariableName period=PERIOD?
;

substitionVariableName:
      NUMBER
    | sqlName
;

A substitution variable starts with one or two &. The name can be either a NUMBER (like 1) or a sqlName (like tableName). And a substitution variable can optionally end with a period (.). That’s it. This way we can provide limited support for SQL*Plus substitution variables.

Outlook

The grammar evolves quite nicely. However, expressions are still incomplete. This will be covered with the full support of the select statement.

As I’m sure you’ve already found out for yourself, the version of the grammar matches the episodes in this blog post series. And these are the planned versions of IslandSQL with their main features:

  • v0.5.0: Fully parse select statement, complete expressions and conditions
  • v0.6.0: Fully parse remaining DML statements (call, delete, explain plan, insert, merge, update)
  • v0.7.0: PostgreSQL syntax compatibility of implemented statements
  • v0.8.0: Fully parse PL/SQL block
  • v0.9.0: Fully parse create statements of the Oracle Database (function, package, procedure, trigger, type, view)
  • v0.10.0: PL/pgSQL syntax compatibility (sql and plpgsql language in create function, create procedure, create trigger and do)

And after episode 10 the fun begins. We can start to provide value for database developers and others. A linter is one option. But there is more. Stay tuned.

The post IslandSQL Episode 4: Expressions appeared first on Philipp Salvisberg's Blog.

Oracle Database 23c on a Mac with an M-Series Chip

Starting Position

I got my MacBook Pro 16″ with an Apple M1 Max chip with 10 cores, 64 GB RAM and 4 TB disk in November 2021. At that time, the M1 chip had already been on the market for a year and I knew that there were problems when running virtual machines or Docker containers built for the Intel x86_64 architecture. Most of my colleagues decided to go with a database instance in the cloud. However, I like the ability to develop offline and I often need an Oracle Database. And disk space is not an issue.

My solution was to run Windows on ARM in Parallels Desktop and install the Oracle Database there. Microsoft did a tremendous job running Intel-based software under Windows on ARM. As far as I remember I did not experience a single crash. And the performance was similar to the Oracle Databases that ran in a Docker container on my Mac mini Server with a 2.3 GHz Intel Quad-Core i7.

I tried other solutions based on QEMU like UTM, but the performance was unbearably slow. So, I lived with my Oracle Database under Windows. Happily until 4th April 2023, the release date of Oracle Database 23c Free.

Oracle Database 23c Free

The Oracle Database 23c Free is available as Linux RPM, VirtualBox VM or Docker image. There are two distributions for the Docker image:

  • the official image provided by Oracle on container-registry.oracle.com/database/free
  • Gerald Venzl’s gvenzl/oracle-free image on Docker Hub

Gerald and the team at Oracle managed to allow us to download Oracle software without forcing us to log in and accept some license agreements. This is super cool, especially in CI environments.

Gerald also mentions the limitation regarding the Oracle Database Free on Apple M chips and points to Colima. I have not used Colima before; however, it sounds like the way to go until Oracle releases their flagship database system for ARM.

What is Colima?

Colima provides an alternative context for Docker containers. Docker containers do not run natively on macOS. Instead, the context provides a virtual machine and the containers run there. The idea is that we do not need to know that there is a VM behind the scenes. Mostly.

Docker Desktop already provides a VM. Why do we need an alternative or an additional VM? Well, I see the following reasons:

  • Firstly, you can use Colima as a replacement for Docker Desktop.
  • Secondly, you have more control over the configuration of the virtual machine, and this effectively allows you to run an Oracle Database in a Docker container on a macOS machine with an Apple M-Series chip.

It’s important to note that Colima can run side-by-side with Docker Desktop. You can change the current context via the docker context command. However, Docker Desktop 4.17.0 is not able to see “foreign” contexts. This means that when you work with Colima, you do that from the command line.

Prerequisites

For the next steps, you will need the following:

  • A Mac with an M-Series chip,
  • with macOS Ventura 13.3.1 or newer,
  • Internet access,
  • and the ability to work in a terminal window and execute sudo commands.

So, let’s get started.

Step 1 – Install Homebrew

Run the following command in a terminal window, if you have not installed homebrew already:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

If you already have installed homebrew, then update your installation as follows:

brew update

You might be asked to upgrade outdated formulae. – You know what to do.

Step 2 – Install Colima

We install the stable version of Colima via homebrew. See the installation guide for all options.

brew install colima

Step 3 – Create Colima VM

Now we can start Colima. The initial start creates the virtual machine. We pass a minimal set of parameters to override the defaults where necessary.

colima start \
    --arch x86_64 \
    --vm-type=vz \
    --vz-rosetta \
    --mount-type=virtiofs \
    --memory 4

This command produces a VM with 2 CPUs, 4 GB RAM and 60 GB disk space. The architecture is x86_64. To improve the performance we use the macOS 13 virtualization framework (vz), enable Rosetta 2 and use the macOS-specific volume mount driver virtiofs.

The VM is created in $HOME/.lima. The configuration of the VM is stored in $HOME/.colima/default/colima.yaml. Most of the settings can be changed, but require a restart of the VM (via colima restart or colima stop followed by colima start).

Starting Colima changes the context to colima and stopping it changes it back to default. Use docker context use to change the context.

Step 4 – Create 23c Container

We want to create a container with a volume mapped to a local directory. Due to various permission issues, it is not that simple to achieve that directly. It’s easier if we create an internal Docker volume (within the VM) first.

docker run -d \
    --name 23c \
    -p 1522:1521 \
    -v 23c-data:/opt/oracle/oradata \
    container-registry.oracle.com/database/free

We use Oracle’s distribution of the image. It will be pulled automatically. The volume 23c-data is also created automatically.

To monitor the console output we run the following command:

docker logs -tf 23c

Here’s the log output with timestamps. The database was up and running after 38 seconds.

2023-04-16T10:16:43.392545000Z Starting Oracle Net Listener.
2023-04-16T10:16:44.057130000Z Oracle Net Listener started.
2023-04-16T10:16:44.057316000Z Starting Oracle Database instance FREE.
2023-04-16T10:17:19.544454000Z Oracle Database instance FREE started.
2023-04-16T10:17:19.547911000Z 
2023-04-16T10:17:19.705193000Z The Oracle base remains unchanged with value /opt/oracle
2023-04-16T10:17:21.584419000Z #########################
2023-04-16T10:17:21.584722000Z DATABASE IS READY TO USE!
2023-04-16T10:17:21.584981000Z #########################
2023-04-16T10:17:21.669542000Z The following output is now a tail of the alert.log:
2023-04-16T10:17:21.720236000Z FREEPDB1(3):Opening pdb with Resource Manager plan: DEFAULT_PLAN
2023-04-16T10:17:21.720480000Z 2023-04-16T10:17:17.657677+00:00
2023-04-16T10:17:21.720530000Z Completed: Pluggable database FREEPDB1 opened read write 
2023-04-16T10:17:21.720563000Z Completed: ALTER DATABASE OPEN
2023-04-16T10:17:21.720592000Z 2023-04-16T10:17:19.761876+00:00
2023-04-16T10:17:21.720620000Z ===========================================================
2023-04-16T10:17:21.720647000Z Dumping current patch information
2023-04-16T10:17:21.720673000Z ===========================================================
2023-04-16T10:17:21.720698000Z No patches have been applied
2023-04-16T10:17:21.720722000Z ===========================================================

We can press Ctrl-C once we see this output.

Step 5 – Change Passwords

The database was created with random passwords for SYS, SYSTEM and PDBADMIN. To change them we run the following:

docker exec -it 23c ./setPassword.sh oracle

As a result, we expect the following SQL*Plus console output:

The Oracle base remains unchanged with value /opt/oracle

SQL*Plus: Release 23.0.0.0.0 - Developer-Release on Sun Apr 16 10:18:57 2023
Version 23.2.0.0.0

Copyright (c) 1982, 2023, Oracle.  All rights reserved.


Connected to:
Oracle Database 23c Free, Release 23.0.0.0.0 - Developer-Release
Version 23.2.0.0.0

SQL> 
User altered.

SQL> 
User altered.

SQL> 
Session altered.

SQL> 
User altered.

SQL> Disconnected from Oracle Database 23c Free, Release 23.0.0.0.0 - Developer-Release
Version 23.2.0.0.0

That’s it if you are happy with a Docker volume.

If you’d like the volume to be mapped to a local directory on your Mac, then read on.

Step 6 – Export Volume

There are several ways to export a volume. The following script uses a helper container to access the volume and copy the content to the local directory which we want to use in the future as the replacement for the Docker volume.

docker stop 23c
mkdir -p $HOME/docker/23c-data
docker run --rm \
    -v 23c-data:/source \
    -v $HOME/docker:/target \
    ubuntu tar czvf /target/23c-data.tar.gz /source
sudo tar xvpfz $HOME/docker/23c-data.tar.gz \
    --strip-components=1 -C $HOME/docker/23c-data

Step 7 – Change Permissions

We used sudo tar xvpfz... previously to ensure that we can create all files in the target directory with the same permissions as in the original Docker volume. However, this does not seem to be enough.

sudo chown 54321:54321 $HOME/docker/23c-data
sudo chmod -R 777 $HOME/docker/23c-data
sudo chmod 4640 $HOME/docker/23c-data/dbconfig/FREE/orapwFREE

orapwFREE is the password file. It needs restrictive permissions to work.

If you’d like to have the user oracle and the group oinstall also on your Mac, then you can run the following:

sudo dscl . -create /Groups/oinstall
sudo dscl . -create /Groups/oinstall name oinstall
sudo dscl . -create /Groups/oinstall gid 54321
sudo dscl . -create /Users/oracle
sudo dscl . -create /Users/oracle name oracle
sudo dscl . -create /Users/oracle uid 54321
sudo dscl . -create /Users/oracle PrimaryGroupID 54321
sudo dseditgroup -o edit -a oracle -t user oinstall
sudo dseditgroup -o edit -a $USER -t user oinstall
sudo dscl . -create /Groups/oinstall GroupMembership oracle
sudo dscl . -create /Groups/oinstall GroupMembership $USER

After that the output of cd $HOME/docker;ls -lR 23c-data should look similar to this:

total 0
drwxrwxrwx  15 oracle  oinstall  480 Apr 16 12:16 FREE
drwxrwxrwx   3 oracle  oinstall   96 Apr 16 12:16 dbconfig

23c-data/FREE:
total 4928904
drwxrwxrwx  7 oracle  oinstall         224 Apr 16 12:16 FREEPDB1
-rwxrwxrwx  1 oracle  oinstall    18759680 Apr 16 12:19 control01.ctl
-rwxrwxrwx  1 oracle  oinstall    18759680 Mar 28 15:28 control02.ctl
drwxrwxrwx  6 oracle  oinstall         192 Apr 16 12:16 pdbseed
-rwxrwxrwx  1 oracle  oinstall   209715712 Apr 16 12:16 redo01.log
-rwxrwxrwx  1 oracle  oinstall   209715712 Apr 16 12:19 redo02.log
-rwxrwxrwx  1 oracle  oinstall   209715712 Apr 16 12:16 redo03.log
-rwxrwxrwx  1 oracle  oinstall   576724992 Apr 16 12:19 sysaux01.dbf
-rwxrwxrwx  1 oracle  oinstall  1216356352 Apr 16 12:19 system01.dbf
-rwxrwxrwx  1 oracle  oinstall    20979712 Mar 28 15:24 temp01.dbf
-rwxrwxrwx  1 oracle  oinstall    26222592 Apr 16 12:19 undotbs01.dbf
-rwxrwxrwx  1 oracle  oinstall     5251072 Apr 16 12:19 users01.dbf

23c-data/FREE/FREEPDB1:
total 1464400
-rwxrwxrwx  1 oracle  oinstall  325066752 Apr 16 12:19 sysaux01.dbf
-rwxrwxrwx  1 oracle  oinstall  293609472 Apr 16 12:19 system01.dbf
-rwxrwxrwx  1 oracle  oinstall   20979712 Apr 16 12:17 temp01.dbf
-rwxrwxrwx  1 oracle  oinstall  104865792 Apr 16 12:19 undotbs01.dbf
-rwxrwxrwx  1 oracle  oinstall    5251072 Apr 16 12:19 users01.dbf

23c-data/FREE/pdbseed:
total 1454144
-rwxrwxrwx  1 oracle  oinstall  325066752 Mar 28 15:27 sysaux01.dbf
-rwxrwxrwx  1 oracle  oinstall  293609472 Mar 28 15:27 system01.dbf
-rwxrwxrwx  1 oracle  oinstall   20979712 Mar 28 15:25 temp01.dbf
-rwxrwxrwx  1 oracle  oinstall  104865792 Mar 28 15:27 undotbs01.dbf

23c-data/dbconfig:
total 0
drwxrwxrwx  8 oracle  oinstall  256 Apr 16 12:16 FREE

23c-data/dbconfig/FREE:
total 48
-rwxrwxrwx  1 oracle  oinstall   449 Mar 28 15:28 listener.ora
-rwSr-----  1 oracle  oinstall  2048 Apr 16 12:19 orapwFREE
-rwxrwxrwx  1 oracle  oinstall   779 Mar 28 15:28 oratab
-rwxrwxrwx  1 oracle  oinstall  3584 Apr 16 12:17 spfileFREE.ora
-rwxrwxrwx  1 oracle  oinstall    69 Mar 28 15:28 sqlnet.ora
-rwxrwxrwx  1 oracle  oinstall   690 Mar 28 15:28 tnsnames.ora

Step 8 – Remove Container & Volume

Now we can remove the existing 23c container and its associated volume.

docker rm 23c
docker volume prune -f

Step 9 – Recreate 23c Container

Finally, we recreate the container as in step 4. The only difference is the volume: we now use a local directory.

docker run -d \
    --name 23c \
    -p 1522:1521 \
    -v $HOME/docker/23c-data:/opt/oracle/oradata \
    container-registry.oracle.com/database/free

To monitor the console output we run the following command:

docker logs -tf 23c

Here’s the log output with timestamps. The database was up and running after 40 seconds.

2023-04-16T10:27:31.320723000Z Starting Oracle Net Listener.
2023-04-16T10:27:31.958206000Z Oracle Net Listener started.
2023-04-16T10:27:31.958391000Z Starting Oracle Database instance FREE.
2023-04-16T10:28:08.401153000Z Oracle Database instance FREE started.
2023-04-16T10:28:08.404032000Z 
2023-04-16T10:28:08.520502000Z The Oracle base remains unchanged with value /opt/oracle
2023-04-16T10:28:11.555842000Z #########################
2023-04-16T10:28:11.565430000Z DATABASE IS READY TO USE!
2023-04-16T10:28:11.565936000Z #########################
2023-04-16T10:28:11.670669000Z The following output is now a tail of the alert.log:
2023-04-16T10:28:11.697019000Z FREEPDB1(3):Opening pdb with Resource Manager plan: DEFAULT_PLAN
2023-04-16T10:28:11.697309000Z 2023-04-16T10:28:07.384593+00:00
2023-04-16T10:28:11.697423000Z Completed: Pluggable database FREEPDB1 opened read write 
2023-04-16T10:28:11.697460000Z 2023-04-16T10:28:07.573292+00:00
2023-04-16T10:28:11.697490000Z ===========================================================
2023-04-16T10:28:11.697518000Z Dumping current patch information
2023-04-16T10:28:11.697545000Z ===========================================================
2023-04-16T10:28:11.697581000Z No patches have been applied
2023-04-16T10:28:11.697606000Z ===========================================================
2023-04-16T10:28:11.697633000Z Completed: ALTER DATABASE OPEN

We can press Ctrl-C once we see this output.

Summary

Thanks to the prepared database files within the Docker image we can start an Oracle Database quite fast. Even on a Mac with an M-Series chip.  Mounting a folder as a volume needs a bit of fiddling. I like to mount volumes this way because they are easy to share and I can reset the underlying VM without losing data.

The runtime performance of the database in a Colima container is still bad compared to my Intel i7 Mac mini and the Oracle Database under Windows on ARM. Depending on the workload it is 5 to 10 times slower, sometimes even more. I do not see a big difference with or without Rosetta 2. That’s a bit disappointing. Maybe I am doing something wrong and someone can point me in the right direction. Microsoft showed what’s possible.

Anyway, I’m not sure if I’m going to use a database within Colima during my live demos. But I will definitely keep one ready as a backup.

Hopefully, Oracle will release an ARM version of their Database soon.

The post Oracle Database 23c on a Mac with an M-Series Chip appeared first on Philipp Salvisberg's Blog.

Sharing SQL Developer Connections #JoelKallmanDay

1. The Problem

I created a Docker image and a container for the Oracle Database 19c (19.19) for Linux ARM. The container contains ORDS, APEX and various sample schemas. Finally, an Oracle database that runs pretty fast on my Apple Silicon machine. Then I built a cold database clone by creating a container on another machine using a copy of the original Docker volume. So far, so good.

In SQL Developer I created a folder named odb190-localhost with 29 connections. Now I’d like to copy these SQL Developer connections to another machine. Easy, right? Export all connections to a JSON file and import it in the other SQL Developer instance. Yes, this works. But if you have several hundred connections like me, you get more than you want. The target installation might already contain connections with the same names but different properties. Therefore you neither want to overwrite existing connections during the import nor identify and delete unwanted newly created connections after the import.

So, what can we do?

2. Some Possible Solutions

2.1. Export Only Wanted Connections

Exporting some chosen connections works technically. However, the export wizard does not support the concept of folders and presents a flat list of all connections. Selecting all connections in a folder is not feasible without sorting or filtering options in the UI.

SQL Developer export wizard

Finding all entries ending with odb190-localhost is no fun. It might work for a handful of connections, but not for 29.

2.2. Export Only One Template Connection

As an alternative, you can export just one connection and import it into the target SQL Developer installation. Use this connection as a kind of template. Click on the connection and select Properties in the context menu.

New / Select Database Connection in SQL Developer

In this window, you can change the Name, Username and Password and press Save. Repeat that process for all other connections. Either now or later when you need them.

Not really a satisfying solution either, right?

2.3. Export All Connections and Filter with VS Code

Exporting all connections is easy. Let’s look at the file in Visual Studio Code.

all-connections.json in VS Code (original)

This is a minified JSON. Let’s select Format Document from the context menu to make it human-readable.

all-connections.json in VS Code (formatted)

Ahh, much better.

Removing Unwanted info Objects

On line 11 we see the property NAV_FOLDER. We are only interested in connections with the value odb190-localhost. Is there a way to delete all unwanted entries? Yes, there is. By using jq, a command line utility to process JSON files. There is also an extension for VS Code named jq-vscode, which simplifies the development of a jq command.

Here’s the jq command to remove all unwanted info objects from the connections array:

del(
    .connections[] 
    | select(.info.NAV_FOLDER!="odb190-localhost")
)

In VS Code you can open a split window via the command JQ: Open a new file to exec jq. There you can enter the jq command.

all-connections.json filtered with jq-vscode

In the lower part of the screen, you can see the result of the jq command. Copy the output and save it to a file to be imported into any SQL Developer installation.

2.4. Export All Connections and Filter with JQ CLI

This is basically the same solution as before. The only difference is that we use the jq command-line tool. jq is available on all major platforms (Linux, Windows, macOS). Download it from here or install it directly with your OS package installer (e.g. apt, yum, dnf, brew, …).

I exported all connections to a file named all-connections.json from SQL Developer.

With the following command, I produce the file odb190-localhost-connections.json that contains only the connections in the folder odb190-localhost:

jq 'del(.connections[] | select(.info.NAV_FOLDER!="odb190-localhost"))' \
    all-connections.json > odb190-localhost-connections.json

The file odb190-localhost-connections.json is ready to be shared and imported into other SQL Developer installations.

3. Conclusion

There are other solutions, not covered in this blog post, to create a filtered connection export from SQL Developer. For example, writing a dedicated SQL Developer extension. Or filtering an export file with another tool. Or processing a JSON export file in the database.

However, IMO jq is an excellent tool to process JSON data in shell scripts. Perfect for automation. In this case, it makes sharing a subset of your SQL Developer connections easier, for any filter criteria you want to apply.

The post Sharing SQL Developer Connections #JoelKallmanDay appeared first on Philipp Salvisberg's Blog.

MLE TypeScript & JavaScript Modules

Introduction

The Oracle Database 23c supports MLE JavaScript modules. MLE modules are standard ECMAScript 2022 modules. The easiest way to develop such modules is outside the database. This allows us to take advantage of the extensive JavaScript ecosystem and develop MLE modules in TypeScript instead of JavaScript.

In this blog post, I demonstrate how to develop and test an MLE module in TypeScript and deploy it into the database. I will use Node and VS Code.

TL;DR

See the conclusion and explore the code on GitHub.

Requirements

The idea is to provide a public stored procedure in the database that creates and populates the well-known tables dept and emp in the current schema. The procedure accepts alternative table names. The tables are not re-created if they already exist. However, the original data should be reset to its initial state while other rows should be left unchanged. Problems are reported via exceptions.

Nothing fancy. However, we will have to deal with SQL and address potential SQL injection vulnerabilities. Furthermore, it allows us to demonstrate how to test an MLE module outside the database.

Design

The following image visualizes the solution design.

[Image: Solution design of the demo application]

We create an Oracle Database user demotab and deploy an npm and a self-made module as MLE modules into this schema, along with an MLE environment and a PL/SQL package as an interface. We grant the package to public and create a public synonym for it. As a result, any user in the database instance (e.g. the user otheruser) can execute the following code to install and populate the tables dept and emp within their schema.

begin
   demo.create_tabs;
end;
/

We can also pass alternative table names like this:

begin
   demo.create_tabs('departments', 'employees');
end;
/

Prerequisites

You need the following to build this MLE module yourself:

  • Full access to an Oracle Database 23c Free (>=23.3). This means you know how to connect as sysdba.
  • A machine with VS Code (>=1.83.1), Node (>=20.9.0) and SQLcl (>=23.3.0, must be found in the OS path).
  • Internet access and the rights to install npm modules and VS Code extensions.

Prepare Node Project

Open a folder in VS Code where you want to develop the MLE module and create the files package.json, tsconfig.json, .eslintrc and .prettierrc. The content of each file is shown below.

package.json
{
    "name": "demotab",
    "version": "1.0.0",
    "description": "Create and populate demo tables.",
    "type": "module",
    "scripts": {
        "build": "npm run format && npm run lint && npm run tsc && npm run coverage",
        "tsc": "tsc --project tsconfig.json",
        "lint": "eslint . --ext .ts",
        "format": "prettier --write './**/*{.ts,.eslintrc,.prettierrc,.json}'",
        "test": "vitest --no-threads --reporter=verbose --dir ./test",
        "coverage": "vitest --no-threads --dir ./test run --coverage"
    },
    "devDependencies": {
        "@types/oracledb": "^6.0.3",
        "@typescript-eslint/eslint-plugin": "^6.9.1",
        "@typescript-eslint/parser": "^6.9.1",
        "@vitest/coverage-v8": "^0.34.6",
        "eslint": "^8.52.0",
        "eslint-config-prettier": "^9.0.0",
        "oracledb": "^6.2.0",
        "prettier": "^3.0.3",
        "typescript": "^5.2.2",
        "vitest": "^0.34.6"
    },
    "dependencies": {
        "sql-assert": "^1.0.3"
    }
}

Node uses some of these JSON fields. However, most of the fields are required by npm and its command-line interface. The type on line 5 defines that we build an ECMAScript module. The dependencies are important: our module needs the sql-assert module at runtime. The other dependencies are for development purposes only.

tsconfig.json
{
    "compilerOptions": {
        "rootDir": "./src",
        "target": "ES2017",
        "module": "ES2022",
        "moduleResolution": "node",
        "esModuleInterop": true,
        "forceConsistentCasingInFileNames": true,
        "strict": true,
        "skipLibCheck": true,
        "sourceMap": true,
        "outDir": "esm"
    },
    "include": ["./src/**/*"]
}

This is the configuration for the TypeScript compiler. On lines 5 and 6 we define the ECMAScript versions to be used. We develop in TypeScript with ECMAScript 2022 features and generate a JavaScript file using ECMAScript 2017, the version in which the async/await feature was introduced. This makes the generated JavaScript module a bit more readable. However, for MLE we could also use ECMAScript 2022. Using an older ECMAScript target makes sense when the code has to run in environments with an older JavaScript engine, for example, old browsers.

.eslintrc
{
    "root": true,
    "parser": "@typescript-eslint/parser",
    "plugins": ["@typescript-eslint"],
    "extends": [
        "eslint:recommended",
        "plugin:@typescript-eslint/eslint-recommended",
        "plugin:@typescript-eslint/recommended",
        "prettier"
    ],
    "rules": {
        "no-console": "error",
        "@typescript-eslint/no-explicit-any": "off"
    }
}

Here we define the configuration for ESLint. We configure the linter for TypeScript with a recommended rule set. However, we do not want console.log statements in our code, therefore we treat them as errors. Furthermore, we allow the explicit use of the any data type. We need that for the global variable session provided by the MLE, which contains the connection object for a database session. More on this later.
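
To illustrate the no-console rule: a stray debug statement like the following (a made-up example, not part of the project) would fail the lint step with “Unexpected console statement” instead of silently ending up in the deployed MLE module.

// hypothetical leftover debug statement in src/demotab.ts
console.log("created table", deptName); // ESLint error: Unexpected console statement (no-console)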

.prettierrc
{
    "semi": true,
    "printWidth": 120,
    "singleQuote": false,
    "tabWidth": 4,
    "trailingComma": "none",
    "arrowParens": "always"
}

The last configuration file is for Prettier, a popular formatter for various languages.

Initialize Node Project

Now we are ready to initialize the Node project. Open a terminal window in VS Code and execute the following command:

npm install

This will create a file named package-lock.json and a node_modules folder. package-lock.json is the table of contents for the node_modules folder. It contains the recursively resolved dependencies with their versions. Dependencies can be defined with version ranges, so the resolution is not unambiguous and can lead to different results depending on when it is performed.

When you delete the node_modules folder and re-run npm install, it will produce the same content based on the module versions registered in package-lock.json. As a result, it might be useful to add this file to your version control system to make builds reproducible.

Original TypeScript Module

Let’s create a file named demotab.ts in a new folder src with the following content:

src/demotab.ts
import { simpleSqlName } from "sql-assert";

// global variable for default connection in the database
declare const session: any;

/**
 * Creates demo tables with initial data for the well-known tables `dept` and `emp`.
 * Alternative table names can be passed to this function. The tables are not re-created
 * if they already exist. However, the rows for the 4 departments and the 14 employees
 * should be reset to their initial state while other rows are left unchanged.
 * Problems are reported via exceptions.
 *
 * @param [deptName="dept"] name of the dept table.
 * @param [empName="emp"] name of the emp table.
 * @returns {Promise<void>}.
 */
export async function create(deptName: string = "dept", empName: string = "emp"): Promise<void> {
    const dept = simpleSqlName(deptName);
    const emp = simpleSqlName(empName);
    await session.execute(`
        create table if not exists ${dept} (
           deptno number(2, 0)      not null constraint ${dept}_pk primary key,
           dname  varchar2(14 char) not null,
           loc    varchar2(13 char) not null
        )
    `);
    await session.execute(`
        merge into ${dept} t
        using (values 
                 (10, 'ACCOUNTING', 'NEW YORK'),
                 (20, 'RESEARCH',   'DALLAS'),
                 (30, 'SALES',      'CHICAGO'),
                 (40, 'OPERATIONS', 'BOSTON')
              ) s (deptno, dname, loc)
           on (t.deptno = s.deptno)
         when matched then
              update
                 set t.dname = s.dname,
                     t.loc = s.loc
         when not matched then
              insert (t.deptno, t.dname, t.loc)
              values (s.deptno, s.dname, s.loc)
    `);
    await session.execute(`
        create table if not exists ${emp} (
            empno    number(4, 0)      not null  constraint ${emp}_pk primary key,
            ename    varchar2(10 char) not null,
            job      varchar2(9 char)  not null,
            mgr      number(4, 0)                constraint ${emp}_mgr_fk references ${emp},
            hiredate date              not null,
            sal      number(7, 2),
            comm     number(7, 2),
            deptno   number(2, 0)      not null  constraint ${emp}_deptno_fk references ${dept}
        )
    `);
    await session.execute(`create index if not exists ${emp}_mgr_fk_i on ${emp} (mgr)`);
    await session.execute(`create index if not exists ${emp}_deptno_fk_i on ${emp} (deptno)`);
    await session.execute(`alter table ${emp} disable constraint ${emp}_mgr_fk`);
    await session.execute(`
        merge into ${emp} t
        using (values
                 (7839, 'KING',   'PRESIDENT', null, date '1981-11-17', 5000, null, 10),
                 (7566, 'JONES',  'MANAGER',   7839, date '1981-04-02', 2975, null, 20),
                 (7698, 'BLAKE',  'MANAGER',   7839, date '1981-05-01', 2850, null, 30),
                 (7782, 'CLARK',  'MANAGER',   7839, date '1981-06-09', 2450, null, 10),
                 (7788, 'SCOTT',  'ANALYST',   7566, date '1987-04-19', 3000, null, 20),
                 (7902, 'FORD',   'ANALYST',   7566, date '1981-12-03', 3000, null, 20),
                 (7499, 'ALLEN',  'SALESMAN',  7698, date '1981-02-20', 1600,  300, 30),
                 (7521, 'WARD',   'SALESMAN',  7698, date '1981-02-22', 1250,  500, 30),
                 (7654, 'MARTIN', 'SALESMAN',  7698, date '1981-09-28', 1250, 1400, 30),
                 (7844, 'TURNER', 'SALESMAN',  7698, date '1981-09-08', 1500,    0, 30),
                 (7900, 'JAMES',  'CLERK',     7698, date '1981-12-03',  950, null, 30),
                 (7934, 'MILLER', 'CLERK',     7782, date '1982-01-23', 1300, null, 10),
                 (7369, 'SMITH',  'CLERK',     7902, date '1980-12-17',  800, null, 20),
                 (7876, 'ADAMS',  'CLERK',     7788, date '1987-05-23', 1100, null, 20)                        
              ) s (empno, ename, job, mgr, hiredate, sal, comm, deptno)
           on (t.empno = s.empno)
         when matched then
              update
                 set t.ename = s.ename,
                     t.job = s.job,
                     t.mgr = s.mgr,
                     t.hiredate = s.hiredate,
                     t.sal = s.sal,
                     t.comm = s.comm,
                     t.deptno = s.deptno
         when not matched then
              insert (t.empno, t.ename, t.job, t.mgr, t.hiredate, t.sal, t.comm, t.deptno)
              values (s.empno, s.ename, s.job, s.mgr, s.hiredate, s.sal, s.comm, s.deptno)
    `);
    await session.execute(`alter table ${emp} enable constraint ${emp}_mgr_fk`);
}
Global session Variable

On line 4 we declare a constant named session. This way we tell the TypeScript compiler that a read-only session object is available. We would also like to define the correct type. Unfortunately, the node-oracledb module does not provide TypeScript definitions and the mle-js-oracledb module comes with a definition for IConnection which is synchronous, not asynchronous as in node-oracledb. As a result, the TypeScript compiler complains that await is not necessary. This is the reason why the session type is any in this file. It works and TypeScript does not complain anymore, but as a result, there is no code completion available for the session. And as mentioned before, we need to tell ESLint that this is okay.
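
If you want at least basic code completion, one possible workaround (not used in this project) is to hand-write a minimal interface that covers only what demotab.ts actually needs: an awaitable execute call. The interface name and the optional binds parameter below are assumptions for this sketch, not official type definitions.

// hypothetical minimal typing for the global session object
interface AsyncSession {
    execute(sql: string, binds?: unknown[]): Promise<unknown>;
}

declare const session: AsyncSession;

The trade-off is that every additional session feature you start using has to be added to this interface manually.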

Asynchronous Function

On line 17 we define the signature of the create function. Please note that this is an asynchronous function. We need that to use await in Node. However, functions in MLE modules always run synchronously within the Oracle Database, even if a function is declared as async. So, async would not be necessary if the code ran exclusively in the database.

Preventing SQL Injection

We call simpleSqlName on lines 18 and 19 to ensure that no SQL injection is possible. This makes the variables dept and emp in the template literals safe. The function simpleSqlName has the advantage that it runs outside of the database. It has the same logic as its sibling dbms_assert.simple_sql_name.
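
To get a feeling for how simpleSqlName protects the template literals, here is a small sketch based on the behaviour we rely on in the tests later in this post: a simple SQL name is returned unchanged, everything else throws an error before any SQL is executed.

import { simpleSqlName } from "sql-assert";

simpleSqlName("dept2");        // returns "dept2", safe to embed in the SQL text
simpleSqlName("a-dept-table"); // throws "Invalid SQL name.", so no SQL statement is built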

Generated JavaScript Module

We run the TypeScript compiler as follows in a terminal window within VS Code:

npm run tsc

This will execute tsc --project tsconfig.json as defined in package.json and produce a demotab.js file in the esm folder.

esm/demotab.js
import { simpleSqlName } from "sql-assert";
/**
 * Creates demo tables with initial data for the well-known tables `dept` and `emp`.
 * Alternative table names can be passed to this function. The tables are not re-created
 * if they already exist. However, the rows for the 4 departments and the 14 employees
 * should be reset to their initial state while other rows are left unchanged.
 * Problems are reported via exceptions.
 *
 * @param [deptName="dept"] name of the dept table.
 * @param [empName="emp"] name of the emp table.
 * @returns {Promise<void>}.
 */
export async function create(deptName = "dept", empName = "emp") {
    const dept = simpleSqlName(deptName);
    const emp = simpleSqlName(empName);
    await session.execute(`
        create table if not exists ${dept} (
           deptno number(2, 0)      not null constraint ${dept}_pk primary key,
           dname  varchar2(14 char) not null,
           loc    varchar2(13 char) not null
        )
    `);
    await session.execute(`
        merge into ${dept} t
        using (values 
                 (10, 'ACCOUNTING', 'NEW YORK'),
                 (20, 'RESEARCH',   'DALLAS'),
                 (30, 'SALES',      'CHICAGO'),
                 (40, 'OPERATIONS', 'BOSTON')
              ) s (deptno, dname, loc)
           on (t.deptno = s.deptno)
         when matched then
              update
                 set t.dname = s.dname,
                     t.loc = s.loc
         when not matched then
              insert (t.deptno, t.dname, t.loc)
              values (s.deptno, s.dname, s.loc)
    `);
    await session.execute(`
        create table if not exists ${emp} (
            empno    number(4, 0)      not null  constraint ${emp}_pk primary key,
            ename    varchar2(10 char) not null,
            job      varchar2(9 char)  not null,
            mgr      number(4, 0)                constraint ${emp}_mgr_fk references ${emp},
            hiredate date              not null,
            sal      number(7, 2),
            comm     number(7, 2),
            deptno   number(2, 0)      not null  constraint ${emp}_deptno_fk references ${dept}
        )
    `);
    await session.execute(`create index if not exists ${emp}_mgr_fk_i on ${emp} (mgr)`);
    await session.execute(`create index if not exists ${emp}_deptno_fk_i on ${emp} (deptno)`);
    await session.execute(`alter table ${emp} disable constraint ${emp}_mgr_fk`);
    await session.execute(`
        merge into ${emp} t
        using (values
                 (7839, 'KING',   'PRESIDENT', null, date '1981-11-17', 5000, null, 10),
                 (7566, 'JONES',  'MANAGER',   7839, date '1981-04-02', 2975, null, 20),
                 (7698, 'BLAKE',  'MANAGER',   7839, date '1981-05-01', 2850, null, 30),
                 (7782, 'CLARK',  'MANAGER',   7839, date '1981-06-09', 2450, null, 10),
                 (7788, 'SCOTT',  'ANALYST',   7566, date '1987-04-19', 3000, null, 20),
                 (7902, 'FORD',   'ANALYST',   7566, date '1981-12-03', 3000, null, 20),
                 (7499, 'ALLEN',  'SALESMAN',  7698, date '1981-02-20', 1600,  300, 30),
                 (7521, 'WARD',   'SALESMAN',  7698, date '1981-02-22', 1250,  500, 30),
                 (7654, 'MARTIN', 'SALESMAN',  7698, date '1981-09-28', 1250, 1400, 30),
                 (7844, 'TURNER', 'SALESMAN',  7698, date '1981-09-08', 1500,    0, 30),
                 (7900, 'JAMES',  'CLERK',     7698, date '1981-12-03',  950, null, 30),
                 (7934, 'MILLER', 'CLERK',     7782, date '1982-01-23', 1300, null, 10),
                 (7369, 'SMITH',  'CLERK',     7902, date '1980-12-17',  800, null, 20),
                 (7876, 'ADAMS',  'CLERK',     7788, date '1987-05-23', 1100, null, 20)                        
              ) s (empno, ename, job, mgr, hiredate, sal, comm, deptno)
           on (t.empno = s.empno)
         when matched then
              update
                 set t.ename = s.ename,
                     t.job = s.job,
                     t.mgr = s.mgr,
                     t.hiredate = s.hiredate,
                     t.sal = s.sal,
                     t.comm = s.comm,
                     t.deptno = s.deptno
         when not matched then
              insert (t.empno, t.ename, t.job, t.mgr, t.hiredate, t.sal, t.comm, t.deptno)
              values (s.empno, s.ename, s.job, s.mgr, s.hiredate, s.sal, s.comm, s.deptno)
    `);
    await session.execute(`alter table ${emp} enable constraint ${emp}_mgr_fk`);
}
//# sourceMappingURL=demotab.js.map

As you see on line 13, all type definitions are gone. Besides that, the file looks very much like its TypeScript counterpart.

On line 89 there’s a comment mentioning a map file. This map file was also generated by the TypeScript compiler. It improves the developer experience during a debugging session: the developer can work with the original TypeScript files while the JavaScript files are only used behind the scenes.

Testing

1. Framework

I decided to use Vitest for this project. Why not Jest or Mocha?

I tried Mocha with a plain JavaScript project. It felt a bit outdated and I did not like the fact that I had to opt in to an assertion library. IMO this should be part of the framework. It’s too much freedom and leads to too many unnecessary variants when googling for solutions.

Jest is a full-fledged and very popular testing framework. It would have been a natural choice. However, I stumbled over Vitest with its Jest-compatible API, which claims to be faster and easier to use with TypeScript. So I decided to give it a try.

2. Database Configuration

We create a file named dbconfig.ts in a new folder test with the following content:

test/dbconfig.ts
import oracledb from "oracledb";

let sysSession: oracledb.Connection;
export let demotabSession: oracledb.Connection;
export let otheruserSession: oracledb.Connection;

const connectString = "192.168.1.8:51007/freepdb1";

const sysConfig: oracledb.ConnectionAttributes = {
    user: "sys",
    password: "oracle",
    connectString: connectString,
    privilege: oracledb.SYSDBA
};

export const demotabConfig: oracledb.ConnectionAttributes = {
    user: "demotab",
    password: "demotab",
    connectString: connectString
};

export const otheruserConfig: oracledb.ConnectionAttributes = {
    user: "otheruser",
    password: "otheruser",
    connectString: connectString
};

export async function createSessions(): Promise<void> {
    sysSession = await oracledb.getConnection(sysConfig);
    await createUser(demotabConfig);
    await createUser(otheruserConfig);
    await sysSession.execute("grant create public synonym to demotab");
    await sysSession.execute("grant execute on javascript to public");
    sysSession.close();
    demotabSession = await oracledb.getConnection(demotabConfig);
    otheruserSession = await oracledb.getConnection(otheruserConfig);
}

async function createUser(config: oracledb.ConnectionAttributes): Promise<void> {
    await sysSession.execute(`drop user if exists ${config.user} cascade`);
    await sysSession.execute(`
        create user ${config.user} identified by ${config.password}
           default tablespace users
           temporary tablespace temp
           quota 1m on users
    `);
    await sysSession.execute(`grant db_developer_role to ${config.user}`);
}

export async function closeSessions(): Promise<void> {
    await demotabSession?.close();
    await otheruserSession?.close();
}

To make the configuration work in your environment, you need to change lines 7 and 11: the connect string and the password of the Oracle user sys. Everything else can be left as is.

This module creates the database users demotab and otheruser and manages database sessions.

3. Test TypeScript Module Outside of the Database

We create a file named demotab.test.ts in the folder test with the following content:

test/demotab.test.ts
import { beforeAll, afterAll, describe, it, expect } from "vitest";
import { createSessions, closeSessions, demotabSession } from "./dbconfig";
import { create } from "../src/demotab";

describe("TypeScript outside of the database", () => {
    const timeout = 10000;

    beforeAll(async () => {
        await createSessions();
        global.session = demotabSession;
    });

    describe("invalid input causing 'Invalid SQL name.'", () => {
        // error is thrown in JavaScript (no ORA-04161 outside of the database)
        it("should throw an error with invalid deptName", () => {
            expect(async () => await create("a-dept-table")).rejects.toThrowError(/invalid sql/i);
        });
        it("should throw an error with invalid empName", () => {
            expect(async () => await create("dept", "a-emp-table")).rejects.toThrowError(/invalid sql/i);
        });
    });

    describe("invalid input causing 'ORA-00911: _: invalid character after <identifier>'", () => {
        // error is thrown by the Oracle Database while trying to execute a SQL statement
        it("should throw an error with quoted deptName", () => {
            expect(async () => await create('"dept"')).rejects.toThrowError(/ORA-00911.+invalid/);
        });
        it("should throw an error with quoted empName", () => {
            expect(async () => await create("dept", '"emp"')).rejects.toThrowError(/ORA-00911.+invalid/);
        });
    });

    describe(
        "valid input",
        () => {
            it("should create 'dept' and 'emp' without parameters)", async () => {
                await create();
                const dept = await demotabSession.execute("select * from dept order by deptno");
                expect(dept.rows).toEqual([
                    [10, "ACCOUNTING", "NEW YORK"],
                    [20, "RESEARCH", "DALLAS"],
                    [30, "SALES", "CHICAGO"],
                    [40, "OPERATIONS", "BOSTON"]
                ]);
                const emp = await demotabSession.execute(`
                    select empno, ename, job, mgr, to_char(hiredate,'YYYY-MM-DD'), sal, comm, deptno 
                    from emp 
                    order by empno
                `);
                expect(emp.rows).toEqual([
                    [7369, "SMITH", "CLERK", 7902, "1980-12-17", 800, null, 20],
                    [7499, "ALLEN", "SALESMAN", 7698, "1981-02-20", 1600, 300, 30],
                    [7521, "WARD", "SALESMAN", 7698, "1981-02-22", 1250, 500, 30],
                    [7566, "JONES", "MANAGER", 7839, "1981-04-02", 2975, null, 20],
                    [7654, "MARTIN", "SALESMAN", 7698, "1981-09-28", 1250, 1400, 30],
                    [7698, "BLAKE", "MANAGER", 7839, "1981-05-01", 2850, null, 30],
                    [7782, "CLARK", "MANAGER", 7839, "1981-06-09", 2450, null, 10],
                    [7788, "SCOTT", "ANALYST", 7566, "1987-04-19", 3000, null, 20],
                    [7839, "KING", "PRESIDENT", null, "1981-11-17", 5000, null, 10],
                    [7844, "TURNER", "SALESMAN", 7698, "1981-09-08", 1500, 0, 30],
                    [7876, "ADAMS", "CLERK", 7788, "1987-05-23", 1100, null, 20],
                    [7900, "JAMES", "CLERK", 7698, "1981-12-03", 950, null, 30],
                    [7902, "FORD", "ANALYST", 7566, "1981-12-03", 3000, null, 20],
                    [7934, "MILLER", "CLERK", 7782, "1982-01-23", 1300, null, 10]
                ]);
            });
            it("should create 'dept2' and 'emp2' with both parameters)", async () => {
                await create("dept2", "emp2");
                const dept = await demotabSession.execute("select * from dept minus select * from dept2");
                expect(dept.rows).toEqual([]);
                const emp = await demotabSession.execute("select * from emp minus select * from emp2");
                expect(emp.rows).toEqual([]);
            });
            it("should fix data in 'dept' and 'emp' after changing data and using default parameters", async () => {
                await demotabSession.execute(`
                    begin
                        delete dept where deptno = 40;
                        update dept set loc = initcap(loc);
                        insert into dept(deptno, dname, loc) values(50, 'utPLSQL', 'Winterthur');
                        delete emp where empno = 7876;
                        update emp set sal = sal * 2;
                        insert into emp(empno, ename, job, hiredate, sal, deptno)
                        values (4242, 'Salvisberg', 'Tester', date '2000-01-01', 9999, '50');
                    end;
                `);
                await create();
                const dept = await demotabSession.execute("select * from dept minus select * from dept2");
                expect(dept.rows).toEqual([[50, "utPLSQL", "Winterthur"]]);
                const emp = await demotabSession.execute(`
                    select empno, ename, job, mgr, to_char(hiredate,'YYYY-MM-DD'), sal, comm, deptno 
                    from emp 
                    minus 
                    select empno, ename, job, mgr, to_char(hiredate,'YYYY-MM-DD'), sal, comm, deptno
                    from emp2
                `);
                expect(emp.rows).toEqual([[4242, "Salvisberg", "Tester", null, "2000-01-01", 9999, null, 50]]);
            });
        },
        timeout
    );

    afterAll(async () => {
        await closeSessions();
    });
});
Test Suite

The main test suite starts on line 5 and ends on line 105. The Vitest configuration enforces serial execution. As a result, the tests are executed according to their order in the file.
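
The serial execution is enforced by the --no-threads flag in the test and coverage scripts of package.json. If you prefer a configuration file over CLI flags, something like the following vitest.config.ts should achieve the same with Vitest 0.34 (a sketch under that assumption, not part of this project).

import { defineConfig } from "vitest/config";

export default defineConfig({
    test: {
        threads: false, // run test files serially in the main thread
        dir: "./test"
    }
});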

Global session Variable

On line 11 we initialize the global variable session with a database session to the Oracle user demotab. We use this global variable in the function create. See demotab.ts.

Test Case – Assertions

Look at line 36. It reads almost like the English sentence “it should create ‘dept’ and ’emp’ without parameters”. That’s why the testing framework provides the alias it for the function test. This notation leads to test names that are easy to understand in the code and in other contexts where the it is not shown, during test execution, for example.
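
In other words, it is just an alias for test; the following two declarations (bodies omitted) define equivalent test cases.

import { it, test } from "vitest";

it("should create 'dept' and 'emp' without parameters", async () => { /* ... */ });
test("should create 'dept' and 'emp' without parameters", async () => { /* ... */ });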

On line 37 we call the create function without parameters. We read the content of the table dept into a variable dept on line 38. And finally, on lines 39 to 44, we assert that the 4 expected rows are stored in the table dept.

A difference between the expected and actual results would be reported like this. I changed the expected output in the code to produce this result:

[Image: Failed test: how differences between expected and actual results are visualized in VS Code's terminal window]

4. Run All Tests

To run all tests open a terminal window in VS Code and execute the following command:

npm run test

This will produce an output similar to this:

[Image: Console output of "npm run test" for demotab.test.ts]

5. Build with Test Coverage

Open a terminal window in VS Code and execute the following to format, lint and compile the code, and run all tests with a code coverage report:

npm run build

This will produce an output similar to this:

[Image: Console output of "npm run build"]

Deployment

We tested the module successfully outside of the database. Now it’s time to deploy it into the database. For that, we create the SQL script deploy.sql in the root folder of our project with the following content:

deploy.sql
set define off
script
var url = new java.net.URL("https://esm.run/sql-assert@1.0.3");
var content = new java.lang.String(url.openStream().readAllBytes(),
                  java.nio.charset.StandardCharsets.UTF_8);
var script = 'create or replace mle module sql_assert_mod '
               + 'language javascript as ' + '\n'
               + content + "\n"
               + '/' + "\n";
sqlcl.setStmt(script);
sqlcl.run();
/

script
var path = java.nio.file.Path.of("./esm/demotab.js");
var content = java.nio.file.Files.readString(path);
var script = 'create or replace mle module demotab_mod '
               + 'language javascript as ' + '\n'
               + content + "\n"
               + '/' + "\n";
sqlcl.setStmt(script);
sqlcl.run();
/

create or replace mle env demotab_env
   imports('sql-assert' module sql_assert_mod)
   language options 'js.strict=true, js.console=false, js.polyglot-builtin=true';

create or replace package demo authid current_user is
   procedure create_tabs as 
   mle module demotab_mod env demotab_env signature 'create()';

   procedure create_tabs(
      in_dept_table_name in varchar2
   ) as mle module demotab_mod env demotab_env signature 'create(string)';

   procedure create_tabs(
      in_dept_table_name in varchar2,
      in_emp_table_name  in varchar2
   ) as mle module demotab_mod env demotab_env signature 'create(string, string)';
end demo;
/

-- required "execute on javascript" was granted to public in test
grant execute on demo to public;
create or replace public synonym demo for demotab.demo;

exit
npm Module sql-assert (MLE Module sql_assert_mod)

On lines 2-12, we load version 1.0.3 of the npm module sql-assert as MLE module sql_assert_mod into the database. We dynamically build a create or replace mle module statement and execute it with the help of SQLcl’s script command.

The URL https://esm.run/sql-assert@1.0.3 provides a minified file of the npm module. In other words, it is optimized for use in browsers where the modules are loaded over the network at runtime.

Minified code works in the database, of course. However, it can make the error stack a bit harder to understand.

No Template Literals?

You might wonder why we do not use ECMAScript template literals to populate the script variable. The reason is that SQLcl does not provide its own JavaScript engine. It relies on the JDK’s JavaScript engine. Unfortunately, the Nashorn JavaScript engine has been decommissioned in current JDKs. The last JDK with a JavaScript engine is JDK 11, based on ECMAScript 5.1 (2011), which does not support Template Literals.

The GraalVM JDK is an exception. Versions 17 and 21 come with a current GraalVM JavaScript engine that supports Template Literals. And this JDK can be used with SQLcl.

However, there is an additional reason to avoid JavaScript features introduced after ECMAScript 5.1, and that’s SQL Developer. You can also run the SQL script deploy.sql in an SQL Developer worksheet. SQL Developer requires JDK 11. You cannot use a newer JDK with SQL Developer, because you would lose some important features such as Real Time SQL Monitor, which requires JavaFX, another decommissioned component of the JDK. And the GraalVM JDK does not provide JavaFX.

So for compatibility reasons, we have to stick to the old JavaScript features available in ECMAScript 5.1 when using the script command in SQLcl or SQL Developer.

Local Module demotab (MLE Module demotab_mod)

On lines 14-23, we load the JavaScript MLE module demotab from our local file system into the database. The process is similar to the npm module. The only difference is that we get the module from the local disk and not over the network.

MLE Environment demotab_env

On lines 25-27, we create an MLE environment. Besides configuring compiler options, we tell the JavaScript compiler what modules are available and where to find them.

PL/SQL Call Interface

On lines 29-42, we create a PL/SQL package demo. It contains three procedures with call specifications for the function create in the MLE module demotab_mod. Why three procedures and not just one? Because the MLE call specifications do not support default values for parameters. However, we can work around it by providing three procedures. One without parameters, one with a single parameter and another one with two parameters.

Test JavaScript MLE Module within the Database

To test if the deployed code works we create the file mle-demotab.test.ts in the folder test with the following content:

test/mle-demotab.test.ts
import { beforeAll, afterAll, describe, it, expect, beforeEach } from "vitest";
import { createSessions, closeSessions, otheruserSession, demotabSession, demotabConfig } from "./dbconfig";
import oracledb from "oracledb";
import { exec } from "child_process";
import util from "node:util";

describe("MLE JavaScript module within the database", () => {
    const timeout = 15000;

    async function userTables(): Promise<oracledb.Result<unknown>> {
        return await otheruserSession.execute(`
            with
               function num_rows(in_table_name in varchar2) return integer is
                  l_rows integer;
               begin
                  execute immediate 'select count(*) from ' || in_table_name 
                     into l_rows;
                  return l_rows;
               end;
            select table_name, num_rows(table_name) as num_rows
              from user_tables
             order by table_name
        `);
    }

    beforeAll(async () => {
        await createSessions();
        const execAsync = util.promisify(exec);
        await execAsync(
            `sql -S ${demotabConfig.user}/${demotabConfig.password}@${demotabConfig.connectString} @deploy.sql`
        );
    }, timeout);

    beforeEach(async () => {
        await otheruserSession.execute(`
            begin
               for r in (select table_name from user_tables) loop
                  execute immediate 'drop table ' 
                     || r.table_name
                     || ' cascade constraints purge';
               end loop;
            end;
        `);
    });

    describe("deployment", () => {
        it("should have valid database objects in demotab user", async () => {
            const mods = await demotabSession.execute(`
                select object_type, object_name, status 
                  from user_objects 
                 order by object_type, object_name
            `);
            expect(mods.rows).toEqual([
                ["MLE ENVIRONMENT", "DEMOTAB_ENV", "VALID"],
                ["MLE MODULE", "DEMOTAB_MOD", "VALID"],
                ["MLE MODULE", "SQL_ASSERT_MOD", "VALID"],
                ["PACKAGE", "DEMO", "VALID"]
            ]);
        });
    });

    describe("run MLE module from otheruser", () => {
        it("should create 'dept' and 'emp' without parameters", async () => {
            await otheruserSession.execute("begin demo.create_tabs; end;");
            expect((await userTables()).rows).toEqual([
                ["DEPT", 4],
                ["EMP", 14]
            ]);
        });
        it("should create 'd' and 'emp' with first parameter only", async () => {
            await otheruserSession.execute("begin demo.create_tabs('d'); end;");
            expect((await userTables()).rows).toEqual([
                ["D", 4],
                ["EMP", 14]
            ]);
        });
        it("should create 'd' and 'e' with both parameters", async () => {
            await otheruserSession.execute("begin demo.create_tabs('d', 'e'); end;");
            expect((await userTables()).rows).toEqual([
                ["D", 4],
                ["E", 14]
            ]);
        });
    });

    afterAll(async () => {
        await closeSessions();
    });
});

On line 30 we run the SQL script deploy.sql with SQLcl. We connect as demotab with the credentials and connect string configured in dbconfig.ts.

We test the default PL/SQL call interface on lines 63-69 by executing begin demo.create_tabs; end;. Then we check the number of rows in the tables dept and emp. That’s enough. We do not need to repeat the tests of the demotab module since the module was already successfully tested.

Re-Run All Tests

To re-run all tests open a terminal window in VS Code and execute the following command:

npm run test

This will produce an output similar to this:

[Image: Console output of "npm run test" for mle-demotab.test.ts and demotab.test.ts]

Conclusion

For years I’ve been advocating file-based database development. With moderate success. All of my customers are using a version control system and automated deployments. However, the way the files in the version control system are maintained is suboptimal. In most cases, the developers use an IDE such as SQL Developer or PL/SQL Developer to read the source code from the database, change it in the editor of the IDE and then save (=deploy) it in the database. Updating the files in the version control system is a postponed, sometimes half-automated task. This leads to all kinds of bugs detected in the CI (or in later stages) which should have been detected during development. Sometimes code changes are lost, for example, when the underlying database instance has been replaced by a newer clone.

Why is it so hard to change the behaviour of the database developers? Changing the files first and then deploying them into the database? One reason is that the IDEs do not support the file-based development process well enough. They favour the let-us-read-everything-from-the-database approach, which makes sense for application data but is not ideal for code.

The MLE is not supported by the current IDEs. Oracle Database Actions (SQL Developer Web) is an exception; it provides basic support for MLE. However, I guess it will take years until reasonable functionality is provided, if at all.

So when we want to develop MLE modules efficiently, we have to use the currently available IDEs for TypeScript or JavaScript. They are great: excellent editor, VCS integration, testing tools, debugger, formatter, linter and packaging system. The ecosystem is mature and constantly improving. I very much like the fact that we have a global module registry, npm, which also supports private modules. As a result, being forced to use this ecosystem is not a bad thing. Quite the contrary. It’s the best that could have happened to database development.

When I look at the code of this MLE demo module, I’m quite happy with it. I’m confident that this approach can be used on a larger scale.

IMO the MLE is the best thing that happened to the Oracle Database since version 7, which brought us PL/SQL.

Let’s find out what works and what should be improved.


Updated on 2023-11-03, using npm install to initialize the Node project; using var instead of const in deploy.sql to make it compatible with ECMAScript 2011 (ES 5.1).

The post MLE TypeScript & JavaScript Modules appeared first on Philipp Salvisberg's Blog.

Installing MLE Modules in the Oracle Database

Introduction

In my previous blog post, I’ve shown how you can deploy an npm module from a URL and a custom ESM module from a local file into a remote Oracle Database 23c using JavaScript and SQLcl. This works well. However, for two MLE modules, I had to write 22 lines of code with duplications. I do not like that. I have therefore developed a small SQLcl custom command that greatly simplifies the installation of MLE modules in the Oracle Database.

Installing the mle Custom Command

Start SQLcl and run the following command to install the custom command mle in the current session.

script https://raw.githubusercontent.com/PhilippSalvisberg/mle-sqlcl/main/mle.js register

The SQLcl script command can read a JavaScript file from the local file system or from a URL. If you feel uncomfortable running JavaScript files directly from a URL, you can have a look at the GitHub repo first and download and run a chosen version of the script from the local file system.

The register subcommand registers the mle script as an SQLcl command. However, this registration is not permanent. It will be lost after closing SQLcl. To make the custom command available in every new SQLcl session add the command above to your login.sql or startup.sql. SQLcl executes these files on start-up or after establishing a new connection. Just make sure that you have configured the SQLPATH environment variable accordingly.

Providing Help

Now you can run the mle command like this:

mle

Without parameters, an error message is displayed along with information on how to use this command.

[Image: Output of mle command without parameters]

Installing Validator

Now we know the syntax to install an MLE module from a URL. However, what’s the URL for an npm module? We use the free open-source CDN jsDelivr for that. It provides a service that returns an npm module as an ECMAScript module. The result can be used in a browser and also in an Oracle Database. The URL looks like this:

https://esm.run/npm-package-name@npm-package-version

We want to install the current version 13.11.0 of the npm module validator. Based on the information provided above our SQLcl command for that looks like this:

mle install validator_mod https://esm.run/validator@13.11.0 13.11.0

And here is the result in SQLcl:

[Image: Output of successfully executed mle command to install the validator module from npm]

Please note that the response message by SQLcl 23.3 is not 100% correct for MLE modules. However, our module is installed correctly. We verify that later.

Maybe you’d like to know what the SQL statement looks like to install this module. The last statement is still in SQLcl’s buffer. Therefore we can type l followed by enter to see the content of the buffer.



The first line contains the start of the SQL statement. Lines 2 to 7 are comments generated by jsDelivr. Line 8 contains the complete module. Yes, as a single line. The code is minified. All unnecessary whitespace is gone and internally used identifiers are shortened to save space. On line 9 we would see a JavaScript comment with a pointer to a map file that points to the original, nicely formatted source code. This is interesting when debugging in other environments. It’s currently not used in the database.

[Image: SQLcl buffer containing the SQL command to install the validator module]


Querying MLE Modules

The following SQL statement shows some data regarding the previously deployed MLE module:

set sqlformat ansiconsole
select module_name, version, language_name, length(module)
  from user_mle_modules;

The result in SQLcl looks like this:

[Image: Result of the query on user_mle_modules in SQLcl]

We see the version of the module and also the size in bytes of the blob column where the module is stored. It is one byte larger than the file returned by the URL since it contains a final line feed character due to the way we built the SQL statement.

IMO it is helpful to provide the version of the external ESM as long as there is no proper package registry within the Oracle Database.

Verifying Installation

Let’s write a call specification in order to verify the successful installation of the validator module.

create or replace function is_email(
   in_email in varchar2
) return boolean as mle module validator_mod signature 'default.isEmail(string)';
/

Now we can run the following query to verify e-mail addresses and the installation of the validator module:

select is_email('jane.doe@example.org') as jane_doe,
       is_email('john.doe@example') as john_doe;

In SQLcl 23.3 the result looks like this (boolean values as 1 and 0):

[Image: Result of calling the is_email validator function in SQLcl]

And in SQL*Plus 23.3 the result looks like this (boolean values as TRUE and FALSE):

[Image: Result of calling the is_email validator function in SQL*Plus]

Installing More Modules

Let’s install some other modules I find useful. You can look them up on npm if you want to know what they do.

mle install sentiment_mod https://esm.run/sentiment@5.0.2 5.0.2
mle install jimp_mod https://esm.run/jimp@0.22.10/browser/lib/jimp.js 0.22.10
mle install sql_assert_mod https://esm.run/sql-assert@1.0.3 1.0.3
mle install lodash_mod https://esm.run/lodash@4.17.21 4.17.21
mle install js_yaml_mod https://esm.run/js-yaml@4.1.0 4.1.0
mle install minimist_mod https://esm.run/minimist@1.2.8 1.2.8
mle install typeorm_mod https://esm.run/typeorm@0.3.17 0.3.17

Installing all modules is just a matter of a few seconds.

Conclusion

Using the SQLcl custom mle command to install ECMAScript modules from a file or from a URL is not only easy and fast, but it is also an excellent way to provide tested functionality within the database.

Technically, it should be possible to generate the PL/SQL call specifications based on the type definitions that are provided for IDEs to support code completion. I hope that Oracle will provide such a feature in one of the coming releases of SQLcl or as part of a dedicated MLE tool (similar to dbjs, which was part of the experimental version of the MLE back in 2017). Even if I do not want to expose all underlying functionality of an MLE module within the database, or not with the same signatures, a generated PL/SQL package with all call specifications would simplify writing a wrapper PL/SQL package that exposes just the relevant subset in a way suitable for use in PL/SQL or SQL.

The post Installing MLE Modules in the Oracle Database appeared first on Philipp Salvisberg's Blog.

Autonomous Transactions

Introduction

Autonomous transactions became available in the Oracle Database 8i Release 1 (8.1.5). 25 years ago. Before then the feature was used only internally, for example, when requesting a new value from a sequence. I mean, if Oracle is using autonomous transactions internally and they’ve made them public, then the usage can hardly be bad, right? – Wrong.

“The legitimate real-world use of autonomous transactions is exceedingly rare. If you find them to be a feature you are using constantly, you’ll want to take a long, hard look at why.”

— Tom Kyte

In this blog post, I’d like to discuss some of the side effects of autonomous transactions.

Example in X Poll

Here’s the screenshot of a poll result on X (Twitter).

[Image: Screenshot of the X poll result]

You can run this SQL script using a common SQL client connected to an Oracle Database 12c instance or higher.

I’ve run this script against the following Oracle database versions with the same result: 23.3, 21.8, 21.3, 19.22, 19.21, 19.19, 19.17, 18.4, 12.2 and 12.1. It should therefore also work in your environment using SQL*Plus, SQLcl or SQL Developer. Simply drop the possibly existing table t beforehand.

1) SQL script from X poll
create table t (c1 number);
insert into t values(1);
commit;

with
   function f return number deterministic is
      pragma autonomous_transaction;
   begin
      delete from t;
      commit;
      return 1;
   end;
select * from t where f() = 1;
/
        C1
----------
         1

The query result shows the row inserted on line 2. So the majority of respondents were right.

Just to be clear: This result is expected. It is not a bug. The reason why the query returns one row is the statement-level read consistency of the Oracle database. We see the data as it was at the start of the query. The autonomous transaction that deletes all rows is completed (committed) after the query is started. As a result, changes made by the autonomous transaction are not visible in the main transaction.

When Do We Get an ORA-14551?

I like the features introduced in 23c, such as the IF [NOT] EXISTS syntax support. That’s why I’m using the 23c syntax in this blog post from now on. However, it should not be too difficult to adapt the code for older versions.

The next script looks very similar to the first one. The difference is that the function f does not contain the pragma autonomous_transaction anymore. The default, so to speak. And this leads to a different result.

2) SQL script without “pragma autonomous_transaction”
drop table if exists t;
create table t (c1 number);
insert into t values(1);
commit;

with
  function f return number deterministic is
  begin
     delete from t;
     commit;
     return 1;
  end;
select * from t where f()=1;
/
Error starting at line : 6 in command -
with
  function f return number deterministic is
  begin
     delete from t;
     commit;
     return 1;
  end;
select * from t where f()=1
Error at Command Line : 13 Column : 15
Error report -
SQL Error: ORA-14551: cannot perform a DML operation inside a query 
ORA-06512: at line 3
ORA-06512: at line 7
14551. 00000 -  "cannot perform a DML operation inside a query "
*Cause:    DML operation like insert, update, delete or select-for-update
           cannot be performed inside a query or under a PDML slave.
*Action:   Ensure that the offending DML operation is not performed or
           use an autonomous transaction to perform the DML operation within
           the query or PDML slave.

More Details :
https://docs.oracle.com/error-help/db/ora-14551/
https://docs.oracle.com/error-help/db/ora-06512/

The error message, the cause and the action are good. You even get the information that you can work around the problem by using an autonomous transaction.

What is missing, however, is the information on why performing a DML operation within a query is a bad thing. Maybe it’s too obvious. Who would expect a query to change data? – Probably nobody. It therefore makes sense to prohibit DML in queries by default.

When Do We Get No Rows?

Add Logging

Let’s add some logging information to the query to better understand what is being executed and when.

3) SQL script with logging
drop table if exists t;
create table t (c1 number);
insert into t values (1), (2);
commit;
drop table if exists l;
create table l (
   id integer generated always as identity primary key, 
   text varchar2(50 char)
);
column logit format a20

with
   procedure logit (in_text in varchar2) is
      pragma autonomous_transaction;
   begin
      insert into l(text) values(in_text);
      commit;
   end;
   function logit (in_text in varchar2) return varchar2 is
   begin
      logit(in_text);
      return in_text;
   end;
   function f return number deterministic is
      pragma autonomous_transaction;
   begin
      logit('in function f');
      delete from t;
      commit;
      return 1;
   end;
select c1, logit('in select_list') as logit
  from t
 where f() = 1 and logit('in where_clause') is not null;
/

select * from l order by id;
        C1 LOGIT               
---------- --------------------
         1 in select_list      
         2 in select_list      

        ID TEXT                                              
---------- --------------------------------------------------
         1 in where_clause                                   
         2 in function f                                     
         3 in select_list                                    
         4 in select_list                                    

We have inserted an additional row into the table t to understand better how often a function is called. A logit call produces a row in the new table l.

The query on the table t returns now two rows. That’s expected due to statement-level read consistency.

The query result on the table l reveals the order in which the logit calls were evaluated by the Oracle Database.

So far so good.

DML Restart

The Oracle Database can restart any DML statement. Automatically or intentionally via our code, for example in exception handlers. A restart also implies a rollback. The scope of a rollback is the current transaction. Changes that are made outside the current transaction, such as writing to a file, calling a REST service or executing code in an autonomous transaction, remain unaffected by a rollback. This means that the application is responsible for reversing changes made outside the current transaction.

Let’s force a DML restart. Franck Pachot provided the simplest solution in this post on X that works in a single database session. The script looks the same as before; we only added a for update clause on line 35.

4) SQL script causing a DML restart
drop table if exists t purge;
create table t (c1 number);
insert into t values (1), (2);
commit;
drop table if exists l purge;
create table l (
   id integer generated always as identity primary key, 
   text varchar2(50 char)
);
column logit format a20

with
   procedure logit (in_text in varchar2) is
      pragma autonomous_transaction;
   begin
      insert into l(text) values(in_text);
      commit;
   end;
   function logit (in_text in varchar2) return varchar2 is
   begin
      logit(in_text);
      return in_text;
   end;
   function f return number deterministic is
      pragma autonomous_transaction;
   begin
      logit('in function f');
      delete from t;
      commit;
      return 1;
   end;
select c1, logit('in select_list') as logit
  from t
 where f() = 1 and logit('in where_clause') is not null
   for update;
/

select * from l order by id;
no rows selected

        ID TEXT                                              
---------- --------------------------------------------------
         1 in where_clause                                   
         2 in function f                                     
         3 in where_clause                                                                    

The query on the table t now returns no rows. The log contains two rows with the text “in where_clause”. The second one is a clear indication of a DML restart.

Again, this is not a bug in the Oracle Database. It’s a bug in the application code.

But why is the function f just called once? – Because the function is declared as deterministic (a false claim, BTW, but a necessary evil to avoid an ORA-600 due to recursive restarts). The Oracle database already knows the result of the function call where no parameters are passed. As a result, there is no need to re-evaluate it.

And why was the statement restarted? – Because the Oracle Database detected that the rows to be locked have been changed. Locking outdated versions of a row is not possible and it would not make any sense. So the database is left with two options. Either throw an error or restart the statement. “Let’s try again” is a solution approach we often use when something doesn’t work on the first attempt. This also works for the Oracle Database.

DML Restart – Using Update Instead Of Delete

Although what I wrote before sounds reasonable, I’d still like to verify it.

So, let’s change the previous script slightly and replace the delete with an update statement on line 28.

5) SQL script causing a DML restart – update instead of delete
drop table if exists t;
create table t (c1 number);
insert into t values (1), (2);
commit;
drop table if exists l;
create table l (
   id integer generated always as identity primary key, 
   text varchar2(50 char)
);
column logit format a20

with
   procedure logit (in_text in varchar2) is
      pragma autonomous_transaction;
   begin
      insert into l(text) values(in_text);
      commit;
   end;
   function logit (in_text in varchar2) return varchar2 is
   begin
      logit(in_text);
      return in_text;
   end;
   function f return number deterministic is
      pragma autonomous_transaction;
   begin
      logit('in function f');
      update t set c1 = c1 + 1;
      commit;
      return 1;
   end;
select c1, logit('in select_list') as logit
  from t
 where f() = 1 and logit('in where_clause') is not null
   for update;
/

select * from l order by id;
        C1 LOGIT               
---------- --------------------
         2 in select_list      
         3 in select_list      


        ID TEXT                                              
---------- --------------------------------------------------
         1 in where_clause                                   
         2 in function f                                     
         3 in where_clause                                   
         4 in select_list                                    
         5 in select_list                                                                     

See, the query on the table t now returns the two updated rows: the effect of the restarted statement.

Real-Life Use Case

This blog post was inspired by a real-life use case. The original question was how to call a package procedure at the end of a process from a security and observability tool when this tool can only run queries.

Here’s the screenshot of an X post by my colleague Stefan Oehrli that demonstrates a possible solution approach. Just replace count(*) with a couple of columns from the table unified_audit_trail to make it realistic.


While this technically “solves” the original requirement, it comes with a couple of issues, for example:

  • What happens if the query succeeds but the result cannot be stored successfully in the target database? – Data loss, since the autonomous transaction succeeded and the data will not be delivered again on the next access.
  • What happens when the query crashes? – Again, data loss as in the previous case. It does not matter how many rows have been read.
  • How can we reload some previously processed data? – This might not be possible with this approach, which is bad, because being able to reload would mitigate the previous two issues.

The problem with this approach is that this easy implementation comes at the price of possible data loss. That’s fine as long as the stakeholders know this in advance and accept the risk. However, I can imagine that this is not good enough. Especially when dealing with security-relevant data, we should strive for the best possible approach and not be satisfied with the easiest solution to implement.

Alternatives to Autonomous Transactions

Years ago I was a fan of autonomous transactions. They allow me to persist data in logging tables even if the main transaction is not committed. That’s a great feature. However, over the years I’ve seen some abuse of autonomous transactions that changed my mind. I still like to use autonomous transactions for debugging purposes. But that’s it. Using them for something else is most probably a bug in the application code.

So what are the alternatives?

1. Make the Right Application Responsible for Data Consistency

Take a step back and think about who should be in charge of a certain process, for example the “real-life use case” mentioned above. I believe that the security and observability tool is responsible for the data it reads, processes and stores in its data store. This means that this tool should have mechanisms in place to remember the last processed log entry per source with the corresponding restart points and procedures, as sketched below. Moving part of this responsibility to the source, such as “remembering the last processed log entry”, is just plain wrong and leads to the issues outlined above.
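
A minimal sketch of such a mechanism on the consumer side. The watermark table audit_watermark and the bind variable :last_processed are hypothetical names for this example; only unified_audit_trail and its columns come from the use case above.

-- hypothetical watermark table, maintained by the consuming tool in its own data store
create table audit_watermark (
   source_db      varchar2(128 char) not null primary key,
   last_processed timestamp          not null
);

-- incremental fetch driven by the consumer: read only entries newer than the stored watermark;
-- the consumer stores the fetched rows and the new watermark in the same transaction on its side
select event_timestamp, dbusername, action_name
  from unified_audit_trail
 where event_timestamp > :last_processed
 order by event_timestamp;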

Shifting process responsibility to the right place makes the use of autonomous transactions most probably superfluous.

2. Advanced Queues

Advanced Queues are transactional (enqueue and dequeue). They are an excellent way to postpone parts of a process that you would like to be transactional, but which are not: for example, sending an e-mail, calling a REST service or calling functionality that contains TCL statements. You can configure how to deal with failures, such as the number of retries, the delay time, etc. This leads to a more transaction-like behaviour than using autonomous transactions, as the sketch below shows for the enqueue part.
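
A minimal sketch, assuming a queue with a RAW payload. The names mail_request_qt and mail_request_q, the message text and the sample update on emp are made up for this example; the point is that the message only becomes visible to consumers when the surrounding transaction commits.

-- one-time setup of a queue with a RAW payload (names are made up for this example)
begin
   dbms_aqadm.create_queue_table(
      queue_table        => 'mail_request_qt',
      queue_payload_type => 'RAW'
   );
   dbms_aqadm.create_queue(
      queue_name  => 'mail_request_q',
      queue_table => 'mail_request_qt'
   );
   dbms_aqadm.start_queue(queue_name => 'mail_request_q');
end;
/

-- enqueue as part of the main transaction: the message becomes visible to consumers
-- only when the surrounding transaction commits; a rollback also removes the message
declare
   l_enqueue_options    dbms_aq.enqueue_options_t;
   l_message_properties dbms_aq.message_properties_t;
   l_msgid              raw(16);
begin
   update emp set sal = sal * 1.1 where empno = 7788;  -- some DML in the main transaction
   dbms_aq.enqueue(
      queue_name         => 'mail_request_q',
      enqueue_options    => l_enqueue_options,
      message_properties => l_message_properties,
      payload            => utl_raw.cast_to_raw('send mail for empno 7788'),
      msgid              => l_msgid
   );
   commit;  -- DML and message are committed (or rolled back) together
end;
/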

3. One-time Jobs

Jobs are similar to queue messages. You create them as part of the transaction. They might be the easier solution because the job system itself takes care of running them, so there is no separate dequeue process to implement (see the sketch below).
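
A minimal sketch using the legacy dbms_job interface, which creates the job as part of the current transaction (since Oracle Database 19c it is a wrapper over dbms_scheduler but keeps this transactional behaviour; dbms_scheduler.create_job, in contrast, commits implicitly). The procedure notify_pkg.send_salary_mail and the sample update on emp are made up for this example.

declare
   l_job binary_integer;
begin
   update emp set sal = sal * 1.1 where empno = 7788;  -- some DML in the main transaction
   dbms_job.submit(
      job  => l_job,
      what => 'notify_pkg.send_salary_mail(7788);'     -- hypothetical one-time action
   );
   commit;  -- the job becomes visible and runnable only now; a rollback discards it
end;
/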

Conclusion

Your application should contain at most one PL/SQL unit with a pragma autonomous_transaction: the one for storing logging/debugging messages as part of your instrumentation strategy.

Autonomous transactions may seem appealing for quickly solving certain problems or requirements. However, they come with the risk of data inconsistencies.

Even when analyzing logging/debugging data generated by autonomous transactions, we need to be aware that an entry does not mean that an action has taken place, as it could have been undone via a rollback.

The post Autonomous Transactions appeared first on Philipp Salvisberg's Blog.

IslandSQL Episode 5: Select in Oracle Database 23c

Introduction

In the last episode, we extended the expressions in the IslandSQL grammar to complete the lock table statement. The grammar now fully covers expressions, conditions and the select statement. In this episode, we will focus on optimizer hints and new features in the Oracle Database 23c that can be used in the select statement.

The full source code is available on GitHub, the binaries are on Maven Central and this VS Code extension uses the IslandSQL library to find text in DML statements and report syntax errors.

Token Channels

ANTLR uses the concept of channels which is based on the idea of radio frequencies. The lexer is responsible for identifying tokens and putting them on the right channel.

For most lexers, these two channels are enough:

  • DEFAULT_CHANNEL – all tokens that are relevant to the parser
  • HIDDEN_CHANNEL – all other tokens

Here’s an example:

1) Tokens in source, on hidden channel and on default channel
select/*+ full(emp) */█*█from█emp█where█empno█=█7788█;
/*+ full(emp) */█ █    █   █     █     █ █    █
select                  * from emp where empno = 7788 ;

The first line contains the complete statement where a space token is represented as █. The syntax highlighting helps to identify the 19 tokens. In the second line, you find all 10 hidden tokens – comments and whitespace. The noise, so to speak. And in the third line are the 9 visible tokens on the default channel.

This is similar to a noise-cancelling system. The parser only gets the tokens that are necessary to do its job.

Identifying Hints

In this blog post, I explained how you can distinguish hints from ordinary comments and highlight them in SQL Developer. Solving this problem was a bit more complicated because SQL Developer’s parse tree does not contain hints, since hints are just special comments.

However, in the IslandSQL grammar, we want to define hints as part of a query_block. In other words, we want to make them visible.

query_block in IslandSQL with highlighted hint
In the Lexer?

Identifying hints in the lexer and putting them on the DEFAULT_CHANNEL sounds like a good solution. However, we do not want to handle comment tokens that look like a hint in every position in the parser. This would be a nightmare. To avoid that, we could add a semantic predicate to consider only hint-style comments following the select keyword. Of course, we need to ignore whitespace and ordinary comments. Furthermore, we have to ensure that the select keyword is the start of a query_block and not used in another context such as a grant statement.

At that point, it becomes obvious that the lexer would be doing the job of the parser.

Better in the Parser!

So we use the lexer only to identify hint tokens and put them on the HIDDEN_CHANNEL:

2) Excerpt IslandSqlLexer.g4 v0.5.0
ML_HINT: '/*+' .*? '*/' -> channel(HIDDEN);
ML_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
SL_HINT: '--+' ~[\r\n]* -> channel(HIDDEN);
SL_COMMENT: '--' ~[\r\n]* -> channel(HIDDEN);

And then we define a semantic predicate in the parser:

3) Excerpt IslandSqlParser.g4 v0.5.0
queryBlock:
    {unhideFirstHint();} K_SELECT hint?
    queryBlockSetOperator?
    selectList
    (intoClause | bulkCollectIntoClause)? // in PL/SQL only
    fromClause? // starting with Oracle Database 23c the from clause is optional
    whereClause?
    hierarchicalQueryClause?
    groupByClause?
    modelClause?
    windowClause?
;

That’s the call of the function unhideFirstHint() in the semantic action at the start of the queryBlock rule (line 161). At that point, the parser is at the position of the token K_SELECT. Here’s the implementation in the base class of the generated parser:

4) Excerpt IslandSqlParserBase.java v0.5.0
    public void unhideFirstHint() {
        CommonTokenStream input = ((CommonTokenStream) this.getTokenStream());
        List<Token> tokens = input.getHiddenTokensToRight(input.index());
        if (tokens != null) {
            for (Token token : tokens) {
                if (token.getType() == IslandSqlLexer.ML_HINT || token.getType() == IslandSqlLexer.SL_HINT) {
                    ((CommonToken) token).setChannel(Token.DEFAULT_CHANNEL);
                    return; // stop after first hint style comment
                }
            }
        }
    }

We scan all hidden tokens to the right of the keyword select and set the first hint token to the DEFAULT_CHANNEL to make it visible to the parser.

Parse Tree

Let’s visualise the parse tree of the following query:

5a) Example with comments and hints
select -- A query_block can have only one comment
      /*  containing hints, and that comment must
          follow the SELECT keyword. */
      /*+ full(emp) */
      --+ index(emp)
      ename, sal    -- select_list
 from emp           -- from_clause
where empno = 7788; -- where_clause

We use ParseTreeUtil.dotParseTree to produce an output in DOT format and paste it into the web UI of Edotor or any other Graphviz viewer to get this result:

parse tree with hint

The leaf nodes are sand-coloured rectangles. They represent the visible lexer tokens, the ones on the DEFAULT_CHANNEL. All other nodes are sky blue and elliptical. They represent a rule in the parser grammar.

I have changed the colour of the hint node to red so that you can spot it more easily. You see that it contains the /*+ full(emp) */ hint-style comment. All other comments are not visible in the parse tree. That’s what we wanted.

Here’s an alternative textual representation of the parse tree using ParseTreeUtil.printParseTree. It is better suited to represent larger parse trees. Furthermore, it also contains the symbol names of lexer tokens, for example K_SELECT or ML_HINT, as you can see on lines 7 and 9.

5b) Parse tree
file
  dmlStatement
    selectStatement
      select
        subquery:subqueryQueryBlock
          queryBlock
            K_SELECT:select
            hint
              ML_HINT:/*+ full(emp) */
            selectList
              selectItem
                expression:simpleExpressionName
                  sqlName
                    unquotedId
                      ID:ename
              COMMA:,
              selectItem
                expression:simpleExpressionName
                  sqlName
                    unquotedId
                      ID:sal
            fromClause
              K_FROM:from
              fromItem:tableReferenceFromItem
                tableReference
                  queryTableExpression
                    sqlName
                      unquotedId
                        ID:emp
            whereClause
              K_WHERE:where
              condition
                expression:simpleComparisionCondition
                  expression:simpleExpressionName
                    sqlName
                      unquotedId
                        ID:empno
                  simpleComparisionOperator:eq
                    EQUALS:=
                  expression:simpleExpressionNumberLiteral
                    NUMBER:7788
      sqlEnd
        SEMI:;
  <EOF>

New Features in the Oracle Database 23c

The Oracle Database 23c comes with a lot of new features. See the new features guide for a complete list.

In the next chapters, we look at a few examples that are relevant when querying data. In other words, at some of the new features that are applicable in the select statement.

Graph Table Operator

You can use the new graph_table operator to query property graphs in the Oracle Database. It’s a table function similar to xml_table or json_table. A powerful addition to the converged database.

Setup

The SQL Language Reference 23c provides some good examples, including a setup script.



The setup script is provided here for convenience. It’s almost a 1:1 copy from the SQL Language Reference with some minor additions and modifications.

The most important change is that business keys are used in the insert statements to retrieve the associated surrogate keys. As a result, it’s easier to add test data.

6) Setup example property graph
-- drop existing property graph including data
drop property graph if exists students_graph;
drop table if exists friendships;
drop table if exists students;
drop table if exists persons;
drop table if exists university;

-- create tables, insert data and create property graph
create table university (
   id             number       generated always as identity (start with 1 increment by 1) not null,
   name           varchar2(10) not null,
   constraint u_pk primary key (id),
   constraint u_uk unique (name)
);
insert into university (name) values ('ABC'), ('XYZ');

create table persons (
   person_id      number       generated always as identity (start with 1 increment by 1) not null,
   name           varchar2(10) not null,
   birthdate      date         not null,
   height         float        not null,
   person_data    json         not null,
   constraint person_pk primary key (person_id),
   constraint person_uk unique (name)
);
insert into persons (name, height, birthdate, person_data)
values ('John',  1.80, date '1963-06-13', '{"department":"IT","role":"Software Developer"}'),
       ('Mary',  1.65, date '1982-09-25', '{"department":"HR","role":"HR Manager"}'),
       ('Bob',   1.75, date '1966-03-11', '{"department":"IT","role":"Technical Consultant"}'),
       ('Alice', 1.70, date '1987-02-01', '{"department":"HR","role":"HR Assistant"}');

create table students (
   s_id           number       generated always as identity (start with 1 increment by 1) not null,
   s_univ_id      number       not null,
   s_person_id    number       not null,
   subject        varchar2(10) not null,
   constraint stud_pk primary key (s_id),
   constraint stud_uk unique (s_univ_id, s_person_id),
   constraint stud_fk_person foreign key (s_person_id) references persons(person_id),
   constraint stud_fk_univ foreign key (s_univ_id) references university(id)
);
insert into students(s_univ_id, s_person_id, subject)
select u.id, p.person_id, d.subject
  from (values
          (1, 'ABC', 'John',  'Arts'),
          (2, 'ABC', 'Bob',   'Music'),
          (3, 'XYZ', 'Mary',  'Math'),
          (4, 'XYZ', 'Alice', 'Science')
       ) as d (seq, uni_name, pers_name, subject)
  join university u
    on u.name = d.uni_name
  join persons p
    on p.name = d.pers_name
 order by d.seq;

create table friendships (
   friendship_id  number       generated always as identity (start with 1 increment by 1) not null,
   person_a       number       not null,
   person_b       number       not null,
   meeting_date   date         not null,
   constraint fk_person_a_id foreign key (person_a) references persons(person_id),
   constraint fk_person_b_id foreign key (person_b) references persons(person_id),
   constraint fs_pk primary key (friendship_id),
   constraint fs_uk unique (person_a, person_b)
);
insert into friendships (person_a, person_b, meeting_date)
select a.person_id, b.person_id, d.meeting_date
  from (values
          (1, 'John', 'Bob',   date '2000-09-01'),
          (2, 'Mary', 'Alice', date '2000-09-19'),
          (3, 'Mary', 'John',  date '2000-09-19'),
          (4, 'Bob',  'Mary',  date '2001-07-10')
       ) as d (seq, name_a, name_b, meeting_date)
  join persons a
    on a.name = d.name_a
  join persons b
    on b.name = d.name_b
  order by d.seq;
                
create property graph students_graph
   vertex tables (
      persons key (person_id)
         label person
            properties (person_id, name, birthdate as dob)
         label person_ht
            properties (height),
      university key (id)
   )
   edge tables (
      friendships as friends
         key (friendship_id)
         source key (person_a) references persons(person_id)
         destination key (person_b) references persons(person_id)
         properties (friendship_id, meeting_date),
      students as student_of
         source key (s_person_id) references persons(person_id)
         destination key (s_univ_id) references university(id)
         properties (subject)
  );


The example property graph looks like this:

Data in STUDENTS_GRAPH
Source: SQL Language Reference 23c
Query
7a) Query using graph_table
select a_name, b_name, c_name
  from graph_table (
          students_graph
          match
             (a is person)
                -[is friends]->    -- a is friend of b
             (b is person)
                -[is friends]->    -- b is friend of c
             (c is person)
                -[is friends]->    -- c is friend of a (cyclic path)
             (a)
          where
             a.name = 'Mary'       -- start of cyclic path with 3 nodes
          columns (
             a.name as a_name,
             b.name as b_name,
             c.name as c_name
          )
       ) g;
A_NAME     B_NAME     C_NAME    
---------- ---------- ----------
Mary       John       Bob       

Edges and Directions

An edge has a source and a destination vertex. According to the model, Mary is a friend of John and this means that John is also a friend of Mary. When we change the direction of the edges in the query from -[is friends]-> to <-[is friends]- the query result changes to:

A_NAME     B_NAME     C_NAME    
---------- ---------- ----------
Mary       Bob        John      

We now get the clockwise result of the cyclic path starting with Mary (see the highlighted person vertices in the STUDENTS_GRAPH figure above).

Since there is only one type of edge between the vertices of the type persons, we get the same result by using just <-[]- or even <-.

To ignore the direction of a friendship we can use <-[is friends]-> or -[is friends]- or <-[]-> or -[]- or <-> or just - to produce this result:

A_NAME     B_NAME     C_NAME    
---------- ---------- ----------
Mary       Bob        John      
Mary       John       Bob       
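
For reference, here is the match part of query 7a rewritten with undirected edge patterns, which produces the two-row result above. This is only a sketch of the changed fragment; everything else in the query stays the same.

          match
             (a is person)
                -[is friends]-    -- a and b are friends (either direction)
             (b is person)
                -[is friends]-    -- b and c are friends (either direction)
             (c is person)
                -[is friends]-    -- c and a are friends (either direction)
             (a)
          where
             a.name = 'Mary'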

IMO this arrow-like syntax is intuitive and makes a graph_table query relatively easy to read and write.

7b) Parse tree
file
  dmlStatement
    selectStatement
      select
        subquery:subqueryQueryBlock
          queryBlock
            K_SELECT:select
            selectList
              selectItem
                expression:simpleExpressionName
                  sqlName
                    unquotedId
                      ID:a_name
              COMMA:,
              selectItem
                expression:simpleExpressionName
                  sqlName
                    unquotedId
                      ID:b_name
              COMMA:,
              selectItem
                expression:simpleExpressionName
                  sqlName
                    unquotedId
                      ID:c_name
            fromClause
              K_FROM:from
              fromItem:tableReferenceFromItem
                tableReference
                  queryTableExpression
                    expression:specialFunctionExpressionParent
                      specialFunctionExpression
                        graphTable
                          K_GRAPH_TABLE:graph_table
                          LPAR:(
                          sqlName
                            unquotedId
                              ID:students_graph
                          K_MATCH:match
                          pathTerm
                            pathTerm
                              pathTerm
                                pathTerm
                                  pathTerm
                                    pathTerm
                                      pathTerm
                                        pathFactor
                                          pathPrimary
                                            elementPattern
                                              vertexPattern
                                                LPAR:(
                                                elementPatternFiller
                                                  sqlName
                                                    unquotedId
                                                      keywordAsId
                                                        K_A:a
                                                  K_IS:is
                                                  labelExpression
                                                    sqlName
                                                      unquotedId
                                                        ID:person
                                                RPAR:)
                                      pathFactor
                                        pathPrimary
                                          elementPattern
                                            edgePattern
                                              fullEdgePattern
                                                fullEdgePointingRight
                                                  MINUS:-
                                                  LSQB:[
                                                  elementPatternFiller
                                                    K_IS:is
                                                    labelExpression
                                                      sqlName
                                                        unquotedId
                                                          ID:friends
                                                  RSQB:]
                                                  MINUS:-
                                                  GT:>
                                    pathFactor
                                      pathPrimary
                                        elementPattern
                                          vertexPattern
                                            LPAR:(
                                            elementPatternFiller
                                              sqlName
                                                unquotedId
                                                  ID:b
                                              K_IS:is
                                              labelExpression
                                                sqlName
                                                  unquotedId
                                                    ID:person
                                            RPAR:)
                                  pathFactor
                                    pathPrimary
                                      elementPattern
                                        edgePattern
                                          fullEdgePattern
                                            fullEdgePointingRight
                                              MINUS:-
                                              LSQB:[
                                              elementPatternFiller
                                                K_IS:is
                                                labelExpression
                                                  sqlName
                                                    unquotedId
                                                      ID:friends
                                              RSQB:]
                                              MINUS:-
                                              GT:>
                                pathFactor
                                  pathPrimary
                                    elementPattern
                                      vertexPattern
                                        LPAR:(
                                        elementPatternFiller
                                          sqlName
                                            unquotedId
                                              ID:c
                                          K_IS:is
                                          labelExpression
                                            sqlName
                                              unquotedId
                                                ID:person
                                        RPAR:)
                              pathFactor
                                pathPrimary
                                  elementPattern
                                    edgePattern
                                      fullEdgePattern
                                        fullEdgePointingRight
                                          MINUS:-
                                          LSQB:[
                                          elementPatternFiller
                                            K_IS:is
                                            labelExpression
                                              sqlName
                                                unquotedId
                                                  ID:friends
                                          RSQB:]
                                          MINUS:-
                                          GT:>
                            pathFactor
                              pathPrimary
                                elementPattern
                                  vertexPattern
                                    LPAR:(
                                    elementPatternFiller
                                      sqlName
                                        unquotedId
                                          keywordAsId
                                            K_A:a
                                    RPAR:)
                          K_WHERE:where
                          condition
                            expression:simpleComparisionCondition
                              expression:binaryExpression
                                expression:simpleExpressionName
                                  sqlName
                                    unquotedId
                                      keywordAsId
                                        K_A:a
                                PERIOD:.
                                expression:simpleExpressionName
                                  sqlName
                                    unquotedId
                                      keywordAsId
                                        K_NAME:name
                              simpleComparisionOperator:eq
                                EQUALS:=
                              expression:simpleExpressionStringLiteral
                                STRING:'Mary'
                          K_COLUMNS:columns
                          LPAR:(
                          graphTableColumnDefinition
                            expression:binaryExpression
                              expression:simpleExpressionName
                                sqlName
                                  unquotedId
                                    keywordAsId
                                      K_A:a
                              PERIOD:.
                              expression:simpleExpressionName
                                sqlName
                                  unquotedId
                                    keywordAsId
                                      K_NAME:name
                            K_AS:as
                            sqlName
                              unquotedId
                                ID:a_name
                          COMMA:,
                          graphTableColumnDefinition
                            expression:binaryExpression
                              expression:simpleExpressionName
                                sqlName
                                  unquotedId
                                    ID:b
                              PERIOD:.
                              expression:simpleExpressionName
                                sqlName
                                  unquotedId
                                    keywordAsId
                                      K_NAME:name
                            K_AS:as
                            sqlName
                              unquotedId
                                ID:b_name
                          COMMA:,
                          graphTableColumnDefinition
                            expression:binaryExpression
                              expression:simpleExpressionName
                                sqlName
                                  unquotedId
                                    ID:c
                              PERIOD:.
                              expression:simpleExpressionName
                                sqlName
                                  unquotedId
                                    keywordAsId
                                      K_NAME:name
                            K_AS:as
                            sqlName
                              unquotedId
                                ID:c_name
                          RPAR:)
                          RPAR:)
                  sqlName
                    unquotedId
                      ID:g
      sqlEnd
        SEMI:;
  <EOF>

Table Value Constructor

Instead of reading rows from a table/view, you can produce rows on the fly using the new values_clause. This makes it possible to produce rows without writing a query_block for each row and using union all as a kind of row separator.

8a) Query using table value constructor
column english format a7
column german  format a7
with
   eng (digit, english) as (values
      (1, 'one'),
      (2, 'two')
   )
 select digit, english, german
   from eng e
natural full join (values
           (2, 'zwei'),
           (3, 'drei')
        ) as g (digit, german)
  order by digit;
/
     DIGIT ENGLISH GERMAN 
---------- ------- -------
         1 one            
         2 two     zwei   
         3         drei 

8b) Parse tree
file
  dmlStatement
    selectStatement
      select
        subquery:subqueryQueryBlock
          withClause
            K_WITH:with
            factoringClause
              subqueryFactoringClause
                sqlName
                  unquotedId
                    ID:eng
                LPAR:(
                sqlName
                  unquotedId
                    ID:digit
                COMMA:,
                sqlName
                  unquotedId
                    ID:english
                RPAR:)
                K_AS:as
                valuesClause
                  LPAR:(
                  K_VALUES:values
                  valuesRow
                    LPAR:(
                    expression:simpleExpressionNumberLiteral
                      NUMBER:1
                    COMMA:,
                    expression:simpleExpressionStringLiteral
                      STRING:'one'
                    RPAR:)
                  COMMA:,
                  valuesRow
                    LPAR:(
                    expression:simpleExpressionNumberLiteral
                      NUMBER:2
                    COMMA:,
                    expression:simpleExpressionStringLiteral
                      STRING:'two'
                    RPAR:)
                  RPAR:)
          queryBlock
            K_SELECT:select
            selectList
              selectItem
                expression:simpleExpressionName
                  sqlName
                    unquotedId
                      ID:digit
              COMMA:,
              selectItem
                expression:simpleExpressionName
                  sqlName
                    unquotedId
                      ID:english
              COMMA:,
              selectItem
                expression:simpleExpressionName
                  sqlName
                    unquotedId
                      ID:german
            fromClause
              K_FROM:from
              fromItem:joinClause
                fromItem:tableReferenceFromItem
                  tableReference
                    queryTableExpression
                      sqlName
                        unquotedId
                          ID:eng
                    sqlName
                      unquotedId
                        ID:e
                joinVariant
                  outerJoinClause
                    K_NATURAL:natural
                    outerJoinType
                      K_FULL:full
                    K_JOIN:join
                    fromItem:tableReferenceFromItem
                      tableReference
                        queryTableExpression
                          valuesClause
                            LPAR:(
                            K_VALUES:values
                            valuesRow
                              LPAR:(
                              expression:simpleExpressionNumberLiteral
                                NUMBER:2
                              COMMA:,
                              expression:simpleExpressionStringLiteral
                                STRING:'zwei'
                              RPAR:)
                            COMMA:,
                            valuesRow
                              LPAR:(
                              expression:simpleExpressionNumberLiteral
                                NUMBER:3
                              COMMA:,
                              expression:simpleExpressionStringLiteral
                                STRING:'drei'
                              RPAR:)
                            RPAR:)
                            K_AS:as
                            sqlName
                              unquotedId
                                ID:g
                            LPAR:(
                            sqlName
                              unquotedId
                                ID:digit
                            COMMA:,
                            sqlName
                              unquotedId
                                ID:german
                            RPAR:)
          orderByClause
            K_ORDER:order
            K_BY:by
            orderByItem
              expression:simpleExpressionName
                sqlName
                  unquotedId
                    ID:digit
      sqlEnd
        SEMI:;
  <EOF>

JSON_ARRAY Constructor by Query

The function json_array has a new JSON_ARRAY_query_content clause. This clause simplifies the creation of JSON documents, similar to SQL/XML. If you use the abbreviated syntax for json_array and json_object, it feels like writing JSON documents with embedded SQL.

9a) Query using JSON_ARRAY_query_content clause
column result format a90
select json [
          select json {
                    'ename': ename,
                    'sal': sal,
                    'comm': comm absent on null
                 }
            from emp
           where sal >= 3000
          returning json
       ] as result;
RESULT                                                                                    
------------------------------------------------------------------------------------------
[{"ename":"SCOTT","sal":3000},{"ename":"KING","sal":5000},{"ename":"FORD","sal":3000}]     

9b) Parse tree
file
  dmlStatement
    selectStatement
      select
        subquery:subqueryQueryBlock
          queryBlock
            K_SELECT:select
            selectList
              selectItem
                expression:specialFunctionExpressionParent
                  specialFunctionExpression
                    jsonArray
                      K_JSON:json
                      LSQB:[
                      jsonArrayContent
                        jsonArrayQueryContent
                          subquery:subqueryQueryBlock
                            queryBlock
                              K_SELECT:select
                              selectList
                                selectItem
                                  expression:specialFunctionExpressionParent
                                    specialFunctionExpression
                                      jsonObject
                                        K_JSON:json
                                        LCUB:{
                                        jsonObjectContent
                                          entry
                                            regularEntry
                                              expression:simpleExpressionStringLiteral
                                                STRING:'ename'
                                              COLON::
                                              expression:simpleExpressionName
                                                sqlName
                                                  unquotedId
                                                    ID:ename
                                          COMMA:,
                                          entry
                                            regularEntry
                                              expression:simpleExpressionStringLiteral
                                                STRING:'sal'
                                              COLON::
                                              expression:simpleExpressionName
                                                sqlName
                                                  unquotedId
                                                    ID:sal
                                          COMMA:,
                                          entry
                                            regularEntry
                                              expression:simpleExpressionStringLiteral
                                                STRING:'comm'
                                              COLON::
                                              expression:simpleExpressionName
                                                sqlName
                                                  unquotedId
                                                    ID:comm
                                          jsonOnNullClause
                                            K_ABSENT:absent
                                            K_ON:on
                                            K_NULL:null
                                        RCUB:}
                              fromClause
                                K_FROM:from
                                fromItem:tableReferenceFromItem
                                  tableReference
                                    queryTableExpression
                                      sqlName
                                        unquotedId
                                          ID:emp
                              whereClause
                                K_WHERE:where
                                condition
                                  expression:simpleComparisionCondition
                                    expression:simpleExpressionName
                                      sqlName
                                        unquotedId
                                          ID:sal
                                    simpleComparisionOperator:ge
                                      GT:>
                                      EQUALS:=
                                    expression:simpleExpressionNumberLiteral
                                      NUMBER:3000
                          jsonReturningClause
                            K_RETURNING:returning
                            K_JSON:json
                      RSQB:]
                K_AS:as
                sqlName
                  unquotedId
                    ID:result
      sqlEnd
        SEMI:;
  <EOF>

SQL Boolean Data Type

Where can the new Boolean data type be used in the select statement? In conversion functions, for example.

10a) Query using Boolean data type
column dump_yes_value format a20
select cast('yes' as boolean) as yes_value,
       xmlcast(xmltype('<x>no</x>') as boolean) as no_value,
       validate_conversion('maybe' as boolean) as is_maybe_boolean,
       dump(cast('yes' as boolean)) as dump_yes_value;
YES_VALUE   NO_VALUE    IS_MAYBE_BOOLEAN DUMP_YES_VALUE
----------- ----------- ---------------- --------------------
TRUE        FALSE                      0 Typ=252 Len=1: 1

10b) Parse tree
file
  dmlStatement
    selectStatement
      select
        subquery:subqueryQueryBlock
          queryBlock
            K_SELECT:select
            selectList
              selectItem
                expression:specialFunctionExpressionParent
                  specialFunctionExpression
                    cast
                      K_CAST:cast
                      LPAR:(
                      expression:simpleExpressionStringLiteral
                        STRING:'yes'
                      K_AS:as
                      dataType
                        oracleBuiltInDatatype
                          booleanDatatype
                            K_BOOLEAN:boolean
                      RPAR:)
                K_AS:as
                sqlName
                  unquotedId
                    ID:yes_value
              COMMA:,
              selectItem
                expression:specialFunctionExpressionParent
                  specialFunctionExpression
                    xmlcast
                      K_XMLCAST:xmlcast
                      LPAR:(
                      expression:functionExpressionParent
                        functionExpression
                          sqlName
                            unquotedId
                              keywordAsId
                                K_XMLTYPE:xmltype
                          LPAR:(
                          functionParameter
                            condition
                              expression:simpleExpressionStringLiteral
                                STRING:'<x>no</x>'
                          RPAR:)
                      K_AS:as
                      dataType
                        oracleBuiltInDatatype
                          booleanDatatype
                            K_BOOLEAN:boolean
                      RPAR:)
                K_AS:as
                sqlName
                  unquotedId
                    ID:no_value
              COMMA:,
              selectItem
                expression:specialFunctionExpressionParent
                  specialFunctionExpression
                    validateConversion
                      K_VALIDATE_CONVERSION:validate_conversion
                      LPAR:(
                      expression:simpleExpressionStringLiteral
                        STRING:'maybe'
                      K_AS:as
                      dataType
                        oracleBuiltInDatatype
                          booleanDatatype
                            K_BOOLEAN:boolean
                      RPAR:)
                K_AS:as
                sqlName
                  unquotedId
                    ID:is_maybe_boolean
              COMMA:,
              selectItem
                expression:functionExpressionParent
                  functionExpression
                    sqlName
                      unquotedId
                        ID:dump
                    LPAR:(
                    functionParameter
                      condition
                        expression:specialFunctionExpressionParent
                          specialFunctionExpression
                            cast
                              K_CAST:cast
                              LPAR:(
                              expression:simpleExpressionStringLiteral
                                STRING:'yes'
                              K_AS:as
                              dataType
                                oracleBuiltInDatatype
                                  booleanDatatype
                                    K_BOOLEAN:boolean
                              RPAR:)
                    RPAR:)
                K_AS:as
                sqlName
                  unquotedId
                    ID:dump_yes_value
      sqlEnd
        SEMI:;
  <EOF>

Boolean Expressions

The impact of Boolean expressions is huge. A condition becomes an expression that returns a Boolean value. Consequently, conditions can be used wherever expressions are permitted.

11a) Query using Boolean expressions
with
   function f(p in boolean) return boolean is
   begin
      return p;
   end;
select (select count(*) from emp) = 14 and (select count(*) from dept) = 4 as is_complete,
       f(1>0) is true as is_true,
       cast(null as boolean) is not null as is_not_null;
/
IS_COMPLETE IS_TRUE     IS_NOT_NULL
----------- ----------- -----------
TRUE        TRUE        FALSE

11b) Parse tree
file
  dmlStatement
    selectStatement
      select
        subquery:subqueryQueryBlock
          withClause
            K_WITH:with
            plsqlDeclarations
              functionDeclaration
                K_FUNCTION:function
                plsqlCode
                  ID:f
                  LPAR:(
                  ID:p
                  K_IN:in
                  K_BOOLEAN:boolean
                  RPAR:)
                  K_RETURN:return
                  K_BOOLEAN:boolean
                  K_IS:is
                  ID:begin
                  K_RETURN:return
                  ID:p
                  SEMI:;
                K_END:end
                SEMI:;
          queryBlock
            K_SELECT:select
            selectList
              selectItem
                expression:simpleComparisionCondition
                  expression:simpleComparisionCondition
                    expression:scalarSubqueryExpression
                      LPAR:(
                      subquery:subqueryQueryBlock
                        queryBlock
                          K_SELECT:select
                          selectList
                            selectItem
                              expression:functionExpressionParent
                                functionExpression
                                  sqlName
                                    unquotedId
                                      keywordAsId
                                        K_COUNT:count
                                  LPAR:(
                                  functionParameter
                                    condition
                                      expression:allColumnWildcardExpression
                                        AST:*
                                  RPAR:)
                          fromClause
                            K_FROM:from
                            fromItem:tableReferenceFromItem
                              tableReference
                                queryTableExpression
                                  sqlName
                                    unquotedId
                                      ID:emp
                      RPAR:)
                    simpleComparisionOperator:eq
                      EQUALS:=
                    expression:logicalCondition
                      expression:simpleExpressionNumberLiteral
                        NUMBER:14
                      K_AND:and
                      expression:scalarSubqueryExpression
                        LPAR:(
                        subquery:subqueryQueryBlock
                          queryBlock
                            K_SELECT:select
                            selectList
                              selectItem
                                expression:functionExpressionParent
                                  functionExpression
                                    sqlName
                                      unquotedId
                                        keywordAsId
                                          K_COUNT:count
                                    LPAR:(
                                    functionParameter
                                      condition
                                        expression:allColumnWildcardExpression
                                          AST:*
                                    RPAR:)
                            fromClause
                              K_FROM:from
                              fromItem:tableReferenceFromItem
                                tableReference
                                  queryTableExpression
                                    sqlName
                                      unquotedId
                                        ID:dept
                        RPAR:)
                  simpleComparisionOperator:eq
                    EQUALS:=
                  expression:simpleExpressionNumberLiteral
                    NUMBER:4
                K_AS:as
                sqlName
                  unquotedId
                    ID:is_complete
              COMMA:,
              selectItem
                expression:isTrueCondition
                  expression:functionExpressionParent
                    functionExpression
                      sqlName
                        unquotedId
                          ID:f
                      LPAR:(
                      functionParameter
                        condition
                          expression:simpleComparisionCondition
                            expression:simpleExpressionNumberLiteral
                              NUMBER:1
                            simpleComparisionOperator:gt
                              GT:>
                            expression:simpleExpressionNumberLiteral
                              NUMBER:0
                      RPAR:)
                  K_IS:is
                  K_TRUE:true
                K_AS:as
                sqlName
                  unquotedId
                    ID:is_true
              COMMA:,
              selectItem
                expression:isNullCondition
                  expression:specialFunctionExpressionParent
                    specialFunctionExpression
                      cast
                        K_CAST:cast
                        LPAR:(
                        expression:simpleExpressionName
                          sqlName
                            unquotedId
                              keywordAsId
                                K_NULL:null
                        K_AS:as
                        dataType
                          oracleBuiltInDatatype
                            booleanDatatype
                              K_BOOLEAN:boolean
                        RPAR:)
                  K_IS:is
                  K_NOT:not
                  K_NULL:null
                K_AS:as
                sqlName
                  unquotedId
                    ID:is_not_null
      sqlEnd
        SEMI:;
        SOL:/
  <EOF>

JSON Schema

There is an extended is_JSON_condition that makes it possible to validate a JSON document against a JSON schema.

12a) Query using JSON schema
column j format a20
with
   t (j) as (values
      (json('["a", "b"]')),            -- JSON array
      (json('{"a": "a", "b": "b"}')),  -- JSON object without id property
      (json('{"id": 42}')),            -- JSON object with numeric id property
      (json('{"id": "42"}'))           -- JSON object with string id property
   )
select j,
       j is json validate '
          {
             "type": "object",
             "properties": {
                "id": { "type": "number" }
             }
          }' as is_valid
  from t;
J                    IS_VALID
-------------------- -----------
["a","b"]            FALSE
{"a":"a","b":"b"}    TRUE
{"id":42}            TRUE
{"id":"42"}          FALSE

12b) Parse tree
file
  dmlStatement
    selectStatement
      select
        subquery:subqueryQueryBlock
          withClause
            K_WITH:with
            factoringClause
              subqueryFactoringClause
                sqlName
                  unquotedId
                    ID:t
                LPAR:(
                sqlName
                  unquotedId
                    ID:j
                RPAR:)
                K_AS:as
                valuesClause
                  LPAR:(
                  K_VALUES:values
                  valuesRow
                    LPAR:(
                    expression:functionExpressionParent
                      functionExpression
                        sqlName
                          unquotedId
                            keywordAsId
                              K_JSON:json
                        LPAR:(
                        functionParameter
                          condition
                            expression:simpleExpressionStringLiteral
                              STRING:'["a", "b"]'
                        RPAR:)
                    RPAR:)
                  COMMA:,
                  valuesRow
                    LPAR:(
                    expression:functionExpressionParent
                      functionExpression
                        sqlName
                          unquotedId
                            keywordAsId
                              K_JSON:json
                        LPAR:(
                        functionParameter
                          condition
                            expression:simpleExpressionStringLiteral
                              STRING:'{"a": "a", "b": "b"}'
                        RPAR:)
                    RPAR:)
                  COMMA:,
                  valuesRow
                    LPAR:(
                    expression:functionExpressionParent
                      functionExpression
                        sqlName
                          unquotedId
                            keywordAsId
                              K_JSON:json
                        LPAR:(
                        functionParameter
                          condition
                            expression:simpleExpressionStringLiteral
                              STRING:'{"id": 42}'
                        RPAR:)
                    RPAR:)
                  COMMA:,
                  valuesRow
                    LPAR:(
                    expression:functionExpressionParent
                      functionExpression
                        sqlName
                          unquotedId
                            keywordAsId
                              K_JSON:json
                        LPAR:(
                        functionParameter
                          condition
                            expression:simpleExpressionStringLiteral
                              STRING:'{"id": "42"}'
                        RPAR:)
                    RPAR:)
                  RPAR:)
          queryBlock
            K_SELECT:select
            selectList
              selectItem
                expression:simpleExpressionName
                  sqlName
                    unquotedId
                      ID:j
              COMMA:,
              selectItem
                expression:isJsonCondition
                  expression:simpleExpressionName
                    sqlName
                      unquotedId
                        ID:j
                  K_IS:is
                  K_JSON:json
                  jsonConditionOption:jsonConditionOptionValidate
                    K_VALIDATE:validate
                    expression:simpleExpressionStringLiteral
                      STRING:'\n          {\n             "type": "object",\n             "properties": {\n                "id": { "type": "number" }\n             }\n          }'
                K_AS:as
                sqlName
                  unquotedId
                    ID:is_valid
            fromClause
              K_FROM:from
              fromItem:tableReferenceFromItem
                tableReference
                  queryTableExpression
                    sqlName
                      unquotedId
                        ID:t
      sqlEnd
        SEMI:;
  <EOF>

What Else?

There are more new features in the Oracle Database 23c that you can use in the select statement.

We can also assume that more features will be added with future release updates. The AI vector search, for example, should be available with 23.4 later this year.

Outlook

The plan for IslandSQL is still the same as outlined in the previous episode. So we should cover the remaining DML statements (call, delete, explain plan, insert, merge and update) in the next episode.

The post IslandSQL Episode 5: Select in Oracle Database 23c appeared first on Philipp Salvisberg's Blog.


IslandSQL Episode 6: DML Statements in Oracle Database 23c

Introduction

The IslandSQL grammar now covers all DML statements. This means call, delete, explain plan, insert, lock table, merge, select and update.

In this episode, we will focus on new features in the Oracle Database 23c that can be used in insert, update, delete and merge statements. For the select statement see the last episode.

Table Value Constructor

The new table value constructor allows you to create rows on the fly, which simplifies statements. Furthermore, it allows you to write a single statement instead of a series of statements, which makes the execution in scripts faster. It can be used in the select, insert and merge statement.
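
For the select statement, a minimal sketch could look like this; the alias list s (deptno, dname, loc) names the generated columns:

select *
  from (values
          (10, 'ACCOUNTING', 'NEW YORK'),
          (20, 'RESEARCH',   'DALLAS')
       ) s (deptno, dname, loc)
 where loc = 'DALLAS';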

Insert
1) Insert with table value constructor
drop table if exists d;
create table d (deptno number(2,0), dname varchar2(14), loc varchar2(13));
insert into d (deptno, dname, loc)
values (10, 'ACCOUNTING', 'NEW YORK'),
       (20, 'RESEARCH',   'DALLAS'),
       (30, 'SALES',      'CHICAGO'),
       (40, 'OPERATIONS', 'BOSTON');
Table D dropped.

Table D created.

4 rows inserted.
Merge
2) Merge with table value constructor
merge into d t
using (values 
         (10, 'ACCOUNTING', 'NEW YORK'),
         (20, 'RESEARCH',   'DALLAS'),
         (30, 'SALES',      'CHICAGO'),
         (40, 'OPERATIONS', 'BOSTON')
      ) s (deptno, dname, loc)
   on (t.deptno = s.deptno)
 when matched then
      update
         set t.dname = s.dname,
             t.loc = s.loc
 when not matched then
      insert (t.deptno, t.dname, t.loc)
      values (s.deptno, s.dname, s.loc);
4 rows merged.

Direct Joins for UPDATE and DELETE Statements

The new from_using_clause can be used in delete and update statements.

from_using_clause railroad diagram

With this new clause, you can avoid a self-join and, as a result, the optimizer can produce a more efficient execution plan.

Delete

The next example is based on the HR schema. We delete all countries that are not used by any department. See line 3 for the from_using_clause. The join conditions and the filter criteria are part of the where_clause.

You cannot define the join condition for the table in the from_clause in the from_using_clause. This is a documented limitation. Furthermore, we cannot mix ANSI-92 join syntax with Oracle-style outer join syntax (see ORA-25156). As a result, we have to use the Oracle-style join syntax for all tables.

3a) Delete with from_using_clause
delete 
  from countries c 
  from locations l, departments d
 where l.country_id (+) = c.country_id
   and d.location_id (+) = l.location_id
   and l.location_id is null
   and d.department_id is null; 
11 rows deleted.

--------------------------------------------------
| Id  | Operation              | Name            |
--------------------------------------------------
|   0 | DELETE STATEMENT       |                 |
|   1 |  DELETE                | COUNTRIES       |
|   2 |   FILTER               |                 |
|   3 |    HASH JOIN OUTER     |                 |
|   4 |     FILTER             |                 |
|   5 |      HASH JOIN OUTER   |                 |
|   6 |       INDEX FULL SCAN  | COUNTRY_C_ID_PK |
|   7 |       TABLE ACCESS FULL| LOCATIONS       |
|   8 |     TABLE ACCESS FULL  | DEPARTMENTS     |
--------------------------------------------------

Having two from keywords in the delete statement is funny, but it does not make the statement easier to read. I therefore recommend rewriting the statement like this:

3b) Delete with from_using_clause (simplified & clearer)
delete countries c 
 using locations l, departments d
 where l.country_id (+) = c.country_id
   and d.location_id (+) = l.location_id
   and l.location_id is null
   and d.department_id is null;
11 rows deleted.

--------------------------------------------------
| Id  | Operation              | Name            |
--------------------------------------------------
|   0 | DELETE STATEMENT       |                 |
|   1 |  DELETE                | COUNTRIES       |
|   2 |   FILTER               |                 |
|   3 |    HASH JOIN OUTER     |                 |
|   4 |     FILTER             |                 |
|   5 |      HASH JOIN OUTER   |                 |
|   6 |       INDEX FULL SCAN  | COUNTRY_C_ID_PK |
|   7 |       TABLE ACCESS FULL| LOCATIONS       |
|   8 |     TABLE ACCESS FULL  | DEPARTMENTS     |
--------------------------------------------------

Here’s an alternative, pre-23c-style delete statement without the from_using_clause. It is accessing the countries table twice, which might lead to a less efficient execution plan.

3c) Delete with subquery filter
delete 
  from countries c1
 where c1.country_id in (
          select c2.country_id
            from countries c2
            left join locations l
              on l.country_id = c2.country_id
            left join departments d
              on d.location_id = l.location_id
           where l.location_id is null
             and d.department_id is null
       );
11 rows deleted.

----------------------------------------------------------------------
| Id  | Operation                                 | Name             |
----------------------------------------------------------------------
|   0 | DELETE STATEMENT                          |                  |
|   1 |  DELETE                                   | COUNTRIES        |
|   2 |   INDEX FULL SCAN                         | COUNTRY_C_ID_PK  |
|   3 |    FILTER                                 |                  |
|   4 |     NESTED LOOPS OUTER                    |                  |
|   5 |      FILTER                               |                  |
|   6 |       NESTED LOOPS OUTER                  |                  |
|   7 |        INDEX UNIQUE SCAN                  | COUNTRY_C_ID_PK  |
|   8 |        TABLE ACCESS BY INDEX ROWID BATCHED| LOCATIONS        |
|   9 |         INDEX RANGE SCAN                  | LOC_COUNTRY_IX   |
|  10 |      TABLE ACCESS BY INDEX ROWID BATCHED  | DEPARTMENTS      |
|  11 |       INDEX RANGE SCAN                    | DEPT_LOCATION_IX |
----------------------------------------------------------------------
Update

In this example, we increase the salaries of all employees in Germany and Canada by 20%. See lines 3 to 7 for the from_using_clause where we use ANSI-92 join syntax.

4a) Update with from_using_clause
update employees e
   set e.salary = e.salary * 1.2 
 using departments d
  join locations l
    on l.location_id = d.location_id
  join countries c
    on c.country_id = l.country_id
 where d.department_id = e.department_id
   and c.country_name in ('Germany', 'Canada');
3 rows updated.

----------------------------------------------------------------------
| Id  | Operation                                | Name              |
----------------------------------------------------------------------
|   0 | UPDATE STATEMENT                         |                   |
|   1 |  UPDATE                                  | EMPLOYEES         |
|   2 |   NESTED LOOPS                           |                   |
|   3 |    NESTED LOOPS                          |                   |
|   4 |     NESTED LOOPS                         |                   |
|   5 |      NESTED LOOPS                        |                   |
|   6 |       INDEX FULL SCAN                    | COUNTRY_C_ID_PK   |
|   7 |       TABLE ACCESS BY INDEX ROWID BATCHED| LOCATIONS         |
|   8 |        INDEX RANGE SCAN                  | LOC_COUNTRY_IX    |
|   9 |      TABLE ACCESS BY INDEX ROWID BATCHED | DEPARTMENTS       |
|  10 |       INDEX RANGE SCAN                   | DEPT_LOCATION_IX  |
|  11 |     INDEX RANGE SCAN                     | EMP_DEPARTMENT_IX |
|  12 |    TABLE ACCESS BY INDEX ROWID           | EMPLOYEES         |
----------------------------------------------------------------------

And here’s an alternative, pre-23c-style update statement without the from_using_clause. It is accessing the employees table twice, which might lead to a less efficient execution plan.

4b) Update with subquery filter
update employees e1
   set e1.salary = e1.salary * 1.2
 where e1.employee_id in (
          select e2.employee_id
            from employees e2
            join departments d
              on d.department_id = e2.department_id
            join locations l
              on l.location_id = d.location_id
            join countries c
              on c.country_id = l.country_id
           where c.country_name in ('Germany', 'Canada')
       );
3 rows updated.

-----------------------------------------------------------------------
| Id  | Operation                                 | Name              |
-----------------------------------------------------------------------
|   0 | UPDATE STATEMENT                          |                   |
|   1 |  UPDATE                                   | EMPLOYEES         |
|   2 |   HASH JOIN SEMI                          |                   |
|   3 |    TABLE ACCESS FULL                      | EMPLOYEES         |
|   4 |    VIEW                                   | VW_NSO_1          |
|   5 |     NESTED LOOPS                          |                   |
|   6 |      NESTED LOOPS                         |                   |
|   7 |       NESTED LOOPS                        |                   |
|   8 |        NESTED LOOPS SEMI                  |                   |
|   9 |         VIEW                              | index$_join$_005  |
|  10 |          HASH JOIN                        |                   |
|  11 |           INDEX FAST FULL SCAN            | LOC_COUNTRY_IX    |
|  12 |           INDEX FAST FULL SCAN            | LOC_ID_PK         |
|  13 |         INDEX UNIQUE SCAN                 | COUNTRY_C_ID_PK   |
|  14 |        TABLE ACCESS BY INDEX ROWID BATCHED| DEPARTMENTS       |
|  15 |         INDEX RANGE SCAN                  | DEPT_LOCATION_IX  |
|  16 |       INDEX RANGE SCAN                    | EMP_DEPARTMENT_IX |
|  17 |      TABLE ACCESS BY INDEX ROWID          | EMPLOYEES         |
-----------------------------------------------------------------------

However, we can update an inline view. The Oracle database has supported this for a very long time (without a BYPASS_UJVC hint). There are some limitations, but otherwise, it works quite well. Here’s an example:

4c) Update inline-view
update (
          select e.*
            from employees e
            join departments d
              on d.department_id = e.department_id
            join locations l
              on l.location_id = d.location_id
            join countries c
              on c.country_id = l.country_id
           where c.country_name in ('Germany', 'Canada')
       )
   set salary = salary * 1.2;
3 rows updated.

---------------------------------------------------------------------
| Id  | Operation                               | Name              |
---------------------------------------------------------------------
|   0 | UPDATE STATEMENT                        |                   |
|   1 |  UPDATE                                 | EMPLOYEES         |
|   2 |   NESTED LOOPS                          |                   |
|   3 |    NESTED LOOPS                         |                   |
|   4 |     NESTED LOOPS                        |                   |
|   5 |      INDEX FULL SCAN                    | COUNTRY_C_ID_PK   |
|   6 |      TABLE ACCESS BY INDEX ROWID BATCHED| LOCATIONS         |
|   7 |       INDEX RANGE SCAN                  | LOC_COUNTRY_IX    |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED | DEPARTMENTS       |
|   9 |      INDEX RANGE SCAN                   | DEPT_LOCATION_IX  |
|  10 |    INDEX RANGE SCAN                     | EMP_DEPARTMENT_IX |
---------------------------------------------------------------------

The execution plan is similar to the variant with the from_using_clause. So, from a performance point of view, this is a good option. However, I like the from_using_clause variant better because it’s clearer which table is updated and which tables are just used for query purposes.

SQL UPDATE RETURN Clause Enhancements

The returning_clause has been extended.

returning_clause railroad diagram

It’s now possible to explicitly return old and new values. The default depends on the operation: new for insert and update, and old for delete statements. I do not see a lot of value for delete and insert statements besides maybe making the statements more explicit and therefore easier to read. However, for the update statement, this new feature can be useful.

Here’s a small SQL script showing the new returning clause in action for insert, update and delete.

5) New returning_clause in insert, update and delete
set serveroutput on
drop table if exists t;
create table t (id integer, value integer);

declare
   l_old_value t.value%type;
   l_new_value t.value%type;
begin
   dbms_random.seed(16);

   insert into t (id, value) 
   values (1, dbms_random.value(low => 1, high => 100))
   return new value into l_new_value;
   dbms_output.put_line('Insert: new value ' || l_new_value);

   update t
      set value = value * 2
    where id = 1
   return old value, new value into l_old_value, l_new_value;
   dbms_output.put_line('Update: old value ' || l_old_value || ', new value ' || l_new_value);

   delete t
    where id = 1
   return old value into l_old_value;
   dbms_output.put_line('Delete: old value ' || l_old_value); 
end;
/
Table T dropped.

Table T created.

Insert: new value 21
Update: old value 21, new value 42
Delete: old value 42

PL/SQL procedure successfully completed.

DEFAULT ON NULL for UPDATE Statements

The column_definition clause in the create table statement has been extended.

column_definition clause railroad diagram

Finally, it’s possible to enforce the default on null expression also for update and merge statements.

The next SQL script demonstrates this.

6) default on null for insert and update
drop table if exists t;
create table t (
   id    integer not null primary key,
   value varchar2(10 char) default on null for insert and update 'my default'
);

insert into t(id, value)
values (1, 'value1'),
       (2, null);
select * from t order by id;

update t set value = case id
                        when 1 then
                           null
                        when 2 then
                           'value2'
                     end;
select * from t order by id;

merge into t
using (values 
         (1, 'value3'),
         (2, null),
         (3, null)
      ) s (id, value)
   on (t.id = s.id)
 when matched then
      update
         set t.value = s.value
 when not matched then
      insert (t.id, t.value)
      values (s.id, s.value);
select * from t order by id;
Table T dropped.

Table T created.

2 rows inserted.

        ID VALUE     
---------- ----------
         1 value1    
         2 my default

2 rows updated.

        ID VALUE     
---------- ----------
         1 my default
         2 value2    

3 rows merged.

        ID VALUE     
---------- ----------
         1 value3    
         2 my default
         3 my default

Lock-Free Reservation

The new datatype_domain clause comes with a reservable keyword.

datatype_domain clause railroad diagram

You can update reservable columns without locking a row. As a result, updating such a column is possible from multiple sessions in a transactional way. However, only numeric columns can be declared as reservable.

Let’s look at an example.

7.1) Create and populate table with reservable column
drop table if exists e;
create table e (
   empno number(4,0)  not null primary key, 
   ename varchar2(10) not null,
   sal   number(7,2)  reservable not null
);
insert into e(empno, ename, sal)
values (7788, 'SCOTT', 3000),
       (7739, 'KING',  5000);
commit;
Table E dropped.

Table E created.

2 rows inserted.

Commit complete.

After the setup, we run two database sessions in parallel.

7.2) Session A – update sal of empno 7788
update e
   set sal = sal + 100
 where empno = 7788;
 
select * from e;
1 row updated.

     EMPNO ENAME             SAL
---------- ---------- ----------
      7788 SCOTT            3000
      7739 KING             5000
7.3) Session B – update sal of empno 7788
update e
   set sal = sal + 500
 where empno = 7788;
 
select * from e;
1 row updated.

     EMPNO ENAME             SAL
---------- ---------- ----------
      7788 SCOTT            3000
      7739 KING             5000

We’ve updated the same record in two sessions. The transactions are pending and the changes are not yet visible in the target table. Let’s complete the pending transactions.

7.4) Session A – commit changes
commit;

select * from e; 
Commit complete.

     EMPNO ENAME             SAL
---------- ---------- ----------
      7788 SCOTT            3100
      7739 KING             5000
7.5) Session B – commit changes
commit;

select * from e; 
Commit complete.

     EMPNO ENAME             SAL
---------- ---------- ----------
      7788 SCOTT            3600
      7739 KING             5000

After committing, the changes are visible in the target table. The changes from both sessions have been applied. Concurrent updates of the same row without locking. Pure magic.

How is that possible? Quite simple. Behind the scenes, the Oracle Database creates a reservation journal table named SYS_RESERVJRNL_<object_id_of_table> for every table with a reservable column. This table stores the pending changes per session and applies them on commit. You can query this table to better understand the process.
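
Here’s a minimal sketch of how to inspect this journal table; the suffix 12345 is a placeholder for the object_id of the table e in your schema:

-- find the reservation journal table(s) in the current schema
select table_name
  from user_tables
 where table_name like 'SYS_RESERVJRNL%';

-- query the pending, session-specific changes (replace 12345 with the actual object_id suffix)
select * from sys_reservjrnl_12345;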

See the Database Development Guide for more information about lock-free reservations.

More New Features

More features are applicable in DML statements. For example, using sys_row_etag for optimistic locking or when working with JSON-relational duality views. The JSON-Relational Duality Developer’s Guide explains this new feature in detail.

For a complete list see Oracle Database New Features.

Outlook

For the next episode, the IslandSQL grammar will be extended to cover the PostgreSQL 16 grammar for the current statements in scope. This means all DML statements. I’m sure I will be able to show some interesting differences between the Oracle Database and PostgreSQL. Stay tuned.

The post IslandSQL Episode 6: DML Statements in Oracle Database 23c appeared first on Philipp Salvisberg's Blog.

IslandSQL Episode 7: DML Statements in PostgreSQL 16 and What I Miss in Oracle Database 23c

Introduction

In the last episode, we covered DML statements in SQL*Plus/SQLcl scripts for the Oracle Database 23c. The IslandSQL grammar can now also handle PostgreSQL 16 DML statements in psql scripts.

In this blog post, we will look at some features in PostgreSQL 16 which I miss in the Oracle Database 23c.

Since we now have the Table Value Constructor, Booleans and IF [NOT] EXISTS syntax support, I truly hope that some features, if not all, will make it into a future release of the Oracle Database.

Returning Clause

Let’s create a table t1 with some test data. The script works in PostgreSQL 16 and Oracle Database 23c.

1) Setup (PostgreSQL & OracleDB)
create table t1 as
select * 
  from (values
          (7839, 'KING',   'PRESIDENT', null, date '1981-11-17', 5000, null, 10),
          (7566, 'JONES',  'MANAGER',   7839, date '1981-04-02', 2975, null, 20),
          (7698, 'BLAKE',  'MANAGER',   7839, date '1981-05-01', 2850, null, 30),
          (7782, 'CLARK',  'MANAGER',   7839, date '1981-06-09', 2450, null, 10),
          (7788, 'SCOTT',  'ANALYST',   7566, date '1987-04-19', 3000, null, 20),
          (7902, 'FORD',   'ANALYST',   7566, date '1981-12-03', 3000, null, 20),
          (7499, 'ALLEN',  'SALESMAN',  7698, date '1981-02-20', 1600,  300, 30),
          (7521, 'WARD',   'SALESMAN',  7698, date '1981-02-22', 1250,  500, 30),
          (7654, 'MARTIN', 'SALESMAN',  7698, date '1981-09-28', 1250, 1400, 30),
          (7844, 'TURNER', 'SALESMAN',  7698, date '1981-09-08', 1500,    0, 30),
          (7900, 'JAMES',  'CLERK',     7698, date '1981-12-03',  950, null, 30),
          (7934, 'MILLER', 'CLERK',     7782, date '1982-01-23', 1300, null, 10),
          (7369, 'SMITH',  'CLERK',     7902, date '1980-12-17',  800, null, 20),
          (7876, 'ADAMS',  'CLERK',     7788, date '1987-05-23', 1100, null, 20)                        
       ) s (empno, ename, job, mgr, hiredate, sal, comm, deptno);

Now we want to increase the salary by 20 per cent for all employees that earn less than 2000 and we want to get the changed rows with the new values as a result.

In PostgreSQL we can do the following:

2a) PostgreSQL: update with returning clause
begin;
update t1 
   set sal = sal * 1.2
 where sal + coalesce(comm, 0) < 2000
returning empno, ename, cast(sal / 1.2 as int) as old_sal, sal as new_sal;
rollback;
BEGIN
 empno | ename  | old_sal | new_sal 
-------+--------+---------+---------
  7499 | ALLEN  |    1600 |    1920
  7521 | WARD   |    1250 |    1500
  7844 | TURNER |    1500 |    1800
  7900 | JAMES  |     950 |    1140
  7934 | MILLER |    1300 |    1560
  7369 | SMITH  |     800 |     960
  7876 | ADAMS  |    1100 |    1320
(7 rows)

UPDATE 7
ROLLBACK

The returning clause allows us to define a list of expressions to be returned for each changed row. So, an update, insert and delete statement in PostgreSQL can return a result similar to a select statement.

Doing the same in the Oracle Database 23c requires some PL/SQL code. For example something like this:

2b) OracleDB: update with returning clause
set serveroutput on size unlimited
declare
   cursor c1 is select empno, ename, sal as old_sal, sal as new_sal from t1;
   type t_row_type is table of c1%rowtype;
   t_row t_row_type;
begin
   update t1
      set sal = sal * 1.2
    where sal + coalesce(comm, 0) < 2000
   return empno, ename, old sal, new sal
     bulk collect into t_row;
   dbms_output.put_line(sql%rowcount || ' rows updated.');
   dbms_output.put_line(null);
   dbms_output.put_line('EMPNO ENAME  OLD_SAL NEW_SAL');
   dbms_output.put_line('----- ------ ------- -------');
   for i in t_row.first..t_row.last
   loop
      dbms_output.put_line(rpad(t_row(i).empno, 6)
         || rpad(t_row(i).ename, 7)
         || lpad(t_row(i).old_sal, 7)
         || ' '
         || lpad(t_row(i).new_sal, 7));
   end loop;
   rollback;
end;
/
7 rows updated.

EMPNO ENAME  OLD_SAL NEW_SAL
----- ------ ------- -------
7499  ALLEN     1600    1920
7521  WARD      1250    1500
7844  TURNER    1500    1800
7900  JAMES      950    1140
7934  MILLER    1300    1560
7369  SMITH      800     960
7876  ADAMS     1100    1320


PL/SQL procedure successfully completed.

There is one thing I like about this solution. On line 10 we refer to the old value of the column sal. We do not need to reverse the logic used in the update to get the old value.

Of course, there are other solutions. Like the next one:

2c) OracleDB: update and select
set verify off
column start_scn new_value scn noprint
lock table t1 in share row exclusive mode;
select dbms_flashback.get_system_change_number as start_scn;
update t1
   set sal = sal * 1.2
 where sal + coalesce(comm, 0) < 2000;
select empno, ename, cast(sal / 1.2 as int) as old_sal, sal as new_sal
  from t1
 where (empno, sal) not in (select empno, sal from t1 as of scn &&scn)
 order by rowid;
rollback;
Lock succeeded.


7 rows updated.


     EMPNO ENAME     OLD_SAL    NEW_SAL
---------- ------ ---------- ----------
      7499 ALLEN        1600       1920
      7521 WARD         1250       1500
      7844 TURNER       1500       1800
      7900 JAMES         950       1140
      7934 MILLER       1300       1560
      7369 SMITH         800        960
      7876 ADAMS        1100       1320

7 rows selected. 


Rollback complete.

This solution does not use the returning clause. Instead, after the update statement, a rather costly query produces the result. However, it’s important to note that the lock table statement on line 3 is necessary to ensure that another session cannot change the table t1 after querying the current SCN and before the start of the update statement. In other words, this guarantees that we get only the rows changed by the update statement.

The cool thing about this solution is that it allows us to sort the result (this was necessary to override the default order produced by the optimizer). Can we sort the result set of the returning clause in PostgreSQL?

Insert, Update and Delete in the With Clause

Yes, we can. In PostgreSQL we sort the result of an update statement like this:

2d) PostgreSQL: sort result of an update
begin;
with
   upd as (
      update t1 
         set sal = sal * 1.2
       where sal + coalesce(comm, 0) < 2000
      returning empno, ename, cast(sal / 1.2 as int) as old_sal, sal as new_sal
   )
select *
  from upd
 order by new_sal desc;
rollback;
BEGIN
 empno | ename  | old_sal | new_sal 
-------+--------+---------+---------
  7499 | ALLEN  |    1600 |    1920
  7844 | TURNER |    1500 |    1800
  7934 | MILLER |    1300 |    1560
  7521 | WARD   |    1250 |    1500
  7876 | ADAMS  |    1100 |    1320
  7900 | JAMES  |     950 |    1140
  7369 | SMITH  |     800 |     960
(7 rows)

ROLLBACK

We could use this feature to implement an archiving process. Let’s say we want to move the employees of the departments 10 and 20 to a new table t2. In PostgreSQL we can do that as follows:

3a) PostgreSQL: move rows
begin;
drop table if exists t2;
create table t2 as select * from t1 where false;
with
   del as (
      delete from t1
       where deptno in (10, 20)
      returning *
   )
insert into t2 (empno, ename, job, mgr, hiredate, sal, comm, deptno)
select empno, ename, job, mgr, hiredate, sal, comm, deptno
  from del;
rollback;
BEGIN
DROP TABLE
SELECT 0
INSERT 0 8

The move is implemented as a single insert statement.

In the Oracle Database 23c you can use the Partitioning option to implement archiving logic efficiently. However, without this option, you could do something like this:

3b) OracleDB: move rows
drop table if exists t2;
create table t2 as select * from t1 where false;
lock table t1, t2 in share row exclusive mode;
insert into t2 (empno, ename, job, mgr, hiredate, sal, comm, deptno)
select empno, ename, job, mgr, hiredate, sal, comm, deptno
  from t1
 where deptno in (10, 20);
delete from t1
 where deptno in (10, 20);
rollback;
Table T2 dropped.


Table T2 created.


Lock succeeded.


8 rows inserted.


8 rows deleted.


Rollback complete.

Please note that it is not possible in the Oracle Database 23c to use a returning clause in an insert statement together with a subquery.

single_table_insert in Oracle Database 23c

Supporting the with clause in select, insert, update and delete statements and allowing select, insert, update and delete in named queries is a great PostgreSQL feature thanks to the powerful returning clause.

Deleting Rows With Merge

In this blog post, I explained that we must first update a row before we can delete it when we use a merge statement. This is still true for Oracle Database 23c. Let’s see if this limitation also exists in PostgreSQL.

Here’s the setup script that works in PostgreSQL 16 and Oracle Database 23c. A table t (target) with three rows and a table s (source) with four rows.

4) Setup (PostgreSQL & OracleDB)
drop table if exists t;
create table t (
   id integer     not null primary key,
   c1 varchar(20) not null
);
insert into t 
values (1, 'original 1'), 
       (2, 'original 2'), 
       (3, 'original 3');
drop table if exists s;
create table s (
   id integer     not null,
   op varchar(1)  not null check (op in ('I', 'U', 'D')),
   c1 varchar(20) not null
);
insert into s 
values (1, 'U', 'original 1'), 
       (2, 'U', 'changed 2'), 
       (3, 'D', 'deleted 3'),
       (4, 'I', 'new 4');

Now let’s run a merge statement in PostgreSQL 16 and Oracle Database 23c. The syntax is different. However, the example should be self-explanatory.

5a) PostgreSQL: merge with delete
merge into t
using s
   on t.id = s.id
 when matched and op = 'U' 
              and t.c1 != s.c1 then
      update
         set c1 = s.c1
 when matched and op = 'D' then
      delete
 when not matched then
      insert (id, c1)
      values (id, c1);
select * from t;
MERGE 3
 id |     c1     
----+------------
  1 | original 1
  2 | changed 2
  4 | new 4
(3 rows)
5b) OracleDB: merge with delete
merge into t
using s
   on (t.id = s.id)
 when matched then
      update
         set c1 = s.c1
       where op = 'U'
         and t.c1 != s.c1
      delete
       where op = 'D'
 when not matched then
      insert (id, c1)
      values (s.id, s.c1);
select * from t;
2 rows merged.


        ID C1                  
---------- --------------------
         1 original 1          
         2 changed 2           
         3 original 3          
         4 new 4         

The predicates for insert, update and delete are the same in the statement applied in PostgreSQL and Oracle Database. However, the row with the ID 3 was not deleted in the Oracle Database because it was not updated by the merge statement.
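
As a workaround in the Oracle Database, you can update every matched row unconditionally so that the delete clause also applies to the row with the ID 3. Here’s a sketch, assuming it is acceptable to touch matched rows whose values do not change:

merge into t
using s
   on (t.id = s.id)
 when matched then
      update
         set c1 = s.c1  -- no where clause: every matched row is updated first ...
      delete
       where op = 'D'   -- ... so the delete clause can remove the row with op = 'D'
 when not matched then
      insert (id, c1)
      values (s.id, s.c1);

With this variant the Oracle Database produces the same three rows as PostgreSQL, at the cost of updating rows whose values are unchanged.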

I like that this limitation does not exist in PostgreSQL.

Select Without Select List

Does that make sense?

6) PostgreSQL: select without select list
select;
--
(1 row)

Hardly. However, what about the next example?

7) PostgreSQL: exists subquery without select list
select *
  from t1 a
 where exists ( -- bosses only
          select 
            from t1 b
           where b.mgr = a.empno
       );
 empno | ename |    job    | mgr  |  hiredate  | sal  | comm | deptno 
-------+-------+-----------+------+------------+------+------+--------
  7839 | KING  | PRESIDENT |      | 1981-11-17 | 5000 |      |     10
  7566 | JONES | MANAGER   | 7839 | 1981-04-02 | 2975 |      |     20
  7698 | BLAKE | MANAGER   | 7839 | 1981-05-01 | 2850 |      |     30
  7782 | CLARK | MANAGER   | 7839 | 1981-06-09 | 2450 |      |     10
  7788 | SCOTT | ANALYST   | 7566 | 1987-04-19 | 3000 |      |     20
  7902 | FORD  | ANALYST   | 7566 | 1981-12-03 | 3000 |      |     20
(6 rows)

Now it makes sense.

Look at line 4. In the Oracle Database, we would have to define an arbitrary expression just to satisfy the syntax. It is comparable to from dual in versions before 23c or to a superfluous order_by_clause: we should not be forced to provide an unnecessary clause. IMO it’s something that should be changed in the SQL standard as well.
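
For comparison, here’s a sketch of the equivalent query in the Oracle Database with an arbitrary expression, the literal 1, in the subquery’s select list:

select *
  from t1 a
 where exists ( -- bosses only
          select 1
            from t1 b
           where b.mgr = a.empno
       );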

More?

Yes, there is more.

For example, I like that a transaction in PostgreSQL 16 also covers DDL statements. Replicating this in Oracle Database 23c might be possible for simple cases with the help of flashback table or other flashback features, but it is certainly somewhat more laborious.
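
A minimal sketch of transactional DDL in PostgreSQL; the table name t3 is hypothetical:

begin;
create table t3 (id integer primary key);
insert into t3 (id) values (1);
rollback;
-- after the rollback neither the table t3 nor the inserted row exists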

And there are a lot of small differences between PostgreSQL 16 and Oracle Database 23c. They are different SQL dialects after all. They express the same thing differently. For example:

  • limit clause in PostgreSQL vs. SQL:2023-compliant fetch_first_clause in PostgreSQL and OracleDB
  • table t1; in PostgreSQL vs. select * from t1; in PostgreSQL and OracleDB
  • select empno, sal from t1 where empno=7788 for update of t1; in PostgreSQL vs. select empno, sal from t1 where empno=7788 for update of sal; in OracleDB
  • select $id$'$$text$$'$id$; in PostgreSQL vs. select q'['$$text$$']'; in OracleDB
  • select @ -42; in PostgreSQL vs. select abs(-42); in PostgreSQL and OracleDB
  • select distinct on (job) job, ename from t1; in PostgreSQL vs. select job, any_value(ename) as ename from t1 group by job; in PostgreSQL and OracleDB
  • select distinct on (job) job, ename from t1 order by job, sal desc; in PostgreSQL vs. select distinct job, first_value(ename) over (partition by job order by sal desc) as ename from t1 order by job; in PostgreSQL and OracleDB
  • select '42'::int as val; in PostgreSQL vs. select cast('42' as int) as val; in PostgreSQL and OracleDB

It is important to note that even if the syntax in PostgreSQL 16 and Oracle Database 23c looks the same, there might be semantic differences. For example, select cast('42.42' as int); produces 42 in OracleDB but an error in PostgreSQL.

Outlook

In the next episode, the IslandSQL grammar will be extended to cover the complete PL/SQL grammar. The plan is to focus on anonymous PL/SQL blocks along with functions and procedures in plsql_declarations of the with clause. Further SQL statements will be added afterwards.

The post IslandSQL Episode 7: DML Statements in PostgreSQL 16 and What I Miss in Oracle Database 23c appeared first on Philipp Salvisberg's Blog.

IslandSQL Episode 8: What’s New in Oracle Database 23.4?

Introduction

In the last episode, we looked at some features in PostgreSQL which I miss in the Oracle Database. The IslandSQL grammar now covers PL/SQL and the related DDL statements. The implementation was more complex than expected, mainly because of the incompatibilities between PostgreSQL and the Oracle Database. I will probably deal with this topic in a future blog post.

Now I’d like to talk about the new features in Oracle Database 23ai. Not about all features in the New Features Guide, but only about some changes since the release of 23.3, which was known as 23c. It’s hard to find changes from 23.3 to 23.4 in the documentation. So, I guess it’s worth a blog post. I focus on the features that are relevant to the IslandSQL grammar. In other words, the interesting ones from a developer’s perspective.

1. Vector Data Type

A vector is a number array for which you can optionally define the number of dimensions (size) and the data type (int8, float32 or float64) of the dimension values. This data type is the basis for the vector search functionality.

Here’s a slightly amended example from the documentation creating and populating a table with a vector data type.

1) Table with vector column
drop table if exists galaxies purge;
create table galaxies (
   id        number             not null primary key,
   name      varchar2(10 char)  not null unique,
   embedding vector(5, int8)    not null, -- 5 dimensions, stored as int8
   doc       varchar2(120 char) not null
);

insert into galaxies 
   (id, name, embedding, doc)
values 
   (1, 'M31',     '[0,2,2,0,0]', 'Messier 31 is a barred spiral galaxy in the Andromeda constellation which has a lot of barred spiral galaxies.'),
   (2, 'M33',     '[0,0,1,0,0]', 'Messier 33 is a spiral galaxy in the Triangulum constellation.'),
   (3, 'M58',     '[1,1,1,0,0]', 'Messier 58 is an intermediate barred spiral galaxy in the Virgo constellation.'),
   (4, 'M63',     '[0,0,1,0,0]', 'Messier 63 is a spiral galaxy in the Canes Venatici constellation.'),
   (5, 'M77',     '[0,1,1,0,0]', 'Messier 77 is a barred spiral galaxy in the Cetus constellation.'),
   (6, 'M91',     '[0,1,1,0,0]', 'Messier 91 is a barred spiral galaxy in the Coma Berenices constellation.'),
   (7, 'M49',     '[0,0,0,1,1]', 'Messier 49 is a giant elliptical galaxy in the Virgo constellation.'),
   (8, 'M60',     '[0,0,0,0,1]', 'Messier 60 is an elliptical galaxy in the Virgo constellation.'),
   (9, 'NGC1073', '[0,1,1,0,0]', 'NGC 1073 is a barred spiral galaxy in Cetus constellation.');
commit;

The vector in this example has 5 dimensions with the following meaning:

  1. Number of occurrences of intermediate in the doc column
  2. Number of occurrences of barred in the doc column
  3. Number of occurrences of spiral in the doc column
  4. Number of occurrences of giant in the doc column
  5. Number of occurrences of elliptical in the doc column

2. Vector Functions and PL/SQL Packages

The vector_distance function is a key functionality for similarity searches. Let’s say we want to see galaxies that are similar to NGC1073 (barred and spiral). The following query shows how vector functions can help to get this result:

2) Different vector_distance metrics in action
with ngc1073 as (select vector('[0,1,1,0,0]', 5, int8) as query_vector)
select name, 
       round(vector_distance(embedding, query_vector, cosine), 3) as cosine_distance,
       round(vector_distance(embedding, query_vector, dot), 3) as inner_product,
       round(vector_distance(embedding, query_vector, euclidean), 3) as l2_distance,
       round(vector_distance(embedding, query_vector, euclidean_squared), 3) as l2_squared,
       round(vector_distance(embedding, query_vector, hamming), 3) as hamming_distance,
       round(vector_distance(embedding, query_vector, manhattan), 3) as l1_distance
  from galaxies, ngc1073
 order by cosine_distance;
NAME       COSINE_DISTANCE INNER_PRODUCT L2_DISTANCE L2_SQUARED HAMMING_DISTANCE L1_DISTANCE
---------- --------------- ------------- ----------- ---------- ---------------- -----------
M31                      0            -4       1.414          2                2           2
M77                      0            -2           0          0                0           0
M91                      0            -2           0          0                0           0
NGC1073                  0            -2           0          0                0           0
M58                   .184            -2           1          1                1           1
M63                   .293            -1           1          1                1           1
M33                   .293            -1           1          1                1           1
M60                      1             0       1.732          3                3           3
M49                      1             0           2          4                4           4

9 rows selected.

M77 and M91 have vectors identical to NGC1073 and are therefore expected to be very similar. However, M31 is interesting. It has a similar shape to NGC1073, but the words barred and spiral appear twice. This is a good match only for some distance metrics.

A conventional query, e.g. one based on Oracle Text, might be good enough to find similar galaxies. However, if you have vectors with tons of dimensions, a vector similarity search becomes appealing, especially since you can index vector columns to speed up your queries.
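
Here’s a sketch of such an index on the galaxies table, based on my reading of the create vector index syntax in the 23.4 documentation (an in-memory neighbor graph index additionally requires a configured vector memory pool):

create vector index galaxies_hnsw_idx on galaxies (embedding)
organization inmemory neighbor graph
with distance cosine
with target accuracy 95;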

The Oracle Database provides a set of SQL functions for the vector data type.

Furthermore, supplied PL/SQL packages provide additional functionality related to the vector data type.

3. Shorthand Operators for Distances

For some vector_distance metrics shorthand operators (<=>, <->, <#>) are available. Here’s an example:

3) Shorthand operators for vector_distance metrics
with ngc1073 as (select vector('[0,1,1,0,0]', 5, int8) as query_vector)
select name, 
       round(embedding <=> query_vector, 3) as cosine_distance,
       round(embedding <-> query_vector, 3) as l2_distance,
       round(embedding <#> query_vector, 3) as inner_product
  from galaxies, ngc1073
 order by cosine_distance;
NAME       COSINE_DISTANCE L2_DISTANCE INNER_PRODUCT
---------- --------------- ----------- -------------
M31                      0       1.414            -4
M77                      0           0            -2
M91                      0           0            -2
NGC1073                  0           0            -2
M58                   .184           1            -2
M63                   .293           1            -1
M33                   .293           1            -1
M60                      1       1.732             0
M49                      1           2             0

9 rows selected. 

4. Approximate Similarity Searches

You can imagine producing a top-N result of the previous queries using the row_limiting_clause. This clause was introduced in 12.1 to limit search results (exact results, to be precise). In 23.4 the clause was extended to support approximate similarity searches. The idea is to get a good enough result with better performance when using vector indexes. Here’s an example:

4) Approximate Similarity Search
select name, embedding
  from galaxies
 order by embedding <=> vector('[0,1,1,0,0]', 5, int8)
 fetch approx first 3 rows only
  with target accuracy 80 percent;
NAME       EMBEDDING           
---------- --------------------
M31        [0,2,2,0,0]         
M91        [0,1,1,0,0]         
M77        [0,1,1,0,0]         

5. Source and Destination Predicates in Graph Operator

The graph operator is a new 23ai feature. In 23.4 it got two additional predicates: source_predicate and destination_predicate. It allows us to test if a vertex is a source or a destination of an edge. The direction of the arrow so to speak. Here’s a formatted example from the documentation, based on this model.

5) Source and destination predicates
select *
  from graph_table (students_graph
          match (p1 is person) -[e is friends]- (p2 is person)
          where p1.name = 'Mary'
          columns (
             e.friendship_id,
             e.meeting_date,
             case 
                when p1 is source of e then 
                   p1.name 
                else
                   p2.name 
             end as from_person,
             case 
                when p1 is destination of e then
                   p1.name
               else
                   p2.name 
             end as to_person
          )
       )
 order by friendship_id;
FRIENDSHIP_ID MEETING_DATE        FROM_PERSON TO_PERSON 
------------- ------------------- ----------- ----------
            1 19.09.2000 00:00:00 Mary        Alice     
            5 19.09.2000 00:00:00 Mary        John      
            7 10.07.2001 00:00:00 Bob         Mary 

6. Breaking Change for Inlined MLE Call Specification

In 23.3 the following code works.

6) MLE Call Specification in 23.3
create or replace function get42 return number is 
   mle language javascript q'[return 42;]';
/
select get42();
Function GET42 compiled


   GET42()
----------
        42

In 23.4 the same code produces this error:

Function GET42 compiled

LINE/COL  ERROR
--------- -------------------------------------------------------------
2/28      PLS-00881: missing closing delimiter 'q'[return' for MLE language code
Errors: check compiler log

The reason is a change in the syntax of the inlined MLE call specification. The JavaScript code cannot be passed as a string anymore. Instead, a new type of delimiter must be used as documented here. You can use almost any sequence of characters as a delimiter for the JavaScript code. The same character sequence must be used for the start and the end delimiter. Exceptions are the pairs (), [], {} and <>.

PostgreSQL dollar-quoted string constants are also valid delimiters. This works in 23.4:

7) MLE Call Specification in 23.4
create or replace function get42 return number is 
   mle language javascript $code$ return 42;$code$;
/

select get42();
Function GET42 compiled


   GET42()
----------
        42

Please note that the space after the first $code$ is required for the Oracle Database to recognize the end of the delimiter.

This change might simplify the implementation of additional MLE languages. Nevertheless, this is a breaking change that requires an amendment of the existing code base.

Outlook

In the next episode, the IslandSQL grammar will be extended to cover the missing statements with a query block. This means the

  • create view,
  • create materialized view and
  • create table statement

for PostgreSQL and the Oracle Database. Only two episodes are left until the end of the first IslandSQL season.

The post IslandSQL Episode 8: What’s New in Oracle Database 23.4? appeared first on Philipp Salvisberg's Blog.

IslandSQL Episode 9: GraphQL, JSON and Flexible Schemas With Duality Views

Introduction

In the last episode, we looked at some new features in Oracle Database 23.4. The IslandSQL grammar now covers all statements that can contain static DML statements and code in PL/SQL and PL/pgSQL.

While implementing the ANTLR grammar for the create JSON relational duality view statement I stumbled over GraphQL in this syntax diagram:

Image may be NSFW.
Clik here to view.
create_json_relational_duality_view railroad diagram

You can use GraphQL as an alternative to a subquery to describe the source of your JSON relational duality view. This feature was already part of 23.3. Usually, I don’t like it when there are several ways to do the same thing. However, in this case, GraphQL helped me to understand the variant of the select statement in the duality view better. GraphQL might even be better suited to describe the content of a duality view.

The funny thing is that JSON is about schema flexibility and GraphQL needs a schema to work. That sounds contradictory. However, if we reduce the scope of schema flexibility to a JSON object within existing entities, this can work quite well.

In this blog post, I explore some features related to the schema flexibility of JSON relational duality views.

  1. Setup
  2. Read-only View à la 19c
  3. Read-only Duality View
  4. Updateable Duality View Using SELECT
  5. Updateable Duality View Using GraphQL
  6. GraphQL vs. SELECT
  7. Insert Into Duality View
  8. Update Duality View
  9. Delete From Duality View

1. Setup

The examples in this blog post require an Oracle Database 23.4 (yes, 23.3 is not enough). In a schema of your choice you can run the following setup script:

1) Setup for extended dept and emp table
set linesize 200
set pagesize 1000  
set long 32767
column ext format a72
column data format a130
alter session set nls_date_format = 'YYYY-MM-DD';

drop table if exists emp;
drop table if exists dept;

create table dept (
   deptno number(2, 0)      not null constraint dept_pk primary key,
   dname  varchar2(14 char) not null,
   loc    varchar2(13 char) not null,
   ext    json(object)
);

create table emp (
   empno    number(4, 0)      not null  constraint emp_pk primary key,
   ename    varchar2(10 char) not null,
   job      varchar2(9 char)  not null,
   mgr      number(4, 0)                constraint emp_mgr_fk references emp,
   hiredate date              not null,
   sal      number(7, 2)      not null,
   comm     number(7, 2),
   deptno   number(2, 0)      not null  constraint emp_deptno_fk references dept,
   ext      json(object)
);

insert into dept (deptno, dname, loc)
values (10, 'ACCOUNTING', 'NEW YORK'),
       (20, 'RESEARCH',   'DALLAS'),
       (30, 'SALES',      'CHICAGO'),
       (40, 'OPERATIONS', 'BOSTON');
commit;
       
insert into emp (empno, ename, job, mgr, hiredate, sal, comm, deptno)
values (7566, 'JONES',  'MANAGER',   7839, date '1981-04-02', 2975, null, 20),
       (7698, 'BLAKE',  'MANAGER',   7839, date '1981-05-01', 2850, null, 30),
       (7782, 'CLARK',  'MANAGER',   7839, date '1981-06-09', 2450, null, 10),
       (7788, 'SCOTT',  'ANALYST',   7566, date '1987-04-19', 3000, null, 20),
       (7902, 'FORD',   'ANALYST',   7566, date '1981-12-03', 3000, null, 20),
       (7499, 'ALLEN',  'SALESMAN',  7698, date '1981-02-20', 1600,  300, 30),
       (7521, 'WARD',   'SALESMAN',  7698, date '1981-02-22', 1250,  500, 30),
       (7654, 'MARTIN', 'SALESMAN',  7698, date '1981-09-28', 1250, 1400, 30),
       (7844, 'TURNER', 'SALESMAN',  7698, date '1981-09-08', 1500,    0, 30),
       (7900, 'JAMES',  'CLERK',     7698, date '1981-12-03',  950, null, 30),
       (7934, 'MILLER', 'CLERK',     7782, date '1982-01-23', 1300, null, 10),
       (7369, 'SMITH',  'CLERK',     7902, date '1980-12-17',  800, null, 20),
       (7839, 'KING',   'PRESIDENT', null, date '1981-11-17', 5000, null, 10),
       (7876, 'ADAMS',  'CLERK',     7788, date '1987-05-23', 1100, null, 20);
commit;

Session altered.


Table EMP dropped.


Table DEPT dropped.


Table DEPT created.


Table EMP created.


4 rows inserted.


Commit complete.


14 rows inserted.


Commit complete.

There are a few changes to the well-known dept and emp tables I’d like to highlight:

  • Firstly, primary keys and foreign keys. They are required for the duality views, however, they do not need to be enabled.
  • Secondly, both tables got an additional ext column. The data type is json(object). Before 23.4 there was just a generic json data type. With 23.4 it’s possible to add modifiers. The modifier object is required for flex columns in duality views.

2. Read-Only View à la 19c

Let’s step back and create a view that returns a single JSON column, as in Oracle Database 19c (to make it work in 19c you have to use for example ext clob check (ext is json) instead of ext json(object) in the tables dept and emp).

2) Read-only view à la 19c
create or replace view dept_v as
select json_object(
          deptno,
          dname,
          loc,
          ext,
          'sal': (select sum(sal) from emp where emp.deptno = dept.deptno)
          absent on null
       ) as data
  from dept;

select * from dept_v;
View DEPT_V created.


DATA
--------------------------------------------------------------
{"deptno":10,"dname":"ACCOUNTING","loc":"NEW YORK","sal":8750}
{"deptno":20,"dname":"RESEARCH","loc":"DALLAS","sal":10875}
{"deptno":30,"dname":"SALES","loc":"CHICAGO","sal":9400}
{"deptno":40,"dname":"OPERATIONS","loc":"BOSTON"}

3. Read-Only Duality View

And now let’s try to use the previous subquery in a duality view.

3a) Read-only duality view (ORA-40616)
create or replace json duality view dept_dv as
select json_object(
          deptno,
          dname,
          loc,
          ext,
          'sal': (select sum(sal) from emp where emp.deptno = dept.deptno)
          absent on null
       ) as data
  from dept;
ORA-40616: Cannot create JSON Relational Duality View 'DEPT_DV': using ABSENT ON NULL in JSON_OBJECT() is not permitted.

This does not work. The absent on null clause on line 8 is not supported in a duality view. Let’s remove this line and try again.

3b) Read-only duality view (ORA-40895)
create or replace json duality view dept_dv as
select json_object(
          deptno,
          dname,
          loc,
          ext,
          'sal': (select sum(sal) from emp where emp.deptno = dept.deptno)
       ) as data
  from dept;
ORA-40895: invalid SQL expression in JSON relational duality view (operators except JSON_OBJECT or JSON_ARRAYAGG not allowed)

This still does not work. It’s not allowed to include an aggregate as in line 7. Let’s also remove this line and try again.

3c) Read-only duality view (ORA-40941)
create or replace json duality view dept_dv as
select json_object(
          deptno,
          dname,
          loc,
          ext
       ) as data
  from dept;
ORA-40941: cannot specify a column name or subquery alias for JSON relational duality view

Argh. We cannot use the column alias data on line 7. Let’s remove the alias and try again.

3d) Read-only duality view (ORA-42647)
create or replace json duality view dept_dv as
select json_object(
          deptno,
          dname,
          loc,
          ext
       )
  from dept;
ORA-42647: Missing '_id' field at the root level for JSON-relational duality view 'DEPT_DV'.

Okay, an _id field is required to identify a document. Composite keys can be passed as a JSON array. In this case, we do not need that. We can use deptno. Let’s amend the query and try again.

3e) Read-only duality view
create or replace json duality view dept_dv as
select json_object(
          '_id': deptno,
          dname,
          loc,
          ext
       )
  from dept;

select * from dept_dv;
Json DUALITY created.


DATA
----------------------------------------------------------------------------------------------------------------------------------
{"_id":10,"dname":"ACCOUNTING","loc":"NEW YORK","_metadata":{"etag":"38CBB37294BEE09C6D9867B5B1871FE2","asof":"000025652FBC556F"}}
{"_id":20,"dname":"RESEARCH","loc":"DALLAS","_metadata":{"etag":"1D1973E9B068183129F7DA59F6A9C283","asof":"000025652FBC556F"}}
{"_id":30,"dname":"SALES","loc":"CHICAGO","_metadata":{"etag":"11CC2AE352D52FFDAE0B7A3DFC99F836","asof":"000025652FBC556F"}}
{"_id":40,"dname":"OPERATIONS","loc":"BOSTON","_metadata":{"etag":"B33BBA9A74C046813C59BA0763AD81C9","asof":"000025652FBC556F"}}

Finally, we have a working read-only duality view.

Please note that each document contains a _metadata object. The etag field is a hash value based on all document fields by default. It can be used for optimistic locking. The asof field represents the SCN (system change number). It’s useful for read consistency, which means you can query related data in subsequent statements using the flashback_query_clause.
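
Here’s a sketch of how the asof value could be used for such a read-consistent follow-up query; the hex string is the asof value taken from the result above:

select dname, loc
  from dept as of scn to_number('000025652FBC556F', 'XXXXXXXXXXXXXXXX')
 where deptno = 10;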

4. Updateable Duality View Using SELECT

Let’s create an updateable duality view that represents all data in our model and uses all relationships.

The highlighted lines contain clauses that work only in a duality view. In other words, the “select” part does not work as a standalone statement as in common relational views.

4) Updateable duality view using select
create or replace json duality view dept_dv as
select json {
          '_id': deptno,
          dname,
          loc,
          ext as flex,
          'emps':
             (
                select json_arrayagg(
                          JSON {
                             emp.empno,
                             emp.ename,
                             emp.job,
                             unnest
                                (
                                   select json {
                                             'mgr'    : mgr.empno with nocheck,
                                             'mgrname': mgr.ename with nocheck
                                          }
                                     from emp mgr
                                    where mgr.empno = emp.mgr
                                ),
                             emp.hiredate,
                             emp.sal,
                             emp.comm,
                             ext as flex
                          }
                       )
                  from emp with insert update delete
                 where emp.deptno = dept.deptno
             )
       }
  from dept with insert update delete;

select json_serialize(data returning clob pretty) as data
  from dept_dv dv
 where dv.data."_id".numberOnly() in (20, 40);
Json DUALITY created.


DATA
------------------------------------------------
{
  "_id" : 20,
  "_metadata" :
  {
    "etag" : "88A4A9648C2CA1752E477545DCA85FD3",
    "asof" : "00002565300350E5"
  },
  "dname" : "RESEARCH",
  "loc" : "DALLAS",
  "emps" :
  [
    {
      "empno" : 7369,
      "ename" : "SMITH",
      "job" : "CLERK",
      "mgr" : 7902,
      "mgrname" : "FORD",
      "hiredate" : "1980-12-17T00:00:00",
      "sal" : 800,
      "comm" : null
    },
    {
      "empno" : 7566,
      "ename" : "JONES",
      "job" : "MANAGER",
      "mgr" : 7839,
      "mgrname" : "KING",
      "hiredate" : "1981-04-02T00:00:00",
      "sal" : 2975,
      "comm" : null
    },
    {
      "empno" : 7788,
      "ename" : "SCOTT",
      "job" : "ANALYST",
      "mgr" : 7566,
      "mgrname" : "JONES",
      "hiredate" : "1987-04-19T00:00:00",
      "sal" : 3000,
      "comm" : null
    },
    {
      "empno" : 7876,
      "ename" : "ADAMS",
      "job" : "CLERK",
      "mgr" : 7788,
      "mgrname" : "SCOTT",
      "hiredate" : "1987-05-23T00:00:00",
      "sal" : 1100,
      "comm" : null
    },
    {
      "empno" : 7902,
      "ename" : "FORD",
      "job" : "ANALYST",
      "mgr" : 7566,
      "mgrname" : "JONES",
      "hiredate" : "1981-12-03T00:00:00",
      "sal" : 3000,
      "comm" : null
    }
  ]
}

{
  "_id" : 40,
  "_metadata" :
  {
    "etag" : "28E9C49240CD26A29FDED7B253A38ED7",
    "asof" : "00002565300350E5"
  },
  "dname" : "OPERATIONS",
  "loc" : "BOSTON",
  "emps" :
  [
  ]
}

Since the ext column in the dept and emp table is empty for all rows, we do not see any additional fields in the two JSON documents.

Due to the unnest clause in line 14, the fields mgr and mgrname appear on the same level as all other fields of the table emp.

5. Updateable Duality View Using GraphQL

The next duality view is equivalent to the one in the previous chapter.

The highlighted lines with annotations match the highlighted clauses in the previous statement.

GraphQL requires a model for a query. The Oracle implementation uses tables, primary, and foreign keys to build the underlying model.

Some explanations
  • In line 2, we use the dept table as root. For select, insert, update and delete. This means we get a JSON document per department.
  • In line 8, we use the emp table for the field emps. We expect an array of objects. However, we do not have to tell that explicitly. The Oracle Database will figure that out. There is just one relationship between dept and emp. Therefore it is clear how to join the tables and access the data. For select, insert, update and delete.
  • In line 13, we use the emp table for the fields mgr and mgrname. The access is possible via the foreign keys emp_mgr_fk and emp_deptno_fk. We know that it is emp_mgr_fk but the Oracle Database does not. We have to tell it. We do that with the @link(from: [mgr]) annotation, to use the mgr field for the recursive join. An update of mgrname is not allowed (it is read-only by default). An update of mgr is allowed (it will update the foreign key column in emp).
5) Updateable duality view using GraphQL
create or replace json duality view dept_dv as
dept @insert @update @delete
{
   _id: deptno
   dname
   loc
   ext @flex
   emps: emp @insert @update @delete
      {
         empno
         ename
         job
         emp @unnest @link(from: [mgr])
            {
               mgr    : empno @nocheck
               mgrname: ename @nocheck
            }
         hiredate
         sal
         comm
         ext @flex
      }
};

select json_serialize(data returning clob pretty) as data
  from dept_dv dv
 where dv.data."_id".numberOnly() in (20, 40);
Json DUALITY created.


DATA
------------------------------------------------
{
  "_id" : 20,
  "_metadata" :
  {
    "etag" : "88A4A9648C2CA1752E477545DCA85FD3",
    "asof" : "00002565301D6D5C"
  },
  "dname" : "RESEARCH",
  "loc" : "DALLAS",
  "emps" :
  [
    {
      "empno" : 7369,
      "ename" : "SMITH",
      "job" : "CLERK",
      "mgr" : 7902,
      "mgrname" : "FORD",
      "hiredate" : "1980-12-17T00:00:00",
      "sal" : 800,
      "comm" : null
    },
    {
      "empno" : 7566,
      "ename" : "JONES",
      "job" : "MANAGER",
      "mgr" : 7839,
      "mgrname" : "KING",
      "hiredate" : "1981-04-02T00:00:00",
      "sal" : 2975,
      "comm" : null
    },
    {
      "empno" : 7788,
      "ename" : "SCOTT",
      "job" : "ANALYST",
      "mgr" : 7566,
      "mgrname" : "JONES",
      "hiredate" : "1987-04-19T00:00:00",
      "sal" : 3000,
      "comm" : null
    },
    {
      "empno" : 7876,
      "ename" : "ADAMS",
      "job" : "CLERK",
      "mgr" : 7788,
      "mgrname" : "SCOTT",
      "hiredate" : "1987-05-23T00:00:00",
      "sal" : 1100,
      "comm" : null
    },
    {
      "empno" : 7902,
      "ename" : "FORD",
      "job" : "ANALYST",
      "mgr" : 7566,
      "mgrname" : "JONES",
      "hiredate" : "1981-12-03T00:00:00",
      "sal" : 3000,
      "comm" : null
    }
  ]
}

{
  "_id" : 40,
  "_metadata" :
  {
    "etag" : "28E9C49240CD26A29FDED7B253A38ED7",
    "asof" : "00002565301D6D5C"
  },
  "dname" : "OPERATIONS",
  "loc" : "BOSTON",
  "emps" :
  [
  ]
}

The result is the same as for the variant based on select. The only difference is the asof field, which is expected.

6. GraphQL vs. SELECT

What I like about the GraphQL variant is that the syntax is simple. It looks similar to the result document and is easy to read. No compromises due to feature parity. I run less risk of trying SQL expressions that are not applicable in a duality view. The downside is that the definition might become ambiguous when extending the model with additional foreign key relationships. You might need to add @link annotations to your existing duality views to successfully recreate them. The variant using select cannot become ambiguous. There is no default join logic in SQL yet.

However, writing complex duality views might be easier with the variant using select. I can temporarily comment out all duality-view-specific clauses to make the select part work as a standalone statement until I’m happy with the result.

From a performance point of view, it should theoretically not matter which syntax variant you use. Any duality view can be built on GraphQL or select. The optimizer has all the information it needs to produce an optimal execution plan for both variants. I see no reason why the internal representation should differ.

Maybe a future version of the Oracle Database will offer options to generate the preferred syntax variant independently of the originally deployed variant. By extending dbms_metadata.get_ddl, for example.
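
For what it’s worth, here is a hedged sketch of retrieving the DDL of the duality view as deployed today; whether duality views are exposed under the dbms_metadata object type VIEW is an assumption.

-- Hedged sketch: fetching the stored DDL of the duality view.
-- Assumes duality views are accessible via the object type VIEW in dbms_metadata.
select dbms_metadata.get_ddl('VIEW', 'DEPT_DV', user) as ddl;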

7. Insert Into Duality View

Here’s an example of inserting a JSON document with one department and two employees into the previously created duality view.

Some explanations
  • In line 6, we populate a department field named secret with the boolean value true. This field does not exist in the model. Therefore it will be stored in the ext column of the dept table.
  • In line 13, we set the mgr field to 1. That’s the foreign key column in the emp table.
  • In line 16 we populate an employee field named tools with an array. The field does not exist in the model. Therefore it will be stored in the ext column of the emp table.
7) Insert into duality view (extending the schema)
insert into dept_dv values ('
{
  "_id" : 50,
  "dname" : "MI6",
  "loc" : "LONDON",
  "secret" : true,
  "emps" :
  [
    {
      "empno" : 7,
      "ename" : "BOND",
      "job" : "AGENT",
      "mgr" : 1,
      "hiredate" : "1950-01-01T00:00:00",
      "sal" : 500,
      "tools" : ["Knife", "Garrote Watch", "Walther PPK"]
    },
    {
      "empno" : 1,
      "ename" : "M",
      "job" : "MANAGER",
      "hiredate" : "1940-01-01T00:00:00",
      "sal" : 1000,
      "comm" : 8000
    }
  ]
}
');
commit;

select * from dept where deptno = 50;
select * from emp where deptno = 50;
select json_serialize(data returning clob pretty) as data 
  from dept_dv dv
 where dv.data.secret.booleanOnly();
1 row inserted.


Commit complete.


    DEPTNO DNAME          LOC           EXT
---------- -------------- ------------- --------------------
        50 MI6            LONDON        {"secret":true}


     EMPNO ENAME      JOB              MGR HIREDATE          SAL       COMM     DEPTNO EXT
---------- ---------- --------- ---------- ---------- ---------- ---------- ---------- ---------------------------------------------------
         1 M          MANAGER              1940-01-01       1000       8000         50
         7 BOND       AGENT              1 1950-01-01        500                    50 {"tools":["Knife","Garrote Watch","Walther PPK"]}


DATA
------------------------------------------------
{
  "_id" : 50,
  "_metadata" :
  {
    "etag" : "486256AA33D638F4D339FFE534EC910F",
    "asof" : "0000256530950ADB"
  },
  "dname" : "MI6",
  "loc" : "LONDON",
  "emps" :
  [
    {
      "empno" : 1,
      "ename" : "M",
      "job" : "MANAGER",
      "hiredate" : "1940-01-01T00:00:00",
      "sal" : 1000,
      "comm" : 8000
    },
    {
      "empno" : 7,
      "ename" : "BOND",
      "job" : "AGENT",
      "mgr" : 1,
      "mgrname" : "M",
      "hiredate" : "1950-01-01T00:00:00",
      "sal" : 500,
      "comm" : null,
      "tools" :
      [
        "Knife",
        "Garrote Watch",
        "Walther PPK"
      ]
    }
  ],
  "secret" : true
}

One insert statement leads to three new rows in two tables. Before 23ai, a view and an instead-of trigger would have been required for this.

The fields secret and tools are automatically stored in the flex columns ext. This shows how easy it is to extend the data model on the fly with an insert statement. Without DDL statements. Without PL/SQL code.
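
For illustration, here is a hedged sketch of reading such a flex field straight from the base table.

-- Hedged sketch: the secret field added via the duality view ends up in the ext flex column of dept.
select d.deptno,
       json_value(d.ext, '$.secret' returning boolean) as secret
  from dept d
 where d.deptno = 50;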

8. Update Duality View

Let’s update the previously created document.

Some explanations
  • In line 4, we add a new field named street for the department 50.
  • In line 6, we change the salary for all employees in the department 50 by the factor of 42.
  • In line 9, we increase the salary of BOND by 1. Please note that this is the second change of the salary for this employee in this update statement.
  • In line 10, we append the Aston Martin DB5 to the list of BOND‘s tools.
8) Update duality view (extending the schema again)
update dept_dv v
   set v.data = json_transform(
                   v.data, 
                   set '$.street' = '85 Albert Embankment',
                   nested '$.emps[*]' (
                      set '@.sal' = path '@.sal * 42'
                   ),
                   nested '$.emps[*]?(@.ename == "BOND")' (
                      set '@.sal' = path '@.sal + 1',
                      append '@.tools' = 'Aston Martin DB5'
                   )
                )
 where v.data."_id".numberOnly() = 50;
commit;

select * from dept where deptno = 50;
select * from emp where deptno = 50;
select json_serialize(data returning clob pretty) as data 
  from dept_dv v
 where v.data."_id".numberOnly() = 50;
1 row updated.


Commit complete.


    DEPTNO DNAME          LOC           EXT
---------- -------------- ------------- -----------------------------------------------
        50 MI6            LONDON        {"secret":true,"street":"85 Albert Embankment"}


     EMPNO ENAME      JOB              MGR HIREDATE          SAL       COMM     DEPTNO EXT
---------- ---------- --------- ---------- ---------- ---------- ---------- ---------- ----------------------------------------------------------------------
         1 M          MANAGER              1940-01-01      42000       8000         50
         7 BOND       AGENT              1 1950-01-01      21001                    50 {"tools":["Knife","Garrote Watch","Walther PPK","Aston Martin DB5"]}


DATA
------------------------------------------------
{
  "_id" : 50,
  "_metadata" :
  {
    "etag" : "1591A6C3A20C4BC85C0498F7B1F4031F",
    "asof" : "000025653374F0C7"
  },
  "dname" : "MI6",
  "loc" : "LONDON",
  "emps" :
  [
    {
      "empno" : 1,
      "ename" : "M",
      "job" : "MANAGER",
      "hiredate" : "1940-01-01T00:00:00",
      "sal" : 42000,
      "comm" : 8000
    },
    {
      "empno" : 7,
      "ename" : "BOND",
      "job" : "AGENT",
      "mgr" : 1,
      "mgrname" : "M",
      "hiredate" : "1950-01-01T00:00:00",
      "sal" : 21001,
      "comm" : null,
      "tools" :
      [
        "Knife",
        "Garrote Watch",
        "Walther PPK",
        "Aston Martin DB5"
      ]
    }
  ],
  "secret" : true,
  "street" : "85 Albert Embankment"
}

9. Delete From Duality View

And now, let’s delete department 50 with all its employees to restore the original content of the dept and emp tables.

9) Delete from duality view
delete dept_dv v
 where v.data."_id".numberOnly() = 50;
commit;
1 row deleted.


Commit complete.


    DEPTNO DNAME          LOC           EXT
---------- -------------- ------------- --------------------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON


     EMPNO ENAME      JOB              MGR HIREDATE          SAL       COMM     DEPTNO EXT
---------- ---------- --------- ---------- ---------- ---------- ---------- ---------- --------------------
      7782 CLARK      MANAGER         7839 1981-06-09       2450                    10
      7839 KING       PRESIDENT            1981-11-17       5000                    10
      7934 MILLER     CLERK           7782 1982-01-23       1300                    10
      7369 SMITH      CLERK           7902 1980-12-17        800                    20
      7566 JONES      MANAGER         7839 1981-04-02       2975                    20
      7788 SCOTT      ANALYST         7566 1987-04-19       3000                    20
      7876 ADAMS      CLERK           7788 1987-05-23       1100                    20
      7902 FORD       ANALYST         7566 1981-12-03       3000                    20
      7499 ALLEN      SALESMAN        7698 1981-02-20       1600        300         30
      7521 WARD       SALESMAN        7698 1981-02-22       1250        500         30
      7654 MARTIN     SALESMAN        7698 1981-09-28       1250       1400         30
      7698 BLAKE      MANAGER         7839 1981-05-01       2850                    30
      7844 TURNER     SALESMAN        7698 1981-09-08       1500          0         30
      7900 JAMES      CLERK           7698 1981-12-03        950                    30

14 rows selected.

Outlook

One thing is missing in version 0.9 of the IslandSQL grammar: the support of PL/pgSQL in the PostgreSQL statements create function, create procedure, create trigger and do. These statements can already be parsed but the PL/pgSQL code passed as string is not further analyzed. This will change in the next and final episode of this season.

The post IslandSQL Episode 9: GraphQL, JSON and Flexible Schemas With Duality Views appeared first on Philipp Salvisberg's Blog.

IslandSQL Final Episode 10: Parsing PL/pgSQL

Introduction

IslandSQL is a parser for SQL files targeting OracleDB or PostgreSQL. The parser is available on Maven Central and can process SQL*Plus, SQLcl or psql statements besides SQL statements. However, the focus is on static DML statements and on PL/SQL and PL/pgSQL code, for use cases such as static code analysis.

In PostgreSQL, create function, create procedure and do accept code as a string. This simplifies parsing and the implementation of additional languages. As a result, you can write functions and procedures in SQL, PL/pgSQL, PL/Tcl, PL/Perl and PL/Python in any standard PostgreSQL distribution.

Starting with IslandSQL version 0.10 it’s possible to parse SQL and PL/pgSQL in strings and extend the parse tree accordingly. In this blog post, I will explain how this works.

This VS Code extension uses IslandSQL in a language server to report syntax errors and to produce parse trees as shown in this blog post.

PL/pgSQL as String

Let’s look at a do statement executing a PL/pgSQL block provided as a string.

1) hello_world.sql
do '
begin
   raise notice $$Hello World!$$;
end
';
NOTICE:  Hello World!
DO

The parse tree in IslandSQL version 0.9 looks as follows:

Parse tree with PL/pgSQL as string

Look at the parse tree. It is quite simple. The interesting part is the PL/pgSQL block. The content in single quotes is represented as a single token: the big rectangle at the bottom. It’s a single token regardless of the code size.

PL/pgSQL as Subtree

Parsing PL/pgSQL as a string is easy. The lexer produces the token for the string containing the PL/pgSQL code. The parser does not need to understand the PL/pgSQL at all.

But this does not help us analyse the PL/pgSQL code. We need a parser that

  • understands PL/pgSQL
  • parses PL/pgSQL code provided as a string
  • extends the main parse tree by sub-parse trees with PL/pgSQL code

The IslandSQL parser in version 0.10 does exactly that. It produces this parse tree by default:

Parse tree with PL/pgSQL as string and as subtree

The PL/pgSQL code is represented twice in this parse tree. Once as a single token. And once as a node named postgresqlPlpgsqlCode.

PL/pgSQL as Subtree Only

Maybe you do not like it when PL/pgSQL code is represented twice in the parse tree. In this case, you can override the default options when creating an IslandSQL document.

The IslandSQL library requires Java 8 or newer. In the next example, we use Java 22 to reduce boilerplate code. Implicitly declared classes are still in preview. Hence we have to pass the parameters --enable-preview --source 22 when running the HelloWorld.java program.

Explaining HelloWorld.java

In line 13 we create the IslandSQL document with a series of parameters. One of them is removeCode(true) in line 19, which overrides the default behaviour and removes the code as a string from the parse tree.

In line 21 we print a profile. Thanks to Cary Millsap, I can’t deal with performance issues without thinking about a bicycle, a Ferrari and kissing… The profile helps to understand where the parser has spent its time.

Finally, in lines 22 and 23 we print the parse tree. Once as a simple textual hierarchy and once as a DOT graph. You can visualise the result in Edotor.net, for example.

2) HelloWorld.java
import ch.islandsql.grammar.IslandSqlDialect;
import ch.islandsql.grammar.IslandSqlDocument;
import ch.islandsql.grammar.util.ParseTreeUtil;

void main() {
    var source = """
            do '
            begin
               raise notice $$Hello World!$$;
            end
            ';
            """;
    var doc = new IslandSqlDocument.Builder()
            .sql(source)
            .hideOutOfScopeTokens(false)
            .dialect(IslandSqlDialect.POSTGRESQL)
            .profile(true)
            .subtrees(true)
            .removeCode(true)
            .build();
    System.out.println(doc.getParserMetrics().printProfile());
    System.out.println(ParseTreeUtil.printParseTree(doc.getFile()));
    System.out.println(ParseTreeUtil.dotParseTree(doc.getFile()));
}
java --enable-preview --source 22 -cp .:islandsql-0.10.0.jar:antlr4-runtime-4.13.1.jar HelloWorld.java
Profile
=======

Total memory used by parser    : 5’164 KB
Total time spent in parser     : 27.918 ms
Total time recorded by profiler: 14.167 ms (100%)

Rule Name (Decision)                          Time (ms) Percent Invocations Lookahead Max Lookahead Ambiguities Errors
---------------------------------------- -------------- ------- ----------- --------- ------------- ----------- ------
string (1887)                                    10.302   72.72           2         0             0           0      0
plsqlStatement (1129)                             1.337    9.44           1         0             0           0      0
postgresqlDo (576)                                0.750    5.30           1         0             0           0      0
postgresqlPlpgsqlCode (6)                         0.735    5.19           2         0             0           0      0
statement (11)                                    0.566    3.99           1         0             0           0      0
sqlEnd (1888)                                     0.314    2.21           1         0             0           0      0
postgresqlRaiseStatement (1260)                   0.134    0.95           1         0             0           0      0
postgresqlRaiseStatement (1269)                   0.030    0.21           1         0             0           0      0

file
  statement
    doStatement
      postgresqlDo
        K_DO:do
        postgresqlPlpgsqlCode
          K_BEGIN:begin
          plsqlStatement
            postgresqlRaiseStatement
              K_RAISE:raise
              raiseLevel
                K_NOTICE:notice
              string:dollarString
                DOLLAR_STRING:$$Hello World!$$
              SEMI:;
          K_END:end
      sqlEnd
        SEMI:;
  <EOF>

digraph islandSQL {
  bgcolor="transparent"
  "1155757579" [shape=ellipse label="file" style=filled fillcolor="#bfe6ff" fontname="Helvetica"]
  "1155757579" -> "1785111044"
  "1155757579" -> "1482748887"
  "1785111044" [shape=ellipse label="statement" style=filled fillcolor="#bfe6ff" fontname="Helvetica"]
  "1785111044" -> "494894055"
  "494894055" [shape=ellipse label="doStatement" style=filled fillcolor="#bfe6ff" fontname="Helvetica"]
  "494894055" -> "1123226989"
  "494894055" -> "500885941"
  "1123226989" [shape=ellipse label="postgresqlDo" style=filled fillcolor="#bfe6ff" fontname="Helvetica"]
  "1123226989" -> "1115381650"
  "1123226989" -> "616412281"
  "1115381650" [shape=box label="do" style=filled fillcolor="#fadabd" fontname="Helvetica"]
  "616412281" [shape=ellipse label="postgresqlPlpgsqlCode" style=filled fillcolor="#bfe6ff" fontname="Helvetica"]
  "616412281" -> "2118096382"
  "616412281" -> "878861517"
  "616412281" -> "746394140"
  "2118096382" [shape=box label="begin" style=filled fillcolor="#fadabd" fontname="Helvetica"]
  "878861517" [shape=ellipse label="plsqlStatement" style=filled fillcolor="#bfe6ff" fontname="Helvetica"]
  "878861517" -> "1705665942"
  "1705665942" [shape=ellipse label="postgresqlRaiseStatement" style=filled fillcolor="#bfe6ff" fontname="Helvetica"]
  "1705665942" -> "1731763384"
  "1705665942" -> "1100619942"
  "1705665942" -> "87242619"
  "1705665942" -> "864248990"
  "1731763384" [shape=box label="raise" style=filled fillcolor="#fadabd" fontname="Helvetica"]
  "1100619942" [shape=ellipse label="raiseLevel" style=filled fillcolor="#bfe6ff" fontname="Helvetica"]
  "1100619942" -> "285074186"
  "285074186" [shape=box label="notice" style=filled fillcolor="#fadabd" fontname="Helvetica"]
  "87242619" [shape=ellipse label="dollarString" style=filled fillcolor="#bfe6ff" fontname="Helvetica"]
  "87242619" -> "15892131"
  "15892131" [shape=box label="$$Hello World!$$" style=filled fillcolor="#fadabd" fontname="Helvetica"]
  "864248990" [shape=box label=";" style=filled fillcolor="#fadabd" fontname="Helvetica"]
  "746394140" [shape=box label="end" style=filled fillcolor="#fadabd" fontname="Helvetica"]
  "500885941" [shape=ellipse label="sqlEnd" style=filled fillcolor="#bfe6ff" fontname="Helvetica"]
  "500885941" -> "484841769"
  "484841769" [shape=box label=";" style=filled fillcolor="#fadabd" fontname="Helvetica"]
  "1482748887" [shape=box label="<EOF>" style=filled fillcolor="#fadabd" fontname="Helvetica"]
}

The parse tree looks now like this:

Parse tree with PL/pgSQL as subtree only

SQL Dialect

In line 16 of the previous HelloWorld.java program we set the dialect to POSTGRESQL. Is that required? – No, it’s not. But when do we need to set the SQL dialect in IslandSQL? – When the lexical incompatibility between OracleDB and PostgreSQL leads to syntax errors in the code to be parsed.

What? – Let me explain.

Identifiers

OracleDB and PostgreSQL use different characters to build an identifier. The following table shows the differences. The allowed characters are listed in square brackets. Read \p{Alpha} as any alphabetic letter in the character set of the database.

DBMS         Allowed as First Character    Allowed in Subsequent Characters
OracleDB     [\p{Alpha}]                   [_$#0-9\p{Alpha}]
PostgreSQL   [_\p{Alpha}]                  [_$0-9\p{Alpha}]

PostgreSQL allows identifiers that start with an underscore. That’s not a problem. However, OracleDB allows the hash sign (#) to be used in an identifier. That leads to unexpected results when PostgreSQL code uses the bitwise XOR operator (#) without spaces around it. Here’s an example:

3) One or two identifiers in select_list?
select a#b from t;

PostgreSQL expects that the columns a and b exist in table t.

OracleDB expects that the column a#b exists in table t.

In this case, IslandSQL can parse the code without errors. However, the parse tree might not look as expected. That’s a documented limitation and cannot be influenced by setting the dialect. At least not in version 0.10 of IslandSQL. Nevertheless, it shows the impact of a lexical incompatibility.

Inquiry Directives

PL/SQL supports predefined and custom Inquiry Directives. These directives are lexically incompatible with PostgreSQL dollar-quoted string constants.

Here’s an example:

4) Using custom inquiry directives
alter session set plsql_ccflags = 'custom1:41, custom2:42';
begin
   dbms_output.put_line($$custom1);
   dbms_output.put_line($$custom2 || '(2)');
end;

By default, this causes a parse error, because $$custom1);\n dbms_output.put_line($$ is identified as a dollar-quoted string constant by the lexer. As a result, the code is interpreted like this:

5) Visualising how two custom inquiry directives are treated as a string
alter session set plsql_ccflags = 'custom1:41, custom2:42';
begin
   dbms_output.put_line('custom1);
   dbms_output.put_line('custom2 || '(2)');
end;

This makes it clearer why we get a syntax error in line 4 at custom2.

In this case, we have to set the SQL dialect to ORACLEDB to parse custom_inquiry_directives.sql without errors.

Predefined Inquiry Directives

To simplify the use, we want to avoid specifying the SQL dialect. One way to achieve that is to handle predefined inquiry directives in the GENERIC SQL dialect.

Here’s an example that does not report syntax errors:

6) Using predefined inquiry directives
begin
   dbms_output.put_line($$plsql_line);
   dbms_output.put_line($$plsql_line || '(2)');
end;

We know the predefined inquiry directives and can deal with them in the lexer.

However, this special treatment can cause problems in corner cases like this one:

7) Using the name of a predefined inquiry directive at the start of a dollar-quoted string constant
do '
begin
   raise notice $$plsql_line is a predefined inquiry directive$$;
end
';

In such cases, we should use the POSTGRESQL dialect to avoid parse errors.

Detect SQL Dialect

Is there a way to detect the SQL dialect of an SQL input automatically? Sure. Several. I’m sure there is a way to use an LLM to get a reasonable result. I’m more of a rule-based guy. So we could parse the code with one dialect and on parse errors try other dialects. If all dialects produce errors, we could choose the one with the fewest errors.

This sounds costly, right? Therefore I decided to start with a simple SQL dialect detection mechanism. The current implementation looks like this:

8) Excerpt of IslandSqlDocument.java in version 0.10.0
private static IslandSqlDialect guessDialect(String sql) {
    return sql.contains("\n/\n") ? IslandSqlDialect.ORACLEDB : IslandSqlDialect.GENERIC;
}

In other words, if the code contains a line that consists solely of a slash, then we go with the ORACLEDB dialect. In all other cases, we go with the GENERIC SQL dialect.

My first version was even simpler. However, I had to add a final \n to the search string to ensure that SQL code containing multiline comments is not recognized as ORACLEDB dialect. Files with Windows newline characters are always recognized as GENERIC. That’s the price when trying to keep things simple and fast.
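
A hedged example of the kind of input that the simpler search string (without the trailing newline) would have misclassified as ORACLEDB, because a multiline comment starts in the first column of a line:

select d.dname
  from dept d
/* a multiline comment
   starting in the first column of a line */
 where d.deptno = 10;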

Now, the ORACLEDB SQL dialect is correctly detected in the next example. As a result, no syntax errors are reported.

9) Using custom inquiry directives in a PL/SQL Block ending on slash
alter session set plsql_ccflags = 'custom1:41, custom2:42';
begin
   dbms_output.put_line($$custom1);
   dbms_output.put_line($$custom2 || '(2)');
end;
/

The SQL dialect detection mechanism kicks in when no SQL dialect is specified (null).

Conclusion

Developing a single grammar for OracleDB and PostgreSQL was interesting work. I learned a lot about the underlying DBMSs. I often looked at the grammar documentation and did not understand it fully. So I had to run the provided examples or create some myself. The typical test cases are based on working examples, extended by tests based on the Cartesian product of a subset of clauses that verified whether I had defined the order, the optionality, and the cardinality according to the documentation.

A challenge is the undocumented stuff. There are various reasons why something is not documented. In the end, it does not matter why something is missing. The parser fails when processing code that works but should not according to the docs. This cannot be covered by tests based on the official documentation. I found some bugs while processing real-life code. And I expect to find more.

This is the season finale of IslandSQL. There won’t be a second season. However, there might be some spin-offs since I plan to build products based on IslandSQL. Therefore I plan to keep the parser compatible with the latest versions of OracleDB and PostgreSQL.

Feedback is welcome. Please leave your comments on this blog post or open an issue in the IslandSQL GitHub repository for questions, bugs or feature requests.

Thank you.

The post IslandSQL Final Episode 10: Parsing PL/pgSQL appeared first on Philipp Salvisberg's Blog.

How Many Bytes Are in an Emoji?

What Is a Byte?

A byte is made up of 8 bits. And in the old days, it represented a character. If you use a single-byte character set such as WE8MSWIN1252, WE8ISO8859P15 or similar, it still is.

What Is a Character?

We can find definitions for example on Wikipedia and in the documentation for the Oracle Database. The current Unicode standard (version 16) defines 154,998 characters. The CodeChart.pdf is part of the standard and describes all those characters on 3113 pages.

Why Does It Matter?

A character is a part of a string and we store strings in the database. The SQL standard defines a <character string type> with an optional <character maximum length> in octets (bytes) or characters. In other words, the size of a character string is defined by the number of characters it contains. In the Oracle Database, we also have a hard limit of 4000 or 32767 bytes for data types like char, varchar or varchar2, depending on the max_string_size parameter. So we should know exactly what the size of a string data type means.
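
As a hedged illustration (table and column names are made up), the difference between character and byte length semantics looks like this:

-- Hedged sketch: c_char stores up to 10 characters (up to 40 bytes in AL32UTF8),
-- c_byte stores at most 10 bytes, which may be fewer than 10 characters.
create table string_size_demo (
   c_char varchar2(10 char),
   c_byte varchar2(10 byte)
);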

Querying UTF-8 Characters

The following query provides details about a bunch of UTF-8 characters. I ran the query in my Oracle Database 23.5 with an AL32UTF8 character set.

with
   data (name, value) as (values
      ('Latin capital letter A',                       'A'),
      ('Dollar sign',                                  '$'),
      ('Copyright sign',                               '©'),
      ('Pound sign',                                   '£'),
      ('Euro sign',                                    '€'),
      ('Double exclamation mark',                      '‼'),
      ('Trademark sign',                               '™'),
      ('Grinning face',                                '😀'),
      ('Double exclamation mark (emoji)',              '‼️'),
      ('Trademark sign (emoji)',                       '™️'),
      ('Information',                                  'ℹ️'),
      ('No entry',                                     '⛔'),
      ('Woman',                                        '👩'),
      ('Woman with white hair, medium-dark skin tone', '👩🏾‍🦳'),
      ('Man',                                          '👨'),
      ('Girl',                                         '👧'),
      ('Boy',                                          '👦'),
      ('Family',                                       '👨‍👩‍👧‍👦'),
      ('Kiss: woman, man, medium-light skin tone',     '👩🏼‍❤️‍💋‍👨🏼')
   )
select name,
       value,
       length(value)  as len_in_chars,
       lengthb(value) as len_in_bytes,
       substr(dump(value, 16), instr(dump(value, 16), ':') + 2) as bytes_as_hex_list
  from data;
NAME                                         VALUE LEN_IN_CHARS LEN_IN_BYTES BYTES_AS_HEX_LIST                                                         
-------------------------------------------- ----- ------------ ------------ --------------------------------------------------------------------------------------------------------
Latin capital letter A                       A                1            1 41                                                                        
Dollar sign                                  $                1            1 24                                                                        
Copyright sign                               ©                1            2 c2,a9                                                                     
Pound sign                                   £                1            2 c2,a3                                                                     
Euro sign                                    €                1            3 e2,82,ac                                                                  
Double exclamation mark                      ‼                1            3 e2,80,bc
Trademark sign                               ™                1            3 e2,84,a2                                                                  
Grinning face                                😀               1            4 f0,9f,98,80                                                               
Double exclamation mark (emoji)              ‼️               2            6 e2,80,bc,ef,b8,8f                                                         
Trademark sign (emoji)                       ™️               2            6 e2,84,a2,ef,b8,8f                                                         
Information                                  ℹ️               2            6 e2,84,b9,ef,b8,8f                                                         
No entry                                     ⛔               1            3 e2,9b,94                                                                  
Woman                                        👩               1            4 f0,9f,91,a9                                                               
Woman with white hair, medium-dark skin tone 👩🏾‍🦳               4           15 f0,9f,91,a9,f0,9f,8f,be,e2,80,8d,f0,9f,a6,b3
Man                                          👨               1            4 f0,9f,91,a8                                                               
Girl                                         👧               1            4 f0,9f,91,a7                                                               
Boy                                          👦               1            4 f0,9f,91,a6                                                               
Family                                       👨‍👩‍👧‍👦               7           25 f0,9f,91,a8,e2,80,8d,f0,9f,91,a9,e2,80,8d,f0,9f,91,a7,e2,80,8d,f0,9f,91,a6
Kiss: woman, man, medium-light skin tone     👩🏼‍❤️‍💋‍👨🏼              10           35 f0,9f,91,a9,f0,9f,8f,bc,e2,80,8d,e2,9d,a4,ef,b8,8f,e2,80,8d,f0,9f,92,8b,e2,80,8d,f0,9f,91,a8,f0,9f,8f,bc

19 rows selected. 
Value vs. Character

The len_in_chars column should make it clear that the thing represented in the value column is not a character. It’s a grapheme – the smallest functional unit of a writing system. A grapheme can be built from more than one character.

Monospaced Font

I’m using a monospaced font for code in this blog. However, some graphemes are wider than others. The fixed-width font no longer works as expected. This is why the result is not nicely formatted. Using spaces to format a result grid does not work anymore. Most emojis are wider than two space characters but narrower than three.

Same Looking Graphemes

The double exclamation mark (‼) can be represented with 3 bytes or 6 bytes. The additional three bytes are efb88f. It’s a variation selector U+FE0F. It marks a “normal” character as an emoji. As a result, we expect the emoji to look different. However, the representation depends on the font and, in this case, on the browser. Just because two graphemes look the same does not mean they are identical.
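
A hedged sketch demonstrating that the two forms are different strings:

-- Hedged sketch: the plain character and its emoji variant differ,
-- because the emoji variant carries the variation selector U+FE0F at the end.
select case
          when unistr('\203C') = unistr('\203C\FE0F') then 'equal'
          else 'different'
       end as comparison;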

Codepoint to Bytes

A Unicode character is defined by a code point. Wikipedia describes quite well how a code point is converted into a UTF-8 byte sequence. I’ve created a PL/SQL package to convert a code point to bytes and vice versa. It’s available as a Gist on GitHub.

Here’s an example of how to use it:

select utf8.codepoint_to_bytes('U+FE0F') as to_bytes,
       utf8.bytes_to_codepoint('EFB88F') as to_codepoint;
TO_BYTES TO_CODEPOINT
-------- ------------
EFB88F   U+FE0F      
Skin Tones and Joined Emojis

For various emojis, you can define a skin tone. This increases the size of a grapheme. The 👩🏾‍🦳 (Woman with white hair, medium-dark skin tone) consists of the following 4 characters:

  • 👩 (Woman): U+1F469, 4 bytes
  • 🏾 (Medium-Dark Skin Tone Modifier): U+1F3FE, 4 bytes
  • Zero Width Joiner: U+200D, 3 bytes
  • 🦳 (White Hair): U+1F9B3, 4 bytes
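
A hedged sketch showing how this grapheme can be composed from its code points in SQL; unistr expects UTF-16 escapes, so the 4-byte code points are written as surrogate pairs.

-- Hedged sketch: building the grapheme from U+1F469, U+1F3FE, U+200D and U+1F9B3.
select unistr('\D83D\DC69\D83C\DFFE\200D\D83E\DDB3') as grapheme,
       lengthb(unistr('\D83D\DC69\D83C\DFFE\200D\D83E\DDB3')) as len_in_bytes;
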
Emoji Breakdown via SQL

The following query shows the characters of the largest three emojis in my example set.

with
   data (seq, emoji, bytes) as (values
      (1, '👩🏾‍🦳', json[4, 4, 3, 4]),
      (2, '👨‍👩‍👧‍👦', json[4, 3, 4, 3, 4, 3, 4]),
      (3, '👩🏼‍❤️‍💋‍👨🏼', json[4, 4, 3, 3, 3, 3, 4, 3, 4, 4])
   )
select d.emoji,
       j.seq as part,
       substr(d.emoji, j.seq, 1) as e_char,
       sum(j.bytes) over (partition by d.seq order by j.seq) as e_char_len,
       substr(dump(substr(d.emoji, j.seq, 1), 16), 14) as e_char_bytes
  from data d,
       json_table(
          d.bytes, '$[*]'
          columns (
             seq for ordinality,
             bytes number path '$'
          )
       ) j
 order by d.seq, j.seq;
EMOJI       PART E_CHAR E_CHAR_LEN E_CHAR_BYTES
----- ---------- ------ ---------- ------------
👩🏾‍🦳             1 👩              4 f0,9f,91,a9 
👩🏾‍🦳             2 🏾               8 f0,9f,8f,be 
👩🏾‍🦳             3 ‍               11 e2,80,8d    
👩🏾‍🦳             4 🦳             15 f0,9f,a6,b3 
👨‍👩‍👧‍👦             1 👨              4 f0,9f,91,a8 
👨‍👩‍👧‍👦             2 ‍                7 e2,80,8d    
👨‍👩‍👧‍👦             3 👩             11 f0,9f,91,a9 
👨‍👩‍👧‍👦             4 ‍               14 e2,80,8d    
👨‍👩‍👧‍👦             5 👧             18 f0,9f,91,a7 
👨‍👩‍👧‍👦             6 ‍               21 e2,80,8d    
👨‍👩‍👧‍👦             7 👦             25 f0,9f,91,a6 
👩🏼‍❤️‍💋‍👨🏼             1 👩              4 f0,9f,91,a9 
👩🏼‍❤️‍💋‍👨🏼             2 🏼               8 f0,9f,8f,bc 
👩🏼‍❤️‍💋‍👨🏼             3 ‍               11 e2,80,8d    
👩🏼‍❤️‍💋‍👨🏼             4 ❤              14 e2,9d,a4    
👩🏼‍❤️‍💋‍👨🏼             5 ️               17 ef,b8,8f    
👩🏼‍❤️‍💋‍👨🏼             6 ‍               20 e2,80,8d    
👩🏼‍❤️‍💋‍👨🏼             7 💋             24 f0,9f,92,8b 
👩🏼‍❤️‍💋‍👨🏼             8 ‍               27 e2,80,8d    
👩🏼‍❤️‍💋‍👨🏼             9 👨             31 f0,9f,91,a8 
👩🏼‍❤️‍💋‍👨🏼            10 🏼              35 f0,9f,8f,bc 

21 rows selected. 

Depending on your browser the result may or may not show the skin tone modifier emoji.

Conclusion

Theoretically, a grapheme can be built from an unlimited number of characters. As far as I know, 10 characters and 35 bytes for the kiss emoji 👩🏼‍❤️‍💋‍👨🏼 with two persons and a skin tone modifier are currently the maximum. This makes sizing a column with emojis a bit challenging.

Going for the maximum byte size is still a bad idea IMO. We would lose important information about our data. Even if the number of characters is not a perfect fit for a string containing emojis, it still gives the consumers an idea of how large a string can be. This helps when writing reports and similar.

I reckon columns with emojis are still the exception in an Oracle Database. However, it is good to know that an emoji can take up to 35 times more bytes than a regular Latin character.

The post How Many Bytes Are in an Emoji? appeared first on Philipp Salvisberg's Blog.

PL/SQL vs. JavaScript in the Oracle Database 23ai #JoelKallmanDay

JavaScript is the first language supported by the Multilingual Engine (MLE) in Oracle Database 23ai. Having additional languages in the Oracle Database allows us to use existing libraries within the database. Also, it makes it easier for those without PL/SQL skills to get started with database development. Wasn’t that also the argument for Java in the database? What is easier and better in JavaScript than in Java? How performant are JavaScript modules? When is JavaScript a good alternative to PL/SQL and when is it not?

This is a translation of my German article “PL/SQL oder JavaScript in der Oracle Datenbank 23ai?” published in the Red Stack Magazine No. 6/2024 on 11 October 2024.

Why Do We Need Code in the Database?

I see the following reasons for this.

  1. We bring the code to the data rather than the data to the code. This allows us to process the data efficiently on the database server and deliver the result to the client in just a few network round trips. This uses fewer resources, is more cost-effective and faster than if we had to transport the data to the client and process it there.
  2. We take responsibility for the quality of the data stored in the database. Typically, we write data once and read it often. Therefore, we should store data correctly so that consumers can rely on the data when they read it. In this sense, the logic for validating the data belongs in the database. This logic is often more extensive than what today’s database constraints provide. In other words, we need code in the database as part of an API to keep our data consistent and correct.

Even if your database applications do not follow the principles of SmartDB or PinkDB, there are benefits to selectively using code in the database. And if your company policy categorically forbids code in the database, it is probably time to reconsider.

PL/SQL Without SQL

Let’s pretend we need a function to convert a timestamp to Unix time. Wikipedia defines the Unix time as follows.

Unix time is a date and time representation widely used in computing. It measures time by the number of non-leap seconds that have elapsed since 00:00:00 UTC on 1 January 1970, the Unix epoch.

Listing 1 shows how we can implement this in PL/SQL.

Listing 1: to_epoch_plsql
create or replace function to_epoch_plsql(
   in_ts in timestamp
) return number is
   co_epoch_date constant timestamp with time zone := 
      timestamp '1970-01-01 00:00:00 UTC';
   l_interval    interval day(9) to second (3);
begin
   l_interval := in_ts - co_epoch_date;
   return 1000 * (extract(second from l_interval)
         + extract(minute from l_interval) * 60
         + extract(hour from l_interval) * 60 * 60
         + extract(day from l_interval) * 60 * 60 * 24);
end;
/

The to_epoch_plsql function expects a timestamp; a timestamp with time zone would be better. We have omitted this to keep the example as simple as possible. Although the solution may seem simple, we are reimplementing existing functionality. We had to find out how Unix time works, what role time zones play, what leap seconds are for, and that we need Unix time in milliseconds, not seconds.
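
A hedged usage sketch; the exact result depends on how the plain timestamp is interpreted relative to UTC in your session.

-- Hedged usage sketch: 2024-01-01 00:00:00 UTC corresponds to 1704067200000 milliseconds.
select to_epoch_plsql(timestamp '2024-01-01 00:00:00') as epoch_ms;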

Wouldn’t it be nice to be able to use an existing, tested function in the database to keep our application’s code to a minimum? Even though there is no to_epoch function in SQL, the Java Development Kit (JDK) offers such functionality. The Oracle Java Virtual Machine (OJVM) is an embedded component of the Oracle Database 23.5. It corresponds to JDK version 11.

Java Without SQL

The Oracle Database has supported Java stored procedures since version 8i Release 1. This means that we can provide a to_epoch_java function as shown in Listing 2.

Listing 2: to_epoch_java
create or replace and compile java source named "Util" as
public class Util {
   public static long toEpoch(java.sql.Timestamp ts) {
      return ts.getTime();
   }
}
/
create or replace function to_epoch_java(in_ts in timestamp) 
  return number is language java name 
    'Util.toEpoch(java.sql.Timestamp) return java.lang.long';
/

For Java, we need to create a class and a call specification. The purpose of the call specification is, among other things, to map the types of input and output data between SQL and Java. For example, to map the return value of java.lang.long to number.

The code no longer contains the formula for converting a timestamp to Unix time, but it is quite extensive. Is there an easy way in JavaScript?

JavaScript Without SQL

With Oracle Database 23ai, we can create JavaScript modules.

Listing 3: to_epoch_js
create or replace mle module util_mod language javascript as   
export function toEpoch(ts) {
   return ts.valueOf();
}
/
create or replace function to_epoch_js(in_ts in timestamp) 
   return number is 
      mle module util_mod 
      signature 'toEpoch(Date)';
/

The implementation of to_epoch_js in Listing 3 is similar to to_epoch_java. A module in JavaScript and then an MLE call specification. However, it is no longer necessary to fully map the input data types in JavaScript. The Oracle Database defines default values. These can be overridden, but do not need to be explicitly defined as in Java. It is not possible to map the output data type. In this case, it must be possible to convert the return value to a number; otherwise, a runtime error will occur.

However, the implementation for this simple case is quite extensive. Oracle has probably realised this and provided an alternative.

Listing 4: to_epoch_js2 – inline MLE call specification
create or replace function to_epoch_js2("in_ts" in timestamp)
   return number is
      mle language javascript ' return in_ts.valueOf();';
/

The to_epoch_js2 function in Listing 4 is equivalent to to_epoch_js but significantly simpler than any of the other variants. However, an inline MLE call specification is only applicable to JavaScript code without dependencies on other modules, that is, JavaScript code without an import statement.

Performance of to_epoch_...

Anyone who has worked with Java stored procedures in the database knows that the initialisation of the OJVM in a new database session slows down the response time considerably. This is not the case with MLE because it uses a GraalVM native image. Simply put, it reads only the memory contents of a file, similar to waking your laptop from hibernation. This makes it possible to start a Java program within a millisecond. The native image is integrated into the database as a shared library $ORACLE_HOME/lib/libmle.so. This means that the MLE provides JavaScript via Java, but is completely independent of the OJVM.

In Figure 1 we compare the runtimes of 100,000 function calls. Instead of seconds, we use a normalised unit of time, which makes comparing easier and results less dependent on the hardware stack used.

All experiments were performed using the Oracle Database 23.5 Free Edition on an AMD Ryzen R1600 processor-based system. The shortest time of five repetitions was taken into account. You can reproduce these experiments using the scripts in this GitHub repository.

Figure 1: Runtime of 100K calls of to_epoch_…

The to_epoch_plsql, to_epoch_java and to_epoch_js variants are called from a PL/SQL loop. This means 100,000 context switches between PL/SQL and Java or JavaScript. The fourth variant, to_epoch_jsloop, calls toEpoch from a JavaScript loop. In this case, the context switching between PL/SQL and JavaScript makes processing about 50 times slower.

Based on these results, we should avoid context switching between PL/SQL and JavaScript if possible. The performance of JavaScript in the database is impressive in this case. Quite different from the OJVM.

Memory Usage of to_epoch_...

Figure 2 shows the maximum memory used at the end of a to_epoch_… function call. The call was made in a new database session and contains the memory requirements of the measuring instruments.

Figure 2: Max. memory usage after a single call

JavaScript uses significantly more memory than PL/SQL. If you take this into account when sizing the database server and connection pools, this should not be a problem nowadays.

Using a 3rd Party JavaScript Library

Let’s say we want to validate email addresses in our database without actually sending a test email. The rules for a valid email address are quite extensive. In the JavaScript ecosystem, we can find open-source libraries for such requirements that can be used in the database without modification. For this example, we use validator.js, which can validate not only email addresses but also credit card numbers, EAN, IBAN and much more. Using SQLcl’s script command, we can load npm modules directly into the database.

Listing 5: Load validator.js from npm as validator_mod into the database
script https://raw.githubusercontent.com/PhilippSalvisberg/mle-sqlcl/main/mle.js install validator_mod https://esm.run/validator@13.12.0 13.12.0

select version, language_name, length(module)
  from user_mle_modules
 where module_name = 'VALIDATOR_MOD';
VERSION    LANGUAGE_NAME       LENGTH(MODULE)
---------- ---------------- ----------------- 
13.12.0    JAVASCRIPT                  123260

The script mle.js in Listing 5 is not read from the local file system as usual but via a URL from GitHub. The script creates a JavaScript module validator_mod with the contents of the URL https://esm.run/validator@13.12.0, which is a minimised, browser-optimised version of the validator.js module in the npm software registry. The last parameter 13.12.0 is the version of the module stored in the Oracle Data Dictionary.

In Listing 6, we create the MLE call specification in a PL/SQL package. The is_email function accepts only a string as a parameter. The validator options are defined in the package body. This simplifies uniform use in the database application.

Listing 6: PL/SQL package validator_api
create or replace package validator_api is
   function is_email(
      in_email in varchar2
   ) return boolean deterministic;
end validator_api;
/
create or replace package body validator_api is
   function is_email_internal(
      in_email   in varchar2,
      in_options in json
   ) return boolean deterministic as mle module validator_mod 
   signature 'default.isEmail(string, any)';

   function is_email(
      in_email in varchar2
   ) return boolean deterministic is
   begin
      return is_email_internal(
                in_email   => in_email,
                in_options => json('
                   {
                      "allow_display_name": false,
                      "allow_undescores": false,
                      "require_display_name": false,
                      "allow_utf8_local_part": true,
                      "require_tld": true,
                      "allow_ip_domain": false,
                      "domain_specific_validation": false,
                      "blacklisted_chars": "",
                      "ignore_max_length": false,
                      "host_blacklist": ["dubious.com"],
                      "host_whitelist": []
                   }
                ')
             );
   end is_email;
end validator_api;
/

Listing 7 shows the use of the validator in SQL. The second email address is invalid because of the allow_display_name option. The third email address is formally correct, but it uses a domain listed under host_blacklist.

Listing 7: Validate email addresses
select e_mail, validator_api.is_email(e_mail) as is_valid
  from (values
          ('esther.muster@example.com'),
          ('Esther Muster <esther.muster@example.com>'),
          ('esther.muster@dubious.com')
       ) test_data (e_mail);
E_MAIL                                      IS_VALID
----------------------------------------- ----------
esther.muster@example.com                          1
Esther Muster <esther.muster@example.com>          0
esther.muster@dubious.com                          0

JavaScript With SQL

The MLE provides a global variable session of type IConnection to communicate with the current database session. Listing 8 shows an example of a simple update statement using bind variables.

Listing 8: increase_salary_js
create or replace mle module increase_salary_mod 
language javascript as
export function increase_salary(deptno, by_percent) {
   session.execute(`
      update emp
         set sal = sal + sal * :by_percent / 100
         where deptno = :deptno`, [by_percent, deptno]);
}
/
create or replace procedure increase_salary_js(
   in_deptno     in number,
   in_by_percent in number
) as mle module increase_salary_mod
signature 'increase_salary(number, number)';
/

PL/SQL With SQL

Listing 9 shows the PL/SQL counterpart to the JavaScript code in Listing 8, using dynamic SQL with bind variables.

Listing 9: increase_salary_dplsql
create or replace procedure increase_salary_dplsql(
   in_deptno     in number,
   in_by_percent in number
) is
begin
   execute immediate '
      update emp
         set sal = sal + sal * :by_percent / 100
      where deptno = :deptno' using in_by_percent, in_deptno;
end increase_salary_dplsql;
/

Experienced PL/SQL developers would not write it this way, as syntax and semantic errors are only thrown at runtime. In addition, it is more expensive for the Oracle Database to execute dynamic SQL, and the use of database objects is not stored in the Oracle Data Dictionary. Instead, experienced PL/SQL developers use static SQL whenever possible and sensible. The code is shorter and SQL injection is impossible. Listing 10 shows the static SQL variant.

Listing 10: increase_salary_plsql
create or replace procedure increase_salary_plsql(
   in_deptno     in number,
   in_by_percent in number
) is
begin
   update emp
      set sal = sal + sal * in_by_percent / 100
    where deptno = in_deptno;
end increase_salary_plsql;
/

Performance of increase_salary_...

Figure 3 compares the runtimes of 100,000 procedure calls in a PL/SQL loop. Only increase_salary_jsloop uses a JavaScript loop. This avoids 100,000 context switches between PL/SQL and JavaScript. In other words, the difference between increase_salary_js and increase_salary_jsloop is the cost of 100,000 context switches.

Figure 3: Runtime of 100,000 procedure calls

In the Oracle Database version 23.5, JavaScript is about 5 to 6 times slower than PL/SQL in this example when we use dynamic SQL. In the Oracle Database version 23.3, the difference was a factor of 7, which makes me optimistic that we can expect further performance improvements in future versions.

Based on these experiments, it is difficult to make general statements about the performance differences between PL/SQL and JavaScript. However, it appears that PL/SQL code with SQL statements has an advantage over JavaScript.

MLE Environment

Accessing the network using the JavaScript Fetch API is possible if the appropriate permissions have been granted using the PL/SQL package dbms_network_acl_admin. However, for security reasons, JavaScript cannot access the database server’s file system.
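
A hedged sketch of such a grant; the host and the grantee MLE_DEMO are made-up examples.

-- Hedged sketch: granting connect/resolve privileges on a host so that
-- JavaScript code owned by MLE_DEMO may reach it via the Fetch API.
begin
   dbms_network_acl_admin.append_host_ace(
      host => 'api.example.com',
      ace  => xs$ace_type(
                 privilege_list => xs$name_list('connect', 'resolve'),
                 principal_name => 'MLE_DEMO',
                 principal_type => xs_acl.ptype_db));
end;
/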

JavaScript runtime environments such as Node.js, Deno, Bun or web browsers access the file system to import other JavaScript modules. For that, you need an MLE environment in the Oracle Database. Listing 11 shows how to create and use it.

Listing 11: Using an MLE environment
create or replace mle env demo_env
    imports(
        'increase_salary' module increase_salary_mod,
        'validator'       module validator_mod,
        'util'            module util_mod
    )
    language options  'js.strict=true, js.console=false,
                       js.polyglot-builtin=true'
;
create or replace mle module increase_salary_loop_mod 
language javascript as   
import {increase_salary} from "increase_salary";
export function increase_salary_loop(deptno,by_percent,times){
   for (let i=0; i<times; i++) {
      increase_salary(deptno, by_percent);
   }
}
/
create or replace procedure increase_salary_jsloop(
   in_deptno     in number,
   in_by_percent in number,
   in_times      in number
) as mle module increase_salary_loop_mod
env demo_env
signature 'increase_salary_loop(number, number, number)';
/

The MLE environment demo_env maps the import name increase_salary to the MLE module increase_salary_mod. This import is used in the MLE module increase_salary_loop_mod. However, the MLE environment is not assigned there. This is only done in the MLE call specification increase_salary_jsloop.

MLE environments allow JavaScript code to be structured in the same way inside and outside the database. In most cases, a single MLE environment will be sufficient for an application. Multiple MLE environments are required if different language options are used per module, or if different versions of a module are to be loaded with the same import name.

Is Tom Kyte’s Mantra Still Valid?

One of the things Tom Kyte is famous for is his mantra. There are several variations, but all have the same message. This variant is from Expert Oracle Database Architecture, Third Edition, 2014. On page 3 he writes:

I have a pretty simple mantra when it comes to developing database software, one that has been consistent for many years:

  • You should do it in a single SQL statement if at all possible. And believe it or not, it is almost always possible. This statement is even truer as time goes on. SQL is an extremely powerful language.
  • If you can’t do it in a single SQL Statement, do it in PL/SQL—as little PL/SQL as possible! Follow the saying that goes “more code = more bugs, less code = less bugs.”
  • If you can’t do it in PL/SQL, try a Java stored procedure. The times this is necessary are extremely rare nowadays with Oracle9i and above. PL/SQL is an extremely competent, fully featured 3GL.
  • If you can’t do it in Java, do it in a C external procedure. This is most frequently the approach when raw speed or using a third-party API written in C is needed.
  • If you can’t do it in a C external routine, you might want to seriously think about why it is you need to do it.

With Oracle Database 23ai, I would put JavaScript on the same level as PL/SQL. Before Java, definitely. Furthermore, it is not just a question of whether something can be done in SQL or PL/SQL. If we need a functionality that already exists in the JavaScript ecosystem, we should consider using it rather than reimplementing it in SQL or PL/SQL just because it’s possible. Ultimately, it is also about the maintainability of the application and the technical debt we are incurring.

Conclusion

MLE was introduced as an experimental feature at Oracle Open World 2017. Since then, MLE and the underlying GraalVM technology have been continuously improved and have reached a good, production-ready state in Oracle Database 23ai. It is ideally suited for integrating existing, tested functionality from the JavaScript ecosystem into the Oracle Database.

We still need to figure out how to develop, test, debug and deploy JavaScript with SQL. In any case, JavaScript is a real alternative to PL/SQL, even if PL/SQL scores with static SQL and better performance.

The post PL/SQL vs. JavaScript in the Oracle Database 23ai #JoelKallmanDay appeared first on Philipp Salvisberg's Blog.


Evolution of a SQL Domain for Semantic Versioning

1. Introduction

In my current project, I use an SQL domain to implement the formatting and precedence rules for Semantic Versioning. I started with a simple implementation covering only the most basic rules. Getting the sorting right is key in my project. It allows me to identify the latest compatible version of an artefact. After adding some tests I started to evolve the functionality to a point where I can say that the implementation covers everything needed for my use case.

Domains make data models easier to understand and reduce the amount of logic needed for data consistency and visualisation. In this blog post, I demonstrate this by evolving a domain for Semantic Versioning. This should give you an impression of whether keeping domains and table columns in sync is more complicated than traditional approaches.

2. What Are Domains?

A data use case domain (aka SQL domain, aka domain) is a new feature in the Oracle Database 23ai based on the SQL standard. Domains abstract column properties so that they can be used across multiple tables. They come in different flavours:

  • Single-column domain (constraints for a single column, enums, display and order expressions)
  • Multi-column domain (constraints across multiple columns, display and order expressions)
  • Flexible domain (abstract domain, delegating functionality to concrete domains via discriminator columns)

It’s important to note that a table column can have only one domain. It is therefore not possible to combine single-column domains with multi-column or flexible domains.

3. Starting Model

3.1 Domains

First, let’s create three domains.

1) create domains app_identifier, app_file_name and app_semantic_version
create domain if not exists app_identifier
   as raw(16) strict
   display lower(substr(rawtohex(app_identifier), 1, 8)
           || '-' || substr(rawtohex(app_identifier), 9, 4)
           || '-' || substr(rawtohex(app_identifier), 13, 4)
           || '-' || substr(rawtohex(app_identifier), 17, 4)
           || '-' || substr(rawtohex(app_identifier), 21, 12));

create domain if not exists app_file_name
   as varchar2(128 char) strict;

create domain if not exists app_semantic_version
   as varchar2(20 byte) strict
   -- naïve, incomplete implementation of semantic versioning
   constraint app_semantic_version_has_major_minor_patch_ck
      check (regexp_like(app_semantic_version, '\d{1,6}\.\d{1,6}\.\d{1,6}'))
   order to_char(to_number(regexp_substr(app_semantic_version, '\d+', 1, 1)), 'FM000000')
      || '.' || to_char(to_number(regexp_substr(app_semantic_version, '\d+', 1, 2)), 'FM000000')
      || '.' || to_char(to_number(regexp_substr(app_semantic_version, '\d+', 1, 3)), 'FM000000');
Domain APP_IDENTIFIER created.


Domain APP_FILE_NAME created.


Domain APP_SEMANTIC_VERSION created.

The domain app_identifier defines the data type for a GUID-based identifier. It also defines the display format.

The domain app_file_name defines only the data type used for file names.

The domain app_semantic_version is an incomplete implementation of Semantic Versioning 2.0.0 (we will fix that later). It defines the data type, a check constraint app_semantic_version_has_major_minor_patch_ck and an order expression.

3.2 Table

Let’s create a table using all these domains and insert a few rows.

2) create table with some data
create table if not exists app_files (
    file_id      app_identifier       default sys_guid() not null,
    file_name    app_file_name                           not null,
    file_version app_semantic_version                    not null,
    constraint app_files_pk primary key (file_id),
    constraint app_files_uk1 unique (file_name, file_version)
);

desc app_files

insert into app_files
   (file_name, file_version)
values
   ('file1.txt', '1.9.0'),
   ('file1.txt', '1.10.0'),
   ('file1.txt', '1.11.0'),
   ('file1.txt', '2.0.0'),
   ('file2.txt', '0.0.7'),
   ('file2.txt', '0.0.42');
Table APP_FILES created.

Name         Null?    Type                                     
------------ -------- ---------------------------------------- 
FILE_ID      NOT NULL RAW(16 BYTE) DOMAIN APP_IDENTIFIER       
FILE_NAME    NOT NULL VARCHAR2(128 CHAR) DOMAIN APP_FILE_NAME  
FILE_VERSION NOT NULL VARCHAR2(20) DOMAIN APP_SEMANTIC_VERSION 

6 rows inserted.

Look at the result of the desc app_files command. Each column has a concrete data type inherited from the domain, as well as the domain associated with it.

3.3 Demo

Now, let’s query the data to demonstrate the domain functionality.

3) demo domain functionality
column file_name format a9
select domain_display(file_id) as file_id,
       file_name,
       file_version,
       domain_order(file_version) as file_version_order
  from app_files
 order by file_name, file_version_order desc;
FILE_ID                              FILE_NAME FILE_VERSION         FILE_VERSION_ORDER     
------------------------------------ --------- -------------------- -----------------------
29ed8fd9-57c7-4b4d-e063-03e0a8c091f8 file1.txt 2.0.0                000002.000000.000000   
29ed8fd9-57c6-4b4d-e063-03e0a8c091f8 file1.txt 1.11.0               000001.000011.000000   
29ed8fd9-57c5-4b4d-e063-03e0a8c091f8 file1.txt 1.10.0               000001.000010.000000   
29ed8fd9-57c4-4b4d-e063-03e0a8c091f8 file1.txt 1.9.0                000001.000009.000000   
29ed8fd9-57c9-4b4d-e063-03e0a8c091f8 file2.txt 0.0.42               000000.000000.000042   
29ed8fd9-57c8-4b4d-e063-03e0a8c091f8 file2.txt 0.0.7                000000.000000.000007   

6 rows selected. 

You can see that the file versions are sorted according to item 11.2 of the specification:

“Precedence is determined by the first difference when comparing each of these identifiers from left to right as follows: Major, minor, and patch versions are always compared numerically.”

4. Test

The Semantic Versioning specification covers a lot of ground that our simple implementation barely addresses. So let’s test it to find out what is missing. We use utPLSQL, of course.
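Before writing the full test, a single domain_check call already shows the kind of verification we are after. A minimal ad-hoc sketch (the literal is just an example value; domain_check returns a boolean that can be selected directly in 23ai):

select domain_check(app_semantic_version, '1.0.0-alpha') as is_valid
  from dual;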

4.1 Define Test

Let’s create a utPLSQL test PL/SQL package.

4) utPLSQL test package
create or replace package test_app_domains is
   --%suite

   --%test
   procedure test_app_semantic_version;
end test_app_domains;
/

create or replace package body test_app_domains is
   procedure test_app_semantic_version is
      c_actual   sys_refcursor;
      c_expected sys_refcursor;
   begin
      open c_actual for
         select input_col,
                to_char(domain_check(app_semantic_version, input_col)) as test_check,
                case when domain_check(app_semantic_version, input_col) then domain_order(cast(input_col as app_semantic_version)) end as test_order
           from (values
                   -- https://semver.org/#spec-item-2
                   ('1.9.0'),
                   ('1.10.0'),
                   ('1.11.0'),
                   -- https://semver.org/#spec-item-9
                   ('1.0.0-alpha'),
                   ('1.0.0-alpha.1'),
                   ('1.0.0-0.3.7'),
                   ('1.0.0-x.7.z.92'),
                   ('1.0.0-x-y-z.--'),
                   -- https://semver.org/#spec-item-10
                   ('1.0.0-alpha+001'),
                   ('1.0.0+20130313144700'),
                   ('1.0.0-beta+exp.sha.5114f85'),
                   ('1.0.0+21AF26D3----117B344092BD'),
                   -- https://semver.org/#spec-item-11
                   ('1.0.0'),
                   ('2.0.0'),
                   ('2.1.0'),
                   ('2.1.1'),
                   ('1.0.0-alpha'),
                   ('1.0.0-alpha.1'),
                   ('1.0.0-alpha.beta'),
                   ('1.0.0-beta'),
                   ('1.0.0-beta.2'),
                   ('1.0.0-beta.11'),
                   ('1.0.0-rc.1'),
                   -- invalid
                   ('01.0.0'),
                   ('1.00.0'),
                   ('1.0.00'),
                   ('1.0.x')
                ) as t(input_col);
      open c_expected for
         select *
           from (values
                   -- https://semver.org/#spec-item-2
                   ('1.9.0',                          'TRUE',  '000001.000009.000000(1)'),
                   ('1.10.0',                         'TRUE',  '000001.000010.000000(1)'),
                   ('1.11.0',                         'TRUE',  '000001.000011.000000(1)'),
                   -- https://semver.org/#spec-item-9
                   ('1.0.0-alpha',                    'TRUE',  '000001.000000.000000(0)-alpha'),
                   ('1.0.0-alpha.1',                  'TRUE',  '000001.000000.000000(0)-alpha.000001'),
                   ('1.0.0-0.3.7',                    'TRUE',  '000001.000000.000000(0)-000000.000003.000007'),
                   ('1.0.0-x.7.z.92',                 'TRUE',  '000001.000000.000000(0)-x.000007.z.000092'),
                   ('1.0.0-x-y-z.--',                 'TRUE',  '000001.000000.000000(0)-x-y-z.--'),
                   -- https://semver.org/#spec-item-10
                   ('1.0.0-alpha+001',                'TRUE',  '000001.000000.000000(0)-alpha'),
                   ('1.0.0+20130313144700',           'TRUE',  '000001.000000.000000(1)'),
                   ('1.0.0-beta+exp.sha.5114f85',     'TRUE',  '000001.000000.000000(0)-beta'),
                   ('1.0.0+21AF26D3----117B344092BD', 'TRUE',  '000001.000000.000000(1)'),
                   -- https://semver.org/#spec-item-11
                   ('1.0.0',                          'TRUE',  '000001.000000.000000(1)'),
                   ('2.0.0',                          'TRUE',  '000002.000000.000000(1)'),
                   ('2.1.0',                          'TRUE',  '000002.000001.000000(1)'),
                   ('2.1.1',                          'TRUE',  '000002.000001.000001(1)'),
                   ('1.0.0-alpha',                    'TRUE',  '000001.000000.000000(0)-alpha'),
                   ('1.0.0-alpha.1',                  'TRUE',  '000001.000000.000000(0)-alpha.000001'),
                   ('1.0.0-alpha.beta',               'TRUE',  '000001.000000.000000(0)-alpha.beta'),
                   ('1.0.0-beta',                     'TRUE',  '000001.000000.000000(0)-beta'),
                   ('1.0.0-beta.2',                   'TRUE',  '000001.000000.000000(0)-beta.000002'),
                   ('1.0.0-beta.11',                  'TRUE',  '000001.000000.000000(0)-beta.000011'),
                   ('1.0.0-rc.1',                     'TRUE',  '000001.000000.000000(0)-rc.000001'),
                   -- invalid
                   ('01.0.0',                         'FALSE', null),
                   ('1.00.0',                         'FALSE', null),
                   ('1.0.00',                         'FALSE', null),
                   ('1.0.x',                          'FALSE', null)
                ) as t(input_col, test_check, test_order);
      ut.expect(c_actual).to_equal(c_expected).join_by('INPUT_COL');
   end test_app_semantic_version;
end test_app_domains;
/
Package TEST_APP_DOMAINS compiled


Package Body TEST_APP_DOMAINS compiled 

The test compares the cursors for the actual and expected results.

You will get a compile error for the package body if you do not have the utPLSQL framework installed on your database instance. See the utPLSQL installation guide for information on how to install utPLSQL. It’s quite simple and works on an Oracle Database 23ai.

4.2 Run Test

Now let’s run the test.

5) run utPLSQL test
set serveroutput on size unlimited;
exec ut.run;
test_app_domains
  test_app_semantic_version [.087 sec] (FAILED - 1)
 
Failures:
 
  1) test_app_semantic_version
      Actual: refcursor [ count = 27 ] was expected to equal: refcursor [ count = 27 ]
      Diff:
      Rows: [ 30 differences, showing first 20 ]
        PK <INPUT_COL>1.9.0</INPUT_COL> - Actual:   <TEST_ORDER>000001.000009.000000</TEST_ORDER>
        PK <INPUT_COL>1.9.0</INPUT_COL> - Expected: <TEST_ORDER>000001.000009.000000(1)</TEST_ORDER>
        PK <INPUT_COL>1.10.0</INPUT_COL> - Actual:   <TEST_ORDER>000001.000010.000000</TEST_ORDER>
        PK <INPUT_COL>1.10.0</INPUT_COL> - Expected: <TEST_ORDER>000001.000010.000000(1)</TEST_ORDER>
        PK <INPUT_COL>1.11.0</INPUT_COL> - Actual:   <TEST_ORDER>000001.000011.000000</TEST_ORDER>
        PK <INPUT_COL>1.11.0</INPUT_COL> - Expected: <TEST_ORDER>000001.000011.000000(1)</TEST_ORDER>
        PK <INPUT_COL>1.0.0-alpha</INPUT_COL> - Actual:   <TEST_ORDER>000001.000000.000000</TEST_ORDER>
        PK <INPUT_COL>1.0.0-alpha</INPUT_COL> - Expected: <TEST_ORDER>000001.000000.000000(0)-alpha</TEST_ORDER>
        PK <INPUT_COL>1.0.0-alpha</INPUT_COL> - Actual:   <TEST_ORDER>000001.000000.000000</TEST_ORDER>
        PK <INPUT_COL>1.0.0-alpha</INPUT_COL> - Expected: <TEST_ORDER>000001.000000.000000(0)-alpha</TEST_ORDER>
        PK <INPUT_COL>1.0.0-alpha.1</INPUT_COL> - Actual:   <TEST_ORDER>000001.000000.000000</TEST_ORDER>
        PK <INPUT_COL>1.0.0-alpha.1</INPUT_COL> - Expected: <TEST_ORDER>000001.000000.000000(0)-alpha.000001</TEST_ORDER>
        PK <INPUT_COL>1.0.0-alpha.1</INPUT_COL> - Actual:   <TEST_ORDER>000001.000000.000000</TEST_ORDER>
        PK <INPUT_COL>1.0.0-alpha.1</INPUT_COL> - Expected: <TEST_ORDER>000001.000000.000000(0)-alpha.000001</TEST_ORDER>
        PK <INPUT_COL>1.0.0-0.3.7</INPUT_COL> - Actual:   <TEST_ORDER>000001.000000.000000</TEST_ORDER>
        PK <INPUT_COL>1.0.0-0.3.7</INPUT_COL> - Expected: <TEST_ORDER>000001.000000.000000(0)-000000.000003.000007</TEST_ORDER>
        PK <INPUT_COL>1.0.0-x.7.z.92</INPUT_COL> - Actual:   <TEST_ORDER>000001.000000.000000</TEST_ORDER>
        PK <INPUT_COL>1.0.0-x.7.z.92</INPUT_COL> - Expected: <TEST_ORDER>000001.000000.000000(0)-x.000007.z.000092</TEST_ORDER>
        PK <INPUT_COL>1.0.0-x-y-z.--</INPUT_COL> - Actual:   <TEST_ORDER>000001.000000.000000</TEST_ORDER>
        PK <INPUT_COL>1.0.0-x-y-z.--</INPUT_COL> - Expected: <TEST_ORDER>000001.000000.000000(0)-x-y-z.--</TEST_ORDER>
        PK <INPUT_COL>1.0.0-alpha+001</INPUT_COL> - Actual:   <TEST_ORDER>000001.000000.000000</TEST_ORDER>
        PK <INPUT_COL>1.0.0-alpha+001</INPUT_COL> - Expected: <TEST_ORDER>000001.000000.000000(0)-alpha</TEST_ORDER>
        PK <INPUT_COL>1.0.0+20130313144700</INPUT_COL> - Actual:   <TEST_ORDER>000001.000000.000000</TEST_ORDER>
        PK <INPUT_COL>1.0.0+20130313144700</INPUT_COL> - Expected: <TEST_ORDER>000001.000000.000000(1)</TEST_ORDER>
        PK <INPUT_COL>1.0.0-beta+exp.sha.5114f85</INPUT_COL> - Actual:   <TEST_CHECK>FALSE</TEST_CHECK><TEST_ORDER/>
        PK <INPUT_COL>1.0.0-beta+exp.sha.5114f85</INPUT_COL> - Expected: <TEST_CHECK>TRUE</TEST_CHECK><TEST_ORDER>000001.000000.000000(0)-beta</TEST_ORDER>
        PK <INPUT_COL>1.0.0+21AF26D3----117B344092BD</INPUT_COL> - Actual:   <TEST_CHECK>FALSE</TEST_CHECK><TEST_ORDER/>
        PK <INPUT_COL>1.0.0+21AF26D3----117B344092BD</INPUT_COL> - Expected: <TEST_CHECK>TRUE</TEST_CHECK><TEST_ORDER>000001.000000.000000(1)</TEST_ORDER>
        PK <INPUT_COL>1.0.0</INPUT_COL> - Actual:   <TEST_ORDER>000001.000000.000000</TEST_ORDER>
        PK <INPUT_COL>1.0.0</INPUT_COL> - Expected: <TEST_ORDER>000001.000000.000000(1)</TEST_ORDER>
        PK <INPUT_COL>2.0.0</INPUT_COL> - Actual:   <TEST_ORDER>000002.000000.000000</TEST_ORDER>
        PK <INPUT_COL>2.0.0</INPUT_COL> - Expected: <TEST_ORDER>000002.000000.000000(1)</TEST_ORDER>
        PK <INPUT_COL>2.1.0</INPUT_COL> - Actual:   <TEST_ORDER>000002.000001.000000</TEST_ORDER>
        PK <INPUT_COL>2.1.0</INPUT_COL> - Expected: <TEST_ORDER>000002.000001.000000(1)</TEST_ORDER>
        PK <INPUT_COL>2.1.1</INPUT_COL> - Actual:   <TEST_ORDER>000002.000001.000001</TEST_ORDER>
        PK <INPUT_COL>2.1.1</INPUT_COL> - Expected: <TEST_ORDER>000002.000001.000001(1)</TEST_ORDER>
        PK <INPUT_COL>1.0.0-alpha</INPUT_COL> - Actual:   <TEST_ORDER>000001.000000.000000</TEST_ORDER>
        PK <INPUT_COL>1.0.0-alpha</INPUT_COL> - Expected: <TEST_ORDER>000001.000000.000000(0[...]
      at "DEMO42.TEST_APP_DOMAINS.TEST_APP_SEMANTIC_VERSION", line 80 ut.expect(c_actual).to_equal(c_expected).join_by('INPUT_COL');
       
Finished in .091311 seconds
1 tests, 1 failed, 0 errored, 0 disabled, 0 warning(s)
 


PL/SQL procedure successfully completed.

The diff lines for 1.0.0-alpha+001 explain the problems with our current implementation, which are:

  1. Missing support for pre-release labels (-alpha)
  2. Missing support for build metadata (+001, leading zeros are allowed and preserved)
  3. Missing precedence handling between releases and pre-releases (releases (1) are newer than pre-releases (0))
  4. Missing precedence handling for pre-releases (numeric qualifiers to be compared numerically)
  5. Missing precedence handling for build metadata (to be ignored)

5. Fix

5.1 Drop domain

Before deploying a new version of the domain, we have to drop the existing one. alter domain is not applicable because we need to change the length of the data type and a create or replace syntax variant does not exist for domains.

6) drop domain
drop domain if exists app_semantic_version;
Error starting at line : 1 in command -
drop domain if exists app_semantic_version
Error report -
ORA-11502: The domain APP_SEMANTIC_VERSION to be dropped has dependent objects.

https://docs.oracle.com/error-help/db/ora-11502/
11502. 0000 -  "The domain %s to be dropped has dependent objects."
*Cause:    An attempt is made to drop a domain with dependent objects.
*Action:   Drop the domain using the FORCE mode

Okay, that didn’t work, but the error message is good. It lets us know what to do.

So let’s try again with the force option.

7) drop domain with force option
drop domain if exists app_semantic_version force;
Domain APP_SEMANTIC_VERSION dropped.

5.2 Recreate domain

Now we can deploy the fixed variant of the domain.

8) recreate domain
create domain if not exists app_semantic_version
   as varchar2(60 byte) strict
   -- valid examples: '0.13.0', '23.5.0', '123456.789012.345678'
   -- valid pre-release examples: '1.0.0-alpha', '1.0.0-alpha.1', '1.0.0-0.3.7', '1.0.0-x.7.z.92'
   -- valid build metadata examples: '1.0.0+20130313144700', '1.0.0-beta+exp.sha.5114f85'
   -- use suggested regex from https://semver.org/spec/v2.0.0.html#is-there-a-suggested-regular-expression-regex-to-check-a-semver-string without non-capturing groups (?:)
   constraint app_semantic_version_has_major_minor_patch_ck
      check (regexp_like(app_semantic_version, '^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-((0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(\.(0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(\+([0-9a-zA-Z-]+(\.[0-9a-zA-Z-]+)*))?$'))
   -- add leading zeroes to major, minor, patch and numeric qualifiers in pre-release for sorting (normalized semantic version)
   -- supports up to 6 digits for major, minor, patch and numeric qualifiers in pre-release
   order to_char(to_number(regexp_substr(app_semantic_version, '\d+', 1, 1)), 'FM000000')
      || '.' || to_char(to_number(regexp_substr(app_semantic_version, '\d+', 1, 2)), 'FM000000')
      || '.' || to_char(to_number(regexp_substr(app_semantic_version, '\d+', 1, 3)), 'FM000000')
      -- sort pre-release versions (0) before final versions (1)
      -- build metadata is ignored for sorting; versions that differ only in build metadata have the same precedence according to the specification
      || case
            when instr(app_semantic_version, '-') > 0
               and (instr(app_semantic_version, '+') = 0 or instr(app_semantic_version, '-') < instr(app_semantic_version, '+'))
            then
               -- sort pre-release according to the qualifiers after the hyphen, ignoring build metadata; add leading zeroes to qualifiers starting with a number
               '(0)' || regexp_replace( --  workaround part 2: remove superfluous leading zeroes
                           regexp_replace( -- workaround part 1: add 6 leading zeroes to numeric qualifiers
                              regexp_replace( -- remove build metadata
                                 substr(app_semantic_version, instr(app_semantic_version, '-')),
                                 '\+.+$',
                                 null
                              ),
                              '(\.|\-|^)(\d{1,6})',
                              '\1@000000\2@' -- workaround since \2 in lpad, to_char is not evaluated before calling the function
                           ),
                           '@[0]+(\d{6})@',
                           '\1'
                        )
            else
               '(1)'
         end;
Domain APP_SEMANTIC_VERSION created.

The details of the fix are not that important, apart from the fact that the data type changed from varchar2(20 byte) to varchar2(60 byte).

This variant should fully support Semantic Versioning 2.0.0 as long as numeric qualifiers do not require more than 6 digits. This is good enough for my use case.

5.3 Re-test

It’s now time to re-run the previously failed utPLSQL test.

9) re-run utPLSQL test
set serveroutput on size unlimited;
exec ut.run;
test_app_domains
  test_app_semantic_version [.02 sec]
 
Finished in .021919 seconds
1 tests, 0 failed, 0 errored, 0 disabled, 0 warning(s)
 


PL/SQL procedure successfully completed.

Looks good.

5.4 Check Table

Now let’s look at the structure of the app_files table.

10) describe app_files before change
desc app_files
Name         Null?    Type                                    
------------ -------- --------------------------------------- 
FILE_ID      NOT NULL RAW(16 BYTE) DOMAIN APP_IDENTIFIER      
FILE_NAME    NOT NULL VARCHAR2(128 CHAR) DOMAIN APP_FILE_NAME 
FILE_VERSION NOT NULL VARCHAR2(20)                            

The file_version column has a length of 20 instead of 60 bytes, and the association with the app_semantic_version domain is missing.

5.5 Modify Table

We can fix the column length and associate the column with a domain in one go.

11) modify table app_files and describe it
alter table app_files
   modify (file_version varchar2(60 byte) domain app_semantic_version);

desc app_files
Table APP_FILES altered.

Name         Null?    Type                                     
------------ -------- ---------------------------------------- 
FILE_ID      NOT NULL RAW(16 BYTE) DOMAIN APP_IDENTIFIER       
FILE_NAME    NOT NULL VARCHAR2(128 CHAR) DOMAIN APP_FILE_NAME  
FILE_VERSION NOT NULL VARCHAR2(60) DOMAIN APP_SEMANTIC_VERSION                        

This looks good and was quite easy.

Let’s query the data in this table using the new domain variant.

12) query app_files
column file_name format a9
column file_version format a12
column file_version_order format a25
select domain_display(file_id) as file_id,
       file_name,
       file_version,
       domain_order(file_version) as file_version_order
  from app_files
 order by file_name, file_version_order desc;
FILE_ID                              FILE_NAME FILE_VERSION FILE_VERSION_ORDER       
------------------------------------ --------- ------------ -------------------------
29ed8fd9-57c7-4b4d-e063-03e0a8c091f8 file1.txt 2.0.0        000002.000000.000000(1)  
29ed8fd9-57c6-4b4d-e063-03e0a8c091f8 file1.txt 1.11.0       000001.000011.000000(1)  
29ed8fd9-57c5-4b4d-e063-03e0a8c091f8 file1.txt 1.10.0       000001.000010.000000(1)  
29ed8fd9-57c4-4b4d-e063-03e0a8c091f8 file1.txt 1.9.0        000001.000009.000000(1)  
29ed8fd9-57c9-4b4d-e063-03e0a8c091f8 file2.txt 0.0.42       000000.000000.000042(1)  
29ed8fd9-57c8-4b4d-e063-03e0a8c091f8 file2.txt 0.0.7        000000.000000.000007(1)  

6 rows selected. 

The only visible change is the release version indicator (1) at the end of the file_version_order column. We need it to determine the correct order when pre-release versions are involved.

Let’s add some pre-release versions and versions with build metadata and query the table again.

13) insert pre-release versions and re-query
insert into app_files
   (file_name, file_version)
values
   ('file3.txt', '1.0.0-beta'),
   ('file3.txt', '1.0.0-beta.2'),
   ('file3.txt', '1.0.0-beta.11.2.3.4'),
   ('file3.txt', '1.0.0-beta+exp.sha.5114f85'),
   ('file3.txt', '1.0.0+21AF26D3----117B344092BD');

column file_name format a9
column file_version format a30
column file_version_order format a57
select domain_display(file_id) as file_id,
       file_name,
       file_version,
       domain_order(file_version) as file_version_order
  from app_files
 order by file_name, file_version_order desc;
5 rows inserted.


FILE_ID                              FILE_NAME FILE_VERSION                   FILE_VERSION_ORDER                                       
------------------------------------ --------- ------------------------------ ---------------------------------------------------------
29ed8fd9-57c7-4b4d-e063-03e0a8c091f8 file1.txt 2.0.0                          000002.000000.000000(1)                                  
29ed8fd9-57c6-4b4d-e063-03e0a8c091f8 file1.txt 1.11.0                         000001.000011.000000(1)                                  
29ed8fd9-57c5-4b4d-e063-03e0a8c091f8 file1.txt 1.10.0                         000001.000010.000000(1)                                  
29ed8fd9-57c4-4b4d-e063-03e0a8c091f8 file1.txt 1.9.0                          000001.000009.000000(1)                                  
29ed8fd9-57c9-4b4d-e063-03e0a8c091f8 file2.txt 0.0.42                         000000.000000.000042(1)                                  
29ed8fd9-57c8-4b4d-e063-03e0a8c091f8 file2.txt 0.0.7                          000000.000000.000007(1)                                  
29fb5aeb-44d7-39e2-e063-03e0a8c01f4e file3.txt 1.0.0+21AF26D3----117B344092BD 000001.000000.000000(1)                                  
29fb5aeb-44d5-39e2-e063-03e0a8c01f4e file3.txt 1.0.0-beta.11.2.3.4            000001.000000.000000(0)-beta.000011.000002.000003.000004 
29fb5aeb-44d4-39e2-e063-03e0a8c01f4e file3.txt 1.0.0-beta.2                   000001.000000.000000(0)-beta.000002                      
29fb5aeb-44d3-39e2-e063-03e0a8c01f4e file3.txt 1.0.0-beta                     000001.000000.000000(0)-beta                             
29fb5aeb-44d6-39e2-e063-03e0a8c01f4e file3.txt 1.0.0-beta+exp.sha.5114f85     000001.000000.000000(0)-beta                             

11 rows selected. 

Look at the five rows for file3.txt. The pre-release labels are considered in the file_version_order column, but the build metadata is ignored.

Now we can easily determine the latest version of a file.

14) query latest versions of all files
select file_name, file_version as latest_file_version
  from (
          select file_name,
                 file_version,
                 domain_order(file_version) as current_version,
                 max(domain_order(file_version)) over (partition by file_name) as max_version
            from app_files
       )
 where current_version = max_version
 order by file_name;
FILE_NAME LATEST_FILE_VERSION                                         
--------- ------------------------------------------------------------
file1.txt 2.0.0                                                       
file2.txt 0.0.42                                                      
file3.txt 1.0.0+21AF26D3----117B344092BD                                     

The domain nicely hides the implementation details of the semantic versioning precedence rules.
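A functionally equivalent formulation (a sketch of an alternative, not taken from the original post) aggregates with keep (dense_rank last), again relying solely on the domain’s order expression:

select file_name,
       max(file_version) keep (
          dense_rank last order by domain_order(file_version)
       ) as latest_file_version
  from app_files
 group by file_name
 order by file_name;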

6. Alternatives to Domains

The following table shows the domain features and their alternatives within the Oracle Database.

Domain Feature                             Alternative
-----------------------------------------  ----------------------------------------
❌ Data type of a domain column            ❌ Data type of a table column
❌ Enums                                   🤔 Lookup table
❌ Constraint on single-column domain      ❌ Table constraint
❌ Constraint on multi-column domain       ❌ Table constraint
❌ Flexible domain                         ❌ Table constraint
❌ Validate JSON column against a schema   ❌ Table constraint
❌ Collation                               ❌ Table column collation
✅ Annotation                              ✅ Table column annotation
✅ Display expression                      ✅ Function (standalone, package, type)
✅ Order expression                        ✅ Function (standalone, package, type)

The emojis have the following meanings:

  • ✅ A change is possible without impacting the underlying table/data (e.g. alter domain is applicable)
  • 🤔 A change might have an impact on the underlying table/data (e.g. removing/adding enum item)
  • ❌ A change will have an impact on the underlying table/data (e.g. alter table or data migration)

The main difference between the domain features and their alternatives is that domains provide an abstraction in a standardised way. This should make the models easier to understand and therefore easier to maintain.
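To illustrate the “Function (standalone, package, type)” alternative to a display expression, here is a minimal sketch that reimplements the GUID formatting of the app_identifier domain as a standalone function. The function name format_guid is made up for this example.

create or replace function format_guid(in_guid in raw) return varchar2
   deterministic
is
begin
   -- same formatting logic as the display expression of app_identifier
   return lower(substr(rawtohex(in_guid), 1, 8)
      || '-' || substr(rawtohex(in_guid), 9, 4)
      || '-' || substr(rawtohex(in_guid), 13, 4)
      || '-' || substr(rawtohex(in_guid), 17, 4)
      || '-' || substr(rawtohex(in_guid), 21, 12));
end format_guid;
/

-- usage example: select format_guid(file_id) as file_id from app_files;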

7. Conclusion

A tough part of evolving data structures is changing data types. Domains neither simplify nor complicate this. You just use a different series of statements for the change.

However, reassociating domains with columns can become tedious for domains used in many columns. It’s a good idea to record where a domain is used before dropping it. This is certainly a useful area for annotations and helper scripts. Perhaps future versions of the Oracle Database will allow us to disable domain associations instead of dropping them, similar to constraints. This would be very helpful in preventing the loss of important information and would relieve us of the burden of managing additional metadata.
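One way to record the usage is to query the dictionary before dropping the domain. A sketch, assuming the domain_name and domain_owner columns of user_tab_cols that were introduced in 23ai:

-- list the columns associated with the domain before dropping it
select table_name, column_name, data_type, data_length
  from user_tab_cols
 where domain_name = 'APP_SEMANTIC_VERSION'
 order by table_name, column_name;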

Furthermore, the lack of support for domains in PL/SQL and on virtual columns limits their usefulness. Hopefully, future versions will address these shortcomings.

Nevertheless, I like the idea of domains and will try to apply the following principles in new projects:

  • Use strict single-column domains instead of raw data types in tables whenever possible. This ensures the consistent use of data types (e.g., for identifiers, names, descriptions, etc.).
  • Do not use multi-column domains, as columns can only be associated with one domain. Also, do not use flexible domains (IMO we should avoid designs with discriminator columns).
  • Favour traditional lookup tables over enums (and provide the data as part of the application). Enums become appealing for small and static reference data once domains are supported in PL/SQL.
  • Define a display_expression for raw and string columns, if the data should not be presented “as is” (e.g. GUID).
  • Define an order_expression for columns that have a non-default precedence (e.g. semantic version).

Ask me in two or three years whether this was a good idea.

The post Evolution of a SQL Domain for Semantic Versioning appeared first on Philipp Salvisberg's Blog.

Avoid Implicit Type Conversion in JSON Access

Introduction

Before comparing two values, the Oracle Database automatically ensures that both values have the same data type. It converts one of the values to match the data type of the other value. The SQL Language Reference manual describes when and how implicit data conversions happen. However, Oracle recommends that you convert data types explicitly. This ensures consistent results and better performance.
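A classic relational sketch of the problem, outside of JSON (the table conv_demo is made up for this illustration): comparing a varchar2 column with a number forces to_number() onto the column and typically prevents the use of an index on that column.

create table conv_demo (id varchar2(10) constraint conv_demo_pk primary key);

-- implicit: effectively to_number(id) = 42, so the primary key index cannot be used for a unique scan
select * from conv_demo where id = 42;

-- explicit: the literal matches the column data type, so the index can be used
select * from conv_demo where id = '42';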

I am using JSON relational duality views in my current project. During development, I have stumbled across some implicit data conversions that I was unaware of and that are causing poor performance. I will use a simplified example to show you what I mean.

  1. Setup
  2. JSON-relational Duality View
  3. JSON Data Types
  4. Explicit Conversion
  5. JSON Collection View
  6. Conclusion

1. Setup

I’ve tested this example with an Oracle Database 23.6 and 23.7.

In a schema of your choice you can run the following setup script:

1) Setup table dept and emp
drop table if exists emp;
drop table if exists dept;

create table dept (
   deptno number(2, 0)      not null constraint dept_pk primary key,
   dname  varchar2(14 char) not null,
   loc    varchar2(13 char) not null
);

create table emp (
   empno    number(4, 0)      not null  constraint emp_pk primary key,
   ename    varchar2(10 char) not null,
   job      varchar2(9 char)  not null,
   mgr      number(4, 0)                constraint emp_mgr_fk references emp,
   hiredate date              not null,
   sal      number(7, 2)      not null,
   comm     number(7, 2),
   deptno   number(2, 0)      not null  constraint emp_deptno_fk references dept
);

insert into dept (deptno, dname, loc)
values (10, 'ACCOUNTING', 'NEW YORK'),
       (20, 'RESEARCH',   'DALLAS'),
       (30, 'SALES',      'CHICAGO'),
       (40, 'OPERATIONS', 'BOSTON');

insert into emp (empno, ename, job, mgr, hiredate, sal, comm, deptno)
values (7566, 'JONES',  'MANAGER',   7839, date '1981-04-02', 2975, null, 20),
       (7698, 'BLAKE',  'MANAGER',   7839, date '1981-05-01', 2850, null, 30),
       (7782, 'CLARK',  'MANAGER',   7839, date '1981-06-09', 2450, null, 10),
       (7788, 'SCOTT',  'ANALYST',   7566, date '1987-04-19', 3000, null, 20),
       (7902, 'FORD',   'ANALYST',   7566, date '1981-12-03', 3000, null, 20),
       (7499, 'ALLEN',  'SALESMAN',  7698, date '1981-02-20', 1600,  300, 30),
       (7521, 'WARD',   'SALESMAN',  7698, date '1981-02-22', 1250,  500, 30),
       (7654, 'MARTIN', 'SALESMAN',  7698, date '1981-09-28', 1250, 1400, 30),
       (7844, 'TURNER', 'SALESMAN',  7698, date '1981-09-08', 1500,    0, 30),
       (7900, 'JAMES',  'CLERK',     7698, date '1981-12-03',  950, null, 30),
       (7934, 'MILLER', 'CLERK',     7782, date '1982-01-23', 1300, null, 10),
       (7369, 'SMITH',  'CLERK',     7902, date '1980-12-17',  800, null, 20),
       (7839, 'KING',   'PRESIDENT', null, date '1981-11-17', 5000, null, 10),
       (7876, 'ADAMS',  'CLERK',     7788, date '1987-05-23', 1100, null, 20);

begin
   dbms_stats.gather_table_stats(user, 'dept');
   dbms_stats.gather_table_stats(user, 'emp');
end;
/

alter session set nls_date_format = 'YYYY-MM-DD';
column deptno format a6
column dname format a10
column loc format a9
column empid format a8
column empno format a5
column ename format a6
column mgr format a4
column sal format a4
column comm format a4

select * from dept;
select * from emp;
Table EMP dropped.


Table DEPT dropped.


Table DEPT created.


Table EMP created.


4 rows inserted.


14 rows inserted.


PL/SQL procedure successfully completed.


Session altered.


DEPTNO DNAME      LOC      
------ ---------- ---------
    10 ACCOUNTING NEW YORK 
    20 RESEARCH   DALLAS   
    30 SALES      CHICAGO  
    40 OPERATIONS BOSTON   


EMPNO ENAME  JOB        MGR HIREDATE    SAL COMM DEPTNO
----- ------ --------- ---- ---------- ---- ---- ------
 7566 JONES  MANAGER   7839 1981-04-02 2975          20
 7698 BLAKE  MANAGER   7839 1981-05-01 2850          30
 7782 CLARK  MANAGER   7839 1981-06-09 2450          10
 7788 SCOTT  ANALYST   7566 1987-04-19 3000          20
 7902 FORD   ANALYST   7566 1981-12-03 3000          20
 7499 ALLEN  SALESMAN  7698 1981-02-20 1600  300     30
 7521 WARD   SALESMAN  7698 1981-02-22 1250  500     30
 7654 MARTIN SALESMAN  7698 1981-09-28 1250 1400     30
 7844 TURNER SALESMAN  7698 1981-09-08 1500    0     30
 7900 JAMES  CLERK     7698 1981-12-03  950          30
 7934 MILLER CLERK     7782 1982-01-23 1300          10
 7369 SMITH  CLERK     7902 1980-12-17  800          20
 7839 KING   PRESIDENT      1981-11-17 5000          10
 7876 ADAMS  CLERK     7788 1987-05-23 1100          20

14 rows selected. 

The model has primary and foreign keys which are required for the duality views. However, the constraints do not need to be enabled.
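For example, a foreign key could be kept as metadata only. A sketch (whether unenforced constraints are acceptable depends on your data-quality requirements):

-- keep the foreign key for the duality view without enforcing it
alter table emp disable constraint emp_deptno_fk;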

2. JSON-relational Duality View

Now let’s create an updateable duality view for the model we created earlier.

2) Create duality view dept_dv
create or replace json duality view dept_dv as
dept @insert @update @delete
{
   _id: deptno
   dname
   loc
   emps: emp @insert @update @delete
      {
         empno
         ename
         job
         emp @unnest @link(from: [mgr])
            {
               mgr    : empno @nocheck
               mgrname: ename @nocheck
            }
         hiredate
         sal
         comm
      }
};
Json duality view DEPT_DV created.

We can query the department 10 as follows:

3) Query deptno 10 in dept_dv
column json_data format a50
alter session disable parallel query;
set pagesize 1000

select json_serialize(dv.data returning varchar2 pretty) as json_data
  from dept_dv dv
 where dv.data."_id" = 10;
Session altered.


JSON_DATA
--------------------------------------------------
{
  "_id" : 10,
  "_metadata" :
  {
    "etag" : "E0146035FE26EE16D4968A21E6350D81",
    "asof" : "00002604AFC8A4BC"
  },
  "dname" : "ACCOUNTING",
  "loc" : "NEW YORK",
  "emps" :
  [
    {
      "empno" : 7782,
      "ename" : "CLARK",
      "job" : "MANAGER",
      "mgr" : 7839,
      "mgrname" : "KING",
      "hiredate" : "1981-06-09T00:00:00",
      "sal" : 2450,
      "comm" : null
    },
    {
      "empno" : 7839,
      "ename" : "KING",
      "job" : "PRESIDENT",
      "hiredate" : "1981-11-17T00:00:00",
      "sal" : 5000,
      "comm" : null
    },
    {
      "empno" : 7934,
      "ename" : "MILLER",
      "job" : "CLERK",
      "mgr" : 7782,
      "mgrname" : "CLARK",
      "hiredate" : "1982-01-23T00:00:00",
      "sal" : 1300,
      "comm" : null
    }
  ]
}

The where clause in line 7 contains an implicit conversion. It’s not that obvious and the execution plan does not help to spot it either.

4) Execution plan
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID  8nns7xgu0v6vu, child number 4
-------------------------------------
select json_serialize(dv.data returning varchar2 pretty) as json_data   
from dept_dv dv  where dv.data."_id" = 10
 
Plan hash value: 2166484641
 
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |     7 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP     |     1 |    10 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | EMP_PK  |     1 |       |     0   (0)|          |
|   3 |  SORT GROUP BY              |         |     1 |    38 |            |          |
|*  4 |   TABLE ACCESS FULL         | EMP     |     5 |   190 |     3   (0)| 00:00:01 |
|   5 |  TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    20 |     1   (0)| 00:00:01 |
|*  6 |   INDEX UNIQUE SCAN         | DEPT_PK |     1 |       |     0   (0)|          |
---------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("OUTER_ALIAS2"."EMPNO"=:B1)
   4 - filter("OUTER_ALIAS1"."DEPTNO"=:B1)
   6 - access("DEPTNO"=10)

Plan operations 5 and 6 show that a unique index scan on dept_pk has been performed for deptno 10. This looks good. So where does the implicit conversion take place?

3. JSON Data Types

The condition dv.data."_id" = 10 in the where clause compares a JSON data type with a numeric data type. As a result, the Oracle Database needs to convert one of the values. It decided to convert the JSON to a numeric value. That’s what we see in the execution plan.

Really? Can we prove this claim? Yes, with the following statement.

5) Determine JSON data type
select is_json_condition, value
  from (
          select deptno is json as is_json,
                 deptno is json(value) as is_json_value,
                 deptno is json(array) as is_json_array,
                 deptno is json(object) as is_json_object,
                 deptno is json(scalar) as is_json_scalar,
                 deptno is json(scalar number) as is_json_scalar_number,
                 deptno is json(scalar string) as is_json_scalar_string,
                 deptno is json(scalar binary_double) as is_json_scalar_binary_double,
                 deptno is json(scalar binary_float) as is_json_scalar_binary_float,
                 deptno is json(scalar date) as is_json_scalar_date,
                 deptno is json(scalar timestamp) as is_json_scalar_timestamp,
                 deptno is json(scalar timestamp with time zone) as is_json_scalar_timestamp_with_time_zone,
                 deptno is json(scalar null) as is_json_scalar_null,
                 deptno is json(scalar boolean) as is_json_scalar_boolean,
                 deptno is json(scalar binary) as is_json_scalar_binary,
                 deptno is json(scalar interval year to month) as is_json_scalar_interval_year_to_month,
                 deptno is json(scalar interval day to second) as is_json_scalar_interval_day_to_second
            from (select dv.data."_id" as deptno from dept_dv dv where rownum = 1)
       ) src unpivot (
          value for is_json_condition in (
             is_json,
             is_json_value,
             is_json_array,
             is_json_object,
             is_json_scalar,
             is_json_scalar_number,
             is_json_scalar_string,
             is_json_scalar_binary_double,
             is_json_scalar_binary_float,
             is_json_scalar_date,
             is_json_scalar_timestamp,
             is_json_scalar_timestamp_with_time_zone,
             is_json_scalar_null,
             is_json_scalar_boolean,
             is_json_scalar_binary,
             is_json_scalar_interval_year_to_month,
             is_json_scalar_interval_day_to_second
          )
       );
IS_JSON_CONDITION                       VALUE
--------------------------------------- -----
IS_JSON                                 true 
IS_JSON_VALUE                           true 
IS_JSON_ARRAY                           false
IS_JSON_OBJECT                          false
IS_JSON_SCALAR                          true 
IS_JSON_SCALAR_NUMBER                   true 
IS_JSON_SCALAR_STRING                   false
IS_JSON_SCALAR_BINARY_DOUBLE            false
IS_JSON_SCALAR_BINARY_FLOAT             false
IS_JSON_SCALAR_DATE                     false
IS_JSON_SCALAR_TIMESTAMP                false
IS_JSON_SCALAR_TIMESTAMP_WITH_TIME_ZONE false
IS_JSON_SCALAR_NULL                     false
IS_JSON_SCALAR_BOOLEAN                  false
IS_JSON_SCALAR_BINARY                   false
IS_JSON_SCALAR_INTERVAL_YEAR_TO_MONTH   false
IS_JSON_SCALAR_INTERVAL_DAY_TO_SECOND   false

17 rows selected. 

In line 20 we query dv.data."_id" and use the IS JSON condition to determine the data type. The most granular type is JSON scalar number.

4. Explicit Conversion

So we now know that we need to convert a JSON scalar number to a number. And how do we do that? By using a SQL/JSON path expression method, as in the next example.

7) Query deptno 10 in dept_dv with explicit type conversion
select json_serialize(dv.data returning varchar2 pretty) as json_data
  from dept_dv dv
 where dv.data."_id".number() = 10;

We used the .number() method in this example. The execution plan for this query looks now like this:

8) Execution plan querying dept_dv with explicit type conversion
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID  4rjg8g02bph16, child number 2
-------------------------------------
select json_serialize(dv.data returning varchar2 pretty) as json_data   
from dept_dv dv  where dv.data."_id".number() = 10
 
Plan hash value: 2166484641
 
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |     7 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP     |     1 |    10 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | EMP_PK  |     1 |       |     0   (0)|          |
|   3 |  SORT GROUP BY              |         |     1 |    38 |            |          |
|*  4 |   TABLE ACCESS FULL         | EMP     |     5 |   190 |     3   (0)| 00:00:01 |
|   5 |  TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    20 |     1   (0)| 00:00:01 |
|*  6 |   INDEX UNIQUE SCAN         | DEPT_PK |     1 |       |     0   (0)|          |
---------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("OUTER_ALIAS2"."EMPNO"=:B1)
   4 - filter("OUTER_ALIAS1"."DEPTNO"=:B1)
   6 - access("DEPTNO"=10)

The plan looks the same as without calling the .number() method, but in this case, we did not rely on implicit type conversion.

We’ve proven that a .number() call does not make things worse. But are there cases where such an explicit type conversion results in a better execution plan?

5. JSON Collection View

Let’s say we need a view that filters the data in the duality view, for example for security purposes. One way to achieve this is to create a JSON collection view. From a consumer point of view, such a view behaves exactly like a duality view. The only differences are that it is read-only and that the full SQL grammar can be used to define it.

Here’s an example.

9) Create JSON collection view dept_cv
create or replace json collection view dept_cv as
select dv.data
  from dept_dv dv
 where dv.data."_id".number() = 10 or dv.data.loc.string() = 'DALLAS';
Json collection view DEPT_CV created.

Now, let’s run a query with implicit type conversion.

10) Query deptno 10 in dept_cv
column json_data format a50
alter session disable parallel query;
set pagesize 1000

select json_serialize(cv.data returning varchar2 pretty) as json_data
  from dept_cv cv
 where cv.data."_id" = 10;
Session altered.


JSON_DATA
--------------------------------------------------
{
  "_id" : 10,
  "_metadata" :
  {
    "etag" : "E0146035FE26EE16D4968A21E6350D81",
    "asof" : "00002604B0C08AAE"
  },
  "dname" : "ACCOUNTING",
  "loc" : "NEW YORK",
  "emps" :
  [
    {
      "empno" : 7782,
      "ename" : "CLARK",
      "job" : "MANAGER",
      "mgr" : 7839,
      "mgrname" : "KING",
      "hiredate" : "1981-06-09T00:00:00",
      "sal" : 2450,
      "comm" : null
    },
    {
      "empno" : 7839,
      "ename" : "KING",
      "job" : "PRESIDENT",
      "hiredate" : "1981-11-17T00:00:00",
      "sal" : 5000,
      "comm" : null
    },
    {
      "empno" : 7934,
      "ename" : "MILLER",
      "job" : "CLERK",
      "mgr" : 7782,
      "mgrname" : "CLARK",
      "hiredate" : "1982-01-23T00:00:00",
      "sal" : 1300,
      "comm" : null
    }
  ]
}

The execution plan of the query above looks like this:

11) Execution plan querying dept_cv with implicit type conversion
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID  gkhradvxjrwfq, child number 4
-------------------------------------
select json_serialize(cv.data returning varchar2 pretty) as json_data   
from dept_cv cv  where cv.data."_id" = 10
 
Plan hash value: 1318549807
 
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |       |       |     9 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP    |     1 |    10 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | EMP_PK |     1 |       |     0   (0)|          |
|   3 |  SORT GROUP BY              |        |     1 |    38 |            |          |
|*  4 |   TABLE ACCESS FULL         | EMP    |     5 |   190 |     3   (0)| 00:00:01 |
|*  5 |  TABLE ACCESS FULL          | DEPT   |     1 |    20 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("OUTER_ALIAS2"."EMPNO"=:B1)
   4 - filter("OUTER_ALIAS1"."DEPTNO"=:B1)
   5 - filter((("DEPTNO"=10 OR "LOC"='DALLAS') AND 
              JSON_VALUE(JSON_SCALAR("DEPTNO" JSON NULL ON NULL ) FORMAT OSON , '$' 
              RETURNING NUMBER NULL ON ERROR TYPE(STRICT) )=10))
 
SQL Analysis Report (identified by operation id/Query Block Name/Object Alias):
-------------------------------------------------------------------------------
 
   5 -  SEL$95C0DFA4 / "OUTER_ALIAS0"@"SEL$3"
           -  The following columns have predicates which preclude their 
              use as keys in index range scan. Consider rewriting the 
              predicates.
                "DEPTNO"

The SQL analysis report at the bottom is interesting. It clearly states that no index range scan was used and that we should rewrite the query to avoid implicit type conversion.

Let’s do that.

12) Query deptno 10 in dept_cv with explicit type conversion
select json_serialize(cv.data returning varchar2 pretty) as json_data
  from dept_cv cv
 where cv.data."_id".number() = 10;

And now the execution plan has changed for the better.

13) Execution plan querying dept_cv with explicit type conversion
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID  cca5xrgndnmkd, child number 2
-------------------------------------
select json_serialize(cv.data returning varchar2 pretty) as json_data   
from dept_cv cv  where cv.data."_id".number() = 10
 
Plan hash value: 2166484641
 
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |     7 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP     |     1 |    10 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | EMP_PK  |     1 |       |     0   (0)|          |
|   3 |  SORT GROUP BY              |         |     1 |    38 |            |          |
|*  4 |   TABLE ACCESS FULL         | EMP     |     5 |   190 |     3   (0)| 00:00:01 |
|   5 |  TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    20 |     1   (0)| 00:00:01 |
|*  6 |   INDEX UNIQUE SCAN         | DEPT_PK |     1 |       |     0   (0)|          |
---------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("OUTER_ALIAS2"."EMPNO"=:B1)
   4 - filter("OUTER_ALIAS1"."DEPTNO"=:B1)
   6 - access("DEPTNO"=10)

This makes an index unique scan possible.

6. Conclusion

Always use a SQL/JSON path expression method (binary(), boolean(), date(), dateWithTime(), number(), string(), …) when comparing a JSON value to a non-JSON value in SQL. This way you avoid implicit type conversions, improve the readability of your code, and give the Oracle Database everything it needs to create an optimal execution plan.
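For example (column names taken from the duality view above; a sketch, not from the original post):

select count(*) from dept_dv dv where dv.data."_id".number() = 20;
select count(*) from dept_dv dv where dv.data.dname.string() = 'SALES';
select count(*) from dept_dv dv where dv.data.loc.string() = 'DALLAS';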

The post Avoid Implicit Type Conversion in JSON Access appeared first on Philipp Salvisberg's Blog.

Wrapping and Unwrapping PL/SQL

Introduction

Today I released a PL/SQL Unwrapper for VS Code. You can find it in the VS Code Marketplace, along with instructions on how to use it. It’s super easy and works the same way as the extension I wrote 10 years ago for SQL Developer in Java. If you’re curious about why I created an Unwrapper in the first place, then read this blog post. The post also includes the original Python code by Niels Teusink. I basically translated that code to the target language and added a bit of UI sugar.

And Code Wrapped with the 9i Wrap Utility?

I often get asked whether I plan to enhance the Unwrapper to support code that was wrapped with a wrap utility from Oracle Database version 7, 8, or 9. The answer is no. I don’t plan on doing that. Why? – Well, there are a few reasons.

  1. Different Algorithm
    The way the wrapping algorithm works in 9i differs completely from later versions. The 9i algorithm is a bit complicated. If you want the full story, check out Pete Finnigan’s How to Unwrap PL/SQL. In other words, it would take a lot of effort to create a comprehensive 9i Unwrapper. If you need one, Pete’s your guy.
  2. Not widely used
    Since unwrapping code wrapped with 10g or later is super easy, a few companies decided to use the 9i algorithm, with or without a minifier, to protect their intellectual property. However, I rarely come across it in the wild.
  3. For Legacy Applications Only
    IMO the 9i algorithm is only feasible for legacy applications, because modern PL/SQL and SQL cannot be processed by the 9i wrap utility. As a result, legitimate requests for a 9i Unwrapper typically come from migration projects containing wrapped code without matching source code. In those cases, it is a good idea to contact Pete for help.

Let’s explore this last point a little further.

Wrap Modern SQL with OracleDB 23ai

The following procedure uses a table value constructor. That is a 23ai feature.

1) print_emp_modern_sql.sql
create or replace procedure print_emp (in_deptno in number default null) is
begin
   -- print column headers
   sys.dbms_output.put_line('DEPTNO EMPNO ENAME   SAL');
   sys.dbms_output.put_line('------ ----- ------ ----');
   <<print_emps_of_selected_dept>>
   for r in (
      with
         -- table value constructor is a 23ai feature
         emp (empno, ename, sal, deptno) as (values
            (7839, 'KING',  5000, 10),
            (7566, 'JONES', 2975, 20),
            (7788, 'SCOTT', 3000, 20)
         )
      select deptno, empno, ename, sal
        from emp
       where deptno = in_deptno or in_deptno is null
       order by deptno, sal desc
   ) loop
      sys.dbms_output.put(lpad(r.deptno, 6));
      sys.dbms_output.put(lpad(r.empno, 6));
      sys.dbms_output.put(' ');
      sys.dbms_output.put(rpad(r.ename, 7));
      sys.dbms_output.put_line(lpad(r.sal, 4));
   end loop print_emps_of_selected_dept;
end print_emp;
/

We can wrap this code with the wrap utility of the Oracle Database.

2) wrap v23.6 print_modern_sql.sql
wrap iname=print_emp_modern_sql.sql oname=print_emp_modern_sql_wrapped.sql
PL/SQL Wrapper: Release 23.0.0.0.0 - Production on Sat Mar 8 13:20:22 2025
Version 23.6.0.24.10

Copyright (c) 1982, 2024, Oracle and/or its affiliates.  All rights reserved.

Processing print_emp_modern_sql.sql to print_emp_modern_sql_wrapped.sql

And the resulting file looks like this:

3) print_emp_modern_sql_wrapped.sql
create or replace procedure print_emp wrapped 
a000000
1
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
7
357 207
ayazvHL/mg/US3DdDyNYwTqa68EwgwLxTK5qyo5A7RlkV79jvdE9vjLZi/E/UPQJEalFiuXq
hzBrk1+1JByeVW8X/KEOaUHHMK6a36OU2H4q1oax+MG//jNSNhUB6sLU8kRxrH4ebt4Wk40N
FQYr4wRTWF1+xkM2pmh8W4JiToP15Q0u9rBXe69s78wW2/zU12UKWGqC85UeCNsFIo/gM9DI
UzEh7AwFODhZ4ntqNtVW1RJDBTuExWM1mG/jBTiKvhCe2Q4FWzymJdUah2Yynj4wzX+ROYX2
G6PJ/60EFEz/K45RH9G/80R1SaHm7tH0KZZ+vFTKM9gUgoVVZCWt/7FURXFmyZ4BYuADvKvg
FydUkofWep7ql66IjSsN2hyHik8Ee6RWQkYDcvIPflMpTBzxidGzteS4RSgg3Q0+di/WaSdN
tQdYoiGTodL1hUzbGbxSe/XSKHY1ISPihH+1TCWz4PmwmbH+iyQ2QfrCu/Ng+cSB28xJcBBO
HZoylZ0=

/

Let’s unwrap the code. BTW: the highlighted lines 1 to 18 and 29 to 30 are not required to unwrap the code.

4) print_emp_modern_sql_unwrapped.sql
create or replace PROCEDURE print_emp (IN_DEPTNO IN NUMBER DEFAULT NULL) IS
BEGIN
   
   SYS.DBMS_OUTPUT.PUT_LINE('DEPTNO EMPNO ENAME   SAL');
   SYS.DBMS_OUTPUT.PUT_LINE('------ ----- ------ ----');
   <<PRINT_EMPS_OF_SELECTED_DEPT>>
   FOR R IN (
      WITH
         
         EMP (EMPNO, ENAME, SAL, DEPTNO) AS (VALUES
            (7839, 'KING',  5000, 10),
            (7566, 'JONES', 2975, 20),
            (7788, 'SCOTT', 3000, 20)
         )
      SELECT DEPTNO, EMPNO, ENAME, SAL
        FROM EMP
       WHERE DEPTNO = IN_DEPTNO OR IN_DEPTNO IS NULL
       ORDER BY DEPTNO, SAL DESC
   ) LOOP
      SYS.DBMS_OUTPUT.PUT(LPAD(R.DEPTNO, 6));
      SYS.DBMS_OUTPUT.PUT(LPAD(R.EMPNO, 6));
      SYS.DBMS_OUTPUT.PUT(' ');
      SYS.DBMS_OUTPUT.PUT(RPAD(R.ENAME, 7));
      SYS.DBMS_OUTPUT.PUT_LINE(LPAD(R.SAL, 4));
   END LOOP PRINT_EMPS_OF_SELECTED_DEPT;
END PRINT_EMP;

Lines 3 and 9 are interesting: they are empty because the original comments are lost. Furthermore, all keywords and identifiers are in uppercase; the only exception is the name of the procedure. The Unwrapper added the create or replace clause to make the statement executable.
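
By the way, you do not need any tooling to find out whether a stored program unit is wrapped. The following query is a minimal sketch; it assumes print_emp was installed in the current schema and relies on the fact that the source of a wrapped unit starts with the keyword wrapped on its first line:

select text
  from user_source
 where type = 'PROCEDURE'
   and name = 'PRINT_EMP'
   and line = 1;

If the returned text contains wrapped, the stored source is wrapped; otherwise it is plain text.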

Wrap Modern SQL with OracleDB 9iR2

Now let’s try to wrap print_emp_modern_sql.sql with the wrap utility of Oracle Database 9.2.0.8.

5) wrap v9.2 print_emp_modern_sql.sql
wrap iname=print_emp_modern_sql.sql oname=print_emp_modern_sql_wrapped9i.sql
PL/SQL Wrapper: Release 9.2.0.8.0- 64bit Production on Sat Mar 08 14:55:28 2025

Copyright (c) Oracle Corporation 1993, 2001.  All Rights Reserved.

Processing print_emp_modern_sql.sql to print_emp_modern_sql_wrapped9i.sql
PSU(103,1,8,7):Encountered the symbol "WITH" when expecting one of the following:

   ( - + case mod new not null others select <an identifier>
   <a double-quoted delimited-identifier> <a bind variable> avg
   count current exists max min prior sql stddev sum variance
   execute forall merge time timestamp interval date
   <a string literal with character set specification>
   <a number> <a single-quoted SQL string> pipe
The symbol "WITH" was ignored.

PSU(103,1,10,42):Encountered the symbol "AS" when expecting one of the following:

   . ( ) , * % & | = - + < / > at in is mod not range rem => ..
   <an exponent (**)> <> or != or ~= >= <= <> and or like
   between ||

PL/SQL Wrapper error: Compilation error(s) for:
create or replace procedure print_emp
Outputting source and continuing.

We got two errors due to the use of modern SQL. However, we can pass the parameter edebug=wrap_new_sql to make the wrap utility accept newer SQL grammar.

6) wrap v9.2 print_emp_modern_sql.sql with edebug=wrap_new_sql
wrap iname=print_emp_modern_sql.sql oname=print_emp_modern_sql_wrapped9i.sql edebug=wrap_new_sql
PL/SQL Wrapper: Release 9.2.0.8.0- 64bit Production on Sat Mar 08 14:56:01 2025

Copyright (c) Oracle Corporation 1993, 2001.  All Rights Reserved.

Processing print_emp_modern_sql.sql to print_emp_modern_sql_wrapped9i.sql

And the resulting file looks like this:

7) print_emp_modern_sql_wrapped9i.sql
create or replace procedure print_emp wrapped 
0
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
3
7
9200000
1
4
0 
18
2 :e:
1PRINT_EMP:
1IN_DEPTNO:
1NUMBER:
1SYS:
1DBMS_OUTPUT:
1PUT_LINE:
1DEPTNO EMPNO ENAME   SAL:
1------ ----- ------ ----:
1PRINT_EMPS_OF_SELECTED_DEPT:
1R:
1EMP:
1EMPNO:
1ENAME:
1SAL:
1DEPTNO:
1LOOP:
1with:n         -- table value constructor is a 23ai feature:n         emp (em+
1pno, ename, sal, deptno) as (values:n            (7839, 'KING',  5000, 10),:n+
1            (7566, 'JONES', 2975, 20),:n            (7788, 'SCOTT', 3000, 20)+
1:n         ):n      select deptno, empno, ename, sal:n        from emp:n     +
1  where deptno = in_deptno or in_deptno is null:n       order by deptno, sal +
1desc:n   :
1PUT:
1LPAD:
16:
1 :
1RPAD:
17:
14:
0

0
0
7e
2
0 9a 8f a0 4d b0 3d b4
55 6a :2 a0 6b a0 6b 6e a5
57 :2 a0 6b a0 6b 6e a5 57
93 91 :10 a0 12a 37 :2 a0 6b a0
6b :3 a0 6b 51 a5 b a5 57
:2 a0 6b a0 6b :3 a0 6b 51 a5
b a5 57 :2 a0 6b a0 6b 6e
a5 57 :2 a0 6b a0 6b :3 a0 6b
51 a5 b a5 57 :2 a0 6b a0
6b :3 a0 6b 51 a5 b a5 57
b7 :2 a0 47 b0 46 b7 a4 a0
b1 11 68 4f 1d 17 b5 
7e
2
0 3 20 1b 1f 1a 28 17
2d 31 35 39 3d 40 44 47
4c 4d 52 56 5a 5d 61 64
69 6a 6f 77 7b 7f 83 87
8b 8f 93 97 9b 9f a3 a7
ab af b3 b7 bb c7 c9 cd
d1 d4 d8 db df e3 e7 ea
ed ee f0 f1 f6 fa fe 101
105 108 10c 110 114 117 11a 11b
11d 11e 123 127 12b 12e 132 135
13a 13b 140 144 148 14b 14f 152
156 15a 15e 161 164 165 167 168
16d 171 175 178 17c 17f 183 187
18b 18e 191 192 194 195 19a 19c
1a0 1a4 1ab 1ac 1af 1b1 1b5 1b9
1bb 1c7 1cb 1cd 1ce 1d7 
7e
2
0 b 16 23 32 :2 16 15 :2 1
4 :2 8 :2 14 1d :3 4 :2 8 :2 14 1d
:2 4 6 8 a f 16 1d 22
e 16 1d 24 :2 e 17 24 11
19 6 d 4 7 :2 b :2 17 1b
20 :2 22 2a :2 1b :3 7 :2 b :2 17 1b
20 :2 22 29 :2 1b :3 7 :2 b :2 17 1b
:3 7 :2 b :2 17 1b 20 :2 22 29 :2 1b
:3 7 :2 b :2 17 20 25 :2 27 2c :2 20
:2 7 6 8 d 4 :4 1 5 :7 1

7e
4
0 :9 1 :8 4 :8 5
6 7 :5 a :4 f
10 :3 11 :2 12 13
:2 7 :e 14 :e 15 :8 16
:e 17 :e 18 13 :2 19
7 :4 2 1a :7 1

1d9
4
:3 0 1 :a 0 79
1 :7 0 5 :2 0
:2 3 :4 0 2 :7 0
5 3 4 :2 0
7 :2 0 79 1
8 :2 0 4 :3 0
5 :3 0 a b
0 6 :3 0 c
d 0 7 :4 0
7 e 10 :2 0
74 4 :3 0 5
:3 0 12 13 0
6 :3 0 14 15
0 8 :4 0 9
16 18 :2 0 74
9 :5 0 72 2
a :3 0 b :3 0
c :3 0 d :3 0
e :3 0 f :3 0
f :3 0 c :3 0
d :3 0 e :3 0
b :3 0 f :3 0
2 :3 0 2 :3 0
f :3 0 e :3 0
10 :4 0 11 1
:8 0 2d 1b 2c
4 :3 0 5 :3 0
2e 2f 0 12
:3 0 30 31 0
13 :3 0 a :3 0
f :3 0 34 35
0 14 :2 0 b
33 38 e 32
3a :2 0 6e 4
:3 0 5 :3 0 3c
3d 0 12 :3 0
3e 3f 0 13
:3 0 a :3 0 c
:3 0 42 43 0
14 :2 0 10 41
46 13 40 48
:2 0 6e 4 :3 0
5 :3 0 4a 4b
0 12 :3 0 4c
4d 0 15 :4 0
15 4e 50 :2 0
6e 4 :3 0 5
:3 0 52 53 0
12 :3 0 54 55
0 16 :3 0 a
:3 0 d :3 0 58
59 0 17 :2 0
17 57 5c 1a
56 5e :2 0 6e
4 :3 0 5 :3 0
60 61 0 6
:3 0 62 63 0
13 :3 0 a :3 0
e :3 0 66 67
0 18 :2 0 1c
65 6a 1f 64
6c :2 0 6e 21
71 10 :3 0 9
:3 0 2d 6e :4 0
73 27 72 71
74 29 78 :3 0
78 1 :4 0 78
77 74 75 :6 0
79 :2 0 1 8
78 7c :3 0 7b
79 7d :8 0 
2d
4
:3 0 1 2 1
6 1 f 1
17 2 36 37
1 39 2 44
45 1 47 1
4f 2 5a 5b
1 5d 2 68
69 1 6b 5
3b 49 51 5f
6d 1 1a 3
11 19 73 
1
4
0 
7c
0
1
14
2
4
0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 
2 1 0
1a 1 2
1 0 1
1b 2 0
0

/

The query is visible in plain text on lines 42 to 47.

When we try to install this wrapped procedure in OracleDB 23ai, we get the following error:

8) Error message with default settings
Procedure PRINT_EMP compiled

LINE/COL  ERROR
--------- -------------------------------------------------------------
0/0       PLS-01918: 9.2 and earlier wrap formats are not permitted
Errors: check compiler log

We have to enable permit_92_wrap_format to overcome this issue. This is not possible at the session or PDB level, so we have to change the setting in the CDB root as follows:

9) enable permit_92_wrap_format
alter session set container=cdb$root;
alter system set permit_92_wrap_format=true scope=spfile;
Session altered.


System altered.

After a restart of the database, we can install print_emp_modern_sql_wrapped9i.sql successfully.
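
To double-check that the restart picked up the new value, we can query v$parameter. This is just a sketch, assuming you have the necessary privileges to read the view:

select value, ispdb_modifiable
  from v$parameter
 where name = 'permit_92_wrap_format';

value should now be TRUE, and ispdb_modifiable should show FALSE for this parameter.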

System parameters that you can’t set at the PDB level are very limiting. In fact, this makes installing PL/SQL code wrapped with the 9i wrap utility in an Autonomous Database pretty much impossible.

Wrap Modern PL/SQL with OracleDB 9iR2

We have seen that it is possible to process arbitrary SQL within PL/SQL with the wrap utility of OracleDB 9.2, thanks to the edebug=wrap_new_sql parameter.

Now let’s add some PL/SQL constructs that were introduced in later versions of the Oracle Database.

There are two changes compared to print_emp_modern_sql.sql. We use a PL/SQL identifier print_employees_of_a_selected_department that is longer than 30 bytes. Also, we use the continue statement, which wasn’t available in version 9.2.0.8.

10) print_emp_modern_plsql.sql
create or replace procedure print_emp (in_deptno in number default null) is
begin
   -- print column headers
   sys.dbms_output.put_line('DEPTNO EMPNO ENAME   SAL');
   sys.dbms_output.put_line('------ ----- ------ ----');
   <<print_employees_of_a_selected_department>>
   for r in (
      with
         -- table value constructor is a 23ai feature
         emp (empno, ename, sal, deptno) as (values
            (7839, 'KING',  5000, 10),
            (7566, 'JONES', 2975, 20),
            (7788, 'SCOTT', 3000, 20)
         )
      select deptno, empno, ename, sal
        from emp
       where deptno = in_deptno or in_deptno is null
       order by deptno, sal desc
   ) loop
      -- continue is a 11g feature
      continue when r.sal < 1000;
      sys.dbms_output.put(lpad(r.deptno, 6));
      sys.dbms_output.put(lpad(r.empno, 6));
      sys.dbms_output.put(' ');
      sys.dbms_output.put(rpad(r.ename, 7));
      sys.dbms_output.put_line(lpad(r.sal, 4));
   end loop print_employees_of_a_selected_department;
end print_emp;
/

Let’s try to wrap this code with the wrap utility of the Oracle Database 9.2.0.8.

11) wrap v9.2 print_emp_modern_plsql.sql with edebug=wrap_new_sql
 wrap iname=print_emp_modern_plsql.sql oname=print_emp_modern_plsql_wrapped9i.sql edebug=wrap_new_sql
PL/SQL Wrapper: Release 9.2.0.8.0- 64bit Production on Sat Mar 08 16:11:45 2025

Copyright (c) Oracle Corporation 1993, 2001.  All Rights Reserved.

Processing print_emp_modern_plsql.sql to print_emp_modern_plsql_wrapped9i.sql
PSU(114,1,6,6):identifier 'PRINT_EMPLOYEES_OF_A_SELECTED_' too long
PSU(103,1,21,16):Encountered the symbol "WHEN" when expecting one of the following:

   := . ( @ % ;

PSU(114,1,27,13):identifier 'PRINT_EMPLOYEES_OF_A_SELECTED_' too long
PL/SQL Wrapper error: Compilation error(s) for:
create or replace procedure print_emp
Outputting source and continuing.

It is impossible to wrap PL/SQL code that uses grammar constructs unknown to the Oracle Database version of the wrap utility. However, this limitation only applies to Oracle Database versions before 10g.

In case of an error, the wrap utility writes the original code unchanged to the target file. So technically, the resulting file can still be installed successfully, just in plain text instead of wrapped.
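
Because of this behaviour, it is a good idea to verify a deployment that relies on wrapping. The following query is just a sketch reusing the user_source trick shown earlier; it lists the program units of the current schema whose first source line does not carry the keyword wrapped, i.e. units that ended up installed in plain text:

select type, name
  from user_source
 where line = 1
   and type in ('PROCEDURE', 'FUNCTION', 'PACKAGE BODY', 'TYPE BODY')
   and lower(text) not like '% wrapped%';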

Conclusion

Still wrapping PL/SQL with Oracle 9i Release 2? It’s time to move on. Staying tied to 2007’s feature set means restricting your application’s potential and compatibility.

And if you’re absolutely sure you need a 9i Unwrapper, I’m not the person to ask – Pete is.

The post Wrapping and Unwrapping PL/SQL appeared first on Philipp Salvisberg's Blog.
