The solution I’ve provided in Joining Temporal Intervals produces wrong results if one or more temporal tables have gaps in their history or if disconnected intervals have the same content. In this post I’ll address both problems.
Test Data
The example queries are based on the same model as described in Joining Temporal Intervals. For the join of the tables EMPV, DEPTV, JOBV and EMPV (manager) I’ve amended the history to contain some gaps which are highlighted in the following listing.
SQL> SELECT * FROM empv WHERE empno = 7788 ORDER BY valid_from; EMPVID EMPNO ENAME JOBNO MGR HIREDATE SAL COMM DEPTNO VALID_FROM VALID_TO ------ ----- ----- ----- ---- ----------- ------ ---- ------ ----------- ----------- 8 7788 SCOTT 5 7566 19-APR-1987 3000.0 20 19-APR-1987 31-DEC-1989 22 7788 Scott 5 7566 19-APR-1987 3000.0 20 01-JAN-1990 31-MAR-1991 36 7788 Scott 5 7566 19-APR-1987 3300.0 20 01-APR-1991 31-JUL-1991 43 7788 Scott 5 7566 01-JAN-1992 3500.0 20 01-JAN-1992 31-DEC-9999 SQL> SELECT * FROM jobv WHERE jobno = 5 ORDER BY valid_from; JOBVID JOBNO JOB VALID_FROM VALID_TO ---------- ----- --------- ----------- ----------- 5 5 ANALYST 01-JAN-1980 20-JAN-1990 10 5 Analyst 22-JAN-1990 31-DEC-9999 SQL> SELECT * FROM deptv WHERE deptno = 20 ORDER BY valid_from; DEPTVID DEPTNO DNAME LOC VALID_FROM VALID_TO ---------- ------ -------------- ------------- ----------- ----------- 2 20 RESEARCH DALLAS 01-JAN-1980 28-FEB-1990 6 20 Research DALLAS 01-MAR-1990 31-MAR-1990 10 20 Research Dallas 01-APR-1990 31-DEC-9999 SQL> SELECT * FROM empv WHERE empno = 7566 ORDER BY valid_from; EMPVID EMPNO ENAME JOBNO MGR HIREDATE SAL COMM DEPTNO VALID_FROM VALID_TO ------ ----- ----- ----- ---- ----------- ------ ---- ------ ----------- ----------- 4 7566 JONES 4 7839 02-APR-1981 2975.0 20 02-APR-1981 31-DEC-1989 18 7566 Jones 4 7839 02-APR-1981 2975.0 20 01-JAN-1990 31-MAR-1991 32 7566 Jones 4 7839 02-APR-1981 3272.5 20 01-APR-1991 31-DEC-9999 SQL> SELECT * FROM empv WHERE mgr = 7788 ORDER BY valid_from; EMPVID EMPNO ENAME JOBNO MGR HIREDATE SAL COMM DEPTNO VALID_FROM VALID_TO ------ ----- ----- ----- ---- ----------- ------ ---- ------ ----------- ----------- 11 7876 ADAMS 1 7788 23-MAY-1987 1100.0 20 23-MAY-1987 31-DEC-1989 25 7876 Adams 1 7788 23-MAY-1987 1100.0 20 01-JAN-1990 31-MAR-1991 39 7876 Adams 1 7788 23-MAY-1987 1210.0 20 01-APR-1991 31-DEC-9999
From a business point of view Scott left the company on 31-JUL-1991 and came back on 01-JAN-1992 with a better salary. It’s important to notice, that Scott is Adams manager and Adams is therefore leaderless from 01-AUG-1991 until 31-DEC-1991. Additionally I fabricated a gap for JOBNO 5 on 21-JAN-1990.
You find the SQL script to create and populate the model here.
Gap-Aware Temporal Join
The following figure visualizes the expected result of the temporal join. The raw data intervals queried perviously are represented in blue and the join result in red. The yellow bars highlight the gaps in the source and result data set.
Here is the query and the join result for EMPNO = 7788. Please note that the column LOC from table DEPV is not queried, which will reduce the number of final result intervals from 7 to 6.
SELECT e.empno, g.valid_from, LEAST( e.valid_to, d.valid_to, j.valid_to, NVL(m.valid_to, e.valid_to), LEAD(g.valid_from - 1, 1, e.valid_to) OVER( PARTITION BY e.empno ORDER BY g.valid_from ) ) AS valid_to, e.ename, j.job, e.mgr, m.ename AS mgr_ename, e.hiredate, e.sal, e.comm, e.deptno, d.dname FROM empv e INNER JOIN (SELECT valid_from FROM empv UNION SELECT valid_from FROM deptv UNION SELECT valid_from FROM jobv UNION SELECT valid_to + 1 FROM empv WHERE valid_to != DATE '9999-12-31' UNION SELECT valid_to + 1 FROM deptv WHERE valid_to != DATE '9999-12-31' UNION SELECT valid_to + 1 FROM jobv WHERE valid_to != DATE '9999-12-31') g ON g.valid_from BETWEEN e.valid_from AND e.valid_to INNER JOIN deptv d ON d.deptno = e.deptno AND g.valid_from BETWEEN d.valid_from AND d.valid_to INNER JOIN jobv j ON j.jobno = e.jobno AND g.valid_from BETWEEN j.valid_from AND j.valid_to LEFT JOIN empv m ON m.empno = e.mgr AND g.valid_from BETWEEN m.valid_from AND m.valid_to WHERE e.empno = 7788 ORDER BY 1, 2;
EMPNO VALID_FROM VALID_TO ENAME JOB MGR MGR_E HIREDATE SAL COMM DEPTNO DNAME ----- ----------- ----------- ----- ------- ---- ----- ----------- ------ ---- ------ -------- 7788 19-APR-1987 22-MAY-1987 SCOTT ANALYST 7566 JONES 19-APR-1987 3000.0 20 RESEARCH 7788 23-MAY-1987 31-DEC-1989 SCOTT ANALYST 7566 JONES 19-APR-1987 3000.0 20 RESEARCH 7788 01-JAN-1990 20-JAN-1990 Scott ANALYST 7566 Jones 19-APR-1987 3000.0 20 RESEARCH 7788 22-JAN-1990 28-FEB-1990 Scott Analyst 7566 Jones 19-APR-1987 3000.0 20 RESEARCH 7788 01-MAR-1990 31-MAR-1990 Scott Analyst 7566 Jones 19-APR-1987 3000.0 20 Research 7788 01-APR-1990 31-MAR-1991 Scott Analyst 7566 Jones 19-APR-1987 3000.0 20 Research 7788 01-APR-1991 31-JUL-1991 Scott Analyst 7566 Jones 19-APR-1987 3300.0 20 Research 7788 01-JAN-1992 31-DEC-9999 Scott Analyst 7566 Jones 01-JAN-1992 3500.0 20 Research
The highlighted inline view g produces a list of all distinct VALID_FROM values which will be used as additional join criterion for all temporal tables. Unlinke in Joining Temporal Intervals all interval endpoints need also be considered to identify gaps.
The calculation of the VALID_TO column is a bit laborious (see highlighted lines 3 to 11). You need to get the lowest value for VALID_TO of all involved intervals, including outer joined intervals like m. Also the subsequent VALID_FROM has to be considered since the inline view g is providing all VALID_FROM values to be probed and they may be completely independent of the involved intervals.
The remaining part of the query is quite simple. I’ve highlighted the rows in the result set which should be merged in a subsequent step (see line 4 and 8).
If you change line 46 to “WHERE e.mgr = 7788″ you get the following result:
EMPNO VALID_FROM VALID_TO ENAME JOB MGR MGR_E HIREDATE SAL COMM DEPTNO DNAME ----- ----------- ----------- ----- ------- ---- ----- ----------- ------ ---- ------ -------- 7876 23-MAY-1987 31-DEC-1989 ADAMS CLERK 7788 SCOTT 23-MAY-1987 1100.0 20 RESEARCH 7876 01-JAN-1990 20-JAN-1990 Adams CLERK 7788 Scott 23-MAY-1987 1100.0 20 RESEARCH 7876 21-JAN-1990 21-JAN-1990 Adams Clerk 7788 Scott 23-MAY-1987 1100.0 20 RESEARCH 7876 22-JAN-1990 28-FEB-1990 Adams Clerk 7788 Scott 23-MAY-1987 1100.0 20 RESEARCH 7876 01-MAR-1990 31-MAR-1990 Adams Clerk 7788 Scott 23-MAY-1987 1100.0 20 Research 7876 01-APR-1990 31-MAR-1991 Adams Clerk 7788 Scott 23-MAY-1987 1100.0 20 Research 7876 01-APR-1991 31-JUL-1991 Adams Clerk 7788 Scott 23-MAY-1987 1210.0 20 Research 7876 01-AUG-1991 31-DEC-1991 Adams Clerk 7788 23-MAY-1987 1210.0 20 Research 7876 01-JAN-1992 31-DEC-9999 Adams Clerk 7788 Scott 23-MAY-1987 1210.0 20 Research
This result is interesting for several reasons. First, there are two highlighted records which should be merged in a subsequent step. Second, the line 10 represents the time where Scott was not employed by this company. Third, the records in line 9 and 11 must not be merged in a subsequent step. They are identical (beside VALID_FROM and VALID_TO) but the intervals are not connected.
Merging Temporal Intervals
If you look at the result of the previous query you might be tempted to avoid the merging step since there are just a few intervals which need merging. However, in real live scenarios you might easily end up with daily intervals for large tables, since the inline-view g considers all valid_from and valid_to columns of all involved tables. Sooner or later you will think about merging temporal intervals or about other solutions to reduce the result set. – If you’re skeptical, then querying for “e.empno = 7369″ might give you an idea what I’m taking about (21 intervals before merge, 6 intervals after merge).
Since I covered this topic in Merging Temporal Intervals with Gaps I’ll provide the query to produce the final result and explain the specialities only.
WITH joined AS ( -- gap-aware temporal join -- produces result_cols to calculate new_group in the subsequent query SELECT e.empno, g.valid_from, LEAST( e.valid_to, d.valid_to, j.valid_to, NVL(m.valid_to, e.valid_to), LEAD(g.valid_from - 1, 1, e.valid_to) OVER( PARTITION BY e.empno ORDER BY g.valid_from ) ) AS valid_to, ( e.ename || ',' || j.job || ',' || e.mgr || ',' || m.ename || ',' || TO_CHAR(e.hiredate,'YYYY-MM-DD') || ',' || e.sal || ',' || e.comm || ',' || e.deptno || ',' || d.dname ) AS result_cols, e.ename, j.job, e.mgr, m.ename AS mgr_ename, e.hiredate, e.sal, e.comm, e.deptno, d.dname FROM empv e INNER JOIN (SELECT valid_from FROM empv UNION SELECT valid_from FROM deptv UNION SELECT valid_from FROM jobv UNION SELECT valid_to + 1 FROM empv WHERE valid_to != DATE '9999-12-31' UNION SELECT valid_to + 1 FROM deptv WHERE valid_to != DATE '9999-12-31' UNION SELECT valid_to + 1 FROM jobv WHERE valid_to != DATE '9999-12-31') g ON g.valid_from BETWEEN e.valid_from AND e.valid_to INNER JOIN deptv d ON d.deptno = e.deptno AND g.valid_from BETWEEN d.valid_from AND d.valid_to INNER JOIN jobv j ON j.jobno = e.jobno AND g.valid_from BETWEEN j.valid_from AND j.valid_to LEFT JOIN empv m ON m.empno = e.mgr AND g.valid_from BETWEEN m.valid_from AND m.valid_to ), calc_various AS ( -- produces columns has_gap, new_group SELECT empno, valid_from, valid_to, result_cols, ename, job, mgr, mgr_ename, hiredate, sal, comm, deptno, dname, CASE WHEN LAG(valid_to, 1, valid_from - 1) OVER( PARTITION BY empno ORDER BY valid_from ) = valid_from - 1 THEN 0 ELSE 1 END AS has_gap, CASE WHEN LAG(result_cols, 1, result_cols) OVER ( PARTITION BY empno ORDER BY valid_from ) = result_cols THEN 0 ELSE 1 END AS new_group FROM joined ), calc_group AS ( -- produces column group_no SELECT empno, valid_from, valid_to, ename, job, mgr, mgr_ename, hiredate, sal, comm, deptno, dname, SUM(has_gap + new_group) OVER( PARTITION BY empno ORDER BY valid_from ) AS group_no FROM calc_various ), merged AS ( -- produces the final merged result -- grouping by group_no ensures that gaps are honored SELECT empno, MIN(valid_from) AS valid_from, MAX(valid_to) AS valid_to, ename, job, mgr, mgr_ename, hiredate, sal, comm, deptno, dname FROM calc_group GROUP BY empno, group_no, ename, job, mgr, mgr_ename, hiredate, sal, comm, deptno, dname ORDER BY empno, valid_from ) -- main select * FROM merged WHERE empno = 7788;
EMPNO VALID_FROM VALID_TO ENAME JOB MGR MGR_E HIREDATE SAL COMM DEPTNO DNAME ----- ----------- ----------- ----- ------- ---- ----- ----------- ------ ---- ------ -------- 7788 19-APR-1987 31-DEC-1989 SCOTT ANALYST 7566 JONES 19-APR-1987 3000.0 20 RESEARCH 7788 01-JAN-1990 20-JAN-1990 Scott ANALYST 7566 Jones 19-APR-1987 3000.0 20 RESEARCH 7788 22-JAN-1990 28-FEB-1990 Scott Analyst 7566 Jones 19-APR-1987 3000.0 20 RESEARCH 7788 01-MAR-1990 31-MAR-1991 Scott Analyst 7566 Jones 19-APR-1987 3000.0 20 Research 7788 01-APR-1991 31-JUL-1991 Scott Analyst 7566 Jones 19-APR-1987 3300.0 20 Research 7788 01-JAN-1992 31-DEC-9999 Scott Analyst 7566 Jones 01-JAN-1992 3500.0 20 Research
The named query “joined” produces an additional column RESULT_COLS (see highlighted lines 16-26). It’s simply a concatenation of all columns used in the group by clause of the named query “merged”. RESULT_COLS is used in the named query “calc_various” (see highlighted lines 86-88) to calculate the column NEW_GROUP. NEW_GROUP is set to 1 if the value of RESULT_COLS is different for the current and previous row. NEW_GROUP ensures that Adam’s intervals valid from 01-APR-1991 and valid from 01-JAN-1992 are not merged. See line 7 and 9 in the following listing.
EMPNO VALID_FROM VALID_TO ENAME JOB MGR MGR_E HIREDATE SAL COMM DEPTNO DNAME ----- ----------- ----------- ----- ------- ---- ----- ----------- ------ ---- ------ -------- 7876 23-MAY-1987 31-DEC-1989 ADAMS CLERK 7788 SCOTT 23-MAY-1987 1100.0 20 RESEARCH 7876 01-JAN-1990 20-JAN-1990 Adams CLERK 7788 Scott 23-MAY-1987 1100.0 20 RESEARCH 7876 21-JAN-1990 28-FEB-1990 Adams Clerk 7788 Scott 23-MAY-1987 1100.0 20 RESEARCH 7876 01-MAR-1990 31-MAR-1991 Adams Clerk 7788 Scott 23-MAY-1987 1100.0 20 Research 7876 01-APR-1991 31-JUL-1991 Adams Clerk 7788 Scott 23-MAY-1987 1210.0 20 Research 7876 01-AUG-1991 31-DEC-1991 Adams Clerk 7788 23-MAY-1987 1210.0 20 Research 7876 01-JAN-1992 31-DEC-9999 Adams Clerk 7788 Scott 23-MAY-1987 1210.0 20 Research
The ORA_HASH function over RESULT_COLS could give me a shorter representation of all result columns. But since I do not detect hash collisions and this would lead to wrong results in some rare data constellations, I decided not to use a hash function.
The named queries “calc_various”, “calc_group” and “merged” are based on the query in scenario A of Merging Temporal Intervals with Gaps. The columns HAS_GAP, NEW_GROUP and GROUP_NO are explained in this post.
Conclusion
Joining and merging temporal intervals is indeed very challenging. Even if I showed in this post that it is doable I recommend to choose a simpler solution whenever feasible. E.g. limiting the query to a certain point in time for all involved temporal tables, since this eliminates the need of merging temporal intervals and even simplifies the gap-aware temporal join.