We all know what is a "CARTESIAN JOIN", for further reference, pls refer the url mentioned in the references section.
I worked on a tuning task of a 11 page report. Two invisible cartesian joins are causing the slowness.
Description of the problem:
- This is a 11 page report.
- It used all sorts of aggregate functions.
- It consists of 8 sub-queries joined in the main query.
- Very few specific literal values to reduce the amount of data churned.
- Total number of tables accessed in this report are about 48.
- The smallest table consists of 1k rows and the largest consists of 188 Million rows.
Requirements:
- The report is to be tuned to run within 10 mts.
- The initial test run took 49 hours to complete.
- I had the input, 11 page report that tkes 49 hrs to complete, the output, expected time to complete the report 10 mts.
- Tough task!
This is a great quote.
"If you succeed, there is no need to explain. If you fail, you will not be there to explain."
– Adolf Hitler.
Working on stringent/tough tasks, always helps me sharpen my skills.
I love this quote, by Rabindranath Tagore.
"Dedication towards your job is not a sacrifice. It is the justification for your existence in this world." – Rabindranath Tagore
Initial tricks and methods are of no help in bringing down the elapsed time, but for few minutes of elapsed time reduction here and there. This query is making few trillion LIOs, during execute. Used index hints, parallel hints in few of the sub-queries, still no reduction in elapsed time.
Left with no other option, but to analyze and understand each line of code of the 11 page report. Performance tuning is an unproductive task. Upfront, I do not have any estimate of how much time, the current sql tuning requires and when it completes.
Tested the individual sub-queries' executions, all of them ran between few seconds to 8 mts time range. Then, where is the problem, the consolidated query is still taking 49 hours to complete.
My tuning task is running into weeks, without any hope of completion in the horizon.
While scrutinizing the sub-queries, I stumbled on some thing strange. On my initial investigation, everything looked normal. I suspected Cartesian joins are happening in two sub-queries and blowing the number of rows to be processed inside memory.
Since I can not share the actual code, as it violates SENSITIVE/CONFIDENTIAL/PROPRIETARY DATA LAWS, I created a test case scenario, to show how the Cartesian joins are occurring in our scenario.
The cardinality of columns used in the table joins will impact the queries performance. The number of rows fetched from tables, the % of uniqueness of the column values coming from the base tables will influence the task.
***********************************************************************************
>create table temp_jp1(col1 number, col2 varchar2(20));
Table created.
>create table temp_jp2 as select * from temp_jp1 where 1=2;
Table created.
>declare
2 begin
3 for i in 1..3 loop
4 insert into temp_jp1 values(1,'Veeksha');
5 insert into temp_jp2 values(1,'Saketh');
6 end loop;
7 end;
8 /
PL/SQL procedure successfully completed.
>commit;
Commit complete.
>select * from temp_jp1;
COL1 COL2
---------- --------------------
1 Veeksha
1 Veeksha
1 Veeksha
>select * from temp_jp2;
COL1 COL2
---------- --------------------
1 Saketh
1 Saketh
1 Saketh
>select * from temp_jp1, temp_jp2;
COL1 COL2 COL1 COL2
---------- -------------------- ---------- --------------------
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
9 rows selected.
–THIS IS A CARTESEAN JOIN( OR FULL OUTER JOIN) WHERE IN YOU ARE NOT SPECIFYING ANY JOIN CONDITION BETWEEN THE TABLES INVOLVED IN THE QUERY IN YOUR WHERE CLAUSE.
>select * from temp_jp1 a, temp_jp2 b where a.col1 = b.col1;
COL1 COL2 COL1 COL2
---------- -------------------- ---------- --------------------
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
1 Veeksha 1 Saketh
9 rows selected.
–THIS IS ALSO A CARTESEAN PRODUCT(FULL OUTER JOIN) AS EACH ROW IN THE FIRST TABLE MATCHES WITH EVERY ROW IN THE SECOND TABLE. ALL THE ROWS ARE DUPLICATES AND RESULT IN CARTESIAN PLS NOTE THE DIFFERENCE IN THE ABOVE TWO SQLS, ONE WITHOUT A WHERE CONDITION AND THE SECOND ONE WITH A WHERE CONDITION, STILL FETCHING THE SAME NUMBER OF RESULTANT ROWS.
>select * from (select distinct * from temp_jp1) a, (select distinct * from temp_jp2) b where a.col1 = b.col1;
COL1 COL2 COL1 COL2
---------- -------------------- ---------- --------------------
1 Veeksha 1 Saketh
–TO FIND DISTINCT MATCHES BETWEEN THE TWO TABLES, YOU NEED TO SUPRESS THE DUPLICATES USING A DISTINT OR GROUP BY CLAUSE AND THEN JOIN THE RESULTANT COLUMN KEY VALUES FROM SUB-QUERIES.
–A GROUP BY CLAUSE CAN BE USED TO FILTER DUPLICATES BASED ON THE REQUIRED COLUMN'S COMBINATION.
***********************************************************************************
I consulted the application team, as to the use of Cartesian joins in the sub-queries. They confirmed, that the application is expecting to process only distinct values. And also confirmed that there is no business requirement to process Cartesian joins. Assured that I can filter all the duplicates generated.
Used group by clause and filtered duplicates. There was huge reduction in the run time. Tuned further with join hints, parallel hints.
The query completed under 20 mts. The business tested the modified query, and is happy with the new run times.
There is scope to further tune the query, but it requires more time.
I love this quote, very much: “If you want me to give you a two-hour presentation, I am ready today. If you want only a five-minute speech, it will take me two weeks to prepare.”
― Mark Twain
Happy scripting.
References:
https://www.tutorialspoint.com/sql/sql-cartesian-joins.htm
Start the discussion at forums.toadworld.com