What to Do When the MySQL Optimizer Overestimates the Condition Filtering Effect

In my previous blog post, I showed an example of how the MySQL Optimizer found a better join order by taking into account the filtering effects of conditions. I also explained that for non-indexed columns the filtering estimate is just a guess, and that there is a risk for non-optimal query plans if the guess is off.

We have received a few bug reports on performance regressions when upgrading from 5.6 to 5.7 that are caused by the optimizer overestimating the filtering effect. In most cases, the cause of the regression is inaccurate filtering estimates for equality conditions on non-indexed columns with low cardinality. In this blog post, I will discuss three ways to handle such regressions:

Create an index
Use an optimizer hint to change the join order
Disable condition filtering

First, I will show an example where overestimating the condition filtering effects gives a non-optimal query plan.

Example: DBT-3 Query 21

We will look at Query 21 in the DBT-3 benchmark:

SELECT s_name, COUNT(*) AS numwait
FROM supplier
JOIN lineitem l1 ON s_suppkey = l1.l_suppkey
JOIN orders ON o_orderkey = l1.l_orderkey
JOIN nation ON s_nationkey = n_nationkey
WHERE o_orderstatus = 'F'
  AND l1.l_receiptdate > l1.l_commitdate
  AND EXISTS (SELECT * FROM lineitem l2
              WHERE l2.l_orderkey = l1.l_orderkey
                AND l2.l_suppkey <> l1.l_suppkey)
  AND NOT EXISTS (SELECT * FROM lineitem l3
                  WHERE l3.l_orderkey = l1.l_orderkey
                    AND l3.l_suppkey <> l1.l_suppkey
                    AND l3.l_receiptdate > l3.l_commitdate)
  AND n_name = 'JAPAN'
GROUP BY s_name ORDER BY numwait DESC, s_name LIMIT 100;

Query 21 is called Suppliers Who Kept Orders Waiting Query. In MySQL 5.7, Visual EXPLAIN shows the following query plan for Query 21:

The four tables of the join are joined from left-to-right, starting with a full table scan of the orders table. There are also two dependent subqueries on the lineitem table that will be executed for each row of the outer lineitemtable. The execution time for this query plan is almost 25 seconds on a scale factor 1 DBT-3 database. This is more than ten times as long as the query plan used in MySQL 5.6!

The filtered column of tabular EXPLAIN shows the optimizer's estimates for the condition filter effects (some of the columns have been removed to save space):

id	select_type	table	type	key	rows	filtered	Extra
1	PRIMARY	orders	ALL	NULL	1500000	10.00	Using where; Using temporary; Using filesort
1	PRIMARY	l1	ref	PRIMARY	4	33.33	Using where
1	PRIMARY	supplier	eq_ref	PRIMARY	1	100.00	Using index condition
1	PRIMARY	nation	ALL	NULL	25	4.00	Using where; Using join buffer (Block Nested Loop)
3	DEPENDENT SUBQUERY	l3	ref	PRIMARY	4	30.00	Using where
2	DEPENDENT SUBQUERY	l2	ref	PRIMARY	4	90.00	Using where

This shows that the optimizer assumes that the condition o_orderstatus = 'F' is satisfied by 10% of the rows in the orders table. Hence, the optimizer thinks that it will be possible to filter out a lot of orders early by starting with the orders table. However, the truth is that almost 50% of the rows have the requested order status. In other words, by overestimating the filtering effect for orders, query plans that start with the orders table will appear to be less costly than is actually the case.

We will now look at how we can influence the optimizer to pick a better query plan for this query.

Option 1: Create an Index

As mentioned, the optimizer does not have any statistics on non-indexed columns. So one way to improve the optimizer's precision is to create an index on the column. For Query 21, since the filtering estimate for o_orderstatus is way off, we can try to see what happens if we create an index on this column:

CREATE INDEX i_o_orderstatus ON orders(o_orderstatus);

With this index, the query plan has changed:

id	select_type	table	type	key	rows	filtered	Extra
1	PRIMARY	nation	ALL	NULL	25	10.00	Using where; Using temporary; Using filesort
1	PRIMARY	supplier	ref	i_s_nationkey	400	100.00	NULL
1	PRIMARY	l1	ref	i_l_suppkey	600	33.33	Using where
1	PRIMARY	orders	eq_ref	PRIMARY	1	50.00	Using where
3	DEPENDENT SUBQUERY	l3	ref	PRIMARY	4	30.00	Using where
2	DEPENDENT SUBQUERY	l2	ref	PRIMARY	4	90.00	Using where

We see from the EXPLAIN output that the estimated filtering effect for orders is now 50%. Given that, the optimizer prefers a different join order, starting with the nationtable. This is the same join order as one got in MySQL 5.6, and the execution time with this plan is 2.5 seconds. Instead of accessing 50% of all orders, the query will now just access orders for suppliers in Japan. However, this improvement comes at the cost of having to maintain an index that will probably never be used!

Looking at Query 21, there is also an equality condition on another column without an index; n_name of the nationtable. For this column, 10% is actually a too high estimate. There are 25 nations in the table. Hence, the correct estimate should be 4%. What if we, instead, create an index on this column?

DROP INDEX i_o_orderstatus ON orders;
CREATE INDEX i_n_name ON nation(n_name);

Then we get this query plan:

id	select_type	table	type	key	rows	filtered	Extra
1	PRIMARY	nation	ref	i_n_name	1	100.00	Using index; Using temporary; Using filesort
1	PRIMARY	supplier	ref	i_s_nationkey	400	100.00	NULL
1	PRIMARY	l1	ref	i_l_suppkey	600	33.33	Using where
1	PRIMARY	orders	eq_ref	PRIMARY	1	10.00	Using where
3	DEPENDENT SUBQUERY	l3	ref	PRIMARY	4	30.00	Using where
2	DEPENDENT SUBQUERY	l2	ref	PRIMARY	4	90.00	Using where

In this case, our new index is actually used! Since scanning a table with 25 rows takes a neglible part of the total execution time, the savings for Query 21 are insignificant, but there might be other queries where such an index could be more useful.

Option 2: Join Order Hint

Instead of trying to improve statistics to get a better query plan, we can use hints to influence the optimizer's choice of query plan. The STRAIGHT_JOIN hint can be used to change the join order. It comes in two flavors:

STRAIGHT_JOIN right after SELECT

SELECT STRAIGHT_JOIN … FROM t1, t2, t3

t1 → t2 → t3

STRAIGHT_JOIN used as a join operator

… FROM t1 STRAIGHT_JOIN t2 …

… t1 → … → t2 …

We will use the second variant and specify that nationshould be processed before orders:

SELECT s_name, COUNT(*) AS numwait
FROM supplier
JOIN lineitem l1 ON s_suppkey = l1.l_suppkey
JOIN nation ON s_nationkey = n_nationkey
STRAIGHT_JOIN orders ON o_orderkey = l1.l_orderkey
WHERE o_orderstatus = 'F'
  AND l1.l_receiptdate > l1.l_commitdate
  AND EXISTS (SELECT * FROM lineitem l2
              WHERE l2.l_orderkey = l1.l_orderkey
                AND l2.l_suppkey <> l1.l_suppkey)
  AND NOT EXISTS (SELECT * FROM lineitem l3
                  WHERE l3.l_orderkey = l1.l_orderkey
                    AND l3.l_suppkey <> l1.l_suppkey
                    AND l3.l_receiptdate > l3.l_commitdate)
  AND n_name = 'JAPAN'
GROUP BY s_name ORDER BY numwait DESC, s_name LIMIT 100;

This way we force the optimizer to pick a query plan where nation comes before orders, and the resulting query plan is the "good one":

id	select_type	table	type	key	rows	filtered	Extra
1	PRIMARY	nation	ALL	NULL	25	10.00	Using where; Using temporary; Using filesort
1	PRIMARY	supplier	ref	i_s_nationkey	400	100.00	NULL
1	PRIMARY	l1	ref	i_l_suppkey	600	33.33	Using where
1	PRIMARY	orders	eq_ref	PRIMARY	1	10.00	Using where
3	DEPENDENT SUBQUERY	l3	ref	PRIMARY	4	30.00	Using where
2	DEPENDENT SUBQUERY	l2	ref	PRIMARY	4	90.00	Using where

In order to user STRAIGHT_JOIN we had to rearrange the tables in the FROM clause. This is a bit cumbersome, and to avoid this, we have in MySQL 8.0 introduced new join order hintsthat uses the new optimizer hint syntax. Using this syntax, we can add hints right after SELECT and avoid editing the rest of the query. In the case of Query 21, we can add hints like

SELECT /*+ JOIN_PREFIX(nation) */ …

SELECT /*+ JOIN_ORDER(nation, orders) */ …

to achieve the desired query plan.

Option 3: Disable Condition Filtering

Many optimizer features can be disabled by setting the optimizer_switch variable. The following statement will make the optimizer not use condition filtering estimates:

SET optimizer_switch='condition_fanout_filter=off';

Looking that the query plan as presented by EXPLAIN, we see that filtering is no longer taken into account:

id	select_type	table	type	key	rows	filtered	Extra
1	PRIMARY	nation	ALL	NULL	25	100.00	Using where; Using temporary; Using filesort
1	PRIMARY	supplier	ref	i_s_nationkey	400	100.00	NULL
1	PRIMARY	l1	ref	i_l_suppkey	600	100.00	Using where
1	PRIMARY	orders	eq_ref	PRIMARY	1	100.00	Using where
3	DEPENDENT SUBQUERY	l3	ref	PRIMARY	4	100.00	Using where
2	DEPENDENT SUBQUERY	l2	ref	PRIMARY	4	100.00	Using where

Note that you can set optimizer_switch at session level. Hence, it is possible to disable condition filtering for individual queries. However, this requires extra round-trips to the server to set optimizer_switch before and after the execution of the query.

(Option 4: Wait for Histograms)

We are working to improve the statistics available to the optimizer by introducing histograms. A histogram provides more detailed information about the data distribution in a table column. With histograms, the optimizer will be able to estimate pretty accurately the filtering effects also for conditions on non-indexed columns. Until then, you will have to resort to one of options presented above to improve bad query plans caused by inaccurate filtering estimates.

What to Do When the MySQL Optimizer Overestimates the Condition Filtering Effect

Example: DBT-3 Query 21

Option 1: Create an Index

Option 2: Join Order Hint

Option 3: Disable Condition Filtering

(Option 4: Wait for Histograms)

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112