Tuesday, September 7, 2021

SQL Server CE: Multiple single-column statistics connected by OR

In the post "SQL Server CE: Multiple single-column statistics", I discussed how selectivity is combined when the predicates in the WHERE clause are joined by AND. Today I'm going to talk about the OR operator.

 

Let's say we have four predicates connected by OR, with selectivities P0, P1, P2 and P3.

 

Legacy CE

The combined selectivity is: 1-(1-P0)*(1-P1)*(1-P2)*(1-P3). This treats the predicates as independent.

If there are only two predicates, the formula is relatively simple: (P0+P1)-(P0*P1)
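The legacy formula can be checked with a few lines of arithmetic outside SQL Server. The following is a minimal Python sketch, not anything the optimizer exposes; the function name legacy_or_selectivity is made up for illustration, and the two sample selectivities are the ones used in the example later in this post.

```python
from math import prod

def legacy_or_selectivity(selectivities):
    """Combine OR-ed predicate selectivities assuming independence.

    Hypothetical helper: 1 minus the probability that no predicate matches.
    """
    return 1.0 - prod(1.0 - p for p in selectivities)

# Sample selectivities taken from the worked example in this post.
p0, p1 = 0.02478630, 0.00028025

combined = legacy_or_selectivity([p0, p1])
two_predicate_form = (p0 + p1) - (p0 * p1)

# Both expressions describe the same quantity for two predicates.
print(combined, two_predicate_form)
```

For two predicates the general product form and the short form (P0+P1)-(P0*P1) agree up to floating-point rounding.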

 

New CE

1. The combined selectivity is: 1-(1-P0)*(1-P1)^(1/2)*(1-P2)^(1/4)*(1-P3)^(1/8)

2. P0, P1, ... are the selectivities of the individual predicates in the WHERE clause, sorted in descending order, so P0>P1.

3. At most four predicates are used. If there are more than four predicates in the WHERE clause, only the first four are counted and the rest are ignored.

Combined selectivity of four predicates: 1-(1-P0)*(1-P1)^(1/2)*(1-P2)^(1/4)*(1-P3)^(1/8), where P0>P1>P2>P3.
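The rules above can be sketched in Python to experiment with different inputs. This is an illustration under stated assumptions, not the optimizer's actual code: the function name new_ce_or_selectivity is hypothetical, and "the first four" is interpreted here as the four largest selectivities after sorting in descending order. The sample values are invented.

```python
def new_ce_or_selectivity(selectivities, max_predicates=4):
    """Hypothetical helper: OR combination with exponents 1, 1/2, 1/4, 1/8.

    Assumption: predicates are sorted by selectivity in descending order
    and only the first `max_predicates` of them are counted.
    """
    ordered = sorted(selectivities, reverse=True)[:max_predicates]
    no_match = 1.0
    for i, p in enumerate(ordered):
        # The i-th predicate (0-based) is damped with exponent 1 / 2**i.
        no_match *= (1.0 - p) ** (1.0 / 2 ** i)
    return 1.0 - no_match

# Invented selectivities, for illustration only.
four = [0.02, 0.01, 0.005, 0.001]
five = four + [0.0005]

print(new_ce_or_selectivity(four))
print(new_ce_or_selectivity(five))  # the fifth, smallest value is not counted
```

Under this reading the input order does not matter, and a fifth predicate with the smallest selectivity leaves the result unchanged.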

In practice, please use the following formula to calculate the selectivity when more than two predicates are combined.

1-

Liwei_0-1631025801463.png

 

 

As you would expect, the more columns are used in the WHERE clause, the more estimated rows you get.

 

 

Here are some examples. The AdventureWorks2019 OLTP database is used.

use AdventureWorks2019

go

IF exists(select 1 from sys.tables where name='SalesOrderDetail' and schema_id=schema_id('dbo'))

drop table SalesOrderDetail

Go

select * into SalesOrderDetail  from Sales.SalesOrderDetail

GO

create statistics I_ProductID on SalesOrderDetail(ProductID) with fullscan

create statistics I_SalesOrderID on SalesOrderDetail(SalesOrderID) with fullscan

 

 

Here is the T-SQL query I'm going to test

select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288

 

The selectivity of ProductId 708 is 3007/121317=0.02478630

dbcc show_statistics(SalesOrderDetail,I_ProductID)

Liwei_1-1631025801464.png

 

 

The selectivity of SalesOrderID 44288 is 34/121317=0.00028025

Liwei_2-1631025801465.png

 

 

 

Legacy CE:

Combined selectivity:(P0+P1)-(P0*P1)=(0.02478630+0.00028025)-(0.02478630*0.00028025)=0.0250596036394250

Estimated rows: 0.0250596036394250*121317=3040.15, which is rounded to 3040
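The legacy estimate can be reproduced from the raw counts quoted above (3007 and 34 matching rows out of 121317). This is a small Python sketch of the arithmetic only; the variable names are illustrative, and because it uses the unrounded fractions, the digits after the decimal point can differ slightly from the 8-digit values in this post.

```python
card = 121317          # total rows in SalesOrderDetail
p0 = 3007 / card       # selectivity of ProductID = 708
p1 = 34 / card         # selectivity of SalesOrderID = 44288

# Legacy CE, two predicates connected by OR: (P0+P1)-(P0*P1)
legacy_selectivity = (p0 + p1) - (p0 * p1)
legacy_rows = legacy_selectivity * card

print(legacy_selectivity, legacy_rows, round(legacy_rows))
```

The rounded result should match the 3040 estimated rows shown in the plan.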

select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288 option(recompile,querytraceon 9481)--trace flag 9481 is used to force legacy CE

Liwei_3-1631025801466.png

 

 

 

 

New CE:

Combined selectivity:1-(1-P0)*sqrt(1-P1)=1-(1-0.02478630)*(sqrt(1-0.00028025)=0.0249229613952225

Estimated row: 0.0249229613952225*121317=3023.57 is rounded up to 3024
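The same check works for the new CE number. The Python sketch below only redoes the arithmetic with the raw counts from the statistics (3007 and 34 rows out of 121317); the variable names are illustrative, and the larger selectivity is taken as P0 as described earlier.

```python
from math import sqrt

total_rows = 121317
sel_product = 3007 / total_rows   # P0: ProductID = 708 (larger selectivity)
sel_order = 34 / total_rows       # P1: SalesOrderID = 44288

# New CE, two predicates connected by OR: 1-(1-P0)*sqrt(1-P1)
new_ce_selectivity = 1 - (1 - sel_product) * sqrt(1 - sel_order)
new_ce_rows = new_ce_selectivity * total_rows

print(new_ce_selectivity, new_ce_rows, round(new_ce_rows))
```

The rounded value should agree with the 3024 estimated rows, slightly lower than the legacy estimate because the second predicate is damped by the square root.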

select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288 option(recompile,querytraceon 2312)--2312 is used to force New CE

Liwei_4-1631025801466.png

 

 

Related trace flags

Trace flag 9471: Picks the least selective predicate (the one with the largest selectivity value, which returns the most rows) and ignores the rest. This trace flag only works when the New CE is used.

Combined selectivity: Max(P0,P1,P2,P3,P4..)

In this case, it's Max(P0,P1)*CARD=MAX(0.02478630,0.00028025)*CARD=0.02478630*CARD=0.02478630*121317=3007
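As a quick sanity check of the Max formula above, here is a short Python sketch using the raw row counts from the statistics. It only mirrors the arithmetic in this post; the names are illustrative and nothing here calls SQL Server.

```python
table_rows = 121317
predicate_selectivities = [
    3007 / table_rows,  # ProductID = 708
    34 / table_rows,    # SalesOrderID = 44288
]

# With trace flag 9471 only the largest selectivity is used, per the formula above.
tf9471_selectivity = max(predicate_selectivities)
tf9471_rows = tf9471_selectivity * table_rows

print(tf9471_selectivity, round(tf9471_rows))
```

The estimate collapses to the row count of the single predicate that matches the most rows.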

select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288 option(recompile,querytraceon 2312,querytraceon 9471)---2312 is used to force New CE

Liwei_5-1631025801466.png

 

 

Trace flag 9472: Switches to the independence formula, so the result is the same as with the Legacy CE. This trace flag only works when the New CE is used.

select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288 option(recompile,querytraceon 2312,querytraceon 9472)--force New CE

Liwei_6-1631025801467.png

 

 

Assessment:

T-SQL

use AdventureWorks2019

go

IF exists(select 1 from sys.tables where name='SalesOrderDetail' and schema_id=schema_id('dbo'))

drop table SalesOrderDetail

go

select * into SalesOrderDetail  from Sales.SalesOrderDetail

GO

--create five statistics

create statistics I_ProductID on SalesOrderDetail(ProductID) with fullscan

create statistics I_orderqty on SalesOrderDetail(orderqty) with fullscan

create statistics I_SalesOrderID on SalesOrderDetail(SalesOrderID) with fullscan

create statistics I_SpecialOfferid on SalesOrderDetail(SpecialOfferid) with fullscan

create statistics I_modifiedDate on SalesOrderDetail(modifiedDate) with fullscan

 

 

1. Calculate the estimated rows for the following queries, then run the queries to verify.

select *From SalesOrderDetail where ProductId=708 or orderqty=10 or modifiedDate='2011-05-31 00:00:00.000' or SpecialOfferid=4  or SalesOrderID=44288 option(recompile,querytraceon 9481)

go

select *From SalesOrderDetail where ProductId=708 or orderqty=10 or modifiedDate='2011-05-31 00:00:00.000' or SpecialOfferid=4  or SalesOrderID=44288 option(recompile,querytraceon 2312)

2. If there are multi-column statistics, does it make a difference?

Posted at https://sl.advdat.com/3tjJUoa