Tuesday, March 15, 2022

Lesson Learned #191: Performance comparison using Inner vs Left Join in Linked Server/Elastic Query

Today, I worked on a very interesting performance service request when our customer is using Linked Server and/or Elastic Query joining with other local and remote queries. Following, I would like to share with you my lessons learned about the tests done.

 

We have the following scenario:

 

  • TableRemote1 that contains 22M rows running in a SQL Server remote server
  • TableRemote2 that contains 10M rows running in a SQL Server remote server.
  • TableLocal that contains 10M rows running in a SQL Server local server. 

 

Test 1:  Running the following query: 

 

SELECT TOP 100000 * FROM TableRemote1 
INNER JOIN TableLocal ON TableRemote1.Col1 = TableLocal.Col1

 

The execution plan was:

 

  • Run a query against SQL Server Remote Server to obtain all the rows of TableRemote1 
  • After it, join the data using Col1 with TableLocal Table. 

 

Test 2:  Running the following query: 

 

SELECT TOP 100000 * FROM TableRemote1 
INNER JOIN TableRemote2 ON TableRemote1.Col1 = TableRemote2.Col1
INNER JOIN TableLocal ON TableRemote1.Col1 = TableLocal.Col1

 

The execution plan was:

 

  • SQL SERVER sends the following query to the SQL SERVER remote server: SELECT ... FROM TableRemote1,TableRemote2 WHERE TableRemote1.Col1 = TableRemote2.Col1. which is great in terms of performance.
  • After it, join the data using Col1 with TableLocal Table.  

 

Test 3:  Running the following query: 

 

SELECT TOP 100000 * FROM TableRemote1 
LEFT JOIN TableRemote2 ON TableRemote1.Col1 = TableRemote2.Col1
INNER JOIN TableLocal ON TableRemote1.Col1 = TableLocal.Col1

 

The execution plan was:

 

  • SQL SERVER sends two queries:
    • First, obtain all the rows of the following query to SELECT ... FROM TableRemote1
    • Once the first query finished, runs, SELECT .. FROM TableRemote2
  • After it, join the data using Col1 with TableLocal Table.
  • Here, I saw an difference between Elastic Query and Linked Server. In Linked Server seems that we have some improvements that we don't have in Elastic Query, for example:
    • Running some DBCC SHOWSTATISTICS to review the cardinality.
    • Seems that the queries against TableRemote1 and TableRemote2 are running in parallel meanwhile Elastic Query is running one by one.

 

So, my lessons learned after reviewing these scenarios:

 

  • Use sp_execute_remote for Elastic Database Query and OpenQuery for Linked Server to have a single resulset from SQL Server Remote Server, will be the best and faster solution, because the TSQL will force query execution on the remote server.
  • If could be possible, try to allocate all the tables that you need to run the left join in an unique server (local/remote). 

 

Enjoy!                                      

Posted at https://sl.advdat.com/36jyyJAhttps://sl.advdat.com/36jyyJA