Thursday, February 24, 2022

SQL Server on Linux: ELF and PE Images Just Work



Last March I moved from 22 years in SQL Server support to the SQL Server development team, working on the SQL Server on Linux project and reporting to Slava Oks.  As Slava highlights in his recent blog post, he contacted me in early 2015 to assist with supportability of SQL Server on Linux.  I quickly got engaged and found that the SQL development team had SQL running on Linux, and within an hour I too had it running, in a VM, on my laptop.  It was very exciting to learn about the new technology and how we would expand the product I know and love.


I spent a year making plans for the support changes needed, providing supportability feedback, testing debugger extensions and engaging in many other aspects of the project.   By March of 2016 Slava had convinced me to join the team.  He started me off with an easy project: upgrade the compiler and get SOS to boot below Win32.  To accomplish this I had to understand the environment, how we host the SQL Server images and the like.


Consistently, one of the most common questions I encounter is “How does it work?”  Several of the recent posts highlight the design.


Image Formats

This post takes a minute to focus on the specific concept of ELF vs PE images.

  • ELF image format is the image format known to the Linux kernel.
  • PE image format is the image format known to the Windows kernel.

In simplified terms the ELF and PE file formats, understood by the respective kernels, hold the assembly instructions for the image.
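To make the "understood by the respective kernels" part concrete, here is a minimal sketch of how a loader could tell the two formats apart from the first bytes of a file. The function and enum names are hypothetical and chosen only for illustration; the magic values themselves come from the formats: an ELF file starts with 0x7F 'E' 'L' 'F', while a PE file starts with an "MZ" DOS stub whose 32-bit field at offset 0x3C points at the "PE\0\0" signature.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
typedef enum { IMAGE_UNKNOWN, IMAGE_ELF, IMAGE_PE } image_format;

image_format detect_image_format(const unsigned char *buf, size_t len)
{
    /* ELF: the file begins with the four bytes 0x7F 'E' 'L' 'F'. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return IMAGE_ELF;

    /* PE: the file begins with a DOS stub ("MZ"). The little-endian
     * 32-bit value at offset 0x3C is the file offset of "PE\0\0". */
    if (len >= 0x40 && buf[0] == 'M' && buf[1] == 'Z') {
        uint32_t pe_off = (uint32_t)buf[0x3C]
                        | ((uint32_t)buf[0x3D] << 8)
                        | ((uint32_t)buf[0x3E] << 16)
                        | ((uint32_t)buf[0x3F] << 24);
        if (pe_off <= len - 4 && memcmp(buf + pe_off, "PE\0\0", 4) == 0)
            return IMAGE_PE;
    }

    return IMAGE_UNKNOWN;
}
```

Everything past these headers is layout information for the loader; the instructions themselves are stored as plain bytes in either container.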

When I tell folks that sqlservr.exe, sqlmin.dll, … are the same, exact, PE binaries we ship for Windows (our traditional box product), the first question is always:  “How can the Linux kernel understand a PE image?”  


I then ask them to explain to me what they mean, and I get a wide variety of answers which are often quite nebulous.  There are no burning secrets. Once you look at it from the CPU outward (as Slava talks about in the Channel 9 video) it becomes clear. 


Without going into details, you boot your laptop or server and don’t give a thought to the binary format(s) of the operating system.  The computer works the same way whether it is made by IBM, HP, Dell or another vendor, no matter what the operating system.   What is really happening is that the operating system has registered a binary image and entry point with the boot loader.  The boot loader (bootstrapper) has enough information to find binaries, such as ntoskrnl.exe, and then tells the CPU to start executing instructions at the defined entry point.   From this point forward Windows loads drivers, starts services and provides you the Windows you are used to.  It is just running assembly instructions.   The same thing happens for Linux and other operating systems.


Another way to explain this is to take a page from an old friend, David Campbell.  David taught me to explain things in everyday ways and to try to explain them as you would to your mother.  Automobiles are a favorite of David’s, and I grew up in a family that owned a farm machinery dealership, so I like mechanical references.   The fenders on my Massey tractor are the same fenders used on some models of Ford tractor.   They come from the same fender manufacturer but are sold on tractors built by two different companies.  The only differences are that the Massey is painted red, the Ford is painted white or blue, and the bolt holes may be a bit different.   Paint it the color you want and drill the right bolt holes and I can use a fender from the Ford on my Massey.  Find those same concepts for software and you can do the exact same things, with no virtualization and no performance or functionality compromises.


If you step back and think about the CPU, it runs a defined and finite set of instructions.  It does not matter if you are running Windows, Linux or some other system; everything boils down to a set of assembly instructions.  Let’s look at a more concrete example of adding two numbers.


int c = a + b;


0134167C 8B 45 F8             mov         eax,dword ptr [a] 
0134167F 03 45 EC             add         eax,dword ptr [b] 
01341682 89 45 E0             mov         dword ptr [c],eax 


These assembly instructions are the same on Linux as on Windows because they are the CPU’s instructions.   Now let’s build this simple application on Linux (ELF) and on Windows (PE).  If you compare the binary images, the differences are operating system specific (headers), but the actual executable code is the same between the images.


What this means is that if you can abstract the logic for the binary formats and provide the necessary ABI/API functionality, an application just runs on the CPU.  Simplified, I can write a Windows application that understands the ELF headers.  I can then load an ELF based binary, read the ELF headers, find the execution entry point, and invoke the entry point.  Going back to the simple example above, the logic would add (a + b) and Windows does not care that the executable was in ELF format.  We are running the set of assembly instructions the CPU understands.  
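The "read the ELF headers, find the entry point" step can be sketched in a few lines of portable C. This is only an illustration under stated assumptions: the function name is hypothetical, only little-endian images are handled, and the header is parsed by hand (rather than with the Linux-only <elf.h>) so the same code would build on Windows. The offsets come from the ELF header layout: byte 4 is the class (1 = 32-bit, 2 = 64-bit), byte 5 is the data encoding (1 = little-endian), and the entry address (e_entry) sits at offset 24.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read an n-byte little-endian unsigned integer from p. */
static uint64_t read_le(const unsigned char *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Hypothetical helper: extract the entry point address from an ELF image
 * held in memory. Returns 0 on success, -1 if the image is not a
 * little-endian ELF file this sketch understands. */
int elf_entry_point(const unsigned char *image, size_t len, uint64_t *entry)
{
    if (len < 32 || memcmp(image, "\x7f" "ELF", 4) != 0)
        return -1;                          /* not an ELF image */
    if (image[5] != 1)
        return -1;                          /* only little-endian handled */

    if (image[4] == 2) {                    /* 64-bit: e_entry is 8 bytes */
        *entry = read_le(image + 24, 8);
        return 0;
    }
    if (image[4] == 1) {                    /* 32-bit: e_entry is 4 bytes */
        *entry = read_le(image + 24, 4);
        return 0;
    }
    return -1;
}
```

A real loader has more to do before it can jump to that address, such as mapping the program segments into memory and applying relocations, but none of that work changes the instructions the CPU eventually executes.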


As this blog highlights, the Library OS provides support for the core Windows APIs and services.   This means the startup of the user mode Library OS can register a binary and entry point and be invoked much like the bootstrapper when you boot your computer.   The ntoskrnl understands the PE format and provides services and support for Win32 APIs.  Externally the API appears the same; internally the services are implemented as necessary with support from the abstraction layer.   Now you can load sqlservr.exe and associated components and execute the assembly instructions, because executable code is not operating system specific.  What is specific is the API/ABI invocations.   If the WriteFile API is called on Windows, we use the classic Windows implementation.  If WriteFile is called in the Linux based installation, we can call the Linux ABI that writes to a file.  Simple and without a bunch of redirection activities.
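To illustrate the WriteFile idea, here is a minimal sketch of a WriteFile-shaped function backed by the POSIX write() call. It is not the actual SQL Server or Library OS implementation: the name shim_write_file, the use of a plain file descriptor in place of a Windows HANDLE, and the omission of overlapped I/O and error code translation are all simplifying assumptions made for the example.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical shim: same calling shape as a simplified WriteFile
 * (buffer, byte count, out-parameter for bytes written, nonzero on
 * success), but implemented with the POSIX write() call. */
int shim_write_file(int fd, const void *buffer,
                    uint32_t bytes_to_write, uint32_t *bytes_written)
{
    ssize_t n = write(fd, buffer, bytes_to_write);

    if (n < 0) {
        /* A fuller version would map errno to a Windows error code here. */
        if (bytes_written)
            *bytes_written = 0;
        return 0;   /* FALSE */
    }

    if (bytes_written)
        *bytes_written = (uint32_t)n;
    return 1;       /* TRUE */
}
```

The caller sees the same contract either way; only the body behind the API differs between the Windows and the Linux based installation.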


SQL Server on Linux or Windows can be optimized to leverage the best set of instructions and optimal path to achieve the best outcome.

