Wow64 implementation details
How is Wow64 implemented in Windows 11 25H2?
Introduction:
As you might know, 64-bit versions of Windows run both 32-bit and 64-bit applications without any problems. Running a 64-bit application on a 64-bit system is straightforward and requires no special care from the system. Some of you might be asking: "is running a 32-bit application on a 64-bit system any different from running a 64-bit one?". To answer this question, let me describe the two major obstacles that would make running 32-bit applications on a 64-bit system impossible without some modifications.
The first is that the kernel of a 64-bit system expects system calls coming from user mode to follow the x64 calling convention. In this convention the first four parameters are passed in registers, and the remaining ones are placed on the stack above a 32-byte shadow space, with the fifth parameter at the lowest address. The 32-bit system call stubs use the x86 stdcall convention instead, which is much simpler: every parameter is pushed onto the stack from right to left, so the first parameter ends up at the lowest address because the stack grows downward, and no registers are used for parameters at all. So, before entering the kernel through a syscall, the system must rebuild the call frame to follow the right convention.
The second problem is that many of the syscall stubs exported by ntdll.dll or win32u.dll take one or more parameters that point to structures. The most famous one is nt!_OBJECT_ATTRIBUTES. The layout of such a structure changes between 32-bit and 64-bit because pointers are 4 bytes wide in 32-bit code and 8 bytes wide in 64-bit code. If the structure were passed as it is, the kernel would misinterpret its fields, leading to failed calls or unpredictable results.
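To make the difference between the two calling conventions concrete, here is a small Python sketch. It computes where the i-th (0-based) argument lives on entry to the callee, that is, right after `call` has pushed the return address. It assumes stdcall on the 32-bit side and the Microsoft x64 convention on the 64-bit side. For the `syscall` instruction itself, the kernel takes the first argument from R10 rather than RCX, and this sketch does not model that detail. The function names are illustrative only.

```python
# Where does argument i live on entry to the callee?
# Offsets are relative to the stack pointer right after `call` pushed the
# return address.

X64_ARG_REGISTERS = ("RCX", "RDX", "R8", "R9")

def x86_stdcall_location(i: int) -> str:
    """stdcall: every argument is on the stack, argument 0 just above the return address."""
    return f"[ESP+0x{4 + 4 * i:X}]"

def x64_location(i: int) -> str:
    """Microsoft x64: four register arguments, the rest above the return
    address (8 bytes) and the 32-byte shadow space reserved by the caller."""
    if i < len(X64_ARG_REGISTERS):
        return X64_ARG_REGISTERS[i]
    return f"[RSP+0x{8 + 0x20 + 8 * (i - 4):X}]"

if __name__ == "__main__":
    for i in range(6):
        print(i, x86_stdcall_location(i), x64_location(i))
```

Each 32-bit stack slot is 4 bytes wide and each 64-bit one is 8, so simply reusing the 32-bit frame would put every stack argument at the wrong offset.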
Fortunately, these problems have a solution that makes them completely transparent to the end user. It makes the call frame compatible with the x64 calling convention and adjusts the structures passed as parameters so that their offsets and sizes match the 64-bit layout. This solution is known as "Windows 32-bit on Windows 64-bit", or Wow64 for short. It is an intermediate subsystem made up of five 64-bit dynamic link libraries, each with its own purpose and role. wow64cpu.dll and wow64.dll are the most important, because they implement the core of the conversion. The remaining three, wow64win.dll, wow64con.dll and wow64base.dll, are each used by the Wow64 subsystem to convert a separate class of syscalls from 32-bit to 64-bit.
In the next few sections, I will detail how the Wow64 subsystem works and how the five DLLs work together to make a 32-bit application run on a 64-bit system.
Startup:
"Wow64 syscall stubs" is the name I use for the 32-bit syscall stubs exported by the 32-bit versions of ntdll.dll and win32u.dll. All of them follow exactly the same pattern. First, a 32-bit value is moved into the EAX register. This value is divided into two distinct parts. The lower 16 bits hold the system service number (SSN for short), which the kernel uses as an index into its System Service Descriptor Table. The upper 16 bits direct Wow64 to either the fast or the slow path.
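The split of EAX takes only a couple of lines. The example value below is made up for illustration and is not taken from a real ntdll build.

```python
def split_wow64_eax(eax: int) -> tuple[int, int]:
    """Split the 32-bit value a Wow64 syscall stub loads into EAX.

    Low 16 bits  -> system service number (SSN) used by the kernel.
    High 16 bits -> index that selects the fast or the slow path in wow64cpu.
    """
    if not 0 <= eax <= 0xFFFFFFFF:
        raise ValueError("EAX holds a 32-bit value")
    ssn = eax & 0xFFFF
    path_index = eax >> 16
    return ssn, path_index

if __name__ == "__main__":
    # Hypothetical value, for illustration only.
    ssn, path_index = split_wow64_eax(0x0002000E)
    print(f"SSN={ssn:#x} path index={path_index:#x}")
```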
Next, the stub calls Wow64SystemServiceCall( ), which is implemented in both ntdll(32).dll and win32u(32).dll, to transfer execution to wow64cpu.dll. Wow64SystemServiceCall( ) is just an indirect jump to the address stored in ntdll32!_Wow64Transition or win32u!_Wow64Transition. Both exports hold the address of wow64cpu!KiFastSystemCall( ).
wow64cpu!KiFastSystemCall( ) is the first routine belonging to the Wow64 subsystem that gets called during the emulation phase. At this point, however, the processor has not switched to 64-bit mode yet and no modifications have been made. To switch to 64-bit mode, the routine executes a far jump to wow64cpu!KiFastSystemCall64( ) with 0x33 as the segment selector. This selector refers to a 64-bit code segment descriptor in the Global Descriptor Table (GDT for short).
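The selector 0x33 used here, and 0x23 used on the way back (see the Cleanup section), can be decoded with the standard x86 selector format. Bits 0-1 hold the requested privilege level, bit 2 chooses between the GDT and the LDT, and the remaining bits index the descriptor table. This sketch only decodes the selector. Whether the referenced descriptor describes 64-bit or 32-bit code depends on the descriptor itself (its L bit), which is not modeled.

```python
def decode_selector(selector: int) -> dict[str, object]:
    """Split a 16-bit x86 segment selector into its fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selectors are 16 bits wide")
    return {
        "index": selector >> 3,                          # descriptor number
        "table": "LDT" if selector & 0b100 else "GDT",   # table indicator (bit 2)
        "rpl": selector & 0b11,                          # requested privilege level
        "byte_offset": selector & 0xFFF8,                # descriptor offset in the table
    }

if __name__ == "__main__":
    for sel in (0x33, 0x23):
        print(hex(sel), decode_selector(sel))
```

Both selectors carry RPL 3 (user mode) and point into the GDT. They differ only in the descriptor they select: entry 6 for 0x33 and entry 4 for 0x23.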
From now on, all executed instructions are treated as 64-bit code, not 32-bit code. wow64cpu!KiFastSystemCall64( ) is also a single instruction: a near indirect jump to wow64cpu!CpupReturnFromSimulatedCode( ). The jump uses the hardcoded index 0x1F in the Wow64 dispatch table, an array of function pointers whose address is kept in the R15 register. wow64cpu!CpupReturnFromSimulatedCode( ) first moves the stack address where the parameter block starts into R11, which later makes the parameters easy to access. It then saves the return addresses of both the syscall stub (taking ntdll32!NtSetEvent( ) as an example) and ntdll32!Wow64SystemServiceCall( ), which are used at the end to return to the caller. Finally, it jumps to wow64cpu!TurboDispatchJumpAddressStart, the last step before choosing which path to use.
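Below is a rough Python model of the state that wow64cpu!CpupReturnFromSimulatedCode( ) works with. It assumes the 32-bit stub reaches Wow64SystemServiceCall( ) through a near `call`. On entry, the stack would then hold the return address into the stub, then the return address into the stub's caller, then the arguments. That layout, the addresses and the function names are all assumptions made for illustration. The sketch also shows that index 0x1F in a table of 8-byte pointers corresponds to byte offset 0xF8.

```python
import struct

POINTER_SIZE_64 = 8
RETURN_FROM_SIMULATED_CODE_INDEX = 0x1F   # hardcoded index described in the text

def dispatch_slot_offset(index: int) -> int:
    """Byte offset of entry `index` in a table of 64-bit function pointers."""
    return index * POINTER_SIZE_64

def read_entry_frame(stack: bytes, base: int, esp: int) -> dict[str, int]:
    """Model the 32-bit stack when wow64cpu takes over (assumed layout).

    `stack` is a byte snapshot whose first byte lives at address `base`;
    every slot is a little-endian 32-bit value.
    """
    def dword(addr: int) -> int:
        return struct.unpack_from("<I", stack, addr - base)[0]

    return {
        "wow64_call_return": dword(esp),   # back into the stub, after `call Wow64SystemServiceCall`
        "stub_return": dword(esp + 4),     # back into the code that called the stub
        "r11": esp + 8,                    # start of the parameter block
    }

if __name__ == "__main__":
    base = 0x0012FE00                      # hypothetical 32-bit stack address
    stack = struct.pack("<4I", 0x77701234, 0x00401050, 0x0000007C, 0x0012FF00)
    frame = read_entry_frame(stack, base, esp=base)
    print({name: hex(value) for name, value in frame.items()})
    print(hex(dispatch_slot_offset(RETURN_FROM_SIMULATED_CODE_INDEX)))  # 0xf8
```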
To determine which path to use, the upper 16 bits of the value stored in EAX are used as an index into the Wow64 dispatch table, which is always accessed through R15. The processor then jumps to the address stored in that entry.
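The lookup itself is a plain table index. In the sketch below, only the 0x1F entry comes from the text above. The other index-to-routine assignments are invented placeholders, since the real indices are not given here.

```python
# Hypothetical table contents: only the 0x1F entry is taken from the text.
HYPOTHETICAL_DISPATCH_TABLE: dict[int, str] = {
    0x00: "ServiceNoTurbo",               # slow path entry (placeholder index)
    0x01: "Thunk2ArgSpSp",                # placeholder index
    0x02: "Thunk3ArgSpSpSp",              # placeholder index
    0x03: "Thunk4ArgSpSpSpNSp",           # placeholder index
    0x1F: "CpupReturnFromSimulatedCode",  # slot used by KiFastSystemCall64
}

def turbo_dispatch(eax: int, table: dict[int, str]) -> str:
    """Pick the next routine the way TurboDispatchJumpAddressStart does:
    the upper 16 bits of EAX index the table kept in R15."""
    index = (eax >> 16) & 0xFFFF
    try:
        return table[index]
    except KeyError:
        raise LookupError(f"no dispatch entry at index {index:#x}") from None

if __name__ == "__main__":
    print(turbo_dispatch(0x0001000E, HYPOTHETICAL_DISPATCH_TABLE))
```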
The Fast Path:
I have seen many people say that the Wow64 subsystem achieves its goal through two actions: it adjusts the call frame so it becomes 64-bit compatible, then calls the same syscall stub from the 64-bit version of the same DLL. This is not entirely false, since for some syscalls it really does perform those actions before switching to the kernel. However, some syscalls take fewer than five parameters, none of which is a structure that needs to be adjusted. For this type of syscall, Wow64 follows a fast path. It just moves the parameters (four at most) from the stack into the right registers (RCX, RDX, R8 and R9 respectively), then calls wow64cpu!CpupSyscallStub( ), a plain syscall stub implemented in wow64cpu.dll.
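The fast-path criterion described above reduces to a simple predicate. The `Param` type and its flag are hypothetical modeling aids. The two parameters in the example are those of NtSetEvent( ): an event handle and a pointer to the previous state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Param:
    name: str
    # True when the parameter points to a structure whose 32-bit and 64-bit
    # layouts differ (for example a pointer to OBJECT_ATTRIBUTES).
    needs_layout_conversion: bool = False

def takes_fast_path(params: list[Param]) -> bool:
    """Fast path: four parameters at most, none needing layout conversion."""
    return len(params) <= 4 and not any(p.needs_layout_conversion for p in params)

if __name__ == "__main__":
    nt_set_event = [Param("EventHandle"), Param("PreviousState")]
    print(takes_fast_path(nt_set_event))
```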
Internally, Wow64 has four small code blocks, each responsible for moving one parameter into the right register. The first is wow64cpu!Thunk4ArgSpSpSpNSp, a single mov instruction that places the fourth parameter in the R9 register. Remember that, as I said before, the start address of the parameter block is stored in R11.
The next block is wow64cpu!Thunk3ArgSpSpSp, which, as you might guess, moves the third parameter into the R8 register.
The next one, wow64cpu!Thunk2ArgSpSp, moves the second parameter into the RDX register.
Finally, Wow64 moves the first parameter into the R10 register with a simple mov instruction. You might ask why it uses R10 instead of RCX, given that the x64 calling convention clearly states that the first parameter goes in RCX. The answer is in the 64-bit syscall stubs implemented in ntdll.dll and win32u.dll, which start by copying RCX (the first parameter) to R10. They do this because the syscall instruction itself overwrites RCX with the return address, meaning the address of the instruction that follows syscall, before jumping to the operating system's syscall handler. So, instead of moving the first parameter to RCX and then to R10, the Wow64 subsystem moves it directly to R10 and saves a needless move.
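The sketch below models the thunk chain and the reason for using R10. It assumes, as the description suggests, that the four blocks run one after the other. Entering at the block for N arguments therefore also executes the blocks below it. Values are zero-extended from 32 to 64 bits, which is an assumption of this sketch, and the function names are hypothetical. The second function shows what the `syscall` instruction does to RCX and R11, which is why the first argument cannot stay in RCX.

```python
import struct

# The kernel reads the first syscall argument from R10, not RCX.
SYSCALL_ARG_REGISTERS = ("R10", "RDX", "R8", "R9")

def fast_thunk(param_block: bytes, arg_count: int) -> dict[str, int]:
    """Load up to four 32-bit arguments (starting at the address held in R11)
    into 64-bit registers, last argument first, mirroring the chain
    Thunk4Arg... -> Thunk3Arg... -> Thunk2Arg... -> first-argument move.
    Each value is zero-extended (assumption)."""
    if not 0 <= arg_count <= 4:
        raise ValueError("the fast path handles at most four arguments")
    regs: dict[str, int] = {}
    for i in reversed(range(arg_count)):
        regs[SYSCALL_ARG_REGISTERS[i]] = struct.unpack_from("<I", param_block, 4 * i)[0]
    return regs

def execute_syscall_instruction(regs: dict[str, int], next_rip: int, rflags: int) -> dict[str, int]:
    """What `syscall` itself does to the registers: RCX receives the address
    of the next instruction and R11 receives RFLAGS."""
    after = dict(regs)
    after["RCX"] = next_rip
    after["R11"] = rflags
    return after

if __name__ == "__main__":
    block = struct.pack("<2I", 0x7C, 0x0012FF00)   # hypothetical arguments
    regs = fast_thunk(block, 2)
    print(execute_syscall_instruction(regs, next_rip=0x00007FFE00001234, rflags=0x246))
```

R11, which held the parameter block address, is also overwritten by `syscall`. That is harmless here, because every argument has already been loaded by then.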
After the call frame has been made x64 compatible, wow64cpu!CpupSyscallStub( ) is called to perform the actual syscall. This avoids the overhead of calling the syscall stub implemented in the 64-bit version of ntdll.dll or win32u.dll.
The Slow Path:
When a syscall takes more than four parameters, or some of its parameters point to structures whose layout differs between 32-bit and 64-bit, Wow64 has no choice but to follow the slow path to make sure no errors occur later. As I said before, wow64cpu!TurboDispatchJumpAddressStart decides where to jump next based on the upper 16 bits of the system call number. Take ntdll!NtSetEvent( ) as an example: it has only 2 parameters and neither needs adjusting, so the fast path is used and the system jumps to wow64cpu!Thunk2ArgSpSp. ntdll!NtCreateThreadEx( ) is a different case. It takes more than four parameters, several of which point to structures that must be made 64-bit compatible. For this kind of syscall, wow64cpu!TurboDispatchJumpAddressStart jumps to wow64cpu!ServiceNoTurbo (in IDA the symbol name is wow64cpu!TurboDispatchJumpAddressEnd). This is the entry point of the slow path. It is just a wrapper that calls wow64!Wow64SystemServiceEx( ), passing the same syscall number as the first parameter and the start address of the parameter block as the second.
Before detailing what wow64!Wow64SystemServiceEx( ) does and how it works, let me first explain the concept of service tables, the main component of the slow path. During the slow path, Wow64 delegates the work of adjusting the structures passed as parameters to dispatch routines whose names start with "wh". The addresses of these routines are stored in arrays of contiguous function pointers called service tables. Wow64 divides these helpers into four classes, and for each class it also defines the maximum syscall id that can be used to index the corresponding service table. The first class covers the syscall stubs exported by ntdll.dll, and its service table is accessed directly from wow64.dll. The second class covers the syscall stubs exported by win32u.dll, whose names start with either "NtUser" or "NtGdi", and its service table is imported from wow64win.dll. I only focused on these two classes, but it is worth knowing that the remaining two service tables are imported from wow64con.dll and wow64base.dll.
To determine the right class of the syscall, wow64!Wow64SystemServiceEx( ) splits the syscall number into two parts. The lower 12 bits are used as an index into the corresponding service table to find the wh helper that will do the work. Of the remaining 20 bits, only the lowest two (bits 12 and 13) matter at this step. They determine the syscall class: 0 => ntdll syscall stubs; 1 => NtUser/NtGdi syscall stubs; 2 => Con; 3 => Csr. The class selects the corresponding service table and its maximum allowed syscall id, which decides whether the operation continues or fails. If the provided syscall id (the lower 12 bits of the syscall number) is greater than the maximum allowed one, wow64!Wow64SystemServiceEx( ) returns STATUS_INVALID_SYSTEM_SERVICE.
If the check shows that the syscall id is valid, the id is used as an index into the corresponding service table to get the wh helper for that syscall. That helper is then called directly with a single parameter: the start address of the parameter block passed to wow64!Wow64SystemServiceEx( ).
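The decoding, bounds check and dispatch performed by wow64!Wow64SystemServiceEx( ) can be sketched as follows. `ServiceTable`, `max_id` and the demo helper are hypothetical stand-ins for the real service tables. The bounds check follows the "greater than the maximum" rule described above. STATUS_INVALID_SYSTEM_SERVICE is the NTSTATUS value 0xC000001C.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

STATUS_SUCCESS = 0x00000000
STATUS_INVALID_SYSTEM_SERVICE = 0xC000001C

# A wh helper receives the address of the 32-bit parameter block.
WhHelper = Callable[[int], int]

@dataclass
class ServiceTable:
    """Hypothetical model of one Wow64 service table."""
    helpers: Sequence[WhHelper]
    max_id: int   # largest syscall id the table accepts

def wow64_system_service_ex(syscall_number: int, args_ptr: int,
                            tables: Sequence[ServiceTable]) -> int:
    service_id = syscall_number & 0xFFF           # lower 12 bits: index into the table
    table_class = (syscall_number >> 12) & 0b11   # bits 12-13: 0 ntdll, 1 NtUser/NtGdi, 2 Con, 3 Csr
    table = tables[table_class]
    if service_id > table.max_id:
        return STATUS_INVALID_SYSTEM_SERVICE
    return table.helpers[service_id](args_ptr)

if __name__ == "__main__":
    def wh_demo(args_ptr: int) -> int:   # hypothetical wh helper
        print(f"wh_demo called with parameter block at {args_ptr:#x}")
        return STATUS_SUCCESS

    tables = [ServiceTable(helpers=[wh_demo] * 3, max_id=2) for _ in range(4)]
    print(hex(wow64_system_service_ex(0x0002, 0x0012FE08, tables)))  # class 0, id 2
    print(hex(wow64_system_service_ex(0x1005, 0x0012FE08, tables)))  # class 1, id 5 > max_id
```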
The corresponding wh helper adjusts every parameter whose 32-bit layout differs from its 64-bit layout. The most common structure passed as a parameter to syscalls is nt!_OBJECT_ATTRIBUTES, and it is the first one the wh helper adjusts. To do so, the helper calls wow64!Wow64ShallowThunkAllocObjectAttributes32TO64_FNC( ), passing the input nt!_OBJECT_ATTRIBUTES parameter as it is. The first step is to allocate a buffer large enough for the 64-bit version of the structure using wow64!Wow64AllocateTempFromHeap( ). This routine allocates a block from the heap and inserts it into a linked list of temporary heap blocks. The allocation itself is done with ntdll!RtlAllocateHeap( ), requesting the caller's size plus 16 bytes. The first 16 bytes are reserved for the nt!_LIST_ENTRY that links the new block into the list. The new block is inserted at the head of the list, not the tail.
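The bookkeeping done by wow64!Wow64AllocateTempFromHeap( ) can be modeled as a circular doubly linked list. Head insertion uses the same pointer updates as the WDK InsertHeadList routine. Python objects stand in for heap memory here, and every name in the sketch is hypothetical.

```python
LIST_ENTRY64_SIZE = 16   # Flink + Blink, 8 bytes each on 64-bit

class ListEntry:
    """Minimal circular doubly linked list node (used for the head too)."""
    def __init__(self) -> None:
        self.flink: "ListEntry" = self
        self.blink: "ListEntry" = self

class TempBlock(ListEntry):
    """One temporary allocation: a LIST_ENTRY header followed by the buffer."""
    def __init__(self, size: int) -> None:
        super().__init__()
        # Stands in for RtlAllocateHeap(..., size + 16).
        self.raw = bytearray(LIST_ENTRY64_SIZE + size)

    @property
    def buffer(self) -> memoryview:
        """The part handed back to the caller, just past the header."""
        return memoryview(self.raw)[LIST_ENTRY64_SIZE:]

def insert_head_list(head: ListEntry, entry: ListEntry) -> None:
    first = head.flink
    entry.flink = first
    entry.blink = head
    first.blink = entry
    head.flink = entry

def allocate_temp_from_heap(head: ListEntry, size: int) -> TempBlock:
    block = TempBlock(size)
    insert_head_list(head, block)   # newest block goes first
    return block

def blocks_in_order(head: ListEntry) -> list[ListEntry]:
    """Walk the list from head.flink until we come back to the head."""
    result = []
    entry = head.flink
    while entry is not head:
        result.append(entry)
        entry = entry.flink
    return result

if __name__ == "__main__":
    head = ListEntry()
    oa64 = allocate_temp_from_heap(head, 48)    # room for a 64-bit OBJECT_ATTRIBUTES
    name64 = allocate_temp_from_heap(head, 16)  # room for a 64-bit UNICODE_STRING
    print(blocks_in_order(head) == [name64, oa64], len(oa64.raw), len(oa64.buffer))
```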
From now on, I will call the newly allocated buffer ObjectAttributes64 and the input structure ObjectAttributes32. After allocating ObjectAttributes64, which will be initialized as a 64-bit nt!_OBJECT_ATTRIBUTES instance, wow64!Wow64ShallowThunkAllocObjectAttributes32TO64_FNC( ) sets ObjectAttributes64->Length and ObjectAttributes64->Attributes to ObjectAttributes32->Length and ObjectAttributes32->Attributes respectively. nt!_OBJECT_ATTRIBUTES::ObjectName and nt!_OBJECT_ATTRIBUTES::SecurityDescriptor point to data that must be adjusted too. For the name, wow64!Wow64AllocateTempFromHeap( ) is called again to allocate a 64-bit nt!_UNICODE_STRING. This new buffer is then initialized from ObjectAttributes32->ObjectName, with each member set to its counterpart.
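The layouts and the shallow conversion can be sketched with ctypes. The structure definitions follow the public OBJECT_ATTRIBUTES and UNICODE_STRING layouts, with pointers and handles written as 32-bit or 64-bit integers. The sizes in the test assume a 64-bit host, where ctypes aligns 8-byte fields on 8-byte boundaries. Length and Attributes are copied as described above. Two details are assumptions of this sketch rather than facts taken from wow64.dll: RootDirectory is sign-extended, and SecurityDescriptor is passed through unchanged. ctypes allocations stand in for wow64!Wow64AllocateTempFromHeap( ).

```python
import ctypes
from typing import Optional

class UNICODE_STRING32(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint16),
                ("MaximumLength", ctypes.c_uint16),
                ("Buffer", ctypes.c_uint32)]          # 32-bit PWSTR

class UNICODE_STRING64(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint16),
                ("MaximumLength", ctypes.c_uint16),
                ("Buffer", ctypes.c_uint64)]          # 64-bit PWSTR, after 4 bytes of padding

class OBJECT_ATTRIBUTES32(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint32),
                ("RootDirectory", ctypes.c_uint32),
                ("ObjectName", ctypes.c_uint32),
                ("Attributes", ctypes.c_uint32),
                ("SecurityDescriptor", ctypes.c_uint32),
                ("SecurityQualityOfService", ctypes.c_uint32)]

class OBJECT_ATTRIBUTES64(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint32),
                ("RootDirectory", ctypes.c_uint64),
                ("ObjectName", ctypes.c_uint64),
                ("Attributes", ctypes.c_uint32),
                ("SecurityDescriptor", ctypes.c_uint64),
                ("SecurityQualityOfService", ctypes.c_uint64)]

def handle_32_to_64(value: int) -> int:
    """Sign-extend a 32-bit handle (assumption of this sketch)."""
    return (value | 0xFFFFFFFF00000000) if value & 0x80000000 else value

def shallow_thunk_object_attributes(
        oa32: OBJECT_ATTRIBUTES32,
        name32: Optional[UNICODE_STRING32]) -> tuple[OBJECT_ATTRIBUTES64, Optional[UNICODE_STRING64]]:
    """Return (oa64, name64); name64 must stay alive as long as oa64 is used."""
    oa64 = OBJECT_ATTRIBUTES64()
    oa64.Length = oa32.Length                  # copied as described in the text
    oa64.Attributes = oa32.Attributes
    oa64.RootDirectory = handle_32_to_64(oa32.RootDirectory)
    name64 = None
    if oa32.ObjectName:
        # Second temporary allocation: a 64-bit UNICODE_STRING whose members
        # are copied from their 32-bit counterparts.
        name64 = UNICODE_STRING64(Length=name32.Length,
                                  MaximumLength=name32.MaximumLength,
                                  Buffer=name32.Buffer)
        oa64.ObjectName = ctypes.addressof(name64)
    # Not thunked any further in this sketch.
    oa64.SecurityDescriptor = oa32.SecurityDescriptor
    oa64.SecurityQualityOfService = oa32.SecurityQualityOfService
    return oa64, name64

if __name__ == "__main__":
    print(ctypes.sizeof(OBJECT_ATTRIBUTES32), ctypes.sizeof(OBJECT_ATTRIBUTES64))
```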
If the syscall takes other parameters that need to be adjusted, its wh helper does the necessary work. Finally, after converting all parameters to their 64-bit versions, the system forwards the call to the 64-bit counterpart of the syscall (ntdll64!NtCreateThreadEx( ) in this example), passing all the required parameters.
Cleanup:
At the end, the system needs to return to the caller and switch the processor back to 32-bit mode. It does this with an indirect far jump. The target address is the return address from ntdll32!Wow64SystemServiceCall( ), and the segment selector is set to 0x23, which corresponds to the 32-bit code segment. Before jumping back, the top of the stack (the ESP register) is also updated. It is set to the address where the return address of the original 32-bit syscall stub is stored.
Conclusion:
That is all for now. I hope you have learned something from this article. Stay tuned: other posts on various topics will be published from time to time.