Analyzing Disassembly of C++ Executables

Sometimes it's really useful to be able to see exactly what's happening behind the scenes of the C++ code I write. I've been spending some time on my sabbatical doing this.

One thing I sometimes do is check the results of compiled template metaprogramming code. This is code that generates results at compile time. As an example, the following is a common template metaprogramming example that calculates a factorial at compile time:

#include <cstddef>
#include <iostream>

// Primary template: multiplies number by the factorial of (number - 1).
template<std::size_t number>
struct Factorial
{
    static const std::size_t result = Factorial<number - 1>::result * number;
};

// Explicit specialization that ends the recursion.
template<>
struct Factorial<1>
{
    static const std::size_t result = 1;
};

int main()
{
    std::cout << "Factorial 6 is " << Factorial<6>::result << std::endl;
}

I compiled this in Visual Studio 2015, set a breakpoint in main, pressed F5 to start debugging, and pressed Ctrl+Alt+D to display the assembly instructions shown below:

00007FF66E601000  sub         rsp,28h
    std::cout << "Factorial 6 is " << Factorial<6>::result << std::endl;
00007FF66E601004  mov         rcx,qword ptr [__imp_std::cout (07FF66E603080h)]
00007FF66E60100B  call        std::operator<<<std::char_traits<char> > (07FF66E6010D0h)
00007FF66E601010  mov         rcx,rax
00007FF66E601013  mov         edx,2D0h
00007FF66E601018  call        qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF66E603088h)]
00007FF66E60101E  mov         rcx,rax
00007FF66E601021  lea         rdx,[std::endl<char,std::char_traits<char> > (07FF66E601290h)]
00007FF66E601028  call        qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF66E603090h)]
 

As the listing above shows, the instruction mov edx,2D0h loads 2D0 hexadecimal into the EDX register just before the value is written to standard output. 2D0 hexadecimal equals 720 decimal, which is 6 factorial. This shows the calculation was done at compile time.

Comparing Instruction Counts

Viewing assembly can also help show how many instructions are produced for a given line (or lines) of code. Comparing code that uses std::shared_ptr with code that uses std::unique_ptr shows how many additional instructions are needed for what I suspect is the allocation and initialization of the std::shared_ptr reference count data. This "control block" is discussed in Item 19 of Scott Meyers' book "Effective Modern C++".

Here's some trivial code for the sake of examining the difference between initialization of a shared_ptr and a unique_ptr:

Unique Pointer Example:

#include <cstdlib>
#include <iostream>
#include <memory>

// Defined elsewhere; only the declaration is needed here.
void UniqueFoo(std::unique_ptr<int>& iPtr);

int main(int argc, char* argv[])
{
    auto arg = atoi(argv[1]);
    std::unique_ptr<int> pInt{new int{arg}};
    UniqueFoo(pInt);
}

Shared Pointer Example:

#include <cstdlib>
#include <iostream>
#include <memory>

// Defined elsewhere; only the declaration is needed here.
void SharedFoo(std::shared_ptr<int>& iPtr);

int main(int argc, char* argv[])
{
    auto arg = atoi(argv[1]);
    std::shared_ptr<int> pInt{new int{arg}};
    SharedFoo(pInt);
}

I compiled both of these examples in release mode in VS 2015 with full optimizations on (/Ox). The resulting generated assembly instructions for the second line of main of each program are shown below.

VS2015 Disassembly of Unique Pointer Construction (Line #2 of main in "Unique Pointer Example" Above):

00007FF7B90A1020  mov         ecx,4  
00007FF7B90A1025  call        operator new (07FF7B90A13B4h)  
00007FF7B90A102A  mov         rdi,rax  
00007FF7B90A102D  mov         dword ptr [rax],ebx  
00007FF7B90A102F  mov         qword ptr [rsp+48h],rax

VS2015 Disassembly of Shared Pointer Construction (Line #2 of main in "Shared Pointer Example" Above):

00007FF7A4241060  mov         ecx,4  
00007FF7A4241065  call        operator new (07FF7A4241524h)  
00007FF7A424106A  mov         dword ptr [rax],ebx  
00007FF7A424106C  xorps       xmm0,xmm0  
00007FF7A424106F  movdqu      xmmword ptr [pInt],xmm0  
00007FF7A4241075  mov         rdx,rax  
00007FF7A4241078  lea         rcx,[pInt]  
00007FF7A424107D  call        std::shared_ptr::_Resetp (07FF7A42413F0h)  

00007FF7F44013F0  mov         qword ptr [rsp+10h],rdx  
00007FF7F44013F5  push        rdi  
00007FF7F44013F6  push        r14  
00007FF7F44013F8  push        r15  
00007FF7F44013FA  sub         rsp,30h  
00007FF7F44013FE  mov         qword ptr [rsp+20h],0FFFFFFFFFFFFFFFEh  
00007FF7F4401407  mov         qword ptr [this],rbx  
00007FF7F440140C  mov         qword ptr [rsp+68h],rsi  
00007FF7F4401411  mov         r14,rdx  
00007FF7F4401414  mov         r15,rcx  
00007FF7F4401417  mov         ecx,18h  
00007FF7F440141C  call        operator new (07FF7F4401524h)  
00007FF7F4401421  mov         rdi,rax  
00007FF7F4401424  mov         dword ptr [rax+8],1  
00007FF7F440142B  mov         dword ptr [rax+0Ch],1  
00007FF7F4401432  lea         rax,[std::_Ref_count::`vftable' (07FF7F44033A0h)]  
00007FF7F4401439  mov         qword ptr [rdi],rax  
00007FF7F440143C  mov         qword ptr [rdi+10h],r14  
00007FF7F4401440  mov         rbx,qword ptr [r15+8]  
00007FF7F4401444  test        rbx,rbx  
00007FF7F4401447  je          std::shared_ptr::_Resetp+83h (07FF7F4401473h)  

00007FF7F4401473  mov         qword ptr [r15+8],rdi  
00007FF7F4401477  mov         qword ptr [r15],r14  
00007FF7F440147A  mov         rbx,qword ptr [this]  
00007FF7F440147F  mov         rsi,qword ptr [rsp+68h]  
00007FF7F4401484  add         rsp,30h  
00007FF7F4401488  pop         r15  
00007FF7F440148A  pop         r14  
00007FF7F440148C  pop         rdi  
00007FF7F440148D  ret  

Clearly, constructing a shared_ptr takes considerably more instructions than constructing a unique_ptr, and this count doesn't even include the instructions executed inside the extra operator new call (at address 0x7FF7F440141C, requesting 18h bytes), which I suspect allocates the shared_ptr's control block.

Same Example Using GCC

Compiling the same unique_ptr and shared_ptr examples with GCC without optimizations produced results similar to VS2015. With optimizations enabled (e.g. -O, -O1, -O2, -O3), however, GCC appears to optimize the shared_ptr construction and initialization more than VS2015 does. In the listing below, for instance, the second allocation and the two stores of 1 appear directly in main instead of behind a separate call:

GCC Disassembly of Unique Pointer Construction (Line #2 of main in "Unique Pointer Example" Above):

   0x400b35 <main(int, char**)+37>:  mov    $0x4,%edi
   0x400b3a <main(int, char**)+42>:  mov    %rax,%rbp
   0x400b3d <main(int, char**)+45>:  callq  0x400ad0 <_Znwm@plt>
   0x400b42 <main(int, char**)+50>:  mov    %rsp,%rdi
   0x400b45 <main(int, char**)+53>:  mov    %ebx,(%rax)
   0x400b47 <main(int, char**)+55>:  mov    %rax,(%rsp) 

GCC Disassembly of Shared Pointer Construction (Line #2 of main in "Shared Pointer Example" Above):

   0x400d16 <main(int, char**)+38>:  mov    $0x4,%edi
   0x400d1b <main(int, char**)+43>:  mov    %rax,%rbp
   0x400d1e <main(int, char**)+46>:  callq  0x400cb0 <_Znwm@plt>
   0x400d23 <main(int, char**)+51>:  mov    $0x18,%edi
   0x400d28 <main(int, char**)+56>:  mov    %ebp,(%rax)
   0x400d2a <main(int, char**)+58>:  mov    %rax,%rbx
   0x400d2d <main(int, char**)+61>:  mov    %rax,(%rsp)
   0x400d31 <main(int, char**)+65>:  movq   $0x0,0x8(%rsp)
   0x400d3a <main(int, char**)+74>:  callq  0x400cb0 <_Znwm@plt>
   0x400d3f <main(int, char**)+79>:  movl   $0x1,0x8(%rax)
   0x400d46 <main(int, char**)+86>:  movl   $0x1,0xc(%rax)

This was compiled with GCC 5.4.0 using the -O3 optimization flag. The resulting executable was analyzed with GDB.

Date: December 28th, 2016 at 3:04pm
Author: Terence Darwen
Tags: C++, Assembly Language, Assembly Code, Disassembly, Sabbatical
