# Inline speed and compiler optimization

I'm doing a bit of hands-on research into the speed benefits of making a function inline. I don't have the book with me, but one text I was reading suggested that function calls carry a fairly large overhead, and that whenever executable size is negligible or can be spared, a function should be declared inline for speed.

I've written the following code to test this theory, and from what I can tell, there is no speed benefit from declaring a function as inline. Both functions, when called 4294967295 times on my computer, execute in 196 seconds.

My question is, what would be your thoughts as to why this is happening? Is it modern compiler optimization? Would it be the lack of large calculations taking place in the function?

Any insight on the matter would be appreciated. Thanks in advance friends.

``````
#include <iostream>
#include <stdio.h>  // getchar
#include <time.h>

// RESEARCH                                                   Jared Thomson 2010
////////////////////////////////////////////////////////////////////////////////
// Two functions that perform an identical arbitrary floating point calculation
// one function is inline, the other is not.

double test(double a, double b, double c);
double inlineTest(double a, double b, double c);

double test(double a, double b, double c){
    a = (3.1415 / 1.2345) / 4 + 5;
    b = 9.999 / a + (a * a);
    c = a *=b;
    return c;
}

inline
double inlineTest(double a, double b, double c){
    a = (3.1415 / 1.2345) / 4 + 5;
    b = 9.999 / a + (a * a);
    c = a *=b;
    return c;
}

// ENTRY POINT                                                Jared Thomson 2010
////////////////////////////////////////////////////////////////////////////////
int main(){
    const unsigned int maxUINT = -1;
    clock_t start = clock();

    //============================ NON-INLINE TEST ===============================//
    for(unsigned int i = 0; i < maxUINT; ++i)
        test(1.1,2.2,3.3);

    clock_t end = clock();
    std::cout << maxUINT << " calls to non inline function took "
              << (end - start)/CLOCKS_PER_SEC << " seconds.\n";

    start = clock();

    //============================ INLINE TEST ===================================//
    for(unsigned int i = 0; i < maxUINT; ++i)
        test(1.1,2.2,3.3);

    end = clock();
    std::cout << maxUINT << " calls to inline function took "
              << (end - start)/CLOCKS_PER_SEC << " seconds.\n";

    getchar(); // Wait for input.
    return 0;
} // Main.
``````

Assembly output: PasteBin

If this test took 196 seconds for each loop, then you must not have turned optimizations on; with optimizations off, generally compilers don't inline anything.

With optimization on, however, the compiler is free to notice that your test function can be completely evaluated at compile time, and crush it down to "return [constant]" -- at which point, it may well decide to inline both functions since they're so trivial, and then notice that the loops are pointless since the function value is not used, and squash that out too! This is basically what I got when I tried it.

So either way, you're not testing what you thought you tested.

Function call overhead ain't what it used to be, compared to the overhead of blowing out the level-1 instruction cache, which is what aggressive inlining does to you. You can easily find reports online of gcc's `-Os` option (optimize for size) being a better default choice for large projects than `-O2`, and the big reason for that is that `-O2` inlines a lot more aggressively. I would expect it is much the same with MSVC.

The only way I know of to guarantee that a function is inlined is to `#define` it as a macro.

For example:

``````
#define RADTODEG(x) ((x) * 57.29578)
``````

That said, the only time I would bother with such a function would be in an embedded system. On a desktop/server the performance difference is negligible.

Run it in a debugger and have a look at the generated code to see if your function is always or never inlined. I think it's always a good idea to have a look at the assembler code when you want more knowledge about the optimization the compiler does.

As an optimization directive, the `inline` keyword is basically useless. It is a suggestion only. The compiler is free to ignore it and refuse to inline such a function, and it is also free to inline a function declared without the `inline` keyword.

If you are really interested in doing a test of function call overhead, you should check the resultant assembly to ensure that the function really was (or wasn't) inlined. I'm not intimately familiar with VC++, but it may have a compiler-specific method of forcing or prohibiting the inlining of a function (however the standard C++ `inline` keyword will not be it).

So I suppose the answer to the larger context of your investigation is: don't worry about explicit inlining. Modern compilers know when to inline and when not to, and will generally make better decisions about it than even very experienced programmers. That's why the `inline` keyword is often entirely ignored. You should not worry about explicitly forcing or prohibiting inlining of a function unless you have a very specific need to do so (as a result of profiling your program's execution and finding that a bottleneck could be solved by forcing an inline that the compiler has for some reason not done).

Re: the assembly:

``````
; 30   :     const unsigned int maxUINT = -1;
; 31   :     clock_t start = clock();

mov esi, DWORD PTR __imp__clock
push    edi
call    esi
mov edi, eax

; 32   :
; 33   :     //============================ NON-INLINE TEST ===============================//
; 34   :     for(unsigned int i = 0; i < maxUINT; ++i)
; 35   :         blank(1.1,2.2,3.3);
; 36   :
; 37   :     clock_t end = clock();

call    esi
``````

This assembly is:

1. Calling `clock`
2. Storing the clock value
3. Calling `clock` again

Note what's missing: calling your function a whole bunch of times.

The compiler has noticed that you don't do anything with the result of the function and that the function has no side-effects, so it is not being called at all.

You can likely get it to call the function anyway by compiling with optimizations off (in debug mode).

Both the functions could be inlined. The definition of the non-inline function is in the same compilation unit as the usage point, so the compiler is within its rights to inline it even without you asking.

Post the assembly and we can confirm it for you.

EDIT: the MSVC compiler pragma for banning inlining is:

``````
#pragma auto_inline(off)
void myFunction() {
    // ...
}
#pragma auto_inline(on)
``````

Two things could be happening:

1. The compiler may be inlining either both functions or neither. Check your compiler documentation for how to control that.

2. Your function may be complex enough that the overhead of doing the function call isn't big enough to make a big difference in the tests.

Inlining is great for very small functions but it's not always better. Code bloat can prevent the CPU from caching code.

In general, inline getter/setter functions and other one-liners. Then, during performance tuning, you can try inlining other functions if you think you'll get a boost.

Um, shouldn't

``````
//============================ INLINE TEST ===================================//
for(unsigned int i = 0; i < maxUINT; ++i)
    test(1.1,2.2,3.3);
``````

be

``````
//============================ INLINE TEST ===================================//
for(unsigned int i = 0; i < maxUINT; ++i)
    inlineTest(1.1,2.2,3.3);
``````

?

But if that was just a typo, I would recommend looking at the output in a disassembler or reflector to see whether the code is actually inlined or still goes through a call on the stack.

Your code as posted contains a couple oddities.

1) The math and output of your test functions are completely independent of the function parameters. If the compiler is smart enough to detect that those functions always return the same value, that might give it incentive to optimize them out entirely, inline or not.

2) Your main function is calling `test` for both the inline and non-inline tests. If this is the actual code that you ran, then that would have a rather large role to play in why you saw the same results.

As others have suggested, you would do well to examine the actual assembly code generated by the compiler to determine that you're actually testing what you intended to.