Which is more efficient: inc eax or add eax, 1?

I have started learning assembly language; I already know C/C++, so I decided to build on the knowledge I already have and began disassembling the code I write to see how it works.

I immediately stumbled on something that seemed illogical. If the instruction set has a dedicated instruction for incrementing a value, inc eax, then it must somehow differ from adding 1 to the register with add eax,1. I had always assumed that the existence of such separate instructions means they are more efficient... but this is the code generated for i++:

mov eax,dword ptr [i]
add eax,1
mov dword ptr [i],eax

Which is better to use: inc or add in this case?

P.S. I use Visual Studio 2008.

Author: kirelagin, 2011-02-14

4 answers

I think the compiler simply wasn't being smart here. It is probably influenced by the fact that, in theory, the result of i++ could still be used later, so it keeps the result in rax/eax. In fact, the compiler could have generated just

inc dword ptr [i]

Besides, that is clearly three instructions instead of one, which is certainly slower (unless the value of i++ really is needed later).
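
A rough sketch of the two cases being distinguished here (register choice is illustrative, not taken from any particular compiler's output):

inc  dword ptr [i]        ; i++ when the value is not used afterwards: a single read-modify-write instruction

mov  eax, dword ptr [i]   ; i++ when the old value is needed later (e.g. j = i++):
lea  edx, [eax+1]         ; keep the old value in eax, compute i+1 into edx (lea does not touch the flags)
mov  dword ptr [i], edx   ; store the incremented value back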

As for directly comparing add eax, 1 and inc eax, I suspect there is no difference in speed, but the immediate 1 takes up bytes, which means it also takes up space in the instruction prefetch queue and may have a negative effect. Or may not.
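
For reference, the encodings involved (easy to check with any assembler or disassembler); this is what "the immediate 1 takes up bytes" amounts to:

inc  eax       ; 40        - 1 byte in 32-bit mode (short forms 40-47)
add  eax, 1    ; 83 C0 01  - 3 bytes: opcode, ModRM, sign-extended 8-bit immediate

inc  eax       ; FF C0     - 2 bytes in 64-bit mode, where 40-4F became REX prefixes
add  eax, 1    ; 83 C0 01  - still 3 bytes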

On the other hand, the compiler does persist in using add eax, 1:

#include <stdio.h>
#include <stdlib.h>   /* for atoi() */

int main(void) {
    int i = atoi("7");

    return ++i;
}

The resulting code contains no inc:

00000000004004b0 <main>:
4004b0:       48 83 ec 08             sub    $0x8,%rsp
4004b4:       bf bc 05 40 00          mov    $0x4005bc,%edi
4004b9:       31 c0                   xor    %eax,%eax
4004bb:       e8 e8 fe ff ff          callq  4003a8 <atoi@plt>
4004c0:       48 83 c4 08             add    $0x8,%rsp
4004c4:       83 c0 01                add    $0x1,%eax

On the other hand, the Java JIT compiler does sometimes use inc; I have seen it myself.

UPD for kirelagin: see the official Intel documentation, page 341.

 4
Author: cy6erGn0m, 2011-02-14 22:59:16

The whole point is that you are compiling in Debug mode. Find the radio button and switch it to the Release configuration)) Yes, you're right, increment is much faster. It's just that in the Debug build everything is done as straightforwardly ("head-on") as possible, so that nobody gets confused. In the Release build, you will be surprised by how strange the instructions look (I experimented with this myself and was pleasantly surprised). Increment is faster because it is shorter, so fewer bytes have to be read from memory when the instruction is fetched; that seems to be its main advantage. The fact that this is a Debug build is also why the data is first read from memory and then written back, which takes noticeably longer than the operation itself. Plus experience, because I once ran into this myself and tracked down the reason ;)
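
To illustrate (this is only a sketch of typical optimized output, not the exact code VS2008 emits): in a Debug build the variable lives in memory, while in a Release build it is usually promoted to a register, so the whole statement shrinks or disappears:

mov  eax, dword ptr [i]   ; Debug: load i from memory
add  eax, 1               ; increment
mov  dword ptr [i], eax   ; store it back

inc  esi                  ; Release (sketch): i lives in a register, so i++ becomes a single
                          ; instruction, or vanishes entirely if the value is never used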

 5
Author: WhiteCrow, 2011-03-23 18:47:59

Well, what questions you ask! All of this is extremely hardware-dependent!


The general rule is simple: when it comes to generating assembly code, the compiler is always smarter (better informed) than you. If it does it that way, then that way is better.


Second thought: you are starting from a false premise. The introduction of new machine instructions is not always dictated by computational efficiency. There is also such a thing as programmer efficiency; you must agree that inc eax is much easier to type than add eax,1 ;). And the operation is extremely common.

Now, about why add is faster than inc. (Nothing I write below is necessarily true; it is just how it seems to me.) The thing is that inc, unlike add, does not change the carry flag, while it does change some of the other flags. Since I know of no architecture where flags can be set individually rather than all at once, I would venture the following: inc first has to read the current state of the carry flag (which creates a false dependency and, with it, pipeline stalls), while add does not need to, because it overwrites it. Hence the performance gain.
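
A small illustration of the dependency described above (whether it costs anything in practice depends on the microarchitecture; I believe many modern cores rename the carry flag separately and avoid the stall):

add  eax, ebx   ; writes all the arithmetic flags, including CF
inc  ecx        ; writes OF/SF/ZF/AF/PF but must leave the CF produced by the add untouched,
                ; which on some CPUs behaves like a read-modify-write of the flags register
add  ecx, 1     ; the alternative: overwrites every arithmetic flag, so it depends only on ecx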

P.S. Disclaimer once again: I don't understand this very well myself yet (check back with me in half a year; by then I will apparently be an expert in this field =) ). It's just something I heard/read somewhere.

 4
Author: kirelagin, 2016-11-21 09:58:00

Gcc (GCC) 3.4.5 (mingw-vista special r3) uses incl for i++ and j++.

When compiling with -O3, it generally keeps the variables in registers. Compiled with gcc -S -O3, this program:

int a(int);   /* a() and b() are defined elsewhere, so the calls cannot be optimised away */
void b(int);

int main()
{
    int i = 0, j = 999;

    while (a(i)) {
        i++;
        j++;
        b(j);
    }
}

It produces the following code for the while loop:

  xorl  %esi, %esi
  movl  $999, %ebx
  jmp   L2
.p2align 4,,7
L4:
  incl  %ebx          // j++
  incl  %esi          // i++
  movl  %ebx, (%esp)  // pass j to b()
  call  _b
L2:
  movl  %esi, (%esp)  // pass i
  call  _a
  testl %eax, %eax    // test the result returned in register eax
  jne   L4

About execution speed. Personally, it seems to me that in loop code of a reasonable size, on a modern implementation of the x86 architecture, the execution speed will be, oddly enough at first glance, the same. This is due to instruction prefetching and the translation of x86 instructions into the micro-operations of a RISC-like processor core.

 2
Author: avp, 2011-03-23 21:50:41