Returning a value can be faster than returning nothing
======================================================

[Published 2023-01-24]

Which one of these two functions is faster?

    void func1(int x) {
    }

    int func2(int x) {
        return x;
    }

It depends. Either one can be faster, depending on how it is used and which architecture you compile for. Here is an example:

    void func1(int x) {
    }

    void run(int x) {
        func1(x);
        func1(x);
    }

If the compiler doesn't know the implementation of `func1` when compiling `run`, this code compiles to the following for RISC-V:

    func1:
            ret
    run:
            addi    sp,sp,-16
            sd      s0,0(sp)
            sd      ra,8(sp)
            mv      s0,a0
            call    func1
            mv      a0,s0
            ld      s0,0(sp)
            ld      ra,8(sp)
            addi    sp,sp,16
            tail    func1

As we can see, the variable `x` in function `run` has to be saved into another register (`s0`) so that we can get it back for the second call to `func1`, because we don't (or pretend not to) know what `func1` does and which registers it modifies. Since `s0` is a callee-saved register, `run` must in turn push the old value of `s0` to the stack and restore it before returning.

If we instead write the code like this:

    int func2(int x) {
        return x;
    }

    void run(int x) {
        x = func2(x);
        func2(x);
    }

(the difference is that `func2` returns its parameter and that we set `x` to the returned value after the first call) we get this output:

    func2:
            ret
    run:
            addi    sp,sp,-16
            sd      ra,8(sp)
            call    func2
            ld      ra,8(sp)
            addi    sp,sp,16
            tail    func2

It's much shorter. Now the only register we have to push to the stack is the return address `ra`. We don't need to save `a0` to another register, and we don't need to restore it between the calls. How did this happen?

The explanation
---------------

In the common RISC-V calling convention, the first argument uses the same register (`a0`) as the return value. This has two effects:

- `func2` doesn't need to do anything to return `x`, since it's already in `a0`.
- We don't need to restore `x` between the function calls, since we get `x` from the return value of the first call and it's then already in the right place for the second call.
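As a minimal sketch of the two effects listed above, the same pattern can be applied to a callee that does some work. The names `calls_seen`, `record_void`, `record_pass`, `run_void_style`, `run_returning_style` and `demo` are hypothetical. In a real build the callees would be defined in a separate translation unit, so that the caller cannot see their bodies; here everything is in one file only to keep the sketch self-contained.

```c
#include <assert.h>

/* Hypothetical counter standing in for "real work" done by the callee. */
static int calls_seen = 0;

/* Style 1: the callee returns nothing. */
void record_void(int x) {
    (void)x;
    calls_seen++;
}

/* Style 2: the callee hands its argument back unchanged. */
int record_pass(int x) {
    calls_seen++;
    return x;
}

/* When the callee's body is not visible, x has to survive the first
 * call, so the caller must keep a copy where the callee may not
 * clobber it. */
void run_void_style(int x) {
    record_void(x);
    record_void(x);
}

/* x is re-read from the return value, so no copy of it has to
 * survive the first call. */
void run_returning_style(int x) {
    x = record_pass(x);
    record_pass(x);
}

/* Usage: both styles perform the same two calls. */
void demo(void) {
    run_void_style(42);
    run_returning_style(42);
    assert(calls_seen == 4);
}
```

Both callers have the same observable behaviour; the only difference is where `x` comes from before the second call.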
At the same time, `func2` returning `x` means that we will have `x` in a register after the first function call has returned, so we don't need to push it to the stack.

Not as good in x86
------------------

This trick works well on RISC-V, but it is less effective on x86. In x86, this code:

    void func1(int x) {
    }

    void run(int x) {
        func1(x);
        func1(x);
    }

compiles to this:

    func1:
            ret
    run:
            push    rbp
            mov     ebp, edi
            call    func1
            mov     edi, ebp
            pop     rbp
            jmp     func1

and this code:

    int func2(int x) {
        return x;
    }

    void run(int x) {
        x = func2(x);
        func2(x);
    }

compiles to this:

    func2:
            mov     eax, edi
            ret
    run:
            sub     rsp, 8
            call    func2
            add     rsp, 8
            mov     edi, eax
            jmp     func2

Just like in RISC-V, we avoid having to save the register. The problem is that in x86 (x86-64 here), the first parameter (`edi`) and the return value (`eax`) don't use the same register. This means that there is an extra instruction in `func2`, and we need to move from `eax` to `edi` between the calls to get `x` back into the right place for the argument.
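To try the comparison on either architecture from a single source file, one option is to call through a `volatile` function pointer, which keeps the compiler from assuming which function is called. This is an assumption of the sketch, not necessarily how the listings above were produced. The names `callee_void`, `callee_pass`, `run_void`, `run_pass` and `demo` are hypothetical, and because the calls become indirect, the generated code will not match the listings above instruction for instruction.

```c
#include <assert.h>

void func1(int x) { (void)x; }
int  func2(int x) { return x; }

/* Hypothetical indirection: volatile forces a fresh load of the
 * pointer at every call, so the optimizer cannot assume which
 * function runs or which registers it leaves untouched. */
static void (*volatile callee_void)(int) = func1;
static int  (*volatile callee_pass)(int) = func2;

/* Void style: x is needed again after the first call. */
void run_void(int x) {
    callee_void(x);
    callee_void(x);
}

/* Returning style: x is taken from the first call's return value.
 * The result is returned only so that it can be checked. */
int run_pass(int x) {
    x = callee_pass(x);
    return callee_pass(x);
}

/* Usage: both wrappers still behave like the originals. */
void demo(void) {
    run_void(5);
    assert(run_pass(5) == 5);
}
```

Compiling with optimization and assembly output enabled, for example `gcc -O2 -S`, should show whether `x` is kept in a callee-saved register across the first call.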