Returning a value can be faster than returning nothing
======================================================

[Published 2023-01-24]

Which one of these two functions is faster?

    void func1(int x) {
    }
    int func2(int x) {
        return x;
    }

It depends.  Either one can be faster, depending on how it is used
and which architecture you compile for.  Here is an example:

    void func1(int x) {
    }
    void run(int x) {
        func1(x);
        func1(x);
    }

Compiled for RISC-V, this code gives the following, provided the
compiler doesn't know the implementation of `func1` when compiling `run`:

    func1:
        ret
    run:
        addi    sp,sp,-16
        sd      s0,0(sp)
        sd      ra,8(sp)
        mv      s0,a0
        call    func1
        mv      a0,s0
        ld      s0,0(sp)
        ld      ra,8(sp)
        addi    sp,sp,16
        tail    func1

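As we can see, the variable `x` in function `run` has to be copied
from `a0` into another register, `s0`, so that we can get it back for
the second call to `func1`.  We don't (or pretend not to) know what
`func1` does, so we must assume that it overwrites `a0`, which the
calling convention allows.  `s0`, on the other hand, must be preserved
across calls.  That is why `func1` can't clobber it, but it is also why
`run` has to push the old value of `s0` to the stack before using it.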
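
The listing above depends on the compiler not seeing into `func1`.  A
minimal sketch of one way to arrange that in a single file, assuming
GCC 8 or later, where the `noipa` attribute turns off inlining and other
interprocedural analysis for a function.  The `OPAQUE` macro is a
hypothetical helper introduced here, not something from the original
code:

```c
/* Use GCC's noipa attribute when the compiler reports support for it. */
#if defined(__has_attribute)
#  if __has_attribute(noipa)
#    define OPAQUE __attribute__((noipa))
#  endif
#endif
#ifndef OPAQUE
/* Fallback: without the attribute the optimizer may see through func1.
   Defining func1 in a separate source file is the portable alternative. */
#  define OPAQUE
#endif

/* Same empty function as in the article, kept opaque to the caller. */
OPAQUE void func1(int x) {
    (void)x; /* the parameter is unused on purpose */
}

void run(int x) {
    func1(x);
    func1(x);
}
```

Compiling this with optimization enabled and `-S` (for example
`-O2 -S`) prints the assembly.  The flags behind the listings in this
post are not stated, so `-O2` is an assumption, and the exact output
may differ between compiler versions.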

If we instead write the code like this:

    int func2(int x) {
        return x;
    }
    void run(int x) {
        x = func2(x);
        func2(x);
    }

(the difference is that `func2` returns its parameter and that we set
`x` to the returned value after the first call) we get this output:

    func2:
        ret
    run:
        addi    sp,sp,-16
        sd      ra,8(sp)
        call    func2
        ld      ra,8(sp)
        addi    sp,sp,16
        tail    func2

It's much shorter.  Now the only register we have to push to the stack
is the return address `ra`.  We don't need to save `a0` to another
register, and we don't need to restore it between the calls.  How did
this happen?


The explanation
---------------

In the standard RISC-V calling convention, the first integer argument
and the return value both use register `a0`.  This has two effects:

- `func2` doesn't need to do anything to return `x` since it's already
  in `a0`.
- We don't need to restore `x` between the function calls since we
  get `x` from the return value of the first call and then it's
  already in the right place for the second call.

At the same time, `func2` returning `x` means that we will have `x`
in a register after the first function call has returned, so `run`
doesn't need to keep its own copy in `s0`, and therefore doesn't need
to push `s0` to the stack.
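
The rewrite only preserves behaviour because `func2` returns its
argument unchanged.  A small sketch that checks that both versions of
`run` make the same calls with the same argument.  The call log and the
names `run_void`, `run_returning` and `variants_agree` are hypothetical
helpers added for this check, and the logging makes the functions
non-empty, so this is not the code that produced the listings above:

```c
#include <assert.h>

/* Hypothetical call log, added only so the two variants can be compared. */
static int logged[4];
static int count;

static void func1(int x) { logged[count++] = x; }
static int  func2(int x) { logged[count++] = x; return x; }

/* The two versions of run from the article. */
static void run_void(int x)      { func1(x); func1(x); }
static void run_returning(int x) { x = func2(x); func2(x); }

/* Returns 1 if both versions call their helper twice with the original x. */
int variants_agree(int x) {
    count = 0;
    run_void(x);
    run_returning(x);
    assert(count == 4); /* two calls per variant */
    for (int i = 0; i < 4; i++) {
        if (logged[i] != x) {
            return 0;
        }
    }
    return 1;
}
```

If `func2` ever modified its argument before returning it, the second
call in `run_returning` would see a different value and the check would
fail.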


Not as good in x86
------------------

This trick works well on RISC-V, but it pays off less on x86-64.

On x86-64, this code:

    void func1(int x) {
    }
    void run(int x) {
        func1(x);
        func1(x);
    }

compiles to this:

    func1:
        ret
    run:
        push    rbp
        mov     ebp, edi
        call    func1
        mov     edi, ebp
        pop     rbp
        jmp     func1

and this code:

    int func2(int x) {
        return x;
    }
    void run(int x) {
        x = func2(x);
        func2(x);
    }

compiles to this:

    func2:
        mov     eax, edi
        ret
    run:
        sub     rsp, 8
        call    func2
        add     rsp, 8
        mov     edi, eax
        jmp     func2

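Just like on RISC-V, we avoid having to save a register, but the
problem is that on x86-64, the first parameter and the return value
don't use the same register: the argument arrives in `edi` and the
result comes back in `eax`.  This means that there is an extra
instruction in `func2`, and we need to move from `eax` to `edi` between
the calls to get `x` back into the right place for the argument.  (The
`sub`/`add` on `rsp` only keeps the stack aligned for the call; in the
first version, `push rbp` did that as a side effect.)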