Monday, February 6, 2012

Optimized… in Assembler

OK, now for the fanatics as always: The same thing I just posted, but in assembler:

BitReverse PROC
PUSH r13

MOV rax, rcx
MOV r13, rcx
SHR rax, 1
AND rax, 0x5555555555555555
AND r13, 0x5555555555555555
SHL r13, 1
OR rax, r13

MOV r13, rax
SHR rax, 2
AND rax, 0x3333333333333333
AND r13, 0x3333333333333333
SHL r13, 2
OR rax, r13

MOV r13, rax
SHR rax, 4
AND rax, 0x0F0F0F0F0F0F0F0F
AND r13, 0x0F0F0F0F0F0F0F0F
SHL r13, 4
OR rax, r13

MOV r13, rax
SHR rax, 8
AND rax, 0x00FF00FF00FF00FF
AND r13, 0x00FF00FF00FF00FF
SHL r13, 8
OR rax, r13

MOV r13, rax
SHR rax, 16
AND rax, 0x0000FFFF0000FFFF
AND r13, 0x0000FFFF0000FFFF
SHL r13, 16
OR rax, r13

MOV r13, rax
SHR rax, 32
SHL r13, 32
OR rax, r13

POP r13

ret

BitReverse ENDP

Optimizing the BitReverse

Well well well, seems finally a way emerged to do the reversing without the need of a loop. And here is the trick:

Figure this simple starting value (16 bit for demo, works the same with 64 of course):
1010000010100000
Now let’s do a little bit math:
1) We just flip all odd and even bits using a 0101010101010101 mask. For this we use the original value twice, one time shifted left by 1 bit, one time shifted right, then ANDed with the mask and back ORed together. Here is the code:
i=((i>>1) & 0x5555) | ((i & 0x5555)<<1);
What does that do? Let’s break it up:
i>>1 translates to: 0101000001010000. Now do the AND 0x5555:
0101000001010000
0101010101010101
result:
0101000001010000.
The original ANDed with 0x5555 means:
1010000010100000
0101010101010101
result:
0000000000000000
Now OR those together and you get 0101000001010000

2) Now we use 0011001100110011 as bit mask and do it again, basically flipping pairs:
i=((i>>2) & 0x3333) | ((i & 0x3333)<<2)
First half:
0101000001010000 >>2 –> 0001010000010100
AND 0011001100110011
result: 0001000000010000
Second half:
0101000001010000 AND
0011001100110011
result: 0001000000010000 << 2 : 0100000001000000
Now OR those:
0001000000010000
0100000001000000
result: 0101000001010000

3) Now we swap nibbles, but I think you got the point…
0101000001010000 –> 0000010100000101

4) Last we swap the bytes. (Which in this case results in no change.)

Voila, the bits are flipped…

Here is the sample code for 64 bit in C#:

[SqlFunction]
public static Int64 Reverse(Int64 inVal)
{
    ulong Result=(ulong) inVal;
    // odd and even
    Result=((Result>>1)& 0x5555555555555555)|((Result & 0x5555555555555555)<<1);
    // pairs
    Result=((Result>>2)& 0x3333333333333333)|((Result & 0x3333333333333333)<<2);
    // nibbles
    Result=((Result>>4)& 0x0F0F0F0F0F0F0F0F)|((Result & 0x0F0F0F0F0F0F0F0F)<<4);
    // bytes
    Result=((Result>>8)& 0x00FF00FF00FF00FF)|((Result & 0x00FF00FF00FF00FF)<<8);
    // words
    Result=((Result>>16)& 0x0000FFFF0000FFFF)|((Result & 0x0000FFFF0000FFFF)<<16);
    // double words
    Result=(Result>>32)|(Result<<32);

    return (Int64) Result;
}

One thing: If you (like me) want to flip only 63 bits (to keep the value positive) just add Result=Result<<1; before the even/odd flip.

Thanks very much to the anonymous poster that brought up the idea.