Rounding for fixed point calculations
by Toshihiro Horie
April 25, 2005
updated July 17, 2017


Legend:
-------
* = endpoint is included in the range
o = endpoint is excluded from the range

On Intel SSE2, the CVTSD2SI instruction converts a double to an integer using the rounding mode selected in the MXCSR register.
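
A portable way to experiment with this kind of mode control from C++ is <cfenv>. The sketch below is not the SSE instruction itself. It assumes an implementation where fesetround reaches the hardware control register (on x86-64 that is typically MXCSR) and where lrint follows the current mode. The helper name convert_with_mode is made up for illustration.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper: convert x to long under the given <cfenv> rounding
// mode (FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO), then restore
// whatever mode was active before the call.
long convert_with_mode(double x, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double v = x;            // volatile: keep the compiler from folding the conversion
    volatile long r = std::lrint(v);  // lrint honors the current rounding mode
    std::fesetround(saved);
    return r;
}
```

For example, convert_with_mode(2.5, FE_UPWARD) and convert_with_mode(2.5, FE_DOWNWARD) should give different integers for the same input.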

k denotes any integer.

Rounding to Nearest Integer (Symmetric Arithmetic Rounding)
C++: round(x) (C++11); by hand, x >= 0 ? floor(x+0.5) : ceil(x-0.5), which fails when |x|=0.49999... [see JDK bug below]
feature: almost ideal rounding, except for bias at k+0.5, where k is an integer
feature: this is probably what you learned in elementary school
feature: symmetric about the origin
Intel x64: not possible in a single instruction; the hardware prefers round to closest even
ARM AArch64: FCVTAS (round to nearest, ties away from zero)
-------------------------------------

                       rint(x)
                          |
                          |
                      3.0 +              *=====o
                          |                    
                          |                    
                      2.0 +        *=====o     
                          |                    
                          |                    
                      1.0 +  *=====o           
                          |                    
                          |                    
     .--+--.--+--.--+--o=====o--+--.--+--.--+--.  x
      -3.0  -2.0  -1.0    |    1.0    2.0  3.0  
                          |
                 o=====*  + -1.0
                          |
                          |
           o=====*        + -2.0
                          |
                          |
     o=====*              + -3.0
                          |
                          |
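
The symmetric rule plotted above can be sketched in C++ for both floating-point and fixed-point inputs. The function names and the fixed-point layout (a raw 32-bit integer with n fractional bits) are assumptions made for this example, not something the note prescribes.

```cpp
#include <cmath>
#include <cstdint>

// Floating point: std::round (C++11) rounds halfway cases away from zero.
inline double round_half_away(double x) { return std::round(x); }

// Fixed point: x is a raw value with n fractional bits (1 <= n <= 30).
// Assumes x is not INT32_MIN, so that -x does not overflow.
inline std::int32_t fx_round_half_away(std::int32_t x, int n) {
    const std::int32_t half = std::int32_t{1} << (n - 1);
    // Round the magnitude half-up, then put the sign back.
    return x >= 0 ? (x + half) >> n : -((-x + half) >> n);
}
```

With n = 8, the raw value 128 represents 0.5 and rounds to 1, while -128 represents -0.5 and rounds to -1.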




Intel x87's Round to Closest (Nearest) Even Integer
Also known as Banker's Rounding
feature: default starting mode in Win32 apps
feature: ideal rounding from a numerical perspective
feature: MS DirectX uses this mode, along with single precision mode
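feature: almost the same as symmetric arithmetic rounding; differs only at the 0.5+k points,
where k is an integer, which go to the even neighbor.
C++: nearbyint(x) or rint(x) under the default FE_TONEAREST mode (C++11); difficult to implement by hand without relying on platform-specific behavior (see notes at the end)
ARM NEON (ARMv8): int32x2_t  vcvtn_s32_f32(float32x2_t a);  // FCVTNS; note that vcvt_s32_f32 (VCVT.S32.F32) truncates toward zero
Intel x64: _mm_round_sd(x, x, _MM_FROUND_TO_NEAREST_INT)  (SSE4.1; the intrinsic takes two vector operands)
ARM AArch64: FCVTNS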
------------------------------------------------------------------

                      rint_even(x)
                          |
                          |
                      3.0 +              o====o
                          |              
                          |              
                      2.0 +        *=====*
                          |              
                          |               
                      1.0 +  o=====o      
                          |               
                          |               
     .--+--.--+--.--+--*=====*--+--.--+--.--+--.  x
      -3.0  -2.0  -1.0    |    1.0    2.0  3.0  
                          |
                 o=====o  + -1.0
                          |
                          |
           *=====*   -2.0 +
                          |
                          |
     o=====o         -3.0 +
                          |
                          |
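
A minimal sketch of ties-to-even follows, again for both floating point and fixed point. It assumes the floating-point environment is still in its default FE_TONEAREST mode and that right-shifting a negative integer is an arithmetic shift (guaranteed from C++20, and the usual behavior before that). The function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Floating point: nearbyint follows the current rounding mode, which is
// round-to-nearest-even unless the program has changed it.
inline double round_half_even(double x) { return std::nearbyint(x); }

// Fixed point: x is a raw value with n fractional bits (1 <= n <= 30).
inline std::int32_t fx_round_half_even(std::int32_t x, int n) {
    const std::int32_t half = std::int32_t{1} << (n - 1);
    const std::int32_t mask = (std::int32_t{1} << n) - 1;
    std::int32_t q = x >> n;              // floor(x / 2^n)
    const std::int32_t frac = x & mask;   // fractional bits, always in [0, 2^n)
    // Round up when above the halfway point, or exactly halfway with q odd.
    if (frac > half || (frac == half && (q & 1))) ++q;
    return q;
}
```

With n = 8, the raw values 128, 384 and 640 (0.5, 1.5 and 2.5) round to 0, 2 and 2.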


Rounding to Nearest Integer, but biased towards positive infinity
feature: easy to implement using floor
feature: for integer arguments, use add and shift right
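C++: rint_pos(x) = floor(x+0.5), except when x = k + 0.49999...
Intel x64: no single instruction; add 0.5, then _mm_round_sd(y, y, _MM_FROUND_TO_NEG_INF) (floor). _MM_FROUND_TO_POS_INF is ceil, not this mode.
ARM AArch64: no single floating-point instruction; FADD 0.5 followed by FCVTMS (floor). For integer data, SRSHR (signed rounding shift right) does the add and shift in one instruction.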
--------------------------------------------------------------------

                      rint_pos(x)
                          |
                          |
                      3.0 +              *=====o
                          |                    
                          |                    
                      2.0 +        *=====o     
                          |                    
                          |                    
                      1.0 +  *=====o           
                          |                    
                          |                    
     .--+--.--+--.--+--*=====o--+--.--+--.--+--.  x
      -3.0  -2.0  -1.0    |    1.0    2.0  3.0  
                          |
                 *=====o  + -1.0
                          |
                          |
           *=====o        + -2.0
                          |
                          |
     *=====o              + -3.0
                          |
                          |
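
The "add and shift right" trick mentioned above for integer arguments can be sketched as follows. The name fx_round_half_up and the layout (raw 32-bit value with n fractional bits) are assumptions for illustration. The code relies on an arithmetic right shift for negative values, which C++20 guarantees and earlier compilers almost always provide.

```cpp
#include <cstdint>

// Round a fixed-point value with n fractional bits (1 <= n <= 30) to the
// nearest integer, with ties going toward positive infinity.
// Assumes x + 2^(n-1) does not overflow.
inline std::int32_t fx_round_half_up(std::int32_t x, int n) {
    return (x + (std::int32_t{1} << (n - 1))) >> n;
}
```

With n = 8, the raw value 128 (0.5) rounds to 1 and -128 (-0.5) rounds to 0, matching the plot.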

Round Towards Zero (chop, fix, truncation)
feature: the rule ANSI C uses when converting floating point to integer
C++: (int)x, or trunc(x) to stay in floating point
feature: toward minus infinity for positive numbers, 
          toward positive infinity for negative numbers
feature: has a larger deadzone at zero
Intel x87: SSE3 added the FISTTP instruction for this; SSE2 has CVTTSD2SI.
Intel x64: _mm_round_sd(x, x, _MM_FROUND_TO_ZERO)
ARM AArch64: FCVTZS
--------------------------------------------------------
                     int_cast(x)
                          |
                          |
                      3.0 +
                          |
                          |
                      2.0 +           *=====o
                          |
                          |
                      1.0 +     *=====o
                          |
                          |
     .--+--.--+--.--o==.=====.==o--.--+--.--+--. x
      -3.0  -2.0  -1.0    |    1.0    2.0  3.0  
                          |
              o=====*     + -1.0
                          |
                          |
         o====*      -2.0 +
                          |
                          |
                     -3.0 +
                          |
                          |
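
A short sketch of truncation in C++. For fixed-point values, integer division already truncates toward zero, unlike a right shift. The function names and the layout with n fractional bits are assumptions of this example. The cast is only defined when the truncated value fits in an int.

```cpp
#include <cstdint>

// Floating point: a cast to int discards the fractional part.
inline int trunc_to_int(double x) { return static_cast<int>(x); }

// Fixed point: x is a raw value with n fractional bits (0 <= n <= 30).
// Integer division in C++ rounds toward zero.
inline std::int32_t fx_trunc(std::int32_t x, int n) {
    return x / (std::int32_t{1} << n);
}
```

With n = 8, the raw value -384 (-1.5) truncates to -1, and anything strictly between -256 and 256 lands on 0, which is the larger dead zone at zero.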

Round towards Minus Infinity (floor)
feature: for integer or fixed point arguments, this is easy in hardware (use arithmetic shift right)
C++: floor(x)
Intel x64: _mm_round_sd(x, x, _MM_FROUND_TO_NEG_INF)
ARM AArch64: FCVTMS
--------------------------------------------------------------

                        floor(x)
                          |
                          |
                      3.0 +
                          |
                          |
                      2.0 +           *=====o
                          |
                          |
                      1.0 +     *=====o
                          |
                          |
     .--+--.--+--.--+--.--*=====o--.--+--.--+--.  x
      -3.0  -2.0  -1.0    |    1.0    2.0  3.0  
                          |
                    *=====o -1.0
                          |
                          |
              *=====o     + -2.0
                          |
                          |
        *=====o      -3.0 +
                          |
                          |
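
For fixed-point arguments, floor really is just the arithmetic shift. A minimal sketch, assuming a raw 32-bit value with n fractional bits and a compiler that shifts negative values arithmetically (required from C++20). The name fx_floor is hypothetical.

```cpp
#include <cstdint>

// Floor of a fixed-point value with n fractional bits (0 <= n <= 31).
// The shift drops the fractional bits and keeps the sign, which rounds
// toward minus infinity.
inline std::int32_t fx_floor(std::int32_t x, int n) {
    return x >> n;
}
```

With n = 8, the raw value -384 (-1.5) gives -2 and -1 (just below zero) gives -1.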


Ceiling (ceil)
C++: ceil(x) = -floor(-x)
feature: almost the mirror image of floor(x)
Intel x64: _mm_round_sd(x, x, _MM_FROUND_TO_POS_INF)
ARM AArch64: FCVTPS
----------------------------------------

                       ceil(x)
                          |
                          |
                      3.0 +           o=====*
                          |                  
                          |                  
                      2.0 +     o=====*        
                          |                   
                          |                   
                      1.0 o=====*              
                          |                     
                          |                   
     .--+--.--+--.--o==.==*--.--+--.--+--.--+--.  x
      -3.0  -2.0  -1.0    |    1.0    2.0  3.0  
                          |
              o=====*     + -1.0
                          |
                          |
         o====*      -2.0 +
                          |
                          |
                     -3.0 +
                          |
                          |
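
The identity ceil(x) = -floor(-x) carries over directly to fixed point, where floor is an arithmetic shift. The sketch below uses hypothetical names and assumes a raw 32-bit value with n fractional bits, an arithmetic right shift for negative values, and x not equal to INT32_MIN.

```cpp
#include <cmath>
#include <cstdint>

// Floating point: ceiling expressed through floor.
inline double ceil_via_floor(double x) { return -std::floor(-x); }

// Fixed point: negate, floor with a shift, negate again.
inline std::int32_t fx_ceil(std::int32_t x, int n) {
    return -((-x) >> n);
}
```

With n = 8, the raw value 257 (just above 1.0) gives 2, while 256 (exactly 1.0) stays at 1.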

General information on: Custom Rounding.

More information on: Speed Optimizations.

Pentium rounding unsignaled overflow bug documented at: Microsoft
More about C99 rounding modes at FreeBSD mailing list
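It turns out that floor(x+0.5) is not equal to round(x) when x = 0.49999999999999994, the largest double below 0.5: the sum x+0.5 is not representable and rounds up to exactly 1.0, so floor returns 1 instead of 0. This was the cause of this Java JDK bug.
More information on rounding bugs is here.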
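
The floor(x+0.5) pitfall noted above can be reproduced in a few lines. This sketch assumes IEEE 754 doubles evaluated in double precision (as on x86-64 with SSE2 or on AArch64); on targets that keep intermediates in extended precision the result may differ. The name naive_round is made up for the example.

```cpp
#include <cmath>

// The tempting one-liner: correct almost everywhere, but wrong for inputs
// where x + 0.5 is not exactly representable as a double.
inline double naive_round(double x) { return std::floor(x + 0.5); }
```

Comparing naive_round(0.49999999999999994) with std::round(0.49999999999999994) shows the two disagreeing: the first yields 1, the second 0.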