# Rounding for fixed point calculations by Toshihiro Horie April 25, 2005 updated July 17, 2017

```
Legend:
-------
* = endpoint is included in range
o = endpoint is excluded from range

On Intel SSE2, the CVTSD2SI instruction converts a double to an integer using the rounding mode selected by the RC field of the MXCSR register.

k denotes an integer throughout.

Rounding to Nearest Integer (Symmetric Arithmetic Rounding)
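C++: round(x) (C99/C++11) rounds halfway cases away from zero; floor(x+0.5) is not a substitute, since it maps -2.5 to -2 (see rint_pos below) and fails at x=0.49999... [see JDK bug below]
feature: almost ideal rounding, except for a bias away from zero at the k+0.5 points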
feature: this is probably what you learned in elementary school
feature: symmetric about the origin
Intel x64: not available as a single instruction; the hardware round-to-nearest mode breaks ties to even instead
ARM AArch64: FCVTAS (round to nearest, ties away from zero)
-------------------------------------

                  rint(x)
                     |
                     |
                 3.0 +              *=====o
                     |
                     |
                 2.0 +        *=====o
                     |
                     |
                 1.0 +  *=====o
                     |
                     |
.--+--.--+--.--+--o=====o--+--.--+--.--+--.  x
 -3.0  -2.0  -1.0    |    1.0    2.0  3.0
                     |
            o=====*  + -1.0
                     |
                     |
      o=====*        + -2.0
                     |
                     |
o=====*              + -3.0
                     |
                     |

Intel x87's Round to Closest (Nearest) Even Integer
Also known as Banker's Rounding
feature: default starting mode in Win32 apps
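feature: ideal rounding from a numerical perspective: ties are split between rounding up and rounding down, so they add no systematic bias
feature: MS DirectX uses this mode, along with single precision mode
feature: identical to symmetric rounding except at the k+0.5 points,
where ties go to the even neighbor instead of away from zero
C++: nearbyint(x) or rint(x) (C99/C++11), which round ties to even under the default FE_TONEAREST mode; a version that does not depend on the current rounding mode is harder to write (see notes at the end)
ARM NEON (ARMv8): int32x2_t  vcvtn_s32_f32(float32x2_t a);  // FCVTNS (ARMv7's vcvt_s32_f32, VCVT.S32.F32, truncates toward zero instead)
intel x64: _mm_round_sd(x, x, _MM_FROUND_TO_NEAREST_INT) (SSE4.1), or CVTSD2SI with the default MXCSR rounding mode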
ARM AArch64: FCVTNS
------------------------------------------------------------------

               rint_even(x)
                     |
                     |
                 3.0 +              o=====o
                     |
                     |
                 2.0 +        *=====*
                     |
                     |
                 1.0 +  o=====o
                     |
                     |
.--+--.--+--.--+--*=====*--+--.--+--.--+--.  x
 -3.0  -2.0  -1.0    |    1.0    2.0  3.0
                     |
            o=====o  + -1.0
                     |
                     |
      *=====*   -2.0 +
                     |
                     |
o=====o         -3.0 +
                     |
                     |

Rounding to Nearest Integer, but biased towards positive infinity
feature: easy to implement using floor
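feature: for fixed-point arguments with f fractional bits, add 1 << (f-1) and then arithmetic-shift right by f
C++: rint_pos(x) = floor(x+0.5), except (in double precision) when x = 0.49999999999999994, where x+0.5 rounds up to 1.0, and for odd |x| >= 2^52, where x+0.5 is a tie that rounds to even
intel x64: no single instruction (_MM_FROUND_TO_POS_INF gives the ceiling, not this mode)
ARM AArch64: no single instruction (FCVTAS breaks ties away from zero, FCVTNS to even)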
--------------------------------------------------------------------

                rint_pos(x)
                     |
                     |
                 3.0 +              *=====o
                     |
                     |
                 2.0 +        *=====o
                     |
                     |
                 1.0 +  *=====o
                     |
                     |
.--+--.--+--.--+--*=====o--+--.--+--.--+--.  x
 -3.0  -2.0  -1.0    |    1.0    2.0  3.0
                     |
            *=====o  + -1.0
                     |
                     |
      *=====o        + -2.0
                     |
                     |
*=====o              + -3.0
                     |
                     |

Round Towards Zero (chop, fix, truncation)
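feature: the rounding ANSI C applies when converting floating point to integer (and, since C99, in integer division)
C++: (int)x, or trunc(x) to keep a floating-point result
feature: toward minus infinity for positive numbers,
toward positive infinity for negative numbers
feature: has a larger deadzone at zero: every x in (-1, 1) maps to 0, twice the width of the other steps
intel: SSE3 added the x87 FISTTP instruction for this; SSE2's CVTTSD2SI also truncates, independent of MXCSR
intel x64: _mm_round_sd(x, x, _MM_FROUND_TO_ZERO)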
ARM AArch64: FCVTZS
--------------------------------------------------------

                int_cast(x)
                     |
                     |
                 3.0 +
                     |
                     |
                 2.0 +           *=====o
                     |
                     |
                 1.0 +     *=====o
                     |
                     |
.--+--.--+--.--o==.=====.==o--.--+--.--+--. x
 -3.0  -2.0  -1.0    |    1.0    2.0  3.0
                     |
         o=====*     + -1.0
                     |
                     |
   o=====*      -2.0 +
                     |
                     |
                -3.0 +
                     |
                     |

Round towards Minus Infinity (floor)
feature: for integer or fixed point arguments, this is easy in hardware (use arithmetic shift right)
C++: floor(x)
intel x64: _mm_round_sd(x, x, _MM_FROUND_TO_NEG_INF)
ARM AArch64: FCVTMS
--------------------------------------------------------------

                 floor(x)
                     |
                     |
                 3.0 +
                     |
                     |
                 2.0 +           *=====o
                     |
                     |
                 1.0 +     *=====o
                     |
                     |
.--+--.--+--.--+--.--*=====o--.--+--.--+--.  x
 -3.0  -2.0  -1.0    |    1.0    2.0  3.0
                     |
               *=====o -1.0
                     |
                     |
         *=====o     + -2.0
                     |
                     |
   *=====o      -3.0 +
                     |
                     |

Ceiling (ceil)
C++: ceil(x) = -floor(-x)
feature: floor(x) rotated 180 degrees about the origin, so the included (*) and excluded (o) endpoints trade sides
intel x64: _mm_round_sd(x, x, _MM_FROUND_TO_POS_INF)
ARM AArch64: FCVTPS
----------------------------------------

                  ceil(x)
                     |
                     |
                 3.0 +           o=====*
                     |
                     |
                 2.0 +     o=====*
                     |
                     |
                 1.0 o=====*
                     |
                     |
.--+--.--+--.--o==.==*--.--+--.--+--.--+--.  x
 -3.0  -2.0  -1.0    |    1.0    2.0  3.0
                     |
         o=====*     + -1.0
                     |
                     |
   o=====*      -2.0 +
                     |
                     |
                -3.0 +
                     |
                     |

```
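A small C++ sketch can check the symmetric rounding notes. It assumes IEEE 754 doubles, C++11 or later, and a fixed-point value stored as `std::int32_t` scaled by 2^f. The names `round_sym`, `round_floor_idiom` and `round_sym_fixed` are hypothetical. The fixed-point version rounds the magnitude half-up and then reapplies the sign.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Floating point: std::round rounds halfway cases away from zero,
// so it is symmetric about the origin (2.5 -> 3.0, -2.5 -> -3.0).
double round_sym(double x) { return std::round(x); }

// The floor(x + 0.5) idiom for comparison: it maps -2.5 to -2.0, and
// 0.49999999999999994 + 0.5 rounds up to 1.0 before floor() sees it.
double round_floor_idiom(double x) { return std::floor(x + 0.5); }

// Hypothetical fixed-point helper: v is a value scaled by 2^f.
// Rounds the magnitude half-up, then restores the sign.
// Assumes v != INT32_MIN and |v| + 2^(f-1) fits in std::int32_t.
std::int32_t round_sym_fixed(std::int32_t v, int f) {
    assert(f >= 1 && f <= 30);
    const std::int32_t half = std::int32_t{1} << (f - 1);
    if (v >= 0) {
        return (v + half) >> f;
    }
    return -((-v + half) >> f);
}
```

Splitting on the sign is what keeps the fixed-point version symmetric about the origin. A single `(v + half) >> f` would round ties toward positive infinity instead, like rint_pos.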
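Here is a possible C++ sketch of round-half-to-even for the Banker's rounding notes. It assumes the program never switches the floating-point environment away from FE_TONEAREST, and that fixed-point values are `std::int32_t` scaled by 2^f. `round_even` and `round_even_fixed` are hypothetical names. The fixed-point helper relies on two's complement and an arithmetic `>>`, both of which C++20 guarantees.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <cstdint>

// Floating point: std::nearbyint uses the current rounding mode; under the
// default FE_TONEAREST mode that is round half to even.
double round_even(double x) {
    assert(std::fegetround() == FE_TONEAREST);
    return std::nearbyint(x);
}

// Hypothetical fixed-point helper: v is a value scaled by 2^f.
// Assumes two's complement and an arithmetic >> (both guaranteed by C++20).
std::int32_t round_even_fixed(std::int32_t v, int f) {
    assert(f >= 1 && f <= 30);
    const std::int32_t half = std::int32_t{1} << (f - 1);
    const std::int32_t mask = (std::int32_t{1} << f) - 1;
    std::int32_t q = v >> f;          // floor(v / 2^f)
    const std::int32_t r = v & mask;  // discarded fraction, in [0, 2^f)
    if (r > half || (r == half && (q & 1) != 0)) {
        q += 1;  // past the midpoint, or exactly on it with an odd q
    }
    return q;
}
```

Testing the discarded bits directly, rather than adding a bias, keeps the tie decision exact. For example, 2.5 (v = 5, f = 1) rounds to 2 and 3.5 (v = 7, f = 1) rounds to 4.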
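For rint_pos, one way to sidestep the floor(x+0.5) failures is to compare the fraction x - floor(x) against 0.5 instead of forming x + 0.5. This is a sketch that assumes IEEE 754 doubles. `round_half_up` and `round_half_up_fixed` are hypothetical names, and the fixed-point variant is simply the add-then-shift recipe from the notes.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: round half toward +infinity without forming x + 0.5.
// floor(x + 0.5) goes wrong for x = 0.49999999999999994 (the sum rounds up to
// 1.0) and for odd |x| >= 2^52 (the sum is a tie that rounds to even).
// x - floor(x) is exact except for -0.5 < x < 0, where any rounding still
// leaves the difference at or above 0.5, so the comparison stays correct.
double round_half_up(double x) {
    const double r = std::floor(x);
    return (x - r >= 0.5) ? r + 1.0 : r;
}

// Hypothetical fixed-point helper: v is a value scaled by 2^f. Add half an
// output step, then floor with an arithmetic shift.
// Assumes v + 2^(f-1) does not overflow.
std::int32_t round_half_up_fixed(std::int32_t v, int f) {
    assert(f >= 1 && f <= 30);
    return (v + (std::int32_t{1} << (f - 1))) >> f;
}
```

For example, round_half_up(0.49999999999999994) returns 0.0, whereas floor(x+0.5) returns 1.0.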
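This C++ sketch covers the round-towards-zero notes. It uses hypothetical helper names and assumes `std::int32_t` values scaled by 2^f. It shows that integer division, unlike a right shift, chops toward zero, and that the `(int)x` cast needs a range check to stay well defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-point helper: v is a value scaled by 2^f.
// C99 and C++11 define integer division to truncate toward zero, so dividing
// by 2^f chops the fraction; v >> f would floor negative values instead.
std::int32_t trunc_fixed(std::int32_t v, int f) {
    assert(f >= 0 && f <= 30);
    return v / (std::int32_t{1} << f);
}

// Hypothetical helper for the "(int)x" entry: the cast truncates toward zero.
// Converting a double whose truncated value does not fit is undefined,
// hence the range check.
std::int32_t trunc_to_int(double x) {
    assert(x > -2147483649.0 && x < 2147483648.0);
    return static_cast<std::int32_t>(x);
}
```

trunc_fixed(-3, 2), which represents -0.75, returns 0. This is the wider deadzone at zero in the int_cast(x) graph.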
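The floor notes say an arithmetic shift right is all that fixed point needs. Below is a C++ sketch of that, plus a floor division for divisors that are not powers of two. Both helper names are hypothetical. The shift version assumes an arithmetic `>>` on negative values, which C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-point helper: v is a value scaled by 2^f. An arithmetic
// right shift computes floor(v / 2^f); C++20 guarantees this for negative v,
// and mainstream compilers behaved this way before it was required.
std::int32_t floor_fixed(std::int32_t v, int f) {
    assert(f >= 0 && f <= 30);
    return v >> f;
}

// Hypothetical helper for divisors that are not powers of two: '/' truncates
// toward zero, so step down by one when there is a remainder and the operands
// have opposite signs. Assumes b != 0 and not (a == INT32_MIN && b == -1).
std::int32_t floor_div(std::int32_t a, std::int32_t b) {
    assert(b != 0);
    std::int32_t q = a / b;
    if (a % b != 0 && (a < 0) != (b < 0)) {
        q -= 1;
    }
    return q;
}
```

For example, floor_div(-7, 2) gives -4, while plain `-7 / 2` gives -3.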
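This C++ sketch covers the ceiling notes and assumes `std::int32_t` values scaled by 2^f. The fixed-point version adds 2^f - 1 before the floor shift. The floating-point version checks the ceil(x) = -floor(-x) identity. `ceil_fixed` and `ceil_via_floor` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical fixed-point helper: v is a value scaled by 2^f. Adding
// 2^f - 1 before the floor shift bumps any value with a nonzero fraction
// up to the next integer. Assumes v + 2^f - 1 does not overflow.
std::int32_t ceil_fixed(std::int32_t v, int f) {
    assert(f >= 0 && f <= 30);
    return (v + ((std::int32_t{1} << f) - 1)) >> f;
}

// Floating point: the identity ceil(x) = -floor(-x) from the notes above.
double ceil_via_floor(double x) { return -std::floor(-x); }
```

Values with no fractional bits pass through unchanged: ceil_fixed(4, 2), which represents 1.0, returns 1.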
