Fast Square Root

From ILabWiki

Table of contents

C++ Source Code for Computing Fast Square Root Approximations with Floating and Double Precision Floating Points

Here are listed a few different methods for computing a square root faster than the standard methods used with gcc and libm. Many of these methods are discussed in greater detail on the Wikipedia in Methods of computing square roots (http://en.wikipedia.org/wiki/Methods_of_computing_square_roots).

Heading

An optional header can be put in place if you implement all of these so that you can describe them to users who look at your source.

// ######################################################################
// ##### Fast Square Root functions
// ######################################################################
/* 
  
    These are several methods for computing a quick square root 
    approximation. The order is by decreasing speed / increasing accuracy

    Name           : opperation time relative to sqrt()

    fastSqrt_2     : 1/5
    fastSqrt_Bab   : 1/2
    fastSqrt_Q3    : 1/2
    fastSqrt_Bab_2 : 2/3

    Details:

    fastSqrt_2     - This is a simple log base 2 approximation using a union
                     to int and 3 simple arith operations. 

    fastSqrt_Bab   - This is the fastSqrt_2 with a Babylonian step added
                     to give it a quadratic improvement.

    fastSqrt_Q3    - This uses the Quake 3 Walsh InvRoot function with a
                     quick estimation and a single Newton step. This will call 
                     invSqrt listed below.

    fastSqrt_Bab_2 - This is the fastSqrt_Bab with two Babylonian Steps instead
                     of just one.  

   
   ---
   T. Nathan Mundhenk <mundhenk@usc.edu>
   Department of Computer Science
   University of Southern California

*/

Method using Log Base 2 Approximation

This method is quick and dirty and gives 5 times speed up over sqrt(). The table below gives an idea of its precision by comparing its results with those given by running the typical sqrt(). The answers were obtained on an AMD Opteron based machine using 64 bit Mandriva Linux and gcc version 4.1.1. The results shown are for the floating point methods. However, for each function, the double method yields similar results. To get an idea of how these functions operate on floating point numbers see the Wikipedia entry on IEEE 754 floating point standards (http://en.wikipedia.org/wiki/IEEE_754). For the tests of the different functions, the source code can be found here.

Sqrt(x) Method Results Actual Answer
1.000000 1.000000 1.000000
2.000000 1.500000 1.414214
8.000000 3.000000 2.828427
100.000000 10.250000 10.000000
3.141593 1.785398 1.772454
100000.000000 323.312500 316.227753
0.333333 0.583333 0.577350
inline float fastSqrt_2(const float x)  
{
  union
  {
    int i;
    float x;
  } u;
  u.x = x;
  u.i = (1<<29) + (u.i >> 1) - (1<<22); 
  return u.x;
}

inline double fastSqrt_2(const double x)  
{
  union
  {
    long i;
    double x;
  } u;
  u.x = x;
  u.i = (((long)1)<<61) + (u.i >> 1) - (((long)1)<<51); 
  return u.x;
}

Method using Log Base 2 Approximation With One Extra Babylonian Steps

This function takes the log base 2 approximation and adds a single Babylonian step to it. It still has a 2 times speed up over sqrt().

Sqrt(x) Method Results Actual Answer
1.000000 1.000000 1.000000
2.000000 1.416667 1.414214
8.000000 2.833333 2.828427
100.000000 10.003049 10.000000
3.141593 1.772501 1.772454
100000.000000 316.305389 316.227753
0.333333 0.577381 0.577350
inline float fastSqrt_Bab(const float x)  
{
  union
  {
    int i;
    float x;
  } u;
  u.x = x;
  u.i = (1<<29) + (u.i >> 1) - (1<<22); 

  // One Babylonian Step
  u.x = 0.5f * (u.x + x/u.x);

  return u.x;
}

inline double fastSqrt_Bab(const double x)  
{
  union
  {
    long i;
    double x;
  } u;
  u.x = x;
  u.i = (((long)1)<<61) + (u.i >> 1) - (((long)1)<<51); 

  // One Babylonian Step
  u.x = 0.5F * (u.x + x/u.x);

  return u.x;
}

Method using Log Base 2 Approximation With Two Extra Babylonian Steps

This is like the method above except that we use two Babylonian steps. We still have a 50% speed up over sqrt(). It gives fairly close approximations to the correct answer.

Sqrt(x) Method Results Actual Answer
1.000000 1.000000 1.000000
2.000000 1.414216 1.414214
8.000000 2.828431 2.828427
100.000000 10.000000 10.000000
3.141593 1.772454 1.772454
100000.000000 316.227783 316.227753
0.333333 0.577350 0.577350

Note that we simplify two Babylonian steps slightly and remove one multiplication operation by doing so:

u.x = 0.5f * (u.x + x/u.x);
u.x = 0.5f * (u.x + x/u.x);

becomes:

u.x =       u.x + x/u.x;
u.x = 0.25f*u.x + x/u.x;

Without this simplification, we obtain a speed up of just 33%, so this really helps.

inline float fastSqrt_Bab_2(const float x)  
{
  union
  {
    int i;
    float x;
  } u;
  u.x = x;
  u.i = (1<<29) + (u.i >> 1) - (1<<22); 
  
  // Two Babylonian Steps (simplified from:)
  // u.x = 0.5f * (u.x + x/u.x);
  // u.x = 0.5f * (u.x + x/u.x);
  u.x =       u.x + x/u.x;
  u.x = 0.25f*u.x + x/u.x;

  return u.x;
}

inline double fastSqrt_Bab_2(const double x)  
{
  union
  {
    long i;
    double x;
  } u;
  u.x = x;
  u.i = (((long)1)<<61) + (u.i >> 1) - (((long)1)<<51); 

  // Two Babylonian Steps (simplified from:)
  // u.x = 0.5F * (u.x + x/u.x);
  // u.x = 0.5F * (u.x + x/u.x);
  u.x =       u.x + x/u.x;
  u.x = 0.25F*u.x + x/u.x;

  return u.x;
}

Method using The Quake 3 Walsh Method

This method uses an integer shift and a Newton step to give and approximation for the inverse square root. The square root can then be extracted by dividing 1 by the answer or much faster multiplying InvSqrt(x) by x. It gives a 2 times speed up over sqrt().

Sqrt(x) Method Results Actual Answer
1.000000 0.998307 1.000000
2.000000 1.413860 1.414214
8.000000 2.827720 2.828427
100.000000 9.984488 10.000000
3.141593 1.771723 1.772454
100000.000000 315.763275 316.227753
0.333333 0.577020 0.577350

For the Quake 3 method credited to Greg Walsh we need to define a few magic constants. One is for the floating point version and the other is for the double version. The derivation of the magic numbers is given by Chris Lomont here (http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf)

// For Magic Derivation see: 
// Chris Lomont
// http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
// Credited to Greg Walsh.
// 32  Bit float magic number 
#define SQRT_MAGIC_F 0x5f3759df
// 64  Bit float magic number
#define SQRT_MAGIC_D 0x5fe6ec85e7de30da

Which are plugged into:

inline float invSqrt(const float x)
{
  const float xhalf = 0.5f*x;
 
  union // get bits for floating value
  {
    float x;
    int i;
  } u;
  u.x = x;
  u.i = SQRT_MAGIC_F - (u.i >> 1);  // gives initial guess y0
  return u.x*(1.5f - xhalf*u.x*u.x);// Newton step, repeating increases accuracy
}

inline float fastSqrt_Q3(const float x)
{
  return x * invSqrt(x);
}

inline double invSqrt(const double x)
{
  const double xhalf = 0.5F*x;
 
  union // get bits for floating value
  {
    double x;
    long i;
  } u;
  u.x = x;
  u.i = SQRT_MAGIC_D - (u.i >> 1);  // gives initial guess y0
  return u.x*(1.5F - xhalf*u.x*u.x);// Newton step, repeating increases accuracy
}

inline double fastSqrt_Q3(const double x)
{
  return x * invSqrt(x);
}

Setting a Default Function

If you want to emplement more than one of these methods, but would like to have users tend towards your favorite function you can add a function to create a default as such:

//! This just creates a default function call
inline float fastSqrt(const float x)
{
  return fastSqrt_Bab_2(x);
}

//! This just creates a default function call
inline double fastSqrt(const double x)
{
  return fastSqrt_Bab_2(x);
}

Comments? Suggestions?

Comments can be added in the discussion section of this page. Also, questions can be directed to either the iLab discussion board (http://ilab.usc.edu/cgi-bin/yabb/YaBB.pl) which is monitored by the authors or via email at: mailto:mundhenk@usc.edu


Copyright © 2009 by the University of Southern California, iLab and T. Nathan Mundhenk (http://www.mundhenk.com). All Rights Reserved.