bb.science
Class Math2

java.lang.Object
  extended by bb.science.Math2

public class Math2
extends Object

This class offers additional static mathematical methods beyond the ones offered in Math.

This class is multithread safe: it is immutable (both its immediate state and the deep state of its fields).

Author:
Brent Boyer
See Also:
NIST Java numerics page

Nested Class Summary
static class Math2.UnitTest
          See the Overview page of the project's javadocs for a general description of this unit test class.
 
Field Summary
static double inverseSqrt2pi
          Stores the value of 1 / sqrt(2*pi).
static double normalizationErrorTolerance_default
          A default value for the errorTolerance param of the normalize method.
 
Constructor Summary
private Math2()
          This sole private constructor suppresses the default (public) constructor, ensuring non-instantiability outside of this class.
 
Method Summary
static double[][] autocorrelation(double[] numbers)
          Returns the autocorrelation function (acf) of numbers along with confidence intervals (CIs) for each element.
static double[] autocovariance(double[] numbers)
          Returns the autocovariance function (acvf) of numbers.
static int byteToUnsignedInt(byte b)
          Returns an int that equals the unsigned value of b's bits.
private static double[] calcPdfTheory(Bins bins, double mean, double sd)
           
static double[] checkNumbers(double[] numbers, boolean infinityBad)
          Checks that numbers is non-null, non-empty, and every element is non-NaN.
static int compare(double d1, double d2)
          Compares the two double args: returns -1 if d1 < d2, 0 if d1 == d2, 1 if d1 > d2.
static int compare(int i1, int i2)
          Compares the two int args: returns -1 if i1 < i2, 0 if i1 == i2, 1 if i1 > i2.
static int compare(long l1, long l2)
          Compares the two long args: returns -1 if l1 < l2, 0 if l1 == l2, 1 if l1 > l2.
static boolean equals(double d1, double d2)
          Determines if the two double args are equal or not.
static double gaussianAndersonDarling(double[] numbers)
          Calculates the Anderson–Darling test statistic (as corrected for sample size) for numbers against an assumed Gaussian (i.e. normal) probability density function (PDF).
static double gaussianCdf(double x)
          Returns the value of the standard (i.e. mean = 0.0 and sd (standard deviation) = 1.0) Gaussian (i.e. normal) cumulative distribution function (CDF) of x.
static double gaussianCdf(double x, double mean, double sd)
          Returns the value of the Gaussian (i.e. normal) cumulative distribution function (CDF) of x, given the parameters mean and sd (standard deviation).
static GaussianFit gaussianFit(double[] numbers)
          Determines that Gaussian (i.e. normal) probability density function (PDF) which best fits numbers.
static double gaussianKolmogorovSmirnov(double[] numbers, double meanG, double sdG)
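          Calculates the "one sample" Kolmogorov–Smirnov statistic adjusted for the sample size (i.e. Dn * sqrt(n)) for numbers against the Gaussian (i.e. normal) probability density function (PDF) that is specified by meanG and sdG.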
static double gaussianPdf(double x)
          Returns the value of the standard (i.e. mean = 0.0 and sd (standard deviation) = 1.0) Gaussian (i.e. normal) probability density function (PDF) of x.
static double gaussianPdf(double x, double mean, double sd)
          Returns the value of the Gaussian (i.e. normal) probability density function (PDF) of x, given the parameters mean and sd (standard deviation).
static int hammingDistance(int bits1, int bits2)
          Returns the Hamming distance between 2 ints.
static int hammingWeight(int i)
          Returns the Hamming weight of an int.
static boolean hasSameSign(double x1, double x2)
          Determines whether or not x1 and x2 have the same sign: returns sign(x1) == sign(x2).
static boolean isNormalized(double[] numbers)
          Returns isNormalized(numbers, normalizationErrorTolerance_default).
static boolean isNormalized(double[] numbers, double errorTolerance)
          Reports whether or not numbers is normalized within an error specified by errorTolerance.
static boolean isWithin(double a, double b, double epsilon)
          Determines whether or not a and b are within epsilon of each other, that is, that the distance between a and b is <= epsilon.
static boolean isWithinOneUlp(double a, double b)
          Determines whether or not a and b are within one ulp of each other.
static double[] linearLeastSquaresFit(double[] xValues, double[] yValues)
          Given a series of 2D ordered pairs stored in the xValues and yValues arrays, that is, the points (xValues[0], yValues[0]), (xValues[1], yValues[1]), ..., this method determines the coefficients a and b of the linear fit y = a + bx as well as some of the fitness measures.
static double log(double base, double x)
          Returns the logarithm of x in the supplied base.
static double log10(double x)
          Returns the logarithm of x in base 10.
static double magnitude(double x)
          Returns the magnitude of x in a decimal (i.e. power of 10) scale.
static double max(double[] numbers)
          Returns the maximum element of numbers.
static double mean(double[] numbers)
          Returns the arithmetic mean of numbers.
static double median(double[] numbers)
          Returns the median element of numbers.
static double min(double[] numbers)
          Returns the minimum element of numbers.
static double[] minMax(double[] numbers)
          Returns both the minimum and maximum element of numbers.
static int modulo(int a, int b)
          Returns a mod b.
static double nextRandomWithMagnitude(int magnitude)
          Returns a number with the specified magnitude but random coefficient.
static double normalizationSum(double[] numbers)
          Returns the sum of every element of numbers, which is also the normalization factor for numbers.
static double[] normalize(double[] numbers)
          Simply calls normalize(numbers, normalizationErrorTolerance_default).
static double[] normalize(double[] numbers, double errorTolerance)
          Normalizes numbers, that is, divides each element by that constant factor which causes the sum of (the new values of) numbers to equal 1.
static int orderOfMagnitude(double x)
          Returns the order of magnitude of x in a decimal (i.e. power of 10) scale.
static double power10(int power)
          Returns the specified integer power of 10.
static double quantile(double[] numbers, int k, int q)
          Returns the kth q-quantile of numbers.
static double sd(double[] numbers)
          Returns sd(numbers, true).
static double sd(double[] numbers, boolean biased)
          Returns sd(numbers, mean(numbers), biased).
static double sd(double[] numbers, double mean)
          Returns sd(numbers, mean, true).
static double sd(double[] numbers, double mean, boolean biased)
          Returns the standard deviation of numbers.
static double sign(double x)
          Implements the sign function: returns -1 if x < 0, 0 if x == 0, 1 if x > 0.
static double sst(double[] numbers)
          Returns sst(numbers, mean(numbers)).
static double sst(double[] numbers, double mean)
          Calculates the SST (Sum of Squares, Total), that is, the sum of the squares of the differences from mean of each element of numbers.
static double[] subtractParallelComponent(double[] v1, double[] v2)
          Subtracts from vector v1 that component which lies parallel to vector v2.
static double sum(double[] numbers)
          Returns the sum of every element of numbers.
static double variance(double[] numbers)
          Returns variance(numbers, true).
static double variance(double[] numbers, boolean biased)
          Returns variance(numbers, mean(numbers), biased).
static double variance(double[] numbers, double mean)
          Returns variance(numbers, mean, true).
static double variance(double[] numbers, double mean, boolean biased)
          Returns the variance of numbers.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

inverseSqrt2pi

public static final double inverseSqrt2pi
Stores the value of 1 / sqrt(2*pi).


normalizationErrorTolerance_default

public static final double normalizationErrorTolerance_default
A default value for the errorTolerance param of the normalize method.

See Also:
Constant Field Values
Constructor Detail

Math2

private Math2()
This sole private constructor suppresses the default (public) constructor, ensuring non-instantiability outside of this class.

Method Detail

equals

public static boolean equals(double d1,
                             double d2)
Determines if the two double args are equal or not. This method exists solely because the computation d1 == d2 fails to handle the case where both d1 and d2 are NaN: this method returns true in that case, whereas d1 == d2 returns false because any == comparison involving NaN evaluates to false.
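
The NaN case described above can be illustrated with a minimal sketch. The class and method names here (NanAwareEquals, equalsNanAware) are hypothetical stand-ins; the actual Math2.equals implementation is not shown on this page.

```java
// Hypothetical stand-in for Math2.equals; not the actual bb.science source.
class NanAwareEquals {
    /** Returns true if d1 == d2, or if both arguments are NaN. */
    static boolean equalsNanAware(double d1, double d2) {
        if (Double.isNaN(d1) && Double.isNaN(d2)) {
            return true;    // the one case that the == operator gets "wrong"
        }
        return d1 == d2;
    }

    public static void main(String[] args) {
        System.out.println(Double.NaN == Double.NaN);               // false
        System.out.println(equalsNanAware(Double.NaN, Double.NaN)); // true
        System.out.println(equalsNanAware(1.5, 1.5));               // true
    }
}
```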


sign

public static double sign(double x)
                   throws IllegalArgumentException
Implements the sign function: returns -1 if x < 0, 0 if x == 0, 1 if x > 0.

Starting with JDK 1.5, there is a new method Math.signum which is equivalent to this method except that it returns NaN when presented with a NaN argument instead of throwing an IllegalArgumentException like this method does.

Throws:
IllegalArgumentException - if x is not a comparable number (i.e. is NaN)
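
A minimal sketch of the contrast with Math.signum, assuming the behavior documented above. SignSketch is a hypothetical class; only Math.signum is a real JDK method.

```java
// Hypothetical sketch of the documented contract; not the actual Math2 source.
class SignSketch {
    /** Returns -1, 0 or 1 according to the sign of x; rejects NaN. */
    static double sign(double x) {
        if (Double.isNaN(x)) {
            throw new IllegalArgumentException("x is NaN");
        }
        if (x < 0) return -1;
        if (x > 0) return 1;
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(sign(-3.5));               // -1.0
        System.out.println(Math.signum(Double.NaN));  // NaN, no exception
        try {
            sign(Double.NaN);
        } catch (IllegalArgumentException e) {
            System.out.println("sign(NaN) threw: " + e.getMessage());
        }
    }
}
```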

hasSameSign

public static boolean hasSameSign(double x1,
                                  double x2)
                           throws IllegalArgumentException
Determines whether or not x1 and x2 have the same sign: returns sign(x1) == sign(x2).

Throws:
IllegalArgumentException - if x1 or x2 is not a comparable number (i.e. is NaN)

compare

public static int compare(double d1,
                          double d2)
                   throws IllegalArgumentException
Compares the two double args: returns -1 if d1 < d2, 0 if d1 == d2, 1 if d1 > d2.

The original motivation for this method was to simplify the writing of Comparable/Comparator. The reason why you cannot simply return d1 - d2 is because an int value is needed. You could, however, return (int) sign( d1 - d2 ), which is equivalent to compare(d1, d2).

Throws:
IllegalArgumentException - if d1 and d2 are not comparable; should only happen if at least one of them is NaN

compare

public static int compare(int i1,
                          int i2)
Compares the two int args: returns -1 if i1 < i2, 0 if i1 == i2, 1 if i1 > i2.

The original motivation for this method was to simplify the writing of Comparable/Comparator. The reason why you cannot simply return i1 - i2 is that Java int arithmetic silently overflows (instead of spilling over into infinite values or throwing an Exception, excessive differences "wrap around", for instance Integer.MAX_VALUE - (-1) == Integer.MIN_VALUE).


compare

public static int compare(long l1,
                          long l2)
Compares the two long args: returns -1 if l1 < l2, 0 if l1 == l2, 1 if l1 > l2.

The original motivation for this method was to simplify the writing of Comparable/Comparator. The reason why you cannot simply return l1 - l2 is that a) an int value is needed and b) Java long arithmetic silently overflows (instead of spilling over into infinite values, excessive differences "wrap around", for instance Long.MAX_VALUE - (-1) == Long.MIN_VALUE).
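
The wrap-around mentioned for the int and long overloads can be seen in a small sketch. CompareSketch is a hypothetical class that follows the documented -1/0/1 contract; it is not the actual Math2 source.

```java
// Hypothetical sketch showing why subtraction is unsafe inside a comparator.
class CompareSketch {
    /** Returns -1 if i1 < i2, 0 if i1 == i2, 1 if i1 > i2. */
    static int compare(int i1, int i2) {
        return (i1 < i2) ? -1 : ((i1 == i2) ? 0 : 1);
    }

    public static void main(String[] args) {
        int big = Integer.MAX_VALUE;
        int small = -1;
        // The true difference does not fit in an int, so it wraps to a negative value.
        System.out.println(big - small);          // -2147483648
        // The comparison-based version gives the right answer.
        System.out.println(compare(big, small));  // 1
    }
}
```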


modulo

public static int modulo(int a,
                         int b)
                  throws IllegalArgumentException
Returns a mod b.

This method is a true modulo function: b must be positive, and the result is guaranteed to lie inside [0, b - 1]. Therefore, it differs from the Java % operator, which is more precisely known as the remainder operator: % accepts any non-zero value for b, but can produce negative results when a < 0.

Throws:
IllegalArgumentException - if b <= 0
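
The difference from % can be sketched as follows. ModuloSketch is a hypothetical class written from the contract above (b > 0, result in [0, b - 1]); how Math2 actually computes the result is not shown on this page.

```java
// Hypothetical sketch of a true modulo built on top of the remainder operator.
class ModuloSketch {
    /** Returns a mod b, in the range [0, b - 1]; requires b > 0. */
    static int modulo(int a, int b) {
        if (b <= 0) {
            throw new IllegalArgumentException("b = " + b + " is <= 0");
        }
        int r = a % b;               // remainder: zero, or same sign as a
        return (r < 0) ? r + b : r;  // shift negative remainders into [0, b - 1]
    }

    public static void main(String[] args) {
        System.out.println(-7 % 3);          // -1 (remainder)
        System.out.println(modulo(-7, 3));   // 2  (true modulo)
    }
}
```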

log

public static double log(double base,
                         double x)
Returns the logarithm of x in the supplied base.


log10

public static double log10(double x)
Returns the logarithm of x in base 10.


power10

public static double power10(int power)
Returns the specified integer power of 10.


magnitude

public static double magnitude(double x)
Returns the magnitude of x in a decimal (i.e. power of 10) scale.

To be precise, let 10^exponent designate the largest power of 10 which does not exceed |x|. Then this method returns 10^exponent (not exponent, like the orderOfMagnitude method does).


orderOfMagnitude

public static int orderOfMagnitude(double x)
Returns the order of magnitude of x in a decimal (i.e. power of 10) scale.

To be precise, let 10^exponent designate the largest power of 10 which does not exceed |x|. Then this method returns exponent (not 10^exponent, like the magnitude method does).

Another way to view this is if x is written in scientific notation as

(+/-)a.bc... * 10^exponent
where a, b, c... are all decimal digits and additionally a > 0, then this method returns exponent. For example, 54321 is written in scientific notation as 5.4321*10^4, so this method returns 4 if supplied with 54321.

See Also:
Orders of magnitude
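
One way to realize the definitions of orderOfMagnitude and magnitude is via floor(log10(|x|)). This is a sketch under assumptions: MagnitudeSketch is hypothetical, the rejection of zero, NaN and infinite inputs is assumed (this page does not say how Math2 treats them), and values a hair below a power of 10 may be misclassified by floating point rounding.

```java
// Hypothetical sketch; the actual Math2 algorithm is not documented on this page.
class MagnitudeSketch {
    /** Returns the exponent of the largest power of 10 that does not exceed |x|. */
    static int orderOfMagnitude(double x) {
        // Assumption: inputs with no meaningful order of magnitude are rejected.
        if (x == 0 || Double.isNaN(x) || Double.isInfinite(x)) {
            throw new IllegalArgumentException("x = " + x + " has no order of magnitude");
        }
        return (int) Math.floor(Math.log10(Math.abs(x)));
    }

    /** Returns 10^exponent, where exponent = orderOfMagnitude(x). */
    static double magnitude(double x) {
        return Math.pow(10, orderOfMagnitude(x));
    }

    public static void main(String[] args) {
        System.out.println(orderOfMagnitude(54321));  // 4, since 54321 = 5.4321*10^4
        System.out.println(orderOfMagnitude(0.05));   // -2, since 0.05 = 5*10^-2
    }
}
```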

nextRandomWithMagnitude

public static double nextRandomWithMagnitude(int magnitude)
Returns a number with the specified magnitude but random coefficient. To be precise, the generic template for a positive number written in scientific notation is
x.yz... * 10^exponent
where x is a non-zero digit and y, z, ... are unrestricted digits, so that x.yz... is a number in the range [1, 10). This method generates a random number for the coefficient component (x.yz...) and multiplies it by the specified power of 10.


gaussianPdf

public static double gaussianPdf(double x)
                          throws IllegalArgumentException
Returns the value of the standard (i.e. mean = 0.0 and sd (standard deviation) = 1.0) Gaussian (i.e. normal) probability density function (PDF) of x.

Contract: the result is always in the range [0, inverseSqrt2pi], and is never NaN or infinite.

Throws:
IllegalArgumentException - if x is NaN

gaussianPdf

public static double gaussianPdf(double x,
                                 double mean,
                                 double sd)
                          throws IllegalArgumentException
Returns the value of the Gaussian (i.e. normal) probability density function (PDF) of x, given the parameters mean and sd (standard deviation).

Contract: the result is always in the range [0, inverseSqrt2pi / sd], and is never NaN or infinite.

Throws:
IllegalArgumentException - if a combination of x, mean, sd is encountered which causes the result to be invalid (e.g. any arg is NaN, or x and mean are identically signed infinities, or sd is infinite)
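
The two gaussianPdf contracts follow from the textbook density formula, which can be sketched as below. GaussianPdfSketch and INVERSE_SQRT_2PI are hypothetical names, and the rejection of sd <= 0 is an assumption (the documentation above only mentions NaN and infinite arguments).

```java
// Hypothetical sketch of the standard formula; not the actual Math2 source.
class GaussianPdfSketch {
    /** 1 / sqrt(2*pi), the peak value of the standard Gaussian PDF. */
    static final double INVERSE_SQRT_2PI = 1.0 / Math.sqrt(2.0 * Math.PI);

    /** Standard Gaussian PDF (mean = 0, sd = 1). */
    static double gaussianPdf(double x) {
        if (Double.isNaN(x)) throw new IllegalArgumentException("x is NaN");
        return INVERSE_SQRT_2PI * Math.exp(-0.5 * x * x);
    }

    /** Gaussian PDF with the given mean and sd. */
    static double gaussianPdf(double x, double mean, double sd) {
        // Assumption: sd must be a finite positive number.
        if (Double.isNaN(sd) || Double.isInfinite(sd) || sd <= 0) {
            throw new IllegalArgumentException("invalid sd: " + sd);
        }
        double z = (x - mean) / sd;  // NaN if any arg is NaN or x and mean are the same infinity
        if (Double.isNaN(z)) throw new IllegalArgumentException("invalid x/mean combination");
        return gaussianPdf(z) / sd;  // peak value is INVERSE_SQRT_2PI / sd
    }

    public static void main(String[] args) {
        System.out.println(gaussianPdf(0.0));            // the peak, 1 / sqrt(2*pi)
        System.out.println(gaussianPdf(5.0, 5.0, 2.0));  // half of the standard peak
    }
}
```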

gaussianCdf

public static double gaussianCdf(double x)
                          throws IllegalArgumentException
Returns the value of the standard (i.e. mean = 0.0 and sd (standard deviation) = 1.0) Gaussian (i.e. normal) cumulative distribution function (CDF) of x.

Contract: the result is always in the range [0, 1], and is never NaN or infinite.

Throws:
IllegalArgumentException - if x is NaN

gaussianCdf

public static double gaussianCdf(double x,
                                 double mean,
                                 double sd)
                          throws IllegalArgumentException
Returns the value of the Gaussian (i.e. normal) cumulative distribution function (CDF) of x, given the parameters mean and sd (standard deviation).

Contract: the result is always in the range [0, 1], and is never NaN or infinite. The implementation here should be a monotonically increasing function of x. It is claimed to be "accurate to double precision throughout the real line" (see this article; the code here is adapted from Figure 2).

Throws:
IllegalArgumentException - if a combination of x, mean, sd is encountered which causes the result to be invalid (e.g. any arg is NaN, or x and mean are identically signed infinities, or sd is infinite)

gaussianFit

public static GaussianFit gaussianFit(double[] numbers)
                               throws IllegalArgumentException
Determines that Gaussian (i.e. normal) probability density function (PDF) which best fits numbers.

The mean and sd (standard deviation) for the Gaussian are simply calculated as the sample mean and sample standard deviation of numbers. This appears to be the standard procedure for fitting a Gaussian PDF.

Note that the values returned in the bounds field of the result are the mid points of the bin intervals (i.e. they are not interval boundary points).

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN

calcPdfTheory

private static double[] calcPdfTheory(Bins bins,
                                      double mean,
                                      double sd)

gaussianAndersonDarling

public static double gaussianAndersonDarling(double[] numbers)
                                      throws IllegalArgumentException
Calculates the Anderson–Darling test statistic (as corrected for sample size) for numbers against an assumed Gaussian (i.e. normal) probability density function (PDF). See also this reference and this discussion.

Interpretation: if the result exceeds 0.632/0.751/0.870/1.029, then the null hypothesis is rejected for a significance-level (alpha) test of 10%/5%/2.5%/1%, respectively. In other words, the smaller the result, the more likely it is that numbers comes from the Gaussian distribution. Recall: the null hypothesis is that numbers follows the Gaussian distribution whose mean and sd are equal to the sample mean and sample sd of numbers, and alpha is the probability of a Type I error (rejecting the null hypothesis when it is in fact true); see this reference.
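
The interpretation rule can be expressed as a small lookup. This is an illustrative sketch only: AndersonDarlingInterpretation and smallestRejectingAlpha are hypothetical names, and the critical values are copied from the paragraph above.

```java
// Hypothetical helper that applies the critical values quoted in the text.
class AndersonDarlingInterpretation {
    static final double[] CRITICAL = {0.632, 0.751, 0.870, 1.029};
    static final double[] ALPHA    = {0.10,  0.05,  0.025, 0.01};

    /**
     * Returns the smallest listed alpha at which the null hypothesis is rejected,
     * or NaN if the statistic does not exceed even the 10% critical value.
     */
    static double smallestRejectingAlpha(double statistic) {
        double result = Double.NaN;
        for (int i = 0; i < CRITICAL.length; i++) {
            if (statistic > CRITICAL[i]) {
                result = ALPHA[i];   // critical values ascend, so later matches are stricter
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(smallestRejectingAlpha(0.5));  // NaN: not rejected at any listed level
        System.out.println(smallestRejectingAlpha(0.8));  // 0.05
    }
}
```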

Pros:

  1. the computation is fairly easy to program, and the only special function involved is the Gaussian CDF
  2. this test statistic is said to be the best one for precisely identifying Gaussian distributions
Cons:
  1. the computation requires copying numbers, and then relocating/rescaling and sorting that new array into standardized values
  2. this method can return Infinity when presented with data that departs significantly from a Gaussian distribution. This can occur, for example, with data that is almost entirely Gaussian but has just a single outlier that is many standard deviations away from the mean. Debugging on 2009-09-28 showed that these infinite results are caused by inaccuracy in the current gaussianCdf implementation (i.e. it returns 0 or 1 for non-infinite values). This causes the logarithms used by the Anderson–Darling test statistic to become infinite. These infinite results start for an outlier ~8.3 standard deviations away from the mean.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN

gaussianKolmogorovSmirnov

public static double gaussianKolmogorovSmirnov(double[] numbers,
                                               double meanG,
                                               double sdG)
                                        throws IllegalArgumentException
Calculates the "one sample" Kolmogorov–Smirnov statistic adjusted for the sample size (i.e. Dn * sqrt(n)) for numbers against the Gaussian (i.e. normal) probability density function (PDF) that is specified by meanG and sdG. See also this reference.

Interpretation: assuming n (i.e. numbers.length) > 40, if the result exceeds 1.07/1.22/1.36/1.52/1.63, then the null hypothesis is rejected for a significance-level (alpha) test of 20%/10%/5%/2%/1%, respectively. In other words, the smaller the result, the more likely it is that numbers comes from the Gaussian distribution. Recall: the null hypothesis is that numbers follows the Gaussian distribution whose mean and sd are given by meanG and sdG, and alpha is the probability of a Type I error (rejecting the null hypothesis when it is in fact true); see Table 2.1 of this reference. Note that meanG and sdG are supplied, and are not automatically calculated from the sample mean and sample sd of numbers. (If this latter effect is desired, then the critical values must change for proper interpretation of the result; see this reference or this reference.)
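
The quantity Dn * sqrt(n) can be sketched for an arbitrary reference CDF. This is an assumption-laden illustration: KolmogorovSmirnovSketch and ksAdjusted are hypothetical names, and the CDF is passed in as a parameter (instead of calling gaussianCdf) because the JDK has no built-in Gaussian CDF.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of the one sample KS statistic; not the actual Math2 source.
class KolmogorovSmirnovSketch {
    /** Returns Dn * sqrt(n), where Dn is the largest gap between the empirical CDF and cdf. */
    static double ksAdjusted(double[] numbers, DoubleUnaryOperator cdf) {
        if (numbers == null || numbers.length == 0) {
            throw new IllegalArgumentException("numbers is null or empty");
        }
        for (double d : numbers) {
            if (Double.isNaN(d)) throw new IllegalArgumentException("NaN element");
        }
        double[] sorted = numbers.clone();   // the copy-and-sort cost listed under Cons
        Arrays.sort(sorted);
        int n = sorted.length;
        double dn = 0.0;
        for (int i = 0; i < n; i++) {
            double f = cdf.applyAsDouble(sorted[i]);
            double above = (i + 1) / (double) n - f;  // empirical CDF just after the step
            double below = f - i / (double) n;        // empirical CDF just before the step
            dn = Math.max(dn, Math.max(above, below));
        }
        return dn * Math.sqrt(n);
    }

    public static void main(String[] args) {
        // Uniform(0, 1) reference CDF, used here only because it is easy to check by hand.
        double[] sample = {0.75, 0.25, 0.5};
        System.out.println(ksAdjusted(sample, x -> x));  // 0.25 * sqrt(3)
    }
}
```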

Pros:

  1. the computation is fairly easy to program, and the only special function involved is the Gaussian CDF
  2. this method is far less vulnerable to returning Infinity compared to the Anderson-Darling calculation
Cons:
  1. the computation requires copying numbers, and then sorting that new array
  2. this test statistic is not as sensitive as the Anderson-Darling test

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN

hammingDistance

public static int hammingDistance(int bits1,
                                  int bits2)
Returns the Hamming distance between 2 ints. The ints are viewed as binary strings, so the result is the number of bits where the 2 ints differ. In particular, this method simply returns hammingWeight( bits1 ^ bits2 ).

See Also:
Wikipedia article on Hamming Distance

hammingWeight

public static int hammingWeight(int i)
Returns the Hamming weight of an int. The int is viewed as a binary string, so the result is the number of bits equal to 1 (i.e. the "bit count").

See Also:
Wikipedia article on Hamming Distance
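
Both Hamming methods can be sketched with the JDK's Integer.bitCount. HammingSketch is a hypothetical class; whether Math2 itself uses Integer.bitCount is an assumption, although the XOR relationship is stated in the documentation above.

```java
// Hypothetical sketch; Integer.bitCount is a real JDK method (since Java 5).
class HammingSketch {
    /** Number of bits in i that are equal to 1. */
    static int hammingWeight(int i) {
        return Integer.bitCount(i);
    }

    /** Number of bit positions in which bits1 and bits2 differ. */
    static int hammingDistance(int bits1, int bits2) {
        return hammingWeight(bits1 ^ bits2);  // XOR leaves a 1 wherever the inputs differ
    }

    public static void main(String[] args) {
        System.out.println(hammingWeight(-1));                 // 32: all bits set
        System.out.println(hammingDistance(0b1011, 0b0010));   // 2: 1011 ^ 0010 = 1001
    }
}
```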

byteToUnsignedInt

public static int byteToUnsignedInt(byte b)
Returns an int that equals the unsigned value of b's bits. Specifically, the bit pattern of its least significant byte in the result is identical to the bit pattern of b, and the most significant bytes are all 0.
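
The bit pattern described above corresponds to masking with 0xFF. A minimal sketch follows; UnsignedByteSketch is a hypothetical class, and the masking approach is assumed rather than taken from the Math2 source.

```java
// Hypothetical sketch of the documented bit-level behavior.
class UnsignedByteSketch {
    /** Returns the value of b's 8 bits read as an unsigned number in [0, 255]. */
    static int byteToUnsignedInt(byte b) {
        return b & 0xFF;  // b is sign-extended to int, then the upper 24 bits are cleared
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80;
        System.out.println(b);                     // -128 as a signed byte
        System.out.println(byteToUnsignedInt(b));  // 128 as an unsigned value
    }
}
```

On Java 8 and later, Byte.toUnsignedInt(b) gives the same result.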


isWithinOneUlp

public static boolean isWithinOneUlp(double a,
                                     double b)
                              throws IllegalArgumentException
Determines whether or not a and b are within one ulp of each other.

Throws:
IllegalArgumentException - if a or b is NaN or infinite

isWithin

public static boolean isWithin(double a,
                               double b,
                               double epsilon)
                        throws IllegalArgumentException
Determines whether or not a and b are within epsilon of each other, that is, that the distance between a and b is <= epsilon.

Throws:
IllegalArgumentException - if a, b, or epsilon is NaN or infinite
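
A tolerance check like isWithin is typically used where == is too strict for floating point results. The sketch below is written from the contract above; WithinSketch and its helper checkFinite are hypothetical.

```java
// Hypothetical sketch of the documented contract; not the actual Math2 source.
class WithinSketch {
    /** Returns true if the distance between a and b is <= epsilon. */
    static boolean isWithin(double a, double b, double epsilon) {
        checkFinite(a, "a");
        checkFinite(b, "b");
        checkFinite(epsilon, "epsilon");
        return Math.abs(a - b) <= epsilon;
    }

    // Hypothetical helper: rejects NaN and infinite values.
    private static void checkFinite(double d, String name) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            throw new IllegalArgumentException(name + " = " + d + " is NaN or infinite");
        }
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);                // false: rounding error
        System.out.println(isWithin(sum, 0.3, 1e-9));  // true
    }
}
```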

checkNumbers

public static double[] checkNumbers(double[] numbers,
                                    boolean infinityBad)
                             throws IllegalArgumentException
Checks that numbers is non-null, non-empty, and every element is non-NaN. In addition, if infinityBad is true, then checks that every element is not infinite.

Returns:
returns numbers, to enable method call chaining
Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element is NaN, or is infinite while infinityBad is true

sum

public static double sum(double[] numbers)
                  throws IllegalArgumentException
Returns the sum of every element of numbers.

Throws:
IllegalArgumentException - if numbers == null

normalize

public static double[] normalize(double[] numbers)
                          throws IllegalArgumentException,
                                 IllegalStateException
Simply calls normalize(numbers, normalizationErrorTolerance_default).

Parameters:
numbers - the array of numbers to normalize
Returns:
returns numbers, to enable method call chaining
Throws:
IllegalArgumentException - if numbers == null; any element of numbers is < 0, is NaN, or is infinite; every element of numbers is 0
IllegalStateException - if normalization failed, which is defined as isNormalized(numbers, normalizationErrorTolerance_default) returns false

normalize

public static double[] normalize(double[] numbers,
                                 double errorTolerance)
                          throws IllegalArgumentException,
                                 IllegalStateException
Normalizes numbers, that is, divides each element by that constant factor which causes the sum of (the new values of) numbers to equal 1.

The elements of numbers must be legitimate values for normalization to even make sense. Specifically, each must be >= 0, non-NaN and non-infinite.

Because of floating point errors, normalization is usually imperfect. To cope, the normalization is deemed to have succeeded if isNormalized(numbers, errorTolerance) returns true.

Parameters:
numbers - the array of numbers to normalize
errorTolerance - specifies how much normalization error to tolerate
Returns:
returns numbers, to enable method call chaining
Throws:
IllegalArgumentException - if numbers == null; any element of numbers or errorTolerance is < 0, is NaN, or is infinite; every element of numbers is 0
IllegalStateException - if normalization failed, which is defined as isNormalized(numbers, errorTolerance) returns false
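
The interplay of normalize, isNormalized and normalizationSum can be sketched as below. NormalizeSketch is a hypothetical class written from the documented contracts, and the tolerance 1e-12 in main is an arbitrary illustrative value (the actual value of normalizationErrorTolerance_default is not given on this page).

```java
// Hypothetical sketch of the documented contracts; not the actual Math2 source.
class NormalizeSketch {
    /** Sums numbers after checking that each element is >= 0, non-NaN and finite. */
    static double normalizationSum(double[] numbers) {
        if (numbers == null || numbers.length == 0) {
            throw new IllegalArgumentException("numbers is null or empty");
        }
        double sum = 0.0;
        for (double d : numbers) {
            if (Double.isNaN(d) || Double.isInfinite(d) || d < 0) {
                throw new IllegalArgumentException("illegal element: " + d);
            }
            sum += d;
        }
        if (sum <= 0 || Double.isInfinite(sum)) {
            throw new IllegalArgumentException("invalid sum: " + sum);
        }
        return sum;
    }

    static boolean isNormalized(double[] numbers, double errorTolerance) {
        return Math.abs(normalizationSum(numbers) - 1.0) <= errorTolerance;
    }

    /** Divides each element in place by the sum, then verifies the result. */
    static double[] normalize(double[] numbers, double errorTolerance) {
        double sum = normalizationSum(numbers);
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] /= sum;
        }
        if (!isNormalized(numbers, errorTolerance)) {
            throw new IllegalStateException("normalization failed");
        }
        return numbers;  // same array, to enable method call chaining
    }

    public static void main(String[] args) {
        double[] weights = normalize(new double[] {1.0, 3.0}, 1e-12);
        System.out.println(weights[0] + " " + weights[1]);  // 0.25 0.75
    }
}
```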

isNormalized

public static boolean isNormalized(double[] numbers)
                            throws IllegalArgumentException
Returns isNormalized(numbers, normalizationErrorTolerance_default).

Parameters:
numbers - the array of numbers to check for normalization
Throws:
IllegalArgumentException - if numbers == null; any element of numbers is < 0, is NaN, or is infinite; every element of numbers is 0

isNormalized

public static boolean isNormalized(double[] numbers,
                                   double errorTolerance)
                            throws IllegalArgumentException
Reports whether or not numbers is normalized within an error specified by errorTolerance. Specifically, if sumOfNumbers denotes normalizationSum(numbers), then |sumOfNumbers - 1| must be <= errorTolerance.

Parameters:
numbers - the array of numbers to check for normalization
errorTolerance - specifies how much normalization error to tolerate
Throws:
IllegalArgumentException - if numbers == null; any element of numbers or errorTolerance is < 0, is NaN, or is infinite; every element of numbers is 0

normalizationSum

public static double normalizationSum(double[] numbers)
                               throws IllegalArgumentException
Returns the sum of every element of numbers, which is also the normalization factor for numbers.

This method differs from a generic array summation method solely in that it checks that every element of numbers is a legitimate value for normalization to even make sense. Specifically, each must be >= 0, non-NaN and non-infinite.

Contract: this method guarantees to never return a result that is <= 0, NaN, or infinity.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN, infinite, or < 0; if every element of numbers == 0; if the result is an invalid value

subtractParallelComponent

public static double[] subtractParallelComponent(double[] v1,
                                                 double[] v2)
                                          throws IllegalArgumentException
Subtracts from vector v1 that component which lies parallel to vector v2. Upon return from this method, v1 will be perpendicular to v2.

Returns:
returns v1, to enable method call chaining
Throws:
IllegalArgumentException - if either of v1 == null; v2 == null; v1.length != v2.length; v1.length = v2.length = 0; any element of either v1 or v2 is NaN or infinity; every element of v2 is 0
See Also:
"Brent's analytical work for development of the formulas"
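
A sketch using the standard vector projection formula, v1 - ((v1 . v2) / (v2 . v2)) * v2, is shown below. This is an assumption: ParallelComponentSketch is hypothetical, the referenced analytical work is not reproduced on this page, and only part of the documented argument checking is included.

```java
// Hypothetical sketch based on the textbook projection formula.
class ParallelComponentSketch {
    /** Removes from v1 (in place) its component parallel to v2, and returns v1. */
    static double[] subtractParallelComponent(double[] v1, double[] v2) {
        if (v1 == null || v2 == null || v1.length != v2.length || v1.length == 0) {
            throw new IllegalArgumentException("null, empty, or mismatched vectors");
        }
        double dot12 = 0.0;  // v1 . v2
        double dot22 = 0.0;  // v2 . v2
        for (int i = 0; i < v1.length; i++) {
            dot12 += v1[i] * v2[i];
            dot22 += v2[i] * v2[i];
        }
        if (dot22 == 0.0) {
            throw new IllegalArgumentException("every element of v2 is 0");
        }
        double scale = dot12 / dot22;
        for (int i = 0; i < v1.length; i++) {
            v1[i] -= scale * v2[i];
        }
        return v1;
    }

    public static void main(String[] args) {
        double[] v1 = {3.0, 4.0};
        double[] v2 = {1.0, 0.0};
        subtractParallelComponent(v1, v2);
        System.out.println(v1[0] + " " + v1[1]);  // 0.0 4.0, which is perpendicular to v2
    }
}
```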

linearLeastSquaresFit

public static double[] linearLeastSquaresFit(double[] xValues,
                                             double[] yValues)
                                      throws IllegalArgumentException
Given a series of 2D ordered pairs stored in the xValues and yValues arrays, that is, the points (xValues[0], yValues[0]), (xValues[1], yValues[1]), ..., this method determines the coefficients a and b of the linear fit y = a + bx as well as some of the fitness measures. Specifically, the result is the double array {a, b, r2, ssr, sse} where a and b are the linear fit parameters, r2 is the correlation coefficient, ssr is the sum of squared residuals, and sse is the sum of squared errors.

Contract: this method guarantees to never return an element in the result that is NaN or infinity.

Throws:
IllegalArgumentException - if xValues or yValues == null; xValues.length or yValues.length == 0; any element of xValues or yValues is NaN or infinite; xValues.length != yValues.length
See Also:
Least Squares Fitting, Correlation Coefficient
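
The coefficients a and b of y = a + bx can be sketched with the usual closed-form least squares solution. LinearFitSketch.fitLine is a hypothetical helper that returns only {a, b}; the fitness measures (r2, ssr, sse) are omitted, and rejecting inputs whose x values are all equal is an assumption.

```java
// Hypothetical sketch of the closed-form simple linear regression.
class LinearFitSketch {
    /** Returns {a, b} for the least squares line y = a + b*x. */
    static double[] fitLine(double[] xValues, double[] yValues) {
        if (xValues == null || yValues == null) {
            throw new IllegalArgumentException("null array");
        }
        if (xValues.length == 0 || xValues.length != yValues.length) {
            throw new IllegalArgumentException("empty or mismatched arrays");
        }
        int n = xValues.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += xValues[i];
            meanY += yValues[i];
        }
        meanX /= n;
        meanY /= n;
        double sxx = 0.0;  // sum of squared x deviations
        double sxy = 0.0;  // sum of products of x and y deviations
        for (int i = 0; i < n; i++) {
            double dx = xValues[i] - meanX;
            sxx += dx * dx;
            sxy += dx * (yValues[i] - meanY);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all x values are equal");  // assumption
        }
        double b = sxy / sxx;
        double a = meanY - b * meanX;
        return new double[] {a, b};
    }

    public static void main(String[] args) {
        // Points that lie exactly on y = 1 + 2x.
        double[] fit = fitLine(new double[] {0, 1, 2, 3}, new double[] {1, 3, 5, 7});
        System.out.println("a = " + fit[0] + ", b = " + fit[1]);  // a = 1.0, b = 2.0
    }
}
```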

autocorrelation

public static double[][] autocorrelation(double[] numbers)
                                  throws IllegalArgumentException,
                                         IllegalStateException
Returns the autocorrelation function (acf) of numbers along with confidence intervals (CIs) for each element.

The result contains three double arrays. Each array has the same length, which is one less than numbers' length. The first array is the acf, the second array is the CI lower bounds, the third array is the CI upper bounds. Note that the CI calculation involves the "large lag standard error", and so it assumes that numbers is a stationary Gaussian data series, among other assumptions/approximations.

The calculation first computes the autocovariance function by calling autocovariance. See that method's javadocs for warnings on the length of numbers and how many elements in the result are likely trustworthy.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN
IllegalStateException - if any element of the result is calculated to be NaN or infinity

autocovariance

public static double[] autocovariance(double[] numbers)
                               throws IllegalArgumentException,
                                      IllegalStateException
Returns the autocovariance function (acvf) of numbers.

The result has a length one less than numbers' length.

Beware of the following issues:

  1. this method should probably not be called if numbers has fewer than 50 elements
  2. only the first 20-25% of the elements in the result are likely trustworthy
None of these issues is checked by this method; it is up to the user to address them.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN
IllegalStateException - if any element of the result is calculated to be NaN or infinity
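
A sketch of one common autocovariance estimator is given below. The normalization is an assumption: this version divides each lag's sum by n, which is a widely used convention, but this page does not say which convention Math2 uses. AutocovarianceSketch is a hypothetical class.

```java
// Hypothetical sketch; the 1/n normalization is an assumption.
class AutocovarianceSketch {
    /** Returns the acvf for lags 0 .. n - 2, so the result has length n - 1. */
    static double[] autocovariance(double[] numbers) {
        if (numbers == null || numbers.length == 0) {
            throw new IllegalArgumentException("numbers is null or empty");
        }
        int n = numbers.length;
        double mean = 0.0;
        for (double d : numbers) {
            if (Double.isNaN(d)) throw new IllegalArgumentException("NaN element");
            mean += d;
        }
        mean /= n;
        double[] acvf = new double[n - 1];
        for (int lag = 0; lag < n - 1; lag++) {
            double sum = 0.0;
            for (int t = 0; t + lag < n; t++) {
                sum += (numbers[t] - mean) * (numbers[t + lag] - mean);
            }
            acvf[lag] = sum / n;
        }
        return acvf;
    }

    public static void main(String[] args) {
        double[] acvf = autocovariance(new double[] {1, 2, 3});
        System.out.println(acvf.length);  // 2
        System.out.println(acvf[0]);      // lag 0: the biased variance, 2/3
    }
}
```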

min

public static double min(double[] numbers)
                  throws IllegalArgumentException
Returns the minimum element of numbers.

Contract: the result is never NaN.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN

max

public static double max(double[] numbers)
                  throws IllegalArgumentException
Returns the maximum element of numbers.

Contract: the result is never NaN.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN

minMax

public static double[] minMax(double[] numbers)
                       throws IllegalArgumentException
Returns both the minimum and maximum element of numbers. The result is always a 2 element array, with the min at index 0 and the max at index 1.

If you need both the min and the max calculated, calling this method once is more efficient than calling min and max separately because numbers is only iterated over once.

Contract: the result is never null, always has length == 2, and never has a NaN element.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN
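
The single pass mentioned above can be sketched as follows. MinMaxSketch is a hypothetical class written from the documented contract.

```java
// Hypothetical sketch of a one-pass min/max scan; not the actual Math2 source.
class MinMaxSketch {
    /** Returns {min, max} of numbers, iterating over the array only once. */
    static double[] minMax(double[] numbers) {
        if (numbers == null || numbers.length == 0) {
            throw new IllegalArgumentException("numbers is null or empty");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double d : numbers) {
            if (Double.isNaN(d)) throw new IllegalArgumentException("NaN element");
            if (d < min) min = d;
            if (d > max) max = d;
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        double[] result = minMax(new double[] {3.0, 1.0, 2.0});
        System.out.println(result[0] + " " + result[1]);  // 1.0 3.0
    }
}
```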

mean

public static double mean(double[] numbers)
                   throws IllegalArgumentException
Returns the arithmetic mean of numbers.

In the terminology of statistics, if numbers is the population, then the result is the population's mean. But if numbers is merely a sample from the population, then the result is the sample mean, which is an unbiased estimate of the population's mean.

Contract: the result is never NaN, but may be infinite.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN

median

public static double median(double[] numbers)
                     throws IllegalArgumentException
Returns the median element of numbers.

The implementation here simply returns quantile(numbers, 1, 2). This technique requires numbers to be sorted by the user before calling this method.

Contract: the result is never NaN but could be infinity.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN; numbers is not sorted

quantile

public static double quantile(double[] numbers,
                              int k,
                              int q)
                       throws IllegalArgumentException
Returns the kth q-quantile of numbers.

Special case: if numbers.length == 1, then this method immediately returns numbers[0] regardless of the values of k and q.

Otherwise, the calculation uses the Weighted average technique. This technique requires numbers to be sorted by the user before calling this method.
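As a rough illustration of a weighted average quantile, the sketch below interpolates linearly between the two order statistics that bracket the position (N - 1) * k / q. Several weighted average variants exist, and which one this method actually uses is not stated here, so the position formula is an assumption; the class name QuantileSketch is hypothetical as well. Infinite elements are not given any special treatment in this sketch.

```java
final class QuantileSketch {
    private QuantileSketch() {}

    /** Returns the kth q-quantile of an already sorted array (assumed interpolation rule). */
    static double quantile(double[] sorted, int k, int q) {
        if (sorted == null || sorted.length == 0) {
            throw new IllegalArgumentException("array must be non-null and non-empty");
        }
        for (int i = 0; i < sorted.length; i++) {
            if (Double.isNaN(sorted[i])) {
                throw new IllegalArgumentException("array contains a NaN element");
            }
            if (i > 0 && sorted[i] < sorted[i - 1]) {
                throw new IllegalArgumentException("array is not sorted");
            }
        }
        // Special case from the description: a single element is returned
        // regardless of the values of k and q.
        if (sorted.length == 1) return sorted[0];
        if (k < 1 || q < 2 || k >= q) {
            throw new IllegalArgumentException("require k >= 1, q >= 2 and k < q");
        }

        // Assumed rule: fractional index into the sorted array.
        double position = (sorted.length - 1) * ((double) k / q);
        int lower = Math.min((int) Math.floor(position), sorted.length - 2);
        double weight = position - lower;
        if (weight == 0.0) return sorted[lower];
        // Weighted average of the two neighbouring elements.
        return (1.0 - weight) * sorted[lower] + weight * sorted[lower + 1];
    }
}
```

Under this assumed rule, quantile(new double[] {1, 2, 3, 4}, 1, 2) gives 2.5, which agrees with the usual definition of the median for an even number of elements.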

Contract: the result is never NaN but could be infinity.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN; numbers is not sorted; k < 1; q < 2; k >= q

sd

public static double sd(double[] numbers)
                 throws IllegalArgumentException
Returns sd(numbers, true).

Use this version only when mean has not previously been calculated, since this method will internally calculate it.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN

sd

public static double sd(double[] numbers,
                        boolean biased)
                 throws IllegalArgumentException
Returns sd(numbers, mean(numbers), biased).

Use this version only when mean has not previously been calculated, since this method will internally calculate it.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; any element of numbers is NaN

sd

public static double sd(double[] numbers,
                        double mean)
                 throws IllegalArgumentException
Returns sd(numbers, mean, true).

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; mean or any element of numbers is NaN

sd

public static double sd(double[] numbers,
                        double mean,
                        boolean biased)
                 throws IllegalArgumentException
Returns the standard deviation of numbers.

In the terminology of statistics, if numbers is the population, then the result is the population's standard deviation. But if numbers is merely a sample from the population, then the result is the sample standard deviation, which is a biased estimate of the population's standard deviation.

This method simply returns the square root of variance(numbers, mean, biased). Therefore, the mean and biased parameters must have the exact meanings expected by variance. In particular, the biased parameter will control whether or not the variance estimate is biased. It does not control the bias of the standard deviation estimate returned by this method. In fact, the estimate returned by this method is always biased regardless of the value of the biased parameter. The effect of biased == true is that variance will use the "divide by N" rule, while biased == false causes variance to use the "divide by N - 1" rule.

A correction exists to get an unbiased estimator for the standard deviation if normality is assumed. This method does not implement this correction, however, for two reasons. First, that unbiased estimator is inferior to the simple "divide by N" estimator:

[the "divide by N" estimator has] uniformly smaller mean squared error than does the unbiased estimator, and is the maximum-likelihood estimate when the population is normally distributed.
Reference
Second, it is better to avoid assumptions like normality.

Summary of the above: use biased == true if you want the most accurate result. Only use biased == false if there is some other requirement for it (e.g. if you are computing confidence intervals, since the standard theory which leads to Student's t-distribution was developed using the biased == false, "divide by N - 1", estimator).

Contract: the result is always >= 0 (including positive infinity), and is never NaN.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; mean or any element of numbers is NaN
See Also:
Mathworld article on the sample standard deviation distribution

variance

public static double variance(double[] numbers)
                       throws IllegalArgumentException
Returns variance(numbers, true).

Here, the default value of true for biased is supplied because that yields the most accurate results (see variance).

Use this version only when mean has not previously been calculated, since this method will internally calculate it.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; the mean is calculated to be NaN; any element of numbers is NaN

variance

public static double variance(double[] numbers,
                              boolean biased)
                       throws IllegalArgumentException
Returns variance(numbers, mean(numbers), biased).

Use this version only when mean has not previously been calculated, since this method will internally calculate it.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; the mean is calculated to be NaN; any element of numbers is NaN

variance

public static double variance(double[] numbers,
                              double mean)
                       throws IllegalArgumentException
Returns variance(numbers, mean, true).

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; mean or any element of numbers is NaN

variance

public static double variance(double[] numbers,
                              double mean,
                              boolean biased)
                       throws IllegalArgumentException
Returns the variance of numbers.

Statistically speaking, if numbers is the population, then the result is the population's variance. But if numbers is merely a sample from the population, then the result is an estimate of the population's variance.

The mean parameter must be the arithmetic mean of numbers (i.e. what mean(numbers) returns).

Algorithms exist for computing the variance without doing an explicit calculation of the mean. For example, one can compute the sum of the squares of numbers and subtract from it the square of the sum of numbers divided by N, with both sums efficiently calculated in a single loop; see Algorithm 1. There is even an algorithm that does not require storing all the elements, and so is well suited to streaming data; see this article.
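One well-known algorithm that does not store the elements is Welford's online algorithm. Whether that is the algorithm the linked article describes is an assumption, and the class below (StreamingVarianceSketch, a hypothetical name) is only a minimal sketch of the idea; it is not part of Math2.

```java
final class StreamingVarianceSketch {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared differences from the current mean

    /** Folds one more value into the running statistics; the value itself is not stored. */
    void add(double x) {
        count++;
        double delta = x - mean;   // difference from the old mean
        mean += delta / count;     // updated mean
        m2 += delta * (x - mean);  // uses both the old and the new mean
    }

    double mean() {
        if (count == 0) throw new IllegalStateException("no values added yet");
        return mean;
    }

    /** biased == true divides by N, biased == false divides by N - 1. */
    double variance(boolean biased) {
        long denominator = biased ? count : count - 1;
        if (denominator <= 0) throw new IllegalStateException("not enough values added yet");
        return m2 / denominator;
    }
}
```

Feeding the values 2, 4, 4, 4, 5, 5, 7, 9 one at a time gives a mean of 5 and a "divide by N" variance of 4, up to floating point rounding.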

Nevertheless, this method requires that the mean parameter be supplied because the most accurate numerical algorithms use differences from the mean (this method relies on sst; see additional algorithm notes there).

The biased parameter determines whether the result is a biased or unbiased estimate (assuming that numbers is a sample from a population). Specifically, this method returns sst(numbers, mean) / denominator. If biased is true, then denominator is numbers.length (i.e. the "divide by N" rule), else denominator is numbers.length - 1 (i.e. the "divide by N - 1" rule).
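The effect of the two denominators can be sketched as below. This is an illustration only: VarianceSketch is a hypothetical class, the sum of squares is computed with a plain loop rather than by calling sst, and rejecting a single-element array when biased is false is a choice made for this sketch (it avoids 0 / 0), not documented behavior.

```java
final class VarianceSketch {
    private VarianceSketch() {}

    /** Returns SST / N if biased, else SST / (N - 1). */
    static double variance(double[] numbers, double mean, boolean biased) {
        if (numbers == null || numbers.length == 0 || Double.isNaN(mean)) {
            throw new IllegalArgumentException("numbers must be non-empty and mean must not be NaN");
        }
        // Sketch-only restriction: N - 1 would be zero for a single element.
        if (!biased && numbers.length < 2) {
            throw new IllegalArgumentException("the N - 1 rule needs at least 2 elements");
        }
        double sst = 0.0;
        for (double x : numbers) {
            if (Double.isNaN(x)) throw new IllegalArgumentException("numbers contains a NaN element");
            double diff = x - mean;
            sst += diff * diff; // plain summation; the real sst is described as compensated
        }
        int denominator = biased ? numbers.length : numbers.length - 1;
        return sst / denominator;
    }

    /** The standard deviation is the square root of the variance. */
    static double sd(double[] numbers, double mean, boolean biased) {
        return Math.sqrt(variance(numbers, mean, biased));
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 with mean 5, the sum of squares is 32, so the biased variance is 32 / 8 = 4 (standard deviation 2) and the unbiased variance is 32 / 7.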

There are a few situations where one should use the unbiased (biased param == false) estimator. The best example is probably the calculation of confidence intervals, because the conventional theory (which leads to Student's t-distribution) was developed using this unbiased variance estimator.

In general, however, the biased estimator is more accurate:

                What is the BEST estimator for the population variance given a sampling of the population?

                The naive estimator formula is simply
                        variance = SST / n
                where SST is the sum of the squares of the differences of each sample from the estimated population mean, and n is the number of samples.  But this estimator is biased.

                The usual formula for the unbiased estimator is
                        variance = SST / (n - 1)

                Now, here is the trickiness: while unbiasedness is nice, what you really want is a low mean squared error (MSE):
                        http://en.wikipedia.org/wiki/Estimator
                In fact, the OPTIMAL estimator is one with the minimum MSE (MMSE):
                        http://en.wikipedia.org/wiki/Minimum_mean_squared_error

                Now, this article says that the
                        SST / n
                estimator has a "lower estimation variability"

                http://en.wikipedia.org/wiki/Unbiased_estimator
                where I assume that "lower estimation variability" has the same meaning as "variance of the estimator" in that MSE reference above:
                        http://en.wikipedia.org/wiki/Estimator

                In fact, its smaller estimator variance, combined with its negative bias (due to the larger denominator, which causes it to underestimate the true variance), actually causes the simple n formula to have lower MSE than the n - 1 one:
                        http://en.wikipedia.org/wiki/Mean_squared_error.htm

                That implies the n estimator is better than the n - 1 one.

                But is it the OPTIMAL one--the estimator with MMSE?

                Or is that an unknown at present in statistics?

                This article
                        http://cnx.rice.edu/content/m11267/latest/
                has a seemingly related discussion, but the Example 1 that they give seems to be irrelevant (e.g. you generally do NOT know the variance-sub-n quantity, nor the mean and variance of the theta quantities, so his result in formulas 9 or 10 is practically useless).
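
For the special case of independent, normally distributed samples (an assumption that the notes above do not make, and that does not hold in general), the MSE of any estimator of the form SST / c can be worked out in closed form. The derivation below is a sketch under that assumption; the symbols c, n and \sigma are introduced here for illustration.

```latex
% Assumption: x_1, ..., x_n i.i.d. normal with variance \sigma^2, and SST is
% computed about the sample mean, so that SST / \sigma^2 \sim \chi^2_{n-1}.
\begin{align*}
  \mathrm{E}[SST] &= (n-1)\,\sigma^2, \qquad
  \mathrm{Var}[SST] = 2\,(n-1)\,\sigma^4 \\
  \mathrm{MSE}\!\left(\frac{SST}{c}\right)
    &= \frac{2\,(n-1)\,\sigma^4}{c^2}
     + \left(\frac{n-1}{c} - 1\right)^{2} \sigma^4 \\
  c = n-1:&\quad \mathrm{MSE} = \frac{2\,\sigma^4}{n-1} \\
  c = n:&\quad \mathrm{MSE} = \frac{(2n-1)\,\sigma^4}{n^2} \\
  c = n+1:&\quad \mathrm{MSE} = \frac{2\,\sigma^4}{n+1}
    \quad \text{(the minimum over all } c\text{)}
\end{align*}
```

Under normality, then, the n estimator does have lower MSE than the n - 1 one, and among estimators of the form SST / c the minimum is reached at c = n + 1. For other distributions the optimal denominator depends on the kurtosis, so this result should not be read as a general answer to the question raised above.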
 

Contract: the result is always >= 0 (including positive infinity), and is never NaN.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; mean or any element of numbers is NaN
See Also:
Mathworld article on sample variance, Wikipedia article on unbiased estimators

sst

public static double sst(double[] numbers)
                  throws IllegalArgumentException
Returns sst(numbers, mean(numbers)).

Use this version only when mean has not previously been calculated, since this method will internally calculate it.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; the mean is calculated to be NaN; any element of numbers is NaN

sst

public static double sst(double[] numbers,
                         double mean)
                  throws IllegalArgumentException
Calculates the SST (Sum of Squares, Total), that is, the sum of the squares of the differences from mean of each element of numbers.

In order to obtain the highest accuracy, this method uses a form of compensated summation (see Algorithm II (compensated)).
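One common form of compensated two-pass summation subtracts a correction term built from the sum of the plain differences, which would be exactly zero if mean were exact. The sketch below assumes that this is what "Algorithm II (compensated)" refers to; the class name SstSketch is hypothetical and the code is not taken from Math2.

```java
final class SstSketch {
    private SstSketch() {}

    /** Sum of squared differences from mean, with a correction for rounding error in mean. */
    static double sst(double[] numbers, double mean) {
        if (numbers == null || numbers.length == 0 || Double.isNaN(mean)) {
            throw new IllegalArgumentException("numbers must be non-empty and mean must not be NaN");
        }
        double sumSq = 0.0;   // sum of (x - mean)^2
        double sumDiff = 0.0; // sum of (x - mean); zero in exact arithmetic with an exact mean
        for (double x : numbers) {
            if (Double.isNaN(x)) throw new IllegalArgumentException("numbers contains a NaN element");
            double diff = x - mean;
            sumSq += diff * diff;
            sumDiff += diff;
        }
        // Infinite inputs would otherwise produce infinity - infinity = NaN below.
        if (Double.isInfinite(sumSq)) return Double.POSITIVE_INFINITY;
        // Compensation term: removes the error caused by an inexact mean.
        double corrected = sumSq - sumDiff * (sumDiff / numbers.length);
        return Math.max(corrected, 0.0); // guard against a tiny negative result from rounding
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the result is 32 when the exact mean 5 is supplied, and the compensation term recovers the same 32 even when the slightly wrong value 4.5 is supplied as mean.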

Contract: the result is always >= 0 (including positive infinity), and is never NaN.

Throws:
IllegalArgumentException - if numbers == null; numbers.length == 0; mean or any element of numbers is NaN
See Also:
Wikipedia article on Total sum of squares