bb.science
Class Bins

java.lang.Object
  extended by bb.science.Bins

public class Bins
extends Object

Introduction

Used to sort data into distinct bins (also known as intervals or cells). The original data values are never stored. Instead, just the number of values in each bin is maintained. The most common use is to measure the distribution of some sample, since the bins can be directly used to plot histograms.

Data

The data type is restricted to double values. Like Samples, this class places no restriction on the values except that they must be normal (non-NaN and non-infinite). This restriction means that the results returned by getBounds, getBoundsMid, and getPdf may be safely supplied to other classes (e.g. the statistical routines inside Math2).
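
The "normal" requirement described above can be sketched as follows. This is only an illustration of the stated rule (non-NaN and non-infinite), not the class's actual validation code; the names NormalCheckSketch, isNormal, and requireAllNormal are hypothetical.

```java
final class NormalCheckSketch {

    /** Returns true if d is neither NaN nor infinite (what this page calls "normal"). */
    static boolean isNormal(double d) {
        return !Double.isNaN(d) && !Double.isInfinite(d);
    }

    /** Throws IllegalArgumentException if values is null or contains an element that is not normal. */
    static void requireAllNormal(double[] values) {
        if (values == null) {
            throw new IllegalArgumentException("values == null");
        }
        for (int i = 0; i < values.length; i++) {
            if (!isNormal(values[i])) {
                throw new IllegalArgumentException("values[" + i + "] = " + values[i] + " is not normal");
            }
        }
    }
}
```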

Bins

A bin is defined as an interval (i.e. some continuous subrange) in the value space along with a count of the data values which fall inside that interval.

Intervals

This class guarantees that the set of bin intervals it generates will always cover every value presented to this class. These intervals are always approximately equally sized. (Because doubles are used for all the calculations, floating point error may be present. Consequently, bin interval widths may vary slightly.) Every interval is distinct; it never overlaps with another interval.

This class almost always uses the following closed-open bin interval convention: each is of the form [x0, x1), that is, it includes all values x that satisfy x0 <= x < x1. So, the left boundary is included inside the interval, but the right boundary is not.

The sole exception is the rightmost (largest) interval: it is a fully closed interval of the form [x0, x1]. Reason: one way to specify bin intervals is by supplying the min and max values that should be covered by the intervals, and users will want that max value to occur inside an interval that includes more points than just that single max value. For example, suppose the user is generating histograms of percents, and they want 10 bins to cover the range [0, 100]. Then with the interval scheme just described, the 10 intervals in this case will be [0, 10), [10, 20), [20, 30), [30, 40), [40, 50), [50, 60), [60, 70), [70, 80), [80, 90), [90, 100].
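
The closed-open convention, with its closed rightmost interval, can be sketched as a mapping from a value to a bin index. This is a minimal sketch assuming equally sized intervals over [begin, end]; the class BinIndexSketch and the method binIndex are hypothetical names and are not taken from this class's source.

```java
final class BinIndexSketch {

    /**
     * Returns the index of the interval that contains x, given numberIntervals
     * equally sized intervals covering [begin, end].
     * Every interval is [x0, x1) except the last one, which is [x0, x1].
     */
    static int binIndex(double x, double begin, double end, int numberIntervals) {
        // Written so that a NaN x also fails the check.
        if (!(x >= begin && x <= end)) {
            throw new IllegalArgumentException("x = " + x + " is outside [" + begin + ", " + end + "]");
        }
        if (x == end) {
            return numberIntervals - 1; // the rightmost interval is closed on the right
        }
        double width = (end - begin) / numberIntervals;
        int index = (int) Math.floor((x - begin) / width);
        // Guard against floating point error pushing a value just below end past the last bin.
        return Math.min(index, numberIntervals - 1);
    }
}
```

With the percent example above (begin = 0, end = 100, 10 intervals), the value 10 maps to index 1 because it is the included left boundary of [10, 20), and the value 100 maps to index 9 because the last interval is [90, 100].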

This class always uses a finite set of contiguous intervals. These can be written, in order, as [x0, x1), [x1, x2), [x2, x3), ..., [xN-1, xN] where N is the total number of intervals. A simplifying convention when speaking of contiguous intervals is to list only the beginning of each interval, since each bin's end is the beginning of the subsequent bin. Thus, we designate the previous set of intervals as {x0, x1, x2, ..., xN-1}. This is the format of bin interval boundary points that is returned by getBounds. Note that because the intervals are always equally sized, {x0, x1, x2, ..., xN-1} is equivalent to {x0, x0 + width, x0 + 2*width, ..., x0 + (N - 1)*width}.
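
The left-boundary format {x0, x0 + width, ..., x0 + (N - 1)*width} can be sketched as below. This illustrates the formula only and assumes the intervals are specified by begin, end, and a number of intervals; BoundsSketch and leftBounds are hypothetical names, and the real class may compute its bounds differently.

```java
final class BoundsSketch {

    /** Returns the N left boundaries of N equally sized intervals covering [begin, end]. */
    static double[] leftBounds(double begin, double end, int numberIntervals) {
        double width = (end - begin) / numberIntervals;
        double[] bounds = new double[numberIntervals];
        for (int i = 0; i < numberIntervals; i++) {
            // Computing begin + i * width avoids the error build-up of repeated addition.
            bounds[i] = begin + i * width;
        }
        return bounds;
    }
}
```

Note that the result has N elements, not N + 1: the right end of the last interval (xN) is not listed.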

Concurrency

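This class is multithread safe: it is immutable (both its immediate state, as well as the deep state of its fields), provided that callers never mutate the arrays returned by getBounds and getCounts, which are returned directly rather than as copies.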

Author:
Brent Boyer

Nested Class Summary
static class Bins.Intervals
          Specifies how a given set of intervals is laid out.
static class Bins.UnitTest
          See the Overview page of the project's javadocs for a general description of this unit test class.
 
Field Summary
private  double[] bounds
           
private  long[] counts
           
private  Bins.Intervals intervals
           
 
Constructor Summary
private Bins(double[] values, Bins.Intervals intervals)
          Fundamental constructor: sorts values into the supplied intervals. The public constructors delegate to it.
  Bins(double[] values, double offset, double width)
          Calls this( values, Intervals.make(values, offset, width) ).
  Bins(double[] values, double begin, double end, int numberIntervals)
          Calls this( values, new Intervals(begin, end, numberIntervals) ).
  Bins(double[] values, int numberIntervals)
          Calls this( values, Intervals.make(values, numberIntervals) ).
 
Method Summary
 double[] getBounds()
          Accessor for bounds.
 double[] getBoundsMid()
          Returns the bin interval midpoints, that is, the points that are in the middle of each interval.
 long[] getCounts()
          Accessor for counts.
 long getCountTotal()
          Returns the total count, that is, the sum of all the elements of counts.
 Bins.Intervals getIntervals()
          Accessor for intervals.
 double getIntervalWidth()
          Accessor for intervals.width.
 double[] getPdf()
          Returns the probability density function (PDF) that is approximated by the bins.
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

intervals

private final Bins.Intervals intervals

bounds

private final double[] bounds

counts

private final long[] counts

Constructor Detail

Bins

public Bins(double[] values,
            double offset,
            double width)
     throws IllegalArgumentException,
            IllegalStateException
Calls this( values, Intervals.make(values, offset, width) ).

Throws:
IllegalArgumentException - if values == null; values.length == 0; any element of values is NaN; offset is not normal; width is not normal and positive
IllegalStateException - if some internal problem occurs

Bins

public Bins(double[] values,
            int numberIntervals)
     throws IllegalArgumentException,
            IllegalStateException
Calls this( values, Intervals.make(values, numberIntervals) ).

Throws:
IllegalArgumentException - if values == null; values.length == 0; any element of values is NaN; numberIntervals <= 0
IllegalStateException - if some internal problem occurs

Bins

public Bins(double[] values,
            double begin,
            double end,
            int numberIntervals)
     throws IllegalArgumentException,
            IllegalStateException
Calls this( values, new Intervals(begin, end, numberIntervals) ).

Throws:
IllegalArgumentException - if begin is not normal; end is not normal; begin is not < end; numberIntervals <= 0
IllegalStateException - if some internal problem occurs

Bins

private Bins(double[] values,
             Bins.Intervals intervals)
      throws IllegalArgumentException
Fundamental constructor: sorts values into the supplied intervals. The public constructors delegate to it.

Throws:
IllegalArgumentException - if values or intervals is null; any element of values falls outside the range [begin, end]

Method Detail

toString

public String toString()
Overrides:
toString in class Object

getIntervals

public Bins.Intervals getIntervals()
Accessor for intervals.


getBounds

public double[] getBounds()
Accessor for bounds.

Warning: the field is directly returned, not a copy, so mutating the result invalidates this instance. So, only mutate the result if this instance will no longer be used.


getCounts

public long[] getCounts()
Accessor for counts.

Warning: the field is directly returned, not a copy, so mutating the result invalidates this instance. So, only mutate the result if this instance will no longer be used.
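
A caller that needs to modify the result of getBounds or getCounts can work on a copy instead. The sketch below illustrates the aliasing issue with a hypothetical stand-in class (Holder) that, like the accessors described above, returns its internal array directly; it does not use the real Bins class.

```java
import java.util.Arrays;

final class DefensiveCopySketch {

    /** Hypothetical stand-in for a class that hands out its internal array directly. */
    static final class Holder {
        private final long[] counts = {2, 5, 3};

        long[] getCounts() {
            return counts; // no copy, as the warning describes
        }
    }

    public static void main(String[] args) {
        Holder holder = new Holder();

        long[] copy = holder.getCounts().clone(); // caller-side defensive copy
        copy[0] = 99;                             // changes only the copy
        System.out.println(Arrays.toString(holder.getCounts())); // prints [2, 5, 3]

        long[] alias = holder.getCounts();        // same array object as the field
        alias[0] = 99;                            // silently changes the holder's state
        System.out.println(Arrays.toString(holder.getCounts())); // prints [99, 5, 3]
    }
}
```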


getIntervalWidth

public double getIntervalWidth()
Accessor for intervals.width.


getBoundsMid

public double[] getBoundsMid()
Returns the bin interval midpoints, that is, the points that are in the middle of each interval.

This method is useful if the user wishes to characterize each bin interval by its middle point instead of left boundary.
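
Assuming equally sized intervals, each midpoint is the interval's left boundary plus half the width. A minimal sketch of that calculation follows; MidpointsSketch and midpoints are hypothetical names, and the actual implementation may differ.

```java
final class MidpointsSketch {

    /** Returns bounds[i] + width / 2 for every left boundary in bounds. */
    static double[] midpoints(double[] bounds, double width) {
        double[] mids = new double[bounds.length];
        for (int i = 0; i < bounds.length; i++) {
            mids[i] = bounds[i] + width / 2.0;
        }
        return mids;
    }
}
```

For left boundaries {0, 10, 20} and a width of 10, this gives {5, 15, 25}.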


getCountTotal

public long getCountTotal()
Returns the total count, that is, the sum of all the elements of counts.
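
As an illustration only, the sum described above could be computed as follows; CountTotalSketch and countTotal are hypothetical names.

```java
final class CountTotalSketch {

    /** Returns the sum of all the elements of counts. */
    static long countTotal(long[] counts) {
        long total = 0;
        for (long count : counts) {
            total += count;
        }
        return total;
    }
}
```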


getPdf

public double[] getPdf()
Returns the probability density function (PDF) that is approximated by the bins. For each bin, this is the probability that a sample falls in the bin, divided by the bin's width. In code: (count / n) / width. Here, count is the bin's count, n is the total number of stored values (i.e. getCountTotal), and width is the bin interval's width. Dividing count by n converts the bin count to a probability; dividing by width converts that probability into a probability density.

One reason why the PDF is useful is that, once sufficiently many values have been added to the bins, the PDF curve becomes approximately independent of both the number of stored values and the bin width, assuming that the values are drawn from a stable distribution.
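
The formula (count / n) / width can be sketched as below. This is a minimal sketch assuming a single common width for all bins; PdfSketch and pdf are hypothetical names, and the rejection of an empty set of counts is an assumption of this sketch rather than documented behavior.

```java
final class PdfSketch {

    /** Returns (counts[i] / n) / width for every bin, where n is the sum of counts. */
    static double[] pdf(long[] counts, double width) {
        long n = 0;
        for (long count : counts) {
            n += count;
        }
        if (n == 0) {
            // Assumption of this sketch: a PDF is undefined when no values were binned.
            throw new IllegalArgumentException("counts contains no values");
        }
        double[] pdf = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            double probability = (double) counts[i] / n; // bin count -> probability
            pdf[i] = probability / width;                // probability -> density
        }
        return pdf;
    }
}
```

For counts {1, 3, 4, 2} and a width of 0.5, the densities are approximately {0.2, 0.6, 0.8, 0.4}. Multiplying each density by the width and summing gives 1 (up to floating point error), which is the defining property of a density.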