HISTOGRAM
Applies to PredictiveInsight only.
Syntax
HISTOGRAM(data, bin_col)
Parameters
data
The cell range to compute the histogram of. This can be a constant value, a column, a cell range, or an expression evaluating to any of the above. All columns in data must be the same data type (that is, numeric or text string). For the format definition of data, see the "Macro Function Parameters" section in the chapter in this guide for your IBM® product.
bin_col
The values for the bin boundaries. This can be a constant value, a column, a single-column cell range, or an expression evaluating to any of the above. The data type of bin_col must be the same as that of data. For the format definition of bin_col (the same as for data), see the "Macro Function Parameters" section in the chapter in this guide for your IBM® product.
Description
HISTOGRAM computes the histogram (that is, the frequency of occurrence of data values in various bins) of the values in the specified data range. It returns a single column in which each row holds the number of values in data that fall within the corresponding bin range specified by bin_col.
For numerical values, every two adjacent rows of bin_col form a "bin". Any value in data that falls within a bin is accumulated for that bin. The output column contains the final count of the number of data values within each bin. The first boundary value is included in the bin; the second boundary value is excluded. For example, the bin formed by the boundary values 1 and 2 contains a count of all values in data that are greater than or equal to 1 and less than 2. The length of the output column is one less than the length of bin_col.
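The numeric binning rule can be sketched in Python as follows. This is an illustrative model only, not the product's implementation: the name histogram_numeric is hypothetical, bin_col is assumed to be sorted in ascending order, and values that fall outside every bin are assumed to be left uncounted.

```python
from bisect import bisect_right

def histogram_numeric(data, bin_col):
    """Count values per half-open bin [bin_col[i], bin_col[i + 1]).

    Hypothetical helper; assumes bin_col is sorted ascending.
    """
    # One output row per pair of adjacent boundaries.
    counts = [0] * (len(bin_col) - 1)
    for value in data:
        # Index of the last boundary that is <= value.
        i = bisect_right(bin_col, value) - 1
        # Assumption: values below the first boundary, or at or above
        # the last boundary, belong to no bin and are skipped.
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

counts = histogram_numeric([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 3, 10])
print(counts)
```

With three boundaries the result has two rows: the first counts values from 1 up to but not including 3, the second counts values from 3 up to but not including 10.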
For text strings, only exact matches of the text string in bin_col are counted in that bin. The length of the output column is the length of bin_col. For numerical data, if bin_col is scalar (that is, contains a single cell value), then the number of items in data is counted.
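The text-string and scalar cases can be modeled the same way. The sketch below is an assumption-laden illustration: the names histogram_text and histogram_scalar are hypothetical, matching is assumed to be case-sensitive, and the scalar case is assumed to return a single row holding the item count.

```python
from collections import Counter

def histogram_text(data, bin_col):
    """One count per entry of bin_col, exact matches only (hypothetical helper)."""
    tally = Counter(data)
    # A Counter returns 0 for labels that never occur in data,
    # so the output always has len(bin_col) rows.
    return [tally[label] for label in bin_col]

def histogram_scalar(data):
    """Assumed behavior for a single-cell numeric bin_col: count the items in data."""
    return [len(data)]

print(histogram_text(["a", "b", "a", "c"], ["a", "b", "d"]))
print(histogram_scalar([4, 8, 15, 16]))
```

Unlike the numeric case, the text result has exactly as many rows as bin_col, because each entry is its own bin rather than a boundary between bins.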
Note: The HISTOGRAM macro function places data points into bins differently than the IBM® PredictiveInsight histogram graph. The histogram graph excludes the minimum (except for the leftmost bin) and includes the maximum of each bin boundary.
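The difference between the two binning conventions shows up only for values that sit exactly on a boundary. The following Python sketch contrasts them under stated assumptions: macro_bins and graph_bins are hypothetical names, the boundaries are assumed to be strictly ascending, and values outside the overall range are assumed to be ignored by both.

```python
from bisect import bisect_left, bisect_right

def macro_bins(data, edges):
    """Macro function rule: each bin includes its minimum, excludes its maximum."""
    counts = [0] * (len(edges) - 1)
    for v in data:
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

def graph_bins(data, edges):
    """Graph rule: each bin excludes its minimum (except the leftmost) and includes its maximum."""
    counts = [0] * (len(edges) - 1)
    for v in data:
        if v == edges[0]:
            i = 0  # the leftmost bin keeps its minimum
        else:
            i = bisect_left(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

on_boundaries = [1, 3, 10]
print(macro_bins(on_boundaries, [1, 3, 10]))
print(graph_bins(on_boundaries, [1, 3, 10]))
```

With data lying exactly on the boundaries 1, 3, and 10, the macro rule places 3 in the second bin and drops 10, while the graph rule places 3 in the first bin and 10 in the second.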
Examples
Creates a new column named TEMP containing the values 2 and 7.
Creates a new column named TEMP containing the values 2, 1, and 0.
Creates a new column named TEMP containing four values. The first value is the number of values in column V1 greater than or equal to 1 and less than 25. The second value is the number of values in column V1 greater than or equal to 25 and less than 50. The third and fourth values contain the counts in the third and fourth quartiles, respectively.
Creates a new column named TEMP, where each value is a count of the number of values in columns V1 - V3 that fall within the bin boundaries specified by column V4.
Creates a new column named TEMP containing 10 values. Each value is the number of data values in rows 50-100 of columns V1 - V5 that fall within the bin boundaries specified by rows 1-10 of column V6.
Related Functions