CV_FOLDS
Applies to PredictiveInsight only.
Syntax
CV_FOLDS(num_folds, data [, class_data] [, seed])
Parameters
num_folds
The number of folds to create for cross-validation. This value must be an integer greater than 1 and less than 65,536 or the number of rows in data, whichever is less.
data
The input variables. This can be a column, a cell range, or an expression evaluating to either of the above. For the format definition of data, see the "Macro Function Parameters" section in the chapter in this guide for your IBM® product.
class_data
If this optional data range is provided, the CV_FOLDS macro function will create folds while maintaining even class probabilities. The contents of class_data are used as the outputs for each corresponding input pattern.
If class_data is a single column, CV_FOLDS assumes that the specified column contains values for multiple output classes (that is, each distinct value is considered a separate class). If class_data is a data range, each output column is considered a different class. (With a data range, the values in each column would be one if a pattern belongs to that class, or zero if the pattern does not belong to that class.)
For the format definition of class_data (same as data), see the "Macro Function Parameters" section in the chapter in this guide for your IBM® product.
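The two accepted layouts of class_data can be illustrated with a short Python sketch. This is not the product's implementation; the function name class_labels and the list-of-rows input are assumptions made for illustration. A single column is read as one label per row, and a multi-column range is read as 1/0 membership flags, assuming exactly one flag per row is set.

```python
def class_labels(class_data):
    """Return one class label per row (hypothetical helper, not an IBM API).

    class_data is a list of rows, each row a list of column values.
    One column:      each distinct value is treated as its own class.
    Several columns: each column is a class, flagged with 1 or 0.
    """
    labels = []
    for row in class_data:
        if len(row) == 1:
            # Single column: the value itself is the class label.
            labels.append(row[0])
        else:
            # Data range: the class is the index of the column set to 1
            # (assumes exactly one column per row holds a 1).
            labels.append(row.index(1))
    return labels
```

For example, class_labels([["a"], ["b"], ["a"]]) yields the labels as given, while class_labels([[1, 0, 0], [0, 0, 1]]) yields the column indexes 0 and 2.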
seed
A seed value to use for the random-number generator. This must be an integer.
Description
CV_FOLDS evenly divides the input data into the specified number of folds. Each fold will contain the same number of input patterns. It places each row of the input data range into a fold by returning a new column containing fold numbers ranging in value between one and num_folds.
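The even division described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm CV_FOLDS actually uses: the function name assign_folds, the use of Python's random module, and the shuffle-based assignment are all hypothetical. In this sketch, fold sizes differ by at most one when the row count is not a multiple of num_folds.

```python
import random


def assign_folds(num_folds, num_rows, seed=None):
    """Return a list of fold numbers (1..num_folds), one per input row.

    Hypothetical sketch; the real CV_FOLDS assignment may differ.
    """
    # Limits taken from the num_folds parameter description.
    if not 1 < num_folds < min(65536, num_rows):
        raise ValueError("num_folds is out of range")
    # A fixed seed makes the assignment repeatable; None picks one at random.
    rng = random.Random(seed)
    # Cycle the fold numbers over the rows so fold sizes are as even as
    # possible, then shuffle to randomize which row lands in which fold.
    folds = [(i % num_folds) + 1 for i in range(num_rows)]
    rng.shuffle(folds)
    return folds
```

Calling assign_folds(3, 9, seed=0) returns nine fold numbers in which 1, 2, and 3 each appear three times, and repeating the call with the same seed returns the same column.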
If the optional parameter class_data is provided, the output class information is used to create cross-validation folds such that output class probabilities are maintained. That is, within each fold, the probability of each output class will be the same.
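One common way to keep class probabilities equal across folds is to deal the rows of each class into the folds in turn. The Python sketch below shows that idea only; it is an assumption about how such folds can be built, not a description of the CV_FOLDS internals, and the name stratified_folds and the labels argument (one class label per row) are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(num_folds, labels, seed=None):
    """Return fold numbers (1..num_folds) that keep class proportions even.

    labels holds one class label per input row. Hypothetical sketch;
    the real CV_FOLDS assignment may differ.
    """
    rng = random.Random(seed)
    # Group the row indexes by class label.
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)
    folds = [0] * len(labels)
    next_fold = 0
    for rows in rows_by_class.values():
        # Randomize the order within the class, then deal its rows to
        # the folds round-robin. The position carries over between
        # classes so the overall fold sizes also stay balanced.
        rng.shuffle(rows)
        for row in rows:
            folds[row] = next_fold + 1
            next_fold = (next_fold + 1) % num_folds
    return folds
```

With six rows of class "a" and three rows of class "b" split into three folds, every fold in this sketch receives two "a" rows and one "b" row, so the class proportions match the full data set.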
Examples
Creates a new column named TEMP containing a value for each row of column V1. The column TEMP will contain the values 1, 2, and 3 for the three different folds. No class probabilities are maintained. The value zero is used as the seed for the random number generator.
Creates a new column named TEMP containing a value for each row of the shortest column in V1-V15. The column TEMP will contain the values 1-100 for the 100 different folds. No class probabilities are maintained. A random seed is selected.
Creates a new column named TEMP containing a value for each row of the shortest column in V1-V10. The column TEMP will contain the values 1-50 for the 50 different folds. The column V11 contains the output classes. Each fold will have the same output class probabilities. A random seed is selected.
Creates a new column named TEMP containing a value for each row of the shortest column in V1-V10. The column TEMP will contain the values 1-10 for the 10 different folds. Each of the output columns V11-V15 represents an output class. Each fold will have the same output class probabilities. The value 96 is used as a seed for the random number generator.