April 8, 2003

SAS binning

I'd like to be able to randomize and split the dataset so that I can do so-called 10-fold cross-validation. How can I do this in SAS? I found some helpful hints on Marilyn Collins' website. The first step is the SET statement, which builds a new dataset from an existing one. Adding a subsetting IF keeps only the observations that meet a condition. For instance, to keep only trigger values (var62) greater than 4, e.g., 5, 6, and 7, use:


DATA temp;
SET classify.s6a_c37;
IF var62>4;
RUN;

This posting by David Ward gets us even closer. Aha, but then I struck gold at the University of Texas at Austin's Statistical Services...

Randomly selecting an approximate proportion


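DATA analysis holdout;
SET alldata;
/* RANUNI(0) seeds from the clock, so the split changes on every run; */
/* each row lands in analysis with probability 2/3 */
IF RANUNI(0) <= 2/3 THEN OUTPUT analysis;
ELSE OUTPUT holdout;
RUN;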
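The same one-pass idea stretches to the 10-fold case from the top of this post. A minimal sketch: instead of a 2/3 cut, map each RANUNI draw to a fold number from 1 to 10 with CEIL. As with the approximate split, each fold gets roughly a tenth of the rows, not exactly a tenth. The toy alldata step, the dataset name approxfolds, the variable fold, and the reuse of seed 358798 are my own illustrative choices, not from the UT Austin page. A positive seed makes the run repeatable, whereas RANUNI(0) seeds from the clock.

```sas
/* Hypothetical 100-row stand-in for alldata so the sketch runs on its own; */
/* skip this step when alldata already exists. */
DATA alldata;
  DO id = 1 TO 100;
    OUTPUT;
  END;
RUN;

DATA approxfolds;
  SET alldata;
  /* RANUNI returns values strictly between 0 and 1, so 10*u lies in (0,10) */
  /* and CEIL turns it into a fold number from 1 to 10. */
  fold = CEIL(10 * RANUNI(358798));
RUN;

/* Fold sizes are only equal on average; this shows how far they drift. */
PROC FREQ DATA=approxfolds;
  TABLES fold;
RUN;
```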

Randomly selecting an exact proportion

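DATA analysis holdout;
SET alldata;
/* k = rows still to select, n = rows not yet read; assumes alldata has exactly 100 rows */
RETAIN k 67 n 100;
/* keep the current row with probability k/n, which yields exactly 67 rows */
IF RANUNI(358798) <= k/n THEN DO;
k = k-1;
OUTPUT analysis;
END;
ELSE OUTPUT holdout;
n = n-1;
DROP k n;
RUN;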
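The exact-proportion step works because each row is kept with probability (rows still needed)/(rows left), so the selection can neither run short nor overshoot: once k equals n, k/n = 1 and every remaining row is taken, and once k reaches 0 nothing more is taken. It does, however, hard-code the 100-row size. Here is a sketch that reads the row count from the NOBS= option of SET instead, assuming alldata is an ordinary SAS data set (NOBS= may not give a usable count for views or sequential-access data). The toy alldata step and the variable name total are hypothetical.

```sas
/* Hypothetical 100-row stand-in for alldata; skip when alldata exists. */
DATA alldata;
  DO id = 1 TO 100;
    OUTPUT;
  END;
RUN;

DATA analysis holdout;
  /* NOBS= is filled in at compile time, so total is known before the */
  /* first row is read; the NOBS= variable is not written to the output. */
  SET alldata NOBS=total;
  RETAIN k n;
  IF _N_ = 1 THEN DO;
    n = total;               /* rows not yet processed */
    k = ROUND(2/3 * total);  /* rows still to send to analysis */
  END;
  /* Keep the current row with probability (rows still needed)/(rows left). */
  IF RANUNI(358798) <= k/n THEN DO;
    k = k - 1;
    OUTPUT analysis;
  END;
  ELSE OUTPUT holdout;
  n = n - 1;
  DROP k n;
RUN;
```

With 100 rows, ROUND(2/3 * 100) is 67, so this reproduces the 67/33 split above while adapting to any other dataset size.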

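The second, exact-proportion version is what I want, although it only gives an exact 67/33 split when alldata has exactly 100 observations, since it hard-codes n = 100 and k = 67. The page is pretty cool anyway: it also has code for fancier resampling techniques such as Jackknife, Split-Sample, and Bootstrap.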
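For the 10-fold cross-validation itself, exact fold sizes are easiest to get by shuffling and then dealing. This is a minimal sketch under my own assumptions, not code from the UT Austin page: the dataset names shuffled, folded, train, and test, the variables rkey and fold, and the macro cv are hypothetical, and the toy alldata step only makes it runnable. It sorts on a random key, numbers the folds 1 to 10 in rotation, then loops over the folds, holding one out each time.

```sas
/* Hypothetical 100-row stand-in for alldata; skip when alldata exists. */
DATA alldata;
  DO id = 1 TO 100;
    OUTPUT;
  END;
RUN;

/* Step 1: tag every observation with a uniform random key. */
DATA shuffled;
  SET alldata;
  rkey = RANUNI(358798);
RUN;

/* Step 2: sorting on the random key puts the rows in random order. */
PROC SORT DATA=shuffled;
  BY rkey;
RUN;

/* Step 3: deal the shuffled rows into folds 1-10 like cards, */
/* so fold sizes differ by at most one. */
DATA folded;
  SET shuffled;
  fold = MOD(_N_ - 1, 10) + 1;
  DROP rkey;
RUN;

/* Step 4: each pass holds out one fold as test and trains on the other nine. */
%MACRO cv;
  %DO i = 1 %TO 10;
    DATA train test;
      SET folded;
      IF fold = &i THEN OUTPUT test;
      ELSE OUTPUT train;
    RUN;
    /* Fit the model on train and score test here. */
  %END;
%MEND cv;
%cv
```

With 100 rows every fold gets exactly 10 observations; with other sizes the first few folds get one extra row.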

Posted by torque at April 8, 2003 1:50 AM