April 7, 2003

Observation-based logistic regression

After some discussion with Pat, I decided to try running logistic regression using the downsampled observations as the input variables. Each channel has 61 points, so across 60 channels we would have 61x60=3660 inputs, far too many! After a failed attempt at loading the full file into SAS, I chose a single channel from S6A: C2P (the best monopolar channel using least squares).

Data. The dataset consists of S6's responses to seven words presented aurally. As mentioned above, channel 37, C2P, was selected based on earlier results from our traditional method. The EEG was recorded with Neuroscan in continuous mode (.cnt), filtered at 25 Hz, and downsampled to 50 Hz. For each trial, the start point was 200 ms before the stimulus trigger (-200 ms), and the end point was 1000 ms after the same trigger. Data were then converted into a comma-delimited format and imported into SAS.

Procedure. I used the CATMOD procedure in SAS. Among other things, this procedure allows one to run nominal logistic regression. I used the following script:


PROC CATMOD DATA=classify.s6a_c37;
RESPONSE logits;
DIRECT var1-var61;
MODEL var62=var1-var61 / noprofile;
RUN;

There are several important items to note. By default, CATMOD treats the variables as categorical. Our variables are continuous, so we must list them in a DIRECT statement. There are 61 observations per trial, var1-var61. The final column, var62, is the class of the trial (1-7).
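As a sanity check on the 61 observations per trial, here is a small Python sketch (not part of the SAS workflow) that counts the samples in the epoch and maps a column number back to a latency. It assumes var1 is the sample at -200 ms, that both endpoints are included, and that samples are evenly spaced at 50 Hz; the function name is made up for illustration.

```python
def var_to_latency_ms(var_index, start_ms=-200, rate_hz=50):
    """Latency (ms relative to the trigger) of the 1-based column var<var_index>.

    Assumes var1 sits at start_ms and samples are evenly spaced.
    """
    step_ms = 1000 / rate_hz  # 20 ms between samples at 50 Hz
    return start_ms + (var_index - 1) * step_ms


# Samples from -200 ms to 1000 ms in 20 ms steps, both endpoints included.
n_points = int((1000 - (-200)) / (1000 / 50)) + 1
print(n_points)

# First sample, stimulus trigger, and last sample under these assumptions.
for v in (1, 11, 61):
    print(f"var{v}: {var_to_latency_ms(v):.0f} ms")
```

If the export actually starts one sample later or drops an endpoint, every latency shifts by 20 ms, so this is worth checking against the raw file.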

Results and analysis. The results are difficult to interpret! The output format leaves room for improvement... the table to start with appears halfway through the file and is entitled "Maximum Likelihood Analysis of Variance". The trouble with a multinomial analysis is deciding which variables to throw out. Here's what I did. I looked at the analysis of variance and removed from my model the variables that the system considered "redundant" (marked with an '*' in the DF column). I then re-ran CATMOD, and kept doing this until there were no more "redundant" variables. Then I took out variables whose p-values were much larger than the smallest (p > 0.025). After five iterations, I was left with this: var29, var37, var44. The confusing news is that the likelihood ratio is 1.0. What does this mean? I think it means we have a perfect fit, which may have occurred just because I threw out all the other observations that didn't help. The way to evaluate this is either to test the model on held-out test data, or to run the model-building procedure on two sets of data and see how the results compare. If we end up with the same observation numbers, that is awesome news. Most likely they will be completely different.
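The second idea (build the model on two sets of data and compare the survivors) could be set up roughly like this. This is a Python sketch rather than SAS, and it only covers the bookkeeping: the variable selection itself would still be the CATMOD elimination, run separately on each half. The function names, the seed, and the assumption that each trial is a row whose last entry is the class label (var62) are mine, for illustration.

```python
import random


def split_half_by_class(trials, label_index=-1, seed=0):
    """Split trials into two halves, keeping each class roughly balanced."""
    rng = random.Random(seed)
    by_class = {}
    for row in trials:
        by_class.setdefault(row[label_index], []).append(row)

    half_a, half_b = [], []
    for label in sorted(by_class):
        rows = by_class[label]
        rng.shuffle(rows)            # shuffle within the class
        mid = len(rows) // 2
        half_a.extend(rows[:mid])
        half_b.extend(rows[mid:])
    return half_a, half_b


def selection_overlap(selected_a, selected_b):
    """Jaccard overlap between two sets of surviving variable names."""
    a, b = set(selected_a), set(selected_b)
    if not a and not b:
        return 1.0                   # two empty selections agree trivially
    return len(a & b) / len(a | b)


first = {"var29", "var37", "var44"}  # survivors from the run described above
second = {"var29", "var40"}          # made-up second result, illustration only
print(selection_overlap(first, second))  # 1 shared name out of 4 distinct
```

An overlap near 1 would suggest the selected observations are stable; an overlap near 0 would support the suspicion that the fit is an artifact of the elimination procedure.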

Other stuff. Oh, I figured out how to output to HTML using ODS. I regenerated the 'fifth cut' by nesting the MODEL command inside a pair of ods html statements:

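ods html body='WWW/tuning/saslogs/test.html';  /* start routing output to HTML */
PROC CATMOD DATA=classify.s6a_c37;             /* same setup as the script above */
RESPONSE logits;
DIRECT var29 var37 var44;
MODEL var62=var29 var37 var44 / noprofile;
title2 'Fifth cut';
run;
ods html close;                                /* stop HTML output */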

Posted by torque at April 7, 2003 9:50 PM