
sql server data mining : data sampling for a training data set


anonymous_user NO[at]SPAM sqlserverdatamining.com
1/25/2006 6:40:12 PM
jamiemac NO[at]SPAM online.microsoft.com
1/26/2006 3:00:06 AM
I would build a more balanced training set, but you have to realize that the resulting probabilities for churn will be artificially high, because the model sees churners far more often in training than they occur in the real data.
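To give a sense of how much the probabilities are inflated, here is a minimal sketch of the standard prior-shift adjustment, which rescales a predicted probability from the churn rate used in training back to the original churn rate. This is not something the mining tool is claimed to do for you; the function name `correct_probability` and the example rates are assumptions made up for illustration.

```python
def correct_probability(p_balanced, train_rate, true_rate):
    """Rescale a churn probability from a model trained on rebalanced data.

    p_balanced: probability predicted by the model trained on the balanced set
    train_rate: fraction of churners in the (rebalanced) training set
    true_rate:  fraction of churners in the original data
    All three names are hypothetical, chosen for this sketch.
    """
    # Weight each class by how much it was over- or under-represented in training.
    pos = p_balanced * (true_rate / train_rate)
    neg = (1.0 - p_balanced) * ((1.0 - true_rate) / (1.0 - train_rate))
    return pos / (pos + neg)


# Example: training set balanced 50/50, real churn rate assumed to be 5%.
adjusted = correct_probability(0.5, train_rate=0.5, true_rate=0.05)
print(round(adjusted, 4))
```

When the training rate and the true rate are equal, the function returns the probability unchanged, which is a quick sanity check on the formula.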

To validate, use a testing set with the original distribution of data to get an accurate portrayal of the performance of the model.

You could also try both and compare the results.
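As a rough sketch of the setup described above: hold out the testing set first so it keeps the original distribution, then balance only the training portion by undersampling the larger class. The function name `split_then_balance`, the `label_of` callback, and the 30% test fraction are assumptions for illustration, not part of any particular tool.

```python
import random


def split_then_balance(rows, label_of, test_fraction=0.3, seed=0):
    """Return (balanced_train, test); the test set keeps the original class mix."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)

    # Hold out the test set BEFORE any rebalancing so it reflects the real data.
    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    train = shuffled[n_test:]

    # Undersample the larger class in the training portion only.
    pos = [r for r in train if label_of(r)]
    neg = [r for r in train if not label_of(r)]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced, test


# Hypothetical data: 1000 customers, 10% of whom churned.
customers = [{"id": i, "churn": i < 100} for i in range(1000)]
train, test = split_then_balance(customers, lambda r: r["churn"])
print(len(train), len(test))
```

To compare both approaches, train one model on `train` and another on the unbalanced remainder, then score both against the same `test` set.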