sql server data mining:
Hi All,
I'm trying to create a data mining model to allow me to predict response
rates for a mail shot, based on the response to many other mailshots,
and hence to generate the next mailing list.
My question is about the underlying fact table structure - is the
following structure sensible, and if not why not?
Server: SQL 2005 and AS 2005.
Table: "Events_Fact" - includes orders and mailings
Columns:
Time_ID - The date of the event
Event_ID - Type of event, either an Order, or a mailshot sent
Customer_ID - Customer receiving mail, or placing order
Product_ID - either a real product for order lines,
or a mailshot identifier for mailings
Qty - Qty for Orders, 1 for mailshots
Value - Value for Orders, 0 for mailshots
PrevOrderDays - Number of days since the previous order
placed by the customer
PrevMailDays - Number of days since the last mailshot sent to
the customer.
Dimension tables are time, customer, product (including mailing list
codes) and have the other customer predictive attributes such as
location, gender etc.
What I would like to be able to do is to mine this data to get those
customers who respond in the shortest time, spend the most, repeat
order, etc. etc. We can then tailor the next list for maximum return. We
would also like to analyse by recency of purchase, as this is a known
predictor of response.
Is this a sensible structure? Will it support the analysis I want to do?
I think it will, but although I've done a lot of OLAP analysis over the
years, I'm not so familiar with data mining. I am pretty convinced I
need to create a bespoke cube to include mailshots and orders, otherwise
I never see people who don't respond to a mailshot in a normal orders cube.
If this is not sensible, can anyone suggest something better?
Thanks is advance,