Groups | Blog | Home
all groups > sql server data mining > april 2004 >

sql server data mining : NULL IGNORE clause ext.


Cesar
4/27/2004 10:08:24 AM
Thank you for your reply. =20
=20
I understand your point, however I cannot filter records=20
that have null fields because I need them.
What I need is to make a segmentation based on the fields=20
that have values only. I did a segmentation but some of=20
the clusters have a description that=20
says for example: "ATTRIBUTE_NAME=3D"Field1 Val"=20
ATTRIBUTE_VALUE=3D"missing" SUPPORT=3D"24.122610171863009"=20
PROBABILITY=3D"0.89322856501640546"
=20
I need to ignore missing values. For example, if you=20
have the following vectors:
=20
(1,2,3)
(1,7,null)
(9,8,null)
=20
(1,2,3) and (1,7,null) must be in the same cluster and=20
(9,8,null) could be in another cluster.
=20
In other words, when a record has a missing value it must=20
be "projected" across the other dimensions in order to=20
find the closest cluster.
=20
Do you know if it is possible to do this in a relational=20
mining model? Or is it better to try with an OLAP or=20
nested table mining model?
=20
Thanks,
=20
C=E9sar
=20
[quoted text, click to view]
=20
Jamie MacLennan (MS)
4/27/2004 10:34:36 AM
The problem you are encountering is whether or not we consider "missing" as
a value or "missing at random" where it doesn't make a difference.

You're suggestion to use a nested table is a good one - I believe that the
clustering algorithm was written to ignore missing values when there are
only two states. That is, that the existence of an attribute will be used
to pull it into a cluster, but the absence of an attribute is not
considered.

You don't need to create an OLAP Mining Model to accomplish this - simply
create a relational model as usual with the wizard and add nested tables in
the mining editor. You may have to play around a bit with how you specify
columns, etc., to get the desired behavior.

--

-Jamie MacLennan
SQL Server Data Mining
This posting is provided "AS IS" with no warranties, and confers no rights.
[quoted text, click to view]
Thank you for your reply.

I understand your point, however I cannot filter records
that have null fields because I need them.
What I need is to make a segmentation based on the fields
that have values only. I did a segmentation but some of
the clusters have a description that
says for example: "ATTRIBUTE_NAME="Field1 Val"
ATTRIBUTE_VALUE="missing" SUPPORT="24.122610171863009"
PROBABILITY="0.89322856501640546"

I need to ignore missing values. For example, if you
have the following vectors:

(1,2,3)
(1,7,null)
(9,8,null)

(1,2,3) and (1,7,null) must be in the same cluster and
(9,8,null) could be in another cluster.

In other words, when a record has a missing value it must
be "projected" across the other dimensions in order to
find the closest cluster.

Do you know if it is possible to do this in a relational
mining model? Or is it better to try with an OLAP or
nested table mining model?

Thanks,

César

[quoted text, click to view]


AddThis Social Bookmark Button