madlib-dev mailing list archives

From Satoshi Nagayasu <sn...@uptime.jp>
Subject Re: create_indicator_variables() with svec?
Date Tue, 09 Aug 2016 02:21:53 GMT
Hi Rahul,

2016-08-09 2:05 GMT+09:00 Rahul Iyer <riyer@pivotal.io>:
> Array output for *create_indicator_variables* would be quite helpful when
> number of categories is large and the svec representation would be ideal
> for it. There might be similar implications for *pivoting*, but we can keep
> that as future discussion.

Sounds great.

> I'm curious about how you're using the indicator variables - svec is not
> widely supported in MADlib (yet) and might not give much benefit after the
> encoding is complete.

I'm trying to implement recommendation or similarity search for
several kinds of media items (movies, books, documents, etc.) using their
metadata. The data has several categorical variables, such as authors,
publishers, actors/actresses, and genres. Some of them have many categories.
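
To illustrate the idea, here is a minimal Python sketch, not MADlib code.
It assumes each item is a dict of categorical values, stores only the
positions of the non-zero indicator variables (roughly what a sparse vector
would hold), and compares items by cosine similarity. The names
encode_sparse and cosine, and the sample data, are hypothetical.

```python
from math import sqrt

def encode_sparse(items, columns):
    """Return, per item, the set of active indicator positions."""
    index = {}    # (column, value) -> indicator position
    encoded = []
    for item in items:
        active = set()
        for col in columns:
            for value in item.get(col, []):
                key = (col, value)
                if key not in index:
                    index[key] = len(index)  # assign next free position
                active.add(index[key])
        encoded.append(active)
    return encoded, index

def cosine(a, b):
    """Cosine similarity of two 0/1 vectors given as sets of active positions."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

# Made-up sample data for illustration only.
items = [
    {"genres": ["drama", "crime"], "actors": ["A", "B"]},
    {"genres": ["drama"], "actors": ["B", "C"]},
    {"genres": ["comedy"], "actors": ["D"]},
]
vectors, index = encode_sparse(items, ["genres", "actors"])
similarity_01 = cosine(vectors[0], vectors[1])
similarity_02 = cosine(vectors[0], vectors[2])
```

With many categories, each item touches only a handful of positions, so
storing the active set stays small even when the full indicator vector
would be very wide.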

BTW, I'm a beginner at data mining and machine learning and don't have
much experience.

Of course, I could reduce the number of categories, but playing with the
raw data would be more fun. :)

Regards,
-- 
Satoshi Nagayasu <snaga@uptime.jp>
