madlib-dev mailing list archives

From Rahul Iyer <ri...@pivotal.io>
Subject Re: create_indicator_variables() with svec?
Date Tue, 09 Aug 2016 16:53:20 GMT
Thanks, Satoshi.

Created a feature request (MADLIB-1013
<https://issues.apache.org/jira/browse/MADLIB-1013>). This will probably go
into 1.9.2, since we've already started the release process for 1.9.1.
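
For readers of the archive, here is a rough sketch of why a sparse array output helps when a categorical column has many categories. It is plain Python, not MADlib code: the helper names (one_hot, run_length_encode, run_length_decode) and the sample category names are hypothetical, and the run-length scheme only approximates the idea behind svec rather than reproducing its actual storage format or the eventual behavior of create_indicator_variables.

```python
from itertools import groupby


def one_hot(value, categories):
    """Dense indicator array: 1 at the position of `value`, 0 elsewhere."""
    return [1 if c == value else 0 for c in categories]


def run_length_encode(dense):
    """Collapse runs of equal values into parallel (counts, values) lists."""
    counts, values = [], []
    for v, run in groupby(dense):
        counts.append(sum(1 for _ in run))
        values.append(v)
    return counts, values


def run_length_decode(counts, values):
    """Expand (counts, values) back into the dense array."""
    out = []
    for n, v in zip(counts, values):
        out.extend([v] * n)
    return out


# Hypothetical column with 1000 distinct categories (names are made up).
categories = [f"author_{i}" for i in range(1000)]

dense = one_hot("author_42", categories)      # 1000 stored elements
counts, values = run_length_encode(dense)     # 3 runs describe the same array

print(counts, values)  # [42, 1, 957] [0, 1, 0]
```

With one indicator column per category, the same row would need 1000 columns; a single array-valued column holding the compressed form needs only three runs here, and the saving grows with the number of categories.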

On Mon, Aug 8, 2016 at 7:21 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote:

> Hi Rahul,
>
> 2016-08-09 2:05 GMT+09:00 Rahul Iyer <riyer@pivotal.io>:
> > Array output for *create_indicator_variables* would be quite helpful when
> > the number of categories is large, and the svec representation would be
> > ideal for it. There might be similar implications for *pivoting*, but we
> > can keep that as a future discussion.
>
> Sounds great.
>
> > I'm curious about how you're using the indicator variables - svec is not
> > widely supported in MADlib (yet) and might not give much benefit after
> > the encoding is complete.
>
> I'm trying to implement some recommendation or similarity search features
> for several kinds of media items (movies, books, documents, etc.) using
> their metadata. The data has several categorical variables, such as authors,
> publishers, actors/actresses, genres, etc. Some of them have many categories.
>
> BTW, I'm a beginner in data mining and machine learning, without much
> experience.
>
> Of course, I can reduce the number of those categories, but playing with raw
> data would be more fun. :)
>
> Regards,
> --
> Satoshi Nagayasu <snaga@uptime.jp>
>



-- 

---------------------------------------------------------
Rahul Iyer
Principal software engineer | Predictive Analytics

*Pivotal* *A new platform for a new era*
