mahout-user mailing list archives

From Zach Richardson <z...@raveldata.com>
Subject Re: Does mahout have nominal attributes?
Date Mon, 26 Dec 2011 22:55:07 GMT
1-of-n encoding.  That's it.
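
A minimal sketch of what 1-of-n encoding looks like, using the A/B/C example from the quoted text below. This is plain Python rather than Mahout code, and the helper names `one_hot` and `euclidean` are made up for illustration. It also shows why this answers the distance question: every pair of unequal values ends up the same distance apart.

```python
import math

def one_hot(value, categories):
    """Return a 1-of-n vector of doubles for `value` over the ordered `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    # Exactly one slot is "on" (1.0); every other slot is "off" (0.0).
    return [1.0 if c == value else 0.0 for c in categories]

def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

categories = ["A", "B", "C"]
a = one_hot("A", categories)  # [1.0, 0.0, 0.0]
b = one_hot("B", categories)  # [0.0, 1.0, 0.0]
c = one_hot("C", categories)  # [0.0, 0.0, 1.0]

# Unlike the integer coding 1.0, 2.0, 3.0, no pair is closer than any other.
same_distance = euclidean(a, b) == euclidean(a, c) == euclidean(b, c)
```

With the integer coding, A and C would be twice as far apart as A and B; with 1-of-n, all three pairwise distances are equal.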

On Mon, Dec 26, 2011 at 4:36 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Mahout uses 1-of-n encoding (aka Zach's bitmap) but stores these encodings
> all together in double vectors for consistency.
>
> In the hashed encoding, we do this, but all of the encoded variables live
> on top of each other in randomized and multiple locations in the encoded
> vector.  This sounds crazy, but works quite well.
>
> On Sun, Dec 25, 2011 at 9:18 PM, Zach Richardson <zach@raveldata.com>
> wrote:
>
> > In a way yes.
> >
> > Generally you want to convert nominal attributes to a "bitmap" (this has a
> > fancier name that is slipping my mind at the moment), where each "name" in
> > the nominal feature has a spot in the vector for being on or off.  In most
> > cases the "on" value should be set to one.  I am not aware of anything like
> > that in Mahout for regular vector encoding.  You could reasonably easily
> > write your own.
> >
> > For instance if you have A, B, and C as the three possible values in your
> > nominal feature, you would encode
> >
> > A B C
> > 1 0 0 for A
> > 0 1 0 for B etc.
> >
> > However, if you are planning on using the SGD classifiers, you can use the
> > hash-based encoding for categorical / nominal features through the
> > WordValueEncoder.
> >
> > Hope this helps.
> >
> > Zach
> >
> > On Sun, Dec 25, 2011 at 10:18 PM, Donald A. Smith
> > <thinkerfeeler@yahoo.com> wrote:
> >
> > > I believe that vectorized attributes are stored as doubles in mahout.  Are
> > > some attributes "nominal"? That is, for some attributes is the distance
> > > function such that any two unequal values are at distance 1?
> > >
> > > Looking at MapBackedARFFModel.java, I see that weka nominal attributes get
> > > converted to integer-valued doubles (1.0, 2.0, 3.0, ...).  Will the
> > > nominal with value 1.0 be closer to the nominal with value 2.0 than to
> > > the nominal with value 3.0?  Or is the distance between 1.0 and 3.0 also 1?
> > >
> > >
> > >
> > >  Thanks, Don
> >
>
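
For the hashed encoding Ted describes in the quoted text, here is a rough sketch of the idea: every (feature, value) pair is hashed into a few slots of one fixed-size vector, so different variables share the same space and can collide. This is plain Python, not Mahout's implementation. The function name `hashed_encode`, the `size` and `probes` parameters, and the use of MD5 as the hash are all assumptions made for illustration.

```python
import hashlib

def hashed_encode(features, size=16, probes=2):
    """Encode a dict of nominal features into one fixed-size vector of doubles.

    Each (name, value) pair is hashed `probes` times, and 1.0 is added at
    each resulting index. Collisions between variables are allowed.
    """
    vec = [0.0] * size
    for name, value in features.items():
        for probe in range(probes):
            # The probe number is mixed into the key so each probe lands
            # at an independent pseudo-random location.
            key = f"{name}={value}#{probe}".encode("utf-8")
            digest = hashlib.md5(key).digest()
            index = int.from_bytes(digest[:8], "big") % size
            vec[index] += 1.0
    return vec

# Hypothetical record with two nominal features.
record = {"color": "red", "shape": "square"}
encoded = hashed_encode(record, size=16, probes=2)
```

The vector length stays fixed no matter how many distinct values show up, which is the appeal for online learners such as the SGD classifiers. Using more than one probe per value makes it less likely that two values collide in every slot they occupy.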



-- 
Zach Richardson
Ravel, Co-founder
Austin, TX
zach@raveldata.com
512.825.6031
