mahout-dev mailing list archives

From Ted Dunning <>
Subject Re: Have a idea of leveraging hbase for machine learning
Date Mon, 16 Nov 2009 17:35:14 GMT

Glad to hear you are looking at Mahout.

Practically speaking, it probably isn't feasible to have an hbase column per
matrix column.  That makes storage of matrix data in hbase somewhat less
compelling, although clearly still very useful for some applications.

As Grant pointed out, Mahout is trying to stay pretty agnostic relative to
data storage methods.  Some people need to read matrices from Lucene
indexes, others from files, still others from hbase.  We need to support all
of those options.

Your suggestion about making sure that Taste supports hbase is a good one.

On Mon, Nov 16, 2009 at 12:54 AM, Jeff Zhang <> wrote:

> Then we can store them as one hbase row:
> A: {title:love=>1,
> content:I=>1,content:love=>1,content:this=>1,content:game=>1}
> Using hbase, it will be very easy for us to compute the similarity between
> documents.
> And another advantage of hbase compared to raw text data is that it's
> semi-structured. And I think it will be easy for programming if we use
> hbase
> rather than the raw data.
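
To make the quoted row layout concrete, here is a minimal sketch in Python of how document similarity could be computed once each document is one sparse row of "family:qualifier" => count entries. It uses plain dicts and no HBase client at all; the second row B, the function name cosine_similarity, and the choice of cosine similarity as the measure are assumptions for illustration, not something taken from the thread or from Mahout.

```python
import math

# Row A, following the layout in the quoted message:
# each key is "family:qualifier", each value is a term count.
row_a = {
    "title:love": 1,
    "content:I": 1,
    "content:love": 1,
    "content:this": 1,
    "content:game": 1,
}

# Hypothetical second document, invented only for this example.
row_b = {
    "title:game": 1,
    "content:I": 1,
    "content:love": 1,
    "content:chess": 1,
}


def cosine_similarity(u, v):
    """Cosine similarity between two sparse rows stored as dicts."""
    # Only keys present in both rows contribute to the dot product.
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty row is treated as similar to nothing
    return dot / (norm_u * norm_v)


similarity_ab = cosine_similarity(row_a, row_b)
```

Because each row stores only the terms that actually occur, the work per pair of documents scales with the number of non-zero entries rather than with the size of the vocabulary.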

Ted Dunning, CTO
