hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: mapreduce on two tables
Date Mon, 07 Nov 2011 14:59:28 GMT
You don't really need to store that in another HBase table; just
dump it into HDFS (unless you want random access to that second
table, which would act as a secondary index of documents by author).

It's a workable solution; it's just brute force.
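The brute-force pass described in the question (scan the document table once, then group document ids by author) can be sketched outside HBase as plain map/reduce-style grouping. This is a minimal simulation in Python, not real HBase API code; the row layout mirrors the table from the question (row-id → content:author, content:text), and all names are illustrative:

```python
# Simulated brute-force grouping: a map step emits (author, document_id)
# pairs from each row, and a reduce step collects all document ids per
# author -- the same shape as the proposed map-reduce job.
# (Illustrative only; no HBase client calls are used.)
from collections import defaultdict

def map_phase(rows):
    """Emit (author, document_id) for every document row."""
    for document_id, columns in rows.items():
        yield columns["content:author"], document_id

def reduce_phase(pairs):
    """Group document ids by author, as the shuffle/reduce step would."""
    docs_by_author = defaultdict(list)
    for author, document_id in pairs:
        docs_by_author[author].append(document_id)
    return dict(docs_by_author)

# Hypothetical sample data shaped like the document table.
rows = {
    "doc1": {"content:author": "alice", "content:text": "..."},
    "doc2": {"content:author": "bob",   "content:text": "..."},
    "doc3": {"content:author": "alice", "content:text": "..."},
}
result = reduce_phase(map_phase(rows))
```

In a real job the map step would be a TableMapper scanning the document table, and the grouped output could go straight to HDFS files rather than a second table, as suggested above.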


On Mon, Nov 7, 2011 at 11:02 AM, Rohit Kelkar <rohitkelkar@gmail.com> wrote:
> I need some feedback on the best way to implement the following:
> In my document table I have documentid as the row-id, with
> content:author and content:text stored in each row. I want to process
> all documents pertaining to each author in a map-reduce job, i.e. my
> map would take key=author and values="all documentids written by that
> author". But to do this I would first have to find all distinct
> authors and store them in another table, then run a map-reduce job on
> that second table. Am I thinking in the right direction, or is there a
> better way to achieve this?
> - Rohit Kelkar
