hbase-user mailing list archives

From Jonathan Gray <jg...@fb.com>
Subject RE: Results from a Map/Reduce
Date Fri, 17 Dec 2010 19:12:37 GMT
Hey Peter,

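That System.exit line is nothing important. job.waitForCompletion(true) just blocks the main
thread until the job's tasks finish (the true means "print progress while waiting"), and its
true/false result is turned into the process exit code.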
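To make the blocking behaviour concrete, here is a minimal plain-Java model of what the driver's main thread does. It is not Hadoop code: WaitForCompletionModel, waitForAll and exitCode are hypothetical names, and a thread pool stands in for the cluster running the tasks. The only piece taken from the real idiom is the `success ? 0 : 1` mapping.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java model of a job driver, NOT Hadoop code: the class and method names are
// hypothetical, and a thread pool stands in for the cluster running the tasks.
class WaitForCompletionModel {

    // Blocks the calling thread until every task has finished, then reports
    // whether all of them succeeded (roughly the role waitForCompletion plays).
    static boolean waitForAll(List<Callable<Boolean>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            boolean allOk = true;
            // invokeAll returns only after every task has completed or thrown.
            for (Future<Boolean> f : pool.invokeAll(tasks)) {
                try {
                    allOk &= f.get();
                } catch (ExecutionException e) {
                    allOk = false; // a task that threw counts as a failed job
                }
            }
            return allOk;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            pool.shutdown();
        }
    }

    // Same mapping as "job.waitForCompletion(true) ? 0 : 1": 0 means success.
    static int exitCode(boolean success) {
        return success ? 0 : 1;
    }
}
```

A driver built on this model would end with `System.exit(exitCode(waitForAll(tasks)))`, which is why nothing comes back to the caller except the exit status.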

You're interested in having the MR job return a single result?  To do that, you would need
to roll up the processing done in each of your Map tasks into a single Reduce task.  With
one reducer, you have a single point where the final aggregation of the result happens.

I'm not sure exactly what kind of aggregation you are doing, but funneling everything into a
single reducer can range from no problem to don't-even-try-it, depending on how much data the
map tasks send it.  Sounds like you just want a final number or something, so it shouldn't be
an issue.
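As a rough illustration of that roll-up, here is a plain-Java model of the data flow (again not Hadoop code; SingleReducerModel and its methods are hypothetical names). Each "map task" boils its split down to a partial sum, and one "reducer" folds the partials into the single number you'd want back. In a real job the single reducer is requested with `job.setNumReduceTasks(1)`, and the reducer writes its result to the job's output (for example an HDFS file or an HBase table, depending on the output format) for the client to read after `waitForCompletion` returns.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the data flow, NOT Hadoop code: each "map task" reduces its own
// split to a partial result, and a single "reducer" folds all partials into one answer.
class SingleReducerModel {

    // Map side: one partial sum per input split (in a real job, one per map task).
    static long mapSplit(List<Long> split) {
        long partial = 0;
        for (long v : split) {
            partial += v;
        }
        return partial;
    }

    // Reduce side: with exactly one reducer, every partial arrives at the same place,
    // so the final aggregation happens exactly once.
    static long reduce(List<Long> partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    // Runs the "map" step over every split, then hands all partials to the one reducer.
    static long run(List<List<Long>> splits) {
        List<Long> partials = new ArrayList<>();
        for (List<Long> split : splits) {
            partials.add(mapSplit(split));
        }
        return reduce(partials);
    }
}
```

Summing partials works here because addition is associative; the same shape fits counts, min/max, and similar roll-ups, while something like a median would need more than one number per map task.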

You might also consider doing your aggregations with coprocessors if you're into experimenting
on HBase trunk :)

As for FirstKeyOnlyFilter:

 * A filter that will only return the first KV from each row.
 * <p>
 * This filter can be used to more efficiently perform row count operations.

That's what it does.  If you scan a table, regardless of what you ask for in the query, the
filter returns only the first KeyValue of each row and skips every other column/version/value
in that row.

Like it says, it's generally useful for doing row counting but that's about it.
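Here is a plain-Java model of the filter's effect, assuming cells arrive sorted by row the way a scan returns them (FirstKeyOnlyModel and Cell are hypothetical stand-ins, not HBase classes). In actual client code you would attach the real filter with `scan.setFilter(new FirstKeyOnlyFilter())` and count the Results coming back from the scanner.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of FirstKeyOnlyFilter's effect, NOT the HBase filter itself.
// Assumes cells arrive sorted by row, as they do from a table scan.
class FirstKeyOnlyModel {

    // Hypothetical stand-in for an HBase KeyValue: row key, column, value.
    static final class Cell {
        final String row;
        final String column;
        final String value;

        Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    // Keep only the first cell of each row; every later cell in that row is skipped.
    static List<Cell> firstKeyOnly(List<Cell> sortedCells) {
        List<Cell> out = new ArrayList<>();
        String currentRow = null;
        for (Cell c : sortedCells) {
            if (!c.row.equals(currentRow)) {
                out.add(c);
                currentRow = c.row;
            }
        }
        return out;
    }

    // Row counting: with exactly one cell per row, the row count is the number of cells kept.
    static int countRows(List<Cell> sortedCells) {
        return firstKeyOnly(sortedCells).size();
    }
}
```

The point of the real filter is that the skipped cells never leave the region server, so a row count ships one small KeyValue per row instead of whole rows.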


> -----Original Message-----
> From: Peter Haidinyak [mailto:phaidinyak@local.com]
> Sent: Friday, December 17, 2010 10:56 AM
> To: user@hbase.apache.org
> Subject: Results from a Map/Reduce
>
> Hi, dumb question again.
>   I have been using a Scan to return results to my client, which works
> fine except when I am returning a million rows just to aggregate the results.
> The next logical step would be to do the aggregation in a Map/Reduce. I've
> been looking at what samples I could find and they all seem to do this...
>     System.exit(job.waitForCompletion(true) ? 0 : 1);
> My question: is there a way to return a result from the job, similar to
> getting a ResultScanner back and iterating through the results?
> Also, is there a good definition of what a 'FirstKeyOnlyFilter' does?
> Thanks
> -Pete
