hadoop-general mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: MapReduce HBASE examples
Date Tue, 06 Jul 2010 17:11:31 GMT
That won't be very efficient either. Are you trying to do this for a real-time
user request? If so, it really isn't the way you want to go.

If you are in a batch processing situation, I'd say it depends on how many
rows you have versus how many you need to retrieve; e.g., scanning 2B rows only
to find 10 rows really doesn't make sense. How do you determine which users you
need to process? How big is your dataset? I understand that you wish to use
the MR-provided functionality for grouping and such, but simply issuing a
bunch of Gets in parallel may be easier to write and maintain.
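As a rough illustration of the "bunch of Gets in parallel" alternative, here is a plain-Java sketch that issues independent point lookups on a bounded thread pool. It deliberately avoids the HBase client API: a `ConcurrentSkipListMap` stands in for the table, and `ParallelGets` and `fetchAll` are hypothetical names. In real code each task would issue an HBase `Get` instead of a map lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelGets {
    /**
     * Issues one point lookup per requested key on a fixed thread pool and returns the
     * rows that exist, in request order. The map is a hypothetical stand-in for an HBase table.
     */
    static Map<String, String> fetchAll(Map<String, String> table, List<String> keys, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String key : keys) {
                // Each task plays the role of one Get: it touches only its own row.
                pending.add(CompletableFuture.supplyAsync(() -> table.get(key), pool));
            }
            Map<String, String> found = new LinkedHashMap<>();
            for (int i = 0; i < keys.size(); i++) {
                String value = pending.get(i).join();   // wait for this lookup
                if (value != null) {                    // absent rows are skipped
                    found.put(keys.get(i), value);
                }
            }
            return found;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Thread-safe sorted map standing in for a table of user rows.
        Map<String, String> users = new ConcurrentSkipListMap<>();
        for (int i = 0; i < 1_000; i++) {
            users.put(String.format("user%04d", i), "profile-" + i);
        }
        System.out.println(fetchAll(users, List.of("user0007", "user0420", "nobody"), 4));
        // prints {user0007=profile-7, user0420=profile-420}
    }
}
```

Bounding the pool caps how many lookups are in flight at once, which is one way to avoid flooding the servers; the thread count used here is an assumption to tune, not a recommendation.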


On Tue, Jul 6, 2010 at 10:02 AM, Kilbride, James P. <James.Kilbride@gd-ais.com> wrote:

> So, if that's the case (and your argument makes sense, given how Scan
> versus Get works), I'd have to write a custom InputFormat class that looks
> like the TableInputFormat class but uses a Get (or a series of Gets) rather
> than a Scan object, as the current table mapper does?
> James Kilbride
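The part of such a custom InputFormat that would differ most from `TableInputFormat` is how work is divided: instead of roughly one split per region's key range, each split would carry an explicit list of row keys for its mapper to fetch with Gets. Below is a minimal sketch of just that partitioning step, in plain Java with no Hadoop or HBase classes; `KeyListSplits` and `split` are hypothetical names, and a real implementation would still need its own `InputSplit` and `RecordReader` wrapping each slice.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyListSplits {
    /**
     * Divides the requested row keys into at most numSplits contiguous slices of
     * near-equal size. Each slice would become one map task issuing Gets for those keys.
     */
    static List<List<String>> split(List<String> keys, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        int n = keys.size();
        int parts = Math.min(numSplits, n);          // never create empty splits
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            // Proportional boundaries; long arithmetic avoids overflow on large lists.
            int from = (int) ((long) i * n / parts);
            int to = (int) ((long) (i + 1) * n / parts);
            splits.add(new ArrayList<>(keys.subList(from, to)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> wanted = List.of("alice", "bob", "carol", "dave", "erin");
        System.out.println(split(wanted, 2)); // prints [[alice, bob], [carol, dave, erin]]
    }
}
```

Sorting the keys before splitting is a plausible refinement: HBase stores rows in key order, so contiguous slices of sorted keys would tend to touch fewer regions per mapper.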
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
> Jean-Daniel Cryans
> Sent: Tuesday, July 06, 2010 12:53 PM
> To: general@hadoop.apache.org
> Subject: Re: MapReduce HBASE examples
> >
> > Does this make any sense?
> >
> Not in a MapReduce context. What you want to do is a LIKE with a bunch of
> values, right? Since a mapper will always read all the input it's given
> (minus whatever you exclude with filters, as you can do with HBase),
> whatever you do will always end up being a full table scan. You "could"
> solve your problem by configuring your Scan object with a RowFilter that
> knows about the names you are looking for, but that still ends up being a
> full scan on the region server side, so it will be slow and will generate
> a lot of IO.
> Regarding examples, HBase ships with a couple of utility classes that can
> also be used as examples. The Export class has the Scan configuration code:
> http://github.com/apache/hbase/blob/0.20/src/java/org/apache/hadoop/hbase/mapreduce/Export.java
> J-D
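The cost J-D describes for the RowFilter approach can be seen in a toy in-memory model (plain Java, not the HBase API): a filtered full scan and a set of point lookups return the same rows, but the scan examines every row in the table. `ScanVersusGet`, `filteredScan`, `gets`, and `Outcome` are hypothetical names, and a `TreeMap` stands in for a table's sorted row keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Predicate;

public class ScanVersusGet {
    /** Rows returned to the caller plus how many rows were read to produce them. */
    static final class Outcome {
        final List<String> rows = new ArrayList<>();
        int rowsExamined;
    }

    /** A scan over the whole table with a row-key filter: every row is read, most are dropped. */
    static Outcome filteredScan(NavigableMap<String, String> table, Predicate<String> rowFilter) {
        Outcome out = new Outcome();
        for (Map.Entry<String, String> row : table.entrySet()) {
            out.rowsExamined++;                 // the server still reads this row
            if (rowFilter.test(row.getKey())) {
                out.rows.add(row.getValue());   // only matches are sent back
            }
        }
        return out;
    }

    /** Point lookups: only the requested rows are read. */
    static Outcome gets(NavigableMap<String, String> table, List<String> keys) {
        Outcome out = new Outcome();
        for (String key : keys) {
            out.rowsExamined++;
            String value = table.get(key);
            if (value != null) {
                out.rows.add(value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 100_000; i++) {
            table.put(String.format("user%06d", i), "profile-" + i);
        }
        List<String> wanted = List.of("user000010", "user054321");
        Outcome scan = filteredScan(table, Set.copyOf(wanted)::contains);
        Outcome lookups = gets(table, wanted);
        System.out.println(scan.rows.equals(lookups.rows)); // prints true
        System.out.println(scan.rowsExamined + " vs " + lookups.rowsExamined); // prints 100000 vs 2
    }
}
```

The filter reduces what is returned, not what is read, which is why it does not help with the IO on the region server.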
