hbase-user mailing list archives

From manobal <mano...@gmail.com>
Subject Re: TableMapReduceUtil.initTableMapperJob takes only 1 scan object
Date Fri, 28 Jan 2011 17:36:57 GMT

One table will contain all the event types. The key is eventId (the type of
event) + timestamp, and the value would be the visitorId. We want to find all
the visitorIds that have seen 4-5 specific event types in sequence.

As far as I understand, the scanner object takes a starting key and an ending
key. If that is the case, we will end up scanning a lot more data than
required, because the table contains data for all the event types.

I hope that makes sense.
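
To make the concern concrete, here is a minimal sketch of the key ranges I have in mind. It does not call HBase at all; it only builds one narrow [start, stop) range per event type instead of a single wide range covering every type in between. The class and method names (EventKeyRanges, rowKey, rangesFor) are made up for illustration, and the key layout (a 4-digit zero-padded eventId followed by an 8-byte big-endian, non-negative timestamp) is an assumption, not our actual schema.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class EventKeyRanges {

    /** A half-open row-key range [start, stop), like a scan's start and stop rows. */
    public static final class Range {
        public final byte[] start;
        public final byte[] stop;

        public Range(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /**
     * Builds a row key under the ASSUMED layout: 4-digit zero-padded eventId
     * (ASCII) followed by the timestamp as 8 big-endian bytes. With
     * non-negative timestamps, byte order matches time order within one event.
     */
    public static byte[] rowKey(int eventId, long timestampMillis) {
        byte[] id = String.format("%04d", eventId).getBytes(StandardCharsets.US_ASCII);
        return ByteBuffer.allocate(id.length + Long.BYTES)
                .put(id)
                .putLong(timestampMillis)
                .array();
    }

    /** One narrow range per wanted event type, each limited to the date range. */
    public static List<Range> rangesFor(int[] eventIds, long fromMillis, long toMillis) {
        List<Range> ranges = new ArrayList<>();
        for (int eventId : eventIds) {
            ranges.add(new Range(rowKey(eventId, fromMillis), rowKey(eventId, toMillis)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Hypothetical event ids and date range, for illustration only.
        int[] wanted = {7, 42, 311, 900};
        List<Range> ranges = rangesFor(wanted, 1296086400000L, 1296172800000L);
        System.out.println("ranges to scan: " + ranges.size());
    }
}
```

A single scan from the first start key to the last stop key would also read every event type that sorts between them, which is the extra data I am worried about. Each range above would need its own scan.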


stack-3 wrote:
> What's your schema like?  What's the key schema like?  Are these events
> all in the one table?  Are they co-located?  If so, what would be
> wrong with one scanner only fetching the 4 or 5 event types?
> St.Ack
> P.S. Have you seen http://opentsdb.net/?  You might check it out
> either for inspiration, or for use, because it seems to be doing close
> to what you want to do with HBase.
> On Thu, Jan 27, 2011 at 12:01 PM, manobal <manobal@gmail.com> wrote:
>> I am just trying to evaluate HBase for some of the data analysis we are
>> doing.
>> HBase would contain our event data. The key would be eventId + time. We
>> want to run analysis on a few event types (4-5) within a date range. The
>> total number of event types is around 1000.
>> The problem with running a mapreduce job on the HBase table is that
>> initTableMapperJob (see below) takes only 1 scan object. For performance
>> reasons we want to scan the data for only 4-5 events in a given date range
>> and not all 1000 events. If we use the method below then I guess we don't
>> have that choice.
>> public static void initTableMapperJob(String table,
>>                                      Scan scan,
>>                                      Class<? extends TableMapper> mapper,
>>                                      Class<? extends org.apache.hadoop.io.WritableComparable> outputKeyClass,
>>                                      Class<? extends org.apache.hadoop.io.Writable> outputValueClass,
>>                                      org.apache.hadoop.mapreduce.Job job)
>>                               throws IOException
>> Is it possible to run mapreduce on a list of scan objects? Is there any
>> workaround?
>> Thanks
