hbase-user mailing list archives

From Tim Robertson <timrobertson...@gmail.com>
Subject Re: HFileInputFormat for MapReduce
Date Thu, 09 Feb 2012 23:00:57 GMT
Hey Stack,

We see a scan running about 10x slower than a TextFileInputFormat job
over the same data stored as CSV.  That is what prompted me to look
at MR using an HFileInputFormat (HFIF), just out of curiosity.


On Thu, Feb 9, 2012 at 7:32 PM, Stack <stack@duboce.net> wrote:
> On Thu, Feb 9, 2012 at 12:55 AM, Tim Robertson
> <timrobertson100@gmail.com> wrote:
>> From the limitations you mention, 1) and 2) we can live with, but 3)
>> could be why my quick tests are already giving incorrect record
>> counts.  That sounds like a show stopper straight away right?
> So Tim, you are going against the hfiles directly and not via the
> HBase API?  If so, you'll need to do a merge read of the multiple
> hfiles like hbase does (as per Amandeep).  You need this facility?
>> One option for us would be HBase for the primary store for random
>> access, and periodic (e.g. 12 hourly) exports to HDFS for all the full
>> scanning.  Would you consider that sane?
> You are not getting good scan performance from hbase Tim?
> St.Ack
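
For anyone following the thread, here is a rough sketch of what the "merge
read" mentioned above involves, and why counting records per file gives
wrong totals. It is a simplified model in plain Python, not the HBase API:
each "hfile" is assumed to be a list of (row_key, timestamp, value) cells
already sorted by row key and then by newest timestamp first, and a value
of None stands in for a delete marker. The names merge_read, file_a and
file_b are made up for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_read(sorted_files):
    """Yield (row_key, value) once per row, keeping only the newest cell.

    Assumes every input is sorted by (row_key ascending, timestamp descending).
    """
    # k-way merge of the already-sorted inputs; within a row the newest cell comes first.
    merged = heapq.merge(*sorted_files, key=lambda cell: (cell[0], -cell[1]))
    for row_key, cells in groupby(merged, key=itemgetter(0)):
        newest = next(cells)
        # None is a hypothetical stand-in for a delete marker: skip deleted rows.
        if newest[2] is not None:
            yield row_key, newest[2]


# Two hypothetical files holding overlapping versions of the same rows.
file_a = [("row1", 1, "a"), ("row2", 1, "b")]
file_b = [("row1", 2, "a2"), ("row3", 2, None)]

naive_count = len(file_a) + len(file_b)        # counts every stored cell
merged_rows = list(merge_read([file_a, file_b]))  # one entry per live row
```

Reading each file independently counts every stored version and every
deleted row, so the naive total and the merged total differ whenever rows
have been updated or deleted across files.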
