hbase-user mailing list archives

From gordoslocos <gordoslo...@gmail.com>
Subject Re: HBase vs. HDFS
Date Tue, 02 Oct 2012 12:46:16 GMT
Thank you all! Setting a cache size helped a great deal. It's still slower though.
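
For anyone finding this thread later, here is a rough back-of-the-envelope model of why the caching setting matters. It is only a sketch: it assumes every fetch returns a full batch and ignores the extra calls a real scanner makes to open, close, and detect the end of the scan. The function names and the `rtt_ms` round-trip figure are hypothetical; the real knobs are `Scan.setCaching(int)` in the Java client and the `hbase.client.scanner.caching` property.

```python
import math


def scan_round_trips(total_rows: int, caching: int) -> int:
    """Fetches needed to stream total_rows when each fetch returns
    at most `caching` rows (simplified model, not HBase's real RPC count)."""
    if caching < 1:
        raise ValueError("caching must be at least 1")
    return math.ceil(total_rows / caching)


def estimated_rpc_seconds(total_rows: int, caching: int, rtt_ms: float) -> float:
    """Time spent purely on round trips, given an assumed per-call latency.
    rtt_ms is a made-up figure you would have to measure on your own cluster."""
    return scan_round_trips(total_rows, caching) * rtt_ms / 1000.0


if __name__ == "__main__":
    # 120K rows, as in the original question.
    for caching in (1, 100, 1000):
        print(caching, scan_round_trips(120_000, caching))
    # prints:
    # 1 120000
    # 100 1200
    # 1000 120
```

With caching at 1, every row costs a network round trip, so even a sub-millisecond latency adds up over 120K rows. Larger values trade client and server memory for fewer trips.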

I suspect the overhead of processing the data from the table might
be the cause.

I guess if HBase adds a layer of indirection on top of HDFS then it makes sense that it'd be
slower, right?
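
To make the extra work concrete, here is a toy sketch of the merging Doug describes below: a scan has to combine several sorted StoreFiles with the not-yet-flushed MemStore and return the newest value for each cell, whereas `hadoop fs -cat` just streams bytes. This is an illustration under simplifying assumptions, not HBase's actual implementation: the tuple layout `(row, column, timestamp, value)` and the function name `merged_scan` are made up, and it ignores delete markers, multiple versions, filters, and caching.

```python
import heapq
from itertools import groupby


def sort_key(kv):
    # Assumed ordering: row ascending, column ascending, newest timestamp first.
    row, column, timestamp, _value = kv
    return (row, column, -timestamp)


def merged_scan(memstore, store_files):
    """Yield (row, column, value) with the newest value per cell.
    Every input must already be sorted by sort_key."""
    merged = heapq.merge(memstore, *store_files, key=sort_key)
    for (row, column), versions in groupby(merged, key=lambda kv: (kv[0], kv[1])):
        newest = next(versions)  # first entry in each group has the highest timestamp
        yield row, column, newest[3]


if __name__ == "__main__":
    store_file_1 = [("row1", "cf:a", 1, "old"), ("row2", "cf:a", 1, "x")]
    store_file_2 = [("row1", "cf:a", 2, "new"), ("row3", "cf:a", 2, "z")]
    memstore = [("row2", "cf:a", 3, "unflushed")]
    for cell in merged_scan(memstore, [store_file_1, store_file_2]):
        print(cell)
    # prints:
    # ('row1', 'cf:a', 'new')
    # ('row2', 'cf:a', 'unflushed')
    # ('row3', 'cf:a', 'z')
```

Even in this simplified form, every cell goes through a comparison and a merge step per source, which a plain sequential file read never pays for.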

On 02/10/2012, at 09:28, Doug Meil <doug.meil@explorysmedical.com> wrote:

> 
> Hi there, 
> 
> Another thing to consider on top of the scan-caching is that HBase is
> doing more in the process of scanning the table.  See...
> 
> http://hbase.apache.org/book.html#conceptual.view
> 
> http://hbase.apache.org/book.html#regions.arch
> 
> 
> ... Specifically, processing the KeyValues, potentially merging rows between
> StoreFiles, checking for un-flushed updates in the MemStore per CF.
> 
> 
> 
> On 10/1/12 8:54 PM, "Doug Meil" <doug.meil@explorysmedical.com> wrote:
> 
>> 
>> Hi there-
>> 
>> Might want to start with this...
>> 
>> http://hbase.apache.org/book.html#perf.reading
>> 
>> ... if you're using default scan caching (which is 1) that would explain a
>> lot.
>> 
>> 
>> 
>> 
>> On 10/1/12 7:01 PM, "Juan P." <gordoslocos@gmail.com> wrote:
>> 
>>> Hi guys,
>>> I'm trying to get familiarized with HBase and one thing I noticed is that
>>> reads seem to be very slow. I just tried doing a "scan 'my_table'" to get
>>> 120K
>>> records and it took about 50 seconds to print it all out.
>>> 
>>> In contrast "hadoop fs -cat my_file.csv" where my_file.csv has 120K lines
>>> completed in under a second.
>>> 
>>> Is that possible? Am I missing something about HBase reads?
>>> 
>>> Thanks,
>>> Joni
> 
> 
