hbase-user mailing list archives

From Andrew Purtell <andrew.purt...@gmail.com>
Subject Re: How Hbase achieves efficient random access?
Date Mon, 07 Jul 2014 20:14:22 GMT
A bit more context. 

Initially we had Facebook go off on 0.89-FB, which (as we heard from them) had to do with
internal process considerations more than anything else; that branch has since evolved into HydraBase.
Later, OhmData revealed another fork, probably to differentiate and provide product value. Now we
have BigBase coming out soon; it is not 100% a fork per se, but bigbase.org positions it as something
one might use in place of HBase.

Forks can be good things in OSS; GCC vs. EGCS is one example that comes to mind. Everyone can
benefit when multiple pathways for exploring ideas and improvements exist and cross-pollinate. Or,
as with GCC vs. EGCS, a fork can become the new mainline.

But I am curious whether there might be something suboptimal going on with our community or dev
process, as opposed to purely technical or product reasons, behind the pursuit of independent
development and alternate codebases.

> On Jul 7, 2014, at 12:56 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> 
> Out of curiosity Vladimir, did you feel like a fork of HBase was necessary because of
something about the Apache HBase project's process or community? Or was it more of a licensing
thing (noting you're not using ASL 2)?
> 
> 
> On Jul 6, 2014, at 11:26 PM, Vladimir Rodionov <vrodionov@carrieriq.com> wrote:
> 
>>>> 
>>>> Another issue is that we cache only blocks. So for workloads with random
reads where the working set of blocks does not fit into the aggregate block cache, HBase would
need to load an entire block for each KV it wants to read. For those workloads we
might want to consider a KV cache. (See also Vladimir's BigBase - https://github.com/VladRodionov/bigbase).
>> 
>> Yes, the upcoming first release of BigBase (later this month) will have support for an
SSD cache in both the row (KV) cache and the block cache. You will be able to use both efficiently:
all of a server's RAM and its available SSD disks (especially useful for those who run HBase
on AWS EC2, where new instances come with local SSD disks by default).
>> 
>> Best regards,
>> Vladimir Rodionov
>> 
>> http://www.bigbase.org
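
As an illustration of the RAM-plus-SSD idea mentioned above, here is a minimal Java sketch of a
two-tier cache, assuming a bounded in-memory LRU tier that spills evicted entries to a file on an
SSD-mounted path. The TwoTierCache class and its layout are hypothetical and are not BigBase's API;
this is a sketch of the general technique, not the product's implementation.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-tier cache: a bounded in-RAM LRU that spills evictions to a file on SSD. */
public class TwoTierCache {
  private final int maxRamEntries;
  private final FileChannel ssd;                                // append-only file on an SSD-mounted path
  private final Map<String, long[]> ssdIndex = new HashMap<>(); // key -> {offset, length} in the SSD file
  private long ssdWritePos = 0;

  // Access-ordered LRU map; entries evicted from RAM are written to the SSD tier.
  private final LinkedHashMap<String, byte[]> ram =
      new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
          if (size() > maxRamEntries) {
            spillToSsd(eldest.getKey(), eldest.getValue());
            return true;
          }
          return false;
        }
      };

  public TwoTierCache(Path ssdFile, int maxRamEntries) throws IOException {
    this.maxRamEntries = maxRamEntries;
    this.ssd = FileChannel.open(ssdFile,
        StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
  }

  public synchronized void put(String key, byte[] value) {
    ram.put(key, value);
  }

  public synchronized byte[] get(String key) throws IOException {
    byte[] hit = ram.get(key);            // RAM tier first
    if (hit != null) {
      return hit;
    }
    long[] loc = ssdIndex.get(key);       // then the SSD tier
    if (loc == null) {
      return null;                        // miss in both tiers
    }
    ByteBuffer buf = ByteBuffer.allocate((int) loc[1]);
    ssd.read(buf, loc[0]);
    return buf.array();
  }

  private void spillToSsd(String key, byte[] value) {
    try {
      ssd.write(ByteBuffer.wrap(value), ssdWritePos);
      ssdIndex.put(key, new long[] { ssdWritePos, value.length });
      ssdWritePos += value.length;
    } catch (IOException e) {
      throw new RuntimeException("failed to spill evicted entry to SSD", e);
    }
  }
}

Reads check the RAM tier first and fall back to the SSD tier, so the hot working set stays in
memory while colder entries remain reachable without going back to the HFile layer.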
>> ________________________________________
>> From: lars hofhansl [larsh@apache.org]
>> Sent: Saturday, July 05, 2014 5:23 AM
>> To: user@hbase.apache.org
>> Subject: Re: How Hbase achieves efficient random access?
>> 
>> What Ted and Intae said.
>> 
>> Are you asking out of interest or do you see performance issues?
>> 
>> One "issue" is that the KeyValues (KVs) in the blocks is not indexed. KVs are variable
length and hence once a block is loaded it needs to be searched linearly in order to find
the KV (or determine its absence).
>> It's on my list of things to investigate noting the start offsets of all KVs somewhere
and hence allow a binary search the KVs.
>> 
>> Since blocks are small (64k by default) it might not make a difference, but we should
check.
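
To make the idea concrete, here is a minimal Java sketch, assuming a simplified stand-in for the
block layout: the OffsetIndexedBlock class and KeyDecoder interface below are hypothetical, not
HBase's actual types. The linear walk is roughly what happens without a per-KV index; if the KV
start offsets were recorded, the same lookup could be a binary search, since KVs within a block
are sorted by key.

import java.util.Arrays;

/**
 * Simplified sketch: a "block" as a byte[] plus an array of KV start offsets.
 * With the offsets recorded, lookup inside a loaded block can be a binary
 * search instead of a linear walk. (Hypothetical types, not HBase's format.)
 */
public class OffsetIndexedBlock {
  private final byte[] block;     // serialized KVs, laid out back to back, sorted by key
  private final int[] kvOffsets;  // start offset of each KV, in ascending key order
  private final KeyDecoder decoder;

  /** Hypothetical helper that reads the key of the KV starting at an offset. */
  public interface KeyDecoder {
    byte[] keyAt(byte[] block, int offset);
  }

  public OffsetIndexedBlock(byte[] block, int[] kvOffsets, KeyDecoder decoder) {
    this.block = block;
    this.kvOffsets = kvOffsets;
    this.decoder = decoder;
  }

  /**
   * Linear walk: roughly what happens today. (A real scanner advances by decoding
   * each KV's length rather than consulting an offset array.) O(n) key decodes.
   */
  public int linearSearch(byte[] key) {
    for (int offset : kvOffsets) {
      int cmp = Arrays.compare(decoder.keyAt(block, offset), key);
      if (cmp == 0) {
        return offset;   // found the KV
      }
      if (cmp > 0) {
        return -1;       // walked past where the key would sort: it is absent
      }
    }
    return -1;
  }

  /** Binary search over the recorded offsets: O(log n) key decodes per lookup. */
  public int binarySearch(byte[] key) {
    int lo = 0, hi = kvOffsets.length - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int cmp = Arrays.compare(decoder.keyAt(block, kvOffsets[mid]), key);
      if (cmp == 0) {
        return kvOffsets[mid];
      } else if (cmp < 0) {
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return -1;           // absent from this block
  }
}

Whether the extra offsets pay off for 64k blocks is exactly the thing to measure, as noted above.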
>> 
>> Another issue is that we cache only blocks. So for workloads with random reads where
the working set of blocks does not fit into the aggregate block cache, HBase would need to
load an entire block for each KV it wants to read. For those workloads we might want to consider
a KV cache. (See also Vladimir's BigBase - https://github.com/VladRodionov/bigbase).
>> 
>> 
>> -- Lars
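
A minimal sketch of the KV-cache idea described above, assuming a hypothetical BlockLoader
stand-in for the block-cache/HFile read path (neither KvCache nor BlockLoader is an HBase class):
on a hit, the value comes straight from the per-KV cache and no block has to be loaded or scanned.

import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of a KV cache: check a per-KV cache before falling back to loading
 * (and searching) a whole block.
 */
public class KvCache {
  /** Hypothetical stand-in for "load the containing block and find the KV in it". */
  public interface BlockLoader {
    byte[] loadBlockAndFindKv(byte[] key);
  }

  private final int maxEntries;
  private final BlockLoader loader;
  private final LinkedHashMap<String, byte[]> lru;

  public KvCache(int maxEntries, BlockLoader loader) {
    this.maxEntries = maxEntries;
    this.loader = loader;
    // Access-ordered map so iteration order follows recency; evict the eldest on overflow.
    this.lru = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        return size() > KvCache.this.maxEntries;
      }
    };
  }

  /** On a hit we return the cached value and never touch a 64k block. */
  public synchronized byte[] get(byte[] key) {
    String k = Base64.getEncoder().encodeToString(key); // byte[] has no value-based equals
    byte[] cached = lru.get(k);
    if (cached != null) {
      return cached;
    }
    byte[] loaded = loader.loadBlockAndFindKv(key);     // miss: pay the full block read
    if (loaded != null) {
      lru.put(k, loaded);
    }
    return loaded;
  }
}

Eviction here is plain LRU; a real KV cache would also need invalidation on writes and flushes,
which is part of why this is a design discussion rather than a drop-in.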
>> 
>> 
>> 
>> ________________________________
>> From: Ted Yu <yuzhihong@gmail.com>
>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>> Sent: Friday, July 4, 2014 7:39 AM
>> Subject: Re: How Hbase achieves efficient random access?
>> 
>> 
>> For description of HFile v2, see http://hbase.apache.org/book.html#hfilev2
>> 
>> For block cache, see http://hbase.apache.org/book.html#block.cache
>> 
>> In "HBase In Action", starting page 28, there is description for read path.
>> 
>> Cheers
>> 
>> 
>> 
>>> On Fri, Jul 4, 2014 at 2:02 AM, Intae Kim <inking007@gmail.com> wrote:
>>> 
>>> Setting aside the memstore, the block cache, HFile counts, etc.:
>>> 
>>> Simply stated, the data are kept sorted in a file called an HFile (which is composed of blocks).
>>> When a client tries to access data, HBase searches for the proper block in the file and loads
>>> that block to check whether it contains the data.
>>> 
>>> See the HFile format for more details (meta index, data index, ...).
>>> 
>>> Good Luck!!
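
As a conceptual illustration of the description above, here is a small Java sketch of locating the
candidate block via a sorted index of each block's first key. The BlockIndexSketch class is
hypothetical and is a simplification of what an HFile data index enables, not HBase code.

import java.util.Arrays;

/**
 * Conceptual sketch: keep the first key of every data block in a sorted index
 * and binary-search it to find the one block that may contain a given key.
 */
public class BlockIndexSketch {
  private final byte[][] firstKeys; // first key of each data block, in ascending order

  public BlockIndexSketch(byte[][] firstKeys) {
    this.firstKeys = firstKeys;
  }

  /**
   * Returns the index of the only block that can contain {@code key}, or -1 if
   * the key sorts before the first block.
   */
  public int candidateBlockFor(byte[] key) {
    int lo = 0, hi = firstKeys.length - 1, result = -1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (Arrays.compare(firstKeys[mid], key) <= 0) {
        result = mid;    // this block starts at or before the key; remember it
        lo = mid + 1;    // but a later block might start even closer to the key
      } else {
        hi = mid - 1;
      }
    }
    return result;
  }
}

The index only narrows the search down to one block; that block still has to be loaded and
searched to confirm the key is actually present.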
>>> 
>>> 
>>> 2014-07-04 17:30 GMT+09:00 Ted Yu <yuzhihong@gmail.com>:
>>> 
>>>> Please take a look at http://hbase.apache.org/book/perf.reading.html
>>>> 
>>>> Cheers
>>>> 
>>>>> On Jul 4, 2014, at 12:22 AM, yl wu <wuyl6099@gmail.com> wrote:
>>>>> 
>>>>> Hi All,
>>>>> 
>>>>> HBase has a sorted and indexed HFile format, which enables fast lookups.
>>>>> I am wondering whether there are any other features that help HBase achieve
>>>>> efficient random access?
>>>>> I want to know the whole story, but I can't find any article that talks about
>>>>> random access in HBase at a high level.
>>>>> 
>>>>> Can anyone help me resolve my confusion about this?
>>>>> 
>>>>> Best,
>>>>> Yanglin
>> 
