From: Adrien Mogenet
Date: Mon, 7 Jul 2014 18:28:20 +0200
Subject: Re: How Hbase achieves efficient random access?
To: user
Cc: lars hofhansl
References: <1404563024.63511.YahooMailNeo@web140606.mail.bf1.yahoo.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a11c11a32f3020704fd9cf855

--001a11c11a32f3020704fd9cf855
Content-Type: text/plain; charset=UTF-8

btw, another article worth reading about block caches:
http://www.n10k.com/blog/blockcache-101/

On Mon, Jul 7, 2014 at 8:26 AM, Vladimir Rodionov wrote:

> >> Another issue is that we cache only blocks. So for workloads with random
> >> reads where the working set of blocks does not fit into the aggregate
> >> block cache HBase would need to load an entire block for each KV it
> >> wants to read. For those workloads we might want to consider a KV cache.
> >> (See also Vladimir's BigBase - https://github.com/VladRodionov/bigbase).
>
> Yes, the upcoming first release of BigBase (later this month) will have
> support for SSD cache in row (KV) cache and block cache. You will be able
> to efficiently use both all of the server's RAM and the available SSD disks
> (especially useful for those who run HBase on AWS EC2: all new instances
> come, by default, with local SSD disks.)
>
> Best regards,
> Vladimir Rodionov
>
> http://www.bigbase.org
> ________________________________________
> From: lars hofhansl [larsh@apache.org]
> Sent: Saturday, July 05, 2014 5:23 AM
> To: user@hbase.apache.org
> Subject: Re: How Hbase achieves efficient random access?
>
> What Ted and Intae said.
>
> Are you asking out of interest or do you see performance issues?
>
> One "issue" is that the KeyValues (KVs) in the blocks are not indexed.
> KVs are variable length, and hence once a block is loaded it needs to be
> searched linearly in order to find the KV (or determine its absence).
> It's on my list of things to investigate: noting the start offsets of all
> KVs somewhere and hence allowing a binary search of the KVs.
>
> Since blocks are small (64k by default) it might not make a difference,
> but we should check.
>
> Another issue is that we cache only blocks. So for workloads with random
> reads where the working set of blocks does not fit into the aggregate block
> cache HBase would need to load an entire block for each KV it wants to
> read. For those workloads we might want to consider a KV cache. (See also
> Vladimir's BigBase - https://github.com/VladRodionov/bigbase).
>
>
> -- Lars
>
>
>
> ________________________________
> From: Ted Yu
> To: "user@hbase.apache.org"
> Sent: Friday, July 4, 2014 7:39 AM
> Subject: Re: How Hbase achieves efficient random access?
>
>
> For a description of HFile v2, see http://hbase.apache.org/book.html#hfilev2
>
> For block cache, see http://hbase.apache.org/book.html#block.cache
>
> In "HBase In Action", starting on page 28, there is a description of the
> read path.
>
> Cheers
>
>
>
> On Fri, Jul 4, 2014 at 2:02 AM, Intae Kim wrote:
>
> > Apart from memstore, blockcache, hfile count, etc.
> >
> > Simply stated, data are sorted in a file called an HFile (composed of
> > blocks). When a client tries to access data, HBase searches for the
> > proper block in the file and loads that block to check whether it has
> > the data.
> >
> > See the HFile format for more details (meta index, data index ...).
> >
> > Good Luck!!
> >
> >
> > 2014-07-04 17:30 GMT+09:00 Ted Yu :
> >
> > > Please take a look at http://hbase.apache.org/book/perf.reading.html
> > >
> > > Cheers
> > >
> > > On Jul 4, 2014, at 12:22 AM, yl wu wrote:
> > >
> > > > Hi All,
> > > >
> > > > HBase has sorted and indexed Hfile format, which enables fast lookup.
> > > > I am wondering whether any other features help HBase achieve
> > > > efficient random access?
> > > > I want to know the whole story, but I can't find any article that
> > > > talks about random access in HBase at a high level.
> > > >
> > > > Can anyone help me resolve my confusion about this?
> > > >
> > > > Best,
> > > > Yanglin
> > > >
> > >
> >
>

--
Adrien Mogenet
http://www.borntosegfault.com

--001a11c11a32f3020704fd9cf855--
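
Note on the in-block search discussed in the thread: Lars describes a linear scan over variable-length KVs inside a loaded block, and the idea of recording each KV's start offset so that a binary search becomes possible. Below is a minimal sketch of that contrast. It is not HBase code and does not follow the real HFile or KeyValue layout: the record layout (two 4-byte lengths, then key, then value), the side list of offsets, and the function names encode_block, read_kv, linear_lookup and indexed_lookup are all assumptions made up for illustration. Keys are assumed unique and sorted within the block.

```python
import struct


def encode_block(kvs):
    """Pack (key, value) byte pairs, sorted by key, into one toy block.

    Each record is [4-byte key length][4-byte value length][key][value].
    Also returns the start offset of every record (the hypothetical index).
    """
    block = bytearray()
    offsets = []
    for key, value in sorted(kvs):
        offsets.append(len(block))
        block += struct.pack(">II", len(key), len(value)) + key + value
    return bytes(block), offsets


def read_kv(block, pos):
    """Decode the record starting at pos; return (key, value, next_pos)."""
    klen, vlen = struct.unpack_from(">II", block, pos)
    start = pos + 8
    key = block[start:start + klen]
    value = block[start + klen:start + klen + vlen]
    return key, value, start + klen + vlen


def linear_lookup(block, target):
    """Walk the records front to back, as required without an offset index."""
    pos = 0
    while pos < len(block):
        key, value, pos = read_kv(block, pos)
        if key == target:
            return value
        if key > target:  # keys are sorted, so the target cannot appear later
            return None
    return None


def indexed_lookup(block, offsets, target):
    """Binary search over the recorded start offsets."""
    lo, hi = 0, len(offsets)
    while lo < hi:
        mid = (lo + hi) // 2
        key, value, _ = read_kv(block, offsets[mid])
        if key == target:
            return value
        if key < target:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Both lookups return the value for a present key and None for an absent one; the indexed variant decodes O(log n) records instead of O(n). Whether that matters for 64k blocks is exactly the open question raised in the thread, so this sketch says nothing about real-world gains.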