From: Mich Talebzadeh
Date: Fri, 21 Oct 2016 21:43:50 +0100
Subject: Re: Hbase fast access
To: user@hbase.apache.org

Thanks. Having read the docs, it appears to me that the main reasons for HBase being faster are:

1. It behaves like an RDBMS such as Oracle: reads are first looked up in the buffer cache for consistent reads, and if the data is not found there, the store files on disk are searched. Does this mean that this search is carried out through MapReduce on the region servers?
2. When data is written, it is written sequentially to a log file first, then to an in-memory store (sorted, like the B-tree of an RDBMS), and then flushed to disk. This is exactly what a checkpoint in an RDBMS does.
3. One can point out that HBase is faster because a log-structured merge tree (LSM-tree) has less depth than the B-tree of an RDBMS.
4. All updates are done in memory, so apart from the sequential log write there is no disk access.
5. In summary, LSM-trees reduce disk access when data is read from disk because of reduced seek time; again, there is less depth to traverse to get the data with an LSM-tree.

I would appreciate any comments.

Cheers

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On 21 October 2016 at 17:51, Ted Yu wrote:

> See this earlier blog post:
>
> http://www.cyanny.com/2014/03/13/hbase-architecture-analysis-part1-logical-architecture/
>
> W.r.t. compaction in Hive: it is used to compact deltas into a base file (in the context of transactions). They are likely different.
>
> Cheers
>
> On Fri, Oct 21, 2016 at 9:08 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>
> > Hi,
> >
> > Can someone explain, in a nutshell, HBase's use of the log-structured merge-tree (LSM-tree) as its data storage architecture?
> >
> > The idea is to merge smaller files into larger files periodically to reduce disk seeks. Is this a similar concept to compaction in HDFS or Hive?
> >
> > Thanks
> >
> > On 21 October 2016 at 15:27, Mich Talebzadeh wrote:
> >
> > > Sorry, that should read Hive, not Spark, here:
> > >
> > > "Say compared to Spark that is basically a SQL layer relying on different engines (mr, Tez, Spark) to execute the code"
> > >
> > > On 21 October 2016 at 13:17, Ted Yu wrote:
> > >
> > > > Mich:
> > > > Here is a brief description of the HBase architecture:
> > > > https://hbase.apache.org/book.html#arch.overview
> > > >
> > > > You can also get more details from Lars George's or Nick Dimiduk's books.
> > > >
> > > > HBase doesn't support SQL directly. There is no cost-based optimization.
> > > >
> > > > Cheers
> > > >
> > > > On Oct 21, 2016, at 1:43 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > This is a general question.
> > > > >
> > > > > Is HBase fast because it uses hash tables and provides random access, and because it stores the data in indexed HDFS files for faster lookups?
> > > > >
> > > > > Say, compared to Spark, which is basically a SQL layer relying on different engines (mr, Tez, Spark) to execute the code (although it has a cost-based optimizer), how does HBase fare, beyond relying on these engines?
> > > > >
> > > > > Thanks
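The write path, read path and compaction discussed in this thread (sequential log append, in-memory store, flush to sorted files, periodic merging of smaller files into larger ones) can be illustrated with a toy log-structured merge store. This is a minimal Python sketch under stated assumptions, not HBase code: the class `ToyLSMStore`, its `flush_threshold` parameter and the in-memory lists standing in for the write-ahead log and store files are all hypothetical. It leaves out deletes, multiple versions, the block cache, Bloom filters and crash recovery, and it compacts by rebuilding a dict and re-sorting rather than streaming through the sorted files as a real system would. Note that in this sketch, as in HBase, a read is answered directly by the component holding the data (memory first, then files from newest to oldest); no MapReduce job is involved.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical toy store for illustration only; this is NOT HBase code.
# It omits deletes, versions, the block cache, Bloom filters and crash recovery.
StoreFile = Tuple[List[str], List[str]]  # (sorted keys, values in the same order)


class ToyLSMStore:
    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold  # hypothetical knob: entries before a flush
        self.wal: List[Tuple[str, str]] = []     # append-only log, stands in for the WAL
        self.memstore: Dict[str, str] = {}       # in-memory buffer, stands in for the MemStore
        self.store_files: List[StoreFile] = []   # immutable sorted files, oldest first

    @staticmethod
    def _make_file(entries: Dict[str, str]) -> StoreFile:
        # Build an immutable file: keys sorted, values aligned with them.
        pairs = sorted(entries.items())
        return [k for k, _ in pairs], [v for _, v in pairs]

    def put(self, key: str, value: str) -> None:
        self.wal.append((key, value))  # 1. sequential append to the log
        self.memstore[key] = value     # 2. update memory; no existing file is modified
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # 3. Write the buffer out as a new sorted file and start a fresh buffer.
        if not self.memstore:
            return
        self.store_files.append(self._make_file(self.memstore))
        self.memstore = {}
        self.wal = []  # simplification: flushed edits need no replay after a crash

    def get(self, key: str) -> Optional[str]:
        # Memory first, then files from newest to oldest; the newest value wins.
        if key in self.memstore:
            return self.memstore[key]
        for keys, values in reversed(self.store_files):
            i = bisect.bisect_left(keys, key)  # binary search within one sorted file
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def compact(self) -> None:
        # Merge all files into one so reads touch fewer files; newer values overwrite older.
        merged: Dict[str, str] = {}
        for keys, values in self.store_files:  # oldest first
            merged.update(zip(keys, values))
        self.store_files = [self._make_file(merged)] if merged else []


if __name__ == "__main__":
    store = ToyLSMStore(flush_threshold=2)
    for k, v in [("row1", "a"), ("row2", "b"), ("row1", "c"), ("row3", "d")]:
        store.put(k, v)
    print(len(store.store_files), store.get("row1"))  # 2 c
    store.compact()
    print(len(store.store_files), store.get("row1"))  # 1 c
```

The trade-off the sketch shows: writes never modify existing files, so a read may have to check several files until compaction merges them back into one.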