Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
MIME-Version: 1.0
In-Reply-To: <CALte62x_e7-mWJL5WXuC67f597cz_+6U+VPfDvbAa1gxtMK9bA@mail.gmail.com>
References: <CAJ3fcbCytkXyRiqCn8PPxchVjyX75LAWZWWfh8uBicq1Q+a27w@mail.gmail.com>
 <4E6AF854-15AA-4CC0-A7E5-3A47071B3C42@gmail.com> <CAJ3fcbBYCi18Mr0eQLkSaAGgfTsUWszpp-wgwbTtsBWUw+ELXQ@mail.gmail.com>
 <CAJ3fcbBLC3n+nyf2GbGysjUmVdaPnCd=pxLPt0nB_bHPzWkyyA@mail.gmail.com> <CALte62x_e7-mWJL5WXuC67f597cz_+6U+VPfDvbAa1gxtMK9bA@mail.gmail.com>
From: Mich Talebzadeh <mich.talebzadeh@gmail.com>
Date: Fri, 21 Oct 2016 21:43:50 +0100
Message-ID: <CAJ3fcbAgmjjbd7wfLCmb-YQ2+h9NJNefUR=Y0YtnfPry=9siJw@mail.gmail.com>
Subject: Re: Hbase fast access
To: user@hbase.apache.org
Content-Type: multipart/alternative; boundary=001a114e19b0873b7c053f66198f
archived-at: Fri, 21 Oct 2016 20:44:08 -0000

--001a114e19b0873b7c053f66198f
Content-Type: text/plain; charset=UTF-8

thanks

Having read the docs, it appears to me that the main reasons HBase is
fast are:


   1. It behaves like an RDBMS such as Oracle: reads are looked for in
   the buffer cache for consistent reads and, if not found, the store files
   on disk are searched. Does this mean that this search is carried out
   through map-reduce on the region servers?
   2. When data is written, it is written to the log file sequentially
   first, then to the in-memory store, sorted like a B-tree in an RDBMS, and
   then flushed to disk. This is exactly what a checkpoint in an RDBMS does.
   3. One can point out that HBase is faster because a log-structured
   merge-tree (LSM-tree) has less depth than a B-tree in an RDBMS.
   4. All updates are done in memory, with no disk access.
   5. In summary, LSM-trees reduce disk access when data is read from disk
   because of reduced seek time, again because there is less depth to
   traverse to get data with an LSM-tree.
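
To make points 1 and 2 concrete, here is a minimal Python sketch of how I
picture the write and read paths. It is a toy under my own assumptions, not
HBase code: the class ToyLSMStore, its method names and the flush threshold
are all hypothetical, and the "log" and "store files" are plain in-memory
lists standing in for files on disk. In this toy a read is an ordinary
in-process lookup, and merging store files is a separate compact step.

```python
import bisect


class ToyLSMStore:
    """Toy log-structured store. All names are hypothetical, not HBase APIs."""

    def __init__(self, flush_threshold=2):
        self.log = []          # append-only log, written first (sequential)
        self.memstore = {}     # in-memory buffer holding the latest writes
        self.store_files = []  # immutable sorted "files", oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.log.append((key, value))  # step 1: append to the log
        self.memstore[key] = value     # step 2: update the in-memory store
        if len(self.memstore) >= self.flush_threshold:
            self.flush()               # step 3: flush when the buffer is full

    def flush(self):
        # Write the memstore out as one sorted, immutable store file.
        if self.memstore:
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        # Look in memory first, then in store files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            # Binary search: (key,) sorts just before any (key, value) pair.
            i = bisect.bisect_left(store_file, (key,))
            if i < len(store_file) and store_file[i][0] == key:
                return store_file[i][1]
        return None

    def compact(self):
        # Merge all store files into one; newer values overwrite older ones.
        merged = {}
        for store_file in self.store_files:
            merged.update(store_file)
        self.store_files = [sorted(merged.items())] if merged else []


store = ToyLSMStore(flush_threshold=2)
store.put("row1", "a")
store.put("row2", "b")  # buffer is full: first store file is written
store.put("row1", "c")  # newer value for row1
store.put("row3", "d")  # buffer is full again: second store file
store.compact()         # two store files merged into one
```

The sketch reads the newest store file first so that a later write to the
same key wins, and compaction keeps only that latest value.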


I would appreciate any comments.


cheers


Dr Mich Talebzadeh


LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw


http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


On 21 October 2016 at 17:51, Ted Yu <yuzhihong@gmail.com> wrote:

> See this earlier blog post:
>
> http://www.cyanny.com/2014/03/13/hbase-architecture-analysis-part1-logical-architecture/
>
> w.r.t. compaction in Hive, it is used to compact deltas into a base file
> (in the context of transactions).  Likely they're different.
>
> Cheers
>
> On Fri, Oct 21, 2016 at 9:08 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>
> > Hi,
> >
> > Can someone explain, in a nutshell, HBase's use of the log-structured
> > merge-tree (LSM-tree) as its data storage architecture?
> >
> > The idea of merging smaller files into larger files periodically to reduce
> > disk seeks: is this a similar concept to compaction in HDFS or Hive?
> >
> > Thanks
> >
> >
> > On 21 October 2016 at 15:27, Mich Talebzadeh <mich.talebzadeh@gmail.com>
> > wrote:
> >
> > > Sorry, that should read Hive, not Spark, here:
> > >
> > > Say compared to Spark that is basically a SQL layer relying on different
> > > engines (mr, Tez, Spark) to execute the code
> > >
> > >
> > > On 21 October 2016 at 13:17, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > >> Mich:
> > >> Here is a brief description of the HBase architecture:
> > >> https://hbase.apache.org/book.html#arch.overview
> > >>
> > >> You can also get more details from Lars George's or Nick Dimiduk's books.
> > >>
> > >> HBase doesn't support SQL directly. There is no cost-based optimization.
> > >>
> > >> Cheers
> > >>
> > >> > On Oct 21, 2016, at 1:43 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> > This is a general question.
> > >> >
> > >> > Is HBase fast because it uses hash tables and provides random access,
> > >> > and because it stores the data in indexed HDFS files for faster lookups?
> > >> >
> > >> > Say, compared to Spark, which is basically a SQL layer relying on different
> > >> > engines (mr, Tez, Spark) to execute the code (although it has a Cost Based
> > >> > Optimizer), how does HBase fare, beyond relying on these engines?
> > >> >
> > >> > Thanks
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>

--001a114e19b0873b7c053f66198f--