hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: hbase architecture doubts
Date Tue, 03 May 2016 02:11:51 GMT
On Mon, May 2, 2016 at 5:34 PM, Shushant Arora <shushantarora09@gmail.com>
wrote:

> Thanks Stack.
>
> 1. So at any time there will be two references: 1. the active memstore and
> 2. the snapshot memstore?
> The snapshot is initialised at flush time from the active memstore under a
> momentary lock; the active one is then discarded, reads are served from
> the snapshot, and writes go to a new active memstore.
>
>
Yes
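
To be a bit more precise, reads consult both the new active memstore and the
snapshot until the flush completes. Here is a minimal sketch of the swap, not
HBase's actual code: the class MemStoreSketch, its method names, and the
String keys and values are all made up for illustration (the real memstore
holds Cells and is driven by the region's flush machinery).

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified memstore: String keys/values instead of Cells.
class MemStoreSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile ConcurrentSkipListMap<String, String> active = new ConcurrentSkipListMap<>();
    private volatile ConcurrentSkipListMap<String, String> snapshot = new ConcurrentSkipListMap<>();

    // Writers share the read lock, so they only wait during the brief swap.
    void put(String key, String value) {
        lock.readLock().lock();
        try {
            active.put(key, value);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Reads check the active map first, then the snapshot still being flushed.
    String get(String key) {
        String v = active.get(key);
        return v != null ? v : snapshot.get(key);
    }

    // The swap under the write lock is two reference assignments, no copying.
    ConcurrentSkipListMap<String, String> snapshotForFlush() {
        lock.writeLock().lock();
        try {
            snapshot = active;
            active = new ConcurrentSkipListMap<>();
            return snapshot;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Called once the snapshot is safely in an hfile (the hfile is not modeled).
    void clearSnapshot() {
        lock.writeLock().lock();
        try {
            snapshot = new ConcurrentSkipListMap<>();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

A write that arrives after snapshotForFlush() lands in the new active map, so
it is not lost; it goes out with the next flush.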


> 2. The key of the CSLS is a KeyValue. Which part of the KeyValue is used
> when sorting the set: the whole KeyValue or just the row key? Does an HFile
> have a separate entry for each KeyValue, and are KeyValues of the same row
> key always stored contiguously in the HFile, though maybe not in the same
> block?
>
>
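The key part of the KeyValue, not just the row: row first, then column
family, qualifier, timestamp (newest first) and type. Value is not considered
in the sort.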
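
Within a row, cells are ordered further by column family, qualifier, and then
timestamp with the newest first. A rough sketch of that ordering, assuming a
made-up CellSketch class with String fields (HBase compares byte arrays and
also takes the cell type into account, which I leave out here):

```java
import java.util.Comparator;

// Hypothetical, simplified cell: Strings instead of byte arrays, no type byte.
class CellSketch {
    final String row;
    final String family;
    final String qualifier;
    final long timestamp;
    final String value;

    CellSketch(String row, String family, String qualifier, long timestamp, String value) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.value = value;
    }

    // Row, then family, then qualifier, then newest timestamp first.
    // The value never takes part in the comparison.
    static final Comparator<CellSketch> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp); // descending
    };
}
```

Put these in a ConcurrentSkipListSet built with CellSketch.ORDER and two cells
that differ only in value count as the same element, since value is not part
of the key.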

Yes, an HFile has a separate entry for each KeyValue (or 'Cell' in hbase-speak).

Cells in an HFile are sorted. Those with the same or nearby 'Cell' coordinates
sort together, so cells of one row are contiguous and usually land in the same
block, though a row can also straddle a block boundary.
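
How cells end up in blocks can be sketched like this. It is a toy under
assumed names (BlockWriterSketch, writeBlocks), with Strings standing in for
cells and lists standing in for HFile blocks; the real writer deals in encoded
bytes, indexes and checksums, none of which is modeled:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical block writer: walks a sorted snapshot and cuts it into blocks.
class BlockWriterSketch {
    // One pass over the sorted entries; a block is closed as soon as its
    // accumulated size reaches or exceeds blockSize, whatever row we are in.
    static List<List<String>> writeBlocks(ConcurrentSkipListMap<String, String> snapshot,
                                          int blockSize) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int bytes = 0;
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            current.add(e.getKey());
            bytes += e.getKey().length() + e.getValue().length();
            if (bytes >= blockSize) {
                blocks.add(current);
                current = new ArrayList<>();
                bytes = 0;
            }
        }
        if (!current.isEmpty()) {
            blocks.add(current); // trailing, partly filled block
        }
        return blocks;
    }
}
```

Since the cut depends only on size, the cells of one row stay adjacent in the
file but can fall on either side of a block boundary.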

St.Ack



> On Tue, May 3, 2016 at 12:05 AM, Stack <stack@duboce.net> wrote:
>
> > On Mon, May 2, 2016 at 10:06 AM, Shushant Arora <shushantarora09@gmail.com>
> > wrote:
> >
> > > Thanks Stack
> > >
> > > For point 2:
> > > I am concerned about HBase downtime for reads and writes.
> > > If the write lock is held only while we move aside the current
> > > MemStore, then a write to a key that arrives afterwards will update
> > > only the memstore, and the snapshot will not have that update. When the
> > > snapshot is dumped to an HFile, won't we lose the update?
> > >
> > >
> > >
> > No. The update is in the new, currently active MemStore. The update will
> > be included in the next flush, which is written to a new hfile.
> >
> > St.Ack
> >
> >
> >
> >
> >
> > > On Mon, May 2, 2016 at 9:06 PM, Stack <stack@duboce.net> wrote:
> > >
> > > > > On Mon, May 2, 2016 at 1:25 AM, Shushant Arora <shushantarora09@gmail.com>
> > > > > wrote:
> > > >
> > > > > Thanks!
> > > > >
> > > > > > A few doubts:
> > > > > >
> > > > > > 1. An LSM tree comprises two tree-like
> > > > > > <https://en.wikipedia.org/wiki/Tree_(data_structure)> structures,
> > > > > > called C0 and C1. If an insertion causes the C0 component to exceed
> > > > > > a certain size threshold, a contiguous segment of entries is removed
> > > > > > from C0 and merged into C1 on disk.
> > > > > >
> > > > > > But in HBase, when C0 (the memstore, I guess?) exceeds the threshold
> > > > > > size, it is dumped to HDFS as an HFile (C1, I guess?). Is compaction
> > > > > > then the process that merges C0 and C1?
> > > > >
> > > > >
> > > > > The 'merge' in the quoted high-level description may just mean that
> > > > > the dumped hfile is 'merged' with the others at read time. Or it may
> > > > > be as stated, that the 'merge' happens at flush time. Some LSM tree
> > > > > implementations do it this way -- Bigtable, and it calls the merge of
> > > > > memstore and a file-on-disk a form of compaction -- but this is not
> > > > > what HBase does; it just dumps the memstore as a flushed hfile. Later,
> > > > > we'll run a compaction process to merge hfiles in background.
> > > >
> > > >
> > > >
> > > > > > 2. Moves current, active Map aside as a snapshot (while a write lock
> > > > > > is held for a short period of time), and then creates a new CSLS
> > > > > > instance.
> > > > > >
> > > > > > In background, the snapshot is then dumped to disk. We get an
> > > > > > Iterator on CSLS. We write a block at a time. When we exceed
> > > > > > configured block size, we start a new one.
> > > > > >
> > > > > > -- Is the write lock held until the complete CSLS is dumped to
> > > > > > disk?
> > > >
> > > >
> > > >
> > > > No. Just while we move aside the current MemStore.
> > > >
> > > > > What is your concern/objective? Are you studying LSM trees generally
> > > > > or are you worried that HBase is offline for periods of time for read
> > > > > and write?
> > > >
> > > > Thanks,
> > > > St.Ack
> > > >
> > > >
> > > >
> > > > > > And are reads allowed using the snapshot?
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > > Thanks!
> > > > >
> > > > >
> > > > >
> > > > > On Mon, May 2, 2016 at 11:39 AM, Stack <stack@duboce.net> wrote:
> > > > >
> > > > > > > On Sun, May 1, 2016 at 3:36 AM, Shushant Arora <shushantarora09@gmail.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > > 1. Does HBase use ConcurrentSkipListMap (CSLM) to store data in
> > > > > > > > the memstore?
> > > > > > > >
> > > > > > > Yes (We use a CSLS but this is implemented over a CSLM).
> > > > > >
> > > > > >
> > > > > > > > 2. When the memstore is flushed to HDFS, does it dump the memstore
> > > > > > > > ConcurrentSkipList as HFile2? Then how does it calculate blocks
> > > > > > > > out of the CSLM and dump them in HDFS?
> > > > > > >
> > > > > > >
> > > > > > > Moves current, active Map aside as a snapshot (while a write lock
> > > > > > > is held for a short period of time), and then creates a new CSLS
> > > > > > > instance.
> > > > > > >
> > > > > > > In background, the snapshot is then dumped to disk. We get an
> > > > > > > Iterator on CSLS. We write a block at a time. When we exceed
> > > > > > > configured block size, we start a new one.
> > > > > >
> > > > > >
> > > > > > > > 3. After dumping the in-memory CSLM of the memstore to an HFile,
> > > > > > > > is the memstore content discarded?
> > > > > >
> > > > > >
> > > > > > Yes
> > > > > >
> > > > > >
> > > > > >
> > > > > > > > And if a read request comes in while the memstore is being dumped,
> > > > > > > > will it be answered from a copy of the memstore, or will discarding
> > > > > > > > the memstore be blocked until the read request completes?
> > > > > > > >
> > > > > > > We will respond using the snapshot until it has been successfully
> > > > > > > dumped. Once dumped, we'll respond using the hfile.
> > > > > > >
> > > > > > > No blocking (other than for the short period during which the
> > > > > > > snapshot is made and the file is swapped into the read path).
> > > > > >
> > > > > >
> > > > > >
> > > > > > > > 4. When a read request comes, does it look in the in-memory CSLM
> > > > > > > > and then in the HFile?
> > > > > >
> > > > > >
> > > > > > Generally, yes.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > > And what is a Log-Structured Merge tree, and how is it used in HBase?
> > > > > > >
> > > > > > >
> > > > > > > Suggest you read up on LSM Trees
> > > > > > > (https://en.wikipedia.org/wiki/Log-structured_merge-tree) and if you
> > > > > > > still can't see the LSM tree in the HBase forest, ask specific
> > > > > > > questions and we'll help you out.
> > > > > >
> > > > > > St.Ack
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
