hbase-user mailing list archives

From Shushant Arora <shushantaror...@gmail.com>
Subject Re: hbase architecture doubts
Date Mon, 09 May 2016 11:27:28 GMT
Thanks!

1. Will a write take a lock on all the column families, or just on the
column family affected by the write?

2. How is eviction in LruBlockCache implemented for the in-memory and
multi-access priorities? Say all elements in the in-memory priority area
(25%) were used more recently than those in the single-access and
multi-access areas. If a new in-memory row comes in, will it evict from the
in-memory area or from the single-access area?

3. Why is there a single block cache per regionserver? Why not one per
region?


On Sun, May 8, 2016 at 11:43 PM, Stack <stack@duboce.net> wrote:

> On Sun, May 8, 2016 at 6:12 AM, Shushant Arora <shushantarora09@gmail.com>
> wrote:
>
> > Thanks !
> >
> > One doubt regarding locking in the memstore:
> >
> > HBase takes an implicit row lock while applying a put operation on a row.
> >
> > put(byte[] rowkey).
> >
> > When htable.put(p) is fired, the regionserver will lock the row, but get
> > operations will not lock the row and will return the row state as it was
> > before the put took the lock.
> >
> > The memstore is implemented as a CSLM, so how does it return the row state
> > from before the put lock when a get is fired before the put is finished?
> >
> >
> Multiversion Concurrency Control. This is the core class:
>
> http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/MultiVersionConcurrencyControl.html
> See how it is used in the codebase.
>
> Ask more questions if not clear.
> St.Ack
>
>
>
> > On Tue, May 3, 2016 at 7:41 AM, Stack <stack@duboce.net> wrote:
> >
> > > On Mon, May 2, 2016 at 5:34 PM, Shushant Arora <shushantarora09@gmail.com>
> > > wrote:
> > >
> > > > Thanks Stack.
> > > >
> > > > 1. So at any time there will be two references: 1. the active memstore
> > > > and 2. the snapshot memstore. The snapshot will be initialised at flush
> > > > time from the active memstore under a momentary lock; the old active
> > > > reference is then dropped, reads are served using the snapshot, and
> > > > writes go to a new active memstore.
> > > >
> > > >
> > > Yes
> > >
> > >
> > > > 2. The key of the CSLS is a KeyValue. Which part of the KeyValue is used
> > > > when sorting the set? Is it the whole KeyValue or just the row key? Does
> > > > an HFile have a separate entry for each KeyValue, and are KeyValues of
> > > > the same row key always stored contiguously in the HFile, though
> > > > possibly not in the same block?
> > > >
> > > >
> > > Just the key part of the KeyValue (row, column family, qualifier,
> > > timestamp, type). Value is not considered in the sort.
> > >
> > > Yes, HFile has separate entry for each KeyValue (or 'Cell' in
> > > hbase-speak).
> > >
> > > Cells in HFile are sorted. Those of the same or near 'Cell' coordinates
> > > will be sorted together and may therefore appear inside the same block.
> > >
> > > St.Ack
> > >
> > >
> > >
> > > > On Tue, May 3, 2016 at 12:05 AM, Stack <stack@duboce.net> wrote:
> > > >
> > > > > On Mon, May 2, 2016 at 10:06 AM, Shushant Arora <shushantarora09@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Stack
> > > > > >
> > > > > > For point 2:
> > > > > > I am concerned about downtime of HBase for reads and writes.
> > > > > > If the write lock is held just for the time it takes to move aside
> > > > > > the current MemStore, then when a write happens to a key, will it
> > > > > > update only the memstore while the snapshot does not have that
> > > > > > update, and when the snapshot is dumped to an HFile, won't we lose
> > > > > > the update?
> > > > > >
> > > > > >
> > > > > >
> > > > > No. The update is in the new currently active MemStore. The update
> > > > > will be included in the next flush added to a new hfile.
> > > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On Mon, May 2, 2016 at 9:06 PM, Stack <stack@duboce.net> wrote:
> > > > > >
> > > > > > > On Mon, May 2, 2016 at 1:25 AM, Shushant Arora <shushantarora09@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > Few doubts;
> > > > > > > >
> > > > > > > > 1. An LSM tree comprises two tree-like
> > > > > > > > <https://en.wikipedia.org/wiki/Tree_(data_structure)> structures,
> > > > > > > > called C0 and C1. If an insertion causes the C0 component to
> > > > > > > > exceed a certain size threshold, a contiguous segment of entries
> > > > > > > > is removed from C0 and merged into C1 on disk.
> > > > > > > >
> > > > > > > > But in HBase, when C0 (the memstore, I guess?) exceeds the
> > > > > > > > threshold size, it is dumped to HDFS as an HFile (C1, I guess?).
> > > > > > > > Is compaction then the process that merges C0 and C1?
> > > > > > > >
> > > > > > > >
> > > > > > > The 'merge' in the quoted high-level description may just mean
> > > > > > > that the dumped hfile is 'merged' with the others at read time.
> > > > > > > Or it may be as stated, that the 'merge' happens at flush time.
> > > > > > > Some LSM tree implementations do it this way -- Bigtable, and it
> > > > > > > calls the merge of memstore and a file-on-disk a form of
> > > > > > > compaction -- but this is not what HBase does; it just dumps the
> > > > > > > memstore as a flushed hfile. Later, we'll run a compaction
> > > > > > > process to merge hfiles in background.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > 2. Moves current, active Map aside as a snapshot (while a write
> > > > > > > > lock is held for a short period of time), and then creates a new
> > > > > > > > CSLS instance.
> > > > > > > >
> > > > > > > > In background, the snapshot is then dumped to disk. We get an
> > > > > > > > Iterator on CSLS. We write a block at a time. When we exceed
> > > > > > > > configured block size, we start a new one.
> > > > > > > >
> > > > > > > > -- Is the write lock held until the complete CSLS is dumped to
> > > > > > > > disk?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > No. Just while we move aside the current MemStore.
> > > > > > >
> > > > > > > What is your concern/objective? Are you studying LSM trees
> > > > > > > generally or are you worried that HBase is offline for periods of
> > > > > > > time for read and write?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > St.Ack
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > And read is allowed using snapshot.
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, May 2, 2016 at 11:39 AM, Stack <stack@duboce.net> wrote:
> > > > > > > >
> > > > > > > > > On Sun, May 1, 2016 at 3:36 AM, Shushant Arora <shushantarora09@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > 1. Does HBase use ConcurrentSkipListMap (CSLM) to store data in
> > > > > > > > > > the memstore?
> > > > > > > > > >
> > > > > > > > > Yes (We use a CSLS but this is implemented over a CSLM).
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > 2. When the memstore is flushed to HDFS, does it dump the
> > > > > > > > > > memstore ConcurrentSkipListMap as HFile2? How does it then
> > > > > > > > > > calculate blocks out of the CSLM and dump them to HDFS?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > Moves current, active Map aside as a snapshot (while a write
> > > > > > > > > lock is held for a short period of time), and then creates a
> > > > > > > > > new CSLS instance.
> > > > > > > > >
> > > > > > > > > In background, the snapshot is then dumped to disk. We get an
> > > > > > > > > Iterator on CSLS. We write a block at a time. When we exceed
> > > > > > > > > configured block size, we start a new one.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > 3. After dumping the in-memory CSLM of the memstore to an HFile,
> > > > > > > > > > is the memstore content discarded?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > And if a read request comes in while the memstore is being
> > > > > > > > > > dumped, will it be answered from a copy of the memstore, or will
> > > > > > > > > > discarding the memstore be blocked until the read request is
> > > > > > > > > > completed?
> > > > > > > > > >
> > > > > > > > > We will respond using the snapshot until it has been
> > > > > > > > > successfully dumped. Once dumped, we'll respond using the hfile.
> > > > > > > > >
> > > > > > > > > No blocking (other than for the short period during which the
> > > > > > > > > snapshot is made and the file is swapped into the read path).
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > 4. When a read request comes in, does it look in the in-memory
> > > > > > > > > > CSLM and then in the HFile?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Generally, yes.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > And what is a log-structured merge tree, and how is it used in
> > > > > > > > > > HBase?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > Suggest you read up on LSM Trees (
> > > > > > > > > https://en.wikipedia.org/wiki/Log-structured_merge-tree) and if
> > > > > > > > > you still can't see the LSM tree in the HBase forest, ask
> > > > > > > > > specific questions and we'll help you out.
> > > > > > > > >
> > > > > > > > > St.Ack
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
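
For readers following the MVCC answer quoted above, here is a rough sketch of the idea in Java. It is not HBase's MultiVersionConcurrencyControl class: the class name MvccSketch, its methods, and the use of plain strings for rows and values are all assumptions made for illustration. It also assumes writes complete in the order they began, which a real implementation has to handle more carefully. Each put is tagged with a write number, and a get only sees cells whose write number is at or below the read point, so an unfinished put stays invisible without the reader taking the row lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of multiversion concurrency control; not HBase code.
class MvccSketch {
    private final AtomicLong nextWriteNumber = new AtomicLong(0);
    // Highest write number whose cells readers are allowed to see.
    private volatile long readPoint = 0;
    // row -> (write number -> value); both levels are sorted skip-list maps.
    private final ConcurrentSkipListMap<String, ConcurrentSkipListMap<Long, String>> rows =
            new ConcurrentSkipListMap<>();

    // A writer first asks for a write number.
    long begin() {
        return nextWriteNumber.incrementAndGet();
    }

    // Cells go into the map right away, tagged with the write number.
    void apply(long writeNumber, String row, String value) {
        rows.computeIfAbsent(row, r -> new ConcurrentSkipListMap<>()).put(writeNumber, value);
    }

    // Assumption of this sketch: writes complete in the order they began.
    void complete(long writeNumber) {
        readPoint = writeNumber;
    }

    long readPoint() {
        return readPoint;
    }

    // A reader ignores every version newer than the read point it started with.
    String get(String row, long atReadPoint) {
        ConcurrentSkipListMap<Long, String> versions = rows.get(row);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> visible = versions.floorEntry(atReadPoint);
        return visible == null ? null : visible.getValue();
    }

    public static void main(String[] args) {
        MvccSketch store = new MvccSketch();
        long first = store.begin();
        store.apply(first, "row1", "v1");
        store.complete(first);

        long readerPoint = store.readPoint();   // reader starts here
        long second = store.begin();
        store.apply(second, "row1", "v2");      // put still in flight
        System.out.println(store.get("row1", readerPoint));
        store.complete(second);
        System.out.println(store.get("row1", store.readPoint()));
    }
}
```

The reader in main takes its read point before the second put completes, so its get returns the earlier value even though the newer cell is already in the skip list.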

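The snapshot-and-flush sequence described in the quoted thread (move the active map aside under a brief write lock, dump the snapshot a block at a time, keep serving reads throughout) can be sketched roughly as below. This is only an illustration under stated assumptions: MemStoreSketch, takeSnapshot, flush and the in-memory list standing in for hfiles are hypothetical names, keys and values are plain strings, block size is counted in characters, and each snapshot is assumed to be flushed before the next one is taken.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration of memstore snapshot and flush; not HBase code.
class MemStoreSketch {
    private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
    private volatile ConcurrentSkipListMap<String, String> active = new ConcurrentSkipListMap<>();
    private volatile ConcurrentSkipListMap<String, String> snapshot = new ConcurrentSkipListMap<>();
    // Stand-in for flushed hfiles, oldest first.
    private final List<Map<String, String>> flushedFiles = new CopyOnWriteArrayList<>();

    // Writers share the read side of the lock, so they do not block each other.
    void put(String key, String value) {
        updatesLock.readLock().lock();
        try {
            active.put(key, value);
        } finally {
            updatesLock.readLock().unlock();
        }
    }

    // The exclusive lock is held only while the two references are swapped.
    void takeSnapshot() {
        updatesLock.writeLock().lock();
        try {
            snapshot = active;
            active = new ConcurrentSkipListMap<>();
        } finally {
            updatesLock.writeLock().unlock();
        }
    }

    // Iterates the snapshot in sorted order and cuts a new block once the
    // current one reaches blockSize. Returns the blocks that were "written".
    List<List<String>> flush(int blockSize) {
        Map<String, String> toFlush = snapshot;
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int size = 0;
        for (Map.Entry<String, String> e : toFlush.entrySet()) {
            current.add(e.getKey() + "=" + e.getValue());
            size += e.getKey().length() + e.getValue().length();
            if (size >= blockSize) {
                blocks.add(current);
                current = new ArrayList<>();
                size = 0;
            }
        }
        if (!current.isEmpty()) {
            blocks.add(current);
        }
        flushedFiles.add(toFlush);                  // file enters the read path first
        snapshot = new ConcurrentSkipListMap<>();   // then the snapshot is dropped
        return blocks;
    }

    // Reads check the active map, then the snapshot, then files newest first.
    String get(String key) {
        String value = active.get(key);
        if (value == null) {
            value = snapshot.get(key);
        }
        for (int i = flushedFiles.size() - 1; value == null && i >= 0; i--) {
            value = flushedFiles.get(i).get(key);
        }
        return value;
    }

    public static void main(String[] args) {
        MemStoreSketch store = new MemStoreSketch();
        store.put("r1", "a");
        store.put("r2", "b");
        store.takeSnapshot();
        store.put("r3", "c");   // lands in the new active map, not the snapshot
        System.out.println(store.flush(3).size() + " blocks flushed");
        System.out.println(store.get("r1") + " " + store.get("r3"));
    }
}
```

A write that arrives after takeSnapshot goes to the new active map, so it is not lost; it is simply picked up by the next snapshot and flush.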