hbase-dev mailing list archives

From Mikael Sitruk <mikael.sit...@gmail.com>
Subject Re: Major Compaction Concerns
Date Fri, 13 Jan 2012 23:23:24 GMT
Hi

Nicolas - can you point to the lazy seek patch you referenced? Like Ted, I
can't find "lazy seek" in the release notes, unless you are referring to
HBASE-4434.

> Filling up the logs faster than you can flush normally indicates that you
> have disk or network saturation.  If you have an increment workload, I
> know there are a number of patches in 0.92 that will drastically reduce
> your flush size (1: read memstore before going to disk, 2: don't flush all
> versions).  You don't have a compaction problem, you have a write/read
> problem.

I'm sorry but I don't understand. Of course I have disk and network
saturation, and the flush stops flushing because it is waiting for the
compaction to finish. Since a major compaction was triggered, all the
stores (a large number) present on the disks (7 disks per RS) are grabbed
for major compaction, and the I/O is affected. The network is also affected,
since all are major compacting at the same time and replicating files at the
same time (1GB network).
I don't have an increment workload (the workload either updates columns on a
CF or adds columns on a CF for the same key), so how would those patches help?

> Per-storefile compactions & multi-threaded compactions were added in 0.92 to
> address this problem.  However, a high StoreFile count is not necessarily
> a bad thing.  For an update workload, you only have to read the newest
> StoreFile and lazy seek optimizes your situation a lot (again 0.92).
I'm not saying this is a bad thing, it is just an observation from our tests:
HBase will slow down the flush when too many store files are present, which
adds pressure on GC and memory and hurts performance.
The update workload does not send the whole row content for a given key, so
only partial data is written. In order to get the full row, I presume that
reading the newest store file is not enough ("all" store files need to be
read, collecting the most up-to-date value of each field to rebuild the full
row), or am I missing something?
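
In other words, reading the row back is something like the sketch below
(again, names are made up), and as far as I understand the Get has to merge
the newest cell of every column, wherever that cell currently sits:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class FullRowRead {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");  // made-up table name

    // A Get with no column restriction returns the latest version of every
    // column in the row. Since each Put wrote only some of the columns, the
    // latest cells can be spread over the memstore and several StoreFiles,
    // so more than just the newest file may have to be consulted.
    Get get = new Get(Bytes.toBytes("row-42"));
    Result row = table.get(get);
    for (KeyValue kv : row.raw()) {
      System.out.println(Bytes.toString(kv.getFamily()) + ":"
          + Bytes.toString(kv.getQualifier()) + " = "
          + Bytes.toString(kv.getValue()));
    }
  }
}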

> There are mostly 3 different workloads that require different
> optimizations (not necessarily compaction-related):
> 1. Read old data.  Should properly use bloom filters to filter out
> StoreFiles
> 2. R+W.  Will really benefit from lazy seeks & cache on write (0.92).  Far
> more than a compaction algorithm
> 3. Write mostly.  Don't really care about compactions here.  Just don't
> want them to be sucking too much IO

1. If I did not set a specific property for bloom filters (BF), does that
mean I'm not using them (the book only refers to BF in the context of a CF)?
See the sketch below for my current understanding.
3. How can we ensure that compactions will not suck too much I/O if we cannot
control major compaction?
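
For what it's worth, my current understanding is the following - a rough
sketch against the 0.92 client API, table/CF names are placeholders, please
correct me if it's wrong: bloom filters default to NONE and have to be
enabled explicitly per CF, and time-based major compactions can be switched
off by setting hbase.hregion.majorcompaction to 0 in hbase-site.xml and then
triggered manually during an off-peak window.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class BloomAndCompactionSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // 1. Bloom filters: off (NONE) unless enabled per column family.
    HTableDescriptor table = new HTableDescriptor("mytable");  // made-up name
    HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
    cf1.setBloomFilterType(StoreFile.BloomType.ROW);  // row-level bloom filter
    table.addFamily(cf1);
    admin.createTable(table);

    // 3. With hbase.hregion.majorcompaction=0 in hbase-site.xml the periodic
    //    major compactions stop, and they can be requested manually instead:
    admin.majorCompact("mytable");
  }
}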

Thanks  & Regards
Mikael.S

On Fri, Jan 13, 2012 at 12:20 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:

> HBASE-4465 is not needed for correctness.
> Personally I'd rather release 0.94 sooner rather than backporting
> non-trivial patches.
>
> I realize I am guilty of this myself (see HBASE-4838... although that was
> an important correctness fix)
>
> -- Lars
>
> ________________________________
> From: Ted Yu <yuzhihong@gmail.com>
> To: dev@hbase.apache.org
> Cc: Mikael Sitruk <mikael.sitruk@gmail.com>
> Sent: Thursday, January 12, 2012 2:09 PM
> Subject: Re: Major Compaction Concerns
>
> Thanks for the tips, Nicolas.
>
> About lazy seek, if you were referring to HBASE-4465, that was only
> integrated into TRUNK and 0.89-fb.
> I was thinking about backporting it to 0.92
>
> Cheers
>
> On Thu, Jan 12, 2012 at 1:44 PM, Nicolas Spiegelberg <nspiegelberg@fb.com> wrote:
>
> > Mikael,
> >
> > >The system is an OLTP system, with strict latency and throughput
> > >requirements, regions are pre-splitted and throughput is controlled.
> > >
> > >The system has heavy load period for few hours, during heavy load i mean
> > >high proportion insert/update and small proportion of read.
> >
> > I'm not sure about the production status of your system, but you sound
> > like you have critical need for dozens of optimization features coming out
> > in 0.92 and even some trunk patches.  In particular, update speed has been
> > drastically improved due to lazy seek.  Although you can get incremental
> > wins with different compaction features, you will get exponential wins
> > from looking at other features right now.
> >
> > >we fall in the memstore flush throttling (
> > >will wait 90000 ms before flushing the memstore) retaining more logs,
> > >triggering more flush that can't be flushed.... adding pressure on the
> > >system memory (memstore is not flushed on time)
> >
> > Filling up the logs faster than you can flush normally indicates that you
> > have disk or network saturation.  If you have an increment workload, I
> > know there are a number of patches in 0.92 that will drastically reduce
> > your flush size (1: read memstore before going to disk, 2: don't flush all
> > versions).  You don't have a compaction problem, you have a write/read
> > problem.
> >
> > In 0.92, you can try setting your compaction.ratio down (0.25 is a good
> > start) to increase the StoreFile count to slow reads but save Network IO
> > on write.  This setting is very similar to the defaults suggested in the
> > BigTable paper.  However, this is only going to cut your Network IO in
> > half.  The LevelDB or BigTable algorithm can reduce your outlier StoreFile
> > count, but they wouldn't be able to cut this IO volume down much either.
> >
> > >Please remember i'm on 0.90.1 so when major compaction is running minor is
> > >blocked, when a memstore for a column family is flushed all other memstore
> > >(for other) column family are also (no matter if they are smaller or not).
> > >As you already wrote, the best way is to manage compaction, and it is what
> > >i tried to do.
> >
> > Per-storefile compactions & multi-threaded compactions were added in 0.92 to
> > address this problem.  However, a high StoreFile count is not necessarily
> > a bad thing.  For an update workload, you only have to read the newest
> > StoreFile and lazy seek optimizes your situation a lot (again 0.92).
> >
> > >Regarding the compaction plug-ability needs.
> > >Let suppose that the data you are inserting in different column family has
> > >a different pattern, for example on CF1 (column family #1) you update
> > >fields in the same row key while in CF2 you add each time new fields or
> > >CF2 has new row and older rows are never updated won't you use different
> > >algorithms for compacting these CF?
> >
> > There are mostly 3 different workloads that require different
> > optimizations (not necessarily compaction-related):
> > 1. Read old data.  Should properly use bloom filters to filter out
> > StoreFiles
> > 2. R+W.  Will really benefit from lazy seeks & cache on write (0.92).  Far
> > more than a compaction algorithm
> > 3. Write mostly.  Don't really care about compactions here.  Just don't
> > want them to be sucking too much IO
> >
> > >Finally the schema design is guided by the ACID property of a row, we have
> > >2 CF only both CF holds a different volume of data even if they are
> > >Updated approximately with the same amount of data (cell updated vs cell
> > >created).
> >
> > Note that 0.90 only had row-based write atomicity.  HBASE-2856 is
> > necessary for row-based read atomicity across column families.
> >
> >
>



-- 
Mikael.S
