accumulo-user mailing list archives

From Hai Pham <htp0...@tigermail.auburn.edu>
Subject Re: How to control Minor Compaction by programming
Date Fri, 31 Jul 2015 20:31:26 GMT
Thanks Josh and everyone. You all helped greatly.
Hai 
________________________________________
From: Josh Elser <josh.elser@gmail.com>
Sent: Friday, July 31, 2015 3:09 PM
To: user@accumulo.apache.org
Subject: Re: How to control Minor Compaction by programming

You may benefit from reading the following section:

http://accumulo.apache.org/1.6/accumulo_user_manual.html#_administration_configuration

Specifically the formula for choosing a new size for
tserver.memory.maps.max.
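
For reference, here is a minimal sketch of bumping the relevant
properties through the Java client API. The instance name, ZooKeeper
host, and credentials below are placeholders, and the relation in the
comment is only my paraphrase of the manual's guidance:

    // Sketch: raising the in-memory map size. All names and
    // credentials here are placeholders.
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;

    public class TuneInMemoryMaps {
      public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
            .getConnector("root", new PasswordToken("secret"));
        // Roughly: keep table.compaction.minor.logs.threshold
        //   * tserver.walog.max.size >= tserver.memory.maps.max
        // so WAL pressure doesn't force minor compactions early.
        conn.instanceOperations().setProperty("tserver.memory.maps.max", "2G");
        conn.instanceOperations().setProperty("tserver.walog.max.size", "1G");
        // The native map size is read at tserver startup, so restart
        // the tablet servers for the new size to take effect.
      }
    }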

Hai Pham wrote:
> Hi John,
>
>
> Per your advice, I will test other options for the number of splits.
>
>
> My native map size is 1G (the default). I am also trying to increase
> it. If the problem reproduces, I will have more information to
> provide. Thank you!
>
>
> Hai
>
>
> ------------------------------------------------------------------------
> *From:* John Vines <vines@apache.org>
> *Sent:* Friday, July 31, 2015 11:12 AM
> *To:* user@accumulo.apache.org
> *Subject:* Re: How to control Minor Compaction by programming
> If you have only 4/8 tablets for 4 tservers, you're not really
> parallelizing well.
>
> That doesn't explain a 5-minute hold time, though; that is strange.
> How large is your in-memory map size?
>
>
> On Fri, Jul 31, 2015 at 11:53 AM Hai Pham <htp0005@tigermail.auburn.edu> wrote:
>
>     Hi Josh and John,
>
>
>     Correct. Since one of my constraints was time, I tested both with
>     the WAL flushed and with the WAL disabled, and the lost-data case
>     happened in WAL-disabled mode - my mistake for not describing that.
>
>
>     I have 1 master + 16 Hadoop slaves under Accumulo, all CentOS 6.5
>     physical boxes with at least 500GB of disk and 24GB RAM each, but
>     the network is only 1G. DFS replication = 3 by default. I tested
>     with 4 and 8 splits; the hold-time problem happened more often
>     with 4 splits. And you are right, changing the flushing scheme
>     remediated the problem.
>
>
>     Thank you a lot!
>
>     Hai
>
>     ------------------------------------------------------------------------
>     *From:* John Vines <vines@apache.org>
>     *Sent:* Friday, July 31, 2015 10:29 AM
>     *To:* user@accumulo.apache.org
>
>     *Subject:* Re: How to control Minor Compaction by programming
>     Data could be lost if walogs were disabled or configured to use a
>     poor flushing mechanism.
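>
>     (For illustration, a sketch of the per-table switch involved,
>     with placeholder connection details; disabling the WAL trades
>     durability for ingest speed:)
>
>     // Sketch only; instance, credentials, and table name are
>     // placeholders.
>     import org.apache.accumulo.core.client.Connector;
>     import org.apache.accumulo.core.client.ZooKeeperInstance;
>     import org.apache.accumulo.core.client.security.tokens.PasswordToken;
>
>     public class WalToggle {
>       public static void main(String[] args) throws Exception {
>         Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
>             .getConnector("root", new PasswordToken("secret"));
>         // With the WAL off, entries still in memory are lost if the
>         // tablet server dies before a minor compaction persists them.
>         conn.tableOperations().setProperty("mytable",
>             "table.walog.enabled", "false");
>       }
>     }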
>
>     However, I'm also concerned about the hold times from a single
>     ingest being enough to bring down a server. What's the environment
>     you're running in? Are these virtualized or real servers? How many
>     splits did you make? How many disks per node do you have? And are
>     you using default hdfs replication?
>
>     On Fri, Jul 31, 2015 at 11:11 AM Josh Elser <josh.elser@gmail.com> wrote:
>
>
>         Hai Pham wrote:
>          > Hi Keith,
>          >
>          >
>          > I have 4 tablet servers + 1 master. I also did a pre-split before
>          > ingesting and it increased the speed a lot.
>          >
>          >
>          > And you're right, when I created too many ingest threads,
>          > many of them were queued in the thread pools and the hold
>          > time increased. During one intense ingest, a tablet server
>          > was killed by the master because its hold time exceeded 5
>          > min. In that situation, all tablets were stuck. Only after
>          > that server died did ingest resume at a comparable speed.
>          > But the entries on the dead server were all gone and lost
>          > from the table.
>
>         You're saying that you lost data? If a server dies, all of
>         the tablets that were hosted there are reassigned to other
>         servers. This is done in a manner that guarantees that there
>         is no data lost in this transition. If you actually lost
>         data, this would be a critical bug, but I would certainly
>         hope you just didn't realize that the data was automatically
>         being hosted by another server.
>
>          > I have had no idea how to repair this except to regulate
>          > the number of ingest threads and the ingest speed to be
>          > more friendly to Accumulo itself.
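>          >
>          > (A sketch of the client-side knobs for that, with
>          > placeholder values, assuming a Connector conn as in the
>          > earlier sketches:)
>          >
>          > import java.util.concurrent.TimeUnit;
>          > import org.apache.accumulo.core.client.BatchWriter;
>          > import org.apache.accumulo.core.client.BatchWriterConfig;
>          >
>          > // Throttle ingest by capping the client-side buffer and
>          > // the number of send threads.
>          > BatchWriterConfig cfg = new BatchWriterConfig()
>          >     .setMaxMemory(32 * 1024 * 1024)      // 32 MB buffer
>          >     .setMaxLatency(2, TimeUnit.SECONDS)  // flush every 2s
>          >     .setMaxWriteThreads(4);              // cap send threads
>          > BatchWriter bw = conn.createBatchWriter("mytable", cfg);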
>          >
>          >
>          > Another mystery to me: I pre-split to, e.g., 8 tablets,
>          > but during ingest the tablet count keeps increasing (e.g.
>          > to 10, 14 or more). Any idea?
>
>         Yep, Accumulo will naturally split tablets when they exceed
>         a certain size (1GB by default for normal tables). Unless
>         you increase the property table.split.threshold, you will
>         observe more tablets as you ingest more data.
>
>         Given enough time, Accumulo will naturally split your table
>         enough. Pre-splitting just gets you to a good level of
>         performance right away; a sketch of raising the threshold is
>         below.
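>
>         (A sketch, assuming a Connector conn as in the other
>         examples and a placeholder table name:)
>
>         // Raise the split threshold so tablets grow to 4G before
>         // Accumulo splits them automatically.
>         conn.tableOperations().setProperty("mytable",
>             "table.split.threshold", "4G");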
>
>          >
>          > Hai
>          >
>         ------------------------------------------------------------------------
>          > *From:* Keith Turner <keith@deenlo.com>
>          > *Sent:* Friday, July 31, 2015 8:39 AM
>          > *To:* user@accumulo.apache.org
>          > *Subject:* Re: How to control Minor Compaction by programming
>          > How many tablets do you have? Entire tablets are minor
>          > compacted at once. If you have 1 tablet per tablet
>          > server, then minor compactions will have a lot of work to
>          > do at once. While this work is being done, the tablet
>          > server's memory may fill up, leading to writes being held.
>          >
>          > If you have 10 tablets per tablet server, then tablets
>          > can be compacted in parallel w/ less work to do at any
>          > given point in time. This can avoid memory filling up and
>          > writes being held.
>          >
>          > In short, it's possible that adding good split points to
>          > the table (and therefore creating more tablets) may help
>          > w/ this issue; a sketch is below.
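>          >
>          > (A sketch of doing that programmatically, with placeholder
>          > split points and table name, assuming a Connector conn:)
>          >
>          > import java.util.SortedSet;
>          > import java.util.TreeSet;
>          > import org.apache.hadoop.io.Text;
>          >
>          > // Pre-split so ingest and minor compactions can
>          > // parallelize across tablets.
>          > SortedSet<Text> splits = new TreeSet<Text>();
>          > for (String row : new String[] {"b", "d", "f", "h"})
>          >   splits.add(new Text(row));
>          > conn.tableOperations().addSplits("mytable", splits);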
>          >
>          > Also, are you seeing hold times?
>          >
>          > On Thu, Jul 30, 2015 at 11:24 PM, Hai Pham
>          > <htp0005@tigermail.auburn.edu> wrote:
>          >
>          > Hey William, Josh and David,
>          >
>          > Thanks for explaining. I might not have been clear: I
>          > used the web interface on port 50095 to monitor the
>          > real-time charts (ingest, scan, load average, minor
>          > compaction, major compaction, ...).
>          >
>          > Nonetheless, as I observed, when I ingested about 100k
>          > entries -> minor compaction happened -> ingest was stuck
>          > -> the level of minor compaction on the charts was only
>          > about 1.0, 2.0, at most 3.0, while more than 20k entries
>          > were forced out of memory (I knew this by watching the
>          > number of in-memory entries of the table being ingested
>          > to) -> then when minor compaction ended, ingest resumed,
>          > somewhat faster.
>          >
>          > Thus I presume the levels 1.0, 2.0, 3.0 do not represent
>          > the number of files being minor-compacted from memory?
>          >
>          > Hai
>          > ________________________________________
>          > From: Josh Elser <josh.elser@gmail.com>
>          > Sent: Thursday, July 30, 2015 7:12 PM
>          > To: user@accumulo.apache.org
>          > Subject: Re: How to control Minor Compaction by programming
>          >
>          > >
>          > > Also, can you please explain the numbers 0, 1.0, 2.0,
>          > > ... in the charts (web monitoring) denoting the level
>          > > of Minor Compaction and Major Compaction?
>          >
>          > On the monitor, the number of compactions is shown in
>          > the form:
>          >
>          > active (queued)
>          >
>          > e.g. 4 (2) would mean that 4 are running and 2 are queued.
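>          >
>          > (And on the thread's subject itself: a minor compaction
>          > can be triggered programmatically. A sketch, assuming a
>          > Connector conn and a placeholder table name:)
>          >
>          > // flush() forces a minor compaction of the given row
>          > // range (null..null = the whole table); the last argument
>          > // says whether to block until it finishes.
>          > conn.tableOperations().flush("mytable", null, null, true);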
>          >
>          > >
>          > > Thank you!
>          > >
>          > > Hai Pham
