accumulo-user mailing list archives

From John Vines <vi...@apache.org>
Subject Re: How to control Minor Compaction by programming
Date Fri, 31 Jul 2015 16:12:28 GMT
If you have only 4/8 tablets for 4 tservers, you're not really
parallelizing well.

That doesn't explain a 5-minute hold time, though; that is strange. How
large is your in-memory map size?
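For reference, the in-memory map settings can be inspected and adjusted from the Accumulo shell. A sketch (the value shown is only an example, not a recommendation; changing it requires tserver restarts and matching memory headroom on each node):

```shell
# Inside the Accumulo shell: show the current in-memory map settings
config -f tserver.memory.maps.max
config -f tserver.memory.maps.native.enabled

# Example only: raise the in-memory map size system-wide
config -s tserver.memory.maps.max=2G
```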


On Fri, Jul 31, 2015 at 11:53 AM Hai Pham <htp0005@tigermail.auburn.edu>
wrote:

> Hi Josh and John,
>
>
> Correct. Since one of my constraints was time, I tested with both WAL flush
> and WAL disabled, and the lost-data case happened in WAL-disabled mode -
> my mistake for not having described this.
>
>
> I have 1 master + 16 Hadoop slaves under Accumulo, all CentOS
> 6.5 physical boxes with at least 500GB of disk and 24GB of RAM each, but the
> network is only 1G. DFS replication = 3 by default. I tested with 4 and 8
> splits; the hold-time problem happened more often with 4 splits. And you are
> right, changing the flushing scheme remediated the problem.
>
>
> Thank you a lot!
>
> Hai
> ------------------------------
> *From:* John Vines <vines@apache.org>
> *Sent:* Friday, July 31, 2015 10:29 AM
> *To:* user@accumulo.apache.org
>
> *Subject:* Re: How to control Minor Compaction by programming
> Data could be lost if walogs were disabled or configured to use a poor
> flushing mechanism.
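> As an illustration of the knobs involved, the write-ahead-log behavior can be
> set per table from the shell (`mytable` is a placeholder; `table.durability`
> exists only in Accumulo 1.7+, older versions only have the on/off toggle):

```shell
# Accumulo 1.7+: per-table durability is none | log | flush | sync.
# 'none' risks data loss on tserver death; 'sync' is safest but slowest.
config -t mytable -s table.durability=sync

# Older versions: the write-ahead log can only be toggled per table.
config -t mytable -s table.walog.enabled=true
```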
>
> However, I'm also concerned about the hold times from a single ingest
> being enough to bring down a server. What's the environment you're running
> in? Are these virtualized or real servers? How many splits did you make?
> How many disks per node do you have? And are you using default hdfs
> replication?
>
> On Fri, Jul 31, 2015 at 11:11 AM Josh Elser <josh.elser@gmail.com> wrote:
>
>>
>> Hai Pham wrote:
>> > Hi Keith,
>> >
>> >
>> > I have 4 tablet servers + 1 master. I also did a pre-split before
>> > ingesting and it increased the speed a lot.
>> >
>> >
>> > And you're right, when I created too many ingest threads, many of them
>> > were queued in the thread pools and the hold time increased. During one
>> > intense ingest, a tablet server was killed by the master because its hold
>> > time exceeded 5 min. In this situation, all tablet servers were stuck.
>> > Only after that one died did ingest resume at a comparable speed. But the
>> > entries on the dead server were all gone and lost from the table.
>>
>> You're saying that you lost data? If a server dies, all of the tablets
>> that were hosted there are reassigned to other servers. This is done in
>> a manner that guarantees that there is no data lost in this transition.
>> If you actually lost data, this would be a critical bug, but I would
>> certainly hope you just didn't realize that the data was automatically
>> being hosted by another server.
>>
>> > I have had no idea how to repair this except regulating the number of
>> > ingest threads and the ingest speed to make it friendlier to Accumulo
>> > itself.
>> >
>> >
>> > Another mystery to me: I did a pre-split to, e.g., 8 tablets, but along
>> > with the ingest operation the tablet number increased (e.g. to 10, 14, or
>> > bigger). Any idea?
>>
>> Yep, Accumulo will naturally split tablets when they exceed a certain
>> size (1GB by default for normal tables). Unless you increase the
>> property table.split.threshold, as you ingest more data, you will
>> observe more tablets.
>>
>> Given enough time, Accumulo will naturally split your table enough.
>> Pre-splitting quickly gets you to a good level of performance right away.
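>> For example, both pre-splitting and the automatic split threshold can be
>> controlled from the shell (`mytable` and the split points are placeholders):

```shell
# Pre-split a table at explicit row keys so ingest parallelizes immediately
addsplits -t mytable row_a row_b row_c

# Raise the automatic split threshold above the 1G default
config -t mytable -s table.split.threshold=2G

# Inspect the current split points
getsplits -t mytable
```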
>>
>> >
>> > Hai
>> > ------------------------------------------------------------------------
>> > *From:* Keith Turner <keith@deenlo.com>
>> > *Sent:* Friday, July 31, 2015 8:39 AM
>> > *To:* user@accumulo.apache.org
>> > *Subject:* Re: How to control Minor Compaction by programming
>> > How many tablets do you have? Entire tablets are minor compacted at
>> > once. If you have 1 tablet per tablet server, then minor compactions
>> > will have a lot of work to do at once. While this work is being done,
>> > the tablet servers memory may fill up, leading to writes being held.
>> >
>> > If you have 10 tablets per tablet server, then tablets can be compacted
>> > in parallel w/ less work to do at any given point in time. This can
>> > avoid memory filling up and writes being held.
>> >
>> > In short, it's possible that adding good split points to the table (and
>> > therefore creating more tablets) may help w/ this issue.
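>> > On the thread title itself: a minor compaction can also be triggered
>> > explicitly by flushing the table from the shell, rather than waiting for
>> > memory pressure (`mytable` and the row keys are placeholders):

```shell
# Flush mytable's in-memory map to disk (a minor compaction); -w waits for it
flush -t mytable -w

# Optionally restrict the flush to a row range
flush -t mytable -b row_begin -e row_end -w
```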
>> >
>> > Also, are you seeing hold times?
>> >
>> > On Thu, Jul 30, 2015 at 11:24 PM, Hai Pham <htp0005@tigermail.auburn.edu>
>> > wrote:
>> >
>> >     Hey William, Josh and David,
>> >
>> >     Thanks for explaining, I might not have been clear: I used the web
>> >     interface with port 50095 to monitor the real-time charts (ingest,
>> >     scan, load average, minor compaction, major compaction, ...).
>> >
>> >     Nonetheless, as I witnessed, when I ingested about 100k entries ->
>> >     then minor compaction happened -> ingest was stuck -> the level of
>> >     minor compaction on the charts was just about 1.0, 2.0 and max 3.0
>> >     while about >20k entries were forced out of memory (I knew this by
>> >     looking at the number of entries in memory w.r.t the table being
>> >     ingested to) -> then when minor compaction ended, ingest resumed,
>> >     somewhat faster.
>> >
>> >     Thus I presume the levels 1.0, 2.0, 3.0 are not representative of the
>> >     number of files being minor-compacted from memory?
>> >
>> >     Hai
>> >     ________________________________________
>> >     From: Josh Elser <josh.elser@gmail.com>
>> >     Sent: Thursday, July 30, 2015 7:12 PM
>> >     To: user@accumulo.apache.org
>> >     Subject: Re: How to control Minor Compaction by programming
>> >
>> >     >
>> >      > Also, can you please explain the number 0, 1.0, 2.0, ... in
>> >     charts (web
>> >      > monitoring) denoting the level of Minor Compaction and Major
>> >     Compaction?
>> >
>> >     On the monitor, the number of compactions are of the form:
>> >
>> >     active (queued)
>> >
>> >     e.g. 4 (2), would mean that 4 are running and 2 are queued.
>> >
>> >      >
>> >      >
>> >      > Thank you!
>> >      >
>> >      > Hai Pham
>> >      >
>> >      >
>> >      >
>> >      >
>> >
>> >
>>
>
