accumulo-user mailing list archives

From Hai Pham <htp0...@tigermail.auburn.edu>
Subject Re: How to control Minor Compaction by programming
Date Fri, 31 Jul 2015 15:52:11 GMT
Hi Josh and John,


Correct. Since one of my constraints was time, I tested with both WAL flush and WAL disabled,
and the lost-data case happened in WAL-disabled mode - my mistake for not having described that.


I have 1 master + 16 Hadoop slaves under Accumulo, all CentOS 6.5 physical boxes with
at least 500 GB of disk and 24 GB of RAM each, but the network is only 1 Gb. DFS replication = 3 by default.
I tested with 4 and 8 splits; the hold time problem happened more often with 4 splits.
And you are right, changing the flushing scheme remediated the problem.
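
For reference, a rough sketch of how those per-table WAL settings can be changed through the
Java client API (instance name, ZooKeepers, credentials, and table name are placeholders;
table.durability only exists on 1.7+):

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;

    public class WalSettings {
        public static void main(String[] args) throws Exception {
            // Placeholder instance, ZooKeepers, and credentials.
            Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181")
                    .getConnector("root", new PasswordToken("secret"));

            // Keep the write-ahead log enabled; disabling it is what makes the
            // entries on a killed tablet server unrecoverable.
            conn.tableOperations().setProperty("mytable", "table.walog.enabled", "true");

            // On 1.7+, table.durability (none/log/flush/sync) controls how
            // aggressively WAL entries are pushed to HDFS.
            conn.tableOperations().setProperty("mytable", "table.durability", "flush");
        }
    }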


Thank you a lot!

Hai

________________________________
From: John Vines <vines@apache.org>
Sent: Friday, July 31, 2015 10:29 AM
To: user@accumulo.apache.org
Subject: Re: How to control Minor Compaction by programming

Data could be lost if walogs were disabled or configured to use a poor flushing mechanism.

However, I'm also concerned about the hold times from a single ingest being enough to bring
down a server. What's the environment you're running in? Are these virtualized or real servers?
How many splits did you make? How many disks per node do you have? And are you using default
hdfs replication?
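
For what it's worth, the pressure a single ingest client puts on the tablet servers is usually
bounded on the client side via BatchWriterConfig. A minimal sketch, assuming an existing
Connector; the table name and limits are made up:

    import java.util.concurrent.TimeUnit;

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.TableNotFoundException;

    public class ThrottledWriter {
        // Table name and limits are illustrative, not from this thread.
        static BatchWriter create(Connector conn) throws TableNotFoundException {
            BatchWriterConfig cfg = new BatchWriterConfig()
                    .setMaxMemory(64 * 1024 * 1024L)      // client-side buffer before a send
                    .setMaxLatency(30, TimeUnit.SECONDS)  // flush at least this often
                    .setMaxWriteThreads(4);               // cap concurrent send threads
            return conn.createBatchWriter("mytable", cfg);
        }
    }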

On Fri, Jul 31, 2015 at 11:11 AM Josh Elser <josh.elser@gmail.com> wrote:

Hai Pham wrote:
> Hi Keith,
>
>
> I have 4 tablet servers + 1 master. I also did a pre-split before
> ingesting and it increased the speed a lot.
>
>
> And you're right, when I created too many ingest threads, many of them
> were on the queue of the thread pools and the hold time increased. During
> some intense ingest there was a case where a tablet server was killed by the
> master because its hold time exceeded 5 min. In this situation, all tablets
> were stuck. Only after that server died did the ingest come back at a
> comparable speed. But the entries on the dead server were all gone and lost
> from the table.

You're saying that you lost data? If a server dies, all of the tablets
that were hosted there are reassigned to other servers. This is done in
a manner that guarantees that there is no data lost in this transition.
If you actually lost data, this would be a critical bug, but I would
certainly hope you just didn't realize that the data was automatically
being hosted by another server.

> I have had no idea how to fix this except regulating the number of
> ingest threads and the ingest speed to make it more friendly to
> Accumulo itself.
>
>
> Another mystery to me is that I did a pre-split to, e.g., 8 tablets,
> but along with the ingest operation the tablet number increased (e.g. to
> 10, 14 or more). Any idea?

Yep, Accumulo will naturally split tablets when they exceed a certain
size (1GB by default for normal tables). Unless you increase the
property table.split.threshold, as you ingest more data, you will
observe more tablets.

Given enough time, Accumulo will naturally split your table enough.
Pre-splitting quickly gets you to a good level of performance right away.
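
If you want to do that programmatically, it looks roughly like the following sketch (split
points, table name, and threshold value are made-up examples, not recommendations):

    import java.util.TreeSet;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.hadoop.io.Text;

    public class PreSplit {
        // Split points, table name, and threshold value are illustrative only.
        static void preSplit(Connector conn) throws Exception {
            TreeSet<Text> splits = new TreeSet<Text>();
            for (char c = 'a'; c <= 'h'; c++) {
                splits.add(new Text(String.valueOf(c)));
            }
            conn.tableOperations().addSplits("mytable", splits);

            // A larger threshold means tablets split less eagerly as data
            // grows (the default is 1G).
            conn.tableOperations().setProperty("mytable", "table.split.threshold", "4G");
        }
    }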

>
> Hai
> ------------------------------------------------------------------------
> *From:* Keith Turner <keith@deenlo.com>
> *Sent:* Friday, July 31, 2015 8:39 AM
> *To:* user@accumulo.apache.org
> *Subject:* Re: How to control Minor Compaction by programming
> How many tablets do you have? Entire tablets are minor compacted at
> once. If you have 1 tablet per tablet server, then minor compactions
> will have a lot of work to do at once. While this work is being done,
> the tablet servers memory may fill up, leading to writes being held.
>
> If you have 10 tablets per tablet server, then tablets can be compacted
> in parallel w/ less work to do at any given point in time. This can
> avoid memory filling up and writes being held.
>
> In short, it's possible that adding good split points to the table (and
> therefore creating more tablets) may help w/ this issue.
>
> Also, are you seeing hold times?
>
> On Thu, Jul 30, 2015 at 11:24 PM, Hai Pham <htp0005@tigermail.auburn.edu> wrote:
>
>     Hey William, Josh and David,
>
>     Thanks for explaining, I might not have been clear: I used the web
>     interface with port 50095 to monitor the real-time charts (ingest,
>     scan, load average, minor compaction, major compaction, ...).
>
>     Nonetheless, as I witnessed, when I ingested about 100k entries ->
>     then minor compaction happened -> ingest was stuck -> the level of
>     minor compaction on the charts was just about 1.0, 2.0 and max 3.0
>     while about >20k entries were forced out of memory (I knew this by
>     looking at the number of entries in memory w.r.t the table being
>     ingested to) -> then when minor compaction ended, ingest resumed,
>     somewhat faster.
>
>     Thus I presume the levels 1.0, 2.0, 3.0 are not representative of the
>     number of files being minor-compacted from memory?
>
>     Hai
>     ________________________________________
>     From: Josh Elser <josh.elser@gmail.com>
>     Sent: Thursday, July 30, 2015 7:12 PM
>     To: user@accumulo.apache.org
>     Subject: Re: How to control Minor Compaction by programming
>
>     >
>      > Also, can you please explain the numbers 0, 1.0, 2.0, ... in the
>      > charts (web monitoring) denoting the level of Minor Compaction and
>      > Major Compaction?
>
>     On the monitor, the number of compactions are of the form:
>
>     active (queued)
>
>     e.g. 4 (2), would mean that 4 are running and 2 are queued.
>
>      >
>      >
>      > Thank you!
>      >
>      > Hai Pham
