accumulo-user mailing list archives

From Jonathan Wonders <jwonder...@gmail.com>
Subject Re: maximize usage of cluster resources during ingestion
Date Thu, 06 Jul 2017 01:01:42 GMT
Hi Massimilian,

Are you seeing held commits during the ingest pauses?  Just based on having
looked at many similar graphs in the past, this might be one of the major
culprits.  A tablet server has a memory region with a bounded size
(tserver.memory.maps.max) where it buffers data that has not yet been written
to RFiles (through the process of minor compaction).  The region is segmented
by tablet, and each tablet can have one buffer that is undergoing ingest as
well as another that is undergoing minor compaction.  A memory manager decides
when to initiate minor compactions for the tablet buffers; the default
implementation tries to keep the memory region 80-90% full while preferring to
compact the largest tablet buffers, since creating larger RFiles during minor
compaction should lead to fewer major compactions.

During a minor compaction, the tablet buffer still "consumes" memory within
the in-memory map, and high ingest rates can exhaust the remaining capacity.
The default memory manager uses an adaptive strategy to predict the expected
memory usage and makes compaction decisions that should maintain some free
memory.  Batch writers can be bursty and a bit unpredictable, which can throw
off those estimates.  Also, depending on the ingest profile, a single
in-memory tablet buffer will sometimes consume a large percentage of the total
region.  That leads to long minor compactions when the buffer is large, which
can give ingest enough time to exhaust the remaining memory before any can be
reclaimed.  When a tablet server has to block ingest, it can also affect
client ingest rates to other tablet servers due to the way batch writers work.
This can lead to other tablet servers underestimating future ingest rates,
which can further exacerbate the problem.
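
For reference, the client-side knobs that control how large and bursty those
batches look to the tablet servers live on BatchWriterConfig.  A minimal
sketch, with purely illustrative values and a placeholder table name:

    import java.util.concurrent.TimeUnit;
    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.TableNotFoundException;

    static BatchWriter createWriter(Connector connector) throws TableNotFoundException {
      BatchWriterConfig cfg = new BatchWriterConfig();
      cfg.setMaxMemory(64 * 1024 * 1024);       // client-side buffer; bigger buffers mean larger, burstier batches
      cfg.setMaxLatency(30, TimeUnit.SECONDS);  // flush at least this often, even if the buffer is not full
      cfg.setMaxWriteThreads(10);               // concurrent send threads spread across tablet servers
      return connector.createBatchWriter("mytable", cfg);  // "mytable" is a placeholder
    }

Smaller client buffers and a lower max latency make the write stream steadier
and easier for the memory manager to predict, at the cost of some throughput.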

There are some configuration changes that could reduce the severity of held
commits, although they might reduce peak ingest rates.  Reducing the in-memory
map size can reduce the maximum pause time due to held commits.  Adding more
tablets should help avoid a single tablet buffer consuming a large percentage
of the memory region; it might be better to aim for ~20 tablets per server if
your problem allows for it.  It is also possible to replace the memory manager
with a custom one.  I've tried this in the past and have seen stability
improvements from making the memory thresholds less aggressive (50-75% full).
This did reduce peak ingest rate in some cases, but that was a reasonable
tradeoff.
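
If it helps, here is a rough sketch of making those two changes through the
client API.  The split points and the 16G value are made up for illustration,
and I believe tserver.memory.maps.max may need to go in accumulo-site.xml and
the tablet servers restarted before it actually takes effect:

    import java.util.SortedSet;
    import java.util.TreeSet;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.hadoop.io.Text;

    static void applyTuning(Connector connector) throws Exception {
      // Shrink the in-memory map ceiling (illustrative value).
      connector.instanceOperations().setProperty("tserver.memory.maps.max", "16G");

      // Add split points so no single tablet buffer dominates the memory region.
      // 199 splits -> ~200 tablets, or ~20 per server on a 10-tserver cluster.
      // These hex prefixes are hypothetical; derive real splits from your key distribution.
      SortedSet<Text> splits = new TreeSet<>();
      for (int i = 1; i < 200; i++) {
        splits.add(new Text(String.format("%02x", i)));
      }
      connector.tableOperations().addSplits("mytable", splits);  // placeholder table name
    }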

Based on your current configuration, if a tablet server is serving 4 tablets
and has a 32GB buffer, your first minor compactions will be at least 8GB, and
they will probably grow larger over time until the tablets naturally split.
Consider how long it takes to write an RFile that size compared to your peak
ingest rate: at the ~120MB/s disk write throughput you measured, flushing an
8GB buffer takes on the order of a minute, during which several more GB of
mutations can arrive.  As others have suggested, make sure to use the native
maps.  With your current JVM heap size, using the Java in-memory map would
probably lead to an OOME or very bad GC performance.
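
A quick way to double-check those settings from a client is to read the system
configuration (this only reads standard properties; note that if the native
library fails to load, I believe the tserver logs a warning and silently falls
back to the Java map, so the property alone doesn't prove native maps are in
use):

    import java.util.Map;
    import org.apache.accumulo.core.client.Connector;

    static void checkMemoryMapConfig(Connector connector) throws Exception {
      Map<String,String> sysConfig = connector.instanceOperations().getSystemConfiguration();
      System.out.println("native maps enabled: " + sysConfig.get("tserver.memory.maps.native.enabled"));
      System.out.println("in-memory map max:   " + sysConfig.get("tserver.memory.maps.max"));
    }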

Accumulo can trace minor compaction durations so you can get a feel for max
pause times or measure the effect of configuration changes.

Cheers,
--Jonathan

On Wed, Jul 5, 2017 at 7:16 PM, Dave Marion <dlmarion@comcast.net> wrote:

>
>
> Based on what Cyrille said, I would look at garbage collection;
> specifically, I would look at how many of your newly allocated objects
> spill into the old generation before they are flushed to disk.
> Additionally, I would turn off the debug log or log to SSDs if you have
> them. Another thought, seeing that you have 256GB of RAM per node, is to
> run multiple tablet servers per node. Do you have 10 threads on your Batch
> Writers? What about the Batch Writer latency: is it too low, such that you
> are not filling the buffer?
>
>
>
> *From:* Massimilian Mattetti [mailto:MASSIMIL@il.ibm.com]
> *Sent:* Wednesday, July 05, 2017 8:37 AM
> *To:* user@accumulo.apache.org
> *Subject:* maximize usage of cluster resources during ingestion
>
>
>
> Hi all,
>
> I have an Accumulo 1.8.1 cluster made up of 12 bare-metal servers. Each
> server has 256GB of RAM and 2 x 10-core CPUs. 2 machines are used as
> masters (running the HDFS NameNodes, Accumulo Master, and Monitor). The
> other 10 machines have 12 disks of 1TB each (11 used by the HDFS DataNode
> process) and are running Accumulo TServer processes. All the machines are
> connected via a 10Gb network and 3 of them are running ZooKeeper. I have
> run some heavy ingestion tests on this cluster but have never been able to
> reach more than *20%* CPU usage on each Tablet Server. I am running an
> ingestion process (using batch writers) on each data node. The table is
> pre-split in order to have 4 tablets per tablet server. Monitoring the
> network, I have seen that data is received/sent from each node at a peak
> rate of about 120MB/s / 100MB/s, while the aggregated disk write throughput
> on each tablet server is around 120MB/s.
>
> The table configuration I am playing with is:
> "table.file.replication": "2",
> "table.compaction.minor.logs.threshold": "10",
> "table.durability": "flush",
> "table.file.max": "30",
> "table.compaction.major.ratio": "9",
> "table.split.threshold": "1G"
>
> while the tablet server configuration is:
> "tserver.wal.blocksize": "2G",
> "tserver.walog.max.size": "8G",
> "tserver.memory.maps.max": "32G",
> "tserver.compaction.minor.concurrent.max": "50",
> "tserver.compaction.major.concurrent.max": "8",
> "tserver.total.mutation.queue.max": "50M",
> "tserver.wal.replication": "2",
> "tserver.compaction.major.thread.files.open.max": "15"
>
> the tablet server heap has been set to 32GB
>
> From the Monitor UI (ingest rate graph screenshot omitted in the archive):
>
> As you can see, I have a lot of valleys in which the ingestion rate drops
> to 0.
> What would be a good procedure to identify the bottleneck that causes the
> zero-ingestion-rate periods?
> Thanks.
>
> Best Regards,
> Max
>
