accumulo-user mailing list archives

From Dave Marion <dlmar...@comcast.net>
Subject Re: maximize usage of cluster resources during ingestion
Date Thu, 06 Jul 2017 12:18:25 GMT
That's a good point. I would also look at increasing tserver.total.mutation.queue.max. Are
you seeing hold times? If not, I would keep pushing harder until you do, then move to multiple
tablet servers. Do you have any GC logs?
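
For reference, a quick way to experiment with that property from the Accumulo shell (the
value below is just a starting point, not a recommendation):

    config -s tserver.total.mutation.queue.max=100M
    config -f tserver.total.mutation.queue.max

config -s writes the property to ZooKeeper and config -f filters the listing so you can
confirm what is currently in effect.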


> On July 6, 2017 at 4:47 AM Cyrille Savelief <csavelief@gmail.com> wrote:
> 
>     Are you sure Accumulo is not waiting for your app's data? There might be GC pauses
in your ingest code (we have already experienced that).
> 
>     On Thu, Jul 6, 2017 at 10:32, Massimilian Mattetti <MASSIMIL@il.ibm.com> wrote:
> 
> >         Thank you all for the suggestions.
> > 
> >         Regarding the native memory map: I checked the logs on each tablet server and
it was loaded correctly (tserver.memory.maps.native.enabled was set to true), so GC pauses
should not be the problem after all. I managed to get a much better ingestion graph by
reducing the native map size to 2GB and increasing the number of Batch Writer threads from
the default of 3 (which was really bad for my configuration) to 10 (I think it does not make
sense to have more threads than tablet servers, am I right?).
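> > 
> >         On the client side that is just the Batch Writer settings; a minimal sketch
against the 1.8 API, given an existing Connector (the table name and buffer sizes are
illustrative, not the exact values used here):
> > 
> >             import java.util.concurrent.TimeUnit;
> >             import org.apache.accumulo.core.client.BatchWriter;
> >             import org.apache.accumulo.core.client.BatchWriterConfig;
> > 
> >             BatchWriterConfig cfg = new BatchWriterConfig()
> >                 .setMaxWriteThreads(10)              // roughly one per tablet server
> >                 .setMaxMemory(64 * 1024 * 1024)      // 64MB client-side buffer
> >                 .setMaxLatency(2, TimeUnit.SECONDS); // flush a partial buffer after 2s
> >             BatchWriter writer = connector.createBatchWriter("mytable", cfg);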
> > 
> >         The configuration that I used for the table is:
> >         "table.file.replication": "2",
> >         "table.compaction.minor.logs.threshold": "3",
> >         "table.durability": "flush",
> >         "table.split.threshold": "1G"
> > 
> >         while for the tablet servers is:
> >         "tserver.wal.blocksize": "1G",
> >          "tserver.walog.max.size": "2G",
> >         "tserver.memory.maps.max": "2G",
> >         "tserver.compaction.minor.concurrent.max": "50",
> >         "tserver.compaction.major.concurrent.max": "20",
> >         "tserver.wal.replication": "2",
> >          "tserver.compaction.major.thread.files.open.max": "15"
> > 
> >         The new graph: [inline image]
> > 
> >         I still have the problem that CPU usage stays below 20%, so I am thinking of
running multiple tablet servers per node (5 or 10, say) in order to maximize CPU usage.
Beyond that, I do not have any other ideas for how to stress these servers with ingestion.
> >         Any suggestions are very welcome. Meanwhile, thank you all again for your
help.
> > 
> > 
> >         Best Regards,
> >         Massimiliano
> > 
> > 
> > 
> >         From:        Jonathan Wonders <jwonders88@gmail.com>
> >         To:        user@accumulo.apache.org
> >         Date:        06/07/2017 04:01
> >         Subject:        Re: maximize usage of cluster resources during ingestion
> > 
> >         ---------------------------------------------
> > 
> > 
> > 
> >         Hi Massimilian,
> > 
> >         Are you seeing held commits during the ingest pauses?  Just based on having
looked at many similar graphs in the past, this might be one of the major culprits.  A tablet
server has a memory region with a bounded size (tserver.memory.maps.max) where it buffers
data that has not yet been written to RFiles (through the process of minor compaction).  The
region is segmented by tablet and each tablet can have a buffer that is undergoing ingest
as well as a buffer that is undergoing minor compaction.  A memory manager decides when to
initiate minor compactions for the tablet buffers and the default implementation tries to
keep the memory region 80-90% full while preferring to compact the largest tablet buffers.
Creating larger RFiles during minor compaction should lead to fewer major compactions.  During
a minor compaction, the tablet buffer still "consumes" memory within the in memory map and
high ingest rates can lead to exhausting the remaining capacity.  The default memory manager
uses an adaptive strategy to predict the expected memory usage and makes compaction decisions
that should maintain some free memory.  Batch writers can be bursty and a bit unpredictable
which could throw off these estimates.  Also, depending on the ingest profile, sometimes an
in-memory tablet buffer will consume a large percentage of the total buffer.  This leads to
long minor compactions when the buffer size is large which can allow ingest enough time to
exhaust the buffer before that memory can be reclaimed.  When a tablet server has to block
ingest, it can affect client ingest rates to other tablet servers due to the way that batch
writers work.  This can lead to other tablet servers underestimating future ingest rates which
can further exacerbate the problem.
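> > 
> >         For concreteness, the knobs involved in the mechanics above are roughly the
following (property names from the 1.8 property list, values purely illustrative):
> > 
> >             tserver.memory.maps.max=2G                # size of the shared in-memory map region
> >             tserver.memory.maps.native.enabled=true   # keep tablet buffers off the JVM heap
> >             tserver.hold.time.max=5m                  # how long commits may be held before the tserver assumes a local fault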
> > 
> >         There are some configuration changes that could reduce the severity of held
commits, although they might reduce peak ingest rates.  Reducing the in memory map size can
reduce the maximum pause time due to held commits.  Adding additional tablets should help
avoid the problem of a single tablet buffer consuming a large percentage of the memory region.
 It might be better to aim for ~20 tablets per server if your problem allows for it.  It is
also possible to replace the memory manager with a custom one.  I've tried this in the past
and have seen stability improvements by making the memory thresholds less aggressive (50-75%
full).  This did reduce peak ingest rate in some cases, but that was a reasonable tradeoff.
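> > 
> >         Concretely, those two changes might look like this from the shell; the table
name, split points, and custom class are hypothetical, and a custom manager has to be on
the tablet servers' classpath:
> > 
> >             addsplits -t mytable 10 20 30 40
> >             config -s tserver.memory.manager=com.example.GentlerMemoryManager
> > 
> >         The default for tserver.memory.manager is LargestFirstMemoryManager, the
adaptive implementation described above.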
> > 
> >         Based on your current configuration, if a tablet server is serving 4 tablets
and has a 32GB buffer, your first minor compactions will be at least 8GB and they will probably
grow larger over time until the tablets naturally split.  Consider how long it would take
to write this RFile compared to your peak ingest rate.  As others have suggested, make sure
to use the native maps.  Based on your current JVM heap size, using the Java in-memory map
would probably lead to OOME or very bad GC performance.
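> > 
> >         To put rough numbers on it: flushing an 8GB tablet buffer at the ~120MB/s
aggregate disk write throughput reported earlier in this thread takes on the order of
8192MB / 120MB/s, i.e. roughly 70 seconds, during which that buffer's memory cannot be
reclaimed. If ingest can fill the rest of the region faster than that, commits will be
held.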
> > 
> >         Accumulo can trace minor compaction durations so you can get a feel for
max pause times or measure the effect of configuration changes.
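> > 
> >         As a sketch, with a tracer process running and tracing configured, spans can
be sampled from the shell (the single insert is just a stand-in for real ingest):
> > 
> >             trace on
> >             insert row cf cq value
> >             trace off
> > 
> >         trace off prints a summary of the collected spans, and recorded spans also
land in the trace table, which can be scanned like any other (scan -t trace).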
> > 
> >         Cheers,
> >         --Jonathan
> > 
> >         On Wed, Jul 5, 2017 at 7:16 PM, Dave Marion <dlmarion@comcast.net> wrote:
> >          
> > 
> >         Based on what Cyrille said, I would look at garbage collection; specifically,
I would look at how many of your newly allocated objects spill into the old generation before
they are flushed to disk. Additionally, I would turn off the debug log, or log to SSDs if
you have them. Another thought, seeing that you have 256GB RAM per node, is to run multiple
tablet servers per node. Do you have 10 threads on your Batch Writers? What about the Batch
Writer latency: is it too low, such that you are not filling the buffer?
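> > 
> >         For the GC check, something like the following in accumulo-env.sh would do
(Java 8 flag syntax; the log path is just an example):
> > 
> >             export ACCUMULO_TSERVER_OPTS="${ACCUMULO_TSERVER_OPTS} -verbose:gc \
> >                 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/accumulo/tserver-gc.log"
> > 
> >         The resulting log shows whether young-generation collections promote heavily
into the old generation during ingest bursts.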
> > 
> >          
> > 
> >         From: Massimilian Mattetti [mailto:MASSIMIL@il.ibm.com]
> >         Sent: Wednesday, July 05, 2017 8:37 AM
> >         To: user@accumulo.apache.org
> >         Subject: maximize usage of cluster resources during ingestion
> > 
> >          
> > 
> >         Hi all,
> > 
> >         I have an Accumulo 1.8.1 cluster of 12 bare-metal servers. Each server has
256GB of RAM and 2 x 10-core CPUs. 2 machines are used as masters (running the HDFS NameNodes,
Accumulo Master, and Monitor). The other 10 machines have 12 disks of 1TB each (11 used by
the HDFS DataNode process) and run the Accumulo TServer processes. All the machines are
connected via a 10Gb network, and 3 of them are running ZooKeeper. I have run some heavy
ingestion tests on this cluster but have never been able to reach more than 20% CPU usage
on any Tablet Server. I am running an ingestion process (using batch writers) on each data
node. The table is pre-split so as to have 4 tablets per tablet server. Monitoring the
network, I have seen that data is received/sent by each node at peak rates of about
120MB/s / 100MB/s, while the aggregated disk write throughput on each tablet server is
around 120MB/s.
> > 
> >         The table configuration I am playing with is:
> >         "table.file.replication": "2",
> >         "table.compaction.minor.logs.threshold": "10",
> >         "table.durability": "flush",
> >         "table.file.max": "30",
> >         "table.compaction.major.ratio": "9",
> >         "table.split.threshold": "1G"
> > 
> >         while the tablet server configuration is:
> >         "tserver.wal.blocksize": "2G",
> >         "tserver.walog.max.size": "8G",
> >         "tserver.memory.maps.max": "32G",
> >         "tserver.compaction.minor.concurrent.max": "50",
> >         "tserver.compaction.major.concurrent.max": "8",
> >         "tserver.total.mutation.queue.max": "50M",
> >         "tserver.wal.replication": "2",
> >         "tserver.compaction.major.thread.files.open.max": "15"
> > 
> >         the tablet server heap has been set to 32GB
> > 
> >         From the Monitor UI: [inline image]
> > 
> >         As you can see, there are a lot of valleys in which the ingestion rate drops
to 0.
> >         What would be a good procedure for identifying the bottleneck that causes
these zero-ingestion periods?
> >         Thanks.
> > 
> >         Best Regards,
> >         Max
> > 
> > 
