accumulo-user mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: maximize usage of cluster resources during ingestion
Date Wed, 05 Jul 2017 19:32:38 GMT
Huge GC pauses can be mitigated by ensuring you're using the Accumulo
native maps library.
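If they are somehow disabled, the relevant setting is the following (assuming
the native library has actually been built and installed on the tablet
servers; otherwise the tserver silently falls back to the Java map):

  "tserver.memory.maps.native.enabled": "true"

You can also confirm that GC is the culprit by watching the tserver JVM with
"jstat -gcutil <tserver-pid> 1000" while ingest stalls.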

On Wed, Jul 5, 2017 at 11:05 AM Cyrille Savelief <csavelief@gmail.com>
wrote:

> Hi Massimilian,
>
> Using a MultiTableBatchWriter, we are able to ingest about 600K entries/s
> on a single node (30 GB of memory, 8 vCPUs) running Hadoop, ZooKeeper,
> Accumulo, and our ingest process. For us, the "valleys" came from huge GC
> pauses.
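>
> For reference, a minimal sketch of that kind of ingest loop (connection
> details, table name, and buffer sizes below are placeholders, not our real
> values):
>
>   import org.apache.accumulo.core.client.*;
>   import org.apache.accumulo.core.client.security.tokens.PasswordToken;
>   import org.apache.accumulo.core.data.Mutation;
>   import org.apache.accumulo.core.data.Value;
>
>   Instance inst = new ZooKeeperInstance("instance", "zkhost:2181");
>   Connector conn = inst.getConnector("user", new PasswordToken("pass"));
>   BatchWriterConfig cfg = new BatchWriterConfig()
>       .setMaxMemory(64 * 1024 * 1024)  // client-side buffer before a flush
>       .setMaxWriteThreads(8);          // concurrent writes to tservers
>   MultiTableBatchWriter mtbw = conn.createMultiTableBatchWriter(cfg);
>   try {
>     BatchWriter bw = mtbw.getBatchWriter("table1");  // shares one buffer
>     Mutation m = new Mutation("row001");
>     m.put("cf", "cq", new Value("value".getBytes()));
>     bw.addMutation(m);
>   } finally {
>     mtbw.close();  // flushes pending mutations for all tables
>   }
>
> The point of the MultiTableBatchWriter is that writes to several tables
> share a single memory buffer and thread pool instead of one per table.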
>
> Best,
>
> Cyrille
>
> On Wed, Jul 5, 2017 at 2:37 PM, Massimilian Mattetti <MASSIMIL@il.ibm.com>
> wrote:
>
>> Hi all,
>>
>> I have an Accumulo 1.8.1 cluster made up of 12 bare-metal servers. Each
>> server has 256GB of RAM and 2 x 10-core CPUs. 2 machines are used as
>> masters (running the HDFS NameNodes, Accumulo Master and Monitor). The other 10
>> machines have 12 disks of 1TB each (11 used by the HDFS DataNode process) and are
>> running Accumulo TServer processes. All the machines are connected via a
>> 10Gb network and 3 of them are running ZooKeeper. I have run some heavy
>> ingestion tests on this cluster but I have never been able to reach more
>> than 20% CPU usage on any tablet server. I am running an ingestion
>> process (using batch writers) on each data node. The table is pre-split in
>> order to have 4 tablets per tablet server, roughly as sketched below.
>> Monitoring the network, I have seen that data is received/sent by each
>> node at peak rates of about 120MB/s / 100MB/s, while the aggregated disk
>> write throughput on each tablet server is around 120MB/s.
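>>
>> The pre-splitting is done up front with the Java API, roughly like this
>> (the split points here are placeholders; ours match the row-key design):
>>
>>   SortedSet<Text> splits = new TreeSet<>();  // org.apache.hadoop.io.Text
>>   for (int i = 1; i < 40; i++) {  // 39 splits -> 40 tablets on 10 tservers
>>     splits.add(new Text(String.format("%02d", i)));
>>   }
>>   conn.tableOperations().addSplits("mytable", splits);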
>>
>> The table configuration I am playing with is:
>> "table.file.replication": "2",
>> "table.compaction.minor.logs.threshold": "10",
>> "table.durability": "flush",
>> "table.file.max": "30",
>> "table.compaction.major.ratio": "9",
>> "table.split.threshold": "1G"
>>
>> while the tablet server configuration is:
>> "tserver.wal.blocksize": "2G",
>> "tserver.walog.max.size": "8G",
>> "tserver.memory.maps.max": "32G",
>> "tserver.compaction.minor.concurrent.max": "50",
>> "tserver.compaction.major.concurrent.max": "8",
>> "tserver.total.mutation.queue.max": "50M",
>> "tserver.wal.replication": "2",
>> "tserver.compaction.major.thread.files.open.max": "15"
>>
>> The tablet server heap has been set to 32GB.
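>> That was done through ACCUMULO_TSERVER_OPTS in accumulo-env.sh, along the
>> lines of:
>>
>>   export ACCUMULO_TSERVER_OPTS="-Xmx32g -Xms32g"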
>>
>> From the Monitor UI (ingest-rate graph not preserved in the text archive):
>>
>> As you can see, there are a lot of valleys in which the ingestion rate
>> drops to 0.
>> What would be a good procedure for identifying the bottleneck that causes
>> these zero-ingestion periods?
>> Thanks.
>>
>> Best Regards,
>> Max
>>
>>
