accumulo-user mailing list archives

From Josh Elser
Subject Re: hdfs cpu usage
Date Sat, 07 Feb 2015 16:23:15 GMT
What version of Hadoop are you using?

Have you considered hooking up a profiler to the Datanode on GCE to see 
where the time is being spent? That might help shed some light on the 

Ara Ebrahimi wrote:
> Hi,
> We’re seeing some odd behavior from the HDFS daemon in a Google Cloud environment when
> we use an Accumulo Scanner to scan a table sequentially. top reports 200-300% CPU usage for
> the HDFS daemon, and Accumulo is around 500%. iostat shows low %util, low avgrq-sz, and low
> rMB/s, and there is plenty of free memory. Something seems to cause the HDFS daemon to
> consume a lot of CPU without sending enough read requests to the disk (an SSD, so the disk
> is very fast and vastly under-utilized). The process that sends scan requests to Accumulo is
> at 500% CPU (using 3 query batch threads and aggressive scan-batch-size/read-ahead-threshold
> values). So HDFS appears to be the bottleneck. On another cluster we rarely see the HDFS
> daemon go over 10% CPU usage. Any idea what the issue could be?
> Thanks,
> Ara.
