accumulo-user mailing list archives

From Marc Reichman <>
Subject increase "running scans" in monitor?
Date Tue, 02 Apr 2013 14:56:00 GMT

I am running an Accumulo-based MR job using the AccumuloRowInputFormat on
1.4.1. Config is more-or-less default, using the native-standalone 3GB
template, but with the TServer memory put up to 2GB in from
its default. accumulo-site.xml has tserver.memory.maps.max at 1G, at 50M, and tserver.cache.index.size at 512M.

My tables are created with maxversions for all three types (scan, minc,
majc) at 1 and compress type as gz.

I am finding, on an 8 node test cluster with 64 map task slots, that when a
job is running, the 'Running Scans' count in the monitor is roughly 0-4 on
average for each tablet server. When viewed at the table view, this puts
the running scans anywhere from 4-24 on average. I would expect/hope the
scans to be somewhere close to the map task count. To me, this means one of
the following.
1. There is a configuration setting inhibiting the number of scans from
accumulating (excuse the pun) to about the same number as my map tasks
2. My map task job is CPU-intensive enough to introduce delays between
scans and everything is fine
3. Some combination of 1 and 2.
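
To put a rough number on case 2, here is a back-of-the-envelope sketch in Python. It assumes that every map slot is occupied and that the monitor's 'Running Scans' figure is a fair snapshot of scans in flight, neither of which is confirmed here. The function name is made up for illustration. Under those assumptions, the share of slots scanning at any moment is roughly the share of time each map task spends waiting on a scan rather than computing.

```python
def implied_scan_fraction(running_scans: int, map_slots: int) -> float:
    """Fraction of map slots with a scan in flight at a given moment.

    Hypothetical helper: assumes all map slots are busy and that the
    monitor's running-scan count is a representative snapshot.
    """
    if map_slots <= 0:
        raise ValueError("map_slots must be positive")
    return running_scans / map_slots


MAP_SLOTS = 64              # 8-node test cluster, from the description above
TABLE_VIEW_SCANS = (4, 24)  # observed range of running scans in the table view

low, high = (implied_scan_fraction(n, MAP_SLOTS) for n in TABLE_VIEW_SCANS)
print(f"{low:.2%} to {high:.2%} of map slots scanning at once")
```

With the figures above this works out to 4/64 and 24/64, so somewhere between about 6% and 38% of the map tasks would be in a scan at any instant. If the tasks really do spend most of their time on CPU work, a count in that range could be normal rather than a sign of a limit being hit.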

On an alternate cluster, 40 nodes with 320 task slots, we haven't seen
anywhere near full capacity scanning with map tasks which have the same
performance, and the problem seems much worse.

I am experimenting with some of the readahead configuration variables for
the tablet servers in the meantime, but haven't found any smoking guns yet.

Thank you,

