accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: AccumuloFileOutputFormat tuning
Date Sat, 03 Jan 2015 18:26:38 GMT
You could also use JVisualVM, which can give better reports on
benchmarks than manually inspecting jstack output.

Keith Turner wrote:
> You can try sampling using jstack as a simple and quick way to profile.
> Jstack a process writing rfiles ~10 times, with some pause between.
> Then look at a particular thread writing data across the jstack saves.
> Do you see the same code being executed in multiple jstacks?  If so, what
> code is that?
>
> Sent from phone. Please excuse typos and brevity.
>
> On Jan 3, 2015 12:46 AM, "Ara Ebrahimi" <ara.ebrahimi@argyledata.com
> <mailto:ara.ebrahimi@argyledata.com>> wrote:
>
>     Hi,
>
>     I’m trying to optimize our map/reduce job which generates RFiles
>     using AccumuloFileOutputFormat. We have a specific time window and
>     within that time window we need to generate a predefined amount of
>     simulation data. We also have an upper bound on the number of cores
>     we can use. Disks are also fixed at 4 per node and they are
>     all SSDs. So I can’t employ more machines or more disks or cores to
>     achieve higher write/s numbers.
>
>     So far we’ve managed to utilize 100% of all available cores and the
>     SSD disks are also highly utilized. I’m trying to reduce processing
>     time and we are willing to waste more disk space to achieve higher
>     data generation speed. The data itself is tens of columns of
>     floating-point numbers, all serialized to fixed 9-byte values, which
>     don’t lend themselves well to compression. With no compression and
>     replication set to 1 we can generate the same amount of data in
>     almost half the time. With snappy it’s almost 10% more data
>     generation time compared to no compression and almost twice more
>     size on disk for all the generated RFiles.
>
>     dataBlockSize doesn’t seem to change anything for non-compressed
>     data. indexBlockSize also didn't change anything (tried 64K vs the
>     default 128K).
>
>     Any other tricks I could employ to achieve higher write/s numbers?
>
>     Ara.
>

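The jstack sampling that Keith suggests in the quoted text can be tallied with a small helper instead of reading each dump by eye. The following is only a rough sketch, not something from the thread: the class name JstackSampleSummary and the method topFrameCounts are hypothetical, and it assumes the usual HotSpot jstack text layout, where each thread block starts with the quoted thread name and its frames are lines beginning with "at".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: tallies the top stack frame of one thread
// across several saved jstack dumps.
final class JstackSampleSummary {

    // Returns top frame -> number of dumps in which the named thread
    // was executing that frame. Assumes HotSpot-style jstack text.
    static Map<String, Integer> topFrameCounts(List<String> dumps, String threadName) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String header = "\"" + threadName + "\"";
        for (String dump : dumps) {
            boolean inThread = false;
            for (String line : dump.split("\\r?\\n")) {
                String trimmed = line.trim();
                if (!inThread) {
                    // A thread block starts with the quoted thread name.
                    inThread = line.startsWith(header);
                } else if (trimmed.isEmpty()) {
                    break; // block ended without any frame
                } else if (trimmed.startsWith("at ")) {
                    counts.merge(trimmed.substring(3), 1, Integer::sum);
                    break; // only the top frame is counted per dump
                }
            }
        }
        return counts;
    }

    // Usage: java JstackSampleSummary <thread-name> <dump-file>...
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: JstackSampleSummary <thread-name> <dump-file>...");
            return;
        }
        List<String> dumps = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            dumps.add(new String(Files.readAllBytes(Paths.get(args[i])), StandardCharsets.UTF_8));
        }
        topFrameCounts(dumps, args[0])
            .forEach((frame, n) -> System.out.println(n + "  " + frame));
    }
}
```

If one frame shows up in most of the ~10 samples for the thread writing data, that code is a reasonable first candidate for where the time is going. Counting only the top frame is a simplification; comparing a few frames further down each stack may be more informative.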