accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: AccumuloFileOutputFormat tuning
Date Sun, 04 Jan 2015 03:04:07 GMT
Yeah, it does buffer the blocks in memory until it exceeds the configured
size.
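
A minimal sketch of that buffering idea, for illustration only. This is a toy model, not Accumulo's RFile writer: the class name, the `block_size` parameter, and the in-memory `blocks` and `index` lists are all hypothetical stand-ins. It only shows entries accumulating in memory and a block being emitted once the buffered bytes exceed the configured size.

```python
class BlockBufferingWriter:
    """Toy model of block buffering (hypothetical, not Accumulo code)."""

    def __init__(self, block_size):
        self.block_size = block_size  # configured block size in bytes
        self.current = []             # entries of the block being built
        self.current_bytes = 0        # bytes buffered for the current block
        self.blocks = []              # finished blocks (stand-in for disk writes)
        self.index = []               # last key of each finished block

    def append(self, key: bytes, value: bytes):
        # Buffer the entry, then flush once the buffer exceeds the limit.
        self.current.append((key, value))
        self.current_bytes += len(key) + len(value)
        if self.current_bytes > self.block_size:
            self._flush()

    def _flush(self):
        if not self.current:
            return
        self.index.append(self.current[-1][0])
        self.blocks.append(self.current)
        self.current = []
        self.current_bytes = 0

    def close(self):
        # Emit whatever is still buffered as a final, possibly short, block.
        self._flush()
```

In this model only one block's worth of entries is held at a time, so memory use tracks the block size rather than the file size.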

It will be interesting to see where Accumulo is spending its time.
Hopefully you can gather that info in a less convoluted way than initially
described. After reading Josh's email, I was amused by the original email.
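
The jstack sampling suggested in the quoted message below could be scripted roughly like this. It is a sketch under assumptions: `jstack` is on the PATH, the thread of interest is identified by its quoted name in the dump header (for example "main"), and the function names and the default pause are made up for illustration.

```python
import subprocess
import time
from collections import Counter


def collect_dumps(pid, samples=10, pause_seconds=2.0):
    """Run `jstack <pid>` several times with a pause in between."""
    dumps = []
    for i in range(samples):
        result = subprocess.run(
            ["jstack", str(pid)], capture_output=True, text=True, check=True
        )
        dumps.append(result.stdout)
        if i < samples - 1:
            time.sleep(pause_seconds)
    return dumps


def top_frames(dump, thread_name, depth=3):
    """Return the top `depth` stack frames of the named thread in one dump."""
    frames = []
    in_thread = False
    for line in dump.splitlines():
        if line.startswith('"'):
            # Thread header lines start with the quoted thread name.
            in_thread = line.startswith('"' + thread_name + '"')
            continue
        if in_thread:
            stripped = line.strip()
            if stripped.startswith("at "):
                frames.append(stripped[3:])
                if len(frames) == depth:
                    break
    return tuple(frames)


def hot_frames(dumps, thread_name, depth=3):
    """Count how often each top-of-stack frame sequence appears across dumps."""
    counts = Counter()
    for dump in dumps:
        frames = top_frames(dump, thread_name, depth)
        if frames:
            counts[frames] += 1
    return counts
```

Calling `hot_frames(collect_dumps(pid), "main").most_common(5)` would list the frame sequences seen most often; code that shows up in most samples is a likely place where time is going.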

Sent from phone. Please excuse typos and brevity.
On Jan 3, 2015 5:11 PM, "Ara Ebrahimi" <ara.ebrahimi@argyledata.com> wrote:

>  Thanks. I’ll try profiling the JVMs. Since my mapper logic is very simple
> I suspect it only consumes a tiny amount of resources. How
> does AccumuloFileOutputFormat create RFile blocks? I assume it holds the
> whole thing in memory and creates the index and so on, right? Is there a
> way to improve that?
>
>  Ara.
>
>  On Jan 3, 2015, at 10:26 AM, Josh Elser <josh.elser@gmail.com> wrote:
>
> Could also use JVisualVM which is capable of giving some better reports
> on benchmarks compared to manually inspecting jstacks.
>
> Keith Turner wrote:
>
> You can try sampling using jstack as a simple and quick way to profile.
> Jstack a process writing RFiles ~10 times, with some pause in between.
> Then look at a particular thread writing data across the jstack saves,
> do you see the same code being executed in multiple jstacks?  If so what
> code is that?
>
> Sent from phone. Please excuse typos and brevity.
>
> On Jan 3, 2015 12:46 AM, "Ara Ebrahimi" <ara.ebrahimi@argyledata.com> wrote:
>
>    Hi,
>
>    I’m trying to optimize our map/reduce job which generates RFiles
>    using AccumuloFileOutputFormat. We have a specific time window and
>    within that time window we need to generate a predefined amount of
>    simulation data. We also have an upper bound on the number of cores
>    we can use. Disks are also fixed at 4 per node and they are
>    all SSDs. So I can’t employ more machines or more disks or cores to
>    achieve higher write/s numbers.
>
>    So far we’ve managed to utilize 100% of all available cores and the
>    SSD disks are also highly utilized. I’m trying to reduce processing
>    time and we are willing to waste more disk space to achieve higher
>    data generation speed. The data itself is tens of columns of
>    floating-point numbers, all serialized to fixed 9-byte values, which
>    don’t lend themselves well to compression. With no compression and
>    replication set to 1 we can generate the same amount of data in
>    almost half the time. With Snappy it’s almost 10% more data
>    generation time compared to no compression and almost twice the
>    size on disk for all the generated RFiles.
>
>    dataBlockSize doesn’t seem to change anything for non-compressed
>    data. indexBlockSize also didn't change anything (tried 64K vs the
>    default 128K).
>
>    Any other tricks I could employ to achieve higher write/s numbers?
>
>    Ara.
>
>
>
>    ________________________________
>
>    This message is for the designated recipient only and may contain
>    privileged, proprietary, or otherwise confidential information. If
>    you have received it in error, please notify the sender immediately
>    and delete the original. Any other use of the e-mail by you is
>    prohibited. Thank you in advance for your cooperation.
>
>    ________________________________
