accumulo-user mailing list archives

From Michael Wall <mjw...@gmail.com>
Subject Re: Accumulo performance on various hardware configurations
Date Wed, 29 Aug 2018 19:20:07 GMT
A couple of things to look at or try:

1 - Is the data spread out amongst all the tablets and tservers when you
have multiple tservers?
2 - How much of the data is in memory on the tablet server and how much is
on disk?  You can try flushing the table before running your scan.
3 - You could also launch a compaction before running your scan to minimize
the number of rfiles per tablet.

Mike

On Wed, Aug 29, 2018 at 3:12 PM guy sharon <guy.sharon.1977@gmail.com>
wrote:

> hi Mike,
>
> As per Mike Miller's suggestion I started using
> org.apache.accumulo.examples.simple.helloworld.ReadData from Accumulo with
> debugging turned off and a BatchScanner with 10 threads. I redid all the
> measurements and although this was 20% faster than using the shell there
> was no difference once I started playing with the hardware configurations.
>
> Guy.
>
> On Wed, Aug 29, 2018 at 10:06 PM Michael Wall <mjwall@gmail.com> wrote:
>
>> Guy,
>>
>> Can you go into specifics about how you are measuring this?  Are you
>> still using "bin/accumulo shell -u root -p secret -e "scan -t hellotable
>> -np" | wc -l" as you mentioned earlier in the thread?  As Mike Miller
>> suggested, serializing that back to the display and then counting 6M
>> entries is going to take some time.  Try using a Batch Scanner directly.
>>
>> Mike
>>
>> On Wed, Aug 29, 2018 at 2:56 PM guy sharon <guy.sharon.1977@gmail.com>
>> wrote:
>>
>>> Yes, I tried the high performance configuration which translates to 4G
>>> heap size, but that didn't affect performance. Neither did setting
>>> table.scan.max.memory to 4096k (default is 512k). Even if I accept that the
>>> read performance here is reasonable I don't understand why none of the
>>> hardware configuration changes (except going to 48 cores, which made things
>>> worse) made any difference.
>>>
>>> On Wed, Aug 29, 2018 at 8:33 PM Mike Walch <mwalch@apache.org> wrote:
>>>
>>>> Muchos does not automatically change its Accumulo configuration to take
>>>> advantage of better hardware. However, it does have a performance profile
>>>> setting in its configuration (see link below) where you can select a
>>>> profile (or create your own) based on the hardware you are using.
>>>>
>>>>
>>>> https://github.com/apache/fluo-muchos/blob/master/conf/muchos.props.example#L94
>>>>
>>>> On Wed, Aug 29, 2018 at 11:35 AM Josh Elser <elserj@apache.org> wrote:
>>>>
>>>>> Does Muchos actually change the Accumulo configuration when you are
>>>>> changing the underlying hardware?
>>>>>
>>>>> On 8/29/18 8:04 AM, guy sharon wrote:
>>>>> > hi,
>>>>> >
>>>>> > Continuing my performance benchmarks, I'm still trying to figure out
>>>>> > if the results I'm getting are reasonable and why throwing more
>>>>> > hardware at the problem doesn't help. What I'm doing is a full table
>>>>> > scan on a table with 6M entries. This is Accumulo 1.7.4 with
>>>>> > Zookeeper 3.4.12 and Hadoop 2.8.4. The table is populated by
>>>>> > org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter
>>>>> > modified to write 6M entries instead of 50k. Reads are performed by
>>>>> > "bin/accumulo org.apache.accumulo.examples.simple.helloworld.ReadData
>>>>> > -i muchos -z localhost:2181 -u root -t hellotable -p secret". Here
>>>>> > are the results I got:
>>>>> >
>>>>> > 1. 5 tserver cluster as configured by Muchos
>>>>> > (https://github.com/apache/fluo-muchos), running on m5d.large AWS
>>>>> > machines (2vCPU, 8GB RAM) running CentOS 7. Master is on a separate
>>>>> > server. Scan took 12 seconds.
>>>>> > 2. As above except with m5d.xlarge (4vCPU, 16GB RAM). Same results.
>>>>> > 3. Splitting the table to 4 tablets causes the runtime to increase
>>>>> > to 16 seconds.
>>>>> > 4. 7 tserver cluster running m5d.xlarge servers. 12 seconds.
>>>>> > 5. Single node cluster on m5d.12xlarge (48 cores, 192GB RAM),
>>>>> > running Amazon Linux. Configuration as provided by Uno
>>>>> > (https://github.com/apache/fluo-uno). Total time was 26 seconds.
>>>>> >
>>>>> > Offhand I would say this is very slow. I'm guessing I'm making some
>>>>> > sort of newbie (possibly configuration) mistake but I can't figure
>>>>> > out what it is. Can anyone point me to something that might help me
>>>>> > find out what it is?
>>>>> >
>>>>> > thanks,
>>>>> > Guy.
>>>>> >
>>>>> >
>>>>>
>>>>
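
For comparing the runs quoted in this thread, it may help to convert the reported scan times into entries per second. The following is only a rough sketch: the class name ScanThroughput and the method entriesPerSecond are hypothetical names made up for illustration, the 6M entry count and the timings are copied from the original message, and run 2 ("Same results") is assumed to mean 12 seconds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Accumulo: converts a scan time into entries/second.
final class ScanThroughput {

    // Entries scanned per second for a scan of `entries` entries that took `seconds`.
    static double entriesPerSecond(long entries, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return entries / seconds;
    }

    public static void main(String[] args) {
        final long entries = 6_000_000L; // table size reported in the thread

        // Scan times in seconds, as reported in the original message.
        Map<String, Double> runs = new LinkedHashMap<>();
        runs.put("1. 5 tservers, m5d.large", 12.0);
        runs.put("2. 5 tservers, m5d.xlarge", 12.0); // assumed: "Same results"
        runs.put("3. table split into 4 tablets", 16.0);
        runs.put("4. 7 tservers, m5d.xlarge", 12.0);
        runs.put("5. single node, m5d.12xlarge", 26.0);

        for (Map.Entry<String, Double> run : runs.entrySet()) {
            System.out.printf("%-32s %10.0f entries/s%n",
                    run.getKey(), entriesPerSecond(entries, run.getValue()));
        }
    }
}
```

By this measure the 12-second runs correspond to 500,000 entries per second regardless of instance size, which is the plateau the thread is trying to explain.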
