kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Bad insert performance of java kudu-client
Date Wed, 26 Apr 2017 21:11:15 GMT
Thanks a lot for submitting the patch, Pavel. I'm traveling at the moment
but will try to take a look in the next day or two if no one else gets
there first.

Would also be great if you have any notes to share about how you built the
Java client on Windows. We could add them to the developer docs so the next
person can save some time.

-Todd

On Wed, Apr 26, 2017 at 9:31 AM, Pavel Martynov <mr.xkurt@gmail.com> wrote:

> Yes, I submitted a patch: https://gerrit.cloudera.org/#/c/6735/. I worked
> on it all day; getting it to compile on Windows is not so easy... :)
>
> 2017-04-25 22:57 GMT+03:00 Todd Lipcon <todd@cloudera.com>:
>
>> Hi Pavel,
>>
>> That's a good find. It certainly does look like we could do caching of
>> this data. We use the local network interface address list to determine
>> whether a remote server is local or not.
>>
>> In fact, in many of the cases where we call this, we don't even care about
>> the result - it's just computed as a side effect of creating the
>> 'ServerInfo' object.
>>
>> I filed KUDU-1982 to track this issue.
>>
>> Any interest in working on a fix?
>>
>> -Todd
>>
>>
>> On Tue, Apr 25, 2017 at 5:10 AM, Pavel Martynov <mr.xkurt@gmail.com>
>> wrote:
>>
>>> I reproduced this problem with java.net.NetworkInterface.getByInetAddress
>>> on a few other Windows machines. I also found this bug report, marked
>>> 'not an issue': http://bugs.java.com/view_bug.do?bug_id=7039343.
>>> Maybe kudu-client could use some memoization for this function?
>>>
>>> 2017-04-25 13:09 GMT+03:00 Pavel Martynov <mr.xkurt@gmail.com>:
>>>
>>>> I figured out that the problem was that I ran this program on my
>>>> development Windows machine. There seems to be a performance issue with
>>>> java.net.NetworkInterface.getByInetAddress on Windows (the only
>>>> confirmation I have found so far is
>>>> http://stackoverflow.com/questions/35541870/java-networkinterface-getbyinetaddress-takes-way-too-long).
>>>> See the profiler screenshot http://pasteboard.co/8uHil3I5H.png
>>>> (kudu-client v1.3.1): every call takes 53 ms (!) on average.
>>>> Also, could you recheck the logic? Why is this function called 88 times
>>>> in 12 seconds for such a small program?
>>>>
>>>> 2017-04-24 22:29 GMT+03:00 Todd Lipcon <todd@cloudera.com>:
>>>>
>>>>> I tried to reproduce this locally using your code and couldn't. I get
>>>>> around 100K inserts/second for 1.0, 1.1, 1.2, and 1.3 clients (against a
>>>>> 1.4-SNAPSHOT cluster).
>>>>>
>>>>> Is it always reproducible for you? E.g., if you switch back to the
>>>>> earlier client and try another set of runs, do you get the same results?
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Mon, Apr 24, 2017 at 10:56 AM, Todd Lipcon <todd@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> I vaguely recall some bug in earlier versions of the Java client
>>>>>> where 'shutdown' wouldn't properly block on the data being flushed. So
>>>>>> it's possible in 1.0.x and below, you're not actually measuring the full
>>>>>> amount of time to write all the data, whereas when the bug is fixed, you
>>>>>> are.
>>>>>>
>>>>>> I'll see if I can repro this locally as well using your code.
>>>>>>
>>>>>> -Todd
>>>>>>
>>>>>> On Mon, Apr 24, 2017 at 10:49 AM, David Alves <davidralves@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Pavel
>>>>>>>
>>>>>>>   Interesting, thanks for sharing those numbers.
>>>>>>>   I assume you weren't using AUTOFLUSH_BACKGROUND for the first
>>>>>>> versions you tested (don't think it was available then iirc).
>>>>>>>   Could you try without it in the latest version and see how the
>>>>>>> numbers compare?
>>>>>>>   We'd be happy to help track down the reason for this perf
>>>>>>> regression.
>>>>>>>
>>>>>>> Best
>>>>>>> David
>>>>>>>
>>>>>>> On Mon, Apr 24, 2017 at 4:58 AM, Pavel Martynov <mr.xkurt@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi, I ran into the problem that I cannot achieve high insertion
>>>>>>>> speed, so I started to experiment with
>>>>>>>> https://github.com/cloudera/kudu-examples/tree/master/java/insert-loadgen.
>>>>>>>> My slightly modified code (recreation of the table on startup +
>>>>>>>> duration measuring):
>>>>>>>> https://gist.github.com/xkrt/9405a2eeb98a56288b7c5a7d817097b4.
>>>>>>>> On every run I changed the kudu-client version. Results:
>>>>>>>>
>>>>>>>> kudu-client-ver  duration (ms)  inserts/sec
>>>>>>>> 0.10             626            79872
>>>>>>>> 1.0.0            622            80385
>>>>>>>> 1.0.1            630            79365
>>>>>>>> 1.1.0            11703          4272
>>>>>>>> 1.3.1            12317          4059
>>>>>>>>
>>>>>>>> As you can see, there was a large degradation between 1.0.1 and
>>>>>>>> 1.1.0 (about 20 times!).
>>>>>>>> What could be the problem, and how can I fix it? (Actually, I am
>>>>>>>> interested in kudu-spark, so using kudu-client 1.0.1 is probably not
>>>>>>>> the right solution?)
>>>>>>>>
>>>>>>>> My test cluster: 3 hosts with a master and a tserver on each (3
>>>>>>>> masters and 3 tservers overall).
>>>>>>>> No extra settings; flags used:
>>>>>>>> fs_wal_dir
>>>>>>>> fs_data_dirs
>>>>>>>> master_addresses
>>>>>>>> tserver_master_addrs
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> with best regards, Pavel Martynov
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> with best regards, Pavel Martynov
>>>>
>>>
>>>
>>>
>>> --
>>> with best regards, Pavel Martynov
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> with best regards, Pavel Martynov
>



-- 
Todd Lipcon
Software Engineer, Cloudera
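
Editor's note: for readers who want to check whether their own machine shows the slow lookup described in this thread, here is a minimal timing sketch. It is not taken from kudu-client or from Pavel's gist; the class name GetByInetAddressTimer and the method averageMillis are hypothetical, and the call count of 88 is only borrowed from the profiler numbers quoted above. At the reported 53 ms per call, 88 calls would add up to roughly 4.7 seconds of the ~12 second run.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

/** Hypothetical timing harness; not part of kudu-client. */
final class GetByInetAddressTimer {
  private GetByInetAddressTimer() {}

  /** Returns the average wall-clock milliseconds per getByInetAddress call. */
  static double averageMillis(InetAddress addr, int calls) {
    long start = System.nanoTime();
    try {
      for (int i = 0; i < calls; i++) {
        // The call the thread identifies as slow on Windows.
        NetworkInterface.getByInetAddress(addr);
      }
    } catch (SocketException e) {
      throw new UncheckedIOException(e);
    }
    long elapsedNanos = System.nanoTime() - start;
    return elapsedNanos / 1e6 / calls;
  }

  public static void main(String[] args) throws Exception {
    InetAddress addr = InetAddress.getLocalHost();
    int calls = 88; // call count reported in the thread; an arbitrary choice here
    double avg = averageMillis(addr, calls);
    System.out.printf("%d calls, %.2f ms/call, %.0f ms total%n",
        calls, avg, avg * calls);
  }
}
```

Results will vary by OS, JDK version, and the number of network interfaces on the host, so a fast result on one machine does not rule out the problem on another.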

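
Editor's note: the memoization idea raised in the thread (tracked as KUDU-1982) could look roughly like the sketch below. This is an illustration only, not the patch in Gerrit and not the kudu-client implementation; LocalAddressCache and isLocal are hypothetical names, and the sketch assumes Java 8 or later for computeIfAbsent.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper illustrating memoization; not the actual Kudu fix. */
final class LocalAddressCache {
  // One cached answer per address, shared by all threads.
  private static final ConcurrentMap<InetAddress, Boolean> CACHE =
      new ConcurrentHashMap<>();

  private LocalAddressCache() {}

  /** Returns true if addr belongs to this host, doing the slow lookup once per address. */
  static boolean isLocal(InetAddress addr) {
    return CACHE.computeIfAbsent(addr, LocalAddressCache::lookup);
  }

  private static boolean lookup(InetAddress addr) {
    // Cheap checks first, so the interface scan is skipped where possible.
    if (addr.isAnyLocalAddress() || addr.isLoopbackAddress()) {
      return true;
    }
    try {
      // Non-null means some local interface is bound to this address.
      return NetworkInterface.getByInetAddress(addr) != null;
    } catch (SocketException e) {
      // Assumption for this sketch: treat lookup failures as "not local".
      return false;
    }
  }
}
```

The trade-off is staleness: if the host's interfaces change while the process is running, a cached answer can become wrong, so a real fix might need expiry or might avoid computing the value when the caller does not need it.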