kudu-user mailing list archives

From Pavel Martynov <mr.xk...@gmail.com>
Subject Re: Bad insert performance of java kudu-client
Date Wed, 26 Apr 2017 16:31:44 GMT
Yes, I submitted a patch: https://gerrit.cloudera.org/#/c/6735/. I worked on it
all day; you know, making it compile on Windows is not so easy... :)
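
For illustration, the memoization idea discussed further down this thread
could look roughly like the sketch below. It is not the code from the Gerrit
change: the class and method names (LocalAddressCache, isLocalAddress) are
hypothetical, and it assumes the set of local network interfaces does not
change while the client is running.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper, not part of kudu-client: remembers whether an address
// belongs to a local interface so the slow lookup runs once per address.
final class LocalAddressCache {
  private static final ConcurrentMap<InetAddress, Boolean> CACHE =
      new ConcurrentHashMap<>();

  private LocalAddressCache() {}

  // Returns true if addr is bound to an interface on this machine.
  // Only the first call per address pays for the interface lookup.
  static boolean isLocalAddress(InetAddress addr) {
    return CACHE.computeIfAbsent(addr, LocalAddressCache::lookup);
  }

  private static boolean lookup(InetAddress addr) {
    // Wildcard and loopback addresses are local without asking the OS.
    if (addr.isAnyLocalAddress() || addr.isLoopbackAddress()) {
      return true;
    }
    try {
      // This is the call reported as slow on Windows in this thread.
      return NetworkInterface.getByInetAddress(addr) != null;
    } catch (SocketException e) {
      // Assumption: treat a failed lookup as "not local".
      return false;
    }
  }
}
```

A caller would replace each direct NetworkInterface.getByInetAddress(addr)
check with LocalAddressCache.isLocalAddress(addr). The trade-off is that an
interface added or removed at runtime is not noticed for addresses that are
already cached.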

2017-04-25 22:57 GMT+03:00 Todd Lipcon <todd@cloudera.com>:

> Hi Pavel,
>
> That's a good find. It certainly does look like we could do caching of
> this data. We use the local network interface address list to determine
> whether a remote server is local or not.
>
> In fact, in many of the cases where we call this, we don't even care about
> the result - it's just computed as a side effect of creating the
> 'ServerInfo' object.
>
> I filed KUDU-1982 to track this issue.
>
> Any interest in working on a fix?
>
> -Todd
>
>
> On Tue, Apr 25, 2017 at 5:10 AM, Pavel Martynov <mr.xkurt@gmail.com>
> wrote:
>
>> I reproduced this problem with java.net.NetworkInterface.getByInetAddress
>> on Windows on a few other machines. I also found this bug report, closed as
>> 'not an issue': http://bugs.java.com/view_bug.do?bug_id=7039343.
>> Maybe kudu-client could use some memoization for this function?
>>
>> 2017-04-25 13:09 GMT+03:00 Pavel Martynov <mr.xkurt@gmail.com>:
>>
>>> I figured out that the problem was that I ran this program on my
>>> development Windows machine. There seems to be a performance issue with
>>> java.net.NetworkInterface.getByInetAddress on Windows (the only
>>> confirmation I have found so far is
>>> http://stackoverflow.com/questions/35541870/java-networkinterface-getbyinetaddress-takes-way-too-long).
>>> See the profiler screenshot http://pasteboard.co/8uHil3I5H.png (kudu-client
>>> v1.3.1): every call takes 53 ms (!) on average.
>>> Also, could you recheck the logic: why is this function called 88 times in
>>> 12 seconds for such a small program?
>>>
>>> 2017-04-24 22:29 GMT+03:00 Todd Lipcon <todd@cloudera.com>:
>>>
>>>> I tried to reproduce this locally using your code and couldn't. I get
>>>> around 100K inserts/second for 1.0, 1.1, 1.2, and 1.3 clients (against a
>>>> 1.4-SNAPSHOT cluster)
>>>>
>>>> Is it always reproducible for you? eg if you switch back to the earlier
>>>> client and try another set of runs, do you get the same results?
>>>>
>>>> -Todd
>>>>
>>>> On Mon, Apr 24, 2017 at 10:56 AM, Todd Lipcon <todd@cloudera.com>
>>>> wrote:
>>>>
>>>>> I vaguely recall some bug in earlier versions of the Java client where
>>>>> 'shutdown' wouldn't properly block on the data being flushed. So it's
>>>>> possible in 1.0.x and below, you're not actually measuring the full amount
>>>>> of time to write all the data, whereas when the bug is fixed, you are.
>>>>>
>>>>> I'll see if I can repro this locally as well using your code.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Mon, Apr 24, 2017 at 10:49 AM, David Alves <davidralves@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Pavel
>>>>>>
>>>>>>   Interesting, Thanks for sharing those numbers.
>>>>>>   I assume you weren't using AUTOFLUSH_BACKGROUND for the first
>>>>>> versions you tested (I don't think it was available then, iirc).
>>>>>>   Could you try without it in the latest version and see how the numbers
>>>>>> compare?
>>>>>>   We'd be happy to help track down the reason for this perf
>>>>>> regression.
>>>>>>
>>>>>> Best
>>>>>> David
>>>>>>
>>>>>> On Mon, Apr 24, 2017 at 4:58 AM, Pavel Martynov <mr.xkurt@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, I ran into the fact that I cannot achieve high insertion speed,
>>>>>>> so I started to experiment with
>>>>>>> https://github.com/cloudera/kudu-examples/tree/master/java/insert-loadgen.
>>>>>>> My slightly modified code (recreation of the table on startup + duration
>>>>>>> measuring): https://gist.github.com/xkrt/9405a2eeb98a56288b7c5a7d817097b4.
>>>>>>> On every run I changed the kudu-client version. Results:
>>>>>>>
>>>>>>> kudu-client-ver  perf
>>>>>>> 0.10             Duration: 626 ms, 79872 inserts/sec
>>>>>>> 1.0.0            Duration: 622 ms, 80385 inserts/sec
>>>>>>> 1.0.1            Duration: 630 ms, 79365 inserts/sec
>>>>>>> 1.1.0            Duration: 11703 ms, 4272 inserts/sec
>>>>>>> 1.3.1            Duration: 12317 ms, 4059 inserts/sec
>>>>>>>
>>>>>>> As you can see, there was a great degradation between 1.0.1 and 1.1.0
>>>>>>> (about 20 times!).
>>>>>>> What could be the problem, and how can I fix it? (Actually I am
>>>>>>> interested in kudu-spark, so using kudu-client 1.0.1 is probably not
>>>>>>> the right solution?)
>>>>>>>
>>>>>>> My test cluster: 3 hosts with master and tserver on each (3 masters
>>>>>>> and 3 tservers overall).
>>>>>>> No extra settings, flags used:
>>>>>>> fs_wal_dir
>>>>>>> fs_data_dirs
>>>>>>> master_addresses
>>>>>>> tserver_master_addrs
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> with best regards, Pavel Martynov
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>>
>>>
>>>
>>> --
>>> with best regards, Pavel Martynov
>>>
>>
>>
>>
>> --
>> with best regards, Pavel Martynov
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
with best regards, Pavel Martynov
