kudu-user mailing list archives

From Boris Tyukin <bo...@boristyukin.com>
Subject Re: close Kudu client on timeout
Date Fri, 18 Jan 2019 02:17:30 GMT
I did not want to overload my question with details, but since you asked :)
We use NiFi to consume data from 700+ Kafka topics. Each message is a JSON
object produced by GoldenGate.

NiFi has the ability to call a custom script written in Groovy, and we use
that feature to parse the JSON, apply some logic such as time zone
conversion and data type conversion, figure out the operation type (insert,
update, delete or primary key update) and then apply the operation to Kudu.

That script is a custom class that is initialized only once, when you start
the NiFi flow. After that, the actual script body is executed repeatedly,
once for each batch of data.

We thought about reusing the Kudu client, but because there is no init
method or anything like that, we need to create a client, open a session,
apply the operations and then close the session and the client. Even with
all of that, one batch is processed in under 400-500 ms, which is fast
enough for us.

Back to your suggestion: since we do not have a lot of control over how
this script is executed, it is a bit tricky to reuse a client instance. I
will look into this again though.

But if we reuse the client and keep it open forever, is there a downside to
that? With relational databases, one would normally use a connection pool
that creates and disposes of connections.
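
For reference, here is a minimal sketch of what reuse could look like
without an init method. It assumes the holder object (or a static field
pointing at it) survives between batches, which matches the "initialized
only once" behavior described above but is not something I have verified
for every NiFi setup. SharedClientHolder, get() and shutdown() are
hypothetical names, not part of the Kudu API; the factory would wrap
whatever code builds the real client.

```java
import java.util.function.Supplier;

// Hypothetical helper: creates the client on first use and hands back
// the same instance on every later call, until shutdown() is called.
class SharedClientHolder<C extends AutoCloseable> {
    private final Supplier<C> factory; // builds the real client (assumption)
    private C client;                  // null until first use or after shutdown

    SharedClientHolder(Supplier<C> factory) {
        this.factory = factory;
    }

    // Called once per batch; only the first call pays the creation cost.
    synchronized C get() {
        if (client == null) {
            client = factory.get();
        }
        return client;
    }

    // Called when the flow stops, so the client's threads are released.
    synchronized void shutdown() throws Exception {
        if (client != null) {
            client.close();
            client = null;
        }
    }
}
```

Each batch would then call get(), open a new session on the returned
client, apply its operations and close only the session.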

On Thu, Jan 17, 2019 at 7:23 PM Todd Lipcon <todd@cloudera.com> wrote:

> On Thu, Jan 17, 2019 at 1:46 PM Boris Tyukin <boris@boristyukin.com>
> wrote:
>> Hi Alexey,
>> it was "single idle Kudu Java client that created so many threads".
>> 20,000 threads in a few days to be precise :)  that code is running
>> non-stop and basically listens to kafka topics, then for every batch from
>> kafka, we create new kudu client instance, upsert data and close client.
>> the part we missed was *client.close()* in the end of that loop in the
>> code - once we put it in there, problem was solved.
>> So it is hard to tell if it was Java GC or something else.
>> But ideally, it would be nice, if Kudu server itself would kill idle
>> connections from clients on a timeout. I think Impala has similar global
>> setting.
>> --rpc_default_keepalive_time_ms  maybe it - I will look into this.
> I don't think that will help. The Kudu client is built around Netty, which
> is an async networking framework that decouples threads from connections.
> That is to say, regardless of the TCP connections, each Kudu client that
> you create will create N netty worker threads, even when it has no TCP
> connections open.
> I do think it would make sense to have some sort of LOG.warn() if the
> KuduClient detects that there are more than 10 live clients or something,
> so that would make this issue more obvious.
> As for your use case, creating a new client for each batch seems somewhat
> heavyweight. Why are you doing that vs just creating a new session?
> -Todd
>> On Thu, Jan 17, 2019 at 2:51 PM Alexey Serbin <aserbin@cloudera.com>
>> wrote:
>>> Hi Boris,
>>> Kudu servers have a setting for connection inactivity period: idle
>>> connections to the servers will be automatically closed after the specified
>>> time (--rpc_default_keepalive_time_ms is the flag).  So, from that
>>> perspective idle clients is not a big concern to the Kudu server side.
>>> As for your question, right now Kudu doesn't have a way to initiate a
>>> shutdown of an idle client from the server side.
>>> BTW, I'm curious what it was in your case you reported: were there too
>>> many idle Kudu client objects around created by the same application?  Or
>>> that was something else, like a single idle Kudu Java client that created
>>> so many threads?
>>> Thanks,
>>> Alexey
>>> On Wed, Jan 16, 2019 at 1:31 PM Boris Tyukin <boris@boristyukin.com>
>>> wrote:
>>>> sorry it is Java
>>>> On Wed, Jan 16, 2019 at 3:32 PM Mike Percy <mpercy@apache.org> wrote:
>>>>> Java or C++ / Python client?
>>>>> Mike
>>>>> Sent from my iPhone
>>>>> > On Jan 16, 2019, at 12:27 PM, Boris Tyukin <boris@boristyukin.com> wrote:
>>>>> >
>>>>> > Hi guys,
>>>>> >
>>>>> > is there a setting on Kudu server to close/clean-up inactive Kudu
>>>>> > clients?
>>>>> >
>>>>> > we just found some rogue code that did not close client on code
>>>>> > completion and wondering if we can prevent this in future on Kudu server
>>>>> > level rather than relying on good developers.
>>>>> >
>>>>> > That code caused 22,000 threads opened on our edge node over the
>>>>> > last few days.
>>>>> >
>>>>> > Boris
> --
> Todd Lipcon
> Software Engineer, Cloudera
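
Todd's idea in the quoted reply, a warning once more than about 10
clients are alive, could be sketched on the application side roughly as
follows. This is only an illustration of the counting logic:
TrackedClient, WARN_THRESHOLD and liveCount() are hypothetical names, the
class stands in for a wrapper around a real client, and nothing here
exists in the Kudu client itself.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

// Hypothetical wrapper that counts live instances and warns when the
// count passes a threshold, which usually means close() was forgotten.
class TrackedClient implements AutoCloseable {
    static final int WARN_THRESHOLD = 10; // value taken from the suggestion above
    private static final AtomicInteger LIVE = new AtomicInteger();
    private static final Logger LOG =
            Logger.getLogger(TrackedClient.class.getName());

    private boolean closed;

    TrackedClient() {
        int live = LIVE.incrementAndGet();
        if (live > WARN_THRESHOLD) {
            LOG.warning(live + " live clients; was close() forgotten somewhere?");
        }
    }

    // Number of instances created and not yet closed.
    static int liveCount() {
        return LIVE.get();
    }

    // Safe to call more than once; only the first call decrements.
    @Override
    public synchronized void close() {
        if (!closed) {
            closed = true;
            LIVE.decrementAndGet();
        }
    }
}
```

With a counter like this, a loop that creates a client per batch and
never closes it would start logging warnings after the eleventh batch
instead of after thousands of threads have piled up.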
