Hi Josef, 

We looked at it, but we had a bunch of specific things we wanted to do, so a custom processor was easier for us. We might convert the Groovy script at some point to a true custom Java processor, which does have the ability to share clients or shut them down when a processor is stopped from the UI, but Groovy worked really well for prototyping, and performance was really good as well.

Thanks

On Fri, Jan 18, 2019 at 2:40 AM <Josef.Zahner1@swisscom.com> wrote:

Hi Boris

 

I guess you have a good reason why you don’t use the NiFi PutKudu or, e.g., ExecuteSQL (combined with Impala) processors? We also started with our own custom Kudu client implementation in NiFi, but in the end we switched over to the existing processors, as they were much easier to handle…

 

Cheers Josef

From: Boris Tyukin <boris@boristyukin.com>
Reply-To: "user@kudu.apache.org" <user@kudu.apache.org>
Date: Friday, 18 January 2019 at 03:19
To: "user@kudu.apache.org" <user@kudu.apache.org>
Subject: Re: close Kudu client on timeout

 

I did not want to overload my question with details, but since you asked :) We use NiFi to consume data from 700+ topics. Each message is a JSON object produced by GoldenGate.

 

NiFi has the ability to call a custom script written in Groovy, and we use that feature to parse the JSON, apply some logic such as time zone conversion and data type conversion, figure out the operation type (insert, update, delete or primary key update) and then apply the operation to Kudu.

 

That script is a custom class which is initialized only once, when you start the NiFi flow; the actual script is then executed repeatedly for each batch of data.

 

We thought about reusing the Kudu client, but because there is no init method or anything like that, we need to create a client, open a session, apply operations and then close the session and the client. Even if we do all of that, one batch can be processed in under 400-500 ms, which is more than enough for us.

 

Back to your suggestion: since we do not have a lot of control over how this script is executed, it is a bit tricky to reuse a client instance. I will look into this again though.
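
One way this could be approached, sketched in plain Java under assumptions: since the enclosing class is initialized only once, a lazily created holder kept in a field could hand the same client to every batch, with only the session opened and closed per batch. SharedClientHolder is a hypothetical helper, not part of NiFi or Kudu, and the sketch uses a generic AutoCloseable in place of the real client so that it stays self-contained. Whether the script host offers a hook from which shutdown() can be called when the flow stops would need to be checked.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper: creates the client on first use and reuses it afterwards.
// C stands in for the real client type; any AutoCloseable works for this sketch.
final class SharedClientHolder<C extends AutoCloseable> {
    private final Supplier<C> factory;
    private C client; // guarded by "this"

    SharedClientHolder(Supplier<C> factory) {
        this.factory = Objects.requireNonNull(factory);
    }

    // Called once per batch: returns the shared client, creating it if needed.
    synchronized C get() {
        if (client == null) {
            client = factory.get();
        }
        return client;
    }

    // Called when the flow stops (assuming such a hook is available).
    synchronized void shutdown() {
        if (client == null) {
            return;
        }
        try {
            client.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close shared client", e);
        } finally {
            client = null; // a later get() will build a fresh client
        }
    }
}
```

With the real client, the factory would build the client once, and each batch would call get(), open a new session, apply its operations and close only the session.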

 

But if we reuse the client and keep it open forever, is there a downside to that? With relational databases, one would normally use a connection pool that creates and disposes of connections.

On Thu, Jan 17, 2019 at 7:23 PM Todd Lipcon <todd@cloudera.com> wrote:

On Thu, Jan 17, 2019 at 1:46 PM Boris Tyukin <boris@boristyukin.com> wrote:

Hi Alexey,

 

It was a "single idle Kudu Java client that created so many threads": 20,000 threads in a few days, to be precise :) That code runs non-stop and basically listens to Kafka topics; then, for every batch from Kafka, we create a new Kudu client instance, upsert data and close the client.

 

The part we missed was client.close() at the end of that loop in the code. Once we put it in there, the problem was solved.

 

So it is hard to tell if it was Java GC or something else. 

 

But ideally, it would be nice if the Kudu server itself killed idle connections from clients on a timeout. I think Impala has a similar global setting.

 

--rpc_default_keepalive_time_ms may be it; I will look into this.

 

I don't think that will help. The Kudu client is built around Netty, which is an async networking framework that decouples threads from connections. That is to say, regardless of the TCP connections, each Kudu client that you create will create N Netty worker threads, even when it has no TCP connections open.
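
To make the thread arithmetic concrete, here is a minimal sketch in plain Java. WorkerOwningClient is a hypothetical stand-in, not the Kudu API: it only mimics a client that starts a fixed number of worker threads on construction (4 here, an assumed value) and stops them only in close(). Creating one per batch without closing it leaves all of those threads running, whether or not any connection was ever opened.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a client that owns its worker threads.
// NOT the Kudu API; it only mimics "N worker threads per client instance".
final class WorkerOwningClient implements AutoCloseable {
    static final int WORKERS_PER_CLIENT = 4; // assumed value, for illustration only
    static final AtomicInteger LIVE_THREADS = new AtomicInteger();

    private final List<Thread> threads = new CopyOnWriteArrayList<>();
    private final ThreadPoolExecutor pool;

    WorkerOwningClient() {
        pool = new ThreadPoolExecutor(
                WORKERS_PER_CLIENT, WORKERS_PER_CLIENT, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(), this::newWorker);
        // Start every worker up front, before any work or connection exists.
        pool.prestartAllCoreThreads();
    }

    private Thread newWorker(Runnable body) {
        LIVE_THREADS.incrementAndGet();
        Thread t = new Thread(() -> {
            try {
                body.run();
            } finally {
                LIVE_THREADS.decrementAndGet();
            }
        });
        t.setDaemon(true); // leaked workers must not keep the JVM alive in this demo
        threads.add(t);
        return t;
    }

    @Override
    public void close() {
        pool.shutdown(); // idle workers notice the shutdown and exit
        for (Thread t : threads) {
            try {
                t.join(); // wait until the worker has really finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}

final class ClientPerBatchDemo {
    // Returns how many worker threads are still alive after all batches.
    static int leakedThreads(int batches, boolean closeEachClient) {
        int before = WorkerOwningClient.LIVE_THREADS.get();
        for (int i = 0; i < batches; i++) {
            WorkerOwningClient client = new WorkerOwningClient();
            // ... per-batch work would go here ...
            if (closeEachClient) {
                client.close();
            }
        }
        return WorkerOwningClient.LIVE_THREADS.get() - before;
    }
}
```

Under these assumptions, ClientPerBatchDemo.leakedThreads(5, false) reports 20 surviving threads and leakedThreads(5, true) reports 0, which is the same pattern as the missing client.close(), at a much smaller scale.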

 

I do think it would make sense to have some sort of LOG.warn() if the KuduClient detects that there are more than 10 live clients or something; that would make this issue more obvious.
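
A rough sketch of what such a check could look like, in plain Java. LiveClientTracker and its method names are hypothetical and not part of the Kudu client; the threshold of 10 is just the number floated above. The idea is only to count open clients and log a warning once the count exceeds the threshold.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

// Hypothetical helper, not existing Kudu code: counts live clients and warns
// when more than WARN_THRESHOLD are open at the same time.
final class LiveClientTracker {
    static final int WARN_THRESHOLD = 10; // "more than 10 live clients", as floated above
    private static final Logger LOG = Logger.getLogger(LiveClientTracker.class.getName());

    private final AtomicInteger live = new AtomicInteger();

    // Call when a client is created; returns true if a warning was logged.
    boolean clientOpened() {
        int now = live.incrementAndGet();
        if (now > WARN_THRESHOLD) {
            LOG.warning(now + " live clients in this process; was close() forgotten somewhere?");
            return true;
        }
        return false;
    }

    // Call when a client is closed.
    void clientClosed() {
        live.decrementAndGet();
    }

    int liveCount() {
        return live.get();
    }
}
```

In this sketch the first ten clientOpened() calls stay silent and the eleventh logs a warning, so a loop that forgets close() would show up in the logs almost immediately.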

 

As for your use case, creating a new client for each batch seems somewhat heavyweight. Why are you doing that vs just creating a new session?

 

-Todd

 

On Thu, Jan 17, 2019 at 2:51 PM Alexey Serbin <aserbin@cloudera.com> wrote:

Hi Boris,

 

Kudu servers have a setting for the connection inactivity period: idle connections to the servers will be automatically closed after the specified time (--rpc_default_keepalive_time_ms is the flag). So, from that perspective, idle clients are not a big concern on the Kudu server side.

 

As for your question, right now Kudu doesn't have a way to initiate a shutdown of an idle client from the server side.

 

BTW, I'm curious what happened in the case you reported: were there too many idle Kudu client objects around, created by the same application? Or was it something else, like a single idle Kudu Java client that created so many threads?

 

 

Thanks,

 

Alexey

 

On Wed, Jan 16, 2019 at 1:31 PM Boris Tyukin <boris@boristyukin.com> wrote:

Sorry, it is Java.

 

On Wed, Jan 16, 2019 at 3:32 PM Mike Percy <mpercy@apache.org> wrote:

Java or C++ / Python client?

Mike

> On Jan 16, 2019, at 12:27 PM, Boris Tyukin <boris@boristyukin.com> wrote:
>
> Hi guys,
>
> Is there a setting on the Kudu server to close/clean up inactive Kudu clients?
>
> We just found some rogue code that did not close the client on completion, and we are wondering if we can prevent this in the future at the Kudu server level rather than relying on good developers.
>
> That code caused 22,000 threads to be opened on our edge node over the last few days.
>
> Boris


 

--

Todd Lipcon
Software Engineer, Cloudera