nifi-dev mailing list archives

From: Josh Elser <els...@apache.org>
Subject: Re: [DISCUSS] On HBase client retries (NIFI-6197)
Date: Wed, 19 Feb 2020 01:31:57 GMT
We could certainly implement some kind of "sanity check" via HBase code, 
but I think what's missing is a natural place to run that validation in 
NiFi itself.

Something like...

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Short timeout, used only for the sanity check
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.rpc.timeout", 5000);
try (Connection conn = ConnectionFactory.createConnection(conf)) {
   // do sanity check
}

// Copy of the configuration with production timeouts for the real work
Configuration conf2 = new Configuration(conf);
conf2.setInt("hbase.rpc.timeout", 25000);
try (Connection conn = ConnectionFactory.createConnection(conf2)) {
   // do real hbase-y stuff
}
```

Maybe instead of an implicit check like this (which would require NiFi 
code changes), we could solve the problem at the "human level": write 
docs that walk people through pushing a dummy record through the 
service/processor with low timeout and retry settings? Make the "sanity 
check" a human operation and just expose the ability to set 
timeouts/retries via the controller service?
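
A minimal sketch of how such timeout/retry properties might be exposed on 
the controller service, assuming NiFi's standard PropertyDescriptor API; 
the property names and default values below are hypothetical, not the 
existing service's:

```
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

// Hypothetical properties an operator could lower while sanity-checking a
// new configuration, then raise again for production use.
public static final PropertyDescriptor HBASE_RPC_TIMEOUT = new PropertyDescriptor.Builder()
        .name("hbase-rpc-timeout-ms")
        .displayName("HBase RPC Timeout (ms)")
        .description("Copied onto hbase.rpc.timeout in the client Configuration.")
        .required(false)
        .defaultValue("60000")
        .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
        .build();

public static final PropertyDescriptor HBASE_CLIENT_RETRIES = new PropertyDescriptor.Builder()
        .name("hbase-client-retries")
        .displayName("HBase Client Retries")
        .description("Copied onto hbase.client.retries.number in the client Configuration.")
        .required(false)
        .defaultValue("15")
        .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
        .build();
```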

On 2/18/20 4:36 PM, Lars Francke wrote:
> Hi,
> 
> Josh, thanks for bringing it up here again.
> I'm happy to revive the PR with whatever the outcome of this thread is.
> It came up today because another client complained about how "unstable"
> HBase is on NiFi.
> 
> @Josh: As the whole issue is only the initial connect, can we have a
> different timeout setting there? I have to admit I don't know.
> 
> Cheers,
> Lars
> 
> On Tue, Feb 18, 2020 at 8:11 PM Pierre Villard <pierre.villard.fr@gmail.com>
> wrote:
> 
>> Good point, I don't think we can do that on a controller service.
>>
>> On Tue, Feb 18, 2020 at 11:06 AM, Bryan Bende <bbende@gmail.com> wrote:
>>
>>> That could make it a little better, but I can't remember, can we
>>> terminate on a controller service?
>>>
>>> The issue here would be when first enabling the HBase client
>>> service, so before even getting to a processor.
>>>
>>> On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard
>>> <pierre.villard.fr@gmail.com> wrote:
>>>>
>>>> Bryan,
>>>>
>>>> I didn't follow the whole discussion, so I apologize if I'm saying
>>>> something stupid here. Now that we have the ability to terminate threads
>>>> in a processor, would that solve the issue?
>>>>
>>>> Pierre
>>>>
>>>> On Tue, Feb 18, 2020 at 10:52 AM, Bryan Bende <bbende@gmail.com> wrote:
>>>>
>>>>> Hi Josh,
>>>>>
>>>>> The problem isn't so much about the retries within the flow, it's more
>>>>> about setting up the service for the first time.
>>>>>
>>>>> A common scenario for users was the following:
>>>>>
>>>>> - Create a new HBase client service
>>>>> - Enter some config that wasn't quite correct (for example, hostnames
>>>>>   that weren't reachable from NiFi)
>>>>> - Enable service and enter retry loop
>>>>> - Attempt to disable service to fix config, but have to wait 5+ mins
>>>>> for the retries to finish
>>>>>
>>>>> Maybe a lazy initialization of the connection on our side would help
>>>>> here, although it would just be moving the problem until later (i.e. the
>>>>> service immediately enables because nothing is happening yet, and then
>>>>> they find out about config problems when a flow file hits an HBase
>>>>> processor).
>>>>>
>>>>> I guess the ideal scenario would be to have different logic for
>>>>> initializing the connection vs. using it, so that there wouldn't be
>>>>> retries during initialization.
>>>>>
>>>>> -Bryan
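
A rough sketch of the split Bryan describes, assuming a hypothetical 
controller-service-style class (the verifyConfiguration/getConnection names 
are made up for illustration): a quick probe with minimal retries runs when 
the service is enabled, while the real connection keeps the normal client 
settings and is only created lazily on first use.

```
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseClientServiceSketch {
    private final Configuration baseConf = HBaseConfiguration.create();
    private Connection connection;

    // Run once when the service is enabled: fail fast on bad config.
    public void verifyConfiguration() throws IOException {
        Configuration probeConf = new Configuration(baseConf);
        probeConf.setInt("hbase.client.retries.number", 1);
        probeConf.setInt("hbase.rpc.timeout", 5000);
        try (Connection probe = ConnectionFactory.createConnection(probeConf);
             Admin admin = probe.getAdmin()) {
            admin.listTableNames(); // any cheap call that actually touches the cluster
        }
    }

    // Called from processors: the real connection keeps the default retry
    // behavior and is created lazily on first use.
    public synchronized Connection getConnection() throws IOException {
        if (connection == null) {
            connection = ConnectionFactory.createConnection(baseConf);
        }
        return connection;
    }
}
```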
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <josh.elser@gmail.com>
>>> wrote:
>>>>>>
>>>>>> Hiya!
>>>>>>
>>>>>> LarsF brought this up in the apache-hbase Slack and it caught my eye.
>>>>>> Sending a note here since the PR where this was previously being
>>>>>> discussed[1] is now closed.
>>>>>>
>>>>>> I understand Bryan's concerns that misconfiguration of an HBase
>>>>>> processor with a high number of retries and back-off can create a
>>>>>> situation in which the processing of a single FlowFile will take a
>>>>>> very long time to hit the onFailure state.
>>>>>>
>>>>>> However, as an HBase developer, I can confidently state that
>>>>>> hbase.client.retries=1 will create scenarios in which you'll be pushing
>>>>>> a FlowFile through a retry loop inside of NiFi for things which should
>>>>>> be implicitly retried inside of the HBase client.
>>>>>>
>>>>>> For example, if a Region is being moved between two RegionServers and
>>>>>> an HBase processor is trying to read/write to that Region, the client
>>>>>> will see an exception. This is a "retriable" exception in HBase parlance,
>>>>>> which means the HBase client code would automatically re-process that
>>>>>> request (looking up the new location of that Region first). In most
>>>>>> cases, the subsequent RPC succeeds, the caller is none the wiser, and
>>>>>> the whole retry took only a few milliseconds.
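
For context, a minimal sketch of the client-side settings that govern this 
implicit retry behavior (the full property name is hbase.client.retries.number); 
the values shown are illustrative only:

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();
// How many times the client retries a "retriable" failure internally.
conf.setInt("hbase.client.retries.number", 10);
// Base pause between attempts, in milliseconds (grows with backoff).
conf.setLong("hbase.client.pause", 100);
// Overall cap for a single client operation, in milliseconds.
conf.setInt("hbase.client.operation.timeout", 60000);
// With the retries setting at 1, a moving Region surfaces to the caller as an
// exception (e.g. RetriesExhaustedException) instead of being retried internally.
```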
>>>>>>
>>>>>> My first idea was also what Lars had suggested -- can we come up with a
>>>>>> sanity check to validate "correct" configuration for the processor
>>>>>> before we throw the waterfall of data at it? I can respect it if
>>>>>> processors don't have a "good" hook for such a check.
>>>>>>
>>>>>> What _would_ be the ideal semantics from NiFi's perspective? We have
>>>>>> the ability to implicitly retry operations and also control the retry
>>>>>> backoff values. Is there something more we could do from the HBase side,
>>>>>> given what y'all have seen from the battlefield?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> - Josh
>>>>>>
>>>>>> [1] https://github.com/apache/nifi/pull/3425
>>>>>
>>>
>>
> 
