accumulo-dev mailing list archives

From Corey Nolet <cjno...@gmail.com>
Subject Re: Tablet server thrift issue
Date Fri, 22 Aug 2014 21:44:29 GMT
Thanks Josh,

I understand the session ID point completely, but my problem is that the
exact same client code, line for line, worked fine in 1.4.4 and is failing
in 1.6.0. I also seem to remember the BatchWriter automatically creating a
new session when one expired, without raising an exception that makes it
fail on the client.

I know we've made changes since 1.4.4, but I'd like to troubleshoot the
actual issue of the BatchWriter failing due to the thrift exception rather
than just catching the exception and trying the mutations again. The other
issue is that I've already submitted a bunch of mutations to the batch
writer from different threads. Does that mean I need to store them twice
(once in the BatchWriter's cache and once in my own)?
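
To make the "store them twice" question concrete, here is a minimal sketch
of what keeping a second copy could look like. It does not use the Accumulo
API: SimpleWriter, RejectedException and ReplayingWriter are hypothetical
stand-ins (for BatchWriter, MutationsRejectedException and a wrapper class
respectively), and mutations are plain strings. It also assumes that
re-applying a mutation the server may already have accepted is harmless.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for MutationsRejectedException. */
class RejectedException extends Exception {
    RejectedException(String msg) { super(msg); }
}

/** Hypothetical stand-in for the parts of a BatchWriter used here. */
interface SimpleWriter {
    void add(String mutation) throws RejectedException;
    void flush() throws RejectedException;
}

/** Keeps its own copy of unflushed mutations so they can be replayed. */
class ReplayingWriter {
    private final Supplier<SimpleWriter> factory;
    private final int maxAttempts;
    private final List<String> pending = new ArrayList<>();
    private SimpleWriter writer;

    ReplayingWriter(Supplier<SimpleWriter> factory, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.factory = factory;
        this.maxAttempts = maxAttempts;
        this.writer = factory.get();
    }

    /** Records the mutation locally, then hands it to the current writer. */
    synchronized void add(String mutation) {
        pending.add(mutation);
        if (writer == null) {
            return; // writer already discarded; flush() will replay
        }
        try {
            writer.add(mutation);
        } catch (RejectedException e) {
            writer = null; // treat the failed writer as unusable
        }
    }

    /** Flushes; on failure, discards the writer, makes a new one, replays. */
    synchronized void flush() throws RejectedException {
        RejectedException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (writer == null) {
                    writer = factory.get();
                    for (String m : pending) {
                        writer.add(m);
                    }
                }
                writer.flush();
                pending.clear(); // safe to forget only after a good flush
                return;
            } catch (RejectedException e) {
                last = e;
                writer = null;
            }
        }
        throw last;
    }
}
```

The cost is the one raised above: every unflushed mutation is held in
memory twice until a flush succeeds, and all threads serialize on the
wrapper's lock.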

The BatchWriter in my ingester is constantly sending data and the tablet
servers have been given more than enough memory to be able to keep up.
There's no swap being used and the network isn't experiencing any errors.


On Fri, Aug 22, 2014 at 4:54 PM, Josh Elser <josh.elser@gmail.com> wrote:

> If you get an error from a BatchWriter, you pretty much have to throw away
> that instance of the BatchWriter and make a new one. See ACCUMULO-2990. If
> you want, you should be able to catch/recover from this without having to
> restart the ingester.
>
> If the session ID is invalid, my guess is that it hasn't been used
> recently and the tserver cleaned it up. The exception logic isn't the
> greatest (it is just presented to you as a RuntimeException).
>
> https://issues.apache.org/jira/browse/ACCUMULO-2990
>
>
> On 8/22/14, 4:35 PM, Corey Nolet wrote:
>
>> Eric & Keith, Chris mentioned to me that you guys have seen this issue
>> before. Any ideas from anyone else are much appreciated as well.
>>
>> I recently updated a project's dependencies to Accumulo 1.6.0 built with
>> Hadoop 2.3.0. I've got CDH 5.0.2 deployed. The project has an ingest
>> component which is running all the time with a batch writer using many
>> threads to push mutations into Accumulo.
>>
>> The issue I'm having is a showstopper. At varying intervals, sometimes
>> an hour, sometimes 30 minutes, I'm getting MutationsRejectedExceptions
>> (server errors) from the TabletServerBatchWriter. Once they start, I
>> need to restart the ingester to get them to stop. They always come back
>> within 30 minutes to an hour... rinse, repeat.
>>
>> The exception always happens on different tablet servers. It's a thrift
>> error saying a message was received out of sequence. In the TabletServer
>> logs, I see an "Invalid session id" exception which happens only once
>> before the client-side batch writer starts spitting out the MREs.
>>
>> I'm running some heavyweight processing in Storm alongside the tablet
>> servers. I shut that processing off in hopes that maybe it was the culprit
>> but that hasn't fixed the issue.
>>
>> I'm surprised I haven't seen any other posts on the topic.
>>
>> Thanks!
>>
>>
