accumulo-user mailing list archives

From Anthony F <>
Subject Re: data loss around splits when tserver goes down
Date Sun, 26 Jan 2014 22:09:08 GMT
This is pretty much the issue:

Slightly different error message, but it's a different version.  Looks like
it's fixed in 1.6.0.  I'll probably need to upgrade.

On Sun, Jan 26, 2014 at 4:47 PM, Anthony F <> wrote:

> Thanks, I'll check Jira.  As for versions, Hadoop 2.2.0, Zk 3.4.5, CentOS
> 64bit (kernel 2.6.32-431.el6.x86_64).  Has much testing been done using
> Hadoop 2.2.0?  I tried Hadoop 2.0.0 (CDH 4.5.0) but ran into HDFS-5225/5031
> which basically makes it a non-starter.
>
> On Sun, Jan 26, 2014 at 4:29 PM, Josh Elser <> wrote:
>
>> I meant to reply to your original email, but I didn't yet, sorry.
>> First off, if Accumulo is reporting that it found multiple locations for
>> the same extent, this is a (very bad) bug in Accumulo. It might be worth
>> looking at tickets that are marked as "affects 1.5.0" and "fixed in 1.5.1"
>> on Jira. It's likely that we've already encountered and fixed the issue,
>> but, if you can't find a fix that was already made, we don't want to
>> overlook the potential need for one.
>> For both "live" and "bulk" ingest, *neither* should lose any data. This
>> is one thing that Accumulo should never be doing. If you have multiple
>> locations for an extent, it seems plausible to me that you would run into
>> data loss. However, you should focus on trying to determine why you keep
>> running into multiple locations for a tablet.
>> After you take a look at Jira, I would likely go ahead and file a jira to
>> track this since it's easier to follow than an email thread. Be sure to
>> note if there is anything notable about your installation (for example,
>> whether you downloaded it directly from the site). You should also include
>> what OS and version and what Hadoop and ZooKeeper versions you are running.
>>
>> On 1/26/2014 4:10 PM, Anthony F wrote:
>>
>>> I have observed a loss of data when tservers fail during bulk ingest.
>>> The keys that are missing are right around the table's splits indicating
>>> that data was lost when a tserver died during a split.  I am using
>>> Accumulo 1.5.0.  At around the same time, I observe the master logging a
>>> message about "Found two locations for the same extent".  Can anyone
>>> shed light on this behavior?  Are tserver failures during bulk ingest
>>> supposed to be fault tolerant?
