accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: data loss around splits when tserver goes down
Date Sun, 26 Jan 2014 23:01:28 GMT
Hadoop 2.2.0 wasn't released before Accumulo 1.5.0, so it's impossible to
have tested that then :)

I've personally done extensive testing of 1.5.1-SNAPSHOT and 
1.6.0-SNAPSHOT with Hadoop 2.2.0. I know others have also been doing the 
same.

On 1/26/2014 4:47 PM, Anthony F wrote:
> Thanks, I'll check Jira.  As for versions, Hadoop 2.2.0, Zk 3.4.5,
> CentOS 64bit (kernel 2.6.32-431.el6.x86_64).  Has much testing been done
> using Hadoop 2.2.0?  I tried Hadoop 2.0.0 (CDH 4.5.0) but ran into
> HDFS-5225/5031 which basically makes it a non-starter.
>
>
> On Sun, Jan 26, 2014 at 4:29 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>     I meant to reply to your original email, but I didn't yet, sorry.
>
>     First off, if Accumulo is reporting that it found multiple locations
>     for the same extent, this is a (very bad) bug in Accumulo. It might
>     be worth looking at tickets that are marked as "affects 1.5.0" and
>     "fixed in 1.5.1" on Jira. It's likely that we've already encountered
>     and fixed the issue, but, if you can't find a fix that was already
>     made, we don't want to overlook the potential need for one.
>
>     For both "live" and "bulk" ingest, *neither* should lose any data.
>     This is one thing that Accumulo should never be doing. If you have
>     multiple locations for an extent, it seems plausible to me that you
>     would run into data loss. However, you should focus on trying to
>     determine why you keep running into multiple locations for a tablet.
>
>     After you take a look at Jira, I would go ahead and file a Jira
>     issue to track this, since it's easier to follow than an email thread.
>     Be sure to note anything unusual about your installation (e.g., did
>     you download it directly from the accumulo.apache.org site?). You
>     should also include your OS and version, and the Hadoop and ZooKeeper
>     versions you are running.
>
>
>     On 1/26/2014 4:10 PM, Anthony F wrote:
>
>         I have observed a loss of data when tservers fail during bulk
>         ingest. The keys that are missing are right around the table's
>         splits, indicating that data was lost when a tserver died during
>         a split. I am using Accumulo 1.5.0. At around the same time, I
>         observe the master logging a message about "Found two locations
>         for the same extent". Can anyone shed light on this behavior? Are
>         tserver failures during bulk ingest supposed to be fault tolerant?
>
>
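
The advice in the thread is to find out why a tablet keeps ending up with more than one location. As a rough illustration (not from the thread, and not Accumulo's own tooling), the Java sketch below groups current-location entries by tablet and reports any tablet that has more than one. It assumes the entries were already read from the metadata table's `loc` (current location) column family; the `LocEntry` record, the row strings, and the server names are hypothetical, and reading from a live cluster is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each LocEntry stands for one current-location ("loc")
// cell assumed to have been read from the metadata table. The row identifies
// the tablet (extent); the row format used below is illustrative only.
public class DuplicateLocationCheck {

    /** One metadata cell: tablet row plus the server recorded as its current location. */
    public record LocEntry(String tabletRow, String server) {}

    /** Returns each tablet that has more than one current location, mapped to those locations. */
    public static Map<String, List<String>> findDuplicates(List<LocEntry> entries) {
        // Group every recorded location under its tablet row, preserving input order.
        Map<String, List<String>> byTablet = new LinkedHashMap<>();
        for (LocEntry e : entries) {
            byTablet.computeIfAbsent(e.tabletRow(), k -> new ArrayList<>()).add(e.server());
        }
        // Keep only tablets with two or more locations: the symptom the master reported.
        Map<String, List<String>> dups = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> t : byTablet.entrySet()) {
            if (t.getValue().size() > 1) {
                dups.put(t.getKey(), t.getValue());
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // Made-up sample data: one tablet hosted by two servers, one hosted normally.
        List<LocEntry> sample = List.of(
            new LocEntry("2;m", "tserver1:9997"),
            new LocEntry("2;m", "tserver2:9997"),
            new LocEntry("2<", "tserver3:9997"));
        findDuplicates(sample).forEach((row, locs) ->
            System.out.println("Found " + locs.size() + " locations for " + row + ": " + locs));
        // prints "Found 2 locations for 2;m: [tserver1:9997, tserver2:9997]"
    }
}
```

A check like this only confirms the symptom; per the thread, the root cause still belongs in a Jira issue.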
