accumulo-user mailing list archives

From Josh Elser
Subject Re: data loss around splits when tserver goes down
Date Sun, 26 Jan 2014 22:56:44 GMT
Just because the error message is the same doesn't mean that the root 
cause is also the same.

Without looking more into Eric's changes, I'm not sure if ACCUMULO-2057 
would also affect 1.5.0. We're usually pretty good about checking 
backwards when bugs are found in newer versions, but things slip through 
the cracks, too.

On 1/26/2014 5:09 PM, Anthony F wrote:
> This is pretty much the issue:
> Slightly different error message but it's a different version.  Looks
> like it's fixed in 1.6.0.  I'll probably need to upgrade.
> On Sun, Jan 26, 2014 at 4:47 PM, Anthony F wrote:
>     Thanks, I'll check Jira.  As for versions, Hadoop 2.2.0, Zk 3.4.5,
>     CentOS 64bit (kernel 2.6.32-431.el6.x86_64).  Has much testing been
>     done using Hadoop 2.2.0?  I tried Hadoop 2.0.0 (CDH 4.5.0) but ran
>     into HDFS-5225/5031 which basically makes it a non-starter.
>     On Sun, Jan 26, 2014 at 4:29 PM, Josh Elser wrote:
>         I meant to reply to your original email, but I didn't yet, sorry.
>         First off, if Accumulo is reporting that it found multiple
>         locations for the same extent, this is a (very bad) bug in
>         Accumulo. It might be worth looking at tickets that are marked as
>         "affects 1.5.0" and "fixed in 1.5.1" on Jira. It's likely that
>         we've already encountered and fixed the issue, but, if you can't
>         find a fix that was already made, we don't want to overlook the
>         potential need for one.
>         For both "live" and "bulk" ingest, *neither* should lose any
>         data. This is one thing that Accumulo should never be doing. If
>         you have multiple locations for an extent, it seems plausible to
>         me that you would run into data loss. However, you should focus
>         on trying to determine why you keep running into multiple
>         locations for a tablet.
>         After you take a look at Jira, I would likely go ahead and file
>         a jira to track this since it's easier to follow than an email
>         thread. Be sure to note if there is anything notable about your
>         installation (did you download it directly from the
> <> site)? You
>         should also include what OS and version and what Hadoop and
>         ZooKeeper versions you are running.
>         On 1/26/2014 4:10 PM, Anthony F wrote:
>             I have observed a loss of data when tservers fail during
>             bulk ingest. The keys that are missing are right around the
>             table's splits, indicating that data was lost when a tserver
>             died during a split. I am using Accumulo 1.5.0. At around the
>             same time, I observe the master logging a message about
>             "Found two locations for the same extent". Can anyone shed
>             light on this behavior? Are tserver failures during bulk
>             ingest supposed to be fault tolerant?
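
For anyone trying to narrow down which tablets are affected, the check
behind "Found two locations for the same extent" can be approximated
offline. The sketch below is only an illustration, not Accumulo's own
code. It assumes the metadata entries have already been dumped into
(row, column family, value) tuples, and that a "loc" column family
records the tserver currently hosting a tablet. The sample rows and
server addresses are made up.

```python
from collections import defaultdict

# Assumption: "loc" is the column family that records a tablet's current
# location in the metadata table. Adjust if your version differs.
LOC_FAMILY = "loc"


def extents_with_multiple_locations(entries):
    """Return {row: [locations]} for extents with more than one location.

    `entries` is any iterable of (row, column_family, value) tuples, for
    example parsed from a scan of the metadata table.
    """
    locations = defaultdict(list)
    for row, family, value in entries:
        if family == LOC_FAMILY:
            locations[row].append(value)
    # Keep only the extents that are claimed by more than one location.
    return {row: locs for row, locs in locations.items() if len(locs) > 1}


if __name__ == "__main__":
    # Hypothetical sample data: tablet "1;m" is claimed by two tservers.
    sample = [
        ("1;m", "loc", "tserver1:9997"),
        ("1;m", "loc", "tserver2:9997"),
        ("1;m", "last", "tserver1:9997"),
        ("1<", "loc", "tserver1:9997"),
    ]
    for row, locs in extents_with_multiple_locations(sample).items():
        print(row, locs)
```

Any row this reports is a tablet worth including in the Jira ticket,
along with the master and tserver logs from around the time of the
split.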
