Date: Mon, 27 Jan 2014 14:24:39 +0000 (UTC)
From: "Eric Newton (JIRA)"
To: notifications@accumulo.apache.org
Subject: [jira] [Created] (ACCUMULO-2261) duplicate locations

Eric Newton created ACCUMULO-2261:
-------------------------------------

             Summary: duplicate locations
                 Key: ACCUMULO-2261
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2261
             Project: Accumulo
          Issue Type: Bug
          Components: master, tserver
    Affects Versions: 1.5.0
         Environment: hadoop 2.2.0 and zookeeper 3.4.5
            Reporter: Eric Newton
            Assignee: Eric Newton
            Priority: Blocker
             Fix For: 1.5.1

Anthony F reports the following:

bq. I have observed a loss of data when tservers fail during bulk ingest. The keys that are missing are right around the table's splits indicating that data was lost when a tserver died during a split. I am using Accumulo 1.5.0.
At around the same time, I observe the master logging a message about "Found two locations for the same extent".

And:

bq. I'm currently digging through the logs and will report back. Keep in mind, I'm using Accumulo 1.5.0 on a Hadoop 2.2.0 stack. To determine data loss, I have a 'ConsistencyCheckingIterator' that verifies each row has the expected data (it takes a long time to scan the whole table). Below is a quick summary of what happened. The tablet in question is "d;72~gcm~201304". Notice that it is assigned to 192.168.2.233:9997[343bc1fa155242c] at 2014-01-25 09:49:36,233. At 2014-01-25 09:49:54,141, the tserver goes away. Then, the tablet gets assigned to 192.168.2.223:9997[143bc1f14412432] and shortly after that, I see the BadLocationStateException. The master never recovers from the BLSE - I have to manually delete one of the offending locations.

{noformat}
2014-01-25 09:49:36,233 [master.Master] DEBUG: Normal Tablets assigning tablet d;72~gcm~201304;72=192.168.2.233:9997[343bc1fa155242c]
2014-01-25 09:49:36,233 [master.Master] DEBUG: Normal Tablets assigning tablet p;18~thm~2012101;18=192.168.2.233:9997[343bc1fa155242c]
2014-01-25 09:49:54,141 [master.Master] WARN : Lost servers [192.168.2.233:9997[343bc1fa155242c]]
2014-01-25 09:49:56,866 [master.Master] DEBUG: 42 assigned to dead servers: [d;03~u36~201302;03~thm~2012091@(null,192.168.2.233:9997[343bc1fa155242c],null), d;06~u36~2013;06~thm~2012083@(null,192.168.2.233:9997[343bc1fa155242c],null), d;25;24~u36~2013@(null,192.168.2.233:9997[343bc1fa155242c],null), d;25~u36~201303;25~thm~201209@(null,192.168.2.233:9997[343bc1fa155242c],null), d;27~gcm~2013041;27@(null,192.168.2.233:9997[343bc1fa155242c],null), d;30~u36~2013031;30~thm~2012082@(null,192.168.2.233:9997[343bc1fa155242c],null), d;34~thm;34~gcm~2013022@(null,192.168.2.233:9997[343bc1fa155242c],null), d;39~thm~20121;39~gcm~20130418@(null,192.168.2.233:9997[343bc1fa155242c],null),
d;41~thm;41~gcm~2013041@(null,192.168.2.233:9997[343bc1fa155242c],null), d;42~u36~201304;42~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null), d;45~thm~201208;45~gcm~201303@(null,192.168.2.233:9997[343bc1fa155242c],null), d;48~gcm~2013052;48@(null,192.168.2.233:9997[343bc1fa155242c],null), d;60~u36~2013021;60~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null), d;68~gcm~2013041;68@(null,192.168.2.233:9997[343bc1fa155242c],null), d;72;71~u36~2013@(null,192.168.2.233:9997[343bc1fa155242c],null), d;72~gcm~201304;72@(192.168.2.233:9997[343bc1fa155242c],null,null), d;75~thm~2012101;75~gcm~2013032@(null,192.168.2.233:9997[343bc1fa155242c],null), d;78;77~u36~201305@(null,192.168.2.233:9997[343bc1fa155242c],null), d;90~u36~2013032;90~thm~2012092@(null,192.168.2.233:9997[343bc1fa155242c],null), d;91~thm;91~gcm~201304@(null,192.168.2.233:9997[343bc1fa155242c],null), d;93~u36~2013012;93~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null), m;20;19@(null,192.168.2.233:9997[343bc1fa155242c],null), m;38;37@(null,192.168.2.233:9997[343bc1fa155242c],null), m;51;50@(null,192.168.2.233:9997[343bc1fa155242c],null), m;60;59@(null,192.168.2.233:9997[343bc1fa155242c],null), m;92;91@(null,192.168.2.233:9997[343bc1fa155242c],null), o;01<@(null,192.168.2.233:9997[343bc1fa155242c],null), o;04;03@(null,192.168.2.233:9997[343bc1fa155242c],null), o;50;49@(null,192.168.2.233:9997[343bc1fa155242c],null), o;63;62@(null,192.168.2.233:9997[343bc1fa155242c],null), o;74;73@(null,192.168.2.233:9997[343bc1fa155242c],null), o;97;96@(null,192.168.2.233:9997[343bc1fa155242c],null), p;08~thm~20121;08@(null,192.168.2.233:9997[343bc1fa155242c],null), p;09~thm~20121;09@(null,192.168.2.233:9997[343bc1fa155242c],null), p;10;09~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null), p;18~thm~2012101;18@(192.168.2.233:9997[343bc1fa155242c],null,null), p;21;20~thm~201209@(null,192.168.2.233:9997[343bc1fa155242c],null), p;22~thm~2012091;22@(null,192.168.2.233:9997[343bc1fa155242c],null), p;23;22~thm~2012091@(null,192.168.2.233:9997[343bc1fa155242c],null), p;41~thm~2012111;41@(null,192.168.2.233:9997[343bc1fa155242c],null), p;42;41~thm~2012111@(null,192.168.2.233:9997[343bc1fa155242c],null), p;58~thm~201208;58@(null,192.168.2.233:9997[343bc1fa155242c],null)]...
2014-01-25 09:49:59,706 [master.Master] DEBUG: Normal Tablets assigning tablet d;72~gcm~201304;72=192.168.2.223:9997[143bc1f14412432]
2014-01-25 09:50:13,515 [master.EventCoordinator] INFO : tablet d;72~gcm~201304;72 was loaded on 192.168.2.223:9997
2014-01-25 09:51:20,058 [state.MetaDataTableScanner] ERROR: java.lang.RuntimeException: org.apache.accumulo.server.master.state.TabletLocationState$BadLocationStateException: found two locations for the same extent d;72~gcm~201304: 192.168.2.223:9997[143bc1f14412432] and 192.168.2.233:9997[343bc1fa155242c]
java.lang.RuntimeException: org.apache.accumulo.server.master.state.TabletLocationState$BadLocationStateException: found two locations for the same extent d;72~gcm~201304: 192.168.2.223:9997[143bc1f14412432] and 192.168.2.233:9997[343bc1fa155242c]
{noformat}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
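
The timeline in the log excerpt can be cross-checked mechanically. The following is a minimal Python sketch, assuming the master debug log uses exactly the "assigning tablet <extent>=<host:port>[<session>]" line format seen in the excerpt; the function name find_multiple_assignments and the regular expression are illustrative only and are not part of Accumulo. It lists every extent that the master assigned to more than one server. Reassignment after a tserver dies is normal, so a hit is only a candidate to compare against the locations recorded in the metadata table, not proof of a duplicate location.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the log excerpt in this report:
#   ... assigning tablet d;72~gcm~201304;72=192.168.2.233:9997[343bc1fa155242c]
ASSIGN_RE = re.compile(
    r"assigning tablet (?P<extent>\S+)=(?P<server>\S+\[[0-9a-f]+\])\s*$"
)


def find_multiple_assignments(lines):
    """Return {extent: [servers]} for extents assigned to more than one server.

    Servers are listed in the order in which they first appear in the log.
    """
    seen = defaultdict(list)
    for line in lines:
        match = ASSIGN_RE.search(line)
        if match is None:
            continue  # not an assignment line
        extent, server = match.group("extent"), match.group("server")
        if server not in seen[extent]:
            seen[extent].append(server)
    return {extent: servers for extent, servers in seen.items() if len(servers) > 1}


if __name__ == "__main__":
    sample = [
        "2014-01-25 09:49:36,233 [master.Master] DEBUG: Normal Tablets assigning tablet d;72~gcm~201304;72=192.168.2.233:9997[343bc1fa155242c]",
        "2014-01-25 09:49:36,233 [master.Master] DEBUG: Normal Tablets assigning tablet p;18~thm~2012101;18=192.168.2.233:9997[343bc1fa155242c]",
        "2014-01-25 09:49:54,141 [master.Master] WARN : Lost servers [192.168.2.233:9997[343bc1fa155242c]]",
        "2014-01-25 09:49:59,706 [master.Master] DEBUG: Normal Tablets assigning tablet d;72~gcm~201304;72=192.168.2.223:9997[143bc1f14412432]",
    ]
    for extent, servers in find_multiple_assignments(sample).items():
        print(extent, "->", ", ".join(servers))
```

Run against the four sample lines, only d;72~gcm~201304;72 is reported, with both the dead server's session and the new one, which matches the pair named in the BadLocationStateException.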