Date: Fri, 13 Nov 2009 11:20:33 -0800
Subject: Re: A region full of data is missing
From: Vaibhav Puranik
To: hbase-user@hadoop.apache.org

Now that we have resolved this problem and figured out that some data could
be missing because of a region having a small, empty file, we were wondering
whether there is any automated way to check all of our regions for this kind
of problem.

One obvious way would be to check all the regions for a small (228-byte)
file. But is there any other way, or another approach, to make sure that all
of our regions are intact? Should we be running a script periodically that
notifies us whether all of our regions are intact or not?

Regards,
Vaibhav Puranik
Gumgum

On Tue, Nov 10, 2009 at 5:50 PM, Vaibhav Puranik wrote:

> This problem is resolved, courtesy of Ryan, JD and Stack. Thank you very
> much!
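A minimal sketch of such a periodic check, assuming store files live under
/hbase/&lt;table&gt;/&lt;region&gt;/&lt;family&gt;/ and that the listing comes from something
like `hadoop fs -lsr /hbase` (the field layout parsed below, and all paths
and sizes, are illustrative assumptions, not something confirmed in this
thread):

```python
# Flag suspiciously small store files in a `hadoop fs -lsr`-style listing.
# Assumed field layout per line:
#   permissions, replication, owner, group, size, date, time, path

SUSPICIOUS_SIZE = 228  # the tiny store-file size seen in this thread

def find_tiny_store_files(listing_lines, threshold=SUSPICIOUS_SIZE):
    """Return (size, path) pairs for files at or below `threshold` bytes."""
    suspects = []
    for line in listing_lines:
        fields = line.split()
        if len(fields) < 8 or fields[0].startswith('d'):
            continue  # skip directories and malformed lines
        size, path = int(fields[4]), fields[7]
        if size <= threshold:
            suspects.append((size, path))
    return suspects

# Illustrative listing: one healthy ~130 MB store file, one 228-byte file,
# and the containing directory.
sample = [
    "-rw-r--r--  3 hbase hbase 136314880 2009-11-10 14:00 /hbase/Visits/1887697866/data/12345",
    "-rw-r--r--  3 hbase hbase       228 2009-11-10 14:05 /hbase/Visits/1887697866/data/67890",
    "drwxr-xr-x  - hbase hbase         0 2009-11-10 14:00 /hbase/Visits/1887697866/data",
]

for size, path in find_tiny_store_files(sample):
    print("suspicious: %d bytes  %s" % (size, path))
```

Run from cron against a fresh listing, this would at least catch the exact
symptom described here; it would not catch other kinds of region corruption.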
> For the culprit region there were two data files instead of one. The size
> of the first data file was around 130 MB. The second file was just 228
> bytes.
>
> Because of a bug, this second file gets created during major compaction,
> and it prevents the region from loading properly.
>
> As Ryan asked me to do, I deleted the smaller file and closed the region.
> This time HBase reopened it properly and the missing data came back up. I
> could access the missing data.
>
> I am not sure whether the bug is
> https://issues.apache.org/jira/browse/HBASE-1686, as that issue has a fix
> version of 0.20 but we already have 0.20.0 deployed in production. But as
> per Ryan, the root cause is the same.
>
> I guess we need to upgrade our HBase to 0.20.1!
>
> Thanks again,
> Vaibhav Puranik
> Gumgum
>
> On Tue, Nov 10, 2009 at 2:48 PM, Vaibhav Puranik wrote:
>
>> A region name contains the table name, the start key, and an id. The
>> start key is binary; in our case it is a mixture of a few longs. Whenever
>> it is printed, it comes out as Unicode characters that look like junk or
>> garbled characters. I am not sure whether the shell can interpret it
>> correctly.
>>
>> I don't know how to pass this name on the shell console, hence I used the
>> HBaseAdmin method.
>>
>> I kept watching the logs while I was doing it. The logs said it closed
>> the region and reopened it. It reopened it on the same region server.
>>
>> I tried accessing the data after this, but it didn't work.
>>
>> The .META. table seems to have its entry.
>> The entry looks like:
>>
>>  column=historian:assignment, timestamp=1257889883623,
>>   value=Region assigned to server
>>   domU-12-32-38-01-24-F2.z-2.compute-1.internal,60020,1253581834090
>>
>>  column=historian:open, timestamp=1257889886631,
>>   value=Region opened on server:
>>   domU-12-32-38-01-24-F2.z-2.compute-1.internal
>>
>>  column=info:regioninfo, timestamp=1250406167893,
>>   value=REGION => {NAME => 'Visits,\x00\x00\x01\x22\xD2\x1B\xDF\xE7\x00\x00\x00\x00\x00\x02\xAF\xFE,1250406166412',
>>   STARTKEY => '\x00\x00\x01\x22\xD2\x1B\xDF\xE7\x00\x00\x00\x00\x00\x02\xAF\xFE',
>>   ENDKEY => '\x00\x00\x01\x22\xFC\x27\x0F8\x00\x00\x00\x00\x00\x05X:',
>>   ENCODED => 1887697866, TABLE => {{NAME => 'Visits',
>>   FAMILIES => [{NAME => 'data', VERSIONS => '3', COMPRESSION => 'NONE',
>>   TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>>   BLOCKCACHE => 'true'}]}}
>>
>>  column=info:server, timestamp=1257889886630, value=10.255.43.0:60020
>>
>>  column=info:serverstartcode, timestamp=1257889886630, value=1253581834090
>>
>> Regards,
>> Vaibhav
>>
>> On Tue, Nov 10, 2009 at 2:28 PM, stack wrote:
>>
>>> You couldn't run the shell?
>>>
>>> So, region closed and opened somewhere else? Open on another
>>> regionserver and you still can't get data out of it?
>>>
>>> St.Ack
>>>
>>> On Tue, Nov 10, 2009 at 2:11 PM, Vaibhav Puranik wrote:
>>>
>>> > Stack,
>>> >
>>> > I tried doing HBaseAdmin.closeRegion with the binary region name.
>>> >
>>> > It closed the region and reopened it. But we still cannot access the
>>> > data.
>>> >
>>> > I guess trying to read it back from the data file is the only option
>>> > left, right?
>>> > Regards,
>>> > Vaibhav
>>> >
>>> > On Tue, Nov 10, 2009 at 12:56 PM, stack wrote:
>>> >
>>> > > On Mon, Nov 9, 2009 at 6:40 PM, Vaibhav Puranik wrote:
>>> > >
>>> > > > Does that mean the region is open and needs to be closed?
>>> > >
>>> > > It means the region should be open... especially if it's the message
>>> > > the regionserver is passing back to the Master reporting a
>>> > > successful open. Maybe check the regionserver log to see if anything
>>> > > happened with the region subsequently?
>>> > >
>>> > > > All the other regions seem to have one file in their data
>>> > > > directory. This region has two files in its data directory.
>>> > > > Is that right?
>>> > >
>>> > > It varies over time. These are the files that carry the data. When
>>> > > their number hits a threshold, they are compacted into one file.
>>> > >
>>> > > So, did the close work?
>>> > >
>>> > > If not, can you find the region in the filesystem? If so, and if
>>> > > you're any good with ruby, see the add_table.rb script in the head
>>> > > of the 0.20 branch. See how it can read a region and add an entry
>>> > > for it to .META. You might be able to hack it up to do the one
>>> > > region if the close doesn't work.
>>> > >
>>> > > St.Ack
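An aside on the binary start key discussed above: Vaibhav describes it as "a
mixture of a few longs", and the STARTKEY printed in .META. is 16 escaped
bytes. As a sketch (the two-longs interpretation is an assumption, not
something the thread confirms), those bytes split cleanly into two
big-endian 8-byte longs:

```python
import struct

# The 16 STARTKEY bytes printed in .META. for the culprit region, unescaped:
# \x00\x00\x01\x22\xD2\x1B\xDF\xE7 \x00\x00\x00\x00\x00\x02\xAF\xFE
start_key = b"\x00\x00\x01\x22\xd2\x1b\xdf\xe7\x00\x00\x00\x00\x00\x02\xaf\xfe"

# Assume two big-endian signed 64-bit longs (">qq").
ts_millis, some_id = struct.unpack(">qq", start_key)

# The first long lands in late July 2009 if read as epoch milliseconds,
# which is consistent with the region create timestamp 1250406166412
# (mid-August 2009) that appears in the region name.
print(ts_millis, some_id)
```

This makes the "garbled" shell output less mysterious: the key is perfectly
regular binary data, it just has no printable representation.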