Date: Fri, 13 Nov 2009 11:20:33 -0800
Subject: Re: A region full of data is missing
From: Vaibhav Puranik
To: hbase-user@hadoop.apache.org

Now that we have resolved this problem and figured out that some data could
be missing because of a region having a small, empty file, we were wondering
whether there is any automated way to check all of our regions for this kind
of problem.

One obvious way would be to check all the regions for a small (228-byte)
file. But is there any other way, or another approach, to make sure that all
of our regions are intact? Should we be running a script periodically that
notifies us whether all of our regions are intact or not?

Regards,
Vaibhav Puranik
Gumgum

On Tue, Nov 10, 2009 at 5:50 PM, Vaibhav Puranik wrote:

> This problem is resolved, courtesy of Ryan, JD and Stack. Thank you very
> much!
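A minimal sketch of such a periodic check, assuming store files live under
/hbase/&lt;table&gt;/&lt;region&gt;/&lt;family&gt;/ and that the listing comes from something
like `hadoop fs -lsr /hbase` (the field layout parsed below, and all paths
and sizes, are illustrative assumptions, not something confirmed in this
thread):

```python
# Flag suspiciously small store files in a `hadoop fs -lsr`-style listing.
# Assumed field layout per line:
#   permissions, replication, owner, group, size, date, time, path

SUSPICIOUS_SIZE = 228  # the tiny store-file size seen in this thread

def find_tiny_store_files(listing_lines, threshold=SUSPICIOUS_SIZE):
    """Return (size, path) pairs for files at or below `threshold` bytes."""
    suspects = []
    for line in listing_lines:
        fields = line.split()
        if len(fields) < 8 or fields[0].startswith('d'):
            continue  # skip directories and malformed lines
        size, path = int(fields[4]), fields[7]
        if size <= threshold:
            suspects.append((size, path))
    return suspects

# Illustrative listing: one healthy ~130 MB store file, one 228-byte file,
# and the containing directory.
sample = [
    "-rw-r--r--  3 hbase hbase 136314880 2009-11-10 14:00 /hbase/Visits/1887697866/data/12345",
    "-rw-r--r--  3 hbase hbase       228 2009-11-10 14:05 /hbase/Visits/1887697866/data/67890",
    "drwxr-xr-x  - hbase hbase         0 2009-11-10 14:00 /hbase/Visits/1887697866/data",
]

for size, path in find_tiny_store_files(sample):
    print("suspicious: %d bytes  %s" % (size, path))
```

Run from cron against a fresh listing, this would at least catch the exact
symptom described here; it would not catch other kinds of region corruption.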
> For the culprit region there were two data files instead of one. The size
> of the first data file was around 130 MB. The second file was just 228
> bytes.
>
> Because of a bug, this second file gets created during major compaction,
> and it prevents the region from loading properly.
>
> As Ryan asked me to do, I deleted the smaller file and closed the region.
> This time HBase reopened it properly and the missing data came back up. I
> could access the missing data.
>
> I am not sure whether the bug is
> https://issues.apache.org/jira/browse/HBASE-1686, as that issue has a fix
> version of 0.20 but we already have 0.20.0 deployed in production. But as
> per Ryan, the root cause is the same.
>
> I guess we need to upgrade our HBase to 0.20.1!
>
> Thanks again,
> Vaibhav Puranik
> Gumgum
>
> On Tue, Nov 10, 2009 at 2:48 PM, Vaibhav Puranik wrote:
>
>> A region name contains the table name, the start key, and an id. The
>> start key is binary; in our case it is a mixture of a few longs. Whenever
>> it is printed, it comes out as Unicode characters that look like junk or
>> garbled characters. I am not sure whether the shell can interpret it
>> correctly.
>>
>> I don't know how to pass this name on the shell console, hence I used the
>> HBaseAdmin method.
>>
>> I kept watching the logs while I was doing it. The logs said it closed
>> the region and reopened it. It reopened it on the same region server.
>>
>> I tried accessing the data after this, but it didn't work.
>>
>> The .META. table seems to have its entry.
>> The entry looks like:
>>
>>  column=historian:assignment, timestamp=1257889883623,
>>   value=Region assigned to server
>>   domU-12-32-38-01-24-F2.z-2.compute-1.internal,60020,1253581834090
>>
>>  column=historian:open, timestamp=1257889886631,
>>   value=Region opened on server:
>>   domU-12-32-38-01-24-F2.z-2.compute-1.internal
>>
>>  column=info:regioninfo, timestamp=1250406167893,
>>   value=REGION => {NAME => 'Visits,\x00\x00\x01\x22\xD2\x1B\xDF\xE7\x00\x00\x00\x00\x00\x02\xAF\xFE,1250406166412',
>>   STARTKEY => '\x00\x00\x01\x22\xD2\x1B\xDF\xE7\x00\x00\x00\x00\x00\x02\xAF\xFE',
>>   ENDKEY => '\x00\x00\x01\x22\xFC\x27\x0F8\x00\x00\x00\x00\x00\x05X:',
>>   ENCODED => 1887697866, TABLE => {{NAME => 'Visits',
>>   FAMILIES => [{NAME => 'data', VERSIONS => '3', COMPRESSION => 'NONE',
>>   TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>>   BLOCKCACHE => 'true'}]}}
>>
>>  column=info:server, timestamp=1257889886630, value=10.255.43.0:60020
>>
>>  column=info:serverstartcode, timestamp=1257889886630, value=1253581834090
>>
>> Regards,
>> Vaibhav
>>
>> On Tue, Nov 10, 2009 at 2:28 PM, stack wrote:
>>
>>> You couldn't run the shell?
>>>
>>> So, region closed and opened somewhere else? Open on another
>>> regionserver and you still can't get data out of it?
>>>
>>> St.Ack
>>>
>>> On Tue, Nov 10, 2009 at 2:11 PM, Vaibhav Puranik wrote:
>>>
>>> > Stack,
>>> >
>>> > I tried doing HBaseAdmin.closeRegion with the binary region name.
>>> >
>>> > It closed the region and reopened it. But we still cannot access the
>>> > data.
>>> >
>>> > I guess trying to read it back from the data file is the only option
>>> > left, right?
>>> > Regards,
>>> > Vaibhav
>>> >
>>> > On Tue, Nov 10, 2009 at 12:56 PM, stack wrote:
>>> >
>>> > > On Mon, Nov 9, 2009 at 6:40 PM, Vaibhav Puranik wrote:
>>> > >
>>> > > > Does that mean the region is open and needs to be closed?
>>> > >
>>> > > It means the region should be open... especially if it's the message
>>> > > the regionserver is passing back to the Master reporting a
>>> > > successful open. Maybe check the regionserver log to see if anything
>>> > > happened with the region subsequently?
>>> > >
>>> > > > All the other regions seem to have one file in their data
>>> > > > directory. This region has two files in its data directory.
>>> > > > Is that right?
>>> > >
>>> > > It varies over time. These are the files that carry the data. When
>>> > > their number hits a threshold, they are compacted into one file.
>>> > >
>>> > > So, did the close work?
>>> > >
>>> > > If not, can you find the region in the filesystem? If so, and if
>>> > > you're any good with ruby, see the add_table.rb script in the head
>>> > > of the 0.20 branch. See how it can read a region and add an entry
>>> > > for it to .META. You might be able to hack it up to do the one
>>> > > region if the close doesn't work.
>>> > >
>>> > > St.Ack
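An aside on the binary start key discussed above: Vaibhav describes it as "a
mixture of a few longs", and the STARTKEY printed in .META. is 16 escaped
bytes. As a sketch (the two-longs interpretation is an assumption, not
something the thread confirms), those bytes split cleanly into two
big-endian 8-byte longs:

```python
import struct

# The 16 STARTKEY bytes printed in .META. for the culprit region, unescaped:
# \x00\x00\x01\x22\xD2\x1B\xDF\xE7 \x00\x00\x00\x00\x00\x02\xAF\xFE
start_key = b"\x00\x00\x01\x22\xd2\x1b\xdf\xe7\x00\x00\x00\x00\x00\x02\xaf\xfe"

# Assume two big-endian signed 64-bit longs (">qq").
ts_millis, some_id = struct.unpack(">qq", start_key)

# The first long lands in late July 2009 if read as epoch milliseconds,
# which is consistent with the region create timestamp 1250406166412
# (mid-August 2009) that appears in the region name.
print(ts_millis, some_id)
```

This makes the "garbled" shell output less mysterious: the key is perfectly
regular binary data, it just has no printable representation.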