Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of tomaz.logar@tobonet.com
 designates 195.95.173.245 as permitted sender)
Message-ID: <4EF0EE52.1050808@tobonet.com>
Date: Tue, 20 Dec 2011 21:21:38 +0100
From: Tomaz Logar <tomaz.logar@tobonet.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:8.0) Gecko/20111105 Thunderbird/8.0
MIME-Version: 1.0
To: dev@hbase.apache.org
Subject: Re: Table refuses a scan of old data, but not new
References: <1324320423.62271.YahooMailNeo@web121704.mail.ne1.yahoo.com>
 <47EA5E8AC8F34924896B09C9964C4DF6@china.huawei.com>
 <CALte62wSVuLTz8VrFrOuRi1xNmGqVsMPRvoikA_oJuy3Pv5bXQ@mail.gmail.com>
 <4EF0C9A8.5010507@tobonet.com>
 <CADY20s5rVyMmn6PVuEGcPJJihrVdm3ewXc+xwuRXJXp-bmLXMA@mail.gmail.com>
In-Reply-To: 
 <CADY20s5rVyMmn6PVuEGcPJJihrVdm3ewXc+xwuRXJXp-bmLXMA@mail.gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit


Hi, Todd.

The relevant clipping is:
---
Version: 0.90.4-cdh3u2
..........
Number of Tables: 15
Number of live region servers: 8
Number of dead region servers: 0
.ERROR: Version file does not exist in root dir file:/tmp/hbase-ta/hbase
Number of empty REGIONINFO_QUALIFIER rows in .META.: 0

ERROR: Region xxx found in META, but not in HDFS, and deployed on nx 
(times 48)

Summary:
table is okay.
Number of regions: 48
Deployed on: n1, n2, ... n8
---

Summary says the table in question is ok.

For every region I get "found in META, but not in HDFS", but that looks 
like a false positive: it is reported for all of them (11k+), while the 
other tables work fine. And the files are in HDFS, of course.

No mention of any specific region being broken... :(
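
To check that no region stands out, the hbck errors can be tallied per 
server. This is only a rough sketch: the function name is made up for 
illustration, and the pattern assumes the exact error wording from the 
clipping above, which other hbck versions may phrase differently.

```python
import re
from collections import Counter

# Assumption: the error wording is exactly as quoted in the clipping above.
MISSING_IN_HDFS = re.compile(
    r"ERROR: Region (?P<region>\S+) found in META, but not in HDFS, "
    r"and deployed on (?P<server>\S+)"
)


def tally_hbck_errors(report):
    """Return (per-server count of 'not in HDFS' errors, list of other ERROR lines)."""
    per_server = Counter()
    other_errors = []
    for line in report.splitlines():
        # hbck prints progress dots that may end up in front of a message.
        line = line.strip().lstrip(".")
        match = MISSING_IN_HDFS.search(line)
        if match:
            per_server[match.group("server")] += 1
        elif line.startswith("ERROR:"):
            other_errors.append(line)
    return per_server, other_errors
```

Feeding it the saved output of "hbase hbck" (for example redirected to a 
file and read back as a string) gives one count per region server plus 
any other ERROR lines. If the counts simply match the number of regions 
each server carries, the message says nothing about a particular region.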


T.

On 20.12.2011 18:56, Todd Lipcon wrote:
> Hi Tomaz,
>
> What does "hbase hbck" report? Maybe you have a broken region of sorts?
>
> -Todd
>
> On Tue, Dec 20, 2011 at 9:45 AM, Tomaz Logar <tomaz.logar@tobonet.com> wrote:
>> Hello, everybody.
>>
>> I hit a strange snag in HBase today. I have a table with 48 regions spread
>> over 8 regionservers. It grows by about one region per day. At the moment it
>> holds about 6M small records (30-100 bytes each), 3.2G of Snappy-compressed
>> data on disk.
>>
>> What happened is that suddenly I can't scan over any previously inserted
>> data in just one table. Freshly put data seems to be ok:
>>
>> ---
>> hbase(main):035:0>  put 'table', "\x00TEST", "*:t", "TEST"
>> 0 row(s) in 0.0300 seconds
>>
>> hbase(main):041:0* scan 'table', {STARTROW=>"\x00TEST", LIMIT=>2}
>> ROW       COLUMN+CELL
>> \x00TEST  column=*:t, timestamp=1324392041600, value=TEST
>> ERROR: java.lang.RuntimeException:
>> org.apache.hadoop.hbase.regionserver.LeaseException:
>> org.apache.hadoop.hbase.regionserver.LeaseException: lease
>> '-1785731371547934030' does not exist
>> ---
>>
>> So the scan returns the record I put just before, but times out on the old
>> record that comes right after it. :(
>>
>> If I target an old record I don't even get an exception, just a huge
>> timeout, and there is no exception in the regionserver log either:
>> ---
>> hbase(main):049:0>  scan 'table', {STARTROW=>"0ua", LIMIT=>1}
>> ROW       COLUMN+CELL
>> 0 row(s) in 146.2210 seconds
>> ---
>>
>> It may be relevant that I'm getting these warnings on another, much bigger
>> (3T Snappy, 7+B records), yet working table:
>> ---
>> 11/12/20 17:50:37 WARN ipc.HBaseServer: IPC Server Responder, call
>> next(-15185895745499515, 1) from 192.168.32.192:64307: output error
>> 11/12/20 17:50:37 WARN ipc.HBaseServer: IPC Server handler 5 on 60020
>> caught: java.nio.channels.ClosedChannelException
>> 11/12/20 17:32:43 WARN snappy.LoadSnappy: Snappy native library is available
>> ---
>> But these scans seem to recover while map-reducing.
>>
>> I'm running hbase-0.90.4-cdh3u2 from Cloudera SCM bundle on mixed nodes (5 *
>> 2 core 4G RAM, 3 * 12 core 16G RAM) with 1.5G RAM allocated for each HBase
>> regionserver.
>>
>>
>> Can anyone share some wisdom? Has anyone solved a similar half-broken
>> table problem before?
>>
>>
>> Thanks,
>>
>> T.
>>
>>
>
>