hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jim Kellerman <...@powerset.com>
Subject RE: Problem caused by HBASE-617 - Blocks missing from hlog.dat
Date Wed, 21 May 2008 16:29:09 GMT
What is happening is that the region server is trying to open the meta, and failing because
there is an empty file somewhere in the region or in the recovery log. I would advise the
following:

1. shut down HBase (as cleanly as possible)
2. find and delete any zero length files
3. there may be empty, but not zero length, MapFiles. MapFiles are
   stored in a directory that looks like:
   <hbase-root>/<tablename>/<a number>/<columnname>/mapfiles/<a
long number>/

   In this directory there are two files: 'data' and 'index'. An empty
   MapFile (one that has been opened and closed with no data written to
   it) 'data' will have size 110 (bytes) and 'info' will have size 137.

   If you find one of these, you should delete the directory that contains
   them (in the example above <a long number>).


---
Jim Kellerman, Senior Engineer; Powerset


> -----Original Message-----
> From: Daniel Leffel [mailto:daniel.leffel@gmail.com]
> Sent: Wednesday, May 21, 2008 8:13 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Problem caused by HBASE-617 - Blocks missing
> from hlog.dat
>
> After letting the hbase attempt to assign regions all night
> long, I awoke to an unassigned meta region. Here is the master log:
>
> Desc: {name: .META., families: {info:={name: info, max versions: 1,
> compression: NONE, in memory: false, max length: 2147483647,
> bloom filter:
> none}}}
> 2008-05-21 11:10:29,560 DEBUG
> org.apache.hadoop.hbase.HMaster: Main processing loop:
> ProcessRegionClose of .META.,,1, true
> 2008-05-21 11:10:29,560 INFO org.apache.hadoop.hbase.HMaster:
> region closed:
> .META.,,1
> 2008-05-21 11:10:29,560 INFO org.apache.hadoop.hbase.HMaster: reassign
> region: .META.,,1
> 2008-05-21 11:10:29,729 INFO org.apache.hadoop.hbase.HMaster:
> assigning region .META.,,1 to server 10.252.242.159:60020
> 2008-05-21 11:10:32,740 DEBUG
> org.apache.hadoop.hbase.HMaster: Received
> MSG_REPORT_PROCESS_OPEN : .META.,,1 from 10.252.242.159:60020
> 2008-05-21 11:10:32,740 DEBUG
> org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE :
> .META.,,1 from 10.252.242.159:60020
> 2008-05-21 11:10:32,740 INFO org.apache.hadoop.hbase.HMaster:
> 10.252.242.159:60020 no longer serving regionname: .META.,,1,
> startKey: <>,
> endKey: <>, encodedName: 1028785192, tableDesc: {name:
> .META., families:
> {info:={name: info, max versions: 1, compression: NONE, in
> memory: false, max length: 2147483647, bloom filter: none}}}
> 2008-05-21 11:10:32,740 DEBUG
> org.apache.hadoop.hbase.HMaster: Main processing loop:
> ProcessRegionClose of .META.,,1, true
> 2008-05-21 11:10:32,741 INFO org.apache.hadoop.hbase.HMaster:
> region closed:
> .META.,,1
> 2008-05-21 11:10:32,741 INFO org.apache.hadoop.hbase.HMaster: reassign
> region: .META.,,1
> 2008-05-21 11:10:34,585 INFO org.apache.hadoop.hbase.HMaster:
> assigning region .META.,,1 to server 10.254.30.79:60020
> 2008-05-21 11:10:37,595 DEBUG
> org.apache.hadoop.hbase.HMaster: Received
> MSG_REPORT_PROCESS_OPEN : .META.,,1 from 10.254.30.79:60020
> 2008-05-21 11:10:37,596 DEBUG
> org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE :
> .META.,,1 from 10.254.30.79:60020
> 2008-05-21 11:10:37,596 INFO org.apache.hadoop.hbase.HMaster:
> 10.254.30.79:60020 no longer serving regionname: .META.,,1,
> startKey: <>,
> endKey: <>, encodedName: 1028785192, tableDesc: {name:
> .META., families:
> {info:={name: info, max versions: 1, compression: NONE, in
> memory: false, max length: 2147483647, bloom filter: none}}}
> 2008-05-21 11:10:37,596 DEBUG
> org.apache.hadoop.hbase.HMaster: Main processing loop:
> ProcessRegionClose of .META.,,1, true
>
>
> and here is the regin server side:
> 2008-05-21 11:12:38,027 INFO org.apache.hadoop.hbase.HRegionServer:
> MSG_REGION_OPEN : .META.,,1
> 2008-05-21 11:12:38,027 DEBUG
> org.apache.hadoop.hbase.HRegion: Opening region .META.,,1/1028785192
> 2008-05-21 11:12:38,035 DEBUG org.apache.hadoop.hbase.HStore:
> loaded
> /hbase/.META./1028785192/info/info/2509658022189995817,
> isReference=false
> 2008-05-21 11:12:38,036 DEBUG org.apache.hadoop.hbase.HStore:
> loaded
> /hbase/.META./1028785192/info/info/8183182393002383771,
> isReference=false
> 2008-05-21 11:12:38,049 DEBUG org.apache.hadoop.hbase.HStore: Loaded 2
> file(s) in hstore 1028785192/info, max sequence id 917793587
> 2008-05-21 11:12:38,116 ERROR
> org.apache.hadoop.hbase.HRegionServer: error opening region
> .META.,,1 java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>         at
> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1434)
>         at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j
> ava:1411)
>         at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j
> ava:1400)
>         at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j
> ava:1395)
>         at
> org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:263)
>         at
> org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:242)
>         at
> org.apache.hadoop.hbase.HStoreFile$HbaseMapFile$HbaseReader.<i
nit>(HStoreFile.java:554)
>         at
> org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Reader.<
init>(HStoreFile.java:609)
>         at
> org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:382)
>         at org.apache.hadoop.hbase.HStore.<init>(HStore.java:849)
>         at org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:431)
>         at
> org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer
> .java:1258)
>         at
> org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer
> .java:1204)
>         at java.lang.Thread.run(Thread.java:619)
>
>
>
> On Tue, May 20, 2008 at 4:19 PM, stack <stack@duboce.net> wrote:
>
> > Daniel Leffel wrote:
> >
> >> After experiencing a region server that would not exit
> (HBASE-617), I
> >> tried to bring back up hbase (after first having shut down and
> >> bringing back up DFS).
> >>
> >> There are around 370 regions. The first 250 were assigned
> to region
> >> servers within 5 minutes of startup. The rest of the
> regions took the
> >> better part of the day to become assigned to a region
> server. A quick
> >> inspection of the regionserver logs were showing messages like the
> >> following:
> >>
> >> 2008-05-20 18:33:46,964 DEBUG org.apache.hadoop.hbase.HMaster:
> >> Received MSG_REPORT_PROCESS_OPEN :
> categories,2864153,1211005494348
> >> from 10.254.26.31:60020
> >>
> >>
> >
> > These messages are sent over to the master by the regionserver as a
> > kind of ping saying "I'm still alive and working on
> whatever it was you gave me".
> >
> > Can you tell what was happening by looking in regionserver logs?
> >
> > Was it that all regions had been given to a single
> regionserver and it
> > was busy replaying edits before bringing the regions online
> (There is
> > a single worker thread per regionserver.  If lots of edits
> to replay,
> > can take seconds to minutes to bring on a region).
> >
> > Did the regions come online gradually or all in a lump?
> >
> >> After waiting for all the regions to be assigned (and an
> absence of
> >> the above message appearing in the log), I started a MapReduce job
> >> that iterates over all regions. Immediately, the above mentioned
> >> region began to show up in the logs again with the above
> message and
> >> the job failed with an IOException because it couldn't
> locate blocks.
> >>
> >> I ran fsck on /hbase and sure enough, blocks are missing from the
> >> following file (although it reports a size of 0 as what's
> missing - I
> >> presume it just doesn't know):
> >>
> >> /hbase/log_10.254.30.79_1211300015031_60020/hlog.dat.000
> >>
> >>
> > The above looks like the innocuous messages described in
> > https://issues.apache.org/jira/browse/HBASE-509.
> >
> > St.Ack
> >
> >
>
> No virus found in this incoming message.
> Checked by AVG.
> Version: 8.0.100 / Virus Database: 269.23.21/1456 - Release
> Date: 5/20/2008 6:45 AM
>
No virus found in this outgoing message.
Checked by AVG.
Version: 8.0.100 / Virus Database: 269.23.21/1456 - Release Date: 5/20/2008 6:45 AM

Mime
View raw message