hbase-user mailing list archives

From mike anderson <saidthero...@gmail.com>
Subject Re: table contents disappeared
Date Thu, 18 Jun 2009 18:35:19 GMT
0.19.3, hdfs, 10 nodes fully distributed.

Is there a way to rebuild what was lost (even partially)? Will this problem
be fixed in 0.20?


On Thu, Jun 18, 2009 at 1:51 PM, stack <stack@duboce.net> wrote:

> You are on what version of hbase?
>
> My guess is it's 0.19.x?
>
> How many nodes?  Are you using hdfs or the local fs?
>
> The log below doesn't show issues.
>
> So, as to what happened, I speculate that you loaded up your table and then
> there was some issue (did you up your file descriptors, xceivers, etc.?)
> that caused the hang, but the uploads, in particular the edits that included
> creation of your table and the addition of table regions, had not been
> persisted. The hung hbase and your kill -9 meant the catalog table edits
> were lost, so it appears your table is lost (HDFS does not have a working
> flush/sync/append in hadoop 0.19.x, so hbase can lose data). There is
> nothing else you can do when it won't respond, though you could try
> ./bin/hbase-daemon.sh stop regionserver on each of your regionservers to
> try and bring them down nicely.
>
> In the head of the 0.19 branch we've done stuff to make the window whereby
> we lose edits narrower (.META. flushes every few k or so).  I need to put
> up a 0.19.4 release candidate (I'm held up by my tracing a new issue here
> on our home cluster).
>
> St.Ack
>
> On Thu, Jun 18, 2009 at 9:10 AM, mike anderson <saidtherobot@gmail.com> wrote:
>
> > I had about 30,000 rows in my table 'cached_parsedtext'.  This morning when
> > I checked, HBase appeared to be down (the master server web UI was not
> > responding and the shell crashed when I tried to count rows). I tried doing
> > a nice shutdown via bin/stop-hbase, but this hung for about 20 minutes, so
> > I gave up and did a kill -9 on the hbase processes (what else was I supposed
> > to do!?). Upon restarting I discovered that all of the rows were gone. I
> > browsed the filesystem and saw that some of the metadata still existed in
> > hadoop dfs. Is there a way to rebuild the table? (After the force kill I
> > also did a nice restart of hbase and hadoop, with the same results.)
> >
> > A few of the relevant-looking log lines are included below for those that
> > speak the language. However, these don't really mean much to me.
> >
> > logs/hbase-pubget-master-carr.domain.com.log:2009-06-18 11:12:42,038 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: cached_parsedtext,,1244838542607: safeMode=false from 10.0.16.91:60020
> > logs/hbase-pubget-master-carr.domain.com.log:2009-06-18 11:12:42,038 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: cached_parsedtext,,1244838542607 open on 10.0.16.91:60020
> > logs/hbase-pubget-master-carr.domain.com.log:2009-06-18 11:12:42,039 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row cached_parsedtext,,1244838542607 in region .META.,,1 with startcode 1245337882941 and server 10.0.16.91:60020
> > logs/hbase-pubget-master-carr.domain.com.log:2009-06-18 11:31:31,595 INFO org.apache.hadoop.hbase.master.RegionManager: assigning region cached_parsedtext,,1244838542607 to the only server 10.0.16.91:60020
> > logs/hbase-pubget-master-carr.domain.com.log:2009-06-18 11:31:34,823 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: cached_parsedtext,,1244838542607: safeMode=false from 10.0.16.91:60020
> >
> > Ideally I'd love to get my table back, but if not, learning how to avoid
> > this in the future would be great.
> >
> >
> > Thanks in advance,
> > Mike
> >
>
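For anyone trying to read the master log lines quoted above, here is a minimal Python sketch that splits one grep-style line into file, timestamp, level, logger, and message. The regular expression and the names LINE_RE, parse_line, and SAMPLE are illustrative assumptions, not part of HBase. The sample is the first quoted entry with its wrapped pieces rejoined.

```python
import re
from datetime import datetime

# Assumed layout of one grep-style line:
#   "<file>:<timestamp> <LEVEL> <logger>: <message>"
# The "<file>:" prefix is optional, so plain log lines also match.
LINE_RE = re.compile(
    r"^(?:(?P<file>[^:\s]+):)?"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>[\w.$]+): "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # The log uses a comma before the milliseconds; %f accepts the 3 digits.
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return fields


# First entry from the quoted message, rejoined into a single line.
SAMPLE = (
    "logs/hbase-pubget-master-carr.domain.com.log:2009-06-18 11:12:42,038 INFO "
    "org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: "
    "cached_parsedtext,,1244838542607: safeMode=false from 10.0.16.91:60020"
)

if __name__ == "__main__":
    entry = parse_line(SAMPLE)
    # Print the timestamp, level, and short class name, then the message.
    print(entry["ts"], entry["level"], entry["logger"].rsplit(".", 1)[-1])
    print(entry["message"])
```

Every quoted entry is at INFO level, which fits the remark earlier in the thread that the log does not show issues. Filtering the parsed entries on level would be one way to confirm that over a whole log file.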
