hbase-dev mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: OfflineMetaRepair?
Date Fri, 06 Jan 2012 04:40:14 GMT
Jon,

My question was about "orphaned" data in HDFS in the first place. It looks like
either region splits or table deletes (or both) are not executed correctly, with old data
not being removed completely.

Our original issue was a .META. inconsistency (region holes) for one of our internal
system tables.
How it occurred is beyond my comprehension, so I can't say for sure what the reason was.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Jonathan Hsieh [jon@cloudera.com]
Sent: Thursday, January 05, 2012 5:58 PM
To: dev@hbase.apache.org
Subject: Re: OfflineMetaRepair?

Vlad,

If it is a deleted table, you can just delete those dirs out of hdfs
directly.

The workflow for this first cut of the tool is cautious and requires the
user to decide what to do with orphaned data and handle it manually.
Basically, at the time I had only encountered this kind of problem a few
times, so I didn't want the tool to delete any data; I wanted to push that
decision to the user.

The problem that triggered me to write this tool was a situation where 2300
meta rows were bad and 3 hdfs region dirs were missing .regioninfo files.
Manually repairing meta was out of the question.  The likely cause in that
situation was that the hdfs namenode died under hbase, and hbase got
confused during recovery.

Other cases where I've encountered similar problems generally have to do
with region splits that failed to complete successfully and then failed to
roll back properly.

Did you encounter any of these kinds of events that could have triggered
your problems?

FWIW, I'm in the process of debugging a new version (HBASE-5128) of the
tool that tries to automatically restore data while online.  Hopefully
this can repair bad region splits in a relatively painless manner.  The
test cases are passing now, and I'm testing against a real cluster that I'm
intentionally corrupting.  I should have a patch for 0.90.5 ready in a few
days (but there may be limitations).

Jon.

On Thu, Jan 5, 2012 at 5:37 PM, Vladimir Rodionov
<vrodionov@carrieriq.com>wrote:

> I cp'ed hdfs-site.xml into HBASE_CONF_DIR and was able to run the tool.
>
> The tool found a lot of abandoned regions:
>
> like this one:
>
> 12/01/06 01:18:15 ERROR util.HBaseFsck: Bailed out due to:
> org.apache.hadoop.hbase.util.HBaseFsck$RegionInfoLoadException: Unable to
> load region info for table TRIAL-DIMENSIONS-1324576713641!  It may be an
> invalid format or version file.  You may want to remove
> hdfs://us01-ciqps1-name01.carrieriq.com:9000/hbase/TRIAL-DIMENSIONS-1324576713641/ff6031e6472d10bac8517314179acb33
> region from hdfs and retry.
>        at org.apache.hadoop.hbase.util.HBaseFsck.loadTableInfo(HBaseFsck.java:292)
>        at org.apache.hadoop.hbase.util.HBaseFsck.rebuildMeta(HBaseFsck.java:402)
>        at org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair.main(OfflineMetaRepair.java:90)
>
> There are hundreds of such regions, literally.
>
> Region directories contain only .tmp subdir, like this one:
>
>
> /hbase/M2M-INTEGRATION-MM_ERRORS-1324575562966/fd480b2c39f7d3333308bf1d9a304510/.tmp
>
> No .regioninfo
>
> These dirs are leftovers of tables which have been deleted already, and
> they confuse this tool.  If we delete a table, shouldn't we wipe out the
> whole directory?
> Is there any scenario which can explain this?
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Todd Lipcon [todd@cloudera.com]
> Sent: Thursday, January 05, 2012 4:50 PM
> To: dev@hbase.apache.org
> Subject: Re: OfflineMetaRepair?
>
> Are you sure you have fs.default.name set properly to hdfs://yournn/
> in your hbase-site.xml?
>
> You shouldn't *have* to do this, but I bet it will fix the issue.
>
> -Todd
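[For reference, Todd's suggestion amounts to a fragment like the following in hbase-site.xml; the NameNode host and port here are placeholders, not values from this thread:]

```xml
<!-- hbase-site.xml: make the default FS explicit so offline tools
     resolve /hbase against HDFS rather than the local filesystem.
     Replace host/port with your own NameNode address. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:9000/</value>
</property>
```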
>
> On Thu, Jan 5, 2012 at 4:26 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> > Hey Vlad,
> >
> > I wrote the tool -- and I've used it to repair a fairly messed up META
> > table.  I must have used it on a local filesystem copy of META (just got
> > all the .regioninfo files in their directory paths), and then shipped the
> > repaired version of the .META. dir to the customer.
> >
> > This is definitely a bug.  File the JIRA and I'll try to fix it in the
> > next few days.
> >
> > Jon.
> >
> > On Thu, Jan 5, 2012 at 4:16 PM, Vladimir Rodionov
> > <vrodionov@carrieriq.com>wrote:
> >
> >> Ted,
> >>
> >> "fs.default.name" is a standard config property name which is described
> >> here:
> >> http://hadoop.apache.org/common/docs/current/core-default.html
> >>
> >> It is not CDH-specific. If you are right, then this tool has never been
> >> tested.
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: vrodionov@carrieriq.com
> >>
> >> ________________________________________
> >> From: Ted Yu [yuzhihong@gmail.com]
> >> Sent: Thursday, January 05, 2012 4:06 PM
> >> To: dev@hbase.apache.org
> >> Subject: Re: OfflineMetaRepair?
> >>
> >> Vlad:
> >> In the future, please drop unrelated discussion from the bottom of your
> >> email.
> >>
> >> I think what you saw was caused by the FS default name not being set
> >> correctly.
> >> In hbck:
> >>        conf.set("fs.defaultFS", conf.get(HConstants.HBASE_DIR));
> >> But cdh3 uses:
> >>    conf.set("fs.default.name", "hdfs://localhost:0");
> >> ./src/test/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java
> >>
> >> You can try adding the following line after line 77 of
> >> OfflineMetaRepair.java:
> >>    conf.set("fs.default.name", path);
> >> and rebuilding hbase 0.90.6 (tip of 0.92 branch)
> >>
> >> If the above works, please file a JIRA.
> >>
> >> Thanks
> >>
> >> On Thu, Jan 5, 2012 at 3:30 PM, Vladimir Rodionov
> >> <vrodionov@carrieriq.com>wrote:
> >>
> >> > 0.90.5
> >> >
> >> > I am trying to repair .META. table using this tool
> >> >
> >> > 1.  HBase cluster was shutdown
> >> >
> >> > Then I ran:
> >> >
> >> > 2. [name01 bin]$ hbase
> >> org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
> >> > -base hdfs://us01-ciqps1-name01.carrieriq.com:9000/hbase -details
> >> >
> >> >
> >> > This is what I got:
> >> >
> >> > 12/01/05 23:23:15 INFO util.HBaseFsck: Loading HBase regioninfo from
> >> > HDFS...
> >> > 12/01/05 23:23:30 ERROR util.HBaseFsck: Bailed out due to:
> >> > java.lang.IllegalArgumentException: Wrong FS:
> >> > hdfs://us01-ciqps1-name01.carrieriq.com:9000/hbase/M2M-INTEGRATION-MM_TION-1325190318714/0003d2ede27668737e192d8430dbe5d0/.regioninfo,
> >> > expected: file:///
> >> >        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:352)
> >> >        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
> >> >        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:368)
> >> >        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> >> >        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
> >> >        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:284)
> >> >        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:398)
> >> >        at org.apache.hadoop.hbase.util.HBaseFsck.loadMetaEntry(HBaseFsck.java:256)
> >> >        at org.apache.hadoop.hbase.util.HBaseFsck.loadTableInfo(HBaseFsck.java:284)
> >> >        at org.apache.hadoop.hbase.util.HBaseFsck.rebuildMeta(HBaseFsck.java:402)
> >> >        at org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair.main(OfflineMetaRepair.java:90)
> >> >
> >> >
> >> > Q: What am I doing wrong?
> >> >
> >> > Best regards,
> >> > Vladimir Rodionov
> >> > Principal Platform Engineer
> >> > Carrier IQ, www.carrieriq.com
> >> > e-mail: vrodionov@carrieriq.com
> >> >
> >> >
> >>
> >> Confidentiality Notice:  The information contained in this message,
> >> including any attachments hereto, may be confidential and is intended
> to be
> >> read only by the individual or entity to whom this message is
> addressed. If
> >> the reader of this message is not the intended recipient or an agent or
> >> designee of the intended recipient, please note that any review, use,
> >> disclosure or distribution of this message or its attachments, in any
> form,
> >> is strictly prohibited.  If you have received this message in error,
> please
> >> immediately notify the sender and/or Notifications@carrieriq.com and
> >> delete or destroy any copy of this message and its attachments.
> >>
> >
> >
> >
> > --
> > // Jonathan Hsieh (shay)
> > // Software Engineer, Cloudera
> > // jon@cloudera.com
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>



--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

