hbase-dev mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Fully qualified path names in distributed log splitting.
Date Tue, 05 Feb 2013 07:48:11 GMT
Ah yeah, I pushed this from 0.94.5 to 0.94.6 myself :)  Serves me right.


 From: Elliott Clark <eclark@apache.org>
To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <larsh@apache.org>

Sent: Monday, February 4, 2013 11:39 PM
Subject: Re: Fully qualified path names in distributed log splitting.
HBASE-7723 <https://issues.apache.org/jira/browse/HBASE-7723> attempts to
fix this.  The issue arises when moving from a standard NameNode to HA and back.

On Mon, Feb 4, 2013 at 11:32 PM, lars hofhansl <larsh@apache.org> wrote:

> We just found ourselves in an interesting pickle.
> We were upgrading one of our clusters from HBase 0.94.0 on Hadoop 1.0.4 to
> HBase 0.94.4 on top of Hadoop 2.
> The cluster was set up a while ago, and the old shutdown script had a
> bug and shut down HBase and HDFS uncleanly.
> Assuming that the logs would be replayed, we upgraded Hadoop to 2.0.x and
> verified that from a file system view everything is OK.
> The new HDFS runs with an HA NameNode, so the FS changed from hdfs://<old
> host name> to hdfs://<ha cluster name>
> Then we brought up HBase and found it stuck in splitting logs forever.
> In the log we see messages like these:
> 2013-02-05 06:22:31,045 ERROR org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
> java.lang.IllegalArgumentException: Wrong FS: hdfs://<old NN host>/.logs/<rs host>,60020,1358540589323-splitting/<rs host>%2C60020%2C1358540589323.1359962644861, expected: hdfs://<ha cluster name>
>         at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:547)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:169)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:783)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:111)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:264)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:195)
>         at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:163)
>         at java.lang.Thread.run(Thread.java:662)
> So it looks like distributed log splitting stores the full HDFS path name
> including the host, which seems unnecessary.
> This path is stored in ZK.
> So all in all, it seems this can only happen if all of the following are true:
> unclean shutdown, keeping the same ZK ensemble, changed FS.
> The data is not important, we can just blow it away, but we want to prove
> that we could recover the data if we had to.
> It seems we have three options:
> 1. Blow away the data in ZK under "splitlog", and restart HBase. It should
> restart the split process with the correct pathnames.
> 2. Temporarily change the config for the region server to set the root dir
> to hdfs://<old NN host>, bounce HBase. The log splitting should now be able
> to succeed.
> 3. Downgrade back to the old Hadoop (we kept a copy of the image).
> We're trying option #2, to see whether that would fix it. #1 should work
> too.
> Has anybody else experienced this?
> It seems that would also limit our ability to take a snapshot of a
> filesystem and move it to somewhere else, as the hostnames are hardcoded,
> at least in ZK for log splitting.
> -- Lars
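
The failure described above comes down to a stored path whose scheme and authority no longer match the running filesystem. The following is a minimal sketch of that idea, not the HBASE-7723 patch and not Hadoop's actual checkPath logic (which handles more cases, such as default ports). It uses only java.net.URI; the class name, method names, and the example host names (old-nn.example.com, ha-cluster, rs1.example.com) are all hypothetical.

```java
import java.net.URI;

// Illustrative only: shows why a fully qualified path recorded under the old
// NameNode is rejected after the move to an HA nameservice, and how dropping
// the scheme and authority makes the same path usable again.
final class SplitLogPathSketch {

    // Simplified stand-in for a "Wrong FS" check: a qualified path is accepted
    // only if its scheme and authority match those of the filesystem root.
    public static boolean sameFileSystem(URI path, URI fsRoot) {
        if (path.getScheme() == null) {
            return true; // an unqualified path belongs to whatever FS resolves it
        }
        String pathAuth = path.getAuthority() == null ? "" : path.getAuthority();
        String rootAuth = fsRoot.getAuthority() == null ? "" : fsRoot.getAuthority();
        return path.getScheme().equalsIgnoreCase(fsRoot.getScheme())
                && pathAuth.equalsIgnoreCase(rootAuth);
    }

    // Keep only the path component of the stored URI and qualify it against
    // the current filesystem root. getRawPath() is used so that an escaped
    // sequence such as %2C in a log file name is preserved byte for byte.
    public static URI requalify(URI stored, URI fsRoot) {
        return URI.create(fsRoot.getScheme() + "://" + fsRoot.getAuthority()
                + stored.getRawPath());
    }

    public static void main(String[] args) {
        URI stored = URI.create("hdfs://old-nn.example.com:8020/.logs/"
                + "rs1.example.com,60020,1358540589323-splitting/"
                + "rs1.example.com%2C60020%2C1358540589323.1359962644861");
        URI currentFs = URI.create("hdfs://ha-cluster");

        System.out.println("stored path accepted:      "
                + sameFileSystem(stored, currentFs));
        URI fixed = requalify(stored, currentFs);
        System.out.println("requalified path:          " + fixed);
        System.out.println("requalified path accepted: "
                + sameFileSystem(fixed, currentFs));
    }
}
```

Storing only the path component in the first place would sidestep the problem, which is the point made above about the host name in the stored path being unnecessary.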