hbase-user mailing list archives

From Jesse Yates <jesse.k.ya...@gmail.com>
Subject Re: CleanerChore exception
Date Mon, 31 Dec 2012 00:13:45 GMT
Hey,

So the point of all the delete code in the cleaner is to try to delete
each of the files in the directory and then delete the directory,
assuming it's empty - it shouldn't leak the IOException if the directory
is found to be empty and then gets a file added.
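
In rough Java, the intended flow looks something like this (paraphrased
from memory, not the exact source; fs is the chore's FileSystem and
checkAndDelete is the per-file check):

    private boolean checkAndDeleteDirectory(Path toCheck) throws IOException {
      // FSUtils.listStatus returns null for an empty directory,
      // not an empty array
      FileStatus[] children = FSUtils.listStatus(fs, toCheck, null);
      if (children == null) return true; // the empty-dir case discussed below
      boolean canDeleteThis = true;
      for (FileStatus child : children) {
        if (child.isDir()) {
          // recurse into subdirectories first
          if (!checkAndDeleteDirectory(child.getPath())) canDeleteThis = false;
        } else if (!checkAndDelete(child.getPath())) {
          // an undeletable file (e.g. TTL not expired) blocks the parent
          canDeleteThis = false;
        }
      }
      // delete non-recursively so we never remove files added since we listed
      return canDeleteThis ? fs.delete(toCheck, false) : false;
    }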

This is really odd though, as failures should return false, not throw an
exception (boo HDFS javadocs). Looking at the 0.94 and 0.96 code, it's
just logged, which is annoying, but doesn't mean broken code.

Otherwise, Jean-Marc's analysis looks right. Should be a simple fix. I
filed HBASE-7465 and should have a patch up shortly.

As an aside, this method is actually tested (in a somewhat roundabout way)
in TestCleanerChore#testCleanerDoesNotDeleteDirectoryWithLateAddedFiles
with a spy object that ensures we get this non-error case.
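
The spy trick looks roughly like this (simplified, with placeholder names;
see the actual test for the real assertions):

    FileSystem spy = Mockito.spy(fs);
    Mockito.doAnswer(new Answer<FileStatus[]>() {
      @Override
      public FileStatus[] answer(InvocationOnMock invocation) throws Throwable {
        // do the real listing, then sneak a file into the directory
        // before the chore gets a chance to delete it
        FileStatus[] children = (FileStatus[]) invocation.callRealMethod();
        fs.create(new Path(dir, "late-file")).close();
        return children;
      }
    }).when(spy).listStatus(Mockito.any(Path.class));
    // the chore should then return false for the directory, not throw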

-Jesse
-------------------
Jesse Yates
@jesse_yates
jyates.github.com


On Sun, Dec 30, 2012 at 11:50 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> The Javadoc is saying:
>
> "@return <tt>true</tt> if the directory was deleted, <tt>false</tt>
> otherwise"
>
> So I think the line "return canDeleteThis ? fs.delete(toCheck, false)
> : false;" is still correct. It's retuning false if the directory has
> not been deleted.
>
> There is no exception here. If the TTL for a file has not expired, the
> file can't be deleted and false is returned. I think it's correct
> behaviour.
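>
> The TTL check itself is basically just this (a sketch, not the exact
> cleaner source):
>
>     // deletable only once the file is older than the configured TTL
>     long life = System.currentTimeMillis() - file.getModificationTime();
>     return life > ttl;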
>
> The idea of not passing "true" for the recursive flag is explained in
> the comments:
>     // if all the children have been deleted, then we should try to
>     // delete this directory. However, don't do so recursively so we
>     // don't delete files that have been added since we checked.
> And I think it's good. So the issue is really when the directory is
> empty and listStatus sends back null. Then "if (children == null)
> return true;" simply returns true without deleting the current
> directory.
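>
> For reference, the fragment in question looks roughly like this (my
> paraphrase, not a verbatim copy of the source):
>
>     FileStatus[] children = FSUtils.listStatus(fs, toCheck, null);
>     // BUG: for an empty directory, children is null, so we report the
>     // directory as deleted without ever calling fs.delete()
>     if (children == null) return true;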
>
> This should be changed to something like
> if (children == null) return fs.delete(toCheck, false);
> which will try to delete the current directory, return true or false
> depending on whether it succeeded, and throw an exception if there is
> any issue with the FS...
>
> I have done some modifications. I'm compiling and will deploy the
> updated version on my local cluster soon. I will keep you posted on
> the result.
>
> JM
>
> 2012/12/30, Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
> > Thanks for the confirmation.
> >
> > Also, it seems that there is no test class related to
> > checkAndDeleteDirectory. It might be good to add that too.
> >
> > I have extracted 0.94.3, 0.94.4RC0 and the trunk, and they are all
> > identical for this method.
> >
> > I will try to do some modifications and see the results...
> >
> > So far there are 2 options. One is to change the "return true" on the
> > null-children case to handle the current empty directory, and the
> > other one is to call fs.delete() directly from checkAndDeleteDirectory
> > instead of the existing code.
> >
> > Will wait for Jesse's feedback.
> >
> > JM
> >
> > 2012/12/30, Ted Yu <yuzhihong@gmail.com>:
> >> Thanks for the digging. This concurs with my suspicion from the beginning.
> >>
> >> I am copying Jesse who wrote the code. He should have more insight on
> >> this.
> >>
> >> After his confirmation, you can log a JIRA.
> >>
> >> Cheers
> >>
> >> On Sun, Dec 30, 2012 at 10:59 AM, Jean-Marc Spaggiari <
> >> jean-marc@spaggiari.org> wrote:
> >>
> >>> So, looking deeper I found a few things.
> >>>
> >>> First, why is checkAndDeleteDirectory not "simply" calling
> >>> FSUtils.delete(fs, toCheck, true)? I guess it's doing the same thing?
> >>>
> >>> Also, FSUtils.listStatus(fs, toCheck, null); will return null if there
> >>> is no status, not just an empty array. And since it returns null, we
> >>> will exit without calling the delete method.
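> >>>
> >>> From my reading, FSUtils.listStatus does roughly this (paraphrased,
> >>> treat it as a sketch):
> >>>
> >>>     FileStatus[] status = fs.listStatus(dir); // empty array if no children
> >>>     // HBase normalizes "nothing there" to null rather than an empty array
> >>>     if (status == null || status.length < 1) return null;
> >>>     return status;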
> >>>
> >>> I tried to manually create a file in one of those directories. The
> >>> exception disappears for 300 seconds because of the TTL for the newly
> >>> created file. After 300 seconds, the file I pushed AND the directory
> >>> got removed. So the issue is really with empty directories.
> >>>
> >>> I will take a look at what is in the trunk and in 0.94.4 to see if
> >>> it's the same issue. But I think we can simply change all this code to
> >>> a call to FSUtils.delete.
> >>>
> >>> I can open a JIRA and submit a patch for that. Just let me know.
> >>>
> >>> JM
> >>>
> >>> 2012/12/30, Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
> >>> > Regarding the logcleaner settings, I have not changed anything. It's
> >>> > what came with the initial install. So I don't have anything set up
> >>> > for this plugin in my configuration files.
> >>> >
> >>> > For the files on the FS, here is what I have:
> >>> > hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -ls
> >>> > /hbase/.archive/entry_duplicate
> >>> > Found 30 items
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/00c185bc44b6dcf85a90b83bdda4ec2e
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/0ddf0d1802c6afd97d032fd09ea9e37d
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/18cf7c5c946ddf33e49b227feedfb688
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/2353f10e79dacc5cf201be6a1eb63607
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:38
> >>> > /hbase/.archive/entry_duplicate/243f4007cf05415062010a5650598bff
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:38
> >>> > /hbase/.archive/entry_duplicate/287682333698e36cea1670f5479fbf18
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/3742da9bd798342e638e1ce341f27537
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:38
> >>> > /hbase/.archive/entry_duplicate/435c9c08bc08ed7248a013b6ffaa163b
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/45346b4b4248d77d45e031ea71a1fb63
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/4afe48fe6d8defe569f8632dd2514b07
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/68a4e364fe791a0d1f47febbb41e8112
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/7673d718962535c7b54cef51830f22a5
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:38
> >>> > /hbase/.archive/entry_duplicate/7df6845ae9d052f4eae4a01e39313d61
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/8c5a263167d1b09f645af8efb4545554
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/8c98d9c635ba30d467d127a2ec1c69f8
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/8dfa96393e18ecca826fd9200e6bf68b
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/8e8f532e91a7197cd53b7626130be698
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/8eca1a325fe442a8546e43ac2f00cfef
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/9ad4c0551b90ea7717d7e3aaec76dc26
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/a135ccbc6f61ce544dbd537dc12489e9
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/a3d0332a6d51a8b15b99d1caca3f355a
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/bd2b8c942af27e541e20e430d506d2c0
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/c10c3a66948bde75fc41349108d86cf9
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:38
> >>> > /hbase/.archive/entry_duplicate/cbf2f178691bfca8a7e9825115629b8e
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/d14a2546eaceede73b282e444ad1bb40
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:38
> >>> > /hbase/.archive/entry_duplicate/d570a21a39e04ba2ec896bbe7166423c
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/e943bda56acd6beb35bdd56f0560f87f
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/ef5692ba83aba48d9e7a6b9c2cd0661e
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/fd85dd319c289959a790faed32ef1530
> >>> > drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39
> >>> > /hbase/.archive/entry_duplicate/ffcdf6554accda1800e74838b67d3004
> >>> > hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -ls
> >>> > /hbase/.archive/entry_duplicate/00c185bc44b6dcf85a90b83bdda4ec2e
> >>> > hadoop@node3:~/hadoop-1.0.3$
> >>> >
> >>> > I have not looked into ALL the subdirectories, but the first 10 are
> >>> > empty.
> >>> >
> >>> > I see that there are some traces in checkAndDeleteDirectory... I will
> >>> > try to activate that and see if there are more details.
> >>> >
> >>> >
> >>> > JM
> >>> >
> >>> > 2012/12/30, Ted Yu <yuzhihong@gmail.com>:
> >>> >> The exception came from this line:
> >>> >>           if (file.isDir()) checkAndDeleteDirectory(file.getPath());
> >>> >> Looking at checkAndDeleteDirectory(), it recursively deletes files
> >>> >> and directories under the specified path.
> >>> >>
> >>> >> Does /hbase/.archive/entry_duplicate only contain empty directories
> >>> >> underneath it?
> >>> >>
> >>> >> You didn't modify the logcleaner plugin setting, right?
> >>> >>   <property>
> >>> >>     <name>hbase.master.logcleaner.plugins</name>
> >>> >>     <value>org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner</value>
> >>> >>   </property>
> >>> >>
> >>> >> Cheers
> >>> >>
> >>> >> On Sun, Dec 30, 2012 at 9:53 AM, Jean-Marc Spaggiari <
> >>> >> jean-marc@spaggiari.org> wrote:
> >>> >>
> >>> >>> I was going to move to 0.94.4 today ;) And yes, I'm using 0.94.3.
> >>> >>> I might wait a bit in case some testing is required with my version.
> >>> >>>
> >>> >>> Is this what you are looking for? http://pastebin.com/N8Q0FMba
> >>> >>>
> >>> >>> I will keep the files for now since it seems it's not causing any
> >>> >>> major issue. That will allow some more testing if required.
> >>> >>>
> >>> >>> JM
> >>> >>>
> >>> >>>
> >>> >>> 2012/12/30, Ted Yu <yuzhihong@gmail.com>:
> >>> >>> > Looks like you're using 0.94.3
> >>> >>> >
> >>> >>> > The archiver is a backport of:
> >>> >>> > HBASE-5547, Don't delete HFiles in backup mode
> >>> >>> >
> >>> >>> > Can you provide more of the log where the IOE was reported,
> >>> >>> > using pastebin?
> >>> >>> >
> >>> >>> > Thanks
> >>> >>> >
> >>> >>> > On Sun, Dec 30, 2012 at 9:08 AM, Jean-Marc Spaggiari <
> >>> >>> > jean-marc@spaggiari.org> wrote:
> >>> >>> >
> >>> >>> >> Hi,
> >>> >>> >>
> >>> >>> >> I have an "IOException: /hbase/.archive/table_name is non empty"
> >>> >>> >> exception every minute in my logs.
> >>> >>> >>
> >>> >>> >> There are 30 directories under this directory. The main directory
> >>> >>> >> is from yesterday, but all subdirectories are from December 10th,
> >>> >>> >> all at the same time.
> >>> >>> >>
> >>> >>> >> What is this .archive directory used for, and what should I do?
> >>> >>> >>
> >>> >>> >> Thanks,
> >>> >>> >>
> >>> >>> >> JM
> >>> >>> >>
> >>> >>> >
> >>> >>>
> >>> >>
> >>> >
> >>>
> >>
> >
>
