hbase-user mailing list archives

From Ted <yuzhih...@gmail.com>
Subject Re: CleanerChore exception
Date Mon, 31 Dec 2012 00:29:01 GMT
Jean-Marc:
Can you confirm that the Jira Jesse logged reflects your case?

Thanks

On Dec 30, 2012, at 4:13 PM, Jesse Yates <jesse.k.yates@gmail.com> wrote:

> Hey,
> 
> So the point of all the delete code in the cleaner is to try to delete
> each of the files in the directory and then delete the directory, assuming
> it's empty. It shouldn't leak the IOException if the directory is found
> to be empty and then gets a file added.
> 
> This is really odd though, as failures should return false, not throw an
> exception (boo HDFS javadocs). Looking at the 0.94 and 0.96 code, it's
> just logged, which is annoying, but doesn't mean broken code.
> 
> Otherwise, Jean-Marc's analysis looks right. Should be a simple fix. I
> filed HBASE-7465 and should have a patch up shortly.
> 
> As an aside, this method is actually tested (in a somewhat roundabout way)
> in TestCleanerChore#testCleanerDoesNotDeleteDirectoryWithLateAddedFiles
> with a spy object that ensures we get this non-error case.
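A minimal sketch of the property that test relies on, using java.nio.file on a local filesystem as a stand-in for HDFS (an assumption; the real cleaner goes through Hadoop's FileSystem): a non-recursive delete of a directory that gained a file after it was checked fails instead of removing the new file. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class LateAddedFileDemo {

    // Hypothetical helper: non-recursive delete. It only succeeds if the directory
    // is empty at the moment of the call; otherwise it reports false.
    static boolean deleteIfEmpty(Path dir) throws IOException {
        try {
            Files.delete(dir);
            return true;
        } catch (DirectoryNotEmptyException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("archive-region");

        // The cleaner lists the directory and finds it empty...
        boolean emptyAtCheck;
        try (Stream<Path> entries = Files.list(dir)) {
            emptyAtCheck = !entries.findAny().isPresent();
        }

        // ...but a file is archived into it before the delete is issued.
        Path late = Files.createFile(dir.resolve("late-hfile"));

        boolean deleted = deleteIfEmpty(dir);
        System.out.println("emptyAtCheck=" + emptyAtCheck
                + " deleted=" + deleted
                + " lateFileSurvives=" + Files.exists(late));

        // Remove the temporary files created by this demo.
        Files.delete(late);
        Files.delete(dir);
    }
}
```

Run as-is, it should print `emptyAtCheck=true deleted=false lateFileSurvives=true`.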
> 
> -Jesse
> -------------------
> Jesse Yates
> @jesse_yates
> jyates.github.com
> 
> 
> On Sun, Dec 30, 2012 at 11:50 AM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
> 
>> The Javadoc is saying:
>> 
>> "@return <tt>true</tt> if the directory was deleted, <tt>false</tt>
>> otherwise"
>> 
>> So I think the line "return canDeleteThis ? fs.delete(toCheck, false)
>> : false;" is still correct. It's returning false if the directory has
>> not been deleted.
>> 
>> There is no exception here. If the TTL for a file has not expired, the
>> file can't be deleted and false is returned. I think that's the correct
>> behaviour.
>> 
>> The idea of not passing "true" for recursive deletion is explained in the
>> comments:
>>    // if all the children have been deleted, then we should try to delete this directory. However,
>>    // don't do so recursively so we don't delete files that have been added since we checked.
>> And I think it's good. So the issue is really when the directory is
>> empty and listStatus returns null. Then "if (children == null)
>> return true;" simply returns true without deleting the current
>> directory.
>> 
>> This should be changed to something like
>> if (children == null) return fs.delete(toCheck, false);
>> which will try to delete the current directory, return true or false
>> depending on whether that succeeded, and throw an exception if there is
>> any issue with the FS...
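Under a local-filesystem stand-in (java.nio.file instead of Hadoop's FileSystem, so this is not the HBase code), the whole method with the proposed change might look roughly like this. Here `deletable` is a hypothetical predicate standing in for the cleaner delegates such as the TTL check, and an empty listing triggers a non-recursive delete instead of an early `return true`.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Stream;

public class CheckAndDeleteSketch {

    /** Returns true if dir was deleted. deletable is a hypothetical stand-in for the cleaner delegates. */
    static boolean checkAndDeleteDirectory(Path dir, Predicate<Path> deletable) throws IOException {
        List<Path> children = new ArrayList<>();
        try (Stream<Path> s = Files.list(dir)) {
            s.forEach(children::add);
        }
        // Proposed fix: an empty directory is deleted (non-recursively) instead of
        // being reported as done, mirroring "if (children == null) return fs.delete(toCheck, false);".
        if (children.isEmpty()) {
            return deleteNonRecursive(dir);
        }
        boolean canDeleteThis = true;
        for (Path child : children) {
            if (Files.isDirectory(child)) {
                // Non-short-circuit &= so every subdirectory is still visited.
                canDeleteThis &= checkAndDeleteDirectory(child, deletable);
            } else if (deletable.test(child)) {
                Files.delete(child);
            } else {
                canDeleteThis = false;
            }
        }
        // Non-recursive: fails instead of removing files added since the listing.
        return canDeleteThis && deleteNonRecursive(dir);
    }

    static boolean deleteNonRecursive(Path dir) throws IOException {
        try {
            Files.delete(dir);
            return true;
        } catch (DirectoryNotEmptyException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Mimic the layout from this thread: a table dir holding an empty region dir.
        Path table = Files.createTempDirectory("entry_duplicate");
        Files.createDirectory(table.resolve("region-with-no-files"));

        boolean deleted = checkAndDeleteDirectory(table, p -> true);
        System.out.println("deleted=" + deleted + " exists=" + Files.exists(table));
    }
}
```

With the early `return true` instead, the empty region directory would survive and the final non-recursive delete of the table directory would fail on every run.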
>> 
>> I have done some modifications. I'm compiling and will deploy the
>> updated version on my local cluster soon. I will keep you posted on
>> the result.
>> 
>> JM
>> 
>> 2012/12/30, Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
>>> Thanks for the confirmation.
>>> 
>>> Also, it seems there is no test class covering
>>> checkAndDeleteDirectory. It might be good to add that too.
>>> 
>>> I have extracted 0.94.3, 0.94.4RC0 and trunk, and they are all
>>> identical for this method.
>>> 
>>> I will try to do some modifications and see the results...
>>> 
>>> So far there are two options. One is to change the "return null"
>>> case to handle the current empty directory, and the other is to call
>>> fs.delete() directly from checkAndDeleteDirectory instead of the
>>> existing code.
>>> 
>>> Will wait for Jesse's feedback.
>>> 
>>> JM
>>> 
>>> 2012/12/30, Ted Yu <yuzhihong@gmail.com>:
>>>> Thanks for the digging. This concurs with my suspicion from the beginning.
>>>> 
>>>> I am copying Jesse who wrote the code. He should have more insight on
>>>> this.
>>>> 
>>>> After his confirmation, you can log a JIRA.
>>>> 
>>>> Cheers
>>>> 
>>>> On Sun, Dec 30, 2012 at 10:59 AM, Jean-Marc Spaggiari <
>>>> jean-marc@spaggiari.org> wrote:
>>>> 
>>>>> So, looking deeper, I found a few things.
>>>>> 
>>>>> First, why isn't checkAndDeleteDirectory "simply" calling
>>>>> FSUtils.delete(fs, toCheck, true)? I guess it's doing the same thing?
>>>>> 
>>>>> Also, FSUtils.listStatus(fs, toCheck, null) will return null if there
>>>>> is no status, not just an empty array. And when it returns null, we
>>>>> exit without calling the delete method.
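A small local reproduction of that symptom, again with java.nio.file standing in for HDFS (an assumption): if an empty child directory is reported as handled but never deleted, the parent's non-recursive delete fails with DirectoryNotEmptyException, the local analogue of the "is non empty" IOException. The method name and the boolean switch are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyChildLeakDemo {

    /**
     * Builds parent/child with an empty child, optionally deletes the child first,
     * then attempts a non-recursive delete of the parent. Returns true on success.
     */
    static boolean parentDeleteSucceeds(boolean deleteEmptyChild) throws IOException {
        Path parent = Files.createTempDirectory("archive-table");
        Path child = Files.createDirectory(parent.resolve("empty-region"));

        // Early-exit behaviour: the child's listing is null, so it is reported as
        // done (true) but left in place. The proposed fix deletes it first.
        if (deleteEmptyChild) {
            Files.delete(child);
        }
        try {
            Files.delete(parent); // non-recursive, like fs.delete(toCheck, false)
            return true;
        } catch (DirectoryNotEmptyException e) {
            // Local analogue of "... is non empty"; clean up and report failure.
            Files.delete(child);
            Files.delete(parent);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("early exit: " + parentDeleteSucceeds(false)); // false
        System.out.println("with fix:   " + parentDeleteSucceeds(true));  // true
    }
}
```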
>>>>> 
>>>>> I tried to manually create a file in one of those directories. The
>>>>> exception disappears for 300 seconds because of the TTL for the newly
>>>>> created file. After 300 seconds, the file I pushed AND the directory
>>>>> got removed. So the issue is really with empty directories.
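The behaviour described here is consistent with an age-versus-TTL check. A minimal sketch, assuming the 300-second TTL observed in this thread and a strict "older than TTL" comparison; `isExpired` is a hypothetical helper, not the HBase cleaner's actual method:

```java
public class TtlCheckSketch {

    /** Hypothetical helper: a file is deletable once its age exceeds the TTL. */
    static boolean isExpired(long modificationTimeMs, long nowMs, long ttlMs) {
        long ageMs = nowMs - modificationTimeMs;
        return ageMs > ttlMs;
    }

    public static void main(String[] args) {
        long ttlMs = 300_000L;       // 300 seconds, as observed in this thread
        long createdMs = 1_000_000L; // arbitrary example timestamp
        System.out.println(isExpired(createdMs, createdMs + 299_000L, ttlMs)); // false
        System.out.println(isExpired(createdMs, createdMs + 301_000L, ttlMs)); // true
    }
}
```

Until the file is older than the TTL it is kept, so its directory is not empty and is left alone; once it expires, both the file and the now-empty directory can be removed, which matches what was seen.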
>>>>> 
>>>>> I will take a look at what is in trunk and in 0.94.4 to see if
>>>>> it's the same issue. But I think we can simply replace all this code
>>>>> with a call to FSUtils.delete.
>>>>> 
>>>>> I can open a JIRA and submit a patch for that. Just let me know.
>>>>> 
>>>>> JM
>>>>> 
>>>>> 2012/12/30, Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
>>>>>> Regarding the logcleaner settings, I have not changed anything. It's
>>>>>> what came with the initial install, so I don't have anything set up
>>>>>> for this plugin in my configuration files.
>>>>>> 
>>>>>> For the files on the FS, here is what I have:
>>>>>> hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -ls /hbase/.archive/entry_duplicate
>>>>>> Found 30 items
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/00c185bc44b6dcf85a90b83bdda4ec2e
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/0ddf0d1802c6afd97d032fd09ea9e37d
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/18cf7c5c946ddf33e49b227feedfb688
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/2353f10e79dacc5cf201be6a1eb63607
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:38 /hbase/.archive/entry_duplicate/243f4007cf05415062010a5650598bff
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:38 /hbase/.archive/entry_duplicate/287682333698e36cea1670f5479fbf18
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/3742da9bd798342e638e1ce341f27537
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:38 /hbase/.archive/entry_duplicate/435c9c08bc08ed7248a013b6ffaa163b
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/45346b4b4248d77d45e031ea71a1fb63
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/4afe48fe6d8defe569f8632dd2514b07
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/68a4e364fe791a0d1f47febbb41e8112
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/7673d718962535c7b54cef51830f22a5
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:38 /hbase/.archive/entry_duplicate/7df6845ae9d052f4eae4a01e39313d61
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/8c5a263167d1b09f645af8efb4545554
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/8c98d9c635ba30d467d127a2ec1c69f8
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/8dfa96393e18ecca826fd9200e6bf68b
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/8e8f532e91a7197cd53b7626130be698
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/8eca1a325fe442a8546e43ac2f00cfef
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/9ad4c0551b90ea7717d7e3aaec76dc26
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/a135ccbc6f61ce544dbd537dc12489e9
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/a3d0332a6d51a8b15b99d1caca3f355a
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/bd2b8c942af27e541e20e430d506d2c0
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/c10c3a66948bde75fc41349108d86cf9
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:38 /hbase/.archive/entry_duplicate/cbf2f178691bfca8a7e9825115629b8e
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/d14a2546eaceede73b282e444ad1bb40
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:38 /hbase/.archive/entry_duplicate/d570a21a39e04ba2ec896bbe7166423c
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/e943bda56acd6beb35bdd56f0560f87f
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/ef5692ba83aba48d9e7a6b9c2cd0661e
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/fd85dd319c289959a790faed32ef1530
>>>>>> drwxr-xr-x   - hbase supergroup          0 2012-12-10 14:39 /hbase/.archive/entry_duplicate/ffcdf6554accda1800e74838b67d3004
>>>>>> hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -ls /hbase/.archive/entry_duplicate/00c185bc44b6dcf85a90b83bdda4ec2e
>>>>>> hadoop@node3:~/hadoop-1.0.3$
>>>>>> 
>>>>>> I have not looked into ALL the subdirectories, but the first 10 are
>>>>>> empty.
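Rather than checking the subdirectories one by one, a short scan can list all the empty ones. This sketch works on a local directory through java.nio.file (an assumption; against HDFS the same loop would have to go through Hadoop's FileSystem API instead), and `emptySubdirectories` is a hypothetical name:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

public class EmptySubdirScan {

    /** Returns the immediate subdirectories of dir that have no entries, sorted. */
    static List<Path> emptySubdirectories(Path dir) throws IOException {
        List<Path> empty = new ArrayList<>();
        try (Stream<Path> children = Files.list(dir)) {
            for (Path child : (Iterable<Path>) children::iterator) {
                if (!Files.isDirectory(child)) {
                    continue; // plain files are not candidates
                }
                try (Stream<Path> grandChildren = Files.list(child)) {
                    if (!grandChildren.findAny().isPresent()) {
                        empty.add(child);
                    }
                }
            }
        }
        Collections.sort(empty);
        return empty;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> empty = emptySubdirectories(root);
        System.out.println(empty.size() + " empty subdirectories under " + root);
        empty.forEach(System.out::println);
    }
}
```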
>>>>>> 
>>>>>> I see that there are some traces in checkAndDeleteDirectory... I will
>>>>>> try to enable them and see if there are more details.
>>>>>> 
>>>>>> 
>>>>>> JM
>>>>>> 
>>>>>> 2012/12/30, Ted Yu <yuzhihong@gmail.com>:
>>>>>>> The exception came from this line:
>>>>>>>          if (file.isDir()) checkAndDeleteDirectory(file.getPath());
>>>>>>> Looking at checkAndDeleteDirectory(), it recursively deletes files and
>>>>>>> directories under the specified path.
>>>>>>> 
>>>>>>> Does /hbase/.archive/entry_duplicate only contain empty directories
>>>>>>> underneath it?
>>>>>>> 
>>>>>>> You didn't modify the logcleaner plugin setting, right ?
>>>>>>>  <property>
>>>>>>>    <name>hbase.master.logcleaner.plugins</name>
>>>>>>>    <value>org.apache.hadoop.hbase.master.cleaner.TimeToLiveLogCleaner</value>
>>>>>>>  </property>
>>>>>>> 
>>>>>>> Cheers
>>>>>>> 
>>>>>>> On Sun, Dec 30, 2012 at 9:53 AM, Jean-Marc Spaggiari <
>>>>>>> jean-marc@spaggiari.org> wrote:
>>>>>>> 
>>>>>>>> I was going to move to 0.94.4 today ;) And yes, I'm using 0.94.3. I
>>>>>>>> might wait a bit in case some testing is required with my version.
>>>>>>>> 
>>>>>>>> Is this what you are looking for? http://pastebin.com/N8Q0FMba
>>>>>>>> 
>>>>>>>> I will keep the files for now since they don't seem to be causing any
>>>>>>>> major issue. That will allow some more testing if required.
>>>>>>>> 
>>>>>>>> JM
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2012/12/30, Ted Yu <yuzhihong@gmail.com>:
>>>>>>>>> Looks like you're using 0.94.3
>>>>>>>>> 
>>>>>>>>> The archiver is a backport of:
>>>>>>>>> HBASE-5547, Don't delete HFiles in backup mode
>>>>>>>>> 
>>>>>>>>> Can you provide more of the log where the IOE was reported, using
>>>>>>>>> pastebin?
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> On Sun, Dec 30, 2012 at 9:08 AM, Jean-Marc Spaggiari <
>>>>>>>>> jean-marc@spaggiari.org> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I get an IOException "/hbase/.archive/table_name is non empty"
>>>>>>>>>> every minute in my logs.
>>>>>>>>>> 
>>>>>>>>>> There are 30 directories under this directory. The main directory is
>>>>>>>>>> from yesterday, but all subdirectories are from December 10th, all
>>>>>>>>>> created at the same time.
>>>>>>>>>> 
>>>>>>>>>> What is this .archive directory used for, and what should I do?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> 
>>>>>>>>>> JM
>> 
