hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-420) Fuse-dfs should cache fs handles
Date Fri, 27 May 2011 21:41:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040458#comment-13040458 ]

Todd Lipcon commented on HDFS-420:
----------------------------------

A little confused here -- the doConnectAsUser function does reference counting in the hashtable
entries, but in doDisconnect the code that decrements the reference counts and disconnects is
ifdeffed out. So, judging from the code and the discussion, the intent is to keep one
connection per user around for the lifetime of the fuse mount.

Given that, I think it's better to remove the refcounting code entirely, or ifdef it throughout
with a define like #define USE_FS_REFCOUNTING.
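
As a rough illustration of the second option (not the actual fuse-dfs code), the sketch below
gates every piece of reference counting behind a single define. The table layout, the integer
"handle", and the names connect_as_user and disconnect_handle are hypothetical stand-ins for the
real hashtable and fs handles; only the USE_FS_REFCOUNTING name comes from the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_USERS 16            /* hypothetical fixed-size table */

/* Define this to compile the reference counting in everywhere at once. */
/* #define USE_FS_REFCOUNTING */

typedef struct {
    char user[32];
    int  handle;                /* stand-in for a real filesystem handle */
    int  in_use;
#ifdef USE_FS_REFCOUNTING
    int  refcount;
#endif
} fs_entry;

static fs_entry cache[MAX_USERS];
static int next_handle = 1;

/* Return the cached handle for `user`, creating an entry on first use. */
int connect_as_user(const char *user) {
    assert(user != NULL);
    if (strlen(user) >= sizeof cache[0].user) return -1;

    fs_entry *free_slot = NULL;
    for (size_t i = 0; i < MAX_USERS; i++) {
        if (cache[i].in_use && strcmp(cache[i].user, user) == 0) {
#ifdef USE_FS_REFCOUNTING
            cache[i].refcount++;
#endif
            return cache[i].handle;
        }
        if (!cache[i].in_use && free_slot == NULL) free_slot = &cache[i];
    }
    if (free_slot == NULL) return -1;   /* table full */

    strcpy(free_slot->user, user);
    free_slot->handle = next_handle++;
    free_slot->in_use = 1;
#ifdef USE_FS_REFCOUNTING
    free_slot->refcount = 1;
#endif
    return free_slot->handle;
}

/* Without the define this is a no-op: entries live as long as the mount. */
void disconnect_handle(int handle) {
#ifdef USE_FS_REFCOUNTING
    for (size_t i = 0; i < MAX_USERS; i++) {
        if (cache[i].in_use && cache[i].handle == handle) {
            assert(cache[i].refcount > 0);
            if (--cache[i].refcount == 0) cache[i].in_use = 0;
            return;
        }
    }
#else
    (void)handle;
#endif
}
```

With the define left off, the increment and the decrement disappear together, so the code can
no longer count references on connect while silently skipping the release on disconnect.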

In terms of locking granularity, do we care that one user's connection process may hold up
another user who is already cached?
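
To make the question concrete, here is one possible finer-grained scheme, offered only as a
sketch under assumptions and not as what the patch does: the table lock is held just for
lookups and inserts, and the slow connect runs with no lock held, so a user who is already
cached never waits on another user's connect. The names get_handle, slow_connect, and
slow_disconnect are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MAX_USERS 16            /* hypothetical fixed-size table */

typedef struct { char user[32]; int handle; int in_use; } fs_entry;

static fs_entry table[MAX_USERS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int next_handle = 1;

/* Stand-ins for the expensive connect and disconnect calls. */
static int  slow_connect(const char *user) { (void)user; return atomic_fetch_add(&next_handle, 1); }
static void slow_disconnect(int handle)    { (void)handle; }

/* Caller must hold table_lock. */
static fs_entry *find_locked(const char *user) {
    for (size_t i = 0; i < MAX_USERS; i++)
        if (table[i].in_use && strcmp(table[i].user, user) == 0) return &table[i];
    return NULL;
}

int get_handle(const char *user) {
    assert(user != NULL);
    if (strlen(user) >= sizeof table[0].user) return -1;

    /* Fast path: cached users only hold the lock for the lookup. */
    pthread_mutex_lock(&table_lock);
    fs_entry *e = find_locked(user);
    if (e != NULL) {
        int h = e->handle;
        pthread_mutex_unlock(&table_lock);
        return h;
    }
    pthread_mutex_unlock(&table_lock);

    /* Slow path: connect without holding the table lock. */
    int h = slow_connect(user);

    pthread_mutex_lock(&table_lock);
    e = find_locked(user);
    if (e != NULL) {            /* another thread inserted this user first */
        int winner = e->handle;
        pthread_mutex_unlock(&table_lock);
        slow_disconnect(h);
        return winner;
    }
    for (size_t i = 0; i < MAX_USERS; i++) {
        if (!table[i].in_use) {
            strcpy(table[i].user, user);
            table[i].handle = h;
            table[i].in_use = 1;
            pthread_mutex_unlock(&table_lock);
            return h;
        }
    }
    pthread_mutex_unlock(&table_lock);
    slow_disconnect(h);         /* table full */
    return -1;
}
```

The trade-off is that two threads racing for the same new user may both connect, with the
loser discarding its handle; whether that is worth the extra code depends on how slow a
connect really is.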



> Fuse-dfs should cache fs handles
> --------------------------------
>
>                 Key: HDFS-420
>                 URL: https://issues.apache.org/jira/browse/HDFS-420
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: contrib/fuse-dfs
>    Affects Versions: 0.20.2
>         Environment: Fedora core 10, x86_64, 2.6.27.7-134.fc10.x86_64 #1 SMP (AMD 64),
gcc 4.3.2, java 1.6.0 (IcedTea6 1.4 (fedora-7.b12.fc10-x86_64) Runtime Environment (build
1.6.0_0-b12) OpenJDK 64-Bit Server VM (build 10.0-b19, mixed mode)
>            Reporter: Dima Brodsky
>            Assignee: Brian Bockelman
>             Fix For: 0.23.0
>
>         Attachments: fuse_dfs_020_memleaks.patch, fuse_dfs_020_memleaks_v3.patch, fuse_dfs_020_memleaks_v8.patch,
hdfs-420-1.patch
>
>
> Fuse-dfs should cache fs handles on a per-user basis. This significantly increases performance,
and has the side effect of fixing the current code which leaks fs handles.
> The original bug description follows:
> I run the following test:
> 1.  Run hadoop DFS in single node mode
> 2.  start up fuse_dfs
> 3.  copy my source tree, about 250 megs, into the DFS
>      cp -av * /mnt/hdfs/
> in /var/log/messages I keep seeing:
> Dec 22 09:02:08 bodum fuse_dfs: ERROR: hdfs trying to utime /bar/backend-trunk2/src/machinery/hadoop/output/2008/11/19
to 1229385138/1229963739
> and then eventually
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> and the file system hangs. Hadoop is still running and I don't see any errors in its
logs. I have to unmount the dfs and restart fuse_dfs, and then everything is fine again.
At some point I see the following messages in /var/log/messages:
> ERROR: dfs problem - could not close file_handle(139677114350528) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8339-93825052368848-1229278807.log
fuse_dfs.c:1464
> Dec 22 09:04:49 bodum fuse_dfs: ERROR: dfs problem - could not close file_handle(139676770220176)
for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8140-93825025883216-1229278759.log
fuse_dfs.c:1464
> Dec 22 09:05:13 bodum fuse_dfs: ERROR: dfs problem - could not close file_handle(139677114812832)
for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8138-93825070138960-1229251587.log
fuse_dfs.c:1464
> Is this a known issue? Am I just flooding the system too much? All of this is being
performed on a single dual-core machine.
> Thanks!
> ttyl
> Dima

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
