hadoop-hdfs-issues mailing list archives

From "Zhang Bingjun (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-420) fuse_dfs is unable to connect to the dfs after copying a large number of files into the dfs over fuse
Date Sat, 29 Aug 2009 04:36:32 GMT

    [ https://issues.apache.org/jira/browse/HDFS-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749075#action_12749075 ]

Zhang Bingjun commented on HDFS-420:
------------------------------------

Hi Craig,

You are right. The fuse-dfs process grows in size when I am writing continuously into HDFS
through FUSE. 

In my test, I installed hadoop-0.20.0 on an 11-node cluster with one namenode host, one secondary
namenode host, and 9 datanode hosts, all running Ubuntu 9.04 Linux. The namenode host has 1 GB of
memory. The script writing files into HDFS through FUSE was also running on the namenode host. By
the time 12418 files of 100 KB each had been written (I used small files so as to write as many as
possible in a short time), fuse-dfs had exhausted the whole 1 GB of memory and died. Throughout the
writing, the memory occupied by fuse-dfs kept growing.
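
For reference, the write load was roughly equivalent to the sketch below (not the actual script;
the mount point /mnt/hdfs/leaktest and the file names are placeholders I made up), so anyone who
wants to reproduce the growth can watch the resident size of the fuse-dfs process while it runs:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Write many small files through the FUSE mount point. 12418 files of
     * 100 KB each matches the numbers from the test above; the directory
     * /mnt/hdfs/leaktest must already exist and is only a placeholder path. */
    char path[256];
    static char buf[100 * 1024];          /* 100 KB payload per file */
    memset(buf, 'x', sizeof(buf));

    for (int i = 0; i < 12418; i++) {
        snprintf(path, sizeof(path), "/mnt/hdfs/leaktest/file-%05d.dat", i);
        FILE *fp = fopen(path, "wb");
        if (fp == NULL) {
            perror(path);
            return 1;
        }
        fwrite(buf, 1, sizeof(buf), fp);
        fclose(fp);                        /* every open/write/close goes through fuse-dfs */
    }
    return 0;
}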

I guess the issue you mentioned about releasing hdfsFS is one cause. I am not sure whether there
are other places causing the memory leak as well.
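
Just to make the suspicion concrete, below is a sketch (not the actual fuse-dfs code; the function
name, host and port are placeholders) of the kind of libhdfs usage that would leak: a fresh hdfsFS
per request that is never released with hdfsDisconnect().

#include <fcntl.h>   /* O_WRONLY */
#include <hdfs.h>    /* libhdfs C API */

/* Sketch of the suspected leak pattern only -- not the actual fuse-dfs code.
 * If a new hdfsFS is created for a request and never given back with
 * hdfsDisconnect(), the resources behind each handle accumulate for the
 * lifetime of the process. */
static int write_one_file_leaky(const char *path, const char *data, int len)
{
    hdfsFS fs = hdfsConnect("default", 0);   /* new connection per request */
    if (fs == NULL)
        return -1;

    hdfsFile file = hdfsOpenFile(fs, path, O_WRONLY, 0, 0, 0);
    if (file == NULL)
        return -1;                           /* fs is leaked on this error path too */

    hdfsWrite(fs, file, data, len);
    hdfsCloseFile(fs, file);

    /* Missing: hdfsDisconnect(fs);
     * Without it memory grows on every call. A fix would either disconnect
     * here or, better, reuse one cached hdfsFS for the whole mount. */
    return 0;
}

If fuse-dfs already reuses a single connection, then the growth must be coming from somewhere else,
but this is the first thing I want to check when reading the code.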

I am reading the libhdfs and fuse-dfs code now, and hopefully I can get familiar enough with it to
propose some fixes. We will see. If anyone else has encountered this issue and is willing to
discuss how to solve it, please give your input here. I really hope we can make fuse-dfs a stable
tool to use, so that HDFS appears to Linux as a nicely mounted local directory.


Thanks!
Bingjun


> fuse_dfs is unable to connect to the dfs after copying a large number of files into the dfs over fuse
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-420
>                 URL: https://issues.apache.org/jira/browse/HDFS-420
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: contrib/fuse-dfs
>         Environment: Fedora core 10, x86_64, 2.6.27.7-134.fc10.x86_64 #1 SMP (AMD 64), gcc 4.3.2, java 1.6.0 (IcedTea6 1.4 (fedora-7.b12.fc10-x86_64) Runtime Environment (build 1.6.0_0-b12) OpenJDK 64-Bit Server VM (build 10.0-b19, mixed mode)
>            Reporter: Dima Brodsky
>
> I run the following test:
> 1.  Run hadoop DFS in single node mode
> 2.  start up fuse_dfs
> 3.  copy my source tree, about 250 megs, into the DFS
>      cp -av * /mnt/hdfs/
> in /var/log/messages I keep seeing:
> Dec 22 09:02:08 bodum fuse_dfs: ERROR: hdfs trying to utime /bar/backend-trunk2/src/machinery/hadoop/output/2008/11/19 to 1229385138/1229963739
> and then eventually
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> and the file system hangs.  hadoop is still running and I don't see any errors in its logs.  I have to unmount the dfs and restart fuse_dfs and then everything is fine again.  At some point I see the following messages in /var/log/messages:
> ERROR: dfs problem - could not close file_handle(139677114350528) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8339-93825052368848-1229278807.log fuse_dfs.c:1464
> Dec 22 09:04:49 bodum fuse_dfs: ERROR: dfs problem - could not close file_handle(139676770220176) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8140-93825025883216-1229278759.log fuse_dfs.c:1464
> Dec 22 09:05:13 bodum fuse_dfs: ERROR: dfs problem - could not close file_handle(139677114812832) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8138-93825070138960-1229251587.log fuse_dfs.c:1464
> Is this a known issue?  Am I just flooding the system too much?  All of this is being performed on a single, dual core, machine.
> Thanks!
> ttyl
> Dima

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

