hadoop-hdfs-issues mailing list archives

From "Brian Bockelman (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-420) fuse_dfs is unable to connect to the dfs after copying a large number of files into the dfs over fuse
Date Tue, 29 Dec 2009 00:14:29 GMT

     [ https://issues.apache.org/jira/browse/HDFS-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Bockelman updated HDFS-420:
---------------------------------

    Attachment: HDFS-420.patch

Hey Dima,

First, try applying HDFS-464 to alleviate some memory leaks present in libhdfs.

Then, try the attached file; this releases the reference to the FileSystem.
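
For context, the failure mode here is FileSystem references that are never released, so the JVM heap slowly fills until connects start failing.  As a rough illustration only (this is not the attached patch, and do_one_operation is a made-up helper), the libhdfs idiom involved is pairing each hdfsConnect() with an hdfsDisconnect():

    /*
     * Illustrative sketch only -- not the attached patch.  Every
     * hdfsConnect() takes a reference to a FileSystem object on the JVM
     * side; without a matching hdfsDisconnect(), each operation pins
     * another reference until the heap fills and the "could not connect
     * to dfs" errors above start appearing.
     */
    #include "hdfs.h"

    int do_one_operation(void)
    {
        /* "default" uses fs.default.name from the Hadoop configuration */
        hdfsFS fs = hdfsConnect("default", 0);
        if (fs == NULL)
            return -1;

        /* ... hdfsOpenFile / hdfsWrite / hdfsCloseFile work goes here ... */

        hdfsDisconnect(fs);  /* releases the reference so it can be collected */
        return 0;
    }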

With this, we have been able to run FUSE-DFS stably for weeks at a time.  I've taken a heap dump of the resulting system and not found any more obvious leaks.  (With these two patches, albeit on the Hadoop-0.19.x versions of them, I was able to set the heap size to 4MB and create tens of thousands of files.)

To debug further, repeat your test while running fuse_dfs in a second terminal with the "-d" option so it stays in the foreground.  That way you can capture the error messages Hadoop spits out.  Alternatively, you can muck around with the log4j settings and have these go to a log file instead.
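
For example, something along these lines (the namenode URI, mount point, and log path are placeholders for your setup):

    # run in a second terminal; -d keeps fuse_dfs in the foreground
    fuse_dfs dfs://localhost:9000 /mnt/hdfs -d

and a minimal log4j.properties stanza like the following would route the client-side messages to a file instead:

    log4j.rootLogger=INFO, hdfsclient
    log4j.appender.hdfsclient=org.apache.log4j.FileAppender
    log4j.appender.hdfsclient.File=/var/log/fuse_dfs_hadoop.log
    log4j.appender.hdfsclient.layout=org.apache.log4j.PatternLayout
    log4j.appender.hdfsclient.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n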

Brian

> fuse_dfs is unable to connect to the dfs after copying a large number of files into the dfs over fuse
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-420
>                 URL: https://issues.apache.org/jira/browse/HDFS-420
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: contrib/fuse-dfs
>         Environment: Fedora core 10, x86_64, 2.6.27.7-134.fc10.x86_64 #1 SMP (AMD 64), gcc 4.3.2, java 1.6.0 (IcedTea6 1.4 (fedora-7.b12.fc10-x86_64) Runtime Environment (build 1.6.0_0-b12) OpenJDK 64-Bit Server VM (build 10.0-b19, mixed mode)
>            Reporter: Dima Brodsky
>         Attachments: HDFS-420.patch
>
>
> I run the following test:
> 1.  Run hadoop DFS in single node mode
> 2.  start up fuse_dfs
> 3.  copy my source tree, about 250 megs, into the DFS
>      cp -av * /mnt/hdfs/
> in /var/log/messages I keep seeing:
> Dec 22 09:02:08 bodum fuse_dfs: ERROR: hdfs trying to utime /bar/backend-trunk2/src/machinery/hadoop/output/2008/11/19 to 1229385138/1229963739
> and then eventually
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs fuse_dfs.c:1037
> and the file system hangs.  hadoop is still running and I don't see any errors in its logs.  I have to unmount the dfs and restart fuse_dfs and then everything is fine again.  At some point I see the following messages in /var/log/messages:
> ERROR: dfs problem - could not close file_handle(139677114350528) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8339-93825052368848-1229278807.log fuse_dfs.c:1464
> Dec 22 09:04:49 bodum fuse_dfs: ERROR: dfs problem - could not close file_handle(139676770220176) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8140-93825025883216-1229278759.log fuse_dfs.c:1464
> Dec 22 09:05:13 bodum fuse_dfs: ERROR: dfs problem - could not close file_handle(139677114812832) for /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8138-93825070138960-1229251587.log fuse_dfs.c:1464
> Is this a known issue?  Am I just flooding the system too much?  All of this is being performed on a single, dual-core machine.
> Thanks!
> ttyl
> Dima

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

