hadoop-common-issues mailing list archives

From "Yanbo Liang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9041) FileSystem initialization can go into infinite loop
Date Sat, 17 Nov 2012 17:28:12 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499470#comment-13499470 ]

Yanbo Liang commented on HADOOP-9041:
-------------------------------------

Hi Alejandro,
My previous comment may not have been very clear, so here is the detailed call stack.
If users register org.apache.hadoop.fs.FsUrlStreamHandlerFactory as the current URLStreamHandlerFactory
before FileSystem.getFileSystemClass()->FileSystem.loadFileSystems() has been called, the result
is an infinite loop:
1) org.apache.hadoop.fs.FsUrlStreamHandlerFactory has been registered as the current URLStreamHandlerFactory.
2) Users call FileSystem.getFileSystemClass()->FileSystem.loadFileSystems().
3) Because users have never called FileSystem.loadFileSystems() before step 2), the body of
FileSystem.loadFileSystems() is executed.
4) FileSystem.loadFileSystems() uses ServiceLoader to load the FileSystem providers, such as
hdfs, kfs, s3, etc.
5) To do so, ServiceLoader reads the provider declarations from resources such as jar files
on local disk, which it addresses as URLs.
6) ServiceLoader creates a URL object and opens a stream to it.
7) The URL needs a handler for its protocol (e.g. "file:///"), so it calls
URL.getURLStreamHandler(), which in turn calls FsUrlStreamHandlerFactory.createURLStreamHandler().
8) FsUrlStreamHandlerFactory.createURLStreamHandler() needs to recognize the different file
system schemes (protocols) from the FileSystem providers (for a jar file on local disk, it needs
to know the LocalFileSystem implementation). But at this point the FileSystem providers have
not been loaded into memory yet, so it calls FileSystem.getFileSystemClass("file",conf)->FileSystem.loadFileSystems().
We are back at step 2), and the process loops forever.
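
The cycle in steps 1) to 8) can be modeled without Hadoop. The block below is a minimal plain-Java sketch, not Hadoop code: FsRegistry, HandlerFactory, LazyHandlerFactory and LoopDemo are hypothetical stand-ins for FileSystem, URLStreamHandlerFactory and FsUrlStreamHandlerFactory, the Configuration argument is omitted, and ServiceLoader opening a "file:" URL is modeled as a direct call into the installed factory. A demo-only depth cap (MAX_DEPTH, also hypothetical) turns the unbounded re-entry into an exception so it can be observed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for URLStreamHandlerFactory.
interface HandlerFactory {
    Object createHandler(String protocol);
}

// Hypothetical stand-in for FileSystem's lazy provider loading (steps 2-4).
class FsRegistry {
    static HandlerFactory installedFactory; // like URL.setURLStreamHandlerFactory(...)
    static Map<String, String> providers;   // null until loaded
    static int depth = 0;                   // nesting depth of loadFileSystems()
    static final int MAX_DEPTH = 50;        // demo-only cap on the re-entry

    static String getFileSystemClass(String scheme) {
        if (providers == null) {
            loadFileSystems();
        }
        return providers.get(scheme);
    }

    static void loadFileSystems() {
        depth++;
        try {
            if (depth > MAX_DEPTH) {
                throw new IllegalStateException("loadFileSystems() nested " + depth + " deep");
            }
            // Steps 5-7: reading provider files opens file: URLs, which asks
            // the installed factory for a handler.
            if (installedFactory != null) {
                installedFactory.createHandler("file");
            }
            Map<String, String> loaded = new HashMap<>();
            loaded.put("file", "LocalFileSystem");
            loaded.put("hdfs", "DistributedFileSystem");
            providers = loaded;
        } finally {
            depth--;
        }
    }
}

// Hypothetical stand-in for FsUrlStreamHandlerFactory (step 8): to decide whether
// it handles a protocol, it asks the registry, which may not be loaded yet.
class LazyHandlerFactory implements HandlerFactory {
    private final Map<String, Boolean> known = new HashMap<>();

    @Override
    public Object createHandler(String protocol) {
        Boolean isKnown = known.get(protocol);
        if (isKnown == null) {
            isKnown = FsRegistry.getFileSystemClass(protocol) != null;
            known.put(protocol, isKnown);
        }
        return isKnown ? "handler:" + protocol : null;
    }
}

class LoopDemo {
    static String run() {
        FsRegistry.providers = null;
        FsRegistry.installedFactory = new LazyHandlerFactory(); // step 1
        try {
            FsRegistry.getFileSystemClass("hdfs");              // step 2
            return "loaded";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints: loadFileSystems() nested 51 deep
    }
}
```

In this model the lookup for "file" never completes, because answering it requires the very loading step that is still in progress; without the demo cap the sketch would recurse until the stack is exhausted.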

Because URLs are closely tied to concrete FileSystem implementations, we need to load the
FileSystem implementations before any URL-related operation. I propose calling FileSystem.getFileSystemClass("file",conf)
in the constructor of FsUrlStreamHandlerFactory to solve this problem: FsUrlStreamHandlerFactory
needs to know at least the FileSystem implementation for the "file:///" scheme in order to
work correctly.
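
A minimal sketch of the proposed fix, again as a hypothetical plain-Java model rather than the attached patch (SchemeRegistry, ProtocolHandlerFactory, EagerHandlerFactory and FixDemo are illustrative names, not Hadoop classes): the factory constructor forces provider loading before the factory is installed, so later handler lookups only read an already populated table. Assigning SchemeRegistry.installed stands in for URL.setURLStreamHandlerFactory(...).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for URLStreamHandlerFactory.
interface ProtocolHandlerFactory {
    Object createHandler(String protocol);
}

// Hypothetical stand-in for FileSystem's lazily loaded scheme table.
class SchemeRegistry {
    static ProtocolHandlerFactory installed; // like the JVM-wide URL factory slot
    static Map<String, String> providers;    // null until loaded
    static boolean loading = false;          // true while providers are being loaded

    static String getFileSystemClass(String scheme) {
        if (providers == null) {
            if (loading) {
                // Re-entry guard: the broken ordering would end up here.
                throw new IllegalStateException("provider loading re-entered");
            }
            loading = true;
            try {
                if (installed != null) {
                    // Opening provider files would ask the installed factory for a handler.
                    installed.createHandler("file");
                }
                Map<String, String> loaded = new HashMap<>();
                loaded.put("file", "LocalFileSystem");
                loaded.put("hdfs", "DistributedFileSystem");
                providers = loaded;
            } finally {
                loading = false;
            }
        }
        return providers.get(scheme);
    }
}

// Hypothetical stand-in for FsUrlStreamHandlerFactory with the proposed fix.
class EagerHandlerFactory implements ProtocolHandlerFactory {
    private final Map<String, Boolean> known = new HashMap<>();

    EagerHandlerFactory() {
        // The fix: load providers now, while this factory is not yet installed.
        SchemeRegistry.getFileSystemClass("file");
    }

    @Override
    public Object createHandler(String protocol) {
        Boolean isKnown = known.get(protocol);
        if (isKnown == null) {
            isKnown = SchemeRegistry.getFileSystemClass(protocol) != null;
            known.put(protocol, isKnown);
        }
        // null means "unknown scheme, let the JVM use its default handler"
        return isKnown ? "handler:" + protocol : null;
    }
}

class FixDemo {
    static String run() {
        SchemeRegistry.installed = null;
        SchemeRegistry.providers = null;
        // Construct first (loads providers), then install, so loading never reaches the factory.
        SchemeRegistry.installed = new EagerHandlerFactory();
        return SchemeRegistry.getFileSystemClass("hdfs");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints: DistributedFileSystem
    }
}
```

Because the constructor runs before the assignment to SchemeRegistry.installed, the loading it triggers never consults the factory; afterwards createHandler("hdfs") returns a handler and an unknown scheme such as "ftp" returns null.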

The patch has been attached. Looking forward to your comments.
                
> FileSystem initialization can go into infinite loop
> ---------------------------------------------------
>
>                 Key: HADOOP-9041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9041
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.0.2-alpha
>            Reporter: Radim Kolar
>            Assignee: Yanbo Liang
>            Priority: Critical
>         Attachments: fstest.groovy, HADOOP-9041.patch, HADOOP-9041.patch, HADOOP-9041.patch
>
>
> More information is there: https://jira.springsource.org/browse/SHDP-111
> Referenced source code from example is: https://github.com/SpringSource/spring-hadoop/blob/master/src/main/java/org/springframework/data/hadoop/configuration/ConfigurationFactoryBean.java
> From isolating the cause, it looks like if you register org.apache.hadoop.fs.FsUrlStreamHandlerFactory
> before calling FileSystem.loadFileSystems(), it goes into an infinite loop.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
