hadoop-common-dev mailing list archives

From "Edward Capriolo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4044) Create symbolic links in HDFS
Date Mon, 20 Apr 2009 19:41:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700905#action_12700905 ]

Edward Capriolo commented on HADOOP-4044:
-----------------------------------------

I use and help with the hadoop-hive subproject, and I wanted to share a use case for symlinks.

For example, suppose there is a directory inside Hadoop:
/user/edward/weblogs/{web1.log,web2.log,web3.log}. I can point a Hive EXTERNAL
table at the parent directory and then query that table with Hive.
This is very powerful. It works unless the directory also contains a file with a different
format, such as web_logsummary.csv (this is my case).

Being able to drop in a 'symlink' where a file would go would make it possible to build such
structures from already existing data. Imagine a user who has a large Hadoop deployment and
wishes to migrate to, or start using, Hive. An external table is constrained to one directory,
so they would need to recode application paths and/or move files. With a 'symlink' concept,
anyone could start using Hive without reorganizing or copying data.
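
As a rough illustration of the idea, here is a local-filesystem analogy only. It uses ordinary
POSIX symlinks through Python's os.symlink, not HDFS or Hive, and the helper build_link_view
and the weblogs_view directory name are hypothetical. It builds a second directory containing
links to just the .log files, so a reader pointed at that directory never sees the CSV file:

```python
import os
import tempfile


def build_link_view(source_dir, view_dir, suffix=".log"):
    """Fill view_dir with symlinks to the files in source_dir ending in suffix.

    Hypothetical helper: a local-filesystem stand-in for the proposed feature.
    """
    os.makedirs(view_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(source_dir)):
        target = os.path.join(source_dir, name)
        if name.endswith(suffix) and os.path.isfile(target):
            # The link holds only a path; no data is copied or moved.
            os.symlink(os.path.abspath(target), os.path.join(view_dir, name))
            linked.append(name)
    return linked


def demo():
    """Recreate the weblogs layout in a temp dir and build a link-only view."""
    with tempfile.TemporaryDirectory() as root:
        weblogs = os.path.join(root, "weblogs")
        os.makedirs(weblogs)
        for name in ("web1.log", "web2.log", "web3.log", "web_logsummary.csv"):
            with open(os.path.join(weblogs, name), "w") as f:
                f.write(name + "\n")

        view = os.path.join(root, "weblogs_view")  # hypothetical directory name
        linked = build_link_view(weblogs, view)

        # Reading through a link behaves like reading the target file.
        with open(os.path.join(view, "web1.log")) as f:
            content = f.read()
        return linked, content


if __name__ == "__main__":
    print(demo())
```

The original directory is left untouched, which is the point: the existing application paths
keep working while the link-only directory serves as the table location.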

Right now, Hive has a lot of facilities for dealing with input formats, such as specifying
delimiters, but forcing the data either into a warehouse or into an external table is limiting.
'Symlinks' tied together with Hive's current input format capabilities would make Hive more
versatile.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch,
>                      symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch,
>                      symLink14.patch, symLink4.patch, symLink5.patch, symLink6.patch,
>                      symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains
> a reference to another file or directory in the form of an absolute or relative path and that
> affects pathname resolution. Programs which read or write to files named by a symbolic link
> will behave as if operating directly on the target file. However, archiving utilities can
> handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

