hadoop-common-dev mailing list archives

From "Edward Capriolo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4044) Create symbolic links in HDFS
Date Mon, 20 Apr 2009 19:41:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700905#action_12700905 ]

Edward Capriolo commented on HADOOP-4044:

I am using and helping with the Hadoop Hive subproject, and I wanted to share a use case for symlinks.

For example, suppose a directory inside Hadoop:
/user/edward/weblogs/{web1.log,web2.log,web3.log}. I can point a Hive EXTERNAL table at the
parent directory and then query that external table with Hive. This is very powerful. However,
it breaks as soon as the same directory also contains a file in a different format, such as
web_logsummary.csv (this is my case).

Being able to drop in a 'symlink' where a file would normally go would let users build new
directory structures out of data that already exists. Imagine a user who has a large Hadoop
deployment and wants to migrate to, or start using, Hive. An external table is constrained to
a single directory, so that user would need to recode application paths and/or move files. With
a 'symlink' concept, anyone could start using Hive without reorganizing or copying data.
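To make the idea concrete, here is a minimal sketch in Java. It does not use any HDFS API: `SymlinkView`, `createFile`, `createSymlink`, `resolve`, `read`, and `list` are hypothetical names for a toy in-memory namespace that only supports absolute link targets. A "view" directory holds links to just the three log files, so, assuming Hive followed the links transparently, an external table pointed at the view would never see web_logsummary.csv.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy namespace, NOT the HDFS API: each path maps either to
// file contents or to an absolute symlink target. Directories are implied
// by path prefixes.
public class SymlinkView {
    private final Map<String, String> files = new TreeMap<>();
    private final Map<String, String> links = new TreeMap<>();

    void createFile(String path, String contents) { files.put(path, contents); }

    // Hypothetical helper: record a link at 'link' pointing to absolute 'target'.
    void createSymlink(String target, String link) { links.put(link, target); }

    // Follow links until a regular file is reached; stop after a fixed
    // number of hops so that a link cycle cannot loop forever.
    String resolve(String path) {
        String current = path;
        for (int hops = 0; hops < 32; hops++) {
            if (files.containsKey(current)) return current;
            String target = links.get(current);
            if (target == null) throw new IllegalArgumentException("No such file: " + path);
            current = target;
        }
        throw new IllegalStateException("Too many levels of symbolic links: " + path);
    }

    // Reading through a link behaves like reading the target file.
    String read(String path) { return files.get(resolve(path)); }

    // List the direct children (files or links) of a directory, sorted.
    List<String> list(String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<String> out = new ArrayList<>();
        for (Map<String, String> m : List.of(files, links)) {
            for (String p : m.keySet()) {
                if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) out.add(p);
            }
        }
        out.sort(null);
        return out;
    }

    public static void main(String[] args) {
        SymlinkView fs = new SymlinkView();
        String src = "/user/edward/weblogs";
        String[] logs = {"web1.log", "web2.log", "web3.log"};
        for (String f : logs) fs.createFile(src + "/" + f, "log data from " + f);
        fs.createFile(src + "/web_logsummary.csv", "summary,csv");

        // The view directory contains links to the log files only; the
        // existing data is neither moved nor copied.
        String view = "/user/edward/weblogs_view";
        for (String f : logs) fs.createSymlink(src + "/" + f, view + "/" + f);

        System.out.println(fs.list(view));
        System.out.println(fs.read(view + "/web2.log"));
    }
}
```

Running `main` prints the three links under the view directory followed by "log data from web2.log". The hop limit is one simple way to guard against link cycles; relative targets, which the issue description also mentions, are left out to keep the sketch short.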

Right now, Hive has many facilities for dealing with input formats, such as specifying delimiters,
but forcing the data either into the warehouse or into a single external-table directory is limiting.
'Symlinks' combined with Hive's current input format capabilities would make Hive more versatile.

> Create symbolic links in HDFS
> -----------------------------
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch,
symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, symLink4.patch,
symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
> HDFS should support symbolic links. A symbolic link is a special type of file that contains
a reference to another file or directory in the form of an absolute or relative path and that
affects pathname resolution. Programs which read or write to files named by a symbolic link
will behave as if operating directly on the target file. However, archiving utilities can
handle symbolic links specially and manipulate them directly.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
