hadoop-hive-dev mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Warehouse 'symlinks'
Date Sun, 19 Apr 2009 15:24:46 GMT
On Sun, Apr 19, 2009 at 3:19 AM, Dhruba Borthakur <dhruba@gmail.com> wrote:
> HADOOP-4044 is scheduled to finally make it to 0.21 release. And 0.21 is
> still a while away.
> That said, if one imports a data-set (set of files, or directory) into a
> warehouse, isn't it safer to move that dataset into the warehouse itself
> rather than letting it sit outside. For one thing, the target of the symlink
> might not be accessible to all hadoop slave nodes.
> -dhruba
> On Sat, Apr 18, 2009 at 7:41 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>> I was looking at HADOOP-4044. It would be nice to be able to work on
>> files without moving them into the warehouse. Could a SerDe handle a
>> similar task?

Yes, it would be safer to move it inside.

The reason I would like to do this is that in our deployment,
MapReduce programs create files outside of the warehouse. I do not
want to move them into the warehouse, and I do not want to copy them.
Being able to 'symlink' would allow me to assemble virtual tables
without moving data or changing the flow of an already existing
process.

So I am only looking to symlink to other files in the same filesystem.
At the extreme end, a symlink to an external resource could be very
useful too, but that is not what I was thinking of.
