hadoop-hive-dev mailing list archives

From Jeff Hammerbacher <ham...@cloudera.com>
Subject Re: Warehouse 'symlinks'
Date Mon, 20 Apr 2009 09:29:08 GMT
Hey Edward,

Can you just treat the files as external tables?

Later,
Jeff
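
[Editorial sketch, not part of Jeff's message] The external-table suggestion above can be illustrated with a small helper that renders a Hive CREATE EXTERNAL TABLE statement pointing at data that stays where the MapReduce job wrote it. The table name, columns, tab delimiter, and the path /user/edward/job_output are all hypothetical, chosen only for illustration. Note that LOCATION names a directory rather than a single file, and that dropping an external table removes only the metadata, leaving the files in place.

```python
def external_table_ddl(table, columns, location, delimiter="\\t"):
    """Render a Hive CREATE EXTERNAL TABLE statement as a string.

    columns  -- list of (name, hive_type) pairs
    location -- directory (not a single file) that already holds the data
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table over job output that lives outside the warehouse.
ddl = external_table_ddl(
    "job_output",
    [("user_id", "BIGINT"), ("url", "STRING")],
    "/user/edward/job_output",
)
print(ddl)
```

The printed statement could then be run from the Hive CLI. Nothing is moved or copied; Hive only records where the data lives.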

On Sun, Apr 19, 2009 at 8:24 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> On Sun, Apr 19, 2009 at 3:19 AM, Dhruba Borthakur <dhruba@gmail.com>
> wrote:
> > HADOOP-4044 is scheduled to finally make it into the 0.21 release. And 0.21
> > is still a while away.
> >
> > That said, if one imports a data-set (set of files, or directory) into a
> > warehouse, isn't it safer to move that dataset into the warehouse itself
> > rather than letting it sit outside? For one thing, the target of the
> > symlink might not be accessible to all Hadoop slave nodes.
> >
> > -dhruba
> >
> >
> > On Sat, Apr 18, 2009 at 7:41 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> >
> >> I was looking at HADOOP-4044. It would be nice to be able to work on
> >> files without moving them into the warehouse. Could a SerDe handle a
> >> similar task?
> >>
> >
>
> Yes, it would be safer to move it inside.
>
> The reason I would like to do this is that in our deployment, MapReduce
> programs are creating files outside of the warehouse. I do not want to
> move them into the warehouse, and I do not want to copy them. Being
> able to 'symlink' would allow me to assemble virtual tables without
> moving data or changing the flow of an already existing process.
>
> So I am only looking to symlink to other files in the same filesystem.
> At the extreme end, a symlink to an external resource could be very
> useful too, but that is not what I was thinking of.
>
