pig-user mailing list archives

From Jonathan Coveney <jcove...@gmail.com>
Subject Can you filter and load at the same time?
Date Wed, 01 Dec 2010 15:57:26 GMT
In order to facilitate more robust loading, I have 2 questions.

1) I know that you can use some wildcards in loading. For example, if you
have 2 files, dog1.txt and dog2.txt, you can load dog*.txt and it will load
both. Is there any way to use regular expressions or anything more powerful
in the actual load? For example, if I want to load 10 different files with
similarly structured names and identically structured data, what's the
easiest and fastest way to load them all into the same table?
2) Can you filter as you load? Doing a load and then a filter right after
it seems wasteful (unless Pig/Hadoop are smart enough to realize that they
don't have to load all the data up front).

I appreciate your help.
Jon
