pig-dev mailing list archives

From Jie Li <ji...@cs.duke.edu>
Subject Early projection and lazy casting
Date Sat, 03 Dec 2011 00:05:59 GMT
Hi all,

We just found that Pig 0.9.1 doesn't drop unneeded fields as early as it
could, which really hurts performance. This is despite
http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html#loadfunc_loaderpushdown
which says that "As part of its optimizations Pig analyzes Pig Latin scripts and
determines what fields in an input it needs at each step in the script. It
uses this information to aggressively drop fields it no longer needs."

We also found that Pig casts the data to the types declared in the schema up
front, which is usually unnecessary, since most of those fields are dropped
soon afterwards.

To work around both issues, we have to manually project away those fields and
remove the types from the schema, which is really tedious.
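
A minimal sketch of the workaround, just to illustrate the idea. The input
path, the field names, and the query itself are made up for this example and
are not from our actual workload:

```pig
-- Original style: every field is typed in the schema and carried forward,
-- even though only 'a' is used later in the script.
A = LOAD 'input' USING PigStorage('\t')
    AS (a:int, b:chararray, c:double, d:long);
B = GROUP A BY a;
C = FOREACH B GENERATE group, COUNT(A);

-- Workaround: declare the fields without types (they default to bytearray),
-- project down to the needed field right after the load, and cast only that.
A2 = LOAD 'input' USING PigStorage('\t') AS (a, b, c, d);
P2 = FOREACH A2 GENERATE (int)a AS a;
B2 = GROUP P2 BY a;
C2 = FOREACH B2 GENERATE group, COUNT(P2);
```

The second form should give the same result, but the projection and the single
cast are spelled out by hand instead of being left to the optimizer.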

Jie
