hadoop-pig-dev mailing list archives

From Thejas Nair <te...@yahoo-inc.com>
Subject LoadFunc.skipNext() function for faster sampling ?
Date Wed, 04 Nov 2009 00:28:37 GMT
In the new implementation of the SampleLoader subclasses (used by order-by,
skew-join, etc.) that came with the loader redesign, we not only read every
input record but also parse each one into a Pig tuple.

This is because the SampleLoaders are wrappers around the actual input
loaders specified in the query. We could make sampling much faster by adding
a skipNext() function (or skipNext(int numSkip)) that moves past a record
without parsing it into a Pig tuple.
A LoadFunc could optionally implement this function, which would be part of
a separate interface and should be easy to implement, to speed up queries
such as order-by.
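
To make the idea concrete, here is a rough, self-contained sketch of what
such an optional interface might look like. Everything in it is an
assumption for illustration: SkippableLoader, LineLoader and sample() are
hypothetical names, a List<String> stands in for a Pig tuple, and none of
this is the actual LoadFunc API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipNextSketch {

    /** Hypothetical optional interface; NOT part of Pig's actual LoadFunc API. */
    interface SkippableLoader {
        /** Reads and parses the next record, or returns null at end of input. */
        List<String> getNext() throws IOException;

        /** Moves past up to numSkip records without parsing them; returns the number skipped. */
        int skipNext(int numSkip) throws IOException;
    }

    /** Toy tab-delimited loader; a List<String> stands in for a Pig tuple (assumption). */
    static class LineLoader implements SkippableLoader {
        private final BufferedReader in;
        int parsed = 0; // counts records that were actually parsed

        LineLoader(String data) {
            this.in = new BufferedReader(new StringReader(data));
        }

        @Override
        public List<String> getNext() throws IOException {
            String line = in.readLine();
            if (line == null) {
                return null;
            }
            parsed++;
            return Arrays.asList(line.split("\t"));
        }

        @Override
        public int skipNext(int numSkip) throws IOException {
            int skipped = 0;
            // The raw line is still read, but it is never split into fields.
            while (skipped < numSkip && in.readLine() != null) {
                skipped++;
            }
            return skipped;
        }
    }

    /** Keeps one record, then skips the next `gap` records, until input ends. */
    static List<List<String>> sample(SkippableLoader loader, int gap) throws IOException {
        List<List<String>> out = new ArrayList<>();
        List<String> tuple;
        while ((tuple = loader.getNext()) != null) {
            out.add(tuple);
            loader.skipNext(gap);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        LineLoader loader = new LineLoader("a\t1\nb\t2\nc\t3\nd\t4\ne\t5\n");
        List<List<String>> sampled = sample(loader, 1);
        System.out.println(sampled + " parsed=" + loader.parsed);
    }
}
```

With a gap of 1 over five records, the sampler keeps every other record and
parses only those three. The skipped records are still read from the input,
but the cost of building tuples for them is avoided, which is the saving
proposed above.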

-Thejas

