hadoop-common-user mailing list archives

From drd_ <dharmen...@smartzip.com>
Subject PIG bin/labeling relation
Date Fri, 20 Nov 2009 18:18:00 GMT

I am using PIG and this is what I am trying to do:

1) Sort a relation A into B by a field x, with the smallest value of x first.
Just use ORDER BY.

2) Label each tuple in B with a number denoting its order in the sorted
relation. So the first tuple would be labeled with a 1, the second tuple
with a 2, the third with a 3 and so on. Not certain how to do this.

3) Derive a relation C where each row is a bag of tuples. The first row
contains the first n1 tuples from relation B, the second row contains the
tuples from B labeled (n1 + 1) to n2, the third row contains the tuples
from B labeled (n2 + 1) to n3, and so on up to n100. This step is simple (just
use FILTER) once we've labeled each tuple in B with a number.

The question: how do I do step 2)?
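
To make the desired result concrete, here is a minimal Python sketch (not Pig) of the three steps on a small in-memory list. The sample data, the function names label_sorted and bin_by_label, and the bin_edges values standing in for n1, n2, n3 are all made up for illustration; it only shows the output I am after, not how Pig should compute it.

```python
def label_sorted(rows, key):
    """Steps 1 and 2: sort ascending by key, then attach a 1-based position label."""
    ordered = sorted(rows, key=key)
    return [(label, row) for label, row in enumerate(ordered, start=1)]


def bin_by_label(labeled, bin_edges):
    """Step 3: bin k holds the tuples labeled (n_{k-1} + 1) .. n_k, with n_0 = 0."""
    bins = []
    lower = 0
    for upper in bin_edges:
        bins.append([row for label, row in labeled if lower < label <= upper])
        lower = upper
    return bins


# Hypothetical relation A with fields (name, x).
A = [("d", 40), ("a", 10), ("c", 30), ("e", 50), ("b", 20)]

B = label_sorted(A, key=lambda t: t[1])   # labeled, sorted by x
C = bin_by_label(B, bin_edges=[2, 3, 5])  # n1 = 2, n2 = 3, n3 = 5
```

With these made-up edges, the first row of C holds the tuples labeled 1 to 2, the second row the tuple labeled 3, and the third row the tuples labeled 4 to 5.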

