apex-dev mailing list archives

From Shubham Pathak <shub...@datatorrent.com>
Subject Writing Custom Partitioner
Date Mon, 08 Feb 2016 10:12:45 GMT

I need some suggestions / pointers related to defining a custom partitioner.

The operators in my application process a custom tuple class (let's call it
TUPLE). This data type has a single field, an ArrayList, so each tuple
represents a list of values.

For a typical word count problem, my DAG would be:

WordGenerator -> <STRING> -> Tokenizer -> <TUPLE> -> Counter ->
 <TUPLE> ->

If I use TUPLE, the Tokenizer will emit a TUPLE whose ArrayList holds
<word, count>.

Now I want to partition Counter so that each instance receives all tuples
containing the same word.

I know that by default the hashCode() method of the custom tuple class is
used, but in my case the custom tuple class wraps an ArrayList, and I want the
hash to be computed on just the first field of that ArrayList. In the
general case it could also be computed on several fields of the list.
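
To make the intent concrete, here is a minimal sketch in plain Java of the
key computation I have in mind. It does not use any Apex API; the class name
FieldHashSketch and the methods partitionKey and partitionFor are hypothetical
names chosen only for illustration, and the modulo step is an assumption about
how a key would be mapped to a partition.

```java
import java.util.Arrays;
import java.util.List;

class FieldHashSketch {
    // Hash only the selected positions of the tuple's list (hypothetical helper).
    static int partitionKey(List<?> fields, int... keyIndices) {
        Object[] key = new Object[keyIndices.length];
        for (int i = 0; i < keyIndices.length; i++) {
            key[i] = fields.get(keyIndices[i]);
        }
        return Arrays.hashCode(key);
    }

    // Map the key to one of numPartitions instances (assumed modulo mapping).
    static int partitionFor(List<?> fields, int numPartitions, int... keyIndices) {
        return Math.floorMod(partitionKey(fields, keyIndices), numPartitions);
    }

    public static void main(String[] args) {
        List<Object> a = Arrays.asList("apex", 1);
        List<Object> b = Arrays.asList("apex", 5);
        // Same word, different counts: both land on the same partition.
        System.out.println(partitionFor(a, 4, 0) == partitionFor(b, 4, 0)); // prints "true"
    }
}
```

Passing more than one index, for example partitionKey(fields, 0, 2), would
cover the multi-field case.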

Do we have any examples that I could refer to?

Also, can this be done at the application level by setting an attribute?

