hadoop-common-user mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: PriorityQueueWritable
Date Mon, 15 Oct 2012 18:39:22 GMT
Another advantage of using the shuffle/sort is that your sorted list can
grow beyond the size of memory.  The risk in packing this data into a
sorted ArrayWritable is that the list could grow too large to fit in
memory.
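
To illustrate the map output key idea, here is a minimal sketch, not a
tested production class. The name PriorityKey and its fields (priority,
itemId) are hypothetical. To keep it compilable without Hadoop on the
classpath it declares only Comparable; in a real job it would declare
"implements org.apache.hadoop.io.WritableComparable<PriorityKey>", which
expects the same write and readFields methods.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical map output key: sorts by priority, then by item id.
// In a Hadoop job, declare WritableComparable<PriorityKey> instead of
// Comparable<PriorityKey> (assumption: Hadoop is on the classpath there).
class PriorityKey implements Comparable<PriorityKey> {
    private int priority;
    private String itemId = "";

    // Hadoop creates key instances reflectively, so keep a no-arg constructor.
    PriorityKey() {}

    PriorityKey(int priority, String itemId) {
        this.priority = priority;
        this.itemId = itemId;
    }

    public int getPriority() { return priority; }
    public String getItemId() { return itemId; }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(priority);
        out.writeUTF(itemId);
    }

    // Deserialize in the same order, overwriting any previous state.
    public void readFields(DataInput in) throws IOException {
        priority = in.readInt();
        itemId = in.readUTF();
    }

    // Lower priority values sort first; the item id breaks ties.
    @Override
    public int compareTo(PriorityKey other) {
        int byPriority = Integer.compare(priority, other.priority);
        return byPriority != 0 ? byPriority : itemId.compareTo(other.itemId);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PriorityKey)) return false;
        PriorityKey k = (PriorityKey) o;
        return priority == k.priority && itemId.equals(k.itemId);
    }

    // The default hash partitioner uses hashCode to pick a reduce task.
    @Override
    public int hashCode() {
        return 31 * priority + itemId.hashCode();
    }
}
```

With a key like this, each reduce task sees its keys in ascending
priority order.  Reverse the comparison in compareTo if the highest
priority should come first.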


On Mon, Oct 15, 2012 at 11:37 AM, Chris Nauroth <cnauroth@hortonworks.com> wrote:

> I think it would work, but I'm wondering if it would be easier for your
> application to restructure the keys emitted from the mapper tasks so that
> you can take advantage of the sorting inherently done during the shuffle.
> For each reduce task, your reducer code will receive keys emitted from
> mappers in sorted order.  Therefore, if the keys emitted from your mapper
> contain the item's priority, then the shuffle would provide the sort order
> that you need.  This might lead you down the path of writing a custom
> WritableComparable to use as the map output key, but this is usually pretty
> trivial.
> Also, keep in mind that if you run multiple reduce tasks, then each
> reducer receives a subset of the keys emitted from the mapper.  Depending
> on your application logic, this may or may not be a problem.
> Thanks,
> --Chris
> On Mon, Oct 15, 2012 at 11:07 AM, Aseem Anand <aseem.iiith@gmail.com> wrote:
>> Hi Chris,
>> I had a few PriorityQueues at the mappers which I wished to send to some
>> reducers. Each reducer (receiving PriorityQueues from each mapper) would
>> then perform some operations on these by removing the top element, and
>> hence access the elements in sorted order (which is essential to my
>> application). I also thought of pushing them into an ArrayWritable, but was
>> wondering if there is an existing implementation of PriorityQueue.
>> Would it be advisable to insert elements into an ArrayWritable in sorted
>> order and reconstruct the merged PriorityQueues at the other end?
>> Thanks,
>> Aseem
>> On Mon, Oct 15, 2012 at 11:07 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
>>> Hello Aseem,
>>> I'm aware of nothing in Hadoop or related projects that provides a
>>> PriorityQueueWritable.  You could achieve this by taking some existing
>>> priority queue class and subclassing it or wrapping it to implement the
>>> Writable.write and Writable.readFields methods.
>>> If you could give us some additional context around what you want to
>>> solve, then we might be able to offer some other suggestions.  For example,
>>> depending on the problem, maybe you could sort values and wrap them in
>>> ArrayWritable (which already exists), which would save you the trouble of
>>> coding your own custom Writable.
>>> Thank you,
>>> --Chris
>>> On Mon, Oct 15, 2012 at 9:56 AM, Aseem Anand <aseem.iiith@gmail.com> wrote:
>>>> Hi,
>>>> Is anyone familiar with a PriorityQueueWritable that can be used to pass
>>>> data from mappers to reducers?
>>>> Regards,
>>>> Aseem
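
For reference, the wrapping approach suggested earlier in this thread
(taking an existing priority queue class and giving it write and
readFields methods) might look roughly like the sketch below.  The class
name IntPriorityQueueWritable is hypothetical, and it only handles int
elements to stay short.  It is written against java.io only; in a Hadoop
job it would declare "implements org.apache.hadoop.io.Writable".  Note
that the whole queue is held in memory on both sides.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.PriorityQueue;

// Hypothetical wrapper around java.util.PriorityQueue<Integer>.
// In a Hadoop job, add "implements org.apache.hadoop.io.Writable"
// (assumption: Hadoop is on the classpath there).
class IntPriorityQueueWritable {
    private final PriorityQueue<Integer> queue = new PriorityQueue<>();

    public void add(int value) { queue.add(value); }

    // Removes and returns the smallest element, or null if empty.
    public Integer poll() { return queue.poll(); }

    public int size() { return queue.size(); }

    // Write the element count, then the elements in the queue's internal
    // (heap) order.  They need not be sorted on the wire because
    // readFields rebuilds the heap.
    public void write(DataOutput out) throws IOException {
        out.writeInt(queue.size());
        for (int value : queue) {
            out.writeInt(value);
        }
    }

    // Clear first: Writable instances are commonly reused between records.
    public void readFields(DataInput in) throws IOException {
        queue.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            queue.add(in.readInt());
        }
    }
}
```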
