hadoop-common-user mailing list archives

From Alex Baranau <alex.barano...@gmail.com>
Subject Making input in Map iterable
Date Wed, 08 Dec 2010 21:06:02 GMT

I have data processing logic that takes an Iterable<Some> as input, i.e.
pretty much the same as the reducer's API. But I need to use this code in
Map, where each element arrives through a separate map() method call.
To solve the problem (at least for now), I'm doing the following:
* run the processing code in a thread which I start in setup(), and wait for
its completion in cleanup()
* keep a buffer which I fill with map input items (the Iterable object is fed
from this buffer for as long as it contains items)
* write to the buffer until it is full and only then switch to the thread
which does the processing.
(assumption: the processing logic always reads data from the buffer till the
end; if processing fails, then the whole job is marked as failed).
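
For illustration, the steps above could be sketched roughly as follows. This
is only a minimal sketch under stated assumptions, not the class referred to
in this message: the names IterableFeeder, start(), push() and finish() are
hypothetical, and the hand-rolled buffer is swapped for a bounded
java.util.concurrent.ArrayBlockingQueue, so the producer blocks only while
the queue is full instead of switching threads explicitly. It has no Hadoop
dependency; items are assumed to be non-null.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical helper: exposes items pushed one by one as an Iterable. */
class IterableFeeder<T> {
    // Sentinel object that marks the end of the input.
    private static final Object END = new Object();

    private final BlockingQueue<Object> buffer;
    private final Thread worker;
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    IterableFeeder(int capacity, Consumer<Iterable<T>> processor) {
        buffer = new ArrayBlockingQueue<>(capacity);
        // The processing code runs in its own thread and sees a plain Iterable.
        worker = new Thread(() -> {
            try {
                processor.accept(this::iterator);
            } catch (Throwable t) {
                failure.set(t); // remembered and rethrown on the caller's thread
            }
        });
    }

    /** Would be called from setup(). */
    void start() {
        worker.start();
    }

    /** Would be called from map() for every input item (must be non-null). */
    void push(T item) throws InterruptedException {
        put(item);
    }

    /** Would be called from cleanup(): signals end of input and waits. */
    void finish() throws InterruptedException {
        put(END);
        worker.join();
        rethrowIfFailed();
    }

    // Blocks while the buffer is full, but gives up if the worker has stopped.
    private void put(Object o) throws InterruptedException {
        while (!buffer.offer(o, 100, TimeUnit.MILLISECONDS)) {
            boolean alive = worker.isAlive();
            rethrowIfFailed();
            if (!alive) {
                throw new IllegalStateException("processor is not running");
            }
        }
    }

    private void rethrowIfFailed() {
        Throwable t = failure.get();
        if (t != null) {
            throw new RuntimeException("processing failed", t);
        }
    }

    // Single-use iterator that blocks until the next item or END arrives.
    private Iterator<T> iterator() {
        return new Iterator<T>() {
            private Object peeked; // look-ahead slot; null means "not fetched"

            @Override
            public boolean hasNext() {
                if (peeked == null) {
                    try {
                        peeked = buffer.take();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException(e);
                    }
                }
                return peeked != END;
            }

            @Override
            @SuppressWarnings("unchecked")
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T item = (T) peeked;
                peeked = null;
                return item;
            }
        };
    }
}
```

In a Mapper, start() would go into setup(), push(value) into map(), and
finish() into cleanup(). An exception thrown on the processing thread is
rethrown from push() or finish(), so it fails the task on the calling thread,
in line with the assumption above.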

I don't expect this to cause any noticeable performance degradation:
switches between threads are quite rare. The approach also looks safe to me.
Could anyone please confirm that? Or, if there's a better solution, please
let me know.

Btw, you can find a rough cut of the implementation here (small class):
It is in a working state (the unit tests pass, at least).

Thank you in advance!

Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
