hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: RecordReader and non thread safe JNI libraries
Date Mon, 02 Mar 2009 06:51:43 GMT
It's situation (2). Each map task gets its own JVM instance; this has its
own RecordReader and its own Mapper implementation. There's basically a loop
in each task JVM that says:

while (recordReader.next(k, v)) {
  myMapper.map(k, v, output, reporter);
}

If your mapper and the RR use the same library and tread on one another's
state, you're going to have undefined results.
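
As an illustration only, the loop above can be imitated without Hadoop on the classpath. The sketch below uses hypothetical stand-ins (NativeLibStub for the non-thread-safe JNI library, StubRecordReader for a reader following the next(key, value) contract); none of these names come from Hadoop or from the original code. It records which thread touches the library and in what order, to show that the reader and the mapper take turns on a single thread rather than running side by side.

```java
import java.util.ArrayList;
import java.util.List;

public class SequentialMapLoop {

    /** Hypothetical stand-in for the non-thread-safe JNI library. */
    static final class NativeLibStub {
        final List<String> calls = new ArrayList<>();

        // Log who called the library and from which thread.
        void touch(String caller) {
            calls.add(caller + "@" + Thread.currentThread().getName());
        }
    }

    /** Hypothetical reader modelled on the next(key, value) contract. */
    static final class StubRecordReader {
        private final NativeLibStub lib;
        private long pos = 0;
        private long leftover;

        StubRecordReader(NativeLibStub lib, long numRecords) {
            this.lib = lib;
            this.leftover = numRecords;
        }

        boolean next(long[] key, StringBuilder value) {
            if (leftover == 0) return false;
            lib.touch("reader");              // reader uses the library
            key[0] = pos;
            value.setLength(0);
            value.append("record-").append(pos);
            pos++;
            leftover--;
            return true;
        }
    }

    /** Runs the read-then-map loop and returns the library call log. */
    static List<String> run(int numRecords) {
        NativeLibStub lib = new NativeLibStub();
        StubRecordReader reader = new StubRecordReader(lib, numRecords);
        long[] key = new long[1];
        StringBuilder value = new StringBuilder();
        while (reader.next(key, value)) {
            // Assumed mapper body: it touches the library only after
            // next() has returned, on the same thread.
            lib.touch("mapper");
        }
        return lib.calls;
    }

    public static void main(String[] args) {
        run(3).forEach(System.out::println);
    }
}
```

Because the calls strictly alternate on one thread, there is no concurrent access in this arrangement; the remaining risk is shared library state that the reader and the mapper each overwrite between calls. This assumes the default single-threaded runner. A job configured with a multithreaded runner such as MultithreadedMapRunner would call map() from several threads and would need its own locking around the library.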

- Aaron


On Sun, Mar 1, 2009 at 8:33 PM, Saptarshi Guha <saptarshi.guha@gmail.com> wrote:

> Hello,
> I am quite confused, and my email seems to prove it. Essentially, I need
> to use this non-thread-safe library in the Mapper, Reducer and
> RecordReader. Assume I do not create any threads myself.
> Will I run into any thread-safety issues?
>
> In a given JVM, the maps will run sequentially, and so will the reduces,
> but will the maps run alongside the RecordReader?
>
> Hope this is clearer.
> Regards
>
>
> Saptarshi Guha
>
>
>
> On Sun, Mar 1, 2009 at 11:07 PM, Saptarshi Guha
> <saptarshi.guha@gmail.com> wrote:
> > Hello,
> > My RecordReader subclass reads from object X. To parse this object and
> > emit records, I need to use a C library and a JNI wrapper.
> >
> >        public boolean next(LongWritable key, BytesWritable value)
> >                throws IOException {
> >            if (leftover == 0) return false;
> >            long wi = pos + split.getStart();
> >            key.set(wi);
> >            value.readFields(X.at(wi));
> >            pos++; leftover--;
> >            return true;
> >        }
> >
> > X.at uses the JNI library to read record number wi.
> >
> > My question is: who is running this?
> > 1) For a given job, is one instance of this running on each
> > tasktracker, reading records and feeding them to the mappers on its
> > machine?
> > Or,
> > 2) As I have mapred.tasktracker.map.tasks.maximum == 7, does each JVM
> > launched have its own RecordReader feeding records to the maps
> > running in that JVM?
> >
> > If it's either (1) or (2), I guess I'm safe from threading issues.
> >
> > Please correct me if I'm totally wrong.
> > Regards
> >
> > Saptarshi Guha
> >
>
