hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Deserialization issue.
Date Sat, 28 Jul 2012 00:46:09 GMT
Ah, that may be because the core-site.xml has the io.serializations
property fully defined for Gora as well? You can do that as an
alternative fix: supply a core-site.xml across the tasktrackers that
also carries the serialization classes Gora requires. I failed to think
of that as a solution.
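
A rough illustration of why shipping io.serializations in core-site.xml
would help: a freshly constructed Hadoop Configuration loads the site-level
files (core-default.xml, core-site.xml) but not the job's job.xml. The
sketch below models those layers with plain java.util.Properties rather than
Hadoop classes. The class name SiteDefaultsSketch, its helper methods, and
the serializer name example.CustomSerialization are made up for
illustration, and the default value is simplified.

```java
import java.util.Properties;

public class SiteDefaultsSketch {

    static final String KEY = "io.serializations";
    static final String WRITABLE =
        "org.apache.hadoop.io.serializer.WritableSerialization";
    // Hypothetical name standing in for whatever serializer Gora needs.
    static final String CUSTOM = "example.CustomSerialization";

    /** Site-wide layer, standing in for core-site.xml on every node. */
    static Properties siteLayer(boolean includeCustom) {
        Properties site = new Properties();
        site.setProperty(KEY, includeCustom ? WRITABLE + "," + CUSTOM : WRITABLE);
        return site;
    }

    /** Job layer, standing in for job.xml: job settings sit on top of the site layer. */
    static Properties jobLayer(Properties site) {
        Properties job = new Properties(site);
        job.setProperty(KEY, WRITABLE + "," + CUSTOM);
        return job;
    }

    /** A freshly created config: it sees the site layer only, never the job layer. */
    static Properties freshConfig(Properties site) {
        return new Properties(site);
    }

    /** True if the given config lists the custom serializer. */
    static boolean knowsCustom(Properties conf) {
        return conf.getProperty(KEY).contains(CUSTOM);
    }

    public static void main(String[] args) {
        Properties plainSite = siteLayer(false);
        System.out.println("job config:              " + knowsCustom(jobLayer(plainSite)));
        System.out.println("fresh config:            " + knowsCustom(freshConfig(plainSite)));

        // Alternative fix: put the serializer into the site-wide file itself.
        Properties fixedSite = siteLayer(true);
        System.out.println("fresh config, fixed site: " + knowsCustom(freshConfig(fixedSite)));
    }
}
```

With the serializer present only at the job level, the fresh config does not
see it; once it is part of the site layer, every config created on that node
does, at the cost of keeping that file in sync across tasktrackers.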

On Sat, Jul 28, 2012 at 6:04 AM, Sriram Ramachandrasekaran
<sri.rams85@gmail.com> wrote:
> okay. But this issue didn't present itself when run in standalone mode. :)
>
> On 28 Jul 2012 06:02, "Harsh J" <harsh@cloudera.com> wrote:
>>
>> I find it easier to run jobs via MRUnit (http://mrunit.apache.org,
>> TDD) first, or via LocalJobRunner, for debug purposes.
>>
>> On Sat, Jul 28, 2012 at 5:53 AM, Sriram Ramachandrasekaran
>> <sri.rams85@gmail.com> wrote:
>> > hello harsh,
>> > thanks for your investigations. while we were debugging, I saw the exact
>> > thing. As you pointed out, we suspected it to be a problem. So, we set the
>> > job conf object directly on Gora's query object.
>> > It goes something like this,
>> > query.setConf..(job.getConfig..())
>> >
>> > And, then I saw that it was not getting into creating a new object at
>> > getOrCreate().
>> >
>> > OTOH, I've not tried the job.xml thing. I should give it a try and I shall
>> > keep the list posted.
>> >
>> > I would also like to hear about standard practices for debugging
>> > distributed MR tasks.
>> >
>> > -----
>> > reply from a hh device. Pl excuse typos n lack of formatting.
>> >
>> > On 28 Jul 2012 03:30, "Harsh J" <harsh@cloudera.com> wrote:
>> >>
>> >> Hi Sriram,
>> >>
>> >> I suspect the following in Gora to somehow be causing this issue:
>> >>
>> >> IOUtils source:
>> >> http://svn.apache.org/viewvc/gora/trunk/gora-core/src/main/java/org/apache/gora/util/IOUtils.java?view=markup
>> >>
>> >> QueryBase source:
>> >> http://svn.apache.org/viewvc/gora/trunk/gora-core/src/main/java/org/apache/gora/query/impl/QueryBase.java?view=markup
>> >>
>> >> Notice that IOUtils.deserialize(…) calls expect a proper Configuration
>> >> object. If not passed (i.e., if null), they call the following.
>> >>
>> >> private static Configuration getOrCreateConf(Configuration conf) {
>> >>   if(conf == null) {
>> >>     if(IOUtils.conf == null) {
>> >>       IOUtils.conf = new Configuration();
>> >>     }
>> >>   }
>> >>   return conf != null ? conf : IOUtils.conf;
>> >> }
>> >>
>> >> Now QueryBase has, in its readFields method, some
>> >> IOUtils.deserialize(…) calls that seem to pass null for the
>> >> configuration object. IOUtils.deserialize(…) hence calls the
>> >> method above and initializes a whole new Configuration object,
>> >> as the passed conf object is null.
>> >>
>> >> If it does that, it would not be loading the "job.xml" file contents,
>> >> which is the job's config file (that's something only the map task's
>> >> config loads; it is not a file that's loaded by default). Hence,
>> >> custom serializers will disappear the moment it begins using this new
>> >> Configuration object.
>> >>
>> >> This is what you'll want to investigate and fix or notify the Gora
>> >> devs about (why QueryBase#readFields uses a null object, and if it can
>> >> reuse some set conf object). As a cheap hack fix, maybe doing the
>> >> following will make it work in an MR environment?
>> >>
>> >> IOUtils.conf = new Configuration();
>> >> IOUtils.conf.addResource("job.xml");
>> >>
>> >> I haven't tried the above, but let us know how we can be of further
>> >> assistance. An ideal fix would be to only use the MapTask's provided
>> >> Configuration object everywhere, somehow, and never re-create one.
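
To make the failure mode concrete, here is a minimal sketch of the fallback
in the getOrCreateConf snippet quoted above. It does not use Hadoop or Gora:
FakeConf is a hypothetical stand-in for Configuration, canDeserialize is an
invented helper, and example.CustomSerialization is a placeholder serializer
name. Only the null-check logic is taken from the quoted code.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfFallbackSketch {

    /** Hypothetical stand-in for a Hadoop Configuration: starts with defaults only. */
    static final class FakeConf {
        private final Map<String, String> props = new HashMap<>();

        FakeConf() {
            // Simplified default; a fresh config knows nothing about job.xml.
            props.put("io.serializations",
                      "org.apache.hadoop.io.serializer.WritableSerialization");
        }

        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    // Mirrors the static cache used by the quoted snippet.
    private static FakeConf cached;

    /** Same null-handling as the quoted getOrCreateConf, written against the stand-in. */
    static FakeConf getOrCreateConf(FakeConf conf) {
        if (conf != null) {
            return conf;
        }
        if (cached == null) {
            cached = new FakeConf();
        }
        return cached;
    }

    /** True if the config that would actually be used lists the given serializer. */
    static boolean canDeserialize(FakeConf passed, String serializerClass) {
        String configured = getOrCreateConf(passed).get("io.serializations");
        return configured.contains(serializerClass);
    }

    public static void main(String[] args) {
        String custom = "example.CustomSerialization"; // hypothetical class name

        // The task's own config, with the job-level serializer registered.
        FakeConf taskConf = new FakeConf();
        taskConf.set("io.serializations",
                     taskConf.get("io.serializations") + "," + custom);

        System.out.println("task conf passed: " + canDeserialize(taskConf, custom));
        System.out.println("null conf passed: " + canDeserialize(null, custom));
    }
}
```

Passing the task's config keeps the custom serializer visible; passing null
silently swaps in a defaults-only config, which is the situation described
for QueryBase#readFields.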
>> >>
>> >> P.S. If you want a thread reference link to share with the Gora devs,
>> >> here it is: http://search-hadoop.com/m/BXZA4dTUFC
>> >>
>> >> On Fri, Jul 27, 2012 at 1:24 PM, Sriram Ramachandrasekaran
>> >> <sri.rams85@gmail.com> wrote:
>> >> > Hello,
>> >> > I have an MR job that talks to HBase. I use Gora to talk to HBase. Gora
>> >> > also provides a couple of classes which can be extended to write Mappers
>> >> > and Reducers, if the Mappers need input from an HBase store and the
>> >> > Reducers need to write out to an HBase store. This is the reason why I
>> >> > use Gora.
>> >> >
>> >> > Now, when I run my MR job, I get an exception as below.
>> >> > (https://issues.apache.org/jira/browse/HADOOP-3093)
>> >> > java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
>> >> >     at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
>> >> >     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>> >> >     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>> >> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
>> >> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>> >> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>> >> >     at java.security.AccessController.doPrivileged(Native Method)
>> >> >     at javax.security.auth.Subject.doAs(Subject.java:415)
>> >> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>> >> >     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> >> > Caused by: java.io.IOException: java.lang.NullPointerException
>> >> >     at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
>> >> >     at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
>> >> >     at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
>> >> >     ... 9 more
>> >> > Caused by: java.lang.NullPointerException
>> >> >     at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:77)
>> >> >     at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:205)
>> >> >     at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:234)
>> >> >     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>> >> >     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>> >> >     at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
>> >> >     at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
>> >> >     at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
>> >> >     ... 11 more
>> >> >
>> >> > I tried the following things to work through this issue.
>> >> > 0. The stack trace indicates that, when setting up a new Mapper, it is
>> >> > unable to deserialize something. (I could not work out where it fails.)
>> >> > 1. I looked around the forums and realized that serialization options are
>> >> > not getting passed, so I tried setting the io.serializations config on
>> >> > the job.
>> >> >    1.1. I am not setting "io.serializations" myself; I use
>> >> > GoraMapReduceUtils.setIOSerializations() to do it. I verified that the
>> >> > confs are getting the proper serializers.
>> >> > 2. I checked the job xml to see whether these confs had got through;
>> >> > they had. But it failed again.
>> >> > 3. I tried starting the hadoop job runner with debug options turned on
>> >> > and in suspend mode, -XDebug suspend=y, and I also set the VM options for
>> >> > mapred child tasks via mapred.child.java.opts, to see if I could debug
>> >> > the newly spawned VM. Although I get a message on my stdout saying it is
>> >> > opening port X and waiting, when I try to attach a remote debugger on
>> >> > that port, it does not work.
>> >> >
>> >> > I understand that when SerializationFactory tries to deserialize
>> >> > 'something', it does not find an appropriate unmarshaller and so it
>> >> > fails. But I would like to know a way to find that 'something', and I
>> >> > would like to get some idea of how (pseudo-)distributed MR jobs should
>> >> > generally be debugged. I tried searching but did not find anything useful.
>> >> >
>> >> > Any help/pointers would be greatly useful.
>> >> >
>> >> > Thanks!
>> >> >
>> >> > --
>> >> > It's just about how deep your longing is!
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>>
>>
>>
>> --
>> Harsh J



-- 
Harsh J
