hive-dev mailing list archives

From "Mohammad Kamrul Islam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-1511) Hive plan serialization is slow
Date Wed, 28 Aug 2013 09:30:53 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752239#comment-13752239 ]

Mohammad Kamrul Islam commented on HIVE-1511:
---------------------------------------------

Thanks to [~ashutoshc] and [~brocknoland] for moving this so far along!

I think I have isolated the issue to some extent. It looks like a bug in Kryo.

First, I created an XML plan file for the failing case using our existing Java-based serialization.

Then I wrote (copied from Ashutosh) a standalone Java class that deserializes the plan XML
into a MapredWork object using XMLDecoder. The code then serializes the MapredWork object
with Kryo and finally deserializes it with Kryo. In this case, serialization with Kryo succeeds,
but deserialization with Kryo fails with the exception below. It is important to note
that a simpler plan XML succeeds with the same utility.
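
For readers without the attachments, the XML half of this round trip can be sketched with the JDK alone. This is only an illustration under assumptions: `XmlPlanRoundTrip` and `ToyPlan` are hypothetical names, `ToyPlan` is a small stand-in bean and not Hive's MapredWork, and the Kryo steps are left out because they need the Kryo and Hive jars on the classpath. It is not the code in KryoHiveTest.java.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class XmlPlanRoundTrip {

    // Hypothetical stand-in for a plan object; NOT Hive's MapredWork.
    // XMLEncoder/XMLDecoder need a public class with a public no-arg
    // constructor and getter/setter pairs for each property.
    public static class ToyPlan {
        private String name;
        private List<String> aliases = new ArrayList<>();

        public ToyPlan() {}

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        public List<String> getAliases() { return aliases; }
        public void setAliases(List<String> aliases) { this.aliases = aliases; }
    }

    // Serialize a bean to XML using the JDK's bean persistence.
    public static byte[] toXml(Object bean) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(bytes)) {
            encoder.writeObject(bean);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object graph from XML, as the test utility does for the plan file.
    public static Object fromXml(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        ToyPlan plan = new ToyPlan();
        plan.setName("toy-plan");
        plan.getAliases().add("src");

        ToyPlan copy = (ToyPlan) fromXml(toXml(plan));
        System.out.println(copy.getName() + " " + copy.getAliases());
    }
}
```

In the real test, the object returned by the decoder would next be written and read back with Kryo, and it is that last read that fails.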

   
I'm going to attach three files:
1. Independent Java code to test <KryoHiveTest.java>.
2. Script to compile and run <run.sh>. (Run with "run.sh generated_plan.xml")
3. Generated plan in XML <generated_plan.xml> that fails.

 [~romixlev]: do you have any suggestions? I think you are also active in Kryo. Should I send
an email to the Kryo list?




Exception:
{quote}
Exception in thread "main" com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 12416, Size: 1504
Serialization trace:
rslvMap (org.apache.hadoop.hive.ql.parse.RowResolver)
rr (org.apache.hadoop.hive.ql.parse.OpParseContext)
opParseCtxMap (org.apache.hadoop.hive.ql.plan.MapWork)
mapWork (org.apache.hadoop.hive.ql.plan.MapredWork)
	at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:485)
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
	at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:485)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:760)
	at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
	at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
	at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:485)
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
	at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:485)
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:657)
	at KryoHiveTest.fun(KryoHiveTest.java:51)
	at KryoHiveTest.main(KryoHiveTest.java:25)
Caused by: java.lang.IndexOutOfBoundsException: Index: 12416, Size: 1504
	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
	at java.util.ArrayList.get(ArrayList.java:322)
	at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
	at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:804)
	at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:728)
	at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:127)
	at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
	at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
	... 16 more
{quote}
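
The "Caused by" frames show a previously read object being looked up by a numeric ID in an ArrayList. The toy sketch below is not Kryo's code, and `ToyReferenceTable` is a hypothetical name; it only illustrates, under that assumption, how a reader that keeps back-references in a list fails in this way when the ID it decodes from the stream is larger than the number of objects it has registered so far. It says nothing about why the ID is wrong in this particular plan.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyReferenceTable {
    // Objects registered so far, indexed by the order in which they were read.
    private final List<Object> readObjects = new ArrayList<>();

    /** Registers a freshly read object and returns the ID assigned to it. */
    public int register(Object obj) {
        readObjects.add(obj);
        return readObjects.size() - 1;
    }

    /** Resolves a back-reference ID decoded from the stream. */
    public Object resolve(int id) {
        // ArrayList.get throws IndexOutOfBoundsException when id >= size.
        return readObjects.get(id);
    }

    public static void main(String[] args) {
        ToyReferenceTable table = new ToyReferenceTable();
        for (int i = 0; i < 1504; i++) {
            table.register("object-" + i);
        }

        // A valid back-reference resolves to an object seen earlier.
        System.out.println(table.resolve(10));

        // An ID beyond the table size, like the one in the trace, fails.
        try {
            table.resolve(12416);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("bad reference: " + e.getMessage());
        }
    }
}
```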
                
> Hive plan serialization is slow
> -------------------------------
>
>                 Key: HIVE-1511
>                 URL: https://issues.apache.org/jira/browse/HIVE-1511
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>            Assignee: Mohammad Kamrul Islam
>         Attachments: HIVE-1511.4.patch, HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511-wip.patch
>
>
> As reported by Edward Capriolo:
> For reference I did this as a test case....
> SELECT * FROM src where
> key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
> OR key=0 OR key=0 OR key=0 OR
> key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
> OR key=0 OR key=0 OR key=0 OR
> ...(100 more of these)
> No OOM but I gave up after the test case did not go anywhere for about
> 2 minutes.

