Date: Wed, 28 Jan 2015 19:56:35 +0000 (UTC)
From: "Lewis John McGibbney (JIRA)"
To: dev@gora.apache.org
Subject: [jira] [Commented] (GORA-392) Move PersistentSerialization to the top of serializations list

[ https://issues.apache.org/jira/browse/GORA-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295729#comment-14295729 ]

Lewis John McGibbney commented on GORA-392:
-------------------------------------------

[~sweiss], do you require help to submit a patch? I didn't see this issue until now.
> Move PersistentSerialization to the top of serializations list
> --------------------------------------------------------------
>
>                 Key: GORA-392
>                 URL: https://issues.apache.org/jira/browse/GORA-392
>             Project: Apache Gora
>          Issue Type: Improvement
>          Components: gora-core
>    Affects Versions: 0.5
>            Reporter: Sergey Weiss
>
> In the process of making Nutch2 run on Hadoop 2.3.0 + HBase 0.98.1, we encountered java.io.EOFExceptions like the ones described in this mail thread: http://www.mail-archive.com/user%40nutch.apache.org/msg12644.html
> We applied the patch mentioned there and got our setup running, but it was very unstable: it failed with an ArrayIndexOutOfBoundsException whenever we tried to generate a batch of 50 or more pages to fetch.
> We investigated the problem and discovered that in the working setup of Nutch2 + Hadoop 1.2.0 + HBase 0.94.14, PersistentDeserializer is used for deserialization during the reduce phase, not AvroSerialization.AvroDeserializer. The reason for this swap of deserializers lies in the GoraMapReduceUtils#setIOSerializations method. It uses StringUtils.joinStringArrays, which de-duplicates entries through a HashSet under the hood, so the order of the resulting list is not guaranteed. Hadoop 2.3.0 adds two more serializations to the io.serializations property compared to Hadoop 1.2.0, and this results in AvroSpecificSerialization being placed at the top of the serializations list. Since Hadoop uses the first serialization in that list that accepts a given class, AvroSpecificSerialization then takes over from PersistentSerialization.
> After we patched GoraMapReduceUtils#setIOSerializations to explicitly put PersistentSerialization at the top of the list, the instability disappeared. Moreover, we no longer need to patch Avro: just one simple change in Gora and everything works like a charm!
> So we propose to move PersistentSerialization to the top of the serializations list.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
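A minimal, self-contained Java sketch of the ordering problem described in the report, under a simplified model: serialization classes are handled as plain strings, and joinWithHashSet, joinWithPreferredFirst and firstAccepting are hypothetical helpers, not the actual Gora or Hadoop code. firstAccepting models Hadoop choosing the first entry in io.serializations that accepts a class, and the set of "accepting" serializations is an assumption for illustration. The sketch shows that HashSet-based de-duplication may reorder the list, while a LinkedHashSet seeded with PersistentSerialization keeps it first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SerializationOrderSketch {

    // Illustrative class names only; the real io.serializations defaults depend on the Hadoop version.
    static final String PERSISTENT = "org.apache.gora.mapreduce.PersistentSerialization";
    static final String AVRO_SPECIFIC = "org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization";
    static final String AVRO_REFLECT = "org.apache.hadoop.io.serializer.avro.AvroReflectSerialization";
    static final String WRITABLE = "org.apache.hadoop.io.serializer.WritableSerialization";

    // Hypothetical helper mimicking the described behaviour: de-duplicating
    // through a HashSet, whose iteration order is unspecified, so input order is lost.
    static List<String> joinWithHashSet(List<String> existing, List<String> added) {
        Set<String> set = new HashSet<>(existing);
        set.addAll(added);
        return new ArrayList<>(set);
    }

    // Hypothetical order-preserving variant: the preferred serialization goes first,
    // then the remaining entries follow in their original order without duplicates.
    static List<String> joinWithPreferredFirst(String preferred, List<String> existing, List<String> added) {
        Set<String> ordered = new LinkedHashSet<>();
        ordered.add(preferred);
        ordered.addAll(existing);
        ordered.addAll(added);
        return new ArrayList<>(ordered);
    }

    // Models the "first serialization that accepts the class wins" lookup.
    static String firstAccepting(List<String> serializations, Set<String> accepting) {
        for (String s : serializations) {
            if (accepting.contains(s)) {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> hadoopDefaults = Arrays.asList(WRITABLE, AVRO_SPECIFIC, AVRO_REFLECT);
        List<String> goraAdditions = Collections.singletonList(PERSISTENT);
        // Assumption: both serializations accept a Gora persistent class (an Avro specific record).
        Set<String> acceptPersistent = new HashSet<>(Arrays.asList(PERSISTENT, AVRO_SPECIFIC));

        List<String> unordered = joinWithHashSet(hadoopDefaults, goraAdditions);
        List<String> ordered = joinWithPreferredFirst(PERSISTENT, hadoopDefaults, goraAdditions);

        System.out.println("HashSet join:   " + unordered + " -> " + firstAccepting(unordered, acceptPersistent));
        System.out.println("Preferred join: " + ordered + " -> " + firstAccepting(ordered, acceptPersistent));
    }
}
```

With the preferred-first join, the lookup resolves to PersistentSerialization no matter which other entries Hadoop contributes; with the HashSet join, the winner depends on hash iteration order for the particular set contents, which can change silently when a Hadoop upgrade adds entries.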