Subject: Re: Issue managing SequenceFiles - Corrupted files ?
From: aurelien violette
To: user@hadoop.apache.org
Date: Tue, 12 Apr 2016 19:10:14 +0200
In-Reply-To: <570D2AF1.7060108@webgroup-limited.com>

Hi all,

I've been struggling with this for a while. I'm pretty sure there is something I'm missing to make this work correctly.

My flow is the following:
1 - I use an MR job to dump an Elasticsearch index to HDFS as a SequenceFile. The SequenceFile is <Text, MapWritable>.
2 - I use another job to process the data later (a minimal sketch of such a job follows the stack trace). Pretty much any job on any dump throws this exception:

Error: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at org.apache.hadoop.io.Text.readWithKnownLength(Text.java:319)
        at org.apache.hadoop.io.Text.readFields(Text.java:291)
        at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:96)
        at org.elasticsearch.hadoop.mr.WritableArrayWritable.readFields(WritableArrayWritable.java:52)
        at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:188)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
        at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2247)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2220)
        at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:78)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
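
For reference, the reading job (step 2) is essentially a stock map-only pass over the SequenceFile. A minimal sketch of how it is set up, with hypothetical class and path names (the real jobs do more in map(), but they fail the same way):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ProcessDump {

  // The EOFException is thrown while the framework deserializes the MapWritable value,
  // i.e. before map() is even called, so the mapper body hardly matters.
  public static class DumpMapper extends Mapper<Text, MapWritable, Text, NullWritable> {
    @Override
    protected void map(Text key, MapWritable value, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(key, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    // The es-hadoop jar has to be on the job classpath: the MapWritable values contain
    // org.elasticsearch.hadoop.mr.WritableArrayWritable entries (see the stack trace above).
    Job job = Job.getInstance(new Configuration(), "process-dump");
    job.setJarByClass(ProcessDump.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    SequenceFileInputFormat.addInputPath(job, new Path("/dumps/my-index")); // hypothetical dump path
    job.setMapperClass(DumpMapper.class);
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/dump-check"));       // hypothetical output path
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}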

I can't believe that my disks are corrupted. So my guess is that either I have an issue writing the files or reading them.
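
To narrow that down, the check I can think of is reading one dump file back directly with SequenceFile.Reader, outside MapReduce. A minimal sketch (the part-file path is passed as an argument, and the es-hadoop jar is assumed to be on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class DumpReadCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // e.g. one part file of the dump, such as /dumps/my-index/part-m-00000 (hypothetical path)
    Path file = new Path(args[0]);

    try (SequenceFile.Reader reader =
             new SequenceFile.Reader(conf, SequenceFile.Reader.file(file))) {
      System.out.println("key class   = " + reader.getKeyClassName());
      System.out.println("value class = " + reader.getValueClassName());

      Text key = new Text();
      MapWritable value = new MapWritable();
      long records = 0;
      while (reader.next(key, value)) {  // should hit the same EOFException if the file is bad
        records++;
      }
      System.out.println("read " + records + " records without error");
    }
  }
}

If this standalone read throws the same EOFException, the files are already bad on HDFS and the problem is on the write side; if it reads cleanly, the problem is in how the second job reads them.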

Any idea how to investigate the issue? I'm using Hadoop 2.7.2.

Thank you

--
BR,
Aurelien Violette