hadoop-common-user mailing list archives

From Adam Silberstein <a...@trifacta.com>
Subject webhdfs read error after successful pig job
Date Fri, 14 Jun 2013 16:43:05 GMT
I'm having some trouble with webhdfs read after running a Pig job that completed successfully.

Here are some details:

-I am using Hadoop CDH-4.1.3 and the compatible Pig release that goes with it (0.10.0, I think)

-The Pig job writes out about 10 files.  I'm programmatically attempting to read each of them
with WebHDFS soon after Pig notifies me the job is complete.  The reads often all succeed.
Even in the failure case, most of the reads still succeed, but one may fail.

-I wondered if I was facing a race condition where Pig was reporting success before the file
was truly ready to read.  However, when I run the WebHDFS read with curl even hours later, the
request still hangs.  In contrast, I can run 'cat' from the DFS command line and the file is
output fine.
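For reference, the read I'm issuing is the standard WebHDFS OPEN call.  A minimal sketch of it
(the namenode host, port, and file path below are made up for illustration, not my actual cluster):

```shell
# Build the WebHDFS OPEN URL (namenode host/port and file path are hypothetical)
NN_HOST=namenode.example.com
NN_PORT=50070
FILE=/user/ubuntu/output/part-r-00000
URL="http://${NN_HOST}:${NN_PORT}/webhdfs/v1${FILE}?op=OPEN&user.name=ubuntu"
echo "$URL"
# The actual read needs -L, since the namenode answers OPEN with a
# 307 redirect to the datanode that holds the block:
#   curl -i -L "$URL"
```

The hang happens on that curl request; it never returns the file contents.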

-I ran fsck over the problem file and it reports back totally normal.
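Concretely, these are the checks that do succeed, roughly as I run them (the output path is
hypothetical, and the guard is just so the snippet is safe to paste on a box without Hadoop):

```shell
FILE=/user/ubuntu/output/part-r-00000   # hypothetical Pig output file
if command -v hdfs >/dev/null 2>&1; then
  # fsck reports the file and its block as healthy
  hdfs fsck "$FILE" -files -blocks -locations
  # and a plain DFS cat streams the contents without trouble
  hadoop fs -cat "$FILE" | head
else
  echo "hdfs not on PATH; commands shown for reference only"
fi
```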

-I looked at the namenode to see why my curl request hangs.  I get this error:
ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:ubuntu
(auth:SIMPLE) cause:java.io.IOException: Could not reach the block containing the data. Please
try again
(I'm guessing the permissions aren't really the important thing here; the underlying cause,
not being able to reach the block, seems more relevant.)

-I have a 4 node cluster with replication set to 1.

If anyone has seen this, has diagnostic tips, or best of all, a solution, please let me know!
