hadoop-general mailing list archives

From Jason Venner <jason.had...@gmail.com>
Subject Re: java.io.IOException: Cannot open filename /user/root/�s�t�e�p�1�/�p�a�r�t�-�0�0�0�0�0
Date Sun, 21 Feb 2010 02:53:32 GMT
Something in your code path is not doing character set conversion
correctly.
If the text came through the email intact, each ASCII letter is
prefixed by the 3 bytes 0xEF, 0xBF, 0xBD, which is the UTF-8 encoding
of U+FFFD, the replacement character that UTF-8 decoders substitute
for an illegal UTF-8 byte sequence. See
http://en.wikipedia.org/wiki/Unicode_Specials
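
A minimal sketch of the point above, using only the JDK. The class and
method names (ReplacementCharDemo, toHex, decodeUtf8) are made up for
this illustration. It shows that U+FFFD encodes to EF BF BD in UTF-8,
and that Java's String constructor substitutes U+FFFD for a byte that
is not legal UTF-8 (0xFE is chosen arbitrarily as such a byte).

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // Hypothetical helper: render bytes as space-separated upper-case hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    // Decode bytes as UTF-8; malformed input is replaced with U+FFFD.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The replacement character itself, encoded as UTF-8.
        System.out.println(toHex("\uFFFD".getBytes(StandardCharsets.UTF_8))); // EF BF BD

        // 0xFE can never appear in well-formed UTF-8, so the decoder
        // replaces it and passes the following ASCII byte through.
        byte[] bad = { (byte) 0xFE, (byte) 's' };
        String decoded = decodeUtf8(bad);
        for (char c : decoded.toCharArray()) {
            System.out.println(String.format("U+%04X", (int) c));
        }
    }
}
```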

It looks like the file data is written in a double-byte format,
perhaps UTF-16, and the reader does not recognize the character
encoding. The first byte of each double-byte pair is being replaced
by the replacement character U+FFFD, while the second byte, the
actual ASCII character in your path, is passed through by the
stream reader.
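
A sketch of how such a mismatch can arise and one way to avoid it. It
assumes, without having seen your code, that the path list was written
with something like DataOutputStream.writeChars, which emits two bytes
per character. The class and method names are hypothetical, and
in-memory streams stand in for the HDFS streams you would get from
FileSystem.create and FileSystem.open. Note that in plain Java the
extra byte for an ASCII character is 0x00, which a UTF-8 reader decodes
as U+0000; whether it is later shown as U+FFFD or "?" depends on
whatever handles the string afterwards.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class PathListEncodingDemo {
    // Assumed problem case: each char is written as two bytes, high byte first.
    static byte[] writeAsChars(String path) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeChars(path);
        }
        return buf.toByteArray();
    }

    // Suggested alternative: write one path per line, explicitly as UTF-8.
    static byte[] writeAsUtf8(String path) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(buf, StandardCharsets.UTF_8)) {
            out.write(path);
            out.write('\n');
        }
        return buf.toByteArray();
    }

    // Read the first line back, explicitly decoding as UTF-8.
    static String readLineUtf8(byte[] data) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String path = "step1/part-00000";

        // Two bytes per character: the reader sees a stray char before each letter.
        String mangled = readLineUtf8(writeAsChars(path));
        System.out.println(path.length() + " chars written, " + mangled.length() + " chars read");

        // Same charset on both sides: the path round-trips unchanged.
        String clean = readLineUtf8(writeAsUtf8(path));
        System.out.println(clean.equals(path));
    }
}
```

The design choice here is simply to name the charset on both the
writing and the reading side rather than relying on defaults or on
binary write methods for text.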

On Thu, Feb 4, 2010 at 12:25 AM, Harshit Kumar <hkumar.arora@gmail.com> wrote:
>
> Hi
>
> I don't understand the reason for this error.
>
> java.io.IOException: Cannot open filename /user/root/�s�t�e�p�1�/�p�a�r�t�-�0�0�0�0�0
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1394)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1385)
>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:338)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:171)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
>     at org.bike.MakeNPairReduce.reduce(MakeNPairReduce.java:40)
>     at org.bike.MakeNPairReduce.reduce(MakeNPairReduce.java:1)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
>     at org.apache.hadoop.mapred.Child.main(Child.java:158)
>
>
> I have code that scans a folder step0 to find the names of the files
> generated in the previous map-reduce phase. It then creates another file
> with one entry for each file found. For example, if the scan finds that
> 2 files were produced by the 1st map-reduce phase, the newly created file
> will have 2 entries, step1/part00000 and step1/part00001.
>
> Now, when I read this file in another map-reduce job, each line is read as
> /user/root/�s�t�e�p�1�/�p�a�r�t�-�0�0�0�0�0 . It seems that when a string
> inserted by my code is read by FSDataInputStream, each character of
> the string is prefixed by a question mark (?). Why is that so?
>
> The file part-00000 does exist inside folder step1, but reading this
> filename, /user/root/�s�t�e�p�1�/�p�a�r�t�-�0�0�0�0�0, throws an
> IOException. I can understand that there is no such filename, but why
> are these ?'s inserted before each letter?
>
> I would really appreciate it if someone could help me solve this riddle.
>
> Thanks and Regards
> H. Kumar
> skype: harshit900
> Blog: http://harshitkumar.wordpress.com
> Website: http:/kumarharmuscat.tripod.com



--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
