hadoop-general mailing list archives

From Dhruba Borthakur <dhr...@gmail.com>
Subject Re: java.io.IOException: Cannot open filename /user/root/�s�t�e�p�1�/�p�a�r�t�-�0�0�0�0�0
Date Mon, 22 Feb 2010 02:43:16 GMT
Is this, by any chance, related to

http://issues.apache.org/jira/browse/HDFS-983?

You could also check whether you can still reproduce your problem with the
patch from http://issues.apache.org/jira/browse/HADOOP-6522 applied.

thanks,
dhruba


On Sat, Feb 20, 2010 at 6:53 PM, Jason Venner <jason.hadoop@gmail.com> wrote:

> Something is not doing character-set conversion correctly somewhere in
> your code path.
> If the text was passed through email correctly, each ASCII letter is
> prefixed by the 3 bytes
> 0xEF, 0xBF, 0xBD, which is the UTF-8 encoding of U+FFFD, the replacement
> character that UTF-8 decoders emit in place of an illegal UTF-8 byte
> sequence. See http://en.wikipedia.org/wiki/Unicode_Specials
>
> It looks like the file data was written in a double-byte format,
> perhaps UTF-16, and the reader is not recognizing that encoding:
> the first byte of each two-byte pair is being replaced by the
> replacement character U+FFFD, while the second byte, the actual
> ASCII character in your path, is passed through unchanged by the
> stream reader.
>
> On Thu, Feb 4, 2010 at 12:25 AM, Harshit Kumar <hkumar.arora@gmail.com>
> wrote:
> >
> > Hi
> >
> > I don't understand the reason for this error:
> >
> > java.io.IOException: Cannot open filename
> > /user/root/�s�t�e�p�1�/�p�a�r�t�-�0�0�0�0�0
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1394)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1385)
> >     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:338)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:171)
> >     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
> >     at org.bike.MakeNPairReduce.reduce(MakeNPairReduce.java:40)
> >     at org.bike.MakeNPairReduce.reduce(MakeNPairReduce.java:1)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:158)
> >
> >
> > I have code that scans a folder step0 to find the names of the files
> > generated by the previous map-reduce phase, and then creates another
> > file with one entry for each file. For example, if the scan finds that
> > the 1st map-reduce phase produced 2 files, the newly created file will
> > have the 2 entries step1/part-00000 and step1/part-00001.
> >
> > Now, when I read this file in another map-reduce job, each line is read
> > as /user/root/�s�t�e�p�1�/�p�a�r�t�-�0�0�0�0�0. It looks as if the
> > string inserted by my code, when read back through FSDataInputStream,
> > has each of its characters prefixed by a question mark (?). Why is
> > that so?
> >
> > The file name part-00000 does exist inside the folder step1, but
> > reading it under the name
> > /user/root/�s�t�e�p�1�/�p�a�r�t�-�0�0�0�0�0 throws an IOException.
> > I can understand that there is no file with that name, but why are
> > these ?'s infiltrated before each letter?
> >
> > I would really appreciate it if someone could help me solve this riddle.
> >
> > Thanks and Regards
> > H. Kumar
> > skype: harshit900
> > Blog: http://harshitkumar.wordpress.com
> > Website: http://kumarharmuscat.tripod.com
>
>
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
>
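Whatever ends up mangling the path, the usual guard against this class of bug is to pin the charset explicitly on both the writing and the reading side instead of relying on `writeChars` or the platform default. The sketch below uses local file streams as stand-ins for the HDFS ones (an assumption, since the original writing code isn't shown); `FSDataOutputStream` from `fs.create(...)` and `FSDataInputStream` from `fs.open(...)` are ordinary `java.io` streams and can be wrapped the same way.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class PathListRoundTrip {
    public static void main(String[] args) throws IOException {
        File listFile = File.createTempFile("step1-paths", ".txt");
        listFile.deleteOnExit();

        // Write one path per line, pinning the encoding to UTF-8.
        // On HDFS this would wrap fs.create(path) instead of FileOutputStream.
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream(listFile), StandardCharsets.UTF_8)) {
            w.write("step1/part-00000\n");
            w.write("step1/part-00001\n");
        }

        // Read it back with the same explicit charset.
        // On HDFS this would wrap fs.open(path), an FSDataInputStream.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new FileInputStream(listFile), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line); // paths come back unmangled
            }
        }
    }
}
```

With the charset fixed at both ends, the round trip is byte-for-byte stable and no replacement characters can appear.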



-- 
Connect to me at http://www.facebook.com/dhruba
