hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2071) StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
Date Thu, 18 Oct 2007 18:43:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536008 ]

Raghu Angadi commented on HADOOP-2071:

bq. the readlimit argument for mark is not honored in these changes. If one calls reset after
more than readlimit bytes have been read after mark, that reset is supposed to throw IOException.

We can keep track of how many bytes we have read since mark() and throw an IOException from
reset() if the count exceeds readlimit, if we want to keep that behavior. Actually, we could
just throw an exception if no record is found within readlimit bytes (instead of reading until
there is a match or EOF).
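A minimal sketch of the first suggestion (illustrative only, not Hadoop code): a FilterInputStream that counts bytes read since mark() and fails reset() once readlimit is exceeded. The class name `ReadLimitStream` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: enforce mark's readlimit by counting bytes read
// since mark() and throwing IOException from reset() past the limit.
public class ReadLimitStream extends FilterInputStream {
    private long sinceMark = -1;  // -1 means no mark is set
    private int readLimit;

    public ReadLimitStream(InputStream in) {
        super(in);
    }

    @Override
    public synchronized void mark(int readLimit) {
        super.mark(readLimit);
        this.readLimit = readLimit;
        this.sinceMark = 0;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0 && sinceMark >= 0) sinceMark++;
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0 && sinceMark >= 0) sinceMark += n;
        return n;
    }

    @Override
    public synchronized void reset() throws IOException {
        if (sinceMark > readLimit) {
            throw new IOException("Mark invalid: read " + sinceMark
                    + " bytes, readlimit was " + readLimit);
        }
        super.reset();
        sinceMark = 0;  // position is back at the mark
    }
}
```

A real version would also have to account for skip() and any other code path that consumes bytes from the underlying stream.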

Lohit and I looked around the code, and it seems to seek back pretty heavily (pretty much for
every record). Seeking back is quite inefficient in DFS: it throws away the current buffers
(both application and TCP) and in most cases starts a new connection. The current patch does
not make this situation any worse. I wonder what the typical size of these records is..

One problem with using BufferedInputStream is that the current code uses getPos() and seek()
in many places, which are specific to FSDataInputStream. So it will need more changes to manage
the position ourselves.
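One way around the getPos() problem could look like the following sketch (illustrative, not Hadoop code): a BufferedInputStream subclass that tracks its own logical position in the underlying stream. The class name `PositionTrackingBufferedStream` and its `getPos()` method are assumptions for illustration, not the Hadoop API.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: a buffered wrapper that tracks the logical
// position itself, so code that relies on getPos() could still work
// once buffering is introduced.
public class PositionTrackingBufferedStream extends BufferedInputStream {
    private long pos = 0;       // logical position in the underlying stream
    private long markPos = -1;  // position recorded at mark()

    public PositionTrackingBufferedStream(InputStream in, int bufSize) {
        super(in, bufSize);
    }

    @Override
    public synchronized int read() throws IOException {
        int b = super.read();
        if (b >= 0) pos++;
        return b;
    }

    @Override
    public synchronized int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0) pos += n;
        return n;
    }

    @Override
    public synchronized void mark(int readlimit) {
        super.mark(readlimit);
        markPos = pos;
    }

    @Override
    public synchronized void reset() throws IOException {
        super.reset();
        pos = markPos;  // rewind the logical position as well
    }

    public long getPos() {
        return pos;
    }
}
```

seek() is harder: a backward seek outside the buffer would still have to go to the underlying FSDataInputStream, which is exactly the expensive case described above.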

> StreamXmlRecordReader throws java.io.IOException: Mark/reset exception in hadoop 0.14
> -------------------------------------------------------------------------------------
>                 Key: HADOOP-2071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2071
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.3
>            Reporter: lohit vijayarenu
>            Assignee: lohit vijayarenu
>         Attachments: HADOOP-2071-1.patch
> In hadoop 0.14, using -inputreader StreamXmlRecordReader for streaming jobs throws
> java.io.IOException: Mark/reset exception.
> This looks to be related to (https://issues.apache.org/jira/browse/HADOOP-2067).
> <stack trace>
> Caused by: java.io.IOException: Mark/reset not supported
> 	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.reset(DFSClient.java:1353)
> 	at java.io.FilterInputStream.reset(FilterInputStream.java:200)
> 	at org.apache.hadoop.streaming.StreamXmlRecordReader.fastReadUntilMatch(StreamXmlRecordReader.java:289)
> 	at org.apache.hadoop.streaming.StreamXmlRecordReader.readUntilMatchBegin(StreamXmlRecordReader.java:118)
> 	at org.apache.hadoop.streaming.StreamXmlRecordReader.seekNextRecordBoundary(StreamXmlRecordReader.java:111)
> 	at org.apache.hadoop.streaming.StreamXmlRecordReader.init(StreamXmlRecordReader.java:73)
> 	at org.apache.hadoop.streaming.StreamXmlRecordReader.<init>(StreamXmlRecordReader.java:63)
> </stack trace>

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
