hadoop-mapreduce-dev mailing list archives

From "Arun A K (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-4519) In TextInputFormat, while specifying textinputformat.record.delimiter the character/character sequences in data file similar to starting character/starting character sequence in delimiter were found missing in certain cases in the Map Output
Date Mon, 06 Aug 2012 11:03:02 GMT
Arun A K created MAPREDUCE-4519:
-----------------------------------

             Summary: In TextInputFormat, while specifying textinputformat.record.delimiter
the character/character sequences in data file similar to starting character/starting character
sequence in delimiter were found missing in certain cases in the Map Output
                 Key: MAPREDUCE-4519
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4519
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.20.2
         Environment: Linux- Ubuntu 10.04
            Reporter: Arun A K
             Fix For: 0.20.2


Set textinputformat.record.delimiter to "</entity>".

Suppose the input is a text file with the following content:
<entity><id>1</id><name>User1</name></entity><entity><id>2</id><name>User2</name></entity><entity><id>3</id><name>User3</name></entity><entity><id>4</id><name>User4</name></entity><entity><id>5</id><name>User5</name></entity>

The mapper was expected to receive these values:

Value 1 - <entity><id>1</id><name>User1</name>
Value 2 - <entity><id>2</id><name>User2</name>
Value 3 - <entity><id>3</id><name>User3</name>
Value 4 - <entity><id>4</id><name>User4</name>
Value 5 - <entity><id>5</id><name>User5</name>
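For reference, the expected behavior can be written as a small streaming splitter. This is a minimal Python sketch, not Hadoop's LineReader; the names `split_records` and `_failure_table` are hypothetical. It reads the input in chunks of arbitrary size and tracks partial delimiter matches with a KMP failure table, so characters that merely look like the start of the delimiter stay in the record even when a chunk boundary falls in the middle of them.

```python
from typing import Iterable, Iterator, List


def _failure_table(delim: str) -> List[int]:
    """KMP failure table: fail[i] is the length of the longest proper prefix
    of delim[:i + 1] that is also a suffix of it."""
    fail = [0] * len(delim)
    k = 0
    for i in range(1, len(delim)):
        while k > 0 and delim[i] != delim[k]:
            k = fail[k - 1]
        if delim[i] == delim[k]:
            k += 1
        fail[i] = k
    return fail


def split_records(chunks: Iterable[str], delim: str) -> Iterator[str]:
    """Yield records separated by `delim`, reading input chunk by chunk.

    Every character is appended to the current record first; a partial
    delimiter match is only stripped once it completes, so a failed match
    (inside a chunk or across a chunk boundary) never loses characters.
    """
    if not delim:
        raise ValueError("delimiter must be non-empty")
    fail = _failure_table(delim)
    record: List[str] = []
    matched = 0  # number of delimiter characters matched so far
    for chunk in chunks:
        for ch in chunk:
            record.append(ch)
            while matched > 0 and ch != delim[matched]:
                matched = fail[matched - 1]
            if ch == delim[matched]:
                matched += 1
            if matched == len(delim):
                # Full delimiter seen: drop it from the record and emit.
                yield "".join(record[:-len(delim)])
                record.clear()
                matched = 0
    if record:
        # Trailing text without a final delimiter is still a record.
        yield "".join(record)
```

Feeding the sample input to this sketch in chunks of any size, from one character upward, should yield the expected values unchanged; the chunk size only moves where partial matches get interrupted.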

Because of this bug, the mapper instead receives:

Value 1 - entity><id>1</id><name>User1</name>
Value 2 - <entity>id>2</id><name>User2</name>
Value 3 - <entity><id>3id><name>User3</name>
Value 4 - <entity><id>4</id><name>User4name>
Value 5 - <entity><id>5</id><name>User5</name>
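One explanation consistent with this output, and with the issue title's observation that only characters resembling the start of the delimiter go missing, is that characters partially matching the delimiter are held back when a read buffer ends and are discarded if the match then fails in the next buffer. The Python sketch below is a hypothetical model of that failure mode under this assumption, not Hadoop's actual code; `split_records_buggy` is an illustrative name.

```python
from typing import Iterable, Iterator, List


def split_records_buggy(chunks: Iterable[str], delim: str) -> Iterator[str]:
    """HYPOTHETICAL model of the reported defect; not Hadoop source code.

    Characters that partially match the delimiter are held aside instead of
    being added to the record. A failed match inside one chunk returns them
    to the record, but the held list is reset at every chunk boundary, so a
    match that starts in one chunk and fails in the next loses its prefix.
    """
    record: List[str] = []
    held: List[str] = []
    matched = 0  # delimiter characters matched so far (survives chunk ends)
    for chunk in chunks:
        held = []  # the modeled bug: characters held from the previous chunk are dropped
        for ch in chunk:
            if ch == delim[matched]:
                matched += 1
                held.append(ch)
                if matched == len(delim):
                    yield "".join(record)
                    record, held, matched = [], [], 0
            else:
                record.extend(held)  # only this chunk's held characters come back
                held, matched = [], 0
                # Naive restart (no KMP); adequate for a delimiter like "</entity>".
                if ch == delim[0]:
                    held, matched = [ch], 1
                else:
                    record.append(ch)
    if record or held:
        yield "".join(record + held)
```

With the chunks `["abc</", "x</entity>"]` this model returns `["abcx"]`, dropping `</` much as in value 3, while the same text in a single chunk comes back intact as `["abc</x"]`. A delimiter that is split across chunks but completes is still handled correctly. If the model is right, which values are damaged depends on where the buffer boundaries happen to fall.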

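The damaged values listed above are only an example; the loss does not necessarily affect values 1 to 4.
It occurs at seemingly random positions in the map input. In each case shown, the missing text ("<" in
values 1 and 2, "</" in values 3 and 4) is a leading prefix of the delimiter "</entity>".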
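Because the damage appears at unpredictable positions, it can help to check mapper output mechanically rather than by eye. A minimal Python sketch, assuming the values were collected in input order and that every record in the input, including the last, ends with the delimiter (as in the sample input): rejoining the values with the delimiter should reproduce the input exactly, and the first differing offset points at the lost characters. `first_lost_offset` is a hypothetical helper, not part of Hadoop.

```python
from typing import List, Optional


def first_lost_offset(data: str, records: List[str], delim: str) -> Optional[int]:
    """Return the first offset where `data` and the records rejoined with
    `delim` differ, or None if the split was lossless.

    Assumes every record in `data`, including the last, ends with `delim`.
    """
    rebuilt = "".join(r + delim for r in records)
    for i, (a, b) in enumerate(zip(data, rebuilt)):
        if a != b:
            return i
    if len(data) != len(rebuilt):
        # One string is a strict prefix of the other.
        return min(len(data), len(rebuilt))
    return None
```

Applied to the damaged values reported here, the helper would return offset 0, because value 1 already lacks its leading "<".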
 

--
This message is automatically generated by JIRA.
