hadoop-common-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-8655) In TextInputFormat, while specifying textinputformat.record.delimiter the character/character sequences in data file similar to starting character/starting character sequence in delimiter were found missing in certain cases in the Map Output
Date Mon, 06 Aug 2012 15:26:03 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated HADOOP-8655:
-------------------------------

     Component/s: util
    Release Note:   (was: A few lines of change in LineReader, also incorporated the MAPREDUCE-4512 patch)
    Hadoop Flags:   (was: Reviewed)

As with MAPREDUCE-4512, I moved this to project Hadoop Common since that's where the patch
needs to be applied.

In the future, please don't set the Reviewed flag unless the patch has been reviewed and approved
by someone in the community. I see no record of that occurring, so I've cleared that flag.
Also, the Fix Versions field is intended to mark where the patch has been integrated, so please
don't set this field. If you'd like to indicate which versions you'd like to have the patch
committed to, use the Target Versions field instead.

                
> In TextInputFormat, while specifying textinputformat.record.delimiter the character/character
> sequences in data file similar to starting character/starting character sequence in delimiter
> were found missing in certain cases in the Map Output
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8655
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8655
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.20.2
>         Environment: Linux- Ubuntu 10.04
>            Reporter: Arun A K
>              Labels: hadoop, mapreduce, textinputformat, textinputformat.record.delimiter
>         Attachments: MAPREDUCE-4519.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Set textinputformat.record.delimiter to "</entity>".
> Suppose the input is a text file with the following content:
> <entity><id>1</id><name>User1</name></entity><entity><id>2</id><name>User2</name></entity><entity><id>3</id><name>User3</name></entity><entity><id>4</id><name>User4</name></entity><entity><id>5</id><name>User5</name></entity>
> The Mapper was expected to get these values:
> Value 1 - <entity><id>1</id><name>User1</name>
> Value 2 - <entity><id>2</id><name>User2</name>
> Value 3 - <entity><id>3</id><name>User3</name>
> Value 4 - <entity><id>4</id><name>User4</name>
> Value 5 - <entity><id>5</id><name>User5</name>
> Because of this bug, the Mapper instead gets these values:
> Value 1 - entity><id>1</id><name>User1</name>
> Value 2 - <entity>id>2</id><name>User2</name>
> Value 3 - <entity><id>3id><name>User3</name>
> Value 4 - <entity><id>4</id><name>User4name>
> Value 5 - <entity><id>5</id><name>User5</name>
> The pattern shown above need not occur for values 1, 2, and 3 specifically. The bug occurs
> at random positions in the map input.
>  
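
The expected behavior described in the report can be sketched with a small, self-contained
Java program. This is not the attached patch and not Hadoop's LineReader; the class name
DelimiterSplitSketch and the method splitRecords are hypothetical names chosen for
illustration. The sketch reads the input in small fixed-size chunks and checks whether the
bytes collected so far end with the full delimiter, so a partial match such as the "</" in
"</id>" is kept in the record rather than dropped, and a delimiter that straddles two chunks
is still recognized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical names, not Hadoop code.
public class DelimiterSplitSketch {

    // Splits the stream into records on `delimiter`, reading `bufferSize` bytes at a time.
    public static List<String> splitRecords(InputStream in, byte[] delimiter, int bufferSize)
            throws IOException {
        if (delimiter.length == 0 || bufferSize <= 0) {
            throw new IllegalArgumentException("delimiter and bufferSize must be non-empty/positive");
        }
        List<String> records = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferSize];
        byte lastDelimiterByte = delimiter[delimiter.length - 1];
        int read;
        while ((read = in.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                // Always keep the byte; a failed partial match must not lose data.
                current.write(buffer[i]);
                if (buffer[i] != lastDelimiterByte) {
                    continue;
                }
                byte[] bytes = current.toByteArray();
                if (endsWith(bytes, delimiter)) {
                    // Emit the record without the trailing delimiter and start a new one.
                    int length = bytes.length - delimiter.length;
                    records.add(new String(bytes, 0, length, StandardCharsets.UTF_8));
                    current.reset();
                }
            }
        }
        if (current.size() > 0) {
            // Trailing data after the last delimiter forms a final record.
            records.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
        }
        return records;
    }

    // Convenience overload for in-memory text.
    public static List<String> splitRecords(String text, String delimiter, int bufferSize) {
        try {
            return splitRecords(
                    new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)),
                    delimiter.getBytes(StandardCharsets.UTF_8),
                    bufferSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true when `data` ends with all bytes of `suffix`.
    private static boolean endsWith(byte[] data, byte[] suffix) {
        if (data.length < suffix.length) {
            return false;
        }
        int offset = data.length - suffix.length;
        for (int i = 0; i < suffix.length; i++) {
            if (data[offset + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StringBuilder input = new StringBuilder();
        for (int id = 1; id <= 5; id++) {
            input.append("<entity><id>").append(id).append("</id><name>User")
                 .append(id).append("</name></entity>");
        }
        // The buffer size of 7 is arbitrary; it forces delimiters to straddle chunk boundaries.
        for (String record : splitRecords(input.toString(), "</entity>", 7)) {
            System.out.println(record);
        }
    }
}
```

Run against the sample input from the report, the sketch should produce the five expected
values regardless of the chunk size chosen. The suffix check is deliberately simple rather
than fast; it trades efficiency for being easy to verify.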

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
