pig-dev mailing list archives

From "Geza Radics (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4242) For indented xmls with multiline content (e.g. wikipedia) XMLLoader cuts out the beginning of every line
Date Sun, 19 Oct 2014 02:31:33 GMT

     [ https://issues.apache.org/jira/browse/PIG-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geza Radics updated PIG-4242:
-----------------------------
    Description: 
XMLLoader finds the position of the first match for the required tag, but then applies that
offset to every following line as well, until the closing tag. For indented XML with multiline
content, such as the Wikipedia XML dump, this drops the beginning of each continuation line:

--- example input ---
{code:xml}
    <page>You have 
not missed it</page>
{code}

--- output ---
{code:xml}
<page>You have missed it</page>
{code}


  was:
XMLLoader finds the position of the first match for the required tag, but then applies that
offset to every following line as well, until the closing tag. For indented XML with multiline
content, such as Wikipedia, this drops the beginning of each continuation line:

--- example input ---
{code:xml}
    <page>You have 
not missed it</page>
{code}

--- output ---
{code:xml}
<page>You have missed it</page>
{code}



> For indented xmls with multiline content (e.g. wikipedia) XMLLoader cuts out the beginning of every line
> -------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-4242
>                 URL: https://issues.apache.org/jira/browse/PIG-4242
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>            Reporter: Geza Radics
>         Attachments: XMLLoaderMissingContent.patch
>
>
> XMLLoader finds the position of the first match for the required tag, but then applies that
> offset to every following line as well, until the closing tag. For indented XML with multiline
> content, such as the Wikipedia XML dump, this drops the beginning of each continuation line:
> --- example input ---
> {code:xml}
>     <page>You have 
> not missed it</page>
> {code}
> --- output ---
> {code:xml}
> <page>You have missed it</page>
> {code}
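
The behavior described above can be reproduced with a small stand-alone model. This is only a
sketch in Python, not the piggybank XMLLoader code and not the contents of
XMLLoaderMissingContent.patch: the function names `extract_buggy` and `extract_fixed` are
hypothetical, and joining the lines without a separator is an assumption made so that the
result matches the example output in the report.

```python
def extract_buggy(lines, tag):
    """Model of the reported bug: the start-tag column is reused on every line."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    out, offset = [], None
    for line in lines:
        if offset is None:
            pos = line.find(open_tag)
            if pos < 0:
                continue  # still searching for the opening tag
            offset = pos
        # Bug: continuation lines also lose their first `offset` characters.
        out.append(line[offset:])
        if close_tag in line:
            break
    return "".join(out)


def extract_fixed(lines, tag):
    """One possible correction: trim only the line that holds the opening tag."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    out, inside = [], False
    for line in lines:
        if not inside:
            pos = line.find(open_tag)
            if pos < 0:
                continue
            inside = True
            line = line[pos:]  # drop the indentation before the tag, once
        out.append(line)
        if close_tag in line:
            break
    return "".join(out)


# The example input from the report: a 4-space indent before <page>.
lines = ["    <page>You have ", "not missed it</page>"]
buggy = extract_buggy(lines, "page")
fixed = extract_fixed(lines, "page")
print(buggy)
print(fixed)
```

With the 4-space indent, the buggy variant removes the first four characters ("not ") of the
second line, which is how the meaning of the sentence gets inverted in the example.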



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
