pig-dev mailing list archives

From "Andreas Paepcke (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-1924) CSV Loader/Store that handles newlines in fields, and other Excel CSV features.
Date Sun, 20 Mar 2011 23:08:05 GMT

     [ https://issues.apache.org/jira/browse/PIG-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Paepcke updated PIG-1924:
---------------------------------

    Release Note: This module subsumes the current CSVLoader(), with two differences. First,
an embedded double quote is escaped by prepending a second double quote, which is the syntax
honored by Excel 2007. Second, this module's default field delimiter is a comma, whereas the
existing CSVLoader() defaults to tab for delimiting fields. The comma default was chosen in
part because Excel behaves inconsistently with newlines embedded in fields when tab is used
as the delimiter.  (was: This module subsumes the current CSVLoader().
However, its syntax for escaping embedded double quotes is to prepend a second double quote.
This syntax is the one honored by Excel 2007. In addition, this module's default field delimiter
is a comma. In part, this decision is based on Excel behaving inconsistently with newlines
embedded in fields when tab is used as the delimiter. )
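
The escaping rule in the release note can be illustrated with a minimal sketch. This is not
the code from the attached CSVExcelStorage.java; the class name CsvQuoteSketch and the method
quoteField are hypothetical, and the rule for when a field gets wrapped in quotes is an
assumption based on the issue description.

```java
// Hypothetical illustration only; not taken from CSVExcelStorage.java.
class CsvQuoteSketch {

    /**
     * Returns the field as it would appear in an Excel-style CSV line.
     * A field is wrapped in double quotes when it contains the delimiter,
     * a double quote, or a line break (assumed rule). Each embedded double
     * quote is escaped by prepending a second double quote.
     */
    static String quoteField(String field, char delimiter) {
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quoteField("say \"hi\"", ',')); // "say ""hi"""
        System.out.println(quoteField("a,b", ','));        // "a,b"
        System.out.println(quoteField("plain", ','));      // plain
    }
}
```

The delimiter is passed as a parameter so the same sketch covers both the comma default of
this module and the tab default mentioned for the existing CSVLoader().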

> CSV Loader/Store that handles newlines in fields, and other Excel CSV features.
> -------------------------------------------------------------------------------
>
>                 Key: PIG-1924
>                 URL: https://issues.apache.org/jira/browse/PIG-1924
>             Project: Pig
>          Issue Type: New Feature
>          Components: tools
>    Affects Versions: 0.8.0
>            Reporter: Andreas Paepcke
>         Attachments: CSVExcelStorage.java, TestCSVExcelStorage.java
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> CSVExcelStorage() combines load and store of CSV encoded data. It handles newlines within
> fields, escaped double quotes, and double quoting of fields with embedded field delimiters.
> Newline handling is optional and is controlled by a parameter. The module also offers an
> option to output Windows style newlines (CRLF) instead of the Unix LF. All CSV related
> syntax decisions were made to match Excel 2007.
> The module comes with a test file, and javadoc produces proper documentation files.
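
To make the load-side behavior described above concrete, here is a rough sketch of a parser
that accepts newlines inside quoted fields, doubled double quotes, and either LF or CRLF as
the record terminator. It is an assumption about the general technique, not the
implementation in CSVExcelStorage.java; the class CsvParseSketch and its parse method are
hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from CSVExcelStorage.java.
class CsvParseSketch {

    /** Splits CSV text into records; quoted fields may contain line breaks. */
    static List<List<String>> parse(String text, char delimiter) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean pending = false; // true once the current record has any content

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // includes embedded newlines
                }
            } else if (c == '"') {
                inQuotes = true;
                pending = true;
            } else if (c == delimiter) {
                record.add(field.toString());
                field.setLength(0);
                pending = true;
            } else if (c == '\n' || c == '\r') {
                // Treat CRLF as a single record terminator.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                pending = false;
            } else {
                field.append(c);
                pending = true;
            }
        }
        if (pending) { // last record without a trailing line break
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "a,\"line1\nline2\",c\r\nx,\"he said \"\"hi\"\"\",z\n";
        System.out.println(parse(csv, ',').size()); // 2
    }
}
```

In this sketch an empty line yields a record with a single empty field; whether the actual
module behaves the same way is not stated in the issue.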

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
