lucene-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1925) CSV Response Writer
Date Thu, 15 Jul 2010 03:00:57 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888679#action_12888679 ]

Chris A. Mattmann commented on SOLR-1925:
-----------------------------------------

{quote}
Excel (at least the version I just tried) handled embedded newlines just fine. 
{quote}

Well, not for me. I'm using MS Office 2008 on Mac OS X 10.5.6. I also tried Office XP SP2
on a Win XP SP2 instance I have running in VMware, and saw the same behavior. What version
are you looking at?

{quote}
AFAIK, the CSV spec doesn't recommend always using encapsulators.
{quote}

See here: http://en.wikipedia.org/wiki/Comma-separated_values, 1st Paragraph:

bq. Fields that contain a special character (comma, newline, or double quote), must be enclosed
in double quotes.

Since we don't know what each Field's value contains, it's best to just account for that
by always encapsulating values within double quotes. This doesn't break anything, and
arguably isn't any uglier than the unquoted output (that's a judgment call).
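
For illustration, a minimal sketch of the always-quote approach described above. The class
and method names (CsvQuoting, quote, record) are hypothetical and are not taken from any of
the attached patches. It also assumes the common CSV convention (RFC 4180) that an embedded
double quote is written as two double quotes, which the sentence quoted above does not
itself state.

```java
import java.util.List;

/** Hypothetical helper for illustration only; not part of the SOLR-1925 patches. */
final class CsvQuoting {
    private CsvQuoting() {}

    /**
     * Wraps a value in double quotes unconditionally.
     * Assumption: embedded double quotes are escaped by doubling them (RFC 4180 style).
     * Commas and newlines need no extra handling once the value is quoted.
     */
    static String quote(String value) {
        if (value == null) {
            return "\"\""; // treat a missing value as an empty quoted field
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Joins raw field values into one CSV record, without a line terminator. */
    static String record(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(fields.get(i)));
        }
        return sb.toString();
    }
}
```

Quoting every field trades slightly larger output for not having to inspect each value for
special characters first.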

{quote}
Proper escaping is an absolute necessity. You can't represent arbitrary text field values
without it.
{quote}

How would you recommend doing so?

{quote}
If we do things correctly, we should be able to round-trip with http://wiki.apache.org/solr/UpdateCSV
{quote}

What's your rationale that this isn't compatible with that? Have you tried it? Also, I think
that's a good thing to make happen in the end, but it's not a blocker to getting this into
the sources, is it? My rationale is that, for instance, XML given to Solr doesn't always
round-trip to the XMLResponseWriter (especially if the schema weeds out fields, discards
them, etc.)

{quote}
Having a server process act differently on different hosts is bad. We strive to never use
the default locale - it's a recipe for non-portability. All file encodings (stopword lists,
etc) default to UTF-8 instead of the system locale. Date and number formatting is standardized
and does not use the system locale. We missed some of these in the past (and sure enough,
Solr wouldn't work properly when installed on a machine of a certain locale), but Robert cleaned
all that up.
{quote}

Admittedly, I'm not an expert here, so I'll take your word for it. What's the host-independent
way to do System.getProperty("line.separator")?
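
One possible shape of a host-independent alternative, sketched under assumptions: use a
fixed line terminator constant and an explicit charset instead of anything read from the
JVM's system properties. The class name FixedLineEnding and the choice of CRLF (the record
terminator described in RFC 4180) are assumptions for illustration, not something decided
in this issue; a plain "\n" would work the same way.

```java
import java.nio.charset.Charset;

/** Hypothetical helper for illustration only; not part of the SOLR-1925 patches. */
final class FixedLineEnding {
    private FixedLineEnding() {}

    /** Assumption: CRLF as the terminator; fixed, so output is identical on every host. */
    static final String LINE_END = "\r\n";

    /** Explicit charset, so encoding does not depend on the platform default. */
    static final Charset UTF8 = Charset.forName("UTF-8");

    /** Terminates each line with LINE_END and encodes the result as UTF-8 bytes. */
    static byte[] encodeLines(String... lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append(LINE_END);
        }
        return sb.toString().getBytes(UTF8);
    }
}
```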

> CSV Response Writer
> -------------------
>
>                 Key: SOLR-1925
>                 URL: https://issues.apache.org/jira/browse/SOLR-1925
>             Project: Solr
>          Issue Type: New Feature
>          Components: Response Writers
>         Environment: indep. of env.
>            Reporter: Chris A. Mattmann
>            Assignee: Erik Hatcher
>             Fix For: Next
>
>         Attachments: SOLR-1925.Chheng.071410.patch.txt, SOLR-1925.Mattmann.053010.patch.2.txt,
SOLR-1925.Mattmann.053010.patch.3.txt, SOLR-1925.Mattmann.053010.patch.txt, SOLR-1925.Mattmann.061110.patch.txt
>
>
> As part of some work I'm doing, I put together a CSV Response Writer. It currently takes
all the docs resulting from a query and outputs their metadata in simple CSV format.
The delimiter is configurable (by default, if there are multiple values for a particular
field, they are separated with a | symbol).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

