lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8
Date Thu, 19 Aug 2010 20:05:16 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900423#action_12900423 ]

Hoss Man commented on SOLR-2034:
--------------------------------

bq. I don't think adding many hoops for back compatibility is worth the trouble. Note that
that does not mean people can not use solrj to talk across different versions - they may have
to use xml though....

Agreed, my chief concern is what happens when someone tries to use SolrJ 1.4 to talk to Solr
3.1 w/javabin (or vice versa).

A) If they get an error: great, i'm totally fine with that -- we just document that they should
use XML in this case.

B) If the commands succeed, but the string data is _always_ corrupted, that's not ideal --
but not totally horrible, since the problem should be immediately obvious and they should have
read the documentation and known not to do that.

C) If the commands succeed, but the string data is _sometimes_ corrupted (as i recall, not
every character is encoded differently in UTF-8 vs Java's modified UTF-8, correct?) then that
seems really bad ... people may start using javabin to update their index and not notice for
quite some time that big, hard-to-identify chunks of their data are corrupted.
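
To make the worry in #C concrete, here is a minimal sketch of which strings encode to the same bytes in standard UTF-8 and in Java's modified UTF-8. It uses DataOutputStream.writeUTF as a stand-in for a modified UTF-8 writer; that is an assumption made for illustration, not javabin's actual codec, and the class and method names (Utf8Compare, modifiedUtf8, sameEncoding) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Compare {
    /** Modified UTF-8 bytes as written by writeUTF, without its 2-byte length prefix. */
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Only reachable if the encoded form exceeds writeUTF's 65535-byte limit.
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    /** True when both encodings produce identical bytes for s. */
    static boolean sameEncoding(String s) {
        return Arrays.equals(modifiedUtf8(s), s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String[] samples = {
            "hello",                 // ASCII
            "caf\u00e9",             // 2-byte BMP character
            "\u65e5\u672c\u8a9e",    // 3-byte BMP characters
            "a\u0000b",              // embedded U+0000
            "\uD83D\uDE00"           // supplementary character (surrogate pair)
        };
        for (String s : samples) {
            // Print the length rather than the string itself to avoid emitting NUL.
            System.out.println(s.length() + " chars, identical bytes: " + sameEncoding(s));
        }
    }
}
```

In this sketch only the last two samples differ: modified UTF-8 writes U+0000 as two bytes (0xC0 0x80) and writes each half of a surrogate pair as its own 3-byte sequence, where standard UTF-8 uses one byte and one 4-byte sequence respectively. Data made up of ordinary BMP text would round-trip unchanged, which is the kind of partial overlap that could let #C go unnoticed.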

as long as someone sanity checks that the situation is either #A or #B before committing, i'm
totally cool with it ... but #C scares the bejesus out of me.

(i'll try to run some tests myself in the next few days if no one else gets a chance)


> javabin should use UTF-8, not modified UTF-8
> --------------------------------------------
>
>                 Key: SOLR-2034
>                 URL: https://issues.apache.org/jira/browse/SOLR-2034
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: SOLR-2034.patch, SOLR-2034.patch
>
>
> for better interoperability, javabin should use standard UTF-8 instead of modified UTF-8
> (http://www.unicode.org/reports/tr26/)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



