lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1865) ignore byte-order markers in SolrResourceLoader
Date Fri, 07 May 2010 22:22:49 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865336#action_12865336 ]

Hoss Man commented on SOLR-1865:
--------------------------------

Robert: based on my limited understanding, aren't there different BOMs for different encodings?
...

http://unicode.org/faq/utf_bom.html#bom4
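For reference, the FAQ linked above gives a distinct BOM byte sequence for each Unicode encoding form. The Java sketch below sniffs those signatures from the start of a byte array. The class `BomSniffer` and its `detect` method are hypothetical illustrations, not part of Solr or of the attached patch. UTF-32LE has to be tested before UTF-16LE because both start with FF FE.

```java
/**
 * Hypothetical helper (not part of Solr): guess an encoding from a leading
 * byte-order mark, using the BOM byte sequences listed in the Unicode FAQ.
 */
public class BomSniffer {

    /** Returns the charset name implied by a leading BOM, or null if none is found. */
    public static String detect(byte[] b) {
        // Longer signatures first: FF FE 00 00 (UTF-32LE) also begins with FF FE (UTF-16LE).
        // Note: UTF-16LE text whose first character is U+0000 is indistinguishable from UTF-32LE.
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    /** True if the array begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] b, int... sig) {
        if (b.length < sig.length) return false;
        for (int i = 0; i < sig.length; i++) {
            if ((b[i] & 0xFF) != sig[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // UTF-8 BOM followed by the letter 'a'.
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x61};
        System.out.println(detect(utf8)); // prints UTF-8
    }
}
```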

The getLines method modified in your patch could (conceivably) be used to open files in other
encodings, so do we also need to worry about those possibilities as well? (Or does InputStreamReader
take care of that for us?)
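One possible answer, sketched under assumptions: InputStreamReader does not strip a BOM for every charset. With the JDK's UTF-8 decoder, and with the explicit UTF-16BE/UTF-16LE charsets, the BOM survives decoding as the character U+FEFF; only the generic "UTF-16" charset consumes it. Because every Unicode BOM decodes to that same U+FEFF character, checking the first decoded character covers all encodings at once. The `readLines` method below is a hypothetical stand-in for a getLines-style reader, not the code in SOLR-1865.patch.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BomStrippingLines {

    /** Hypothetical getLines-style reader (not Solr's code) that drops a leading BOM. */
    public static List<String> readLines(InputStream in, Charset cs) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, cs))) {
            String line;
            while ((line = r.readLine()) != null) {
                // Any BOM the decoder did not consume appears as U+FEFF at the start
                // of the first line, whatever the source encoding was.
                if (lines.isEmpty() && !line.isEmpty() && line.charAt(0) == '\uFEFF') {
                    line = line.substring(1);
                }
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "a\nb" encoded as UTF-8 with a leading BOM (EF BB BF).
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x61, 0x0A, 0x62};
        System.out.println(readLines(new ByteArrayInputStream(data), StandardCharsets.UTF_8)); // prints [a, b]
    }
}
```

Checking the decoded character rather than raw bytes keeps the reader encoding-agnostic: the caller still chooses the charset, and the stripping step needs no per-encoding byte table.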

> ignore byte-order markers in SolrResourceLoader
> -----------------------------------------------
>
>                 Key: SOLR-1865
>                 URL: https://issues.apache.org/jira/browse/SOLR-1865
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: SOLR-1865.patch
>
>
> If you create say a stopwords list with windows notepad or other editors and save as UTF-8,
> some of these editors will insert a byte-order marker (zero-width no-break space) as the first
> character of the file.
> http://www.lucidimagination.com/search/document/5101871231fc95af/is_this_a_bug_of_the_ressourceloader

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


