lucene-dev mailing list archives

From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1730) TrecContentSource should use a fixed encoding, rather than system dependent
Date Thu, 02 Jul 2009 14:53:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726507#action_12726507 ]

Mark Miller commented on LUCENE-1730:
-------------------------------------

I think it makes sense to make the default encoding the one that TREC typically (or always)
uses, but we should probably make it configurable from the alg file. We don't want to be
locked into one input encoding. That could be done in another issue, though. We should allow
the same for the other ContentSources as well.
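A minimal sketch of the "fixed default, but configurable" idea, assuming the alg-file settings are available as a java.util.Properties object. The property name "content.source.encoding" and the helpers resolveEncoding/openReader are hypothetical illustrations, not the benchmark's actual API; the ISO-8859-1 default is the one proposed in the issue.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.Properties;

public class EncodingConfig {
    // Hypothetical property key; the real benchmark config may use a different name.
    static final String ENCODING_KEY = "content.source.encoding";
    // Default proposed in LUCENE-1730 for TREC data.
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Use the configured encoding if present, otherwise fall back to the fixed default,
    // never to the platform default.
    static String resolveEncoding(Properties config) {
        return config.getProperty(ENCODING_KEY, DEFAULT_ENCODING);
    }

    // Always pass an explicit Charset to InputStreamReader so behavior does not
    // depend on the JVM's file.encoding (CP1252 on one machine, UTF-8 on another).
    static BufferedReader openReader(InputStream in, Properties config) {
        Charset cs = Charset.forName(resolveEncoding(config));
        return new BufferedReader(new InputStreamReader(in, cs));
    }

    public static void main(String[] args) throws IOException {
        // "café" encoded as ISO-8859-1 bytes (0xE9 is 'é').
        byte[] data = {0x63, 0x61, 0x66, (byte) 0xE9};
        Properties config = new Properties(); // nothing set: default applies
        try (BufferedReader r = openReader(new ByteArrayInputStream(data), config)) {
            System.out.println(resolveEncoding(config) + ": " + r.readLine());
        }
        // An alg file could override the default for a differently encoded collection.
        config.setProperty(ENCODING_KEY, "UTF-8");
        System.out.println("override: " + resolveEncoding(config));
    }
}
```

The design choice is simply that an unset property falls back to a fixed, documented charset rather than to whatever the host JVM happens to use.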

> TrecContentSource should use a fixed encoding, rather than system dependent
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1730
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1730
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/benchmark
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1730.patch
>
>
> TrecContentSource opens InputStreamReader w/o a fixed encoding. On Windows, this means
> CP1252 (at least on my machine) which is ok. However, when I opened it on a Linux machine
> w/ a default of UTF-8, it failed to read the files. The patch changes it to use ISO-8859-1,
> which seems to be the right one (and http://mg4j.dsi.unimi.it/man/manual/ch01s04.html mentions
> this encoding in its example of a script which reads the data).
> Patch to follow shortly.
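To illustrate why the platform default matters, here is a small sketch (not the attached patch) that decodes the same bytes strictly under two charsets. ISO-8859-1 maps every byte value to a character, so it never rejects input, while a byte such as 0xE9 followed by a space is malformed UTF-8. Note that InputStreamReader by default substitutes replacement characters rather than throwing, so the exact symptom on a UTF-8 machine may differ; the strict decoder here just makes the mismatch visible. The class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeCheck {
    // Returns true if the bytes decode cleanly under the given charset,
    // reporting (rather than silently replacing) malformed or unmappable input.
    static boolean decodesStrictly(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "café " as ISO-8859-1 bytes; 0xE9 starts a 3-byte UTF-8 sequence,
        // but the following 0x20 is not a continuation byte.
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9, 0x20};
        System.out.println(decodesStrictly(latin1, StandardCharsets.ISO_8859_1)); // true
        System.out.println(decodesStrictly(latin1, StandardCharsets.UTF_8));      // false
    }
}
```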

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

