mahout-dev mailing list archives

From "steven zhuang (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (MAHOUT-748) WikipediaAnalyzer in 0.5 would fail due to lucene3.1's CharArraySet.iterator() returns an "char[]" iterator instead of a "String" iterator
Date Thu, 30 Jun 2011 02:39:28 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057584#comment-13057584 ]

steven zhuang edited comment on MAHOUT-748 at 6/30/11 2:38 AM:
---------------------------------------------------------------

I have created a patch file for this issue, which works for me.
See the attachment for details.
And Sean, sorry for the late response.


      was (Author: stevenzhuang):
    I have created a patch file for this issue, which works for me.
See the attachment for details.
  
> WikipediaAnalyzer in 0.5 would fail due to lucene3.1's CharArraySet.iterator() returns an "char[]"  iterator instead of a "String" iterator
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-748
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-748
>             Project: Mahout
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 0.5
>            Reporter: steven zhuang
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: WikipediaAnalyzer.java.patch, WikipediaAnalyzer.java_diff
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> In Mahout 0.5, the class org.apache.mahout.analysis.WikipediaAnalyzer fails to be constructed.
> The statement around line 38 of WikipediaAnalyzer.java:
>    stopSet = (CharArraySet) StopFilter.makeStopSet(Version.LUCENE_31,
>         StopAnalyzer.ENGLISH_STOP_WORDS_SET.toArray(new String[StopAnalyzer.ENGLISH_STOP_WORDS_SET.size()]));
> raises an ArrayStoreException, because StopAnalyzer.ENGLISH_STOP_WORDS_SET.toArray(String[]) throws one.
> The cause is that in Lucene 3.1, when the match version is 3.1 or later, CharArraySet.iterator() returns
> an iterator over char[] elements instead of over Strings, and a char[] cannot be stored in a String[].
> See the code from CharArraySet.java:
>   @Override @SuppressWarnings("unchecked")
>   public Iterator<Object> iterator() {
>     // use the AbstractSet#keySet()'s iterator (to not produce endless recursion)
>     return map.matchVersion.onOrAfter(Version.LUCENE_31) ?
>       map.originalKeySet().iterator() : (Iterator) stringIterator();
>   }
> So the WikipediaAnalyzer() constructor may need to convert each char[] element to a String to make it work.
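A minimal, Lucene-free sketch of the failure described above and of the kind of char[]-to-String conversion the report suggests. It assumes, per the quoted CharArraySet code, that iterating the stop set with a 3.1+ match version yields char[] elements; a plain HashSet<Object> of char[] values stands in for StopAnalyzer.ENGLISH_STOP_WORDS_SET. The class StopSetDemo and the helper toStringArray are hypothetical names for illustration, and this is not the attached patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class StopSetDemo {

    // Hypothetical helper: copies every element of a set whose elements may be
    // char[] (as CharArraySet yields with a 3.1+ match version) or String
    // into a String[], converting char[] elements explicitly.
    static String[] toStringArray(Set<?> set) {
        String[] out = new String[set.size()];
        int i = 0;
        for (Object o : set) {
            out[i++] = (o instanceof char[]) ? new String((char[]) o) : o.toString();
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for a stop set whose iterator returns char[] keys.
        Set<Object> stopWords = new HashSet<>();
        stopWords.add("the".toCharArray());
        stopWords.add("and".toCharArray());

        // Same pattern as WikipediaAnalyzer: toArray(new String[n]) tries to
        // store each char[] into a String[], which the JVM rejects at runtime.
        try {
            stopWords.toArray(new String[stopWords.size()]);
            System.out.println("no exception");
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException");
        }

        // Converting each element first avoids the invalid array store.
        String[] words = toStringArray(stopWords);
        Arrays.sort(words);
        System.out.println(Arrays.toString(words)); // prints [and, the]
    }
}
```

In the real constructor, an array built this way could then be passed to StopFilter.makeStopSet in place of the failing toArray(String[]) call; whether that matches the attached patch is not confirmed here.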

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
