lucene-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: [jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff
Date Thu, 28 Jun 2007 20:25:48 GMT

On Jun 28, 2007, at 3:47 PM, Doron Cohen (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508922 ]
>
> Doron Cohen commented on LUCENE-848:
> ------------------------------------
>
> Steven wrote:
>> I think Mike mentioned not doing the one file per article. I'll  
>> try to look at that ...
>
> Perhaps also (re) consider the "compress and add on-the-fly"  
> approach, similar to what TrecDocMaker is doing?
>
> Grant wrote:
>> I take back my promise to commit, I am getting (after processing  
>> 189500 docs):
>>    [java] Error: cannot execute the algorithm! term out of order ("docid:disrs".compareTo("docname:disregardle&*Ar") <= 0)
>>    [java] org.apache.lucene.index.CorruptIndexException: term out of order ("docid:disrs".compareTo("docname:disregardle&*Ar") <= 0)
>
> Just to verify that it is not a benchmark issue, could you also  
> post here the executed algorithm (as printed, or, if not printed,  
> the actual file)...?

It is the one in the patch.  I ran "ant enwiki".
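
For what it's worth, on Doron's "compress and add on-the-fly" point: a rough sketch of streaming the dump through a decompressor, instead of extracting one file per article, might look like the following.  The class name and the page-counting logic are hypothetical and not taken from the patch; it just illustrates the idea, roughly the way TrecDocMaker reads .gz TREC files.

  import java.io.BufferedReader;
  import java.io.FileInputStream;
  import java.io.IOException;
  import java.io.InputStream;
  import java.io.InputStreamReader;
  import java.util.zip.GZIPInputStream;

  // Hypothetical sketch, not the patch's DocMaker: read the compressed dump
  // on the fly rather than writing one file per article to disk first.
  public class OnTheFlyEnwikiReader {

    public static void main(String[] args) throws IOException {
      String path = args[0]; // e.g. a pages-articles .xml.gz dump (illustrative)
      InputStream in = new FileInputStream(path);
      if (path.endsWith(".gz")) {
        in = new GZIPInputStream(in); // decompress while reading, no temp files
      }
      BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
      try {
        String line;
        long pages = 0;
        while ((line = reader.readLine()) != null) {
          // A real DocMaker would parse <page>...</page> boundaries here and
          // hand each article to AddDoc; this just counts page starts.
          if (line.indexOf("<page>") >= 0) {
            pages++;
          }
        }
        System.out.println("pages seen: " + pages);
      } finally {
        reader.close();
      }
    }
  }

Nothing gets written back to disk, so it avoids the one-file-per-article step Steven mentioned; whether it has any bearing on the term-out-of-order error above is a separate question.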





