lucene-solr-user mailing list archives

From Sergey Bartunov <sbos....@gmail.com>
Subject Re: How to index long words with StandardTokenizerFactory?
Date Fri, 22 Oct 2010 19:18:10 GMT
I'm using Solr 1.4.1. I have now succeeded in replacing the lucene-core jar,
but maxTokenLength seems to be applied in a very strange way. Currently
it's set to 1024*1024 for me, but I couldn't index a field of only
~34 KB. I understand that it's a little weird to index such a large
value, but I just want to know why it doesn't work.

On 22 October 2010 20:36, Steven A Rowe <sarowe@syr.edu> wrote:
> Hi Sergey,
>
> I've opened an issue to add a maxTokenLength param to the StandardTokenizerFactory configuration:
>
>        https://issues.apache.org/jira/browse/SOLR-2188
>
> I'll work on it this weekend.
>
> Are you using Solr 1.4.1?  I ask because of your mention of Lucene 2.9.3.  I'm not
> sure there will ever be a Solr 1.4.2 release.  I plan on targeting Solr 3.1 and 4.0 for the
> SOLR-2188 fix.
>
> I'm not sure why you didn't get the results you wanted with your Lucene hack - is it
> possible you have other Lucene jars in your Solr classpath?
>
> Steve
>
>> -----Original Message-----
>> From: Sergey Bartunov [mailto:sbos.net@gmail.com]
>> Sent: Friday, October 22, 2010 12:08 PM
>> To: solr-user@lucene.apache.org
>> Subject: How to index long words with StandardTokenizerFactory?
>>
>> I'm trying to force Solr to index words whose length is more than 255
>> characters (this limit is DEFAULT_MAX_TOKEN_LENGTH in Lucene's
>> StandardAnalyzer.java) using StandardTokenizerFactory as a 'filter' tag
>> in the schema configuration XML. Specifying the maxTokenLength attribute
>> doesn't work.
>>
>> I tried a dirty hack: I downloaded the lucene-core-2.9.3 source,
>> changed DEFAULT_MAX_TOKEN_LENGTH to 1000000, built it into a jar,
>> and replaced the original lucene-core jar in Solr's /lib. But it seems
>> to have had no effect.
>
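
For reference, the configuration discussed in this thread might look roughly
like the sketch below once a Solr release includes the SOLR-2188 change. This
is an assumption-laden illustration, not a tested schema: the field type name
"text_long_tokens" and the value 100000 are made up for the example, and on
Solr 1.4.1 the maxTokenLength attribute is not read by
StandardTokenizerFactory, as the original message reports. Note that a
tokenizer factory is declared in a <tokenizer> element inside <analyzer>,
while <filter> elements are for token filters.

```xml
<!-- Hypothetical field type; the name and the limit are illustrative only. -->
<fieldType name="text_long_tokens" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- maxTokenLength is only honored by releases that include SOLR-2188;
         without it the tokenizer falls back to the 255-character default. -->
    <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="100000"/>
    <!-- Token filters go in <filter> elements after the tokenizer. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After a change like this, documents need to be reindexed before longer tokens
can show up in the index.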
