lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: How to index long words with StandardTokenizerFactory?
Date Sat, 23 Oct 2010 13:45:55 GMT
Did you delete the folder Jetty_0_0_0_0_8983_solr.war_** under apache-solr-1.4.1\example\work?
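If it helps, here is a rough sketch for checking whether such an extracted webapp copy is still sitting in the work directory. It assumes the default layout from the example (a "work" folder containing directories whose names start with "Jetty_"); the class name StaleWarCheck and the default path "example/work" are made up for illustration. If a directory shows up, Jetty may still be serving the old extracted war rather than the rebuilt one.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or Jetty.
public class StaleWarCheck {

    /** Returns the Jetty_* directories found directly under workDir (empty if none). */
    static List<Path> findExtractedWebapps(Path workDir) throws IOException {
        List<Path> found = new ArrayList<>();
        if (!Files.isDirectory(workDir)) {
            return found; // no work directory, so nothing can be stale
        }
        // The glob matches names such as Jetty_0_0_0_0_8983_solr.war_...
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(workDir, "Jetty_*")) {
            for (Path p : stream) {
                if (Files.isDirectory(p)) {
                    found.add(p);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default; pass the real work directory as the first argument.
        Path workDir = Paths.get(args.length > 0 ? args[0] : "example/work");
        for (Path p : findExtractedWebapps(workDir)) {
            System.out.println("extracted webapp still present: " + p);
        }
    }
}
```

Run it from the Solr install directory (or pass the path to the work folder) before restarting with "java -jar start.jar".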

--- On Sat, 10/23/10, Sergey Bartunov <sbos.net@gmail.com> wrote:

> From: Sergey Bartunov <sbos.net@gmail.com>
> Subject: Re: How to index long words with StandardTokenizerFactory?
> To: solr-user@lucene.apache.org
> Date: Saturday, October 23, 2010, 3:56 PM
> Here are all the files: http://rghost.net/3016862
> 
> 1) StandardAnalyzer.java, StandardTokenizer.java - patched files from
> lucene-2.9.3
> 2) I patch these files and build lucene by typing "ant"
> 3) I replace lucene-core-2.9.3.jar in solr/lib/ with my
> lucene-core-2.9.3-dev.jar that I'd just compiled
> 4) then I do "ant compile" and "ant dist" in the solr folder
> 5) after that I rebuild solr/example/webapps/solr.war with my new
> solr and lucene-core jars
> 6) I put my schema.xml in solr/example/solr/conf/
> 7) then I do "java -jar start.jar" in solr/example
> 8) index big_post.xml
> 9) try to find this document with "curl
> http://localhost:8983/solr/select?q=body:big*" (big_post.xml contains
> a long word bigaaaaa...aaaa)
> 10) solr returns nothing
> 
> On 23 October 2010 02:43, Steven A Rowe <sarowe@syr.edu> wrote:
> > Hi Sergey,
> >
> > What does your ~34kb field value look like?  Does StandardTokenizer
> > think it's just one token?
> >
> > What doesn't work?  What happens?
> >
> > Steve
> >
> >> -----Original Message-----
> >> From: Sergey Bartunov [mailto:sbos.net@gmail.com]
> >> Sent: Friday, October 22, 2010 3:18 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: How to index long words with StandardTokenizerFactory?
> >>
> >> I'm using Solr 1.4.1. I've now succeeded in replacing the lucene-core
> >> jar, but maxTokenValue seems to be used in a very strange way.
> >> Currently it's set to 1024*1024 for me, but I couldn't index a field
> >> of just ~34kb. I understand that it's a little weird to index such
> >> big data, but I just want to know why it doesn't work.
> >>
> >> On 22 October 2010 20:36, Steven A Rowe <sarowe@syr.edu> wrote:
> >> > Hi Sergey,
> >> >
> >> > I've opened an issue to add a maxTokenLength param to the
> >> > StandardTokenizerFactory configuration:
> >> >
> >> >        https://issues.apache.org/jira/browse/SOLR-2188
> >> >
> >> > I'll work on it this weekend.
> >> >
> >> > Are you using Solr 1.4.1?  I ask because of your mention of Lucene
> >> > 2.9.3.  I'm not sure there will ever be a Solr 1.4.2 release.  I plan
> >> > on targeting Solr 3.1 and 4.0 for the SOLR-2188 fix.
> >> >
> >> > I'm not sure why you didn't get the results you wanted with your
> >> > Lucene hack - is it possible you have other Lucene jars in your Solr
> >> > classpath?
> >> >
> >> > Steve
> >> >
> >> >> -----Original Message-----
> >> >> From: Sergey Bartunov [mailto:sbos.net@gmail.com]
> >> >> Sent: Friday, October 22, 2010 12:08 PM
> >> >> To: solr-user@lucene.apache.org
> >> >> Subject: How to index long words with StandardTokenizerFactory?
> >> >>
> >> >> I'm trying to force solr to index words whose length is more than 255
> >> >> characters (this constant is DEFAULT_MAX_TOKEN_LENGTH in lucene
> >> >> StandardAnalyzer.java) using StandardTokenizerFactory as 'filter' tag
> >> >> in schema configuration XML. Specifying the maxTokenLength attribute
> >> >> doesn't work.
> >> >>
> >> >> I tried a dirty hack: I downloaded the lucene-core-2.9.3 source,
> >> >> changed DEFAULT_MAX_TOKEN_LENGTH to 1000000, built it into a jar,
> >> >> and replaced the original lucene-core jar in solr /lib. But it seems
> >> >> to have had no effect.
> 
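To illustrate why the search in this thread comes back empty when the length limit is still in effect, here is a toy model of a length-limited tokenizer. It is only a sketch: it splits on whitespace and drops any token longer than the limit, which is much simpler than what Lucene's StandardTokenizer does, and the class and method names (MaxTokenLengthSketch, tokenize) are invented for this example. The point is just that a token over the limit never reaches the index, so a prefix query such as body:big* has nothing to match.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; this is not Lucene code.
public class MaxTokenLengthSketch {

    /** Splits text on whitespace and keeps only tokens of at most maxTokenLength characters. */
    static List<String> tokenize(String text, int maxTokenLength) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // Overlong tokens are silently skipped, so they are never indexed.
            if (!token.isEmpty() && token.length() <= maxTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build a word like "bigaaaa...": 3 + 300 = 303 characters.
        StringBuilder longWord = new StringBuilder("big");
        for (int i = 0; i < 300; i++) {
            longWord.append('a');
        }
        String post = "a short " + longWord + " post";

        // With a limit of 255 the long word is dropped.
        System.out.println(tokenize(post, 255).size() + " tokens kept with limit 255");
        // With a larger limit it survives and could be found by a prefix query.
        System.out.println(tokenize(post, 1024 * 1024).size() + " tokens kept with limit 1024*1024");
    }
}
```

If the patched limit were really being picked up at index time, the long word would be kept, which is why it is worth confirming that the patched jar is the one Solr actually loads.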


      
