lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: How to index long words with StandardTokenizerFactory?
Date Sat, 23 Oct 2010 14:53:57 GMT
I think you should put your new lucene-core-2.9.3-dev.jar in \apache-solr-1.4.1\lib
(replacing the original lucene-core jar), then build a new solr.war under
\apache-solr-1.4.1\dist, and copy that new solr.war to solr/example/webapps/solr.war
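
One way to confirm which Lucene jars actually ended up inside the rebuilt war is to list its entries. The following is only a minimal sketch using the standard java.util.zip API; the class name WarLuceneJars, the default war path, and the assumption that the Lucene jars sit under WEB-INF/lib/ with names starting with "lucene-" are illustrative and not taken from this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class WarLuceneJars {

    /**
     * Returns the names of all entries in the given war that look like
     * Lucene jars, i.e. WEB-INF/lib/lucene-*.jar (assumed layout).
     */
    static List<String> luceneJars(String warPath) throws IOException {
        List<String> found = new ArrayList<>();
        // A war file is an ordinary zip archive, so ZipFile can read it.
        try (ZipFile war = new ZipFile(warPath)) {
            Enumeration<? extends ZipEntry> entries = war.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith("WEB-INF/lib/lucene-") && name.endsWith(".jar")) {
                    found.add(name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real location as the first argument.
        String warPath = args.length > 0 ? args[0] : "example/webapps/solr.war";
        for (String jar : luceneJars(warPath)) {
            System.out.println(jar);
        }
    }
}
```

If the listing still shows the original lucene-core jar, or shows more than one lucene-core jar, the patched build is probably not the one being loaded.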

--- On Sat, 10/23/10, Sergey Bartunov <sbos.net@gmail.com> wrote:

> From: Sergey Bartunov <sbos.net@gmail.com>
> Subject: Re: How to index long words with StandardTokenizerFactory?
> To: solr-user@lucene.apache.org
> Date: Saturday, October 23, 2010, 5:45 PM
> Yes. I did. Won't help.
> 
> On 23 October 2010 17:45, Ahmet Arslan <iorixxx@yahoo.com> wrote:
> > Did you delete the folder Jetty_0_0_0_0_8983_solr.war_** under
> > apache-solr-1.4.1\example\work?
> >
> > --- On Sat, 10/23/10, Sergey Bartunov <sbos.net@gmail.com> wrote:
> >
> >> From: Sergey Bartunov <sbos.net@gmail.com>
> >> Subject: Re: How to index long words with StandardTokenizerFactory?
> >> To: solr-user@lucene.apache.org
> >> Date: Saturday, October 23, 2010, 3:56 PM
> >> Here are all the files: http://rghost.net/3016862
> >>
> >> 1) StandardAnalyzer.java, StandardTokenizer.java - patched files from
> >> lucene-2.9.3
> >> 2) I patch these files and build lucene by typing "ant"
> >> 3) I replace lucene-core-2.9.3.jar in solr/lib/ with my
> >> lucene-core-2.9.3-dev.jar that I'd just compiled
> >> 4) then I do "ant compile" and "ant dist" in the solr folder
> >> 5) after that I rebuild solr/example/webapps/solr.war with my new
> >> solr and lucene-core jars
> >> 6) I put my schema.xml in solr/example/solr/conf/
> >> 7) then I do "java -jar start.jar" in solr/example
> >> 8) index big_post.xml
> >> 9) try to find this document with "curl
> >> http://localhost:8983/solr/select?q=body:big*"
> >> (big_post.xml contains a long word bigaaaaa...aaaa)
> >> 10) solr returns nothing
> >>
> >> On 23 October 2010 02:43, Steven A Rowe <sarowe@syr.edu> wrote:
> >> > Hi Sergey,
> >> >
> >> > What does your ~34kb field value look like?  Does StandardTokenizer
> >> > think it's just one token?
> >> >
> >> > What doesn't work?  What happens?
> >> >
> >> > Steve
> >> >
> >> >> -----Original Message-----
> >> >> From: Sergey Bartunov [mailto:sbos.net@gmail.com]
> >> >> Sent: Friday, October 22, 2010 3:18 PM
> >> >> To: solr-user@lucene.apache.org
> >> >> Subject: Re: How to index long words with StandardTokenizerFactory?
> >> >>
> >> >> I'm using Solr 1.4.1. I've now succeeded in replacing the lucene-core
> >> >> jar, but maxTokenValue seems to be used in a very strange way.
> >> >> Currently it's set to 1024*1024 for me, but I couldn't index a field
> >> >> of just ~34kb. I understand that it's a little weird to index such
> >> >> big data, but I just want to know why it doesn't work.
> >> >>
> >> >> On 22 October 2010 20:36, Steven A Rowe <sarowe@syr.edu> wrote:
> >> >> > Hi Sergey,
> >> >> >
> >> >> > I've opened an issue to add a maxTokenLength param to the
> >> >> > StandardTokenizerFactory configuration:
> >> >> >
> >> >> >        https://issues.apache.org/jira/browse/SOLR-2188
> >> >> >
> >> >> > I'll work on it this weekend.
> >> >> >
> >> >> > Are you using Solr 1.4.1?  I ask because of your mention of Lucene
> >> >> > 2.9.3.  I'm not sure there will ever be a Solr 1.4.2 release.  I plan
> >> >> > on targeting Solr 3.1 and 4.0 for the SOLR-2188 fix.
> >> >> >
> >> >> > I'm not sure why you didn't get the results you wanted with your
> >> >> > Lucene hack - is it possible you have other Lucene jars in your Solr
> >> >> > classpath?
> >> >> >
> >> >> > Steve
> >> >> >
> >> >> >> -----Original Message-----
> >> >> >> From: Sergey Bartunov [mailto:sbos.net@gmail.com]
> >> >> >> Sent: Friday, October 22, 2010 12:08 PM
> >> >> >> To: solr-user@lucene.apache.org
> >> >> >> Subject: How to index long words with StandardTokenizerFactory?
> >> >> >>
> >> >> >> I'm trying to force solr to index words whose length is more than
> >> >> >> 255 symbols (this constant is DEFAULT_MAX_TOKEN_LENGTH in lucene
> >> >> >> StandardAnalyzer.java) using StandardTokenizerFactory as 'filter'
> >> >> >> tag in schema configuration XML. Specifying the maxTokenLength
> >> >> >> attribute doesn't work.
> >> >> >>
> >> >> >> I tried a dirty hack: I downloaded the lucene-core-2.9.3 source,
> >> >> >> changed DEFAULT_MAX_TOKEN_LENGTH to 1000000, built it into a jar,
> >> >> >> and replaced the original lucene-core jar in solr /lib. But it
> >> >> >> seems to have had no effect.
> >>