lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Richard Jones>
Subject Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)
Date Wed, 02 Nov 2005 17:20:23 GMT
> Ah, so the fact that "1" actually appears many times in the string you
> give Lucene is important.  Neat application!
> Sounds like the custom Analyzer (really a custom TokenStream) approach
> suggested by others may be the way for you to go.  If the information
> you get from the MySQL profiles is an artistid and count, you could code
> up an Analyzer to emit N "1" tokens in a row, rather than concatenate N
> "1"s together into a single string and then let QueryParser, et. al.,
> parse them back apart again.  Even if you get all N of the "1"s from
> MySQL, a custom Analyzer could emit them, one at a time, rather than the
> concatenate, parse-apart cycle.

Okay, this sounds like a good lead :) I'm going to delve a little deeper into 
the steps lucene takes to build the index and give this a try. 

Will post to this thread again when i've had a chance to give it a try!

Thanks for all the pointers

> --MDC
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message