lucene-solr-user mailing list archives

From MitchK <mitc...@web.de>
Subject Re: Minimum Should Match the other way round
Date Fri, 09 Apr 2010 12:57:08 GMT

Hoss,

before I run into any misunderstandings, I want to come back to the topic
first. Later I will have a look at some classes to find out whether some
other ideas that are not directly related to this topic (like
multi-word synonyms at query time) will work or not. I'm sorry for being
off-topic.

Chris Hostetter-3 wrote:
> 
> where the analyzer matters is in creating that numeric field at index time 
> ... hence my suggestion of having an analyzer chain that exactly matches 
> the field you are interested in, but ending with a TokenCountingFilter -- 
> it can take care of creating the "numeric-ish" (padded) field value when 
> the docs are indexed.
> 

Okay, as I understand it, you mean something like this:

    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="my.TokenCountingTokenFilter"/>

This fieldType should "store" (or rather, index) the number of tokens as
something like "005" for 5 tokens, right?
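
To make the "005" idea concrete, here is a minimal sketch in plain Python. It is not Lucene or Solr code; the function name padded_token_count and the width of 3 digits are assumptions chosen only for illustration. It shows what a counting filter at the end of the chain would have to emit, and why the padding matters once values are compared as strings:

```python
def padded_token_count(tokens, width=3):
    """Return the number of tokens as a zero-padded string, e.g. 5 -> "005".

    `width` is an assumed fixed field width; counts that do not fit are
    rejected instead of silently breaking the string ordering.
    """
    count = len(tokens)
    if count >= 10 ** width:
        raise ValueError("token count %d does not fit in %d digits" % (count, width))
    return str(count).zfill(width)


tokens = ["the", "secrets", "of", "the", "iphone"]
print(padded_token_count(tokens))   # 005

# Without padding, string order and numeric order disagree:
print(sorted(["10", "5"]))          # ['10', '5']
# With padding, they agree:
print(sorted(["010", "005"]))       # ['005', '010']
```

The padding is what lets a range over the indexed terms behave like a numeric "less than or equal" comparison.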

My problem is that I don't know how to query this field.
I understand what you mean by appending "Add +titleLen:[* TO
MAX_LEN]" to the query, but I don't know how to retrieve the MAX_LEN value for a
specific query, since in some cases it depends on which analyzer chain
is used for the tokenLen field.

For example: I think it makes sense to use a WordDelimiterFilter at the end
of my TokenFilter chain.
If my document is something like "The secrets of the iPhone 3G", then I want
to index it as "The secrets of the iPhone 3 G" (3G is going to be indexed as
two tokens).
This means that the document length increases by one token.
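
The difference between the two counts can be sketched in plain Python. This is a rough simulation, not the real WordDelimiterFilter: the function analyze is hypothetical, it handles ASCII letters and digits only, it ignores the synonym step, and it lowercases before splitting (as in the chain above, which is why "iPhone" is not split on its case change here).

```python
import re


def analyze(text):
    """Very rough stand-in for: whitespace tokenizer -> lowercase ->
    split on letter/digit boundaries (simplified word-delimiter step)."""
    tokens = []
    for raw in text.split():              # whitespace tokenizer
        lowered = raw.lower()             # lowercase filter
        # Split runs of letters from runs of digits, e.g. "3g" -> "3", "g".
        tokens.extend(re.findall(r"[a-z]+|[0-9]+", lowered))
    return tokens


title = "The secrets of the iPhone 3G"
print(len(title.split()))     # 6
print(analyze(title))         # ['the', 'secrets', 'of', 'the', 'iphone', '3', 'g']
print(len(analyze(title)))    # 7
```

So the whitespace count is 6, while the length that would be indexed is 7.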

However, maybe I misunderstood your point:
"- Pick MAX_LEN Based On Number Of Query Clauses From Super"
since I thought that the number of query clauses depends on the number of
whitespace characters in my query. If I am wrong and it depends on the output of my
analyzer chain, there is no problem. But I am not sure whether this is the
case.
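
Why the way MAX_LEN is picked matters can be shown with a small plain-Python sketch. The helper names range_clause and within_bound are hypothetical, and the model assumes that the range over the padded values is evaluated as a plain string comparison. It does not answer how the query parser actually counts clauses; it only shows the consequence of each possibility for the iPhone title, whose indexed length would be "007":

```python
def range_clause(max_len, field="titleLen", width=3):
    """Build the extra clause, e.g. 7 -> '+titleLen:[* TO 007]'."""
    return "+{}:[* TO {}]".format(field, str(max_len).zfill(width))


def within_bound(stored, max_len, width=3):
    """Model of the range check: padded strings compared as strings."""
    return stored <= str(max_len).zfill(width)


stored_len = "007"   # indexed length of "The secrets of the iPhone 3G"

# MAX_LEN taken from the whitespace count of the same text used as a query:
print(range_clause(6))               # +titleLen:[* TO 006]
print(within_bound(stored_len, 6))   # False

# MAX_LEN taken from the analyzed token count instead:
print(range_clause(7))               # +titleLen:[* TO 007]
print(within_bound(stored_len, 7))   # True
```

If MAX_LEN came from the whitespace count, the document would be filtered out even though the query text is identical to its title; with the analyzed count it stays in the result.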

Thank you for your help.

- Mitch
-- 
View this message in context: http://n3.nabble.com/Minimum-Should-Match-the-other-way-round-tp694867p708264.html