lucene-dev mailing list archives

From: Paul Cowan
Subject: Re: Searching in same position across multiple fields
Date: Wed, 17 Dec 2008 03:20:23 GMT

Hi Hoss,

Thanks for the reply. I've created a JIRA issue to track this --

> the initial thought was that just removing the 
> term1.field=term2.field assertion would allow something like this to work, 
> but i don't think anyone ever tried creating a patch w/tests to verify 
> it.
> I think it would be a great idea.

Great. I've implemented this in the first patch attached to the JIRA 
issue, including a test case. Rather than removing the assertion, I've 
brought in a specialized (very lightweight) subclass of SpanNearQuery -- 
I think the Javadoc should make it clear why (supporting multiple fields 
does screw with the semantics a little).
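
To make the "same position, different fields" idea concrete, here is a minimal sketch in plain Java. It is not the patch and does not use SpanNearQuery or any other Lucene class: the class CrossFieldPositions, the method samePosition, and the field and term names in the usage note are all hypothetical. It assumes the positions of each term in each field are already available as an in-memory map for a single document, and it only covers the exact-same-position case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, Lucene-free model of matching two terms at the same position in two fields. */
class CrossFieldPositions {

    /**
     * Returns, in ascending order, the positions at which termA occurs in fieldA
     * and termB occurs in fieldB. The index maps field name -> term -> positions
     * of that term within one document.
     */
    static List<Integer> samePosition(Map<String, Map<String, List<Integer>>> index,
                                      String fieldA, String termA,
                                      String fieldB, String termB) {
        // Start from the positions of termA, then keep only those shared with termB.
        Set<Integer> shared = new TreeSet<>(positions(index, fieldA, termA));
        shared.retainAll(positions(index, fieldB, termB));
        return new ArrayList<>(shared);
    }

    /** Positions of a term in a field, or an empty list if either is unknown. */
    private static List<Integer> positions(Map<String, Map<String, List<Integer>>> index,
                                           String field, String term) {
        Map<String, List<Integer>> terms =
                index.getOrDefault(field, Collections.<String, List<Integer>>emptyMap());
        return terms.getOrDefault(term, Collections.<Integer>emptyList());
    }
}
```

For example, if a "text" field holds fox at position 2 and a "pos" field holds NN at position 2, then samePosition(index, "text", "fox", "pos", "NN") returns [2]. Positions are only comparable across fields when the fields were tokenized in lockstep, which is presumably part of why multi-field support changes the usual semantics.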

> couldn't this be solved by an Analyzer that counts the token per fieldname 
> and implements getPositionIncrementGap as..
> 	int result = SOME_BIG_NUM - tokensSeenMap.get(fieldname);
> 	tokensSeenMap.put(fieldname, 0);
> 	return result;

It could, and we could always fall back to this. I've also put my own 
approach up as a patch against LUCENE-1494. If you're not happy with 
that implementation (it's quite lightweight, and shouldn't break 
Analyzer implementors), then we can do this in our own analyzer, as you 
suggest above.

The question is, though (I can't find any Javadoc etc. on this) -- is 
there an implicit assumption that, once set up, Analyzers are (or should 
be) thread-safe? Your suggestion would be hard to do in a thread-safe 
fashion without ThreadLocal maps or some such fun. Most Analyzers seem 
to be 'semi-thread-safe' or better -- i.e. Analyzer itself uses a 
ThreadLocal for the tokenStreams, KeywordAnalyzer keeps no state, 
StandardAnalyzer has state, but once it's set up it stays static (though 
there are no publication guarantees around it... hmm), etc. Bringing 
that level of state into an Analyzer seems risky.
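
For illustration, the ThreadLocal-map variant of the token-counting idea could look roughly like the following. This is a minimal sketch, assuming Java 8 or later, and it deliberately does not extend Lucene's Analyzer: AlignedGapCounter, tokenSeen and SLOT_WIDTH (standing in for SOME_BIG_NUM) are hypothetical names, and positionIncrementGap only mirrors the name of the getPositionIncrementGap hook.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, Lucene-free model of per-field token counting with per-thread state. */
class AlignedGapCounter {

    /** Hypothetical stand-in for SOME_BIG_NUM: positions reserved per field value. */
    static final int SLOT_WIDTH = 1000;

    // One counter map per thread, so concurrent indexing threads never share state.
    private final ThreadLocal<Map<String, Integer>> tokensSeen =
            ThreadLocal.withInitial(HashMap::new);

    /** Call once for every token emitted for the given field on the current thread. */
    void tokenSeen(String fieldName) {
        tokensSeen.get().merge(fieldName, 1, Integer::sum);
    }

    /**
     * Returns the gap that pads the value just finished out to SLOT_WIDTH positions,
     * then resets the current thread's count for that field to zero.
     */
    int positionIncrementGap(String fieldName) {
        Map<String, Integer> counts = tokensSeen.get();
        int seen = counts.getOrDefault(fieldName, 0);
        counts.put(fieldName, 0);
        return SLOT_WIDTH - seen;
    }
}
```

Each thread gets its own counter map, so no locking is needed. The sketch does not reset counts between documents, and it returns a negative gap if a value has more than SLOT_WIDTH tokens; a real analyzer would need to handle both.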

Anyway, please do check out the JIRA issue and let me know what you 
think. I think both issues are addressed relatively cleanly.


