lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rene Hackl-Sommer <rene.a.ha...@gmx.de>
Subject Re: Increase number of available positions?
Date Mon, 15 Mar 2010 13:59:51 GMT

> Is your entire corpus a single document? Because I'm having trouble
> imagining a single document where this would be a problem, unless
> your increment gap is huge. The term positions are relative to
> a single document...
>    

It is getting pretty huge, yes (see below). The term positions are also 
relative to a single field, aren't they?

>> <MyField>
>> <Level_1>
>> <Level_2>
>> <Level_3>
>>
>>      
Let me plug in some figures to help clarify. On Level 3 there are 
hundreds of tokens. So to be able to search two or more terms in MyField 
in the same Level_3, I put a position gap of 1000 between all Level_3's. 
Per Level_2 there might be hundreds of Level_3 entries. As I want to 
restrict the search to all Level_3 entries of a Level_2, I set the 
position increment gap for Level_2 at 1000x1000 = 1,000,000 (1000 for 
the Tokens in Level_3 and 1000 for the Level_3 entries in Level_2).

This done, Level_1 still needs to be accomodated. If you're looking at 
500 Level_2 entries, a gap of 1,000,000x500 is needed per Level_1 entry, 
to be able to search only within each of the Level_1 elements.That way 
only four Level_1 entries can be included before the maximum value is 
reached.

Queries I am looking to support might look like this in an easy case:

Search in MyField: Terms T1 and T2 on Level_2 and T3, T4, and T5 on 
Level_3, which should both be in the same Level_1.

Sorry if this is confusing, what with all these levels going on. I think 
what it comes down to is whether the integer based position counting 
might be replaced by long. Can this be done at all? Are performance or 
other implications conceivable? Or is the current implementation too 
deeply wired to Lucene core workings to make this a reasonable endeavour?

Cheers
Rene

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message