jackrabbit-dev mailing list archives

From Christoph Kiehl ...@sulu3000.de>
Subject Re: improving the scalability in searching
Date Wed, 15 Aug 2007 07:29:02 GMT
Marcel Reutegger wrote:
> Ard Schrijvers wrote:
>> IMO, we should index more (derived) data about a document's properties
>> (I'll return to this in a mail about IndexingConfiguration, to which I
>> think we can add some features that might tackle this) if we want to be
>> able to query fast. For this specific problem, the solution would be
>> very simple:
>>
>> I suggest to add
>>
>> /**
>>  * Name of the field that contains all available properties that are
>>  * present for a certain node
>>  */
>> public static final String PROPERTIES_SET = "_:PROPERTIES_SET".intern();
>>
>> and when indexing a node, each property name of that node is added to its
>> index (few lines of code in NodeIndexer):
>>
>> Then, searching for all nodes that have a property is one single
>> docs.seek(terms); plus setting the docFilter. This approach scales to
>> millions of documents easily, with times close to 0 ms. WDYT? Of course,
>> I can implement this in the trunk.
> 
> I agree with you that the current implementation is not optimized for
> queries that check the existence of a property. Your proposed solution
> seems reasonable, I would implement it the same way. There's just one
> minor obstacle: how do we implement this change in a backward compatible
> way? An existing index without this additional field should still work.

We could use IndexReader.getFieldNames() at startup to check whether such
a field already exists. If it does, the index is in the new format, and
MatchAllScorer can use this information to decide which implementation
to use.
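
A rough sketch of the check, not actual Jackrabbit code. To keep it
self-contained it takes a plain collection of field names where the
result of IndexReader.getFieldNames() would go. The Strategy enum, the
class name and the method name are made up for illustration, and the
other field names in main() are only sample values.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class IndexFormatCheck {

    /** Field name proposed earlier in this thread. */
    static final String PROPERTIES_SET = "_:PROPERTIES_SET";

    /** Hypothetical labels for the two code paths in MatchAllScorer. */
    enum Strategy { PROPERTIES_SET_LOOKUP, LEGACY_SCAN }

    /**
     * Decides which implementation to use. The fieldNames argument stands
     * in for the field names reported by the IndexReader at startup.
     */
    static Strategy chooseStrategy(Collection<String> fieldNames) {
        return fieldNames.contains(PROPERTIES_SET)
                ? Strategy.PROPERTIES_SET_LOOKUP   // index in the new format
                : Strategy.LEGACY_SCAN;            // existing index, old format
    }

    public static void main(String[] args) {
        // Sample field names only; a real index would report its own.
        Set<String> oldIndex = new HashSet<>(Arrays.asList("_:UUID", "_:PARENT"));
        Set<String> newIndex = new HashSet<>(oldIndex);
        newIndex.add(PROPERTIES_SET);

        System.out.println("old index: " + chooseStrategy(oldIndex));
        System.out.println("new index: " + chooseStrategy(newIndex));
    }
}
```

Note that this only tells us the field appears somewhere in the index. It
assumes an index is either entirely in the old format or entirely in the
new one.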

Cheers,
Christoph

