lucene-java-user mailing list archives

From SUJIT PAL <sujit....@comcast.net>
Subject Re: Statically store sub-collections for search (faceted search?)
Date Fri, 12 Apr 2013 18:08:33 GMT
Hi Carsten,

Why not keep your BooleanQuery idea but wrap it in a Filter instead? Since you are only
filtering and not scoring, the max boolean clauses limit should not apply to a filter.
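
As a rough sketch of the filtering idea in plain Java (this is not Lucene API): resolve the
stored external IDs to the current reader's doc numbers once, keep them in a bit set, and
test each hit against it. The class name SubsetFilterSketch, both method names, and the
idToDocNum map (standing in for a lookup on the 'id' field) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java illustration only; none of these names are Lucene API. */
final class SubsetFilterSketch {

    /**
     * Resolve stored external IDs to the doc numbers of the currently open
     * reader. The map is a hypothetical stand-in for a lookup on the 'id' field.
     */
    static BitSet buildAllowedBits(Map<String, Integer> idToDocNum,
                                   Set<String> storedIds, int maxDoc) {
        BitSet allowed = new BitSet(maxDoc);
        for (String id : storedIds) {
            Integer docNum = idToDocNum.get(id);
            if (docNum != null) { // the ID may no longer exist in the index
                allowed.set(docNum);
            }
        }
        return allowed;
    }

    /** Keep only the hits whose doc number is in the allowed set. */
    static List<Integer> restrict(int[] hits, BitSet allowed) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : hits) {
            if (allowed.get(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> idToDocNum = new HashMap<>();
        idToDocNum.put("Doc1", 0);
        idToDocNum.put("Doc2", 3);
        idToDocNum.put("Doc3", 5);

        // IDs loaded from the database; "Gone" is no longer in the index.
        Set<String> storedIds = new HashSet<>(Arrays.asList("Doc1", "Doc3", "Gone"));

        BitSet allowed = buildAllowedBits(idToDocNum, storedIds, 8);
        int[] hits = {0, 1, 3, 5}; // doc numbers matched by the second search
        System.out.println(restrict(hits, allowed)); // prints [0, 5]
    }
}
```

Only the external IDs are persisted here. The bit set is rebuilt from them whenever the
index is reopened, so the ephemeral doc numbers never end up in the database, and the
per-hit check is a single bit lookup instead of a stored-field fetch.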

-sujit

On Apr 12, 2013, at 7:34 AM, Carsten Schnober wrote:

> Dear list,
> I would like to create a sub-set of the documents in an index that is to
> be used for further searches. However, the criteria that lead to the
> creation of that sub-set are not predefined so I think that faceted
> search cannot be applied to this use case.
> 
> For instance:
> A user searches for documents that contain token 'A' in a field 'text'.
> These results form a set of documents that is persistently stored (in a
> database). Each document in the index has a field 'id' that identifies
> it, so these "external" IDs are stored in the database.
> 
> Later on, a user loads the document IDs from the database and wants to
> execute another search on this set of documents only. However,
> performing a search on the full index and subsequently filtering the
> results against that list of documents takes very long if there are many
> matches. This is obvious as I have to retrieve the external id from each
> matching document and check whether it is part of the desired sub-set.
> Constructing a BooleanQuery in the style "id:Doc1 OR id:Doc2 ..." is not
> suitable either because there could be thousands of documents exceeding
> any limit for Boolean clauses.
> 
> Any suggestions how to solve this? I would have gone for the Lucene
> document numbers and store them as a bit set that I could use as a
> filter during later searches, but I read that the document numbers are
> ephemeral.
> 
> One possible way out seems to be to create another index from the
> documents that have matched the initial search, but this seems like
> overkill, especially if there are plenty of them...
> 
> Thanks for any hint!
> Carsten
> 
> -- 
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP                 | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
> Korpusanalyseplattform der nächsten Generation
> Next Generation Corpus Analysis Platform
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 

