lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sameer Maggon <mag...@gmail.com>
Subject Re: Faceted search with OpenBitSet/SortedVIntList
Date Sat, 07 Feb 2009 19:18:20 GMT
Did you look at Solr? It provides faceted search out of the box and is  
built on top of Lucene.

Sameer.

On Feb 7, 2009, at 10:57 AM, Raffaella Ventaglio  
<r.ventaglio@gmail.com> wrote:

> Hi,
>
> I am trying to implement a kind of faceted search using Lucene 2.4.0.
>
> I have a list of configuration rules that tell me how to generate this
> facets and the corresponding queries (that can range from simple term
> queries to complex boolean queries).
>
> When my application starts, it creates the whole set of facets  
> objects and
> initializes them.
> For each facet:
> - I create the query according to the configured rule;
> - I ask the reader for the bitset corresponding to that query and I  
> store it
> in the Facet object;
> - I get the cardinality of the bitset and I save it in the Facet  
> object as
> its "initial count".
>
> When the user does a search I have to update the "counts" associated  
> to each
> Facet:
> - I get the bitset corresponding to the "query + filter" generated  
> by the
> user search;
> - I get the cardinality of the ("search bitset" AND "facet bitset")  
> and I
> save it as the updated count.
>
>
> In my first solution, I used only "OpenBitSetDISI" objects, both for  
> Facet
> bitset and for search bitset.
> So I could use "intersectionCount" method to get updated counts  
> after user
> search.
>
> This works very well and it is very fast, but when the number of  
> documents
> in the index and the number of facets grows it is too memory  
> consuming.
>
>
> So I tried a different solution: when I create facet bitsets I use  
> the same
> rule applied in ChainedFilter/BooleanFilter to decide if I have to  
> store an
> OpenBitSet or a SortedVIntList.
> When I have to calculate updated counts:
> - if the facet has an OpenBitSet, I use the "intersectionCount" method
> directly;
> - if the facet has a SortedVIntList, I first create a new  
> OpenBitSetDISI
> using the SortedVIntList.iterator and then I use the  
> "intersectionCount"
> method.
>
> In this way, I use a smaller amount of memory at initialization  
> time, but
> for each user search I create a large number of objects (that I  
> suddenly
> throw away) and this affects application performance because it  
> wastes a lot
> of time doing GC.
>
> So my question is: is there a better way to accomplish this task?
>
> I think, it would be fine if I could calculate "intersectionCount"  
> directly
> on SortedVIntList objects, but I have not found nothing like that in  
> Lucene
> 2.4 JavaDoc.
> Am I missing something?
>
>
> As a reference, now my index contains more than 500.000 documents  
> and I have
> to create/manage up to 50.000 facets.
> Using "second solution", at initialization time my facets structure  
> requires
> more or less 120MB (and this is good enough), while updating counts  
> it uses
> even 2GB of memory (and this is very bad).
>
> Thanks in advance,
> Raf

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message