lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yonik Seeley" <ysee...@gmail.com>
Subject Re: Aggregating category hits
Date Sat, 10 Jun 2006 20:19:17 GMT
On 6/10/06, zzzzz shalev <zzzzz_shalev@yahoo.com> wrote:
>   1. could you let me know what kind of response time you were getting with solr (as
well as the size of data and result sizes)

A can tell you a little bit about ours... on one CNET faceted browsing
implementation using Solr, the number of facets to check per request
average somewhere between 100 and 200 (the total number of unique
facets is much larger though).  The median request time is 3ms (and I
don't think the majority of that time is calculating set
intersections).

We actually don't have the LRUCaches set large enough to achieve a
100% hit rate, but performance is still fine.

>   2. i took a really really quick look at DocSetHitCollector  and saw the dreaded
>
>   if (bits==null) bits = new BitSet(maxDoc);

Yes, DocSets can be memory intensive.  A BitSet is only used when the
number of results gets larger than a threshold... below that, a
HashDocSet is used that is O(n) rather than O(maxDoc).  So the memory
footprint also depends on the cardinality of the sets.

>   since i rewrote some lucene code to support 64-bit search instances i have indexes
that may reach quite a few GB's ,

GBs of index size, or actually billions of documents.  It's the number
of documents that matters in this case.

> allocating bitset's (arrays of long's is quite expensive memory wise and i am still a
little
> skeptical about performance with large result sets)

I just checked in a replacement for BitSet that takes intersection
counts much faster.

>   i did some testing of my facet impl and after an overnight webload session received
about a 500 milli response time average for full faceting (with result sets from a few thousand
to over 100,000)

How many documents was that with, and how many facets per document?

I certainly am interested in more memory efficient faceted browsing,
and have been meaning to try some alternatives.  So far, we've had
good results using cached DocSets though.


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message