lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Henrik Hjalmarsson" <henrik.hjalmars...@raa.se>
Subject Sv: Re: Facets
Date Thu, 05 Nov 2009 16:03:30 GMT
>>> Toke Eskildsen  09-11-03 14:43 >>>
On Tue, 2009-11-03 at 10:23 +0100, Henrik Hjalmarsson wrote:
> I have gotten a demand for an API method that returns an XML response,
> listing all the indexes in this application and the number of unique
> values these indexes can have, filtered by a query that is recieved in
> the method request.

We've had the same request a number of times, but when we discuss the
scenarios in detail, they can always be scaled down to "The first X
values", instead of "all values", where X is < 1000.


While you can build efficient handling of the faceting on fields with
many terms, simply returning the Strings for the terms (ignoring all the
grunt work of extraction) poses problems.

5 million values of 20 characters each takes up about 
 5M * (20 * 2 + ~40) bytes ~ 400MByte
of RAM. If you wrap that in nice XML and send it using SOAP, memory
usage goes through the roof. Streaming, as Ian suggests, seems to be the
answer here.

> The application contains a large amount of indexes and some indexes
> contains a very large amount of unique values. Is there some way to
> achive this in an effective way?

It is definitely possible in the case where you limit the number of
returned values. Well, at least we've tested it for 1000M unique values
in 100M documents. But before we go there, it would help to know what
you mean by "large".

Ok. To be honest, I don't know what is considered "large" for Lucene. But its significanly
larger than what examples I've managed to find so far. Its roughly around 100 different indexes
at the moment and every index (from what I know) can have everything from 2 unique values
up to 50 000 or 200 000 unique values.

Regards,
Toke Eskildsen


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message