lucene-java-user mailing list archives

From Igor Shalyminov <ishalymi...@yandex-team.ru>
Subject Re: How to use concurrency efficiently
Date Tue, 02 Apr 2013 19:58:06 GMT
These are not document hits but text hits (spans, to be more specific).
The search result has to report the exact number of document hits and text hits,
plus a relatively small number of matched text snippets.
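
To make the distinction concrete, here is a minimal sketch in plain Java. It is not Lucene code: it assumes a toy in-memory positional index, and the names HitCounts, postings, countDocHits and countTextHits are hypothetical, chosen only for illustration. A document counts once as a document hit, but every matching position inside it counts as a text hit, which is how a text-hit total can far exceed the number of documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration only (assumed structure, not Lucene's API).
final class HitCounts {

    // Hypothetical positional postings for one term:
    // docId -> positions at which the term occurs in that document.
    static Map<Integer, int[]> postings() {
        Map<Integer, int[]> p = new LinkedHashMap<>();
        p.put(0, new int[] {3, 17, 42});
        p.put(5, new int[] {8});
        p.put(9, new int[] {1, 2});
        return p;
    }

    // Document hits: each matching document is counted once.
    static int countDocHits(Map<Integer, int[]> postings) {
        return postings.size();
    }

    // Text hits: every matching position in every document is counted.
    static long countTextHits(Map<Integer, int[]> postings) {
        long total = 0;
        for (int[] positions : postings.values()) {
            total += positions.length;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, int[]> p = postings();
        System.out.println("document hits: " + countDocHits(p));
        System.out.println("text hits: " + countTextHits(p));
    }
}
```

In this toy data three documents match, while the text-hit count is the sum of their position lists. Getting that sum means visiting every position, not just every document.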

I've tried several approaches to optimizing the search algorithm, but they didn't help: for
certain types of queries there really is a large amount of data to retrieve from
the index.
At the moment I'm thinking about caching posting lists in RAM. Is that possible in Lucene?

-- 
Igor

02.04.2013, 20:44, "Adrien Grand" <jpountz@gmail.com>:
> On Tue, Apr 2, 2013 at 4:39 PM, Igor Shalyminov
> <ishalyminov@yandex-team.ru> wrote:
>
>>  Yes, the number of documents is not too large (about 90 000), but the queries are
>>  very hard. Although they're just boolean, a typical query can produce a result with tens of
>>  millions of hits.
>
> How can there be tens of millions of hits with only 90000 docs?
>
>>  Run in a single thread, such a query takes ~20 seconds, which is too slow. Therefore,
>>  multithreading is vital for this task.
>
> Indeed, that's super slow. Multithreading could help a little, but
> maybe there is something to do to better index your data so that
> queries get faster?
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
