lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: BitSet in Filters
Date Tue, 12 Aug 2014 15:11:37 GMT
bq: Unless, I can cache these filters in memory, the cost of constructing
this filter at run time per query is not practical

Why do you say that? Do you have evidence? Because lots and lots of Solr
installations do exactly this and they run fine.

So I suspect there's something you're not telling us about your setup. Are
you, say, soft committing often? Do you have autowarming specified?

You're not going to be able to base your filters on some other field
in the document. Internally, Lucene uses the internal doc ID as the index
into the bitset. That's baked into very low levels and isn't going to
change AFAIK.
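
To make the doc ID point concrete, here is a minimal sketch in plain Java.
It does not use any Lucene classes: the class DocIdFilterSketch, the
tagByDocId array and the indexVersion key are all hypothetical names made
up for illustration. It only simulates the idea that bit i of a filter
refers to whatever document currently sits at position i, so a cached
bitset is valid only for the index state it was built from.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Plain-Java simulation of doc-ID-indexed filters; not Lucene API. */
class DocIdFilterSketch {

    /**
     * Builds one bitset per tag. Bit i is set when the document at
     * position i (standing in for the internal doc ID) carries that tag.
     */
    static Map<String, BitSet> buildTagFilters(String[] tagByDocId) {
        Map<String, BitSet> filters = new HashMap<>();
        for (int docId = 0; docId < tagByDocId.length; docId++) {
            filters
                .computeIfAbsent(tagByDocId[docId], t -> new BitSet(tagByDocId.length))
                .set(docId);
        }
        return filters;
    }

    // Cache keyed by a hypothetical index version number (an assumption,
    // standing in for "the state of the index the bits were built from").
    private final Map<Long, Map<String, BitSet>> cache = new HashMap<>();

    /**
     * Returns the cached filter for this index version, building all tag
     * filters once the first time a version is seen.
     */
    BitSet filterFor(long indexVersion, String[] tagByDocId, String tag) {
        Map<String, BitSet> filters =
            cache.computeIfAbsent(indexVersion, v -> buildTagFilters(tagByDocId));
        return filters.getOrDefault(tag, new BitSet());
    }
}
```

In this toy model, asking for the "Finance" filter twice against the same
version returns the same cached bitset, while a new version (for example
after a delete shifts documents to new positions) triggers a rebuild. That
is the trade-off in miniature: the bits are cheap to cache and reuse, but
only as long as they are tied to the doc IDs they were computed from.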

Best,
Erick


On Mon, Aug 11, 2014 at 11:53 PM, Sandeep Khanzode <sandeep_khanzode@yahoo.com.invalid> wrote:

> Hi,
>
> The current usage of BitSets in filters in Lucene is limited to applying
> only on docIDs i.e. I can only construct a filter out of a BitSet if I have
> the DocumentIDs handy.
>
> However, with every update/delete i.e. CRUD modification, these will
> change, and I have to again redo the whole process to fetch the latest
> docIDs.
>
> Assume a scenario where I need to tag millions of documents with a tag
> like "Finance", "IT", "Legal", etc.
>
> Unless, I can cache these filters in memory, the cost of constructing this
> filter at run time per query is not practical. If I could map the documents
> to a numeric long identifier and put them in a BitMap, I could then cache
> them because the size reduces drastically. However, I cannot use this
> numeric long identifier in Lucene filters because it is not a docID but
> another regular field.
>
> Please help with this scenario. Thanks,
>
> -----------------------
> Thanks n Regards,
> Sandeep Ramesh Khanzode
