lucene-java-user mailing list archives

From Sandeep Khanzode
Subject BitSet in Filters
Date Tue, 12 Aug 2014 06:53:08 GMT
Lucene's current use of BitSets in filters is limited to docIDs, i.e., I can only
construct a filter from a BitSet if I have the document IDs at hand.

However, with every update or delete (i.e., any CRUD modification), these docIDs change, and
I have to redo the whole process to fetch the latest docIDs.

Assume a scenario where I need to tag millions of documents with a tag like "Finance", "IT",
"Legal", etc.

Unless I can cache these filters in memory, the cost of constructing a filter at run time
for every query is impractical. If I could map the documents to a numeric long identifier and
put those identifiers in a bitmap, I could cache the filters, because the size reduces
drastically. However, I cannot use this numeric long identifier in Lucene filters, because it
is not a docID but just another regular field.
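
To make the idea concrete, here is a minimal sketch of the kind of cache I have in mind, using
only java.util.BitSet and no Lucene classes. The class name TagBitSetCache and its methods are
hypothetical, and the sketch assumes the identifiers fit in a non-negative int, since
java.util.BitSet is indexed by int rather than long. It does not solve the actual problem of
turning these identifiers into a Lucene filter; it only shows what would be cached.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory cache: one BitSet per tag, indexed by an
 * application-assigned stable identifier (NOT a Lucene docID).
 * Assumption: identifiers are non-negative and fit in an int.
 */
class TagBitSetCache {
    private final Map<String, BitSet> bitsByTag = new HashMap<>();

    /** Marks the document with the given stable identifier as carrying the tag. */
    void tag(String tag, int stableId) {
        bitsByTag.computeIfAbsent(tag, t -> new BitSet()).set(stableId);
    }

    /** Removes the tag from the document, if it was set. */
    void untag(String tag, int stableId) {
        BitSet bits = bitsByTag.get(tag);
        if (bits != null) {
            bits.clear(stableId);
        }
    }

    /** Returns true if the document with this identifier carries the tag. */
    boolean hasTag(String tag, int stableId) {
        BitSet bits = bitsByTag.get(tag);
        return bits != null && bits.get(stableId);
    }

    /** Returns how many identifiers carry the tag. */
    int count(String tag) {
        BitSet bits = bitsByTag.get(tag);
        return bits == null ? 0 : bits.cardinality();
    }
}
```

Because each identifier costs one bit, a dense range of one million identifiers takes roughly
125 KB per tag, and the bits stay valid across updates and deletes as long as the identifiers
themselves are stable.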

Please help with this scenario.

Thanks and regards,
Sandeep Ramesh Khanzode