lucene-java-user mailing list archives

From Geebee Coder <>
Subject Using Lucene to model ownership of documents
Date Wed, 15 Jun 2016 13:25:24 GMT
Hi there,
I would like to use Lucene to solve the following problem:

1. We have about 100k customers and 25 million documents.

2. When a customer performs a text search over the document space, we want to
return only the documents that this customer has access to.

3. The number of documents a customer owns varies a lot: some own close to 23
million, some close to 10k, some about a third of the documents, etc.

What is an efficient way to use Lucene in this scenario, in terms of both
search performance and indexing?
We have tried a number of solutions, such as:

 a) 100k boolean fields per document, each indicating whether a given customer
has access to the document.
 b) A single text field listing the customers who own the document,
e.g. (customers field: "abc abd cfx...").
 c) Option (b), with the index sharded by customer.
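To make option (b) concrete, here is a plain-Java model of what it amounts to inside an inverted index: each customer ID becomes a term whose postings list holds the documents that customer owns, and a search is the intersection of the text-match postings with that customer's postings. This is a hypothetical sketch using java.util.BitSet, not Lucene's API; the class and method names (OwnershipFilterSketch, addDocument, search) are illustrative only. In Lucene itself the comparable setup would be an untokenized owner field plus a term query used as a non-scoring filter clause alongside the text query.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OwnershipFilterSketch {
    // Hypothetical model: one postings list (BitSet of doc IDs) per owner term...
    final Map<String, BitSet> ownerPostings = new HashMap<>();
    // ...and one postings list per text term.
    final Map<String, BitSet> textPostings = new HashMap<>();

    void addDocument(int docId, Set<String> owners, List<String> words) {
        for (String owner : owners) {
            ownerPostings.computeIfAbsent(owner, k -> new BitSet()).set(docId);
        }
        for (String w : words) {
            textPostings.computeIfAbsent(w, k -> new BitSet()).set(docId);
        }
    }

    // Documents matching the word AND owned by the customer:
    // a copy of the text postings intersected with the owner postings.
    BitSet search(String word, String customer) {
        BitSet result = (BitSet) textPostings.getOrDefault(word, new BitSet()).clone();
        result.and(ownerPostings.getOrDefault(customer, new BitSet()));
        return result;
    }

    public static void main(String[] args) {
        OwnershipFilterSketch idx = new OwnershipFilterSketch();
        idx.addDocument(0, Set.of("abc", "abd"), List.of("lucene", "search"));
        idx.addDocument(1, Set.of("cfx"), List.of("lucene", "index"));
        idx.addDocument(2, Set.of("abc"), List.of("index"));
        System.out.println(idx.search("lucene", "abc")); // {0}
        System.out.println(idx.search("index", "abc"));  // {2}
        System.out.println(idx.search("lucene", "zzz")); // {}
    }
}
```

The sketch also shows why (b) inflates the index: a document owned by many customers contributes one posting per owner, so a customer owning ~23 million documents has a postings list nearly as long as the whole index.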

Both search and indexing performance for (a) were bad. (b) and (c) performed
better for search, but increased indexing time and index size.
We are also considering a custom filter, but we are concerned about its
memory requirements.
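The memory concern can be put in numbers with some back-of-the-envelope arithmetic. A dense per-customer bitset needs one bit per document in the index, so 25 million documents cost about 3.1 MB per customer, and caching one for each of 100k customers would need roughly 312.5 GB. A sorted int array of owned document IDs costs 4 bytes per owned document instead, which is smaller only for customers owning fewer than 781,250 documents (1/32 of the index). The sketch below is a hypothetical illustration of that trade-off; the class and method names are assumptions, and it ignores per-segment splitting and object overhead.

```java
public class FilterMemorySketch {
    static final long NUM_DOCS = 25_000_000L;
    static final long NUM_CUSTOMERS = 100_000L;

    // Dense bitset: one bit per indexed document, regardless of ownership.
    static long denseBytes(long numDocs) {
        return (numDocs + 7) / 8;
    }

    // Sparse representation: a sorted int[] of owned doc IDs, 4 bytes each.
    static long sparseBytes(long ownedDocs) {
        return ownedDocs * 4;
    }

    // Choose whichever representation is smaller for a given customer.
    static long cheaperBytes(long ownedDocs, long numDocs) {
        return Math.min(denseBytes(numDocs), sparseBytes(ownedDocs));
    }

    public static void main(String[] args) {
        long perCustomer = denseBytes(NUM_DOCS);
        System.out.println("Dense bitset per customer: " + perCustomer + " bytes"); // 3125000
        System.out.println("Dense bitsets, all customers: " + perCustomer * NUM_CUSTOMERS + " bytes"); // 312500000000
        // Ownership sizes taken from the examples in point 3.
        long[] owned = {23_000_000L, 10_000L, NUM_DOCS / 3};
        for (long n : owned) {
            System.out.println(n + " owned docs -> " + cheaperBytes(n, NUM_DOCS) + " bytes");
        }
    }
}
```

Mixing representations by ownership size is the same kind of dense/sparse trade-off that compressed bitmap formats such as Roaring make; the total cost for all customers still depends on the actual ownership distribution, which is not given here.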

Any ideas/suggestions would be really appreciated.
