lucene-java-user mailing list archives

From Denis Bazhenov <>
Subject Re: Using Lucene to model ownership of documents
Date Fri, 17 Jun 2016 01:27:12 GMT
The speed of a and b should be the same, at least from a conceptual point of view. The number
of terms generated in each scenario is equal. Therefore, the index size and vocabulary size should
also be the same.
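
To make the term-count argument concrete, here is a toy sketch in plain Java. It is not Lucene
code: the class, the method names, the "owner_" field prefix and the sample data are all made up
for illustration. It only counts (field, term, document) entries under each layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class OwnershipPostings {

    // Scenario a: one boolean field per customer ("owner_abc" -> "true").
    // Each posting is {field, term, docId}.
    static List<String[]> scenarioA(Map<Integer, List<String>> ownersByDoc) {
        List<String[]> postings = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : ownersByDoc.entrySet()) {
            for (String customer : e.getValue()) {
                postings.add(new String[] {"owner_" + customer, "true", String.valueOf(e.getKey())});
            }
        }
        return postings;
    }

    // Scenario b: a single "customers" field holding one term per owning customer.
    static List<String[]> scenarioB(Map<Integer, List<String>> ownersByDoc) {
        List<String[]> postings = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : ownersByDoc.entrySet()) {
            for (String customer : e.getValue()) {
                postings.add(new String[] {"customers", customer, String.valueOf(e.getKey())});
            }
        }
        return postings;
    }

    // Number of distinct field names used by the postings.
    static int distinctFields(List<String[]> postings) {
        Set<String> fields = new HashSet<>();
        for (String[] p : postings) {
            fields.add(p[0]);
        }
        return fields.size();
    }

    // Number of distinct (field, term) pairs, i.e. the vocabulary size.
    static int distinctTerms(List<String[]> postings) {
        Set<String> terms = new HashSet<>();
        for (String[] p : postings) {
            terms.add(p[0] + ":" + p[1]);
        }
        return terms.size();
    }

    // Hypothetical sample: three documents, three customers.
    static Map<Integer, List<String>> sample() {
        Map<Integer, List<String>> docs = new LinkedHashMap<>();
        docs.put(1, Arrays.asList("abc", "abd"));
        docs.put(2, Arrays.asList("abc"));
        docs.put(3, Arrays.asList("abd", "cfx"));
        return docs;
    }

    public static void main(String[] args) {
        List<String[]> a = scenarioA(sample());
        List<String[]> b = scenarioB(sample());
        System.out.println("a: postings=" + a.size() + " terms=" + distinctTerms(a)
                + " fields=" + distinctFields(a));
        System.out.println("b: postings=" + b.size() + " terms=" + distinctTerms(b)
                + " fields=" + distinctFields(b));
    }
}
```

In this toy model the posting and term counts match, and the only thing that differs is the
number of distinct fields (one per customer in a, a single field in b). That is consistent with
the per-field penalty guessed at below, but it does not prove it.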

I’m wondering why there is a difference. It seems there is some penalty for writing and reading
terms across many different fields, but I can’t elaborate on that. Could you provide the index size
for scenarios a and b?

Scenario c could be the fastest in terms of search and indexing speed, but it’s far more
complex and makes sense only if you need to scale your system, which implies you can’t
solve the problem on a single box.

So, if there is no need for scaling, I’d go with b because of its simplicity.
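
As a rough illustration of what option b amounts to at search time, here is a minimal sketch in
plain Java. It is a toy inverted index, not Lucene: the class name, the "body" field and the
sample documents are assumptions, and only the "customers" field name comes from the original
message. A search intersects the posting list of the text term with the posting list of the
customer term.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class OwnerFilteredSearch {

    // "field:term" -> set of document ids containing that term in that field.
    private final Map<String, BitSet> postings = new HashMap<>();

    // Index a document's text and its whitespace-separated list of owning customers.
    void addDocument(int docId, String body, String customers) {
        index("body", body, docId);
        index("customers", customers, docId);
    }

    private void index(String field, String text, int docId) {
        for (String token : text.trim().toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(field + ":" + token, k -> new BitSet()).set(docId);
        }
    }

    // Documents that contain the word AND list the customer as an owner.
    BitSet search(String word, String customer) {
        BitSet hits = copyOf("body:" + word.toLowerCase());
        hits.and(copyOf("customers:" + customer.toLowerCase()));
        return hits;
    }

    // Return a copy so that intersecting never mutates the stored posting list.
    private BitSet copyOf(String key) {
        BitSet stored = postings.get(key);
        return stored == null ? new BitSet() : (BitSet) stored.clone();
    }

    public static void main(String[] args) {
        OwnerFilteredSearch idx = new OwnerFilteredSearch();
        idx.addDocument(0, "quarterly invoice", "abc abd");
        idx.addDocument(1, "invoice reminder", "abd cfx");
        idx.addDocument(2, "meeting notes", "abc");
        System.out.println(idx.search("invoice", "abc")); // {0}
        System.out.println(idx.search("invoice", "abd")); // {0, 1}
    }
}
```

In Lucene itself the same shape would typically be a term query on the customers field combined
with the text query as a non-scoring filter clause, so no per-customer bitset has to be held in
memory up front.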

> On Jun 15, 2016, at 23:25, Geebee Coder <> wrote:
> Hi there,
> I would like to use Lucene to solve the following problem:
> 1. We have about 100k customers and 25 million documents.
> 2. When a customer performs a text search on the document space, we want to
> return only documents that the customer has access to.
> 3. The number of documents a customer owns varies a lot. Some have close to 23
> million, some have close to 10k, some own a third of the documents, etc.
> What is an efficient way to use Lucene in this scenario in terms of
> performance and indexing?
> We have tried a number of solutions such as
> a) 100k boolean fields per document, each indicating whether a customer has
> access to the document.
> b) A single text field that lists the customers who own the document,
> e.g. (customers field: "abc abd cfx...")
> c) The above option with shards by customer
> The search and indexing performance for a was bad. b and c performed better for search
> but increased the indexing time and the index size.
> We are also thinking about using a custom filter but we are concerned about
> the memory requirements.
> Any ideas/suggestions would be really appreciated.

Denis Bazhenov <>
