lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Using Lucene to model ownership of documents
Date Thu, 16 Jun 2016 10:36:13 GMT
I'd definitely go for b).  The index will of course grow with every extra
piece of data you store, but it doesn't sound like this would make much
difference here.  The same goes for indexing speed.
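
To illustrate why b) tends to be cheap at search time, here is a minimal
sketch in plain Java. It is not Lucene code; it is a toy inverted index
that treats the customers field as just another set of terms, so
restricting a search to one customer is an intersection of two postings
lists. The class name OwnershipIndexSketch, its methods, and the field
names "body" and "customers" are all made up for illustration. In Lucene
itself the rough equivalent would be a term query on the customers field
added to the text query as a filter clause.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy inverted index (hypothetical, not Lucene): same idea in miniature. */
class OwnershipIndexSketch {
    // "field:term" -> ids of the documents containing that term (a postings list)
    private final Map<String, BitSet> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Indexes one document; owners is the whitespace-separated customers field from option b). */
    int addDocument(String body, String owners) {
        int docId = nextDocId++;
        indexField("body", body.toLowerCase(), docId);
        indexField("customers", owners, docId); // customer ids are kept verbatim
        return docId;
    }

    private void indexField(String field, String text, int docId) {
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(field + ":" + token, k -> new BitSet()).set(docId);
            }
        }
    }

    /** Returns ids of documents that contain the word and list the customer as an owner. */
    List<Integer> search(String word, String customerId) {
        BitSet hits = (BitSet) postings
                .getOrDefault("body:" + word.toLowerCase(), new BitSet())
                .clone(); // clone so the stored postings list is not modified
        hits.and(postings.getOrDefault("customers:" + customerId, new BitSet()));

        List<Integer> result = new ArrayList<>();
        for (int id = hits.nextSetBit(0); id >= 0; id = hits.nextSetBit(id + 1)) {
            result.add(id);
        }
        return result;
    }
}
```

The point of the sketch is that the per-document cost is one extra term
per owning customer, instead of 100k fields per document as in a), and
the per-query cost is one extra postings lookup.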


--
Ian.


On Wed, Jun 15, 2016 at 2:25 PM, Geebee Coder <g.b.coder@gmail.com> wrote:

> Hi there,
> I would like to use Lucene to solve the following problem:
>
> 1. We have about 100k customers and 25 million documents.
>
> 2. When a customer performs a text search on the document space, we want to
> return only documents that the customer has access to.
>
> 3. The number of documents a customer owns varies a lot. Some have close to
> 23 million, some have close to 10k, some own a third of the documents, etc.
>
> What is an efficient way to use Lucene in this scenario in terms of
> performance and indexing?
> We have tried a number of solutions such as
>
> a) 100k boolean fields per document that indicate whether a customer has
> access to the document.
> b) A single text field that has a list of customers who own the document,
> e.g. (customers field : "abc abd cfx...")
> c) the above option with shards by customer
>
> Search and indexing performance for a) was bad. b) and c) performed better
> for search but increased indexing time and index size.
> We are also thinking about using a custom filter but we are concerned about
> the memory requirements.
>
> Any ideas/suggestions would be really appreciated.
>
