lucene-java-user mailing list archives

From Geebee Coder <g.b.co...@gmail.com>
Subject Re: Using Lucene to model ownership of documents
Date Thu, 16 Jun 2016 15:13:35 GMT
Thank you all.
Michael, do you mean grouping customers into categories? (e.g. customer A has
premium access and so does customer B, so they have access to the same set
of documents)
If that's the case, unfortunately we don't have such categories of
customers: their access rights apply to specific documents, not tiers.
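As I read it, Michael's suggestion amounts to storing principals (customer IDs and/or group IDs) on each document and expanding a customer into its principals once, at sign-in. The Java sketch below models that with plain collections. `PrincipalExpansion`, `principalsFor` and `canSee` are hypothetical names, not Lucene API. In Lucene, the per-document set would presumably be a multi-valued untokenized field, and the principal set a disjunction of term queries on that field. With no groups, as in this case, the principal set is just the customer's own ID, so the scheme reduces to option b).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of "expand user groups at sign-in": each document
// stores the principals allowed to see it, and a search is restricted to
// documents sharing at least one principal with the signed-in customer.
class PrincipalExpansion {

    // Done once at sign-in: the customer's own ID plus every group it belongs to.
    static Set<String> principalsFor(String customerId,
                                     Map<String, List<String>> groupsByCustomer) {
        Set<String> principals = new HashSet<>();
        principals.add(customerId);
        principals.addAll(groupsByCustomer.getOrDefault(customerId, List.of()));
        return principals;
    }

    // Per-document check: visible if any of the customer's principals is stored on it.
    static boolean canSee(Set<String> documentAcl, Set<String> principals) {
        for (String p : principals) {
            if (documentAcl.contains(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = Map.of("abc", List.of("premium"));
        Set<String> abc = principalsFor("abc", groups);  // {abc, premium}
        Set<String> cfx = principalsFor("cfx", groups);  // {cfx}: no groups

        Set<String> premiumDoc = Set.of("premium");
        Set<String> ownedByCfx = Set.of("cfx", "abd");

        System.out.println(canSee(premiumDoc, abc));  // true
        System.out.println(canSee(premiumDoc, cfx));  // false
        System.out.println(canSee(ownedByCfx, cfx));  // true
    }
}
```

Groups only pay off when many customers share the same access set. With purely per-document rights, every document still lists individual customer IDs.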


On Thu, Jun 16, 2016 at 9:37 AM, Michael Wilkowski <mw@silenteight.com>
wrote:

> Definitely b). I would also suggest using groups, and expanding a user's
> groups at sign-in time.
>
> MW
>
> On Thu, Jun 16, 2016 at 12:36 PM, Ian Lea <ian.lea@gmail.com> wrote:
>
> > I'd definitely go for b).  The index will of course be larger for every
> > extra bit of data you store but it doesn't sound like this would make
> > much difference.  Likewise for speed of indexing.
> >
> >
> > --
> > Ian.
> >
> >
> > On Wed, Jun 15, 2016 at 2:25 PM, Geebee Coder <g.b.coder@gmail.com> wrote:
> >
> > > Hi there,
> > > I would like to use Lucene to solve the following problem:
> > >
> > > 1. We have about 100k customers and 25 million documents.
> > >
> > > 2. When a customer performs a text search on the document space, we want
> > > to return only the documents that the customer has access to.
> > >
> > > 3. The number of documents a customer owns varies a lot: some own close
> > > to 23 million, some close to 10k, and some own a third of the documents.
> > >
> > > What is an efficient way to use Lucene in this scenario in terms of
> > > performance and indexing?
> > > We have tried a number of solutions, such as:
> > >
> > >  a) 100k boolean fields per document, each indicating whether a given
> > >     customer has access to the document.
> > >  b) A single text field listing the customers who own the document,
> > >     e.g. (customers field: "abc abd cfx...").
> > >  c) Option b), with the index sharded by customer.
> > >
> > > Search and indexing performance for a) was bad. b) and c) performed
> > > better for search, but increased indexing time and index size.
> > > We are also thinking about using a custom filter, but we are concerned
> > > about the memory requirements.
> > >
> > > Any ideas/suggestions would be really appreciated.
> > >
> >
>
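Here is a minimal model of why option b) searches well. Once customer IDs are indexed as terms, each customer has a sorted postings list of document IDs. An access-restricted search then only needs the documents present in both the text-match postings and the customer's postings. The sketch uses plain Java arrays and a linear merge. `OwnershipFilter` and `intersect` are hypothetical names, and Lucene's own conjunctions advance through postings with iterators rather than merging arrays like this. In Lucene terms, this would presumably mean indexing each customer ID as a separate untokenized value (for example, several `StringField` instances sharing the `customers` field name) instead of one analyzed string. A `TermQuery` on that field would then be added to the text query with `BooleanClause.Occur.FILTER`, which restricts matches without affecting scores.

```java
import java.util.Arrays;

// Conceptual model of option b): each customer ID term has an ascending
// postings list of document IDs, and an access-restricted search keeps only
// documents present in both the text-match list and the customer's list.
class OwnershipFilter {

    // Intersects two ascending, duplicate-free arrays of document IDs.
    static int[] intersect(int[] textMatches, int[] ownedDocs) {
        int[] out = new int[Math.min(textMatches.length, ownedDocs.length)];
        int i = 0, j = 0, n = 0;
        while (i < textMatches.length && j < ownedDocs.length) {
            if (textMatches[i] == ownedDocs[j]) {
                out[n++] = textMatches[i];  // document matches and is owned
                i++;
                j++;
            } else if (textMatches[i] < ownedDocs[j]) {
                i++;  // text match the customer does not own
            } else {
                j++;  // owned document that does not match the text
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] textMatches = {2, 5, 9, 14, 21};  // docs matching the text query
        int[] ownedByAbc = {1, 2, 3, 14, 30};   // postings for customer "abc"
        System.out.println(Arrays.toString(intersect(textMatches, ownedByAbc)));
        // prints [2, 14]
    }
}
```

The cost of the intersection depends on the lengths of the two lists, not on the 100k customers in total. This is consistent with b) and c) searching faster than a), where each document carries 100k fields.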

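On the custom-filter memory concern, here is a back-of-envelope estimate using the figures from the question (25 million documents, 100k customers). It assumes, hypothetically, that such a filter caches one dense bitset per customer, and it ignores JVM object overhead. `FilterMemoryEstimate` and its methods are illustrative names only.

```java
// Back-of-envelope sizing for a hypothetical custom filter that caches one
// bitset per customer. Assumes one bit per document per cached customer and
// ignores object and JVM overhead.
class FilterMemoryEstimate {

    static final long DOCS = 25_000_000L;
    static final long CUSTOMERS = 100_000L;

    // Bytes for one dense bitset covering every document, stored as 64-bit words.
    static long bitsetBytes(long docs) {
        long words = (docs + 63) / 64;
        return words * Long.BYTES;
    }

    // Bytes for a sparse list of 32-bit document IDs.
    static long sparseBytes(long ownedDocs) {
        return ownedDocs * Integer.BYTES;
    }

    public static void main(String[] args) {
        long perCustomer = bitsetBytes(DOCS);         // 3,125,000 bytes
        long allCustomers = perCustomer * CUSTOMERS;  // 312,500,000,000 bytes
        System.out.println(perCustomer);
        System.out.println(allCustomers);
        // A customer owning ~10k documents: sparse IDs are far smaller.
        System.out.println(sparseBytes(10_000L));      // 40,000 bytes
        // A customer owning ~23 million documents: the bitset is smaller.
        System.out.println(sparseBytes(23_000_000L));  // 92,000,000 bytes
    }
}
```

Under these assumptions, caching a bitset for every customer would need roughly 312.5 GB. A filter like this would likely only be practical if it cached a limited number of customers. Another option is a sparse representation for small owners. A 10k-document owner needs about 40 KB as a list of IDs versus about 3.1 MB as a bitset. For a 23-million-document owner, the bitset is the smaller of the two.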