lucene-java-user mailing list archives

From "howard chen" <howac...@gmail.com>
Subject Re: [Interesting Question] How to implement Indexes Grouping?
Date Sat, 16 Dec 2006 14:22:21 GMT
On 12/16/06, Erick Erickson <erickerickson@gmail.com> wrote:
> I'd start with just one big index and test <G>. My point is that you can't
> speculate. The first question you have to answer is "is searching the whole
> index fast enough given my architecture?" and we can't answer that. Nor can
> you until you try.......
>
> We especially can't speculate since you've provided no clue how many users
> you're talking about. 10? 1,000,000? How many books do you expect them to
> own? 10? 100,000? I can't imagine separate indexes for 1M users each owning
> all 1000 books. I can imagine it for 10 users owning 100 books.....
>
> Assuming that you get decent performance in a single index, I'd create a
> filter at query time for a user. The filter has the bits turned on for the
> books the user owns, and I'd include the filter as part of a BooleanQuery
> when I searched the text. The filters could even be permanently stored rather
> than created each time, but I'd save that refinement for later.....
>
> Note that if you do store filters, they are quite small: 1 bit per book (+
> very small overhead)....
>
> Best
> Erick
>
> On 12/16/06, howard chen <howachen@gmail.com> wrote:
> >
> > Consider the following interesting situation,
> >
> > A library has around 100K books and wants them indexed by Lucene. This
> > seems to be straightforward, but....
> >
> > The target is:
> >
> > 0. You can search all books in the whole library [easy, just index it]
> >
> > 1. Users in this system can own a number of books in their personal
> > bookshelf, and a user might want to search the books in their own
> > bookshelf ONLY.
> >
> > 2. If each user owns a copy of the index of their personal bookshelf,
> > this seems to be a waste of storage space, as books are shared by many
> > users.
> >
> > 3. If the whole index is searched no matter which books a user owns,
> > this seems to be a waste of computation power when the user owns only
> > a few books.
> >
> >
> > In this situation, how would you design an indexing + search system?
> >
> > Any ideas you can share?
> >
> > :)
> >
> > Thanks.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>

I agree that a filter is one way to implement it. My concern is that with
such a big index, say 100K books with full text indexed, it will become the
bottleneck, and it is difficult to distribute the indexing and
searching.
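
To make the filter idea from the quoted reply concrete, here is a minimal
sketch using only java.util.BitSet. It is not Lucene's API: the class and
method names are hypothetical, book ids are assumed to be dense integers
from 0 to 99,999, and the hits are filtered after the search purely for
illustration (a real filter would normally be applied during the search).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; none of these names come from Lucene.
public class BookshelfFilterSketch {

    // One bit per book in the library; bit i is set when the user owns
    // the book whose (assumed) internal id is i.
    public static BitSet ownedBooks(int librarySize, int[] ownedIds) {
        BitSet bits = new BitSet(librarySize);
        for (int id : ownedIds) {
            bits.set(id);
        }
        return bits;
    }

    // Keep only those hits from a whole-library search that the user owns.
    public static List<Integer> restrict(List<Integer> hits, BitSet owned) {
        List<Integer> result = new ArrayList<>();
        for (int id : hits) {
            if (owned.get(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet owned = ownedBooks(100_000, new int[] {3, 42, 99_999});
        List<Integer> hits = List.of(1, 3, 7, 42, 500); // pretend search result
        System.out.println(restrict(hits, owned));      // prints [3, 42]
        // 1 bit per book: 100,000 bits is 12,500 bytes per stored filter.
        System.out.println(100_000 / 8 + " bytes per user");
    }
}
```

At 1 bit per book, a stored filter for a 100K-book library comes to about
12,500 bytes per user before any overhead.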

My initial thinking is to group the index by call number, say by dividing
the 100K books into 20 subgroups, so that when a user searches, 20 threads
are created to search for the book on different servers.
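
A rough sketch of that fan-out, assuming each subgroup can be searched
independently and the hits simply concatenated. Everything here is a
stand-in: the "shards" are in-memory lists of titles rather than Lucene
indexes on separate servers, matching is a plain substring test, and the
class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; a shard here is just a list of titles.
public class ShardedSearchSketch {

    // Stand-in for searching one sub-index: case-insensitive substring match.
    public static List<String> searchShard(List<String> shard, String term) {
        List<String> hits = new ArrayList<>();
        String needle = term.toLowerCase();
        for (String title : shard) {
            if (title.toLowerCase().contains(needle)) {
                hits.add(title);
            }
        }
        return hits;
    }

    // Search every shard on its own thread, then merge hits in shard order.
    public static List<String> searchAll(List<List<String>> shards, String term)
            throws Exception {
        // A fixed pool needs at least one thread, even for an empty shard list.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (List<String> shard : shards) {
                tasks.add(() -> searchShard(shard, term));
            }
            List<String> merged = new ArrayList<>();
            // invokeAll returns futures in the same order as the tasks.
            for (Future<List<String>> future : pool.invokeAll(tasks)) {
                merged.addAll(future.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> shards = List.of(
                List.of("Lucene in Action", "Java Concurrency"),
                List.of("Learning Java", "Cooking Basics"));
        System.out.println(searchAll(shards, "java"));
        // prints [Java Concurrency, Learning Java]
    }
}
```

Note that this only merges hits in shard order; combining relevance-ranked
results from 20 real indexes would need scores that are comparable across
subgroups, which the sketch does not address.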
