lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Using separate index for each user
Date Tue, 16 Sep 2008 15:17:46 GMT
The main arguments against using many separate indexes are
1> search warmup time. That is, each time you open an index
     the first few queries take much longer than subsequent searches.
2> Managing a bazillion indexes is non-trivial.


That said, in your particular case these may not apply. I guess the
piece of information that really counts is "how often do you expect
to update/search a given index"? You could avoid the warmup issue
by keeping an index open for some period of time after the first
search on the assumption that the user is going to make multiple
searches rather than just one. I'm sure there are other tricks
you can try.
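
The "keep it open for a while" idea above can be sketched roughly as
follows. This is only an illustration, not anything Lucene provides:
IdleHandleCache, acquire, evictIdle and the opener function are
made-up names, and the handle is any Closeable (in practice it would
wrap whatever searcher/reader you open for that user). It also assumes
nobody is still searching with a handle at the moment it is evicted; a
real version would need reference counting or similar.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: one open handle per user, closed after an idle period. */
class IdleHandleCache<H extends Closeable> {

    private static final class Entry<H> {
        final H handle;
        long lastUsedMillis;

        Entry(H handle, long nowMillis) {
            this.handle = handle;
            this.lastUsedMillis = nowMillis;
        }
    }

    private final Map<String, Entry<H>> open = new HashMap<>();
    private final Function<String, H> opener; // assumption: opens the index for a user id
    private final long idleMillis;

    IdleHandleCache(Function<String, H> opener, long idleMillis) {
        this.opener = opener;
        this.idleMillis = idleMillis;
    }

    /** Returns the user's open handle, opening it on first use, and records the access time. */
    synchronized H acquire(String userId, long nowMillis) {
        Entry<H> e = open.get(userId);
        if (e == null) {
            e = new Entry<>(opener.apply(userId), nowMillis);
            open.put(userId, e);
        }
        e.lastUsedMillis = nowMillis;
        return e.handle;
    }

    /** Closes every handle that has been idle for at least idleMillis; returns how many were closed. */
    synchronized int evictIdle(long nowMillis) {
        int closed = 0;
        Iterator<Entry<H>> it = open.values().iterator();
        while (it.hasNext()) {
            Entry<H> e = it.next();
            if (nowMillis - e.lastUsedMillis >= idleMillis) {
                it.remove();
                try {
                    e.handle.close();
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
                closed++;
            }
        }
        return closed;
    }

    synchronized int openCount() {
        return open.size();
    }
}
```

Callers would pass System.currentTimeMillis() as nowMillis and run
evictIdle from a periodic task. The time is a parameter only so the
behaviour is easy to test.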

So, how often do you expect
1> users to back up data?
2> users to query data?
And what is an acceptable search response time? Are your
users willing to live with a significant delay on the first couple
of queries?

I'd only be comfortable with choosing an approach if I tried
it out with a single computer's content and generated a few
stats....
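
For the stats, something along these lines would do as a starting
point. It is a rough sketch, not a proper benchmark: WarmupTimer,
timeRuns and medianFrom are invented names, and the Runnable stands in
for "open this user's index and run one representative query". The
interesting comparison is the first run against the median of the rest.

```java
import java.util.Arrays;

/** Hypothetical helper for comparing the first (cold) query against later (warm) ones. */
class WarmupTimer {

    /** Runs the query the given number of times and returns each run's elapsed nanoseconds. */
    static long[] timeRuns(Runnable query, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Median (upper median for even counts) of the timings from index 'from' to the end. */
    static long medianFrom(long[] nanos, int from) {
        long[] rest = Arrays.copyOfRange(nanos, from, nanos.length);
        Arrays.sort(rest);
        return rest[rest.length / 2];
    }
}
```

Compare timeRuns(query, 10)[0] with medianFrom(timings, 1) for a fresh
index each time, and repeat for a few index sizes.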

Best
Erick

On Tue, Sep 16, 2008 at 10:55 AM, Tobias Larsson Hult <
tobias.larsson.hult@findwise.se> wrote:

> Hi,
>
> We're thinking of using Lucene to integrate search in a backup service
> application. The background is that we have a bunch of users using a backup
> service, and we want them to be able to search their own, and only their
> own, backups.
>
> The total amount of data that's being backed up is very large (measured in
> terabytes). Even though the index will probably be smaller due to only
> indexing relevant fields, it is still too much to incorporate in one index.
> But since a user will only search in his/her own files we're thinking of
> creating one index for each user. There will be a lot of indexes of course
> but each index will not grow to more than a couple of gigabytes at the most.
>
> So when a user searches or adds new content to the backup we will open up
> his/her index and do a search/update in that particular index. That way,
> each query/update should not be so performance-intensive.
>
> Does this sound like a reasonable solution?  Of course this means creating
> a lot of IndexReaders/Writers but I prefer that to searching in a huge index
> every time when a user only wants to search in a slice of the total index.
>
> Best Regards,
> Tobias Larsson Hult
>
>
