lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "shai deljo" <shai.de...@gmail.com>
Subject Re: Using Lucene - Design Question
Date Tue, 20 Feb 2007 19:05:25 GMT
Hi,
Thanks for the reply.
* Regarding hardware I'll use something similar to: Core 2 Duo -
2.66GHz, 2x300 GB disk drives, 4 GB RAM running on one of the Linux
distributions.
* Regarding response time I'm looking to be ~300 milliseconds for at
least 80% of queries and ~500 milliseconds for 95% of queries.
* Will MultiSearcher (and it's parallel cosine :) ) allow me to search
indices cross multiple servers or is the assumption is that all
indices are on 1 server?
Thanks


On 2/20/07, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> Hi Shi,
>
> Nobody will be able to give you the precise answer, obviously.  The best way is to try.
> You didn't say what response time is desirable nor what kind of hardware you will be
using.
>
> I wouldn't bother with the Berkeley DB-backed Lucene index for now, just use the regular
one (maybe use non-compound format).
> If you need to partition your index, MultiSearcher will help you search all your indices,
and its Parallel cousin will let you parallelize those searches.
> It sounds like rsync will work, but you'll have to make sure that the segments file gets
rsynced last.
>
> Otis
>
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
>
> ----- Original Message ----
> From: shai deljo <shai.deljo@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Tuesday, February 20, 2007 5:51:13 AM
> Subject: Using Lucene - Design Question
>
> Hi,
> I have no experience with Lucene and I'm trying to collect some
> information in order to determine what solution is best for me.
> I need to index ~50M documents (starting with 10M), the size of each
> document is ~2k-~5k and I'll index a couple of fields per document. I
> expect ~20 queries per seconds and each query is ~4 terms. Update rate
> - not sure what is best and/or possible strategy based on performance,
> i.e. incremental indexing vs. pushing a full index but as far as the
> product is concerned most data can be updated daily, the head (let's
> say 20%) needs hourly (or at least on the order of hours) update.
> I also need to be able to override the scoring/ranking and inject my
> own logic and of course  my main concern is response time, especially
> since i have additional computation on the hits before returning the
> results.
>
> BTW, for the additional ranking/computation i will need to retrieve
> values that are mapped by a term-field key, i.e. i can't know the key
> until i have the result and the query in my hands. i figured i would
> use Oracle Berkeley DB Java edition in order to keep the calls as much
> as possible in the memory -> any advise on this as well ?
>
> For these requirements, do i need to worry about partitioning the
> Index? If i do partition it, is there a solution to merge the results
> back or do i need to do it on my own (does Solr do it for me and if it
> does, can i override the scoring there)?
> AS far as serving multiple users, will a simple rsync of the index
> between multiple nodes running the same index (i am not that sensitive
> to data integrity) work or do i need to look at something like
> terracotta?
>
> In short, i am looking for the simplest solution.
>
> Thanks in advance.
> Shi
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message