lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "shai deljo" <shai.de...@gmail.com>
Subject Re: Using Lucene - Design Question
Date Wed, 21 Feb 2007 01:09:28 GMT
I considered getting  Lucene in action but figured I'll wait for the
DVD to come out ;).
Seriously though, they write about RemoteSearchable and use RMI, Is
this the recommended solution? does it scale well?
Thanks

On 2/20/07, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> Well, there is also a Remote cousin there.  That will let you distribute your indices
over N severs (sounds like you'll need multiple).  You should really take a stroll through
Lucene's javadoc, it's incredibly nice now in winter time.  Or ... clears throat.... you could
get a book ;)
>
> Otis
>  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
>
> ----- Original Message ----
> From: shai deljo <shai.deljo@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Tuesday, February 20, 2007 2:05:25 PM
> Subject: Re: Using Lucene - Design Question
>
> Hi,
> Thanks for the reply.
> * Regarding hardware I'll use something similar to: Core 2 Duo -
> 2.66GHz, 2x300 GB disk drives, 4 GB RAM running on one of the Linux
> distributions.
> * Regarding response time I'm looking to be ~300 milliseconds for at
> least 80% of queries and ~500 milliseconds for 95% of queries.
> * Will MultiSearcher (and it's parallel cosine :) ) allow me to search
> indices cross multiple servers or is the assumption is that all
> indices are on 1 server?
> Thanks
>
>
> On 2/20/07, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> > Hi Shi,
> >
> > Nobody will be able to give you the precise answer, obviously.  The best way is
to try.
> > You didn't say what response time is desirable nor what kind of hardware you will
be using.
> >
> > I wouldn't bother with the Berkeley DB-backed Lucene index for now, just use the
regular one (maybe use non-compound format).
> > If you need to partition your index, MultiSearcher will help you search all your
indices, and its Parallel cousin will let you parallelize those searches.
> > It sounds like rsync will work, but you'll have to make sure that the segments file
gets rsynced last.
> >
> > Otis
> >
> > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> > Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
> >
> > ----- Original Message ----
> > From: shai deljo <shai.deljo@gmail.com>
> > To: java-user@lucene.apache.org
> > Sent: Tuesday, February 20, 2007 5:51:13 AM
> > Subject: Using Lucene - Design Question
> >
> > Hi,
> > I have no experience with Lucene and I'm trying to collect some
> > information in order to determine what solution is best for me.
> > I need to index ~50M documents (starting with 10M), the size of each
> > document is ~2k-~5k and I'll index a couple of fields per document. I
> > expect ~20 queries per seconds and each query is ~4 terms. Update rate
> > - not sure what is best and/or possible strategy based on performance,
> > i.e. incremental indexing vs. pushing a full index but as far as the
> > product is concerned most data can be updated daily, the head (let's
> > say 20%) needs hourly (or at least on the order of hours) update.
> > I also need to be able to override the scoring/ranking and inject my
> > own logic and of course  my main concern is response time, especially
> > since i have additional computation on the hits before returning the
> > results.
> >
> > BTW, for the additional ranking/computation i will need to retrieve
> > values that are mapped by a term-field key, i.e. i can't know the key
> > until i have the result and the query in my hands. i figured i would
> > use Oracle Berkeley DB Java edition in order to keep the calls as much
> > as possible in the memory -> any advise on this as well ?
> >
> > For these requirements, do i need to worry about partitioning the
> > Index? If i do partition it, is there a solution to merge the results
> > back or do i need to do it on my own (does Solr do it for me and if it
> > does, can i override the scoring there)?
> > AS far as serving multiple users, will a simple rsync of the index
> > between multiple nodes running the same index (i am not that sensitive
> > to data integrity) work or do i need to look at something like
> > terracotta?
> >
> > In short, i am looking for the simplest solution.
> >
> > Thanks in advance.
> > Shi
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message