From: "Peter W."
Subject: Re: Using Lucene - Design Question
Date: Thu, 22 Feb 2007 14:39:39 -0800
To: java-user@lucene.apache.org

Hello,

If you have experience using XML and making web service requests,
Solr is what you need. It's production-quality code and evolving
quickly. It has a remarkable amount of extra functionality.

For CORBA-type programmers, go with Terracotta. It looks to go a
step beyond sharing objects, to sharing/clustering JVMs.

The RMI capabilities of RemoteSearchable within Lucene seem to have
been developed before Solr gained traction. I tried taking some
working RMI code and writing an inner class with Lucene, but it
didn't feel robust.

Research on the mailing lists brings up older file-copying techniques
based on syncing the indexes with rsync. Probably still in use, it
looks to be an old-school solution better addressed by Solr.

If you are mirroring your index in a database, there are some
combined Lucene/db update methods available:

1. MySQL replication - data on the master is continuously updated
   and replicates behind the scenes to remote slaves. The Lucene/db
   indexing code on each remote slave runs as a cron job.

2. A Lucene indexing application on each remote box makes a network
   call to the central database, fetching and indexing new data and
   reloading its own local ramdir.

For someone trying to get work done: use incremental updates to one
local index first. Then explore writing to multiple indexes and
reading them using MultiSearcher. Afterward, use HTTP-based
updates/requests with Solr to scale out.

Hope that helps.

Peter W.

On Feb 20, 2007, at 5:29 PM, orion wrote:

> If you'd like to try using Terracotta, we (Terracotta) would be
> glad to help you out.
> If you want more info, you can email me directly (orion at
> terracotta.org) or you can use our web forums
> (http://forums.terracotta.org) or our user mailing list
> (http://lists.terracotta.org/)
>
> Cheers,
> Orion
>
> shai deljo wrote:
>>
>> I considered getting Lucene in Action but figured I'll wait for the
>> DVD to come out ;).
>> Seriously though, they write about RemoteSearchable and use RMI. Is
>> this the recommended solution? Does it scale well?
>> Thanks
>>
>> On 2/20/07, Otis Gospodnetic wrote:
>>> Well, there is also a Remote cousin there. That will let you
>>> distribute your indices over N servers (sounds like you'll need
>>> multiple). You should really take a stroll through Lucene's
>>> javadoc, it's incredibly nice now in winter time. Or ... clears
>>> throat.... you could get a book ;)
>>>
>>> Otis
>>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>>> Simpy -- http://www.simpy.com/ - Tag - Search - Share
>>>
>>> ----- Original Message ----
>>> From: shai deljo
>>> To: java-user@lucene.apache.org
>>> Sent: Tuesday, February 20, 2007 2:05:25 PM
>>> Subject: Re: Using Lucene - Design Question
>>>
>>> Hi,
>>> Thanks for the reply.
>>> * Regarding hardware, I'll use something similar to: Core 2 Duo -
>>> 2.66GHz, 2x300 GB disk drives, 4 GB RAM running on one of the Linux
>>> distributions.
>>> * Regarding response time, I'm looking to be ~300 milliseconds for
>>> at least 80% of queries and ~500 milliseconds for 95% of queries.
>>> * Will MultiSearcher (and its parallel cosine :) ) allow me to
>>> search indices across multiple servers, or is the assumption that
>>> all indices are on 1 server?
>>> Thanks
>>>
>>> On 2/20/07, Otis Gospodnetic wrote:
>>>> Hi Shi,
>>>>
>>>> Nobody will be able to give you the precise answer, obviously. The
>>>> best way is to try.
>>>> You didn't say what response time is desirable nor what kind of
>>>> hardware you will be using.
>>>>
>>>> I wouldn't bother with the Berkeley DB-backed Lucene index for now,
>>>> just use the regular one (maybe use non-compound format).
>>>> If you need to partition your index, MultiSearcher will help you
>>>> search all your indices, and its Parallel cousin will let you
>>>> parallelize those searches.
>>>> It sounds like rsync will work, but you'll have to make sure that
>>>> the segments file gets rsynced last.
>>>>
>>>> Otis
>>>>
>>>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>>>> Simpy -- http://www.simpy.com/ - Tag - Search - Share
>>>>
>>>> ----- Original Message ----
>>>> From: shai deljo
>>>> To: java-user@lucene.apache.org
>>>> Sent: Tuesday, February 20, 2007 5:51:13 AM
>>>> Subject: Using Lucene - Design Question
>>>>
>>>> Hi,
>>>> I have no experience with Lucene and I'm trying to collect some
>>>> information in order to determine what solution is best for me.
>>>> I need to index ~50M documents (starting with 10M), the size of
>>>> each document is ~2k-~5k and I'll index a couple of fields per
>>>> document. I expect ~20 queries per second and each query is ~4
>>>> terms. Update rate - not sure what the best and/or possible
>>>> strategy is based on performance, i.e. incremental indexing vs.
>>>> pushing a full index, but as far as the product is concerned most
>>>> data can be updated daily, while the head (let's say 20%) needs
>>>> hourly (or at least on the order of hours) updates.
>>>> I also need to be able to override the scoring/ranking and inject
>>>> my own logic, and of course my main concern is response time,
>>>> especially since I have additional computation on the hits before
>>>> returning the results.
>>>>
>>>> BTW, for the additional ranking/computation I will need to retrieve
>>>> values that are mapped by a term-field key, i.e. I can't know the
>>>> key until I have the result and the query in my hands.
>>>> I figured I would use Oracle Berkeley DB Java Edition in order to
>>>> keep the calls in memory as much as possible -> any advice on this
>>>> as well?
>>>>
>>>> For these requirements, do I need to worry about partitioning the
>>>> index? If I do partition it, is there a solution to merge the
>>>> results back or do I need to do it on my own (does Solr do it for
>>>> me and if it does, can I override the scoring there)?
>>>> As far as serving multiple users, will a simple rsync of the index
>>>> between multiple nodes running the same index (I am not that
>>>> sensitive to data integrity) work or do I need to look at
>>>> something like Terracotta?
>>>>
>>>> In short, I am looking for the simplest solution.
>>>>
>>>> Thanks in advance.
>>>> Shi
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> --
> View this message in context:
> http://www.nabble.com/Using-Lucene---Design-Question-tf3259160.html#a9073976
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
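
A footnote on the rsync approach discussed in the thread: Otis notes
that the segments file has to be synced last. The reason is that the
segments file names the files that make up the index, so a reader on
the receiving node should not see it before the data files it refers
to have arrived. Below is a minimal sketch of that ordering in plain
Java. It is not rsync and not a Lucene API. The class and method
names are made up for illustration, and treating every file whose
name starts with "segments" as a commit-point file is an assumption
that should be checked against the Lucene version in use.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: copies an index directory
// so that the segments file(s) are written to the target last.
final class SegmentsLastCopy {

    // Assumption: commit-point files are those named "segments*".
    static boolean isSegmentsFile(Path file) {
        return file.getFileName().toString().startsWith("segments");
    }

    // Copies every regular file in source to target and returns the
    // file names in the order they were copied.
    static List<String> copyIndex(Path source, Path target) throws IOException {
        Files.createDirectories(target);

        List<Path> dataFiles = new ArrayList<>();
        List<Path> segmentsFiles = new ArrayList<>();
        try (DirectoryStream<Path> listing = Files.newDirectoryStream(source)) {
            for (Path file : listing) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                if (isSegmentsFile(file)) {
                    segmentsFiles.add(file);
                } else {
                    dataFiles.add(file);
                }
            }
        }

        // Data files first, segments files last.
        List<Path> ordered = new ArrayList<>(dataFiles);
        ordered.addAll(segmentsFiles);

        List<String> copied = new ArrayList<>();
        for (Path file : ordered) {
            Files.copy(file, target.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            copied.add(file.getFileName().toString());
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SegmentsLastCopy <sourceIndexDir> <targetIndexDir>");
            return;
        }
        for (String name : copyIndex(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println("copied " + name);
        }
    }
}
```

With rsync itself, the same ordering would presumably be done in two
passes: one that excludes the segments file, then one that transfers
only it.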