From: frer
To: java-user@lucene.apache.org
Subject: Re: Query joining 2 indexes
Date: Fri, 18 Dec 2009 08:42:49 -0800 (PST)

Thanks for your answer, I
didn't think that using:

    Document doc = irHourly.doc([...]);

would be much faster than using the searcher. I will try that.

I have one question though: what is the [...] you refer to? Since I have
searched in the Daily index, what is the corresponding hit in the Hourly
index if I haven't searched the Hourly index yet?

Thanks for your help,

Francois

PS: Concerning your idea to merge the Hourly and Daily indexes, I did think
about that, but the quantity of information I have is already far too large
in both indexes (about 100 fields in each) to repeat it that many times.
My index is already 10 GB.

Erick Erickson wrote:
>
> Well, making a large OR clause is definitely more efficient than making
> N different requests, but you would have to search the results. It doesn't
> sound very performant.
>
> Could you go to 50,000 ids? Yes, but you have to fiddle with
> setMaxClauseCount because Lucene defaults to a max of 1,024.
>
> There's no way I know of to emulate a DB join. Usually, whenever I try,
> the answer is to flatten my data, but I don't see a good way to do that
> in this case.
>
> Hmmm, what do you think would happen if you opened an IndexReader
> into your Hourly index straight from your Daily query? Something like:
>
> IndexReader irHourly = [...];
> for (each hit in Daily) {
>     for (each id in this Hourly doc) {
>         Document doc = irHourly.doc([...]);
>         addToResponse(doc); // your method here.....
>     }
> }
>
> You *probably* want to keep the Hourly reader open between requests, but
> since the above isn't searching, you *might* be able to open it each time.
> I'd go for keeping it open between requests if at all possible.
>
> And here's a wild and crazy idea. Remember that Lucene documents don't
> require that *any* fields be in common. It might make your management
> easier if you *combined* the indexes. Crudely, prefix each Daily field
> with "D_" and each hourly field with "H_" and put 'em all in the same index.
> I'm not claiming that's a good solution in your case, but I thought I'd
> mention it as a possibility. You still can't do joins on them though.....
>
> Erick
>
> On Fri, Dec 18, 2009 at 9:08 AM, François Eric wrote:
>
>> Hello,
>>
>> I have a performance problem and would need expert advice on how to go
>> about fixing it:
>>
>> I currently have 2 indexes: Daily and Hourly. The Daily index contains
>> about 1,000,000 documents and my Hourly index approximately 24,000,000
>> documents. My Daily index contains many fields and some of them are IDs
>> to my Hourly index.
>> What I want to do is fetch data in one request (if possible).
>> Right now I do it in many requests:
>> 1- Get the matching Daily documents (say it returns 500 documents)
>> 2- For each of these documents, locate the Hourly index ID and fetch it.
>>
>> Therefore I make 501 requests to Lucene. This causes some performance
>> issues, I guess because of the overhead of making a request to Lucene.
>>
>> Is it possible to do this in 1 request? I'm thinking no because I'm not
>> sure what the result set would be, but maybe I'm missing something.
>>
>> If not, I guess it would be possible to build a query with my 500 hourly
>> ids and make an OR between them to do it in 2 requests... but then I have
>> to find the matching documents. Will this overflow if I have 50000 ids in
>> my query?
>>
>> Anyway, I just want advice on how one would address this situation.
>>
>> Thank you very much,
>>
>> François
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>

-- 
View this message in context: http://old.nabble.com/Query-joining-2-indexes-tp26843980p26846129.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
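
On the OR-query option discussed in the thread: instead of raising the clause limit to fit 50,000 ids, the ids could be split into batches that each stay under the default maximum of 1,024 clauses. The following is only a sketch in plain Java with no Lucene calls. The class name IdBatcher, its method names, the field name "id" and the sample id values are all hypothetical, and the generated query string assumes ids that need no escaping in the query parser syntax.

```java
import java.util.ArrayList;
import java.util.List;

public class IdBatcher {
    // Default clause limit quoted in the thread (1,024).
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Splits ids into consecutive batches of at most maxClauses entries.
    static List<List<String>> batches(List<String> ids, int maxClauses) {
        if (maxClauses <= 0) {
            throw new IllegalArgumentException("maxClauses must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += maxClauses) {
            int end = Math.min(start + maxClauses, ids.size());
            out.add(new ArrayList<>(ids.subList(start, end)));
        }
        return out;
    }

    // Builds a query string such as "id:a OR id:b" for one batch.
    // Assumption: the ids contain no characters that need escaping.
    static String toOrQuery(String field, List<String> ids) {
        StringBuilder sb = new StringBuilder();
        for (String id : ids) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append(field).append(':').append(id);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 50000; i++) {
            ids.add("h" + i); // made-up Hourly ids
        }
        List<List<String>> parts = batches(ids, DEFAULT_MAX_CLAUSES);
        System.out.println(parts.size() + " batched queries for " + ids.size() + " ids");
    }
}
```

With 50,000 ids and a limit of 1,024 this gives 49 queries rather than one oversized query or 50,000 single lookups. The results of each batch still have to be matched back to the Daily documents, as noted in the thread.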
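
The combined-index idea from the thread (prefix each Daily field with "D_" and each Hourly field with "H_") comes down to renaming fields before they are added to a document. The following is a rough sketch in plain Java that models a document as a map of field names to values. The class name FieldPrefixer and the sample field names and values are hypothetical, and no Lucene API is used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldPrefixer {
    // Returns a copy of the fields with the prefix added to every field name.
    static Map<String, String> withPrefix(String prefix, Map<String, String> fields) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            out.put(prefix + e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up field names and values, for illustration only.
        Map<String, String> daily = new LinkedHashMap<>();
        daily.put("date", "2009-12-18");
        daily.put("total", "42");

        Map<String, String> hourly = new LinkedHashMap<>();
        hourly.put("date", "2009-12-18T08");

        // The same field name in both sources no longer collides.
        System.out.println(withPrefix("D_", daily));
        System.out.println(withPrefix("H_", hourly));
    }
}
```

Because the two kinds of document share no field names after prefixing, they can sit in one index without interfering with each other's queries. As the thread points out, this still does not give a join.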