From: "Beard, Brian"
Date: Mon, 28 Apr 2008 14:38:56 -0400
Subject: search performance & caching

I'm using Lucene 2.2.0 and have two questions:

1) Should search times be linear with respect to the number of queries hitting a single searcher? I've run multiple search threads against a single searcher, and the search times scale almost exactly linearly: 10x slower for 10 threads vs. 1 thread, etc.
I'm using a parallel multi-searcher with a custom hit collector.

2) I'm performing some field caching during search warmup. For an index of 3.4 million docs and 7GB, it's taking up to 30 minutes to execute the code snippet below. Most of this time is spent in the multiReader.document call (where it says "THIS TAKES THE MOST TIME"). I want to know if anyone has any ideas for speeding this up.

There are multiple documents containing the same recordId. I want to figure out which two documents with the same recordId also have a documentName of CORE or WL. Then for each document in the index I store three pieces of information:

- its associated recordId
- the CORE doc number for this recordId
- the WL doc number for this recordId

Ideally, since the multiReader.document call is taking the most time, I'd like to not have to perform it, although I can't figure out how to get around needing to read in the recordId. What I really need is something like a two-dimensional TermEnum I could iterate over, for the recordId and documentName fields.

Any ideas are appreciated.

    // Now loop through all documents in the indexes and set the cache values.
    TermDocs termDocs = multiReader.termDocs();
    TermEnum termEnum = multiReader.terms(new Term("RECORD_ID", ""));
    try {
        FieldSelector fieldSelector = getFieldSelector();
        List<Integer> docList = new ArrayList<Integer>();
        int regularCoreDocId = -1;
        int wlCoreDocId = -1;
        int docId = -1;
        Document document = null;
        String documentName = null;

        // Loop through each RECORD_ID with termEnums
        do {
            docList.clear();
            regularCoreDocId = -1;
            wlCoreDocId = -1;

            // Stop once the enumeration runs past the RECORD_ID field.
            Term term = termEnum.term();
            if (term == null || !"RECORD_ID".equals(term.field())) {
                break;
            }
            String recordId = term.text();

            // Now loop through all documents with the same recordId
            // using the termDocs.
            termDocs.seek(termEnum);
            while (termDocs.next()) {
                docId = termDocs.doc();
                docList.add(Integer.valueOf(docId));

                // THIS TAKES THE MOST TIME
                document = multiReader.document(docId, fieldSelector);
                documentName = document.get("DOCUMENT_NAME");
                if ("CORE".equals(documentName)) {
                    regularCoreDocId = docId;
                } else if ("WL".equals(documentName)) {
                    wlCoreDocId = docId;
                }
            }

            // Map all docIds associated with this recordId
            for (Integer i : docList) {
                doc2RecordId[i] = recordId;
            }

            // Map from the docId to the coreData docId for
            // regular core and wl core documents.
            // doc2WlCoreDoc is an assumed name for the WL array,
            // by analogy with doc2RegularCoreDoc.
            for (Integer i : docList) {
                doc2RegularCoreDoc[i] = regularCoreDocId;
                doc2WlCoreDoc[i] = wlCoreDocId;
            }
        } while (termEnum.next());
    } finally {
        termDocs.close();
        termEnum.close();
    }
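
On the two-dimensional TermEnum idea: one possible way to avoid the stored-field reads, sketched here under assumptions and not tested against Lucene 2.2.0, is to make two passes over postings. The first pass marks which doc numbers carry a DOCUMENT_NAME of CORE or WL; the second walks the RECORD_ID terms as before and consults those marks instead of calling document(). This would only work if DOCUMENT_NAME is indexed as a single untokenized term, which the message does not state. To keep the sketch self-contained, postings are modeled as plain maps from term text to doc numbers; RecordCacheSketch, Cache, build and doc2WlCoreDoc are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;

/** Sketch: fill the per-document cache from postings only, with no stored-field reads. */
class RecordCacheSketch {

    /** The three per-document arrays described in the message (-1 means "none found"). */
    static final class Cache {
        final String[] doc2RecordId;
        final int[] doc2RegularCoreDoc;
        final int[] doc2WlCoreDoc; // assumed name for the WL array

        Cache(int maxDoc) {
            doc2RecordId = new String[maxDoc];
            doc2RegularCoreDoc = new int[maxDoc];
            doc2WlCoreDoc = new int[maxDoc];
            Arrays.fill(doc2RegularCoreDoc, -1);
            Arrays.fill(doc2WlCoreDoc, -1);
        }
    }

    private static final byte OTHER = 0, CORE = 1, WL = 2;

    /**
     * @param maxDoc               number of documents in the (modeled) index
     * @param recordIdPostings     RECORD_ID term text -> doc numbers containing it
     * @param documentNamePostings DOCUMENT_NAME term text -> doc numbers containing it
     */
    static Cache build(int maxDoc,
                       Map<String, int[]> recordIdPostings,
                       Map<String, int[]> documentNamePostings) {
        // Pass 1: mark each document's kind from the DOCUMENT_NAME postings.
        byte[] kind = new byte[maxDoc]; // all entries start as OTHER
        for (int doc : documentNamePostings.getOrDefault("CORE", new int[0])) {
            kind[doc] = CORE;
        }
        for (int doc : documentNamePostings.getOrDefault("WL", new int[0])) {
            kind[doc] = WL;
        }

        // Pass 2: walk each RECORD_ID term and look up kinds in memory.
        Cache cache = new Cache(maxDoc);
        for (Map.Entry<String, int[]> entry : recordIdPostings.entrySet()) {
            int coreDoc = -1;
            int wlDoc = -1;
            for (int doc : entry.getValue()) {
                if (kind[doc] == CORE) {
                    coreDoc = doc;
                } else if (kind[doc] == WL) {
                    wlDoc = doc;
                }
            }
            // Every document sharing this recordId gets the same three values.
            for (int doc : entry.getValue()) {
                cache.doc2RecordId[doc] = entry.getKey();
                cache.doc2RegularCoreDoc[doc] = coreDoc;
                cache.doc2WlCoreDoc[doc] = wlDoc;
            }
        }
        return cache;
    }
}
```

Against a real index, the two maps would presumably be replaced by IndexReader.termDocs(Term) over the DOCUMENT_NAME terms CORE and WL, and by the existing TermEnum/TermDocs loop over RECORD_ID. The extra cost is one byte per document for the kind array.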