From: "Stephane Nicoll"
To: java-user@lucene.apache.org
Date: Wed, 14 May 2008 08:38:25 +0200
Subject: Re: confused about an entry in the FAQ
Message-ID: <541f12ca0805132338t7e4819ebw27f4211371e49d28@mail.gmail.com>
In-Reply-To: <541f12ca0805120313y302ece2cu374d25e378bc45e2@mail.gmail.com>
References: <541f12ca0805100635sf9629abs4cb9b8c0f41f6a79@mail.gmail.com>
 <48b038c60805101105q767968a2sc9de677130100742@mail.gmail.com>
 <541f12ca0805120313y302ece2cu374d25e378bc45e2@mail.gmail.com>

ping. Sorry for the long email but I prefer to provide all information first.

On Mon, May 12, 2008 at 12:13 PM, Stephane Nicoll wrote:
> I tried all this and I am confused about the result. I am trying to
> implement a hybrid query handler where I fetch the IDs from a
> database criteria and the IDs from a full text lucene query and I
> intersect them to return the result to the user. The database query
> and the intersection work fine even with high load. However the
> lucene query is much slower when the number of concurrent users
> rises.
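
For readers following the thread, the intersection step described above can be sketched as follows. This is only a minimal sketch, assuming both queries hand back their IDs as sets of Long. The class and method names (HybridIdIntersection, intersect) are hypothetical and are not taken from the original code. It copies the smaller set and keeps the entries that also appear in the larger one, using java.util only.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not from the original message: intersects the IDs
// returned by the database query with the IDs returned by the Lucene query.
final class HybridIdIntersection {

    private HybridIdIntersection() {
    }

    /** Returns a new sorted set of the IDs present in both inputs; the inputs are not modified. */
    static Set<Long> intersect(Set<Long> databaseIds, Set<Long> luceneIds) {
        // Copy the smaller set, then probe the larger one for each entry.
        final boolean dbIsSmaller = databaseIds.size() <= luceneIds.size();
        final Set<Long> smaller = dbIsSmaller ? databaseIds : luceneIds;
        final Set<Long> larger = dbIsSmaller ? luceneIds : databaseIds;
        final Set<Long> result = new TreeSet<Long>(smaller);
        result.retainAll(larger);
        return result;
    }

    public static void main(String[] args) {
        // Sample IDs are made up for illustration.
        Set<Long> databaseIds = new TreeSet<Long>(Arrays.asList(3L, 7L, 42L));
        Set<Long> luceneIds = new TreeSet<Long>(Arrays.asList(1L, 7L, 42L, 99L));
        System.out.println(intersect(databaseIds, luceneIds)); // prints [7, 42]
    }
}
```

Returning a fresh set keeps both inputs untouched, so a cached ID set could be reused across requests without copying it first.
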
>
> Here is what I am doing on the lucene side
>
>     final QueryParser queryParser =
>             new QueryParser(criteria.getDefaultField(), analyzer);
>     final Query q = queryParser.parse(criteria.getFullTextQuery());
>     // Index Searcher is shared for all threads and is not
>     // reopened during the load test
>     final IndexSearcher indexSearcher = getIndexSearcher();
>     final Set result = new TreeSet();
>     indexSearcher.search(q, new HitCollector() {
>         public void collect(int i, float v) {
>             try {
>                 final Document d = indexSearcher.getIndexReader().document(i,
>                         new FieldSelector() {
>                     public FieldSelectorResult accept(String s) {
>                         if (s.equals(CatalogItem.ATTR_ID)) {
>                             return FieldSelectorResult.LOAD;
>                         } else {
>                             return FieldSelectorResult.NO_LOAD;
>                         }
>                     }
>                 });
>                 result.add(Long.parseLong(d.get(CatalogItem.ATTR_ID)));
>             } catch (IOException e) {
>                 throw new RuntimeException("Could not collect lucene IDs", e);
>             }
>         }
>     });
>     return result;
>
>
> When running with one thread, I have the following figures per test:
>
> Database query is done in[125 msecs] (size=598]
> Lucene query is done in[80 msecs (size=15204]
> Intersect is done in[4 msecs] (size=103]
> Hybrid query is done in[97 msecs]
>
> -> 327 msec / user
>
> When running with ten threads, I have the following figures per user per test:
>
> Database query is done in[222 msecs] (size=94]
> Lucene query is done in[2364 msecs (size=15367]
> Intersect is done in[0 msecs] (size=12]
> Hybrid query is done in[18 msecs]
>
> -> 2.5 sec / user !!
>
> I am just wondering how I can improve this. Clearly there is something
> wrong in my code since it's much slower with multiple threads running
> concurrently on the same index.
> The size of the index is 5 MB. I only
> store:
>
> * an "id" field (which is the primary key of the related object in the db)
> * a "class" field, which is the class name of the related object
>   (Hibernate Search does that for me)
>
> The "keywords" field is indexed but not stored as it is a
> representation of other data stored in the db. The searches are
> performed on the keywords field only ("foo AND bar" is a typical
> query).
>
> Any help is appreciated. If you also know a Spring bean that could
> take care of opening/closing the index readers properly, let me know.
> Hibernate Search introduces deadlocks with multiple threads and the
> lucene integration in Spring Modules does not seem to do what I want.
>
> Thanks,
> Stéphane
>
> On Sat, May 10, 2008 at 8:05 PM, Patrick Turcotte wrote:
> > Did you try the IndexSearcher.doc(int i, FieldSelector fieldSelector) method?
> >
> > Could be faster because Lucene doesn't have to "prepare" the whole document.
> >
> > Patrick
> >
> > On Sat, May 10, 2008 at 9:35 AM, Stephane Nicoll wrote:
> > > From the FAQ:
> > >
> > > "Don't iterate over more hits than needed.
> > > Iterating over all hits is slow for two reasons. Firstly, the search()
> > > method that returns a Hits object re-executes the search internally
> > > when you need more than 100 hits. Solution: use the search method that
> > > takes a HitCollector instead."
> > >
> > > I had a look at HitCollector but it returns the document ID and the
> > > javadoc recommends not fetching the original query there.
> > >
> > > I have to return *one* indexed field from the query result and
> > > currently I am iterating over all results and it's slow. Can you explain
> > > a bit more how I could improve this?
> > >
> > > Thanks,
> > > Stéphane

-- 
"Large Systems Suck: This rule is 100% transitive. If you build one,
you suck" -- S.Yegge

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org