Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Message-ID: <541f12ca0805132338t7e4819ebw27f4211371e49d28@mail.gmail.com>
Date: Wed, 14 May 2008 08:38:25 +0200
From: "Stephane Nicoll" <stephane.nicoll@gmail.com>
To: java-user@lucene.apache.org
Subject: Re: confused about an entry in the FAQ
In-Reply-To: <541f12ca0805120313y302ece2cu374d25e378bc45e2@mail.gmail.com>
References: <541f12ca0805100635sf9629abs4cb9b8c0f41f6a79@mail.gmail.com>
	 <48b038c60805101105q767968a2sc9de677130100742@mail.gmail.com>
	 <541f12ca0805120313y302ece2cu374d25e378bc45e2@mail.gmail.com>

Ping. Sorry for the long email, but I prefer to provide all the information first.

On Mon, May 12, 2008 at 12:13 PM, Stephane Nicoll
<stephane.nicoll@gmail.com> wrote:
> I tried all this and I am confused about the result. I am trying to
>  implement a hybrid query handler where I fetch the IDs from a
>  database criteria query and the IDs from a full-text Lucene query, and I
>  intersect them to return the result to the user. The database query
>  and the intersection work fine even under high load. However, the
>  Lucene query becomes much slower as the number of concurrent users
>  rises.
>
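For readers following the thread, the intersection step described above could look roughly like the sketch below. This is a minimal illustration in plain Java, not the actual production code: the class name HybridIds and the method intersect are made up for this mail, and the choice to hash the smaller side is an assumption about what is cheapest here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: intersects the IDs returned by the database and by Lucene. */
final class HybridIds {

    private HybridIds() {
    }

    /**
     * Returns the IDs present in both collections, in ascending order.
     * The input collections are left untouched.
     */
    static Set<Long> intersect(Collection<Long> databaseIds, Collection<Long> luceneIds) {
        // Hash the smaller side so that each membership test is a constant-time lookup.
        final boolean dbIsSmaller = databaseIds.size() <= luceneIds.size();
        final Set<Long> lookup = new HashSet<Long>(dbIsSmaller ? databaseIds : luceneIds);
        final Collection<Long> probe = dbIsSmaller ? luceneIds : databaseIds;

        final Set<Long> result = new TreeSet<Long>();
        for (Long id : probe) {
            if (lookup.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample IDs, only to show the call.
        final Set<Long> fromDatabase = new TreeSet<Long>(Arrays.asList(1L, 2L, 3L, 5L));
        final Set<Long> fromLucene = new TreeSet<Long>(Arrays.asList(2L, 5L, 8L));
        System.out.println(intersect(fromDatabase, fromLucene));
    }
}
```

As the figures further down show, this step takes a few milliseconds at most, so it is not where the time goes.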
>  Here is what I am doing on the lucene side
>
>         final QueryParser queryParser = new QueryParser(criteria.getDefaultField(), analyzer);
>         final Query q = queryParser.parse(criteria.getFullTextQuery());
>         // Index Searcher is shared for all threads and is not
>         // reopened during the load test
>         final IndexSearcher indexSearcher = getIndexSearcher();
>         final Set<Long> result = new TreeSet<Long>();
>         indexSearcher.search(q, new HitCollector() {
>             public void collect(int i, float v) {
>                 try {
>                     final Document d = indexSearcher.getIndexReader().document(i, new FieldSelector() {
>                         public FieldSelectorResult accept(String s) {
>                             if (s.equals(CatalogItem.ATTR_ID)) {
>                                 return FieldSelectorResult.LOAD;
>                             } else {
>                                 return FieldSelectorResult.NO_LOAD;
>                             }
>                         }
>                     });
>                     result.add(Long.parseLong(d.get(CatalogItem.ATTR_ID)));
>                 } catch (IOException e) {
>                     throw new RuntimeException("Could not collect lucene IDs", e);
>                 }
>             }
>         });
>         return result;
>
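One variation that might be worth trying, since the HitCollector javadoc advises against loading documents inside collect(): record only the document number in the callback and resolve the IDs once the search has finished. The sketch below is plain Java with no Lucene dependency, so it only shows the shape of the idea. DocIdCollector and the idByDoc array are hypothetical; the array would have to be filled once per index reader (for instance from some field cache, which I have not verified against this Lucene version).

```java
import java.util.BitSet;
import java.util.Set;
import java.util.TreeSet;

/**
 * Sketch: keep the per-hit callback trivial and resolve IDs after the search.
 * DocIdCollector and idByDoc are hypothetical stand-ins, not Lucene classes.
 */
final class DocIdCollector {

    // One bit per matching document number; setting a bit is cheap in the inner search loop.
    private final BitSet matches = new BitSet();

    /** Same shape as HitCollector.collect(int, float): only records the hit. */
    void collect(int doc, float score) {
        matches.set(doc);
    }

    /**
     * Maps the collected document numbers to application IDs, where
     * idByDoc[doc] is the ID stored for that document number.
     */
    Set<Long> resolve(long[] idByDoc) {
        final Set<Long> ids = new TreeSet<Long>();
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            ids.add(idByDoc[doc]);
        }
        return ids;
    }
}
```

The point of the design is that no I/O happens while the search is running; whether that removes the contention seen with ten threads is something only a measurement can tell.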
>
>  When running with one thread, I have the following figures per test:
>
>  Database query is done in[125 msecs] (size=598]
>  Lucene query is done in[80 msecs (size=15204]
>  Intersect is done in[4 msecs] (size=103]
>  Hybrid query is done in[97 msecs]
>
>  -> 327 msec / user
>
>  When running with ten threads, I have the following figures per user per test:
>
>  Database query is done in[222 msecs] (size=94]
>  Lucene query is done in[2364 msecs (size=15367]
>  Intersect is done in[0 msecs] (size=12]
>  Hybrid query is done in[18 msecs]
>
>  -> 2.5 sec / user !!
>
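To rule out the measurement itself, the Lucene step could be timed in isolation under the same number of threads. The following is only a sketch of such a harness, assuming the query can be wrapped in a Runnable; ConcurrentTimer and timeConcurrently are names invented for this mail and are not part of the test that produced the figures above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical harness: runs one task per thread at the same time and times each call. */
final class ConcurrentTimer {

    private ConcurrentTimer() {
    }

    /** Returns the elapsed time in milliseconds of each of the concurrent calls. */
    static List<Long> timeConcurrently(int threads, final Runnable task) {
        final ExecutorService pool = Executors.newFixedThreadPool(threads);
        final CountDownLatch startGate = new CountDownLatch(1);
        try {
            final List<Future<Long>> futures = new ArrayList<Future<Long>>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(new Callable<Long>() {
                    public Long call() throws InterruptedException {
                        // Wait until every worker has been submitted so the calls overlap.
                        startGate.await();
                        final long start = System.nanoTime();
                        task.run();
                        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                    }
                }));
            }
            startGate.countDown();

            final List<Long> millis = new ArrayList<Long>();
            for (Future<Long> future : futures) {
                try {
                    millis.add(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while timing", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Timed task failed", e.getCause());
                }
            }
            return millis;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling timeConcurrently(1, task) and timeConcurrently(10, task) with the same query would show whether the per-call time really grows with the thread count or whether the slowdown comes from somewhere else in the test.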
>  I am just wondering how I can improve this. Clearly there is something
>  wrong in my code since it's much slower with multiple threads running
>  concurrently on the same index. The index is 5 MB in size and I only
>  store:
>
>  * an "id" field (which is the primary key of the related object in the db)
>  * a "class" field, which is the class name of the related object
>  (Hibernate Search does that for me)
>
>  The "keywords" field is indexed but not stored as it is a
>  representation of other data stored in the db. The searches are
>  performed on the keywords field only ("foo AND bar" is a typical
>  query)
>
>  Any help is appreciated. If you also know of a Spring bean that could
>  take care of opening/closing the index readers properly, let me know.
>  Hibernate Search introduces deadlocks with multiple threads and the
>  Lucene integration in Spring Modules does not seem to do what I want.
>
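To make the request concrete, the kind of bean meant here is roughly the following: something that opens the shared searcher once at startup and closes it at shutdown. This is a minimal sketch in plain Java, assuming the searcher can be treated as a Closeable; SharedResourceHolder and its opener are hypothetical names, and nothing below depends on Lucene or Spring. It deliberately does not handle reopening the index while searches are in flight.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Callable;

/**
 * Hypothetical holder that opens a shared resource once and closes it at shutdown.
 * T would be the searcher type; the opener supplies the code that creates it.
 */
final class SharedResourceHolder<T extends Closeable> {

    private final Callable<T> opener;
    private volatile T resource;

    SharedResourceHolder(Callable<T> opener) {
        this.opener = opener;
    }

    /** Opens the resource if needed; meant to be wired as the bean's init-method. */
    synchronized void init() {
        if (resource == null) {
            try {
                resource = opener.call();
            } catch (Exception e) {
                throw new IllegalStateException("Could not open shared resource", e);
            }
        }
    }

    /** Returns the shared instance; can be called from many threads after init(). */
    T get() {
        final T current = resource;
        if (current == null) {
            throw new IllegalStateException("Holder is not initialised");
        }
        return current;
    }

    /** Closes the resource; meant to be wired as the bean's destroy-method. */
    synchronized void destroy() {
        final T current = resource;
        resource = null;
        if (current != null) {
            try {
                current.close();
            } catch (IOException e) {
                throw new IllegalStateException("Could not close shared resource", e);
            }
        }
    }
}
```

In a Spring XML context the two lifecycle methods could be referenced through the init-method and destroy-method attributes of the bean definition.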
>  Thanks,
>  Stéphane
>
>
>
>
>  On Sat, May 10, 2008 at 8:05 PM, Patrick Turcotte <patrek@gmail.com> wrote:
>  > Did you try the IndexSearcher.doc(int i, FieldSelector fieldSelector) method?
>  >
>  >  Could be faster because Lucene doesn't have to "prepare" the whole document.
>  >
>  >  Patrick
>  >
>  >
>  >  On Sat, May 10, 2008 at 9:35 AM, Stephane Nicoll
>  >  <stephane.nicoll@gmail.com> wrote:
>  >
>  >
>  >  > From the FAQ:
>  >  >
>  >  > "Don't iterate over more hits than needed.
>  >  > Iterating over all hits is slow for two reasons. Firstly, the search()
>  >  > method that returns a Hits object re-executes the search internally
>  >  > when you need more than 100 hits. Solution: use the search method that
>  >  > takes a HitCollector instead."
>  >  >
>  >  > I had a look at HitCollector, but it only gives me the document ID, and the
>  >  > javadoc recommends against fetching the original document there.
>  >  >
>  >  > I have to return *one* indexed field from the query result, and
>  >  > currently I am iterating over all results and it's slow. Can you explain
>  >  > a bit more how I could improve this?
>  >  >
>  >  > Thanks,
>  >  > Stéphane
>  >  >
>  >  >
>  >  > --
>  >  > Large Systems Suck: This rule is 100% transitive. If you build one,
>  >  > you suck" -- S.Yegge
>  >  >
>  >
>  >  > ---------------------------------------------------------------------
>  >  > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>  >  > For additional commands, e-mail: java-user-help@lucene.apache.org
>  >  >
>  >  >
>  >
>  >


-- 
Large Systems Suck: This rule is 100% transitive. If you build one,
you suck" -- S.Yegge
