lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From eks dev <eks...@yahoo.co.uk>
Subject Re: speed of BooleanQueries on 2.9
Date Thu, 16 Jul 2009 15:26:53 GMT

We did it for us, gave something back to community... all happy... open source works just
fine here in lucene land :)

Re, 10% 
I did not expect that much, but our index is quite dense, a lot of documents and not too many
unique terms, omitTf ... so it is really hard pressure on DocIDSetIterator and Scorers. 
 
I cannot wait to see P4, pulsing index... in action... 
We are alo going to try to cache postings for Top N high freq. terms in plain old ConstanScoreQuery
via OpenBitSet ... with zipfian distribution this should reduce VInt decoding to 50% with
just a few hundred terms... having TF independent score, we just need to adjust constant score
value based on idf()... so no loss in quality! expected huge performance benefit (said optimist
without numbers to prove it). 

Cheers, Eks 








----- Original Message ----
> From: Michael McCandless <lucene@mikemccandless.com>
> To: java-user@lucene.apache.org
> Sent: Thursday, 16 July, 2009 16:23:57
> Subject: Re: speed of BooleanQueries on 2.9
> 
> Super, thanks for testing!
> 
> And, the 10% speedup overall is good progress...
> 
> Mike
> 
> On Thu, Jul 16, 2009 at 9:16 AM, eks devwrote:
> >
> > and one final touch, 4X slow down does not exist with new Lucene...
> > I did not verify it again on the old one, but hey, who cares. Trunk is clean 
> and, at least so far, our favourite QA team has nothing to complain about ...
> >
> > They will keep it under stress for a while... so if somethings comes up you 
> will hear from me...
> > Thanks again to all.
> >
> > Cheers, Eks
> >
> >
> >
> > ----- Original Message ----
> >> From: eks dev 
> >> To: java-user@lucene.apache.org
> >> Sent: Thursday, 16 July, 2009 14:40:26
> >> Subject: Re: speed of BooleanQueries on 2.9
> >>
> >>
> >> ok new facts, less chaos :)
> >>
> >> - LUCENE-1744 fixed it definitely; I have it confirmed
> >> Also, we found another example of the Query that was stuck (t1 t2 t3)~2 ...

> this
> >> is also fixed with LUCENE-1744
> >>
> >>
> >> Re:  "some queries are 4X slower  than before".  Was that a different issue?
> >> (Because this issue is "the query runs forever").
> >>
> >> Maybe :) I do not know.
> >> When I wrote this email about "the query runs forever" I did not know if this
> >> slowdown is the same or different issue... I have just reported some unusual
> >> observation (4 times slower) and was later convinced that this stuck Query
> >> confirms the same problem ....
> >>
> >> Now, I do not know  if that was the same effect, or wrong measurement, or
> >> something else lurking ... Good point, will try to repeat test on this
> >> slowdown...
> >>
> >> Just a reminder This 4_times_slower Query is different:
> >> +(a b c) +(x y z)
> >>
> >> +((NAME:hans NAME:hahns^0.23232001 NAME:hams^0.27648002 NAME:hamz^0.25392
> >> NAME:hanas^0.18722998 NAME:hanbs^0.18722998 NAME:hanfs^0.18722998
> >> NAME:hangs^0.18722998 NAME:hanhs^0.24030754 NAME:hanis^0.18722998
> >> NAME:hanjs^0.18722998 NAME:hanks^0.18722998 NAME:hanms^0.18722998
> >> NAME:hanos^0.18722998 NAME:hanrs^0.18722998 NAME:hansb^0.20172001
> >> NAME:hansd^0.20172001 NAME:hansf^0.20172001 NAME:hansg^0.20172001
> >> NAME:hansi^0.20172001 NAME:hansj^0.20172001 NAME:hansk^0.20172001
> >> NAME:hansl^0.20172001 NAME:hansn^0.20172001 NAME:hanso^0.20172001
> >> NAME:hansp^0.20172001 NAME:hanst^0.20172001 NAME:hansu^0.20172001
> >> NAME:hansw^0.20172001 NAME:hansy^0.20172001 NAME:hansz^0.20172001
> >> NAME:hants^0.18722998 NAME:hanus^0.18722998 NAME:hanws^0.18722998
> >> NAME:hehns^0.20172001 NAME:hens^0.2736075 NAME:hins^0.24843 NAME:hons^0.24843
> >> NAME:huhns^0.1801875 NAME:huns^0.24843)^2.0)
> >> +(((ZIPS:berlin ZIPS:barlin^0.28227 ZIPS:berien^0.25947002
> >> ZIPS:berling^0.23232001 ZIPS:perlin^0.26133335))^1.2)
> >>
> >>
> >> Thanks!
> >>
> >>
> >>
> >>
> >>
> >> ----- Original Message ----
> >> > From: Michael McCandless
> >> > To: java-user@lucene.apache.org
> >> > Sent: Thursday, 16 July, 2009 13:52:06
> >> > Subject: Re: speed of BooleanQueries on 2.9
> >> >
> >> > On Thu, Jul 16, 2009 at 6:38 AM, eks devwrote:
> >> >
> >> > > and this String has exactly that form
> >> > > (x OR y OR z) OR (a OR b OR c),
> >> > > That is exactly how I construct the Query, have a look at brackets
on 
> this
> >> > toString result .
> >> >
> >> > Duh!  OK, I had missed that your large query actually had 2 clauses at
> >> > the top!  Sigh.
> >> >
> >> > OK, that part of the puzzle now at least makes sense.  The rewrite()
> >> > of your query will not reduce to a single OR query (as I previously
> >> > thought).
> >> >
> >> > So in fact you have a BS at the top (because you called
> >> > setAllowDocsOutOfOrder(true)), with 2 clauses, and each of those
> >> > clauses uses BS2 to score.
> >> >
> >> > I think advance() is not involved, but LUCENE-1744 could very well
> >> > have fixed this, because BS calls sub.scorer.docID() when interacting
> >> > with its sub-scorers, and due to LUCENE-1744, that would always return
> >> > -1 from a BS2, so BS could enter an infinite loop.
> >> >
> >> > If you run w/o the fix for LUCENE-1744, with my instrumentation, I can
> >> > confirm this.  But I think likely this is it.
> >> >
> >> > Also: you started this thread by saying "some queries are 4X slower
> >> > than before".  Was that a different issue?  (Because this issue is
> >> > "the query runs forever").
> >> >
> >> > Mike
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



      

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message