lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Doron Cohen" <cdor...@gmail.com>
Subject Re: document deletion problem
Date Thu, 20 Dec 2007 12:57:08 GMT
On Dec 20, 2007 8:31 AM, Tushar B <snowhow@sbcglobal.net> wrote:

> Hi Doron,
>
> Just filed an issue in JIRA.


Thanks!


>
>
> Here are the requested stats:
> Index size-> around 11 million documents
> Query      -> fieldname:[009 TO 999] (using CSRQ)


ConstantScoreRangeQuery, right?


>
> Result      -> 11475 documents
> Delete      -> All the returned documents (11475)
>
> I can get the time statistics for you if that helps.
>

Yes this would be interesting, thanks - especially, how
long does it take with the deletions, vs. how long it takes to
just traverse the ids. Also, if the range query expands to
huge number of terms this can affect performance, so
it is interesting to see your query.rewrite().toString(),
or, if you can't reveal the actual tokens, just share
how many terms are there in the rewritten query.


>
> And, btw, I can still see the terms from the deleted documents when I do
> the top terms etc... when will they be gone?
>
> thanks
>
> ----- Original Message ----
> > From: Doron Cohen <cdoronc@gmail.com>
> > To: java-user@lucene.apache.org
> > Sent: Wednesday, December 19, 2007 1:13:56 PM
> > Subject: Re: document deletion problem
> >
> > On Dec 19, 2007 5:45 PM, Tushar B wrote:
> >
> > > Hi Doron,
> > >
> > > I was just playing around with deletion because I wanted to delete
> > > documents due to spurious entries in one particular field. Could you
> tell me
> > > how do I file a JIRA issue?
> > >
> >
> > See Lucene's wiki, at page "HowToContribute".
> >
> >
> > >
> > > The two workarounds I was using are neither great in perfromance.
> Provided
> > > here just FYI:
> > >
> > > 1) Have the "for" loop in a "do while" loop, Handle the
> Array...Exception,
> > > resubmit query
> > > 2) Use HitCollector (as also suggested by you)
> >
> >
> > The HitCollector should work reasonably - can you tell us how many docs
> did
> > you delete, from how big an index, with what query, and how long did it
> > take?
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message