lucene-dev mailing list archives

From "Uwe Schindler" <>
Subject RE: Lucene 2.9 status (to port to Lucene.Net)
Date Sun, 26 Apr 2009 13:54:27 GMT
A status update:

> > George, did you mean LUCENE-1516 below?  (LUCENE-1313 is a further
> > improvement to near real-time search that's still being iterated on).
> >
> > In general I would say 2.9 seems to be in rather active development
> > still ;)
> >
> > I too would love to hear about production/beta use of 2.9.  George,
> > maybe you should re-ask on java-user?
> Here! I updated to Lucene-trunk today (because of an incomplete
> hashCode() in TrieRangeQuery)... Works perfectly, but I do not use the
> realtime parts. And 10 days ago the same, no problems :-)
> Currently I am rewriting parts of my code to the new Collector API, to
> move away from HitCollector (without score, so optimizations are
> possible)! The reopen() and sorting are fine: almost no time is consumed
> for sorted searches after reopening the indexes every 20 minutes with
> just some new, small segments containing the changed documents. No extra
> warming is needed.
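
The reopen() part mentioned above is just the usual IndexReader.reopen()
idiom, something like this (a minimal sketch, variable names invented):

    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
      // reopen() returned a new instance, so the index changed:
      // close the old reader and search over the new one
      reader.close();
      reader = newReader;
      searcher = new IndexSearcher(reader);
    }
    // if nothing changed, reopen() returns the same instance and the
    // old reader/searcher can simply be kept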

I rewrote my collectors now to use the new API. Even though the number of
methods to override in the new Collector is 3 instead of 1, the code got
shorter (because the collect methods can now throw IOExceptions, great!!!).
What is also perfect is the way a FieldCache is used: just retrieve the
FieldCache array (e.g. with getInts()) in the setNextReader() method and
use that array in the collect() method with the docid as index. Now I am
able to, e.g., retrieve cached values even after an index reopen without
warming (same with sorting). In the past you had to use a cache array for
the whole index. The docBase is not used in my code, as I directly access
the per-segment index readers. So users now have both possibilities: use
the supplied reader, or use the docBase as an index offset into the
searcher/main reader. Really cool!

The overhead of score calculation can be left out if it is not needed,
which is also cool!
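
To make this concrete, here is a minimal sketch of such a collector
against the released 2.9 Collector API (class and field names are mine,
not our production code; the final API also asks for
acceptsDocsOutOfOrder()). It fetches the per-segment FieldCache array in
setNextReader(), ignores the Scorer because no scores are needed, and
indexes the cache array with the segment-local docid in collect():

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Collector;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.Scorer;

    /** Collects an int field's cached values for every hit; no scoring. */
    public class IntFieldCollector extends Collector {
      private final String field;
      private final List<Integer> values = new ArrayList<Integer>();
      private int[] cache;  // FieldCache array for the current segment

      public IntFieldCollector(String field) {
        this.field = field;
      }

      public void setScorer(Scorer scorer) {
        // scores are not needed, so the scorer is simply ignored
      }

      public void setNextReader(IndexReader reader, int docBase)
          throws IOException {
        // per-segment cache array: unchanged segments keep their cache
        // across reopen(), so no warming is needed
        cache = FieldCache.DEFAULT.getInts(reader, field);
        // docBase is unused: we work directly on the segment reader
      }

      public void collect(int doc) {
        values.add(cache[doc]);  // doc is segment-local, like the array
      }

      public boolean acceptsDocsOutOfOrder() {
        return true;  // just reading the cache, order is irrelevant
      }

      public List<Integer> getValues() {
        return values;
      }
    }

Because everything is per segment, a reopen() only pays for the new small
segments; the cache arrays of the unchanged segments stay valid.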

One of my collectors is used to retrieve the database ids (integers) for
building up a SQL "IN (...)" clause from the field cache, based on the
collected hits. In the past this was very complicated, because the
FieldCache was slow after reopening, and getting the stored fields (the
ids) is also very slow inside the inner search loop. Now it is just about
10 lines of code and no scoring is involved.
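
With a collector like the IntFieldCollector sketched above, the whole
thing boils down to roughly this (the field name "dbid" and the SQL are
invented for illustration):

    // collect the database ids for all hits, then build the IN clause
    IntFieldCollector collector = new IntFieldCollector("dbid");
    searcher.search(query, collector);

    StringBuilder sql = new StringBuilder("... WHERE id IN (");
    List<Integer> ids = collector.getValues();
    for (int i = 0; i < ids.size(); i++) {
      if (i > 0) sql.append(',');
      sql.append(ids.get(i));
    }
    sql.append(')');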

The new code is working now in production at PANGAEA.

> Another change to be done here is to get rid of Field.Store.COMPRESS and
> replace it with manually compressed binary stored fields, but this is
> only to get rid of the deprecation warnings. And this cannot be done
> without completely reindexing.
> Uwe
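
For those facing the same migration: manually compressed binary stored
fields can be built with plain java.util.zip, roughly like this (field
name and variables invented; if I remember correctly, 2.9 also gets a
CompressionTools helper class for exactly this):

    import java.io.ByteArrayOutputStream;
    import java.io.UnsupportedEncodingException;
    import java.util.zip.Deflater;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    // deflate the raw bytes ourselves instead of Field.Store.COMPRESS
    static byte[] compress(String text) throws UnsupportedEncodingException {
      byte[] raw = text.getBytes("UTF-8");
      Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
      deflater.setInput(raw);
      deflater.finish();
      ByteArrayOutputStream bos = new ByteArrayOutputStream(raw.length);
      byte[] buf = new byte[1024];
      while (!deflater.finished()) {
        bos.write(buf, 0, deflater.deflate(buf));
      }
      deflater.end();
      return bos.toByteArray();
    }

    // usage: store as a plain binary field; decompress with
    // java.util.zip.Inflater when reading the stored field back
    Document doc = new Document();
    doc.add(new Field("content", compress(text), Field.Store.YES));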

