lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Lucene 2.9 status (to port to Lucene.Net)
Date Tue, 28 Apr 2009 12:10:17 GMT
Hi Mike,

> This is great feedback on the new Collector API, Uwe.  Thanks!

- Likewise.

> It's awesome that you no longer have to warm your searchers... but be
> careful when a large segment merge commits.

I know this, but in our case (e.g. building an SQL IN list, or collecting
measurement parameters from the documents) warming is not really needed. It
would only be a problem if it happened very often (the index is updated every
20 minutes) and the whole field cache had to be reloaded (which takes 3-5
seconds on our machine). So a large merge that costs 1-2 seconds of cache
reloading is no problem (users pay the same cost with sorted results). If our
index gets bigger, I will add warming to my search/cache implementation after
reopening; for that it would be nice to have the list of reopened segments (I
think there was an issue about this, or is there an implementation already?).
In our case, most of the time is spent in the subsequent query against the SQL
data warehouse, so one additional second for building the SQL query does not
matter much.
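
Such a warming step after reopen() could, as a rough sketch against the
Lucene 2.9 API, look like the following. The class name, method name, and
field name are made up for illustration; a real implementation would warm
every cached field and sort field it uses:

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.IndexSearcher;

// Sketch only: warm the per-segment FieldCache after reopen() before the
// new reader serves queries. The field name is hypothetical.
public class WarmingReopen {
  public static IndexSearcher reopenAndWarm(IndexReader oldReader,
                                            String field) throws IOException {
    IndexReader newReader = oldReader.reopen();
    if (newReader != oldReader) {
      // populate the cache for each segment; segments unchanged by the
      // reopen are shared with the old reader, so their cache is reused
      for (IndexReader segment : newReader.getSequentialSubReaders()) {
        FieldCache.DEFAULT.getInts(segment, field);
      }
      oldReader.close();
    }
    return new IndexSearcher(newReader);
  }
}
```

Only segments that are new after the reopen actually trigger cache loading,
which is why the cost is proportional to the merge size, not the index size.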
 
> Did you hit any snags/problems/etc. that we should fix before releasing
> 2.9?

So far I have not seen any further problems. What I had seen before is
already fixed in Lucene, thanks to our active issue communication and all
these issues :-)

I am still waiting for the move of trie (and also the new automaton regex
query) to core, and for the modularization (hopefully before 2.9, so that we
do not create new APIs that change or get deprecated later).

Uwe

> Mike
> 
> On Sun, Apr 26, 2009 at 9:54 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > Some status update:
> >
> >> > George, did you mean LUCENE-1516 below?  (LUCENE-1313 is a further
> >> > improvement to near real-time search that's still being iterated on).
> >> >
> >> > In general I would say 2.9 seems to be in rather active development
> >> > still ;)
> >> >
> >> > I too would love to hear about production/beta use of 2.9.  George,
> >> > maybe you should re-ask on java-user?
> >>
> >> Here! I updated www.pangaea.de to Lucene-trunk today (because of an
> >> incomplete hashCode in TrieRangeQuery)... Works perfectly, but I do not
> >> use the realtime parts. And the same 10 days before, no problems :-)
> >>
> >> Currently I am rewriting parts of my code to use Collector and move
> >> away from HitCollector (without scoring, which enables optimizations)!
> >> The reopen() and sorting are fine; almost no time is consumed for
> >> sorted searches after reopening indexes every 20 minutes with just a
> >> few new, small segments containing changed documents. No extra warming
> >> is needed.
> >
> > I rewrote my collectors now to use the new API. Even though the number
> > of methods to override in the new Collector is 3 instead of 1, the code
> > got shorter (because the collect methods can now throw IOExceptions,
> > great!!!). What is also perfect is the way a FieldCache is used: just
> > retrieve the FieldCache array (e.g. getInts()) in the setNextReader()
> > method and use the value array in the collect() method with the docid as
> > index. Now I am able to e.g. retrieve cached values even after an index
> > reopen without warming (same with sort). In the past you had to use a
> > cache array for the whole index. The docBase is not used in my code, as
> > I directly access the index readers. So users now have both
> > possibilities: use the supplied reader, or use the docBase as an index
> > offset into the searcher/main reader. Really cool!
> >
> > The overhead of score calculation can be left out if it is not needed,
> > also cool!
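
The pattern described above can be sketched against the Lucene 2.9 Collector
API roughly as follows. The class name and the field name "dbId" are
hypothetical, not the PANGAEA code:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

// Sketch only: a score-less Collector that reads ints from the per-segment
// FieldCache. The field name "dbId" is made up for illustration.
public class IntFieldCollector extends Collector {
  private final List<Integer> collected = new ArrayList<Integer>();
  private int[] values; // FieldCache array of the current segment

  @Override
  public void setScorer(Scorer scorer) {
    // scores are not needed here, so the Scorer is simply ignored
  }

  @Override
  public void setNextReader(IndexReader reader, int docBase)
      throws IOException {
    // per-segment cache: segments unchanged by a reopen() keep their cache
    values = FieldCache.DEFAULT.getInts(reader, "dbId");
  }

  @Override
  public void collect(int doc) {
    // doc is relative to the segment reader from setNextReader(), so it
    // indexes directly into the per-segment array; docBase is not needed
    collected.add(values[doc]);
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return true; // order does not matter when only gathering values
  }

  public List<Integer> getCollected() {
    return collected;
  }
}
```

A plain IndexSearcher.search(query, collector) call then fills the list
without computing any scores.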
> >
> > One of my collectors is used to retrieve the database ids (integers) for
> > building an SQL "IN (...)" clause from the field cache, based on the
> > collected hits. In the past this was very complicated, because the
> > FieldCache was slow after reopening, and getting stored fields (the ids)
> > is also very slow (inner search loop). Now it is just 10 lines of code
> > and no score is involved.
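
The string-building side of such a collector can be sketched in plain Java;
the class name and the column name here are illustrative, not the actual
PANGAEA code:

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: join collected integer database ids into an SQL "IN (...)"
// clause. The column name "dataset_id" is hypothetical.
public class InClauseBuilder {
  public static String buildInClause(String column, List<Integer> ids) {
    StringBuilder sb = new StringBuilder(column).append(" IN (");
    for (int i = 0; i < ids.size(); i++) {
      if (i > 0) sb.append(',');
      sb.append(ids.get(i));
    }
    return sb.append(')').toString();
  }

  public static void main(String[] args) {
    List<Integer> ids = Arrays.asList(17, 42, 99);
    System.out.println(buildInClause("dataset_id", ids));
    // prints: dataset_id IN (17,42,99)
  }
}
```

Because the ids are integers taken from the field cache, no quoting or
escaping is needed when they are embedded into the SQL string.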
> >
> > The new code is working now in production at PANGAEA.
> >
> >> Another change to be done here is to replace Field.Store.COMPRESS with
> >> manually compressed binary stored fields, but this is only to get rid
> >> of the deprecation warnings. However, this cannot be done without
> >> complete reindexing.
> >>
> >> Uwe
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-dev-help@lucene.apache.org
> >
> 


