lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: [GSoC] Question about LUCENE-3892
Date Fri, 23 Mar 2012 18:25:25 GMT

One quick question up front: are you subscribed to the dev list?  If
not, you may have missed my response to your last email with GSoC

Answers below:

On Fri, Mar 23, 2012 at 2:09 PM, Han Jiang <> wrote:

> I scanned through some discussions and code around PForDelta, like
> LUCENE-1410, LUCENE-2903, ConversationBetweenMichaelAndLiLi. It is great to
> see so much information, and PForDelta seems to be a promising target. But
> as I look into the code in branch-bulkpostings, it seems that most of the
> algorithms have already been implemented. So what is required for
> LUCENE-3892: is the main target performance improvement,
> integration with the trunk version, or another implementation from the bottom
> up?

We can work out the scope... but I think success would be a useful
codec committed to 4.0?  Ideally, and I think likely, it shows faster
performance than our current default codec, in which case we may want
to change our default, depending on other factors...

I.e., you'd need to bring forward those old patches/branches to the
current codec APIs, do performance testing to understand where they do
well / poorly, whether more disk space is used, etc.  Perhaps iterate
on their implementations to improve performance...

If the project succeeds in building a committable PForDelta codec that
would be awesome!

If that somehow winds up being too little, you can explore other
intblock codecs as well...

> And another question about development. I am curious why some classes
> such as StandardAnalyzer are not found in trunk or branch-bulkpostings,
> but are replaced with Mock ones. How can I test my old code if I want to
> integrate these classes with the trunk library?

We've moved all "real" analyzers to the module/analysis... what's in
trunk are test analyzers, which you should use for new tests since
they have more thorough checks.

Mike McCandless
