lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: MoreLikeThis Interface changes
Date Thu, 22 Sep 2011 00:58:45 GMT
On Wed, Sep 21, 2011 at 5:17 PM, Scott Smith <ssmith@mainstreamdata.com> wrote:
> I'm updating my Lucene code from 3.0 to 3.4.  There's a change in the MLT interface
> I'm confused about.  I used the MLT.like(InputStream) method.  It now appears I should change
> to the MLT.like(InputStreamReader, fieldname) method.  Easy enough to create an InputStreamReader
> from an InputStream.

Yes, requiring a Reader ensures that MLT uses the encoding you want.
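
To illustrate the encoding point, here is a minimal sketch using only the JDK. The class name ExplicitEncodingSketch and the helper decode are made up for this example and are not part of Lucene. It decodes the same bytes with two different charsets to show why the caller, not MLT, should pick the encoding:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingSketch {
    // Hypothetical helper: drains the stream, decoding bytes with the given charset.
    static String decode(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e; in UTF-8 that letter is two bytes.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // Correct charset: four characters come back.
        System.out.println(decode(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8).length());
        // Wrong charset: the two-byte sequence is read as two separate characters.
        System.out.println(decode(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1).length());
    }
}
```

With an InputStream alone, the decoding step is hidden and falls back to whatever default applies; with a Reader, the choice has already been made explicitly by the caller.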

>
> So, my question is regarding the addition of the fieldname parameter.  There's also
> a method called MLT.setFieldNames(String[]).  This would seem to be redundant except the setFieldNames()
> allows you to specify multiple fields and like() doesn't.  Am I allowed to specify null as
> the fieldname in like() (documentation doesn't say you can)?  It seems like you shouldn't
> need to do both.  But there's a difference in functionality between the two (since one allows
> multiple fields and the other doesn't).

A Reader has no fields :)
The fieldName is only for passing to the Analyzer (@param fieldName
field passed to the analyzer to use when analyzing the content)
This is because some Analyzers (e.g. PerFieldAnalyzerWrapper) analyze
content differently according to different fields.
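
As a rough illustration of field-dependent analysis, here is a toy stand-in written with plain JDK classes. It is not Lucene's Analyzer API; the class PerFieldSketch, the method analyze, and the field names "id" and "body" are all hypothetical. It only shows how the same content can yield different tokens depending on the field name handed in:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PerFieldSketch {
    // Hypothetical per-field rule: the "id" field is kept verbatim as one token,
    // every other field is lowercased and split on whitespace.
    static List<String> analyze(String fieldName, String content) {
        if ("id".equals(fieldName)) {
            return Collections.singletonList(content);
        }
        return Arrays.asList(content.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(analyze("id", "AB 12"));
        System.out.println(analyze("body", "AB 12"));
    }
}
```

This is why like() needs a field name even though the Reader itself has no fields: the name selects how the content is tokenized.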

Previously, MoreLikeThis would use what was in the setFieldNames
parameter, iteratively, like this (pseudocode):
for (String field : fieldNames) {
  analyzer.analyze(field, reader);  // the same reader on every pass
}

However, MoreLikeThis also had a bug where it would never close() the
reader. As you can see, this logic was completely bogus, as you can only
consume the reader once.

Effectively the reader would be analyzed by fieldNames[0], then MLT
would analyze an exhausted reader with fieldNames[1]...fieldNames[n].
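
The effect can be reproduced with a plain java.io.StringReader. In this sketch, countTokens is a hypothetical stand-in for the analyzer (it just drains the reader and counts whitespace-separated words), and the field names are arbitrary:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ExhaustedReaderSketch {
    // Hypothetical stand-in for analysis: drains the reader and counts words.
    static int countTokens(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String text = sb.toString().trim();
        return text.isEmpty() ? 0 : text.split("\\s+").length;
    }

    // Mirrors the old loop: ONE reader is shared across all field names.
    static List<Integer> tokensPerField(String content, String[] fieldNames) {
        Reader reader = new StringReader(content);
        List<Integer> counts = new ArrayList<>();
        for (String field : fieldNames) {
            // The field name does not matter here; only the first pass sees any content.
            counts.add(countTokens(reader));
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] fields = {"title", "body", "tags"};
        System.out.println(tokensPerField("more like this", fields)); // prints [3, 0, 0]
    }
}
```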

When we fixed MLT to close its resources correctly (around 3.2), it
exposed this second bug: if you tried to pass a reader with multiple
values in fieldNames, you would get an IOException because it tried to
re-consume a closed reader.
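
The same failure mode shows up with JDK readers alone. A small sketch, again with a made-up class and method name, that consumes a StringReader, closes it, and then tries to read from it a second time:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ClosedReaderSketch {
    // Returns true if reading after close() raises an IOException.
    static boolean secondReadFails(String content) {
        Reader reader = new StringReader(content);
        try {
            // First pass: consume everything, then release the reader.
            while (reader.read() != -1) {
                // discard characters
            }
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            // Second pass: what the loop over fieldNames[1..n] effectively attempted.
            reader.read();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondReadFails("more like this")); // prints true
    }
}
```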

Now, when supplying a reader, you should instead pass in the
fieldName explicitly so that MLT analyzes the content the way you want.
For backwards compatibility, the deprecated method uses
fieldNames[0] only.

-- 
lucidimagination.com
