lucene-java-user mailing list archives

From Steven A Rowe <sar...@syr.edu>
Subject RE: Use of hyphens in StandardAnalyzer
Date Mon, 25 Oct 2010 02:31:17 GMT
Sorry, releases are not scheduled.

There is a general feeling that a 3.1 release could happen fairly soon, though.

Currently, there is a push to improve test coverage and fix bugs that shake out as a result.

As another measure of how close the release is, you can check here to see how many issues
remain targeting the 3.1 release - once these go to zero, a release is likely imminent:

Lucene open/reopened fix for 3.1: <https://issues.apache.org/jira/secure/IssueNavigator.jspa?sorter/field=priority&resolution=-1&pid=12310110&fixfor=12314822>
 
Solr open/reopened fix for 3.1: <https://issues.apache.org/jira/secure/IssueNavigator.jspa?sorter/field=priority&resolution=-1&pid=12310230&fixfor=12314371>

My estimate of when a release will occur: sometime in the next two or three months.

The 3.X branch (where the 3.1 release will be cut from) is quite stable - you should consider
using it even pre-release.

Steve

> -----Original Message-----
> From: Martin O'Shea [mailto:appy74@dsl.pipex.com]
> Sent: Sunday, October 24, 2010 5:29 PM
> To: java-user@lucene.apache.org
> Subject: FW: Use of hyphens in StandardAnalyzer
> 
> A good suggestion. But I'm using Lucene 3.0.2 and the constructor for a
> StandardAnalyzer has Version.LUCENE_30 as its highest value. Do you know
> when 3.1 is due?
> 
> -----Original Message-----
> From: Steven A Rowe [mailto:sarowe@syr.edu]
> Sent: 24 Oct 2010 21:31
> To: java-user@lucene.apache.org
> Subject: RE: Use of hyphens in StandardAnalyzer
> 
> Hi Martin,
> 
> StandardTokenizer and -Analyzer have been changed, as of future version
> 3.1 (the next release) to support the Unicode segmentation rules in
> UAX#29.  My (untested) guess is that your hyphenated word will be kept as
> a single token if you set the version to 3.1 or higher in the constructor.
> 
> Steve
> 
> > -----Original Message-----
> > From: Martin O'Shea [mailto:appy74@dsl.pipex.com]
> > Sent: Sunday, October 24, 2010 3:59 PM
> > To: java-user@lucene.apache.org
> > Subject: Use of hyphens in StandardAnalyzer
> >
> > Hello
> >
> > I have a StandardAnalyzer working which retrieves words and frequencies
> > from a single document using a TermVectorMapper which is populating a
> > HashMap.
> >
> > But if I use the following text as a field in my document, i.e.
> >
> > addDoc(w, "lucene Lawton-Browne Lucene");
> >
> > The word frequencies returned in the HashMap are:
> >
> > browne 1
> > lucene 2
> > lawton 1
> >
> > The problem is the words 'lawton' and 'browne'. If this is an actual
> > 'double-barreled' name, can Lucene recognise it as 'Lawton-Browne' where
> > the name is actually a single word?
> >
> > I've tried combinations of:
> >
> > addDoc(w, "lucene \"Lawton-Browne\" Lucene");
> >
> > And single quotes but without success.
> >
> > Thanks
> >
> > Martin O'Shea.
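
For readers following the thread, here is a minimal plain-Java sketch of the behaviour being discussed. It does not use Lucene and is not StandardTokenizer's grammar: the class name HyphenTermCounts, the count method, and both regular expressions are assumptions made up for illustration. With keepHyphens set to false it reproduces the frequencies Martin reports; with true it shows what "keep the double-barreled name as one token" would look like in the resulting map.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not how Lucene tokenizes internally.
class HyphenTermCounts {
    // Runs of letters/digits: a hyphen ends the token, as in the reported output.
    private static final Pattern SPLIT_ON_HYPHEN =
            Pattern.compile("[\\p{L}\\p{N}]+");
    // Runs of letters/digits that may be joined by single internal hyphens.
    private static final Pattern KEEP_HYPHEN =
            Pattern.compile("[\\p{L}\\p{N}]+(?:-[\\p{L}\\p{N}]+)*");

    // Lower-cases each token and counts occurrences, in first-seen order.
    static Map<String, Integer> count(String text, boolean keepHyphens) {
        Pattern pattern = keepHyphens ? KEEP_HYPHEN : SPLIT_ON_HYPHEN;
        Map<String, Integer> freq = new LinkedHashMap<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            String term = m.group().toLowerCase(Locale.ROOT);
            freq.merge(term, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        String text = "lucene Lawton-Browne Lucene";
        System.out.println(count(text, false)); // {lucene=2, lawton=1, browne=1}
        System.out.println(count(text, true));  // {lucene=2, lawton-browne=1}
    }
}
```

Whether a given Lucene version's StandardAnalyzer produces the second result is exactly the untested guess in the thread, so it is worth checking the actual token output before relying on it.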
