lucene-java-user mailing list archives

From "Kainth, Sachin" <Sachin.Kai...@atkinsglobal.com>
Subject RE: 'a', 's' and 't' don't index properly
Date Thu, 08 Feb 2007 15:28:12 GMT
Thanks Erick,

Is there a .NET version of Solr?

Cheers

Sachin 

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: 08 February 2007 15:26
To: java-user@lucene.apache.org
Subject: Re: 'a', 's' and 't' don't index properly

From the javadoc...

public final class SimpleAnalyzer extends Analyzer

An Analyzer that filters LetterTokenizer with LowerCaseFilter.
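
To make the difference concrete, here is a minimal stand-alone sketch in plain Java. It does not use Lucene at all: the class name, the method names and the small stop-word set are made up for illustration, and the set is only a hypothetical subset chosen to include a, s and t as described in this thread. It mimics "split on non-letters, then lower-case" (what the javadoc above describes) and compares that with the same pipeline followed by stop-word removal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AnalyzerSketch {
    // Hypothetical subset of a stop-word list (an assumption for this sketch,
    // not the real default set of any Lucene analyzer).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "s", "t", "the", "and"));

    // Imitates a letter tokenizer followed by a lower-case filter:
    // runs of letters become tokens, everything else is a separator.
    static List<String> simpleAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Same pipeline, then drop every token found in the stop-word set.
    static List<String> stopAnalyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : simpleAnalyze(text)) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Single-letter values, like the "firstletter" field in the question.
        for (String letter : new String[] {"A", "B", "S", "T"}) {
            System.out.println(letter + " -> simple=" + simpleAnalyze(letter)
                    + " stop=" + stopAnalyze(letter));
        }
    }
}
```

In this sketch the stop-filtered pipeline produces no token at all for "A", "S" and "T", while the letter-and-lower-case pipeline keeps them as "a", "s" and "t". A field whose only token is dropped has nothing left to sort on, which would be consistent with the symptom described in the original question.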


On 2/8/07, Kainth, Sachin <Sachin.Kainth@atkinsglobal.com> wrote:
>
> Thanks Erick,
>
> Do you know of an analyzer which doesn't remove the characters 'a', 's'
> and 't'?
>
> Sachin
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: 08 February 2007 13:54
> To: java-user@lucene.apache.org
> Subject: Re: 'a', 's' and 't' don't index properly
>
> This really should be posted on the dotlucene list, but....
>
> Your indexing analyzer is probably removing them. For instance,
> StandardAnalyzer uses a default set of stop words, and a, s, and t are
> definitely among them. You need to use a different analyzer from the
> one you are using.
>
> These will also be removed from queries if you use QueryParser with 
> one of several analyzers that remove stop words.
>
> StandardAnalyzer, for instance, also lower-cases tokens, removes most
> punctuation, etc., so take some care to understand the analyzers and
> what they do.
>
> Oh, and get a copy of Luke if you haven't already. It'll let you 
> examine your index, see the results of using various analyzers etc.
>
> Best
> Erick
>
> On 2/8/07, Kainth, Sachin <Sachin.Kainth@atkinsglobal.com> wrote:
> >
> > > Hello,
> > >
> > > I have a database of tracks, artists and albums and I'm indexing
> > > these 3 attributes plus the first letter of the track, thus
> > > (incidentally, I'm using dotlucene, but the implementation of
> > > dotlucene is similar to the Java one):
> > >
> > >    Document Doc = new Document();
> > >    String Album = ...
> > >    String Artist = ...
> > >    String Track = ...
> > >    Doc.Add(Field.Text("album", Album));
> > >    Doc.Add(Field.Text("artist", Artist));
> > >    Doc.Add(Field.Text("track", Track));
> > >    Doc.Add(Field.Text("firstletter", Track.Substring(0,1)));
> > >
> > > Problem is I don't think certain first letters are being indexed
> > > properly or at all, either that or there is some problem elsewhere.
> > > I have noticed that the letters 'a', 's' and 't' (there may be
> > > others) cause me problems.  I shall explain the problem I have.
> > > When I search for the documents I perform a sorting operation on
> > > the firstletter field but where the firstletter was 'a', 's' or 't'
> > > the returned list does not contain those records in sorted order
> > > (all other records are sorted correctly).
> > >
> > > Here is my search command:
> > >
> > > Hits hits = searcher.Search(query, new Sort(new SortField[] { new 
> > > SortField("firstletter", SortField.STRING)}));
> > >
> > > What I don't know is whether the fault lies in the indexing or in
> > > this or other code.  Does anyone know what could have happened?
> > >
> > > Thanks
> > >
> > > Sachin
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


