lucene-java-user mailing list archives

From "Kainth, Sachin" <Sachin.Kai...@atkinsglobal.com>
Subject RE: 'a', 's' and 't' don't index properly
Date Thu, 08 Feb 2007 13:59:55 GMT
Thanks Erick,

Do you know of an analyzer which doesn't remove the characters 'a', 's'
and 't'?

Sachin 

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: 08 February 2007 13:54
To: java-user@lucene.apache.org
Subject: Re: 'a', 's' and 't' don't index properly

This really should be posted on the dotlucene list, but....

Your indexing analyzer is probably removing them. For instance,
StandardAnalyzer uses a default set of stop words, and a, s, and t are
definitely among them. You need to use a different analyzer than you are
using.

These will also be removed from queries if you use QueryParser with one
of several analyzers that remove stop words.

StandardAnalyzer, for instance, also lower-cases tokens, removes most
punctuation, etc., so take some care to understand the analyzers and what
they do.
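
A rough sketch of the effect described above, for illustration only. It
does not use Lucene: the class StopWordSketch, the method analyze, and the
abbreviated stop list are all hypothetical stand-ins for what a
stop-word-removing analyzer does (lower-case, split on punctuation, drop
stop words). It shows why a field holding only "A", "S" or "T" can end up
with no indexed term at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified model of a stop-word-removing analyzer. This is NOT Lucene code;
// the names and the abbreviated stop list below are illustrative assumptions.
public class StopWordSketch {

    // A small, hypothetical subset of an English stop list. It includes the
    // single letters discussed in this thread.
    public static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "s", "t", "the"));

    // Lower-case the text, split on anything that is not a letter or digit,
    // then drop empty strings and stop words.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        String lowered = text.toLowerCase(Locale.ROOT);
        for (String raw : lowered.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // First letters of some track titles, as in the "firstletter" field.
        for (String letter : new String[] {"A", "B", "S", "T", "Z"}) {
            System.out.println(letter + " -> " + analyze(letter));
        }
        // A full title: only the stop word is dropped.
        System.out.println(analyze("The Sound of Silence"));
    }
}
```

In this model the values "A", "S" and "T" produce no tokens, so nothing
would be indexed for them, while "B" and "Z" survive as "b" and "z". The
same filtering applied to query text would remove those letters from a
query as well. Analyzers that apply no stop list (WhitespaceAnalyzer and
KeywordAnalyzer, for example) should not drop these letters.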

Oh, and get a copy of Luke if you haven't already. It'll let you examine
your index, see the results of using various analyzers etc.

Best
Erick

On 2/8/07, Kainth, Sachin <Sachin.Kainth@atkinsglobal.com> wrote:
>
> > Hello,
> >
> > I have a database of tracks, artists and albums, and I'm indexing
> > these three attributes plus the first letter of the track, thus
> > (incidentally, I'm using dotlucene, but its implementation is
> > similar to the Java one):
> >
> >    Document Doc = new Document();
> >    String Album = ...
> >    String Artist = ...
> >    String Track = ...
> >    Doc.Add(Field.Text("album", Album));
> >    Doc.Add(Field.Text("artist", Artist));
> >    Doc.Add(Field.Text("track", Track));
> >    Doc.Add(Field.Text("firstletter", Track.Substring(0,1)));
> >
> > The problem is that I don't think certain first letters are being
> > indexed properly, or at all; either that or there is some problem
> > elsewhere. I have noticed that the letters 'a', 's' and 't' (there
> > may be others) cause me problems. When I search for the documents
> > I sort on the firstletter field, but where the firstletter was 'a',
> > 's' or 't' the returned list does not contain those records in
> > sorted order (all other records are sorted correctly).
> >
> > Here is my search command:
> >
> > Hits hits = searcher.Search(query, new Sort(new SortField[] { new 
> > SortField("firstletter", SortField.STRING)}));
> >
> > What I don't know is whether the fault lies in the indexing or in
> > this or other code.  Does anyone know what could have happened?
> >
> > Thanks
> >
> > Sachin