lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rohit Banga <iamrohitba...@gmail.com>
Subject Re: Lucene fields not analyzed
Date Tue, 09 Feb 2010 09:12:06 GMT
i'll try using Luke.

how i want to use Lucene?

there is a sentence that may contain the names of some people from among
those in a list. the names may be incomplete or may have spelling mistakes.

so i created a lucene index, with each person as a document.

eg.

Mr. Arun Kumar

with a hit highlighter i get

<B>Mr</B>. <B>Arun</B> <B>Kumar</B>

what i want is
<B>Mr. Arun Kumar</B>

even when there are spelling mistakes.


Rohit Banga


On Tue, Feb 9, 2010 at 1:57 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> If you don't get it working that way, then you have to ask you the
> question: Why do you want it indexed that way? Is it because you don't want
> to find all people in that field when you add ony "Mr." to a search query?
> It looks like you use StandardAnalyzer, and in this case, I would add "mr",
> not "mr!", to the stop word list and index the name field as any other
> field. Before doing this, it would be good to explain, what you are
> intending to do/prevent by indexing with NOT_ANALYZED, which is the source
> of your problem.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Rohit Banga [mailto:iamrohitbanga@gmail.com]
> > Sent: Tuesday, February 09, 2010 9:03 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene fields not analyzed
> >
> > let us assume this is the only field that is relevant (others are
> > stored and
> > not indexed).
> > i tried termquery and it does not work.
> > i also tried keyword analyzer and still could not make it work.
> >
> > @Mark
> > i cannot escape the spaces in my query as i am using Lucene to identify
> > occurences of names among other things in the unstructured sentence.
> > so while adding names to the index, i used keyword analyzer and changed
> > the
> > name to be added to the index to "Mr.\\ Kumar"
> > but still couldn't get it to work.
> >
> >
> >
> >
> >
> >
> > Rohit Banga
> >
> >
> > On Tue, Feb 9, 2010 at 1:06 PM, Mark Harwood
> > <markharw00d@yahoo.co.uk>wrote:
> >
> > > I suspect it is because QueryParser uses space characters to separate
> > > different clauses in a query string while you want the space to
> > represent
> > > some content in your "name" field. Try escaping the space character.
> > >
> > > Cheers
> > > Mark
> > >
> > >
> > >
> > > On 9 Feb 2010, at 07:26, Rohit Banga wrote:
> > >
> > > > Hello
> > > >
> > > > i have a field that stores names of people. i have used the
> > NOT_ANALYZED
> > > > parameter to index the names.
> > > >
> > > > this is what happens during indexing
> > > >
> > > >    doc.add(new Field("name", "\"" + name + "\"", Field.Store.YES,
> > > > Field.Index.NOT_ANALYZED));
> > > >
> > > >
> > > >
> > > > when i search it, i create a query parser using standardanalyzer
> > and
> > > append
> > > > ~0.5 to the search query.
> > > >
> > > > the problem is that if the indexed name is "Mr. Kumar", my search
> > does
> > > not
> > > > work for "Mr. Kumar" while it does work for "Mr.Kumar" (without the
> > > space).
> > > >
> > > > // searching code
> > > >        File index_directory = new File(INDEX_DIR_PATH);
> > > >        IndexReader reader =
> > > > IndexReader.open(FSDirectory.open(index_directory), true);
> > > >        Searcher searcher = new IndexSearcher(reader);
> > > >
> > > >        Analyzer analyzer = new
> > StandardAnalyzer(Version.LUCENE_CURRENT);
> > > >
> > > >        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT,
> > > "name",
> > > > analyzer);
> > > >
> > > >        Query query;
> > > >        query = parser.parse(text + "~0.5");
> > > >
> > > > how to make it work?
> > > >
> > > > Rohit Banga
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message