lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anshum <ansh...@gmail.com>
Subject Re: Help with phrase indexing
Date Mon, 18 May 2009 06:22:24 GMT
Hi,
Actually I'd really suggest you to 'buy' a copy of Lucene In Action - 2nd
Edition. Its currently available as MEAP and its amazing. Perhaps the prices
are also down 40% or something, though 'm not really sure about it.

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............


On Mon, May 18, 2009 at 10:43 AM, Ridzwan Aminuddin <
ridzwan.aminuddin@gmail.com> wrote:

> HI all, thanks for the responses thus far.
>
> Another question linked to the first, do you guys know any good tutorials
> or
> startpoint for me to understand how to go about designing my own customized
> analyzer?
>
> This would be of great help. Thanks in advance!
>
> Regards,
>
> Ridzwan
>
> 2009/5/14 Asbjørn A. <asbjorn@fellinghaug.com>
>
> > Seid Mohammed:
> > > I need this exactly solution.
> > > Can you please tell me how could I DO IT?
> > > I am badly in nead of it
> >
> > > > On Thu, May 14, 2009 at 5:58 AM, Ridzwan Aminuddin
> > > > <ra@world-check.com>wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Is Lucene able to index phrases instead if individual terms? If it
> is,
> > can
> > > >> we also feed it a 'thesaurus or dictionary' of phrases that it
> should
> > look
> > > >> out for when indexing. Thanks in advance,
> > > >>
> > > >> Ridzwan
> >
> > Hi Seid.
> >
> > I constructed something like this in my master degree. What I bascially
> > did was to write a custom analyzer. However I identified the top 100
> > most searched phrases in my collection, and then filtered for those. If
> > a document contained a identified phrase, then the analyzer would
> > construct a Token for those terms.
> >
> > Another approach is to decide that only two-word phrases should be
> > search for. And, for instance, find verbs and use the "word right after".
> >
> > Example string: "The boat was very fast".
> > Tokens: "The boat", "was", "very fast".
> >
> > If your analyzer contain a range of phrases to search for, then it
> > should be straightforward to identify those phrases and index them as
> > Tokens. Just remember to set the start and end offset of the Token to
> > correct values.
> >
> > You can have a look at this thesis here:
> > http://asbjorn.fellinghaug.com/filer/master/Master_thesis.pdf
> >
> > And the java source code here:
> > http://daim.idi.ntnu.no/show.php?type=vedlegg&id=3429
> >
> > Hope this helps.
> >
> > --
> > Asbjørn A. Fellinghaug
> > asbjorn@fellinghaug.com
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message