lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ridzwan Aminuddin <ridzwan.aminud...@gmail.com>
Subject Re: Help with phrase indexing
Date Mon, 18 May 2009 05:13:30 GMT
HI all, thanks for the responses thus far.

Another question linked to the first, do you guys know any good tutorials or
startpoint for me to understand how to go about designing my own customized
analyzer?

This would be of great help. Thanks in advance!

Regards,

Ridzwan

2009/5/14 Asbjørn A. <asbjorn@fellinghaug.com>

> Seid Mohammed:
> > I need this exactly solution.
> > Can you please tell me how could I DO IT?
> > I am badly in nead of it
>
> > > On Thu, May 14, 2009 at 5:58 AM, Ridzwan Aminuddin
> > > <ra@world-check.com>wrote:
> > >
> > >> Hi all,
> > >>
> > >> Is Lucene able to index phrases instead if individual terms? If it is,
> can
> > >> we also feed it a 'thesaurus or dictionary' of phrases that it should
> look
> > >> out for when indexing. Thanks in advance,
> > >>
> > >> Ridzwan
>
> Hi Seid.
>
> I constructed something like this in my master degree. What I bascially
> did was to write a custom analyzer. However I identified the top 100
> most searched phrases in my collection, and then filtered for those. If
> a document contained a identified phrase, then the analyzer would
> construct a Token for those terms.
>
> Another approach is to decide that only two-word phrases should be
> search for. And, for instance, find verbs and use the "word right after".
>
> Example string: "The boat was very fast".
> Tokens: "The boat", "was", "very fast".
>
> If your analyzer contain a range of phrases to search for, then it
> should be straightforward to identify those phrases and index them as
> Tokens. Just remember to set the start and end offset of the Token to
> correct values.
>
> You can have a look at this thesis here:
> http://asbjorn.fellinghaug.com/filer/master/Master_thesis.pdf
>
> And the java source code here:
> http://daim.idi.ntnu.no/show.php?type=vedlegg&id=3429
>
> Hope this helps.
>
> --
> Asbjørn A. Fellinghaug
> asbjorn@fellinghaug.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message