lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ghinwa Choueiter" <ghi...@csail.mit.edu>
Subject How to index word-pairs and phrases
Date Tue, 19 Feb 2008 00:36:21 GMT
Hi,

I am new to Lucene and have been reading the documentation. I would like to use Lucene to
query a song database by lyrics. The query could potentially contain typos, or even wrong
words, word contractions (can't versus cannot), etc..

I would like to create an inverted list by word pairs and possibly phrases and not just by
isolated words. For example:
<w1,w2>   < d1, d10, d27>
<w2,w3>   <d2, d13>
...

OR even
<phrase 1> <d1, d3,...>
<phrase 2> <...>
...

It seems to me that, by default, the index in Lucene stores statistics for isolated words.
The Lucene documentation refers to the word "Term" all the time and seems to imply that "Term"
can be a word or a phrase, but I can't see how IndexWriter can read a document and index it
by word pairs. 

thank you in advance for the answers and my apologies if I did not get the terminology quite
right.

-Ghinwa

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message