lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject RE: Lucene indexes
Date Tue, 24 Feb 2009 22:35:43 GMT

: The problem that I am trying to solve is : How to index phrases (rather 
: than phrase querying)? I have a Questions/Answers corpus, the 
: architecture I am using for IR creates one index for questions and 
: another one for answers (based on single terms) and then matches between 
: them. I want to index phrases in addition to single terms (for both 
: questions and answers) and then make a search for all terms and phrases 
: in the questions index.

can you elaborate a little on what you mean by "index phrases" ... 
specifically, what is it you want to be able to do that you don't think 
you can do with a PhraseQuery?

my best guess, reading between the lines, is that you want to discover
documents in your "answers" index that might correlate to documents in 
your "questions" index based on a high overlap of phrases -- i'm also 
guessing (reading between the lines) that you realize you can use things 
like TermEnum and TermDocs to find terms in common between both indexes, 
and which documents contain those terms.
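
to make the "terms in common" idea concrete, here is a rough sketch in plain java. it does not touch the Lucene API at all: the two maps stand in for what you would collect by walking each index's terms and the documents that contain them, and the class name, method name and sample data are all made up for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class TermOverlapSketch {
    // For each (question doc, answer doc) pair, count how many terms the two share.
    // Each input map goes from a term to the ids of the documents containing it.
    // ASSUMPTION: these maps are hypothetical stand-ins for data read from the two indexes.
    static Map<String, Integer> overlap(Map<String, Set<Integer>> questions,
                                        Map<String, Set<Integer>> answers) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : questions.entrySet()) {
            Set<Integer> answerDocs = answers.get(e.getKey());
            if (answerDocs == null) {
                continue; // term only occurs on the questions side
            }
            for (int q : e.getValue()) {
                for (int a : answerDocs) {
                    // key format "q<id>-a<id>" is just a convenient label for the pair
                    counts.merge("q" + q + "-a" + a, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> questions = Map.of(
            "index phrases", Set.of(0),
            "phrases", Set.of(0, 1));
        Map<String, Set<Integer>> answers = Map.of(
            "index phrases", Set.of(7),
            "phrases", Set.of(7),
            "tokenization", Set.of(8));
        System.out.println(overlap(questions, answers)); // prints {q0-a7=2, q1-a7=1}
    }
}
```

pairs with a higher count share more terms, which is one crude way to rank candidate question/answer matches.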

if my guesses are correct, indexing using ShingleFilter might be of use to 
you -- Shingling is (lucene specific?) vernacular for word based ngrams, 
and by indexing in this way you can get "terms" consisting of multiple 
successive "words" when indexing, and then match things up that way.
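
for anyone unsure what shingles look like, here is a small sketch in plain java. it is not ShingleFilter itself, only an illustration of the kind of terms word based ngrams give you; the class and method names are hypothetical, and it assumes simple whitespace tokenization and lowercasing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ShingleSketch {
    // Emits every run of minSize..maxSize successive words as a single "term".
    // ASSUMPTION: whitespace splitting and lowercasing stand in for a real analyzer.
    static List<String> shingles(String text, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        for (int start = 0; start < words.length; start++) {
            for (int size = minSize; size <= maxSize && start + size <= words.length; size++) {
                String[] run = Arrays.copyOfRange(words, start, start + size);
                out.add(String.join(" ", run));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // single words plus two-word shingles
        System.out.println(shingles("How to index phrases", 1, 2));
        // prints [how, how to, to, to index, index, index phrases, phrases]
    }
}
```

with minSize 1 you keep the single terms alongside the multi-word ones, which matches the "phrases in addition to single terms" requirement in the original question.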

as someone else mentioned, you can also use other custom Tokenization if 
you have a better definition of a "phrase" than just a sequence of 
successive words (ie: index whole sentences as a single term, etc...)

-Hoss

