lucy-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-user] index and search words separated by hyphens
Date Wed, 06 Jul 2011 22:31:33 GMT
On Wed, Jul 06, 2011 at 10:45:30PM +0200, arjan wrote:
> Does anyone know how to retrieve a document by words in the document  
> that are separated by hyphens?

This shouldn't be a problem. :)

> The reason for my question is this. If I index a single document of a single
> line that contains words separated by hyphens, I can retrieve that
> document by any word, but not by the words separated by hyphens nor by the
> whole phrase including the hyphens.
>
> For example I index a single document with only this sentence
>
> "please subscribe to this mailing-list"
>
> I can retrieve this document by searching for "please" or "subscribe" or  
> "please subscribe", but not by searching for "mailing-list" or "mailing"  
> or "list".

I'm confused -- there seems to be a contradiction between your ability to
retrieve the document "by any word", and your inability to retrieve the
document by searching for "mailing" or "list".

Can you please clarify what you get when you search for "mailing"?

> It seems that the words "mailing" and "list" are treated as separate  
> words, since both "mailing" and "list" can be found in the lexicon.  

They're in the lexicon?  Do you mean that you've gone all the way down into
Lucy::Index::Lexicon, or something else?

> Any help would be appreciated, or is this a bug?

How are you building/executing the query?

What does the FieldType assigned to the field in question look like?

For common Analyzer configurations, Lucy's QueryParser is supposed to parse
hyphenated constructs as phrases -- so these should all produce the same
results:

    "mailing list"
    "mailing-list"
    mailing-list

Similarly, these should all produce the same results:

    "please subscribe"
    "please-subscribe"
    please-subscribe

It might be interesting to know whether those work as expected.
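
To illustrate why those forms ought to be equivalent, here is a rough Python
sketch. It is not Lucy's code or API: the token pattern, `tokenize`, and
`matches_phrase` are all assumptions made up for the illustration. It only
shows that a tokenizer which splits on non-word characters reduces all three
query forms to the same token sequence, which can then be matched as a phrase.

```python
import re

# Assumed word pattern (hypothetical, for illustration only): runs of word
# characters, optionally joined by apostrophes. Hyphens and quotes are not
# word characters, so they act as separators.
TOKEN_RE = re.compile(r"\w+(?:'\w+)*")


def tokenize(text):
    """Split text into lowercase tokens using the assumed pattern."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def matches_phrase(doc, query):
    """Return True if the query's tokens occur consecutively in the doc."""
    doc_tokens = tokenize(doc)
    query_tokens = tokenize(query)
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


DOC = "please subscribe to this mailing-list"

for query in ['"mailing list"', '"mailing-list"', 'mailing-list']:
    # Each form tokenizes to the same sequence and matches the document.
    print(query, tokenize(query), matches_phrase(DOC, query))
```

If the real index and query parser behave the way this sketch does, a failure
to match "mailing-list" would point at the Analyzer or FieldType
configuration rather than at the hyphen itself.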

Best,

Marvin Humphrey

