incubator-lucy-user mailing list archives

From arjan <ar...@unitedknowledge.nl>
Subject Re: [lucy-user] index and search words separated by hyphens
Date Wed, 06 Jul 2011 23:35:30 GMT
Dear Marvin,

I am sorry: this is my mistake. While answering your questions below, I found
that I made an error in the test code I wrote to pin down my problem. You asked
how I build the query, and this made me have another look at how I
instantiate the QueryParser object. I selected the wrong fields. I may
have done something similar in the real code. I will check this
tomorrow. Sorry to have bothered you.

Thanks for asking the right questions and telling me how it should work.

Kind regards,
Arjan.

>> The reason for my question is this. If I index a single document of a single
>> line that contains words separated by hyphens, I can retrieve that
>> document by any other word, but not by the words separated by hyphens, nor
>> by the whole phrase including the hyphens.
>>
>> For example I index a single document with only this sentence
>>
>> "please subscribe to this mailing-list"
>>
>> I can retrieve this document by searching for "please" or "subscribe" or
>> "please subscribe", but not by searching for "mailing-list" or "mailing"
>> or "list".
> I'm confused -- there seems to be a contradiction between your ability to
> retrieve the document "by any word", and your inability to retrieve the
> document by searching for "mailing" or "list".
>
> Can you please clarify what you get when you search for "mailing"?
I can retrieve the document by "please", "subscribe", "to" and "this" 
but not by "mailing", "list" or "mailing-list". So if I search for 
mailing, I get zero hits.
>> It seems that the words "mailing" and "list" are treated as separate
>> words, since both "mailing" and "list" can be found in the lexicon.
> They're in the lexicon?  Do you mean that you've gone all the way down into
> Lucy::Index::Lexicon, or something else?
Yes, like so:
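use feature 'say';
use Encode qw( encode );
use Lucy::Index::IndexReader;

# Open the index; $env->message_storage is the index location in my application.
my $polyreader = Lucy::Index::IndexReader->open(
    index => $env->message_storage,
);
my $seg_readers = $polyreader->seg_readers;

# Walk every segment and print each term in the lexicon of the 'title' field.
foreach my $seg_reader ( @$seg_readers ) {
    say "segment: $seg_reader";
    my $lex_reader = $seg_reader->obtain( "Lucy::Index::LexiconReader" );
    my $lexicon    = $lex_reader->lexicon( field => 'title' );
    next unless $lexicon;    # undef if the field has no terms in this segment

    while ( $lexicon->next ) {
        say encode( 'utf8', $lexicon->get_term );
    }
}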
>> Any help would be appreciated, or is this a bug?
> How are you building/executing the query?
Ohhhhh....
> What does the FieldType assigned to the field in question look like?
>
> For common Analyzer configurations, Lucy's QueryParser is supposed to parse
> hyphenated constructs as phrases -- so these should all produce the same
> results:
>
>      "mailing list"
>      "mailing-list"
>      mailing-list
>
> Similarly, these should all produce the same results:
>
>      "please subscribe"
>      "please-subscribe"
>      please-subscribe
>
> It might be interesting to know whether those work as expected.
>
> Best,
>
> Marvin Humphrey
>
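
For anyone who finds this thread later, here is a minimal sketch of the check Marvin suggests, written under assumptions rather than taken from my real code: an in-memory index (RAMFolder), a PolyAnalyzer for English, a 'title' field holding the test sentence, a hypothetical second field 'body', and a hypothetical helper count_hits. It runs the same query strings against a QueryParser limited to the right field and to the wrong one, which is the mistake I made.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Analysis::PolyAnalyzer;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Search::QueryParser;

# Assumption: both fields share one full-text type with an English PolyAnalyzer.
my $type = Lucy::Plan::FullTextType->new(
    analyzer => Lucy::Analysis::PolyAnalyzer->new( language => 'en' ),
);
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'title', type => $type );
$schema->spec_field( name => 'body',  type => $type );    # hypothetical second field

# Build a throwaway in-memory index containing the single test document.
my $folder  = Lucy::Store::RAMFolder->new;
my $indexer = Lucy::Index::Indexer->new(
    index  => $folder,
    schema => $schema,
    create => 1,
);
$indexer->add_doc({
    title => 'please subscribe to this mailing-list',
    body  => 'unrelated text',
});
$indexer->commit;

my $searcher = Lucy::Search::IndexSearcher->new( index => $folder );

# Hypothetical helper: parse $query_string against only the given fields
# and return the number of matching documents.
sub count_hits {
    my ( $fields, $query_string ) = @_;
    my $qparser = Lucy::Search::QueryParser->new(
        schema => $schema,
        fields => $fields,
    );
    my $hits = $searcher->hits( query => $qparser->parse($query_string) );
    return $hits->total_hits;
}

# Compare the right field ('title') with the wrong one ('body').
for my $q ( 'mailing', 'list', 'mailing-list', '"mailing list"', '"mailing-list"' ) {
    printf "%-16s title=%d body=%d\n",
        $q, count_hits( ['title'], $q ), count_hits( ['body'], $q );
}
```

If the terms show up in the lexicon but a search returns nothing, comparing the two columns is a quick way to see whether the QueryParser is looking at the field you think it is.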


-- 
Recent: http://www.lomcongres.nl/
Conference and newsletter portal with a relationship management system for the Landelijk Overleg Milieuhandhaving

Setting Standards, a Delft University of Technology and United Knowledge simulation exercise
on strategy and cooperation in standardization, http://www.setting-standards.com

United Knowledge, internet for the public sector
Keizersgracht 74
1015 CT Amsterdam
T +31 (0)20 52 18 300
F +31 (0)20 52 18 301
bureau@unitedknowledge.nl
http://www.unitedknowledge.nl

M +31 (0)6 2427 1444
E arjan@unitedknowledge.nl

Visit our site at:
http://www.unitedknowledge.nl

Or have a look at one of our projects:
http://www.handhavingsportaal.nl/
http://www.setting-standards.com/
http://www.lomcongres.nl/
http://www.clubvanmaarssen.org/



