incubator-lucy-user mailing list archives

From "Desilets, Alain" <>
Subject [lucy-user] Can lucy do substring search?
Date Tue, 31 Jan 2012 18:19:38 GMT
I have a Lucy index with one field called URL.

I would like to do substring searches on this field, for example, find all the records whose
URL includes (i.e. all the URLs which are part of the abc directory
on that site).

Is there a way to do this?

I guess I could always treat the field as a tokenized string:

my $string_tokenizer = Lucy::Analysis::RegexTokenizer->new( pattern => '\w+');
my $analyzer = Lucy::Analysis::PolyAnalyzer->new( analyzers => [$string_tokenizer]);

But then I would probably have to do some post-search processing to make sure that the URLs
of the retrieved records actually DO fit the pattern, and that there are no differences in
the non-word characters that were stripped out by the indexer.
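
As a rough illustration of that post-search check (a sketch only: the helper names and the sample URLs are hypothetical, and a plain Perl regex stands in for what a '\w+' tokenizer pattern would extract):

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the URLs that literally contain $substring.
sub filter_by_substring {
    my ($substring, @urls) = @_;
    return grep { index($_, $substring) >= 0 } @urls;
}

# What a '\w+' pattern extracts from a string (plain Perl, no Lucy involved).
# The non-word characters between the tokens are dropped.
sub word_tokens {
    my ($text) = @_;
    return $text =~ /\w+/g;
}

# Made-up candidate records, as they might come back from a token-based search.
my @candidates = (
    'http://example.com/abc/page.html',
    'http://example.com/abc-page.html',
);

# Both URLs yield the same word tokens, so the tokens alone cannot
# distinguish "/abc/" from "/abc-".
print join(' ', word_tokens($_)), "\n" for @candidates;

# The literal substring check keeps only the URL inside the abc directory.
print "$_\n" for filter_by_substring('example.com/abc/', @candidates);
```

The point of the sketch is that the second URL has the same word tokens as the first but is not inside the abc directory, which is why a literal check on the retrieved URLs would still be needed.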

I was wondering if there was a way to tokenize the string into individual characters instead,
and whether that is advisable from a performance point of view.
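
To make the idea concrete, here is a minimal sketch of what one token per character would look like. It assumes that this would correspond to giving the tokenizer a pattern of '.'; that assumption is untested against Lucy itself, and the helper name and sample string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into one token per character,
# which is what a regex pattern of '.' matches when applied globally.
sub char_tokens {
    my ($text) = @_;
    return $text =~ /./gs;
}

# Made-up sample string; every character, including '.' and '/', is a token.
my @tokens = char_tokens('a.com/abc/');
print scalar(@tokens), " tokens: @tokens\n";
```

With this scheme the punctuation is preserved, but every indexed URL produces as many tokens as it has characters, which is where the performance question comes from.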


Alain Désilets
Agent de recherche | Research Officer
Institut de technologie de l'information | Institute for Information Technology
Conseil national de recherches du Canada | National Research Council of Canada
