incubator-lucy-user mailing list archives

From "Desilets, Alain" <Alain.Desil...@nrc-cnrc.gc.ca>
Subject [lucy-user] Can lucy do substring search?
Date Tue, 31 Jan 2012 18:19:38 GMT
I have a Lucy index with one field called URL.

I would like to do substring searches on this field, for example, find all the records whose
URL includes http://www.somewhere.com/abc/ (i.e. all the URLs that are part of the abc directory
on that site).

Is there a way to do this?

I guess I could always treat the field as a tokenized string:


---
my $string_tokenizer = Lucy::Analysis::RegexTokenizer->new( pattern => '\w+');
my $analyzer = Lucy::Analysis::PolyAnalyzer->new( analyzers => [$string_tokenizer]);
---

But then I would probably have to do some post-search processing to make sure that the URLs
of the retrieved records actually DO fit the pattern, and that there are no differences in
the non-word characters that were stripped out by the indexer.
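
A minimal sketch of that post-search check, in plain Perl with no Lucy calls. The helper name filter_by_substring and the candidate URLs are made up for illustration; the candidates stand in for whatever a tokenized query might return, including false positives.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the URLs that literally contain $needle.
# index() does a plain substring test, so the punctuation must match too.
sub filter_by_substring {
    my ($needle, @urls) = @_;
    return grep { index($_, $needle) >= 0 } @urls;
}

# Made-up candidates standing in for the hits of a tokenized search.
my @candidates = (
    'http://www.somewhere.com/abc/page1.html',
    'http://www.somewhere.com/abc-old/page2.html',
    'http://www.somewhere.com/xyz/abc/page3.html',
);

my @confirmed = filter_by_substring('http://www.somewhere.com/abc/', @candidates);
print "$_\n" for @confirmed;
```

The second and third candidates contain the same word tokens as the query but differ in punctuation or token order, so only the literal comparison rejects them.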

I was wondering if there was a way to tokenize the string into individual characters instead,
and whether that is advisable from a performance point of view.
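
To make the idea concrete, here is a small sketch in plain Perl of what a one-character pattern would produce. It only demonstrates the regex. Passing the same pattern to Lucy::Analysis::RegexTokenizer is an assumption and is not tested here, and the short URL is a made-up example.

```perl
use strict;
use warnings;

# Pattern matching exactly one character. Assumed (not verified here) to be
# usable as the `pattern` argument of Lucy::Analysis::RegexTokenizer.
my $char_pattern = '.';

# Made-up short URL for illustration.
my $url = 'http://a.com/abc/';

# A global match in list context returns every match, i.e. one token per character.
my @tokens = $url =~ /$char_pattern/g;

printf "%d tokens: %s\n", scalar(@tokens), join(' ', @tokens);
```

With this scheme every character of every URL becomes its own token, which is the source of the performance concern.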

Thx.

Alain Désilets
Agent de recherche | Research Officer
Institut de technologie de l'information | Institute for Information Technology
Conseil national de recherches du Canada | National Research Council of Canada

