incubator-lucy-user mailing list archives

From Peter Karman <pe...@peknet.com>
Subject Re: [lucy-user] Can lucy do substring search?
Date Thu, 02 Feb 2012 14:26:57 GMT
On 2/2/12 7:40 AM, Desilets, Alain wrote:
> Thx Peter. In my case, the fields on which I need to do wildcard searches are fields
> that specify the URL of a document. I want to be able to use this to limit the search
> to documents that are on specific web sites.
>
> It seems the best balance between accuracy and speed in that case would be to tokenize
> on non-word characters. Then I could retrieve a superset of the docs on, say,
> www.somewhere.org by searching for "www.somewhere.org" (with a QueryParser). This might
> accidentally retrieve docs whose URLs contain www/somewhere/org (for example), but I would
> do a second pass to filter out the docs whose URLs do not match the actual expression
> www.somewhere.org. I would need to do this second pass anyway, even if I were using a
> wildcard search, because I might accidentally match a URL that has www.somewhere.org
> in a part other than the host name (e.g. http://www.aplace.com/www.somewhere.org.html).
>

Why not pull the hostname out at indexing time into its own field and search
that field? Then your particular use case should get no false positives.
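A minimal sketch of why a separate hostname field avoids the false positives, written in Python rather than against Lucy's Perl API. It only simulates the two approaches over three made-up URLs: the function names (`url_tokens`, `host_field`, `phrase_match`) are hypothetical, and splitting on `\w+` is an assumption that roughly approximates a Lucy RegexTokenizer-based analyzer. In a real Lucy schema the host field would presumably be indexed untokenized (Lucy::Plan::StringType is intended for that kind of field), so a query matches only the whole hostname.

```python
import re
from urllib.parse import urlsplit


def url_tokens(url):
    """Split a URL on runs of non-word characters (assumed \\w+ tokenization)."""
    return re.findall(r"\w+", url.lower())


def phrase_match(tokens, phrase):
    """True if `phrase` occurs as a contiguous run in `tokens` (like a phrase query)."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def host_field(url):
    """Hypothetical value for a separate 'host' field, extracted at indexing time."""
    return urlsplit(url).hostname or ""


urls = [
    "http://www.somewhere.org/index.html",
    "http://www.aplace.com/www.somewhere.org.html",
    "http://www.aplace.com/www/somewhere/org",
]
target = "www.somewhere.org"

# Approach 1: tokenized URL field plus phrase search returns a superset...
candidates = [u for u in urls if phrase_match(url_tokens(u), url_tokens(target))]
# ...and a substring second pass still keeps the URL with the target in its path.
second_pass = [u for u in candidates if target in u]

# Approach 2: exact match on the extracted hostname field.
host_hits = [u for u in urls if host_field(u) == target]

print(candidates)   # all three URLs
print(second_pass)  # first two URLs
print(host_hits)    # only the first URL
```

The tokenized field matches all three URLs, and the substring filter still lets http://www.aplace.com/www.somewhere.org.html through; only the hostname comparison isolates the document actually hosted on www.somewhere.org.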


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com
