lucene-java-user mailing list archives

From "Aviran Mordo" <>
Subject RE: Wildcard workaround
Date Wed, 28 May 2003 15:23:06 GMT
You can also index the file names with a leading character. For instance,
"file1.exe" would be indexed as "_file1.exe", and you always add the same
leading character to the search term.
So if the user enters "*.exe", your query should be "_*.exe", and if the
user enters "fi*", you change it to "_fi*".
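
A minimal sketch of that idea in plain Java. The class and method names are hypothetical and not part of Lucene, and it assumes the file-name field is indexed as a single term, so the analyzer neither strips nor splits on the marker character:

```java
// Hypothetical helper; the names here are illustrative, not Lucene API.
final class LeadingMarker {
    // Marker prepended to every file name; '_' follows the example above.
    static final char MARKER = '_';

    // Value to store in the file-name field at index time.
    static String toIndexedName(String fileName) {
        return MARKER + fileName;
    }

    // Term to search for: the user's pattern with the same marker in front,
    // so the term never starts with a wildcard character.
    static String toQueryTerm(String userPattern) {
        return MARKER + userPattern;
    }
}
```

The same marker has to be applied on both sides: `toIndexedName` when the document is added and `toQueryTerm` before the pattern is turned into a wildcard query.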


-----Original Message-----
From: David Warnock [] 
Sent: Wednesday, May 28, 2003 10:55 AM
To: Lucene Users List
Subject: Re: Wildcard workaround


> I have a file database indexed by content and also by filename. It 
> would be nice if the user could perform a usual search like "*.ext".
> Has anybody tried a workaround for this issue? (This is needed only for
> the name of the file; for the rest of the terms the rules are fine
> with me.)

If the term begins with *, could you expand it into a set of 36
terms, e.g. a*.ext, b*.ext, ... z*.ext, 0*.ext, ... 9*.ext?

No idea how this would compare to the other alternatives for speed. But 
it would be simple to code and would not increase index size.
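
A rough sketch of the expansion in plain Java. The class name is hypothetical, and it assumes file names are lowercased at index time and start with a letter a-z or a digit 0-9; names starting with any other character would be missed:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Lucene.
final class LeadingWildcardExpander {
    // Assumed set of possible first characters: 26 letters + 10 digits = 36.
    static final String FIRST_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789";

    // Turns "*.ext" into a*.ext, b*.ext, ... z*.ext, 0*.ext, ... 9*.ext.
    // Patterns that do not start with '*' are returned unchanged.
    static List<String> expand(String userPattern) {
        List<String> terms = new ArrayList<>();
        if (!userPattern.startsWith("*")) {
            terms.add(userPattern);
            return terms;
        }
        for (int i = 0; i < FIRST_CHARS.length(); i++) {
            // Keep the '*' so the rest of the name can still be anything.
            terms.add(FIRST_CHARS.charAt(i) + userPattern);
        }
        return terms;
    }
}
```

The resulting terms would then be combined as alternatives (OR) in a single query.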

Of course, if filenames can use Unicode characters then you have a
problem. At that point you would need to check which characters actually
occur as the first character of a filename, and create a term only for
each of those.
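
A sketch of that variant in plain Java, again with hypothetical names. It assumes the list of file names is available as a collection; in a real index they would presumably be read back from the file-name field instead:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not part of Lucene.
final class ObservedFirstChars {
    // Collects the distinct first code points of the given file names.
    // Code points rather than chars, so characters outside the BMP stay whole.
    static Set<Integer> firstCodePoints(Iterable<String> fileNames) {
        Set<Integer> seen = new TreeSet<>();
        for (String name : fileNames) {
            if (!name.isEmpty()) {
                seen.add(name.codePointAt(0));
            }
        }
        return seen;
    }

    // Expands a pattern starting with '*' into one term per observed first
    // character; other patterns are returned unchanged.
    static List<String> expand(String userPattern, Set<Integer> firstCodePoints) {
        List<String> terms = new ArrayList<>();
        if (!userPattern.startsWith("*")) {
            terms.add(userPattern);
            return terms;
        }
        for (int codePoint : firstCodePoints) {
            terms.add(new String(Character.toChars(codePoint)) + userPattern);
        }
        return terms;
    }
}
```

The set of first characters would need to be refreshed whenever files are added to the index, otherwise new names can be missed.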


David Warnock, Sundayta Ltd.
iDocSys for Document Management. VisibleResults for Fundraising.
Development and Hosting of Web Applications and Sites.

To unsubscribe, e-mail:
For additional commands, e-mail:
