From: "Aviran Mordo"
To: "'Lucene Users List'"
Subject: RE: Wildcard workaround
Date: Wed, 28 May 2003 11:23:06 -0400

You can also index the file names with a leading character.
For instance, "file1.exe" would be indexed as "_file1.exe", and you always add the same leading character to the search term. So if the user input is "*.exe", your query should be "_*.exe", and if the user input is "fi*", you change it to "_fi*".

Aviran

-----Original Message-----
From: David Warnock [mailto:david@sundayta.com]
Sent: Wednesday, May 28, 2003 10:55 AM
To: Lucene Users List
Subject: Re: Wildcard workaround

Andrei,

> I have a file database indexed by content and also by filename. It
> would be nice if the user could perform a usual search like "*.ext".
>
> Anybody tried a workaround for this issue ? ( this is needed only for
> the name of the file, for the rest of the terms the rules are fine
> with me)

If the term begins with * then could you expand it into a set of 36 terms, e.g.

a*.ext
b*.ext
...
z*.ext
0*.ext

No idea how this would compare to the other alternatives for speed. But it would be simple to code and would not increase index size.

Of course, if filenames can use Unicode character sets then you have a problem. At that point you would need to check which characters actually occur as first characters, so you know which terms to use (i.e. only create a term for each character that is used as the first character of a filename).

HTH

Dave
--
David Warnock, Sundayta Ltd. http://www.sundayta.com
iDocSys for Document Management. VisibleResults for Fundraising.
Development and Hosting of Web Applications and Sites.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
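
Both workarounds in this thread come down to rewriting strings before they reach the index or the query parser. Below is a minimal sketch in plain Java with no Lucene calls. The class and method names are hypothetical; the '_' marker follows Aviran's example, and the a-z plus 0-9 alphabet is an assumption based on David's "36 terms" suggestion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Lucene.
class WildcardWorkaround {

    // Marker prepended to every indexed file name (assumed to be '_', as in the example above).
    static final char MARKER = '_';

    // Assumed alphabet for the expansion approach: 26 letters plus 10 digits = 36 terms.
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

    // Approach 1, index side: "file1.exe" is stored as "_file1.exe".
    static String toIndexedName(String fileName) {
        return MARKER + fileName;
    }

    // Approach 1, query side: "*.exe" becomes "_*.exe", so the pattern
    // no longer starts with a wildcard.
    static String toQueryPattern(String userPattern) {
        return MARKER + userPattern;
    }

    // Approach 2: leave the index alone and expand a leading '*' into one
    // pattern per possible first character, e.g. "*.ext" -> "a*.ext", "b*.ext", ...
    // Patterns without a leading '*' are returned unchanged.
    static List<String> expandLeadingStar(String userPattern) {
        List<String> patterns = new ArrayList<>();
        if (!userPattern.startsWith("*")) {
            patterns.add(userPattern);
            return patterns;
        }
        for (char first : ALPHABET.toCharArray()) {
            patterns.add(first + userPattern);
        }
        return patterns;
    }
}
```

With the first approach the marker has to be applied consistently at both index time and query time, and stripped again before a file name is shown to the user. With the second, the expanded patterns would be combined as alternatives in a single query; for Unicode file names the fixed alphabet would have to be replaced by the set of first characters that actually occur in the index.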