Message-ID: <45F6EDC8.10703@aksis.uib.no>
Date: Tue, 13 Mar 2007 19:30:32 +0100
From: Oystein Reigem
To: java-user@lucene.apache.org
Subject: Wildcard searches with * or ? as the first character

Hi,

I have read that with Lucene it is not possible to do wildcard searches with * or ? as the first character.

Wildcard searches with * as the first character (or both first and last character) are useful for text in languages that have a lot of compound words, like German and the Scandinavian languages.

Some systems do offer such searches, but at a penalty. I assume such systems sometimes do a sequential search of the text, which is slow, and sometimes a sequential search of an index, which might be a bit faster, but still quite slow.

But a slow search might be better than no search, as long as the user is aware of the consequences of doing wildcard searches starting with a wildcard character.

Any comments?

Cheers,

- Øystein -

-- 
Øystein Reigem,
The department of culture, language and information technology (Aksis),
Allegt 27, N-5007 Bergen, Norway.
Tel: +47 55 58 32 42. Fax: +47 55 58 94 70. E-mail: .
Home tel: +47 56 14 06 11. Mobile: +47 97 16 96 64. Home e-mail: .
Aksis home page: .
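
The cost difference described in this message can be sketched roughly as follows. This is an illustration in Python, not Lucene code. It assumes the index keeps its terms in sorted order; the term list and the functions `prefix_search` and `scan_search` are hypothetical names made up for the example. A trailing wildcard can jump straight to the matching range, while a leading wildcard has to test every term.

```python
import bisect
from fnmatch import fnmatchcase

# Hypothetical term dictionary (Norwegian compounds of "vei", road),
# kept sorted as an inverted index is assumed to keep it.
TERMS = sorted(
    ["bil", "bilvei", "fjellvei", "motorvei", "vei", "veiarbeid", "veikart"]
)


def prefix_search(terms, prefix):
    """Trailing wildcard (prefix*): binary-search to the first candidate,
    then walk forward until the prefix no longer matches.

    Returns (matches, number of terms examined).
    """
    start = bisect.bisect_left(terms, prefix)
    matches = []
    examined = 0
    for term in terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches, examined


def scan_search(terms, pattern):
    """Leading wildcard (*suffix or *infix*): sorted order does not help,
    so every term in the dictionary is tested against the pattern.

    Returns (matches, number of terms examined).
    """
    matches = [t for t in terms if fnmatchcase(t, pattern)]
    return matches, len(terms)
```

With this toy list, `prefix_search(TERMS, "vei")` examines only the three terms that start with `vei`, whereas `scan_search(TERMS, "*vei")` examines all seven terms to find the four that end in `vei`. On a real index the scan cost grows with the number of distinct terms, which is the penalty the message refers to.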