From: Volodymyr Bychkoviak
Date: Thu, 23 Jun 2005 13:27:12 +0300
To: java-user@lucene.apache.org
Subject: Re: Question for Wildcard Search:

Hello,

About three months ago I posted an idea about wildcard searching. The main
idea was to index every character of the input as a separate term and then
search using PhraseQuery. For example, the word "12345" would be indexed as
"1" "2" "3" "4" "5". To find "*23*" you can use a PhraseQuery with the two
terms "2" "3".

But this approach is limited to queries with wildcards only at the beginning
or end.

Later I did some research and wrote an extension to PhraseQuery that allows
each term's relative position to be set to a range of values (to insert gaps
for "*" and "?"). This approach is good because it does not rewrite queries,
so it never runs into OutOfMemory or TooManyClauses exceptions.

Regards,
Volodymyr Bychkoviak

14.03.2005 13:54

Dave Kor wrote:

>Quoting Dave Kor:
>
>>Quoting Erik Hatcher:
>>
>>>Anyone tried this technique with Lucene?
>>
>>Actually, the problem is that the wildcard code has to search over a large
>>subset of terms because the list of terms is, well, a linear structure.
>>
>>If, for example, all terms in the index is arranged as a suffix tree, the
>>sort of wildcard search that currently is cpu intensive will no longer be
>>cpu intensive.
>
>Hmm I realized I should add a qualifier to the above statement. Searching for
>matching terms would no longer be cpu intensive, especially for wildcards like
>*foo* or *foo. The other wildcard search problem of having too many matching
>terms to lookup in the index still remains unsolved.
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
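
As a rough illustration of the character-per-term idea in the message above, here is a minimal sketch in plain Java. It does not use Lucene at all: the class CharPositionIndex and its methods are hypothetical names invented for this example, and the position check only imitates what a phrase match over single-character terms would do. A "?" in the pattern is treated as a gap of exactly one position, which is probably the simplest case of the gap extension mentioned; a "*" in the middle of a pattern would need a range of allowed offsets and is not shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy positional index (not Lucene): every character of a word is its own term. */
class CharPositionIndex {
    // term (one character) -> word id -> positions of that character in the word
    private final Map<Character, Map<Integer, List<Integer>>> postings = new HashMap<>();
    private final List<String> words = new ArrayList<>();

    /** Index a word: "12345" becomes terms "1" "2" "3" "4" "5" at positions 0..4. */
    void add(String word) {
        int id = words.size();
        words.add(word);
        for (int pos = 0; pos < word.length(); pos++) {
            postings.computeIfAbsent(word.charAt(pos), c -> new HashMap<>())
                    .computeIfAbsent(id, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Words matching "*pattern*"; a '?' in the pattern is a one-position gap. */
    List<String> containing(String pattern) {
        List<String> hits = new ArrayList<>();
        for (int id = 0; id < words.size(); id++) {
            if (matches(id, pattern)) {
                hits.add(words.get(id));
            }
        }
        return hits;
    }

    // Phrase-style check: try each start offset and require every non-gap
    // character of the pattern to occur at exactly start + k in the word.
    private boolean matches(int id, String pattern) {
        int wordLength = words.get(id).length();
        for (int start = 0; start + pattern.length() <= wordLength; start++) {
            boolean ok = true;
            for (int k = 0; k < pattern.length() && ok; k++) {
                char c = pattern.charAt(k);
                ok = c == '?' || positions(c, id).contains(start + k);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    // Positions of one character term inside one word (empty if absent).
    private List<Integer> positions(char term, int id) {
        return postings.getOrDefault(term, Map.of()).getOrDefault(id, List.of());
    }

    public static void main(String[] args) {
        CharPositionIndex index = new CharPositionIndex();
        index.add("12345");
        index.add("54321");
        index.add("92399");
        System.out.println(index.containing("23"));  // words matching *23*
        System.out.println(index.containing("2?4")); // words matching *2?4*
    }
}
```

The point of the sketch is that the query is answered from term positions alone, without first expanding the wildcard into a list of matching terms, which is why this style of search would not be expected to hit a clause limit.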