From: Justin Swanhart
To: Lucene Users List
Date: Thu, 4 Nov 2004 10:52:21 -0700
Subject: prefix wildcard matching options (*blah)

I'm thinking about adding a separate field to my index for prefix wildcard searches (terms with a leading wildcard, such as *blah). At index time I would chop off x characters from the front of each term to create "subtokens" for the prefix matches.

For the term:
    republican

the terms created would be:
    republican
    epublican
    publican
    ublican
    blican

My query parser would then check whether a term has a wildcard as its first character. If so, instead of searching the normal field, it would remove the wildcard from the start of the term and search the prefix field instead.

A search for "*pub*" would be converted to "pub*" in the prefix field.
A search for "*blican" would be converted to "blican".

Does this sound like an intelligent way to get fast prefix wildcard querying?

Can I index the prefix field with a separate analyzer that generates the prefix tokens, or should I just do the index-time expansion manually? I wouldn't need to search with this analyzer, only index with it, because searching doesn't have to expand all those terms.

If using a separate analyzer for the prefix field makes more sense, how do I write a tokenizer that returns multiple tokens for one word?
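
To make the proposal concrete, here is a minimal sketch in plain Java of the two steps described: the index-time expansion of a term into its subtokens, and the query-time rewrite of a leading-wildcard term. It deliberately uses no Lucene classes. The class name, the field names "contents" and "contents_suffix", and the minLength cutoff are all assumptions made for illustration; the post only says "x characters" are chopped, and a cutoff of 6 happens to reproduce the "republican" example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class LeadingWildcardSketch {

    // Assumed field names; the post only speaks of "the normal field"
    // and "the prefix field".
    static final String NORMAL_FIELD = "contents";
    static final String SUFFIX_FIELD = "contents_suffix";

    // Index time: return the term itself plus every tail of it that is
    // still at least minLength characters long.
    static List<String> suffixTerms(String term, int minLength) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < term.length(); start++) {
            // The full term (start == 0) is always kept, even if short.
            if (start > 0 && term.length() - start < minLength) {
                break;
            }
            out.add(term.substring(start));
        }
        return out;
    }

    // Query time: if the term starts with '*', drop that wildcard and
    // target the subtoken field; otherwise leave the term on the normal field.
    static String rewrite(String queryTerm) {
        if (queryTerm.length() > 1 && queryTerm.startsWith("*")) {
            return SUFFIX_FIELD + ":" + queryTerm.substring(1);
        }
        return NORMAL_FIELD + ":" + queryTerm;
    }

    public static void main(String[] args) {
        // [republican, epublican, publican, ublican, blican]
        System.out.println(suffixTerms("republican", 6));
        System.out.println(rewrite("*pub*"));   // contents_suffix:pub*
        System.out.println(rewrite("*blican")); // contents_suffix:blican
        System.out.println(rewrite("pub*"));    // contents:pub*
    }
}
```

In a real index the expansion would most likely live in a custom TokenFilter that buffers the extra tokens for each incoming word and is applied only to the subtoken field, but that wiring is not shown here. Note also that the cutoff bounds how short a query fragment can be and still match: with a cutoff of 6, "pub*" matches through "publican", while "*can" would find nothing.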