From: Milind
To: java-user@lucene.apache.org
Subject: Re: Why does this search fail?
Date: Wed, 27 Aug 2014 09:54:18 -0400

I see. This is going to be extremely difficult to explain to end users. It
doesn't work as they would expect, and some of the tokenizing rules are
already somewhat confusing. Their expectation is that it should work the way
their searches work in Google.

It's difficult enough to recognize that the value gets tokenized because the
period is surrounded by a digit and a letter (as opposed to two digits or two
letters). So I'd have expected that C0001.DevNm00* would effectively become a
search for C0001 OR DevNm00*. But now, because of the presence of the
wildcard, the query is treated as one term and the period does not act as a
token separator. That's actually good, but the fact that the value is still
indexed as two terms makes wildcard searches very unintuitive.

I don't suppose I can make a wildcard search use multiple terms when they are
joined by a token separator. But is there any way that I can force the query
to go through an analyzer prior to doing the search?

On Tue, Aug 26, 2014 at 4:21 PM, Jack Krupansky wrote:

> Sorry, but you can only use a wildcard on a single term. "C0001.DevNm001"
> gets indexed as two terms, "c0001" and "devnm001", so your wildcard won't
> match any term (at least in this case.)
>
> Also, if your query term includes a wildcard, it will not be fully
> analyzed. Some filters such as lower case are defined as "multi-term", so
> they will be performed, but the standard tokenizer is not being called, so
> the dot remains and this whole term is treated as one term, unlike the
> index analysis.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Milind
> Sent: Tuesday, August 26, 2014 12:24 PM
> To: java-user@lucene.apache.org
> Subject: Why does this search fail?
>
> I have a field with the value C0001.DevNm001. If I search for
>
> C0001.DevNm001 --> Get Hit
> DevNm00* --> Get Hit
> C0001.DevNm00* --> Get No Hit
>
> The field gets tokenized on the period since it's surrounded by a letter
> and a number. The query gets evaluated as a prefix query. I'd have
> thought that this should have found the document. Any clues on why this
> doesn't work?
>
> The full code is below.
>
>     Directory theDirectory = new RAMDirectory();
>     Version theVersion = Version.LUCENE_47;
>     Analyzer theAnalyzer = new StandardAnalyzer(theVersion);
>     IndexWriterConfig theConfig =
>         new IndexWriterConfig(theVersion, theAnalyzer);
>     IndexWriter theWriter = new IndexWriter(theDirectory, theConfig);
>
>     String theFieldName = "Name";
>     String theFieldValue = "C0001.DevNm001";
>     Document theDocument = new Document();
>     theDocument.add(new TextField(theFieldName, theFieldValue,
>         Field.Store.YES));
>     theWriter.addDocument(theDocument);
>     theWriter.close();
>
>     String theQueryStr = theFieldName + ":C0001.DevNm00*";
>     Query theQuery =
>         new QueryParser(theVersion, theFieldName,
>             theAnalyzer).parse(theQueryStr);
>     System.out.println(theQuery.getClass() + ", " + theQuery);
>     IndexReader theIndexReader = DirectoryReader.open(theDirectory);
>     IndexSearcher theSearcher = new IndexSearcher(theIndexReader);
>     TopScoreDocCollector collector = TopScoreDocCollector.create(10,
>         true);
>     theSearcher.search(theQuery, collector);
>     ScoreDoc[] theHits = collector.topDocs().scoreDocs;
>     System.out.println("Hits found: " + theHits.length);
>
> Output:
>
>     class org.apache.lucene.search.PrefixQuery, Name:c0001.devnm00*
>     Hits found: 0
>
> --
> Regards
> Milind

--
Regards
Milind
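The mismatch discussed in this thread, and the idea of running the wildcard query through the same analysis as the index, can be illustrated with a small toy model. This is only a sketch in plain Java and does not use Lucene: `toyTokenize`, `naivePrefixMatch` and `analyzedPrefixMatch` are hypothetical helpers, and the tokenizer is a simplified stand-in that implements just the rule described above (a period is kept only between two letters or two digits), not the real StandardAnalyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WildcardSketch {

    // Toy stand-in for index-time analysis (NOT Lucene's StandardAnalyzer):
    // lowercases the text and splits on a period unless the period sits
    // between two letters or between two digits.
    static List<String> toyTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String s = text.toLowerCase(Locale.ROOT);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (c == '.' && i > 0 && i + 1 < s.length()
                    && sameKind(s.charAt(i - 1), s.charAt(i + 1))) {
                current.append(c); // period stays inside the token
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // separator ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    static boolean sameKind(char a, char b) {
        return (Character.isLetter(a) && Character.isLetter(b))
            || (Character.isDigit(a) && Character.isDigit(b));
    }

    static String stripStar(String query) {
        return query.endsWith("*") ? query.substring(0, query.length() - 1) : query;
    }

    // Models the behaviour reported in the thread: the wildcard query is
    // lowercased but not tokenized, so it is compared as one long prefix.
    static boolean naivePrefixMatch(List<String> indexedTerms, String query) {
        String prefix = stripStar(query).toLowerCase(Locale.ROOT);
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical alternative: tokenize the query like the index, require
    // every token but the last as an exact term, and use the last as a prefix.
    static boolean analyzedPrefixMatch(List<String> indexedTerms, String query) {
        List<String> parts = toyTokenize(stripStar(query));
        if (parts.isEmpty()) {
            return false;
        }
        for (int i = 0; i < parts.size() - 1; i++) {
            if (!indexedTerms.contains(parts.get(i))) {
                return false;
            }
        }
        String last = parts.get(parts.size() - 1);
        for (String term : indexedTerms) {
            if (term.startsWith(last)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = toyTokenize("C0001.DevNm001"); // [c0001, devnm001]
        System.out.println(indexed);
        System.out.println(naivePrefixMatch(indexed, "C0001.DevNm00*"));    // false
        System.out.println(analyzedPrefixMatch(indexed, "C0001.DevNm00*")); // true
    }
}
```

The toy ignores token positions, so it behaves like an AND of the parts rather than a phrase. In Lucene terms this would roughly correspond to combining TermQuery clauses with a PrefixQuery inside a BooleanQuery, but that mapping is an assumption and is not confirmed anywhere in this thread.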