Subject: Re: Handle expression in the index
From: Alon Muchnick
To: "java-user@lucene.apache.org"
Date: Wed, 6 Feb 2013 12:13:45 +0200
In-Reply-To: <51121EB0.4060108@gmail.com>

Hi Nicolas,

If I understand correctly, what you are describing is that your tag field will contain Lucene query syntax: one word = exact match, two words "xx yy" = phrase match, and so on.

There is a search method called "prospective search" which fits this situation. You can try this procedure:

1. Run a query searching for a tag (e.g. "first time"). The ScoreDocs you get from the search will contain the "potential results" (potential because they will also include tags that contain only one word from the phrase, "time" or "first", and not necessarily both).

2. Iterate over the ScoreDocs, and for each one:

   2.1 Create a single-document in-memory index that holds your original search term ("first time"). Use the MemoryIndex object; it is super fast and perfect for this type of search.

   2.2 Create a Lucene query out of the tag you are currently iterating on.

   2.3 Run the query you created in the previous step against the index you created in step 2.1. If you get a hit, the tag matches your search term and you can collect the text from that doc.

The above procedure works and it is quite fast (depending on how many "potential results" you get from your first search).
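
To make the confirm step (2.1 to 2.3) concrete, here is a minimal plain-Java sketch of the idea. It is only a stand-in, not Lucene code: the class name ProspectiveSearchSketch, the methods tokenize, tagMatches and confirm, and the crude tokenizer are all made up for illustration, and every tag is assumed to be either a single word or an exact phrase. In a real setup the tokenized search text would be a MemoryIndex, the tag would be parsed into a Lucene query, and the match test would be a search on that MemoryIndex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for steps 2.1-2.3; no Lucene classes are used here.
final class ProspectiveSearchSketch {

    // Crude replacement for an Analyzer: lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // The tag plays the role of the query (step 2.2); the user's search text
    // plays the role of the one-document index (step 2.1). A tag matches only
    // if all of its words occur in the search text, adjacent and in order
    // (step 2.3, assuming phrase semantics for multi-word tags).
    static boolean tagMatches(String tag, String searchText) {
        List<String> phrase = tokenize(tag);
        if (phrase.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(tokenize(searchText), phrase) >= 0;
    }

    // Step 2: keep only the candidate tags (from the first, loose search)
    // that really match the search text.
    static List<String> confirm(List<String> candidateTags, String searchText) {
        List<String> confirmed = new ArrayList<>();
        for (String tag : candidateTags) {
            if (tagMatches(tag, searchText)) {
                confirmed.add(tag);
            }
        }
        return confirmed;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("first time", "time", "first aid");
        System.out.println(confirm(candidates, "first"));
        System.out.println(confirm(candidates, "the first time"));
    }
}
```

Note that with this approach a search for "first" alone does not confirm the tag "first time", which is the behaviour asked for, while a single-word tag such as "time" is still confirmed by the search "first time", because the tag is the query and the search text is the document.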

You can also read this blog post, which has an example:
http://www.sajalkayan.com/prospective-search-using-python.html

If someone has a better approach to this issue, I would love to hear it as well.

Alon

On Wed, Feb 6, 2013 at 11:13 AM, Nicolas Roduit wrote:

> I'm starting with Lucene 4 and have built my own analyzer with stemming
> and synonyms. This works perfectly.
>
> I built a Lucene index with several documents (with an ID) containing a
> text (with TextField) and a list of words or expressions related to the
> text (a kind of tag). Everything is OK when I make a query containing one
> of these words (tags), I find the related text. How can I proceed if I want
> to have a tag that contains several words (e.g. "first time"). This
> expression must not be separated in two words. The problem is when I make a
> query with the word "first" I will get the document in the hits, I would
> like to get the hit only when search for "first time".
>
> Can someone give me a clue?