Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: neutral (nike.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: <51121EB0.4060108@gmail.com>
References: <51121EB0.4060108@gmail.com>
Date: Wed, 6 Feb 2013 12:13:45 +0200
Message-ID: 
 <CA+Ltc5J8YeH2VzVjXQ-fBfRFwix+fdttmimV8ou2tGPfoy5K4g@mail.gmail.com>
Subject: Re: Handle expression in the index
From: Alon Muchnick <alon@datonics.com>
To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
Content-Type: multipart/alternative; boundary=14dae9340f11e2c15304d50b945b

--14dae9340f11e2c15304d50b945b
Content-Type: text/plain; charset=ISO-8859-1

Hi Nicolas,
If I understand correctly, what you are describing is a tag field that
effectively contains Lucene query syntax: one word = exact match, two words
("xx yy") = phrase match, and so on.

There is a search technique called "prospective search" which fits this
situation.

You can try this procedure:

1. Run a query searching for a tag (e.g. "first time"). The ScoreDocs you
get from the search are only "potential results", since they will also
include tags that contain just one word of the phrase ("time" or "first")
and not necessarily both.

2. Iterate over the ScoreDocs and, for each one:
2.1 Create a single-document in-memory index that holds your original
search term ("first time"). Use the MemoryIndex class; it is very fast and
well suited to this type of search.
2.2 Create a Lucene query out of the tag you are currently iterating on.
2.3 Run the query you created in step 2.2 against the index you created in
step 2.1. If you get a hit, the tag matches your search term and you can
collect the text from that doc.
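
To make the flow concrete, here is a minimal plain-Java sketch of the
two-phase check. It deliberately does not use Lucene: tokenize() is a crude
stand-in for your analyzer, candidateTags() stands in for the first index
search, and tagMatches() stands in for steps 2.1 to 2.3, where real code
would call MemoryIndex.addField(...) with the search text and then test
MemoryIndex.search(query) > 0 for the query built from the tag. The class
and method names are made up for illustration, and a tag is treated only as
one word (term match) or several words (phrase match); richer query syntax
is not handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: simulates the two-phase check.
final class ProspectiveMatch {

    // Stand-in for an analyzer: lower-case, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 1: cheap filter, keeps every tag sharing at least one word
    // with the search text (the "potential results").
    static List<String> candidateTags(String searchText, List<String> tags) {
        List<String> searchTokens = tokenize(searchText);
        List<String> candidates = new ArrayList<>();
        for (String tag : tags) {
            if (!Collections.disjoint(searchTokens, tokenize(tag))) {
                candidates.add(tag);
            }
        }
        return candidates;
    }

    // Steps 2.1 to 2.3: treat the tag as a query and the search text as the
    // one-document index. The tag's words must occur in the search text as
    // a contiguous sequence (a phrase); a one-word tag is a term match.
    static boolean tagMatches(String searchText, String tag) {
        List<String> tagTokens = tokenize(tag);
        if (tagTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(tokenize(searchText), tagTokens) >= 0;
    }

    // Full procedure: filter candidates, then verify each one.
    static List<String> matchingTags(String searchText, List<String> tags) {
        List<String> hits = new ArrayList<>();
        for (String tag : candidateTags(searchText, tags)) {
            if (tagMatches(searchText, tag)) {
                hits.add(tag);
            }
        }
        return hits;
    }
}
```

With the tags "first time", "first" and "time", a search for "first" keeps
only the tag "first", while a search for "my first time here" keeps all
three.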

The above procedure works and is quite fast (depending on how many
"potential results" you get from your first search).

You can also read this blog post, which has an example:

http://www.sajalkayan.com/prospective-search-using-python.html

If someone has a better approach to this issue, I would love to hear it
as well.

Alon


On Wed, Feb 6, 2013 at 11:13 AM, Nicolas Roduit <nicolas.roduit@gmail.com> wrote:

> I'm starting with Lucene 4 and have built my own analyzer with stemming
> and synonyms. This works perfectly.
>
> I built a Lucene index with several documents (with an ID) containing a
> text (with TextField)  and a list of words or expressions related to the
> text (a kind of tag). Everything is OK when I make a query containing one
> of these words (tags): I find the related text. How can I proceed if I want
> to have a tag that contains several words (e.g. "first time")? This
> expression must not be split into two words. The problem is that when I
> query for the word "first" I get the document in the hits, but I would
> like to get the hit only when I search for "first time".
>
> Can someone give me a clue?

--14dae9340f11e2c15304d50b945b--