lucene-java-user mailing list archives

From Shashi Kant <shashi_k...@yahoo.com>
Subject Re: Lucene indexes
Date Tue, 24 Feb 2009 16:12:47 GMT
Nada,

You might want to consider writing a custom tokenizer which will allow you to generate tokens
based on your needs (other than whitespace).
Another option would be to look at SpanQuery or SpanNearQuery which would help with the kind
of problem you are trying to solve (assuming I understand you correctly).
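
To make the custom-tokenizer suggestion concrete, here is a minimal sketch in plain Java. It
is not Lucene API: the class name PhraseTokens and the method tokenize are hypothetical, and
the choice of two-word phrases (word bigrams) is an assumption for illustration. It emits
every single term plus each pair of adjacent words, so a phrase becomes an ordinary
indexable token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: emits single terms plus word bigrams.
final class PhraseTokens {

    // Lowercases the text, splits on runs of characters that are neither
    // letters nor digits, and returns each word followed by the bigram
    // that ends at that word (if there is a preceding word).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        String previous = null;
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty first element when text starts with a separator
            }
            tokens.add(word);
            if (previous != null) {
                tokens.add(previous + " " + word);
            }
            previous = word;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [how, to, how to, index, to index, phrases, index phrases]
        System.out.println(tokenize("How to index phrases?"));
    }
}
```

Note that this sketch forms bigrams across punctuation and sentence boundaries; a real
analyzer would probably want to reset at such boundaries. Lucene's analysis packages also
include a ShingleFilter aimed at word n-gram output, which may be worth checking for in
your version before writing your own.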

HTH,
Shashi




----- Original Message ----
From: Nada Mimouni <mimouni@tk.informatik.tu-darmstadt.de>
To: java-user@lucene.apache.org
Sent: Tuesday, February 24, 2009 9:22:19 AM
Subject: RE: Lucene indexes

Thank you Erick.

I am aware that Lucene uses an inverted index (class: IndexWriter).
I have read in the literature about new, efficient index structures designed for phrase
indexing, so I wondered whether any updates or new classes have been added to Lucene for
that purpose.

The problem that I am trying to solve is: how to index phrases (rather than query for phrases)?

I have a Questions/Answers corpus. The IR architecture I am using creates one index for
questions and another for answers (both based on single terms) and then matches between them.
I want to index phrases in addition to single terms (for both questions and answers) and then
search for all terms and phrases in the questions index.

If you have any idea how I can solve this problem of indexing phrases, it would be of great
help. 

Nada Mimouni



-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Tue 2/24/2009 2:13 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene indexes

I have to ask: why do you care? That is another way of asking
what problem you're trying to solve that you think this information
would help with. As far as I know, Lucene is an inverted index,
period. You use IndexWriter to create it.
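
For readers unfamiliar with the term, the idea of an inverted index can be sketched in a
few lines of plain Java. This is only an illustration of the data structure (a map from
each term to the documents containing it), not how Lucene or IndexWriter is implemented;
the class TinyInvertedIndex and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration of an inverted index; not Lucene code.
final class TinyInvertedIndex {

    // term -> sorted ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Splits the text on non-word characters and records docId under each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns the ids of documents containing the term (empty if none).
    SortedSet<Integer> lookup(String term) {
        SortedSet<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        if (docs == null) {
            return new TreeSet<>();
        }
        return docs;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Lucene is an inverted index");
        index.add(2, "IndexWriter creates the index");
        System.out.println(index.lookup("index")); // prints [1, 2]
    }
}
```

Because lookup works on whole terms, anything emitted as a single token at indexing time,
including a multi-word phrase, can be stored and retrieved the same way.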

Really, the best way to get a sense of which classes to use is to work
through some of the examples in Lucene in Action or on the website.

This may help as far as the structure of the index is concerned:
http://lucene.apache.org/java/2_4_0/fileformats.html

Best
Erick

On Tue, Feb 24, 2009 at 5:36 AM, Nada Mimouni <
mimouni@tk.informatik.tu-darmstadt.de> wrote:

>
> Hello everybody,
>
> 1) What is the difference between :
> - inverted index
> - nextword index
> - common index
>
> 2) Which one(s) is(are) supported by Lucene?
>
> 3) Which class(es) create this(those) index(es)?
>
>
> Thank you in advance for your help.
> Nada Mimouni
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>



