lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Lea <ian....@gmail.com>
Subject Re: index enforcing query terms to appear within the same sentence
Date Fri, 04 Mar 2011 15:17:27 GMT
Another index, or a different field in the same index but without the
modified gaps.  Maybe PerFieldAnalyzerWrapper would help - one
Analyzer for field x with modified gaps and a different one for field
y with standard gaps.


--
Ian.


On Fri, Mar 4, 2011 at 2:40 PM, Michael Wiegand
<michael.wiegand@lsv.uni-saarland.de> wrote:
> Thank you for all these useful hints!
>
> If I use the multi-valued fields in combination with "modified" position
> increments, I would actually distort the shape of a document.
> For instance, if I would like to compare a retrieval enforcing query term
> co-occurrence within the same sentence with a co-occurrence using
> PhraseQuery (or SpanNearQuery) just enforcing that the query terms have to
> appear within a text window of n words (also allowing this window to cross
> sentence boundaries), I would need to create another index where I do not
> modify the position increments.
> Is that right?
>
> Best,
> Michael
>
> Ian Lea schrieb:
>>
>> You can use multi valued fields if you play with the position
>> increment gap.  See e.g.
>>
>> http://lucene.472066.n3.nabble.com/Problem-searching-in-the-same-sentence-td1501269.html
>>
>> A google search for "lucene indexing sentences" or similar finds that, and
>> more.
>>
>>
>> Different docs can have different fields/different numbers of fields,
>> but the position gap approach is probably better.
>>
>>
>> --
>> Ian.
>>
>>
>> On Fri, Mar 4, 2011 at 7:06 AM, Michael Wiegand
>> <michael.wiegand@lsv.uni-saarland.de> wrote:
>>
>>>
>>> Hi,
>>>
>>> I would like to create an index with Lucene to a document collections of
>>> text files.
>>> The index should be created in such a way, that for the search I can
>>> enforce
>>> that query term A and query term B are contained within the same
>>> sentence.
>>>
>>> How should implement the index? Should I have for every sentence a
>>> different
>>> field (but make sure that it is not a multi-valued field because they
>>> would
>>> get merged which is exactly what I do not want)?
>>> Would it be problematic that different documents would then end up having
>>> different numbes of fields?
>>>
>>> Thank you in advance!
>>>
>>> Best,
>>> Michael
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message