lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Wiegand <michael.wieg...@lsv.uni-saarland.de>
Subject Re: index enforcing query terms to appear within the same sentence
Date Thu, 10 Mar 2011 16:34:15 GMT
Conceptually, I think I know what to do. Unfortunately, with the given 
interfaces of Lucene I have some difficulty.

If I add the content of a document sentence by sentence, i.e. line by 
line, (using a multi-valued field), there are only two constructors 
possible:
Field(String name, String value, Field.Store store, Field.Index index)
or
Field(String name, String value, Field.Store store, Field.Index index, 
Field.TermVector termVector)
The sentence comes as a string which I get from a BufferedReader-object 
by using the readLine() method.

But as far as I understood, I need to access some TokenStream-object in 
order to set the PositionIncrementAttribute. So how should that work?

Thank you in advance.

Ian Lea schrieb:
>> You can use multi valued fields if you play with the position
>> increment gap.  See e.g.
>> http://lucene.472066.n3.nabble.com/Problem-searching-in-the-same-sentence-td1501269.html

>>
>>
>> A google search for "lucene indexing sentences" or similar finds 
>> that, and more.
>>
>>
>> Different docs can have different fields/different numbers of fields,
>> but the position gap approach is probably better.
>>
>>
>> -- 
>> Ian.
>>
>>
>> On Fri, Mar 4, 2011 at 7:06 AM, Michael Wiegand
>> <michael.wiegand@lsv.uni-saarland.de> wrote:
>>  
>>> Hi,
>>>
>>> I would like to create an index with Lucene to a document 
>>> collections of
>>> text files.
>>> The index should be created in such a way, that for the search I can 
>>> enforce
>>> that query term A and query term B are contained within the same 
>>> sentence.
>>>
>>> How should implement the index? Should I have for every sentence a 
>>> different
>>> field (but make sure that it is not a multi-valued field because 
>>> they would
>>> get merged which is exactly what I do not want)?
>>> Would it be problematic that different documents would then end up 
>>> having
>>> different numbes of fields?
>>>
>>> Thank you in advance!
>>>
>>> Best,
>>> Michael
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>     
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>   
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message