lucene-dev mailing list archives

From Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
Subject Re: accessing the query string from inside TokenFilter
Date Wed, 26 Oct 2011 08:06:01 GMT
From what I can see in the debugger, the analyzer chain is implemented
as a stack, with the last filter at the bottom and the first filter at the top.

A query analyzer chain of:
charFilter: MappingCharFilterFactory
tokenizer : WhitespaceTokenizerFactory
filter    : PatternReplaceFilterFactory
filter    : LowerCaseFilterFactory
filter    : ShingleFilterFactory
filter    : SynonymFilterFactory

produces this chain of input references:
this.input(SynonymFilter) --> input(ShingleFilter) -->
input(LowerCaseFilter) --> input(PatternReplaceFilter) -->
input(WhitespaceTokenizer) --> input(MappingCharFilter) -->
input(CharReader) --> input(StringReader).str

So I can always "see" the StringReader's input (its str field) in the debugger, but can I access it from my filter?

Bernd
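The wrapping seen in the debugger can be reproduced with a small plain-Java model. This is a sketch, not Lucene code: ModelStream, ModelWhitespaceTokenizer, ModelFilter, ModelLowerCaseFilter, ModelStopFilter and ChainDemo are hypothetical names that only mirror the roles of Tokenizer, TokenFilter and TokenFilter's input field. The chain is built inside-out, so the last filter added is the outermost object, and the raw text lives only in the Reader held by the innermost stage; filters receive tokens, never that string.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified model of an analysis chain. These are NOT Lucene classes; the
// names only mirror the roles of Tokenizer, TokenFilter and its "input" field.
abstract class ModelStream {
    String term; // stands in for the term attribute
    abstract boolean incrementToken() throws IOException;
}

// Innermost stage: the only object that holds the Reader with the raw text.
class ModelWhitespaceTokenizer extends ModelStream {
    final Reader input;
    ModelWhitespaceTokenizer(Reader input) { this.input = input; }

    @Override
    boolean incrementToken() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (!Character.isWhitespace(c)) sb.append((char) c);
            else if (sb.length() > 0) break; // end of the current token
        }
        if (sb.length() == 0) return false;
        term = sb.toString();
        return true;
    }
}

// Each filter wraps the previous stage and pulls tokens from it.
abstract class ModelFilter extends ModelStream {
    final ModelStream input;
    ModelFilter(ModelStream input) { this.input = input; }
}

class ModelLowerCaseFilter extends ModelFilter {
    ModelLowerCaseFilter(ModelStream input) { super(input); }

    @Override
    boolean incrementToken() throws IOException {
        if (!input.incrementToken()) return false;
        term = input.term.toLowerCase(Locale.ROOT);
        return true;
    }
}

class ModelStopFilter extends ModelFilter {
    private final Set<String> stopWords;
    ModelStopFilter(ModelStream input, Set<String> stopWords) {
        super(input);
        this.stopWords = stopWords;
    }

    @Override
    boolean incrementToken() throws IOException {
        while (input.incrementToken()) {
            if (!stopWords.contains(input.term)) { term = input.term; return true; }
        }
        return false;
    }
}

class ChainDemo {
    // Built inside-out: the last filter added becomes the outermost object.
    static ModelStream build(String text) {
        return new ModelStopFilter(
            new ModelLowerCaseFilter(new ModelWhitespaceTokenizer(new StringReader(text))),
            Set.of("the"));
    }

    // Walks the "input" references from the outermost stage inwards.
    static String describeChain(ModelStream s) {
        List<String> names = new ArrayList<>();
        while (s instanceof ModelFilter) {
            names.add(s.getClass().getSimpleName());
            s = ((ModelFilter) s).input;
        }
        names.add(s.getClass().getSimpleName());
        return String.join(" --> ", names);
    }

    // Consumer side: pulls every token from the outermost stage.
    static List<String> tokens(ModelStream s) {
        List<String> out = new ArrayList<>();
        try {
            while (s.incrementToken()) out.add(s.term);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

For ChainDemo.build("The Quick FOX"), describeChain returns "ModelStopFilter --> ModelLowerCaseFilter --> ModelWhitespaceTokenizer", the same outermost-to-innermost order as the debugger view, and tokens yields [quick, fox].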

On 26.10.2011 09:37, Chris Male wrote:
> We've also lost the full query string by the time the QP creates its
> TokenStream, right? The QP splits the query on whitespace first.
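Chris's point can be illustrated with a tiny model. It assumes the classic QueryParser's behaviour for unquoted terms (it splits on whitespace and analyzes each term separately, while a quoted phrase is handed to the analyzer as one string); WhitespaceSplitModel and textsSeenByAnalyzer are hypothetical names, and operators, field prefixes and quotes are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Model only: mimics how unquoted query text reaches the analyzer one
// whitespace-separated chunk at a time. Not the QueryParser grammar.
class WhitespaceSplitModel {
    // Returns the strings that would each be analyzed in a separate TokenStream.
    static List<String> textsSeenByAnalyzer(String queryText) {
        List<String> seen = new ArrayList<>();
        for (String chunk : queryText.trim().split("\\s+")) {
            if (!chunk.isEmpty()) seen.add(chunk); // "".split(...) yields one empty chunk
        }
        return seen;
    }
}
```

For "new york city" it returns [new, york, city], so no filter in the chain ever receives "new york city" as a whole.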
>
> On Wed, Oct 26, 2011 at 8:32 PM, Uwe Schindler<uwe@thetaphi.de>  wrote:
>
>> Hi Simon,
>>
>> The problem is the exchanged consumer/producer role. Once the TokenStream
>> calls clearAttributes(), the attributes are gone, but the query parser can
>> only set the attribute *before* calling incrementToken(). So you have no
>> chance to get it, because the Tokenizer clears it before any filter can
>> read it (unless we use an attribute whose clear() is a no-op, which would
>> fail lots of tests, as it's a hack).
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
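Uwe's argument is about ordering: the consumer can only set an attribute before it calls incrementToken(), and the Tokenizer's first act inside incrementToken() is clearAttributes(). The sketch below models that contract in plain Java; ModelAttr, TermAttr, QueryAttr, SharedAttrs, OneTokenTokenizer, QueryReadingFilter and ClearAttributesDemo are hypothetical and are not Lucene's AttributeSource API. The noOpClear flag models the "clear() is a no-op" hack.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of attributes shared along a chain; NOT Lucene's
// AttributeSource/AttributeImpl API, only the clear-before-produce contract.
abstract class ModelAttr {
    abstract void clear();
}

class TermAttr extends ModelAttr {
    String term;
    @Override void clear() { term = null; }
}

// Hypothetical "query attribute" along the lines of Simon's suggestion.
class QueryAttr extends ModelAttr {
    String query;
    private final boolean noOpClear; // the hack: clear() leaves the value alone
    QueryAttr(boolean noOpClear) { this.noOpClear = noOpClear; }
    @Override void clear() { if (!noOpClear) query = null; }
}

// One instance is shared by every stage, as attributes are in Lucene.
class SharedAttrs {
    private final List<ModelAttr> all = new ArrayList<>();
    final TermAttr termAtt = register(new TermAttr());
    final QueryAttr queryAtt;
    SharedAttrs(boolean noOpClear) { queryAtt = register(new QueryAttr(noOpClear)); }
    private <T extends ModelAttr> T register(T a) { all.add(a); return a; }
    void clearAttributes() { for (ModelAttr a : all) a.clear(); }
}

class OneTokenTokenizer {
    final SharedAttrs atts;
    private boolean done;
    OneTokenTokenizer(SharedAttrs atts) { this.atts = atts; }

    boolean incrementToken() {
        if (done) return false;
        atts.clearAttributes(); // the producer resets all attributes first
        atts.termAtt.term = "foo";
        done = true;
        return true;
    }
}

class QueryReadingFilter {
    private final OneTokenTokenizer input;
    String seenQuery;
    QueryReadingFilter(OneTokenTokenizer input) { this.input = input; }

    boolean incrementToken() {
        if (!input.incrementToken()) return false;
        seenQuery = input.atts.queryAtt.query; // read only after the tokenizer ran
        return true;
    }
}

class ClearAttributesDemo {
    static String querySeenByFilter(boolean noOpClear) {
        SharedAttrs atts = new SharedAttrs(noOpClear);
        QueryReadingFilter filter = new QueryReadingFilter(new OneTokenTokenizer(atts));
        atts.queryAtt.query = "foo bar"; // consumer sets it *before* incrementToken()
        filter.incrementToken();
        return filter.seenQuery;
    }
}
```

ClearAttributesDemo.querySeenByFilter(false) returns null, while querySeenByFilter(true) returns "foo bar": only the no-op clear() lets the value through, which is why it breaks the clearAttributes() contract.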
>>> -----Original Message-----
>>> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
>>> Sent: Wednesday, October 26, 2011 9:21 AM
>>> To: dev@lucene.apache.org
>>> Subject: Re: accessing the query string from inside TokenFilter
>>>
>>> What Uwe says is correct, though. What we could possibly do is add a
>>> query attribute that is set by the query parser (you can do that
>>> yourself, though). Not sure whether it is worth it and whether we
>>> should do it.
>>>
>>> simon
>>>
>>> On Wed, Oct 26, 2011 at 8:58 AM, Uwe Schindler<uwe@thetaphi.de>  wrote:
>>>> Hi,
>>>>
>>>> QueryParser and TokenStreams are clearly separated; there is no way to
>>>> get the query string from inside a TokenStream (and there cannot be,
>>>> because the QP is a consumer of the TS, which is used not only for query
>>>> parsing). The only option you have is to use a ThreadLocal that you
>>>> set before the query is parsed and then read in the TokenFilter.
>>>>
>>>> Uwe
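A minimal sketch of the ThreadLocal workaround, in plain Java so it runs without Lucene on the classpath. QueryStringHolder, QueryAwareFilterModel and ThreadLocalDemo are hypothetical names; in a real setup the filter() body would sit inside a TokenFilter's incrementToken() and the loop in parseWithQuery() would be the call to the query parser. It assumes analysis runs synchronously on the thread that calls the parser, and it removes the value in a finally block so a pooled request thread does not carry a stale query into the next request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder class; nothing like it ships with Lucene or Solr.
final class QueryStringHolder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();
    private QueryStringHolder() {}
    static void set(String query) { CURRENT.set(query); }
    static String get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }
}

// Stand-in for a TokenFilter's per-token work: it sees one token, but can
// look up the full query string stored for the current thread.
class QueryAwareFilterModel {
    static String filter(String token) {
        String query = QueryStringHolder.get();
        if (query == null) return token;  // called outside parseWithQuery()
        return token + "|" + query;       // placeholder for the real comparison
    }
}

class ThreadLocalDemo {
    // Stand-in for calling the query parser: set the ThreadLocal first, run
    // the (modelled) per-chunk analysis on the same thread, always clean up.
    static List<String> parseWithQuery(String query) {
        QueryStringHolder.set(query);
        try {
            List<String> out = new ArrayList<>();
            for (String chunk : query.trim().split("\\s+")) {
                if (!chunk.isEmpty()) out.add(QueryAwareFilterModel.filter(chunk));
            }
            return out;
        } finally {
            QueryStringHolder.clear(); // avoid leaking into the next request
        }
    }
}
```

ThreadLocalDemo.parseWithQuery("new york") returns [new|new york, york|new york]: each chunk is analyzed separately, yet the filter still sees the whole query.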
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
>>>>> Sent: Wednesday, October 26, 2011 8:33 AM
>>>>> To: dev@lucene.apache.org
>>>>> Subject: accessing the query string from inside TokenFilter
>>>>>
>>>>> Dear list,
>>>>> while writing a TokenFilter for my analyzer chain, I need access to the
>>>>> query string from inside my TokenFilter for a comparison, but filters
>>>>> work on a TokenStream and only receive individual tokens.
>>>>> So far I have not found any way to access the query string.
>>>>>
>>>>> It would be great to have such functionality in Lucene/Solr.
>>>>>
>>>>> Should I open a JIRA issue for it, or is there a wish list somewhere?
>>>>>
>>>>> Best regards
>>>>> Bernd
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
>>>>> additional commands, e-mail: dev-help@lucene.apache.org

-- 
*************************************************************
Bernd Fehling                Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)                        Universitätsstr. 25
Tel. +49 521 106-4060                   Fax. +49 521 106-4052
bernd.fehling@uni-bielefeld.de                33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************