lucene-dev mailing list archives

From Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
Subject Re: accessing the query string from inside TokenFilter
Date Wed, 26 Oct 2011 12:05:53 GMT
OK, I think "query string" is a bit too specific. More generally,
what I need is access, from inside a filter, to the complete string
being analyzed (not only the current token).

A very dirty workaround would be a "collector filter" that collects all
tokens after the WhitespaceTokenizer and somehow makes them available to
the following filters, wouldn't it?
That way, at least on the last call to incrementToken(), I would have the original string.
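The "collector filter" idea can be sketched in plain Java. This is a simplified model, not Lucene code: SimpleTokenStream, SimpleWhitespaceTokenizer, CollectorStage and ContextAwareFilter are hypothetical names standing in for TokenStream, WhitespaceTokenizer and custom filters. The collector drains its input on the first call, buffers every token and replays them, so a later stage holding a direct reference to it can read the collected text at any point, not only on the last call. Two caveats follow from this design: the text is rebuilt from tokens (original spacing and anything the tokenizer dropped are lost), and the buffer grows with the input, which is the heap cost Uwe mentions for real documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Lucene's TokenStream: next() returns a token, or null when exhausted.
interface SimpleTokenStream {
    String next();
}

// Hypothetical stand-in for WhitespaceTokenizer: splits a fixed string on runs of whitespace.
class SimpleWhitespaceTokenizer implements SimpleTokenStream {
    private final Iterator<String> it;

    SimpleWhitespaceTokenizer(String text) {
        String trimmed = text.trim();
        this.it = trimmed.isEmpty()
                ? Collections.<String>emptyIterator()
                : Arrays.asList(trimmed.split("\\s+")).iterator();
    }

    @Override
    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}

// Hypothetical "collector filter": drains its input once, buffers every token,
// then replays the buffered tokens unchanged.
class CollectorStage implements SimpleTokenStream {
    private final SimpleTokenStream input;
    private final List<String> buffer = new ArrayList<>();
    private int pos = 0;
    private boolean filled = false;

    CollectorStage(SimpleTokenStream input) {
        this.input = input;
    }

    // Consume the whole upstream stream on first use; memory grows with input size.
    private void fill() {
        if (filled) {
            return;
        }
        for (String t = input.next(); t != null; t = input.next()) {
            buffer.add(t);
        }
        filled = true;
    }

    // Tokens joined with single spaces: a reconstruction, not the raw input text.
    String collectedText() {
        fill();
        return String.join(" ", buffer);
    }

    @Override
    public String next() {
        fill();
        return pos < buffer.size() ? buffer.get(pos++) : null;
    }
}

// Hypothetical downstream filter that keeps a direct reference to the collector,
// so it can consult the whole collected text while processing each token.
class ContextAwareFilter implements SimpleTokenStream {
    private final CollectorStage collector;

    ContextAwareFilter(CollectorStage collector) {
        this.collector = collector;
    }

    @Override
    public String next() {
        String token = collector.next();
        // Example "comparison": annotate each token with the full collected text.
        return token == null ? null : token + " in [" + collector.collectedText() + "]";
    }
}
```

In real Lucene code, a buffering filter would normally store AttributeSource.State objects via captureState() and restoreState(), as CachingTokenFilter does, rather than plain strings.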

Bernd

On 26.10.2011 10:26, Uwe Schindler wrote:
> The input from StringReader does not help you:
> - in the case of QueryParser it is *not* the query string!!!
> - storing it in an attribute would blow up your heap for real documents
>
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
>> Sent: Wednesday, October 26, 2011 10:06 AM
>> To: dev@lucene.apache.org
>> Subject: Re: accessing the query string from inside TokenFilter
>>
>> From what I can see in the debugger, the analyzer chain is implemented as a
>> stack, with the last filter at the bottom and the first filter at the top.
>>
>> An analyzer query chain of:
>> charFilter: MappingCharFilterFactory
>> tokenizer : WhitespaceTokenizerFactory
>> filter    : PatternReplaceFilterFactory
>> filter    : LowerCaseFilterFactory
>> filter    : ShingleFilterFactory
>> filter    : SynonymFilterFactory
>>
>> has a chain of:
>> this.input(SynonymFilter) -->  input(ShingleFilter) -->
>> input(LowerCaseFilter) -->  input(PatternReplaceFilter) -->
>> input(WhitespaceTokenizer) -->  input(MappingCharFilter) -->
>> input(CharReader) -->  input(StringReader).str
>>
>> So I can always "see" the input of StringReader, but can I access it?
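What the debugger shows is the decorator pattern: each filter wraps the stage configured before it and pulls tokens from it on demand, so the consumer only calls the outermost filter and only the innermost stage ever holds the raw text. Below is a minimal plain-Java model of that pull chain with hypothetical class names; in Lucene the upstream reference is TokenFilter's protected input field, and the Tokenizer reads from a Reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for Lucene's TokenStream: next() returns a token, or null at the end.
abstract class Stage {
    abstract String next();
}

// Innermost stage: the only one that ever holds the raw text
// (in Lucene, the Tokenizer reading from its Reader).
class WhitespaceStage extends Stage {
    private final String[] tokens;
    private int pos = 0;

    WhitespaceStage(String text) {
        String trimmed = text.trim();
        tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    @Override
    String next() {
        return pos < tokens.length ? tokens[pos++] : null;
    }
}

// Every filter wraps the previous stage, mirroring TokenFilter's protected 'input' field.
abstract class FilterStage extends Stage {
    protected final Stage input;

    FilterStage(Stage input) {
        this.input = input;
    }
}

class LowerCaseStage extends FilterStage {
    LowerCaseStage(Stage input) {
        super(input);
    }

    @Override
    String next() {
        String t = input.next(); // pull one token from the stage below
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

// Outermost stage (the one the consumer calls); records what reaches it.
class TraceStage extends FilterStage {
    final List<String> seen = new ArrayList<>();

    TraceStage(Stage input) {
        super(input);
    }

    @Override
    String next() {
        String t = input.next();
        if (t != null) {
            seen.add(t);
        }
        return t;
    }
}
```

Because the chain is built innermost-first, the last configured filter ends up outermost, which matches the "stack" described above. An outer stage only knows its own input, so reaching the raw text would mean walking through other objects' non-public fields.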
>>
>> Bernd
>>
>> On 26.10.2011 09:37, Chris Male wrote:
>>> We've also lost the full query string by the time the QP creates its
>>> TokenStream, right? Because the QP tokenizes on whitespace.
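Chris's point can be illustrated with a simplified model of a parser that splits the query on whitespace before analysis: the analyzer is invoked once per chunk, so no filter ever sees the whole query. WhitespaceSplittingParserModel is a hypothetical class, not QueryParser itself, and the analyzer is reduced to a String function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of a query parser that splits on whitespace before analysis.
class WhitespaceSplittingParserModel {
    // Every text the analyzer was handed, one entry per invocation.
    final List<String> analyzerInputs = new ArrayList<>();

    List<String> parse(String query, Function<String, String> analyzer) {
        List<String> terms = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue; // an empty query yields a single empty chunk
            }
            analyzerInputs.add(chunk);          // the analyzer sees only this chunk
            terms.add(analyzer.apply(chunk));
        }
        return terms;
    }
}
```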
>>>
>>> On Wed, Oct 26, 2011 at 8:32 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>>
>>>> Hi Simon,
>>>>
>>>> The problem is the exchanged consumer/producer roles. Once the
>>>> TokenStream calls clearAttributes(), the attributes are gone, but the
>>>> query parser can only set the attribute *before* calling
>>>> incrementToken(), so you have no chance to get it, as the Tokenizer
>>>> clears it before any filter can read it (unless we use an attribute
>>>> whose clear() is a no-op, which would fail lots of tests, as it's a hack).
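The ordering problem described here can be modeled with a shared attribute object that the tokenizer resets before emitting each token, as clearAttributes() does. All class and field names below are hypothetical; the query field stands for the attribute a query parser would set before calling incrementToken().

```java
// Hypothetical shared attribute set: one term slot plus the proposed "query" slot.
class SharedAttributes {
    String term;
    String query; // what a query parser would like to pass down the chain

    // Models clearAttributes(): resets every attribute, including "query".
    void clear() {
        term = null;
        query = null;
    }
}

// Models a Tokenizer that clears all attributes before emitting each token.
class ClearingTokenizer {
    private final SharedAttributes atts;
    private final String[] tokens;
    private int pos = 0;

    ClearingTokenizer(SharedAttributes atts, String text) {
        this.atts = atts;
        String trimmed = text.trim();
        this.tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    boolean incrementToken() {
        if (pos >= tokens.length) {
            return false;
        }
        atts.clear();             // wipes anything the consumer set beforehand
        atts.term = tokens[pos++];
        return true;
    }
}

// Models a downstream filter that tries to read the "query" slot.
class QueryReadingFilter {
    private final ClearingTokenizer input;
    private final SharedAttributes atts;
    String lastQuerySeen = "unset";

    QueryReadingFilter(ClearingTokenizer input, SharedAttributes atts) {
        this.input = input;
        this.atts = atts;
    }

    boolean incrementToken() {
        if (!input.incrementToken()) {
            return false;
        }
        lastQuerySeen = atts.query; // already null: the tokenizer cleared it
        return true;
    }
}
```

The consumer's value is wiped before the filter reads it, which is why an attribute-based approach would need a clear() that does nothing, the hack rejected above.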
>>>>
>>>> Uwe
>>>>
>>>>> -----Original Message-----
>>>>> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
>>>>> Sent: Wednesday, October 26, 2011 9:21 AM
>>>>> To: dev@lucene.apache.org
>>>>> Subject: Re: accessing the query string from inside TokenFilter
>>>>>
>>>>> What Uwe says is correct, though. What we could possibly do is add
>>>>> a query attribute that is set by the query parser (you can do that
>>>>> yourself, though). I'm not sure whether it is worth it or whether we
>>>>> should do it.
>>>>>
>>>>> simon
>>>>>
>>>>> On Wed, Oct 26, 2011 at 8:58 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> QueryParser and TokenStreams are clearly separated; there is no way
>>>>>> to get the query string from inside a TokenStream (and there cannot
>>>>>> be, because the QP is a consumer of the TS, which is used not only for
>>>>>> query parsing). The only option you have is to use a ThreadLocal
>>>>>> that you set before the query is parsed and then read in the
>>>>>> TokenFilter.
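A minimal sketch of the ThreadLocal approach, assuming the analysis runs on the same thread that calls the parser. QueryStringHolder and QueryAwareStep are hypothetical names; the step stands in for the body of a custom TokenFilter's incrementToken(). Clearing the value in a finally block keeps it from leaking into pooled threads, and the filter has to tolerate null, because the same chain may also run outside query parsing (for example at index time).

```java
import java.util.function.Supplier;

// Hypothetical holder: the application publishes the full query string here before parsing.
final class QueryStringHolder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private QueryStringHolder() {
    }

    static void set(String query) {
        CURRENT.set(query);
    }

    static String get() {
        return CURRENT.get();
    }

    // remove() rather than set(null), so pooled threads do not keep stale entries.
    static void clear() {
        CURRENT.remove();
    }

    // Runs the parse step with the query visible to filters on this thread.
    static <T> T withQuery(String query, Supplier<T> parse) {
        set(query);
        try {
            return parse.get();
        } finally {
            clear();
        }
    }
}

// Hypothetical per-token logic of a custom filter that compares tokens with the full query.
final class QueryAwareStep {
    private QueryAwareStep() {
    }

    static String process(String token) {
        String full = QueryStringHolder.get(); // null when not called during a query parse
        if (full == null) {
            return token;
        }
        // Example comparison: annotate the token with the full query string.
        return token + "|" + full;
    }
}
```

Since QueryParser.parse declares the checked ParseException, with a real parser the set/try/finally sequence would be written inline rather than through a Supplier.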
>>>>>>
>>>>>> Uwe
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
>>>>>>> Sent: Wednesday, October 26, 2011 8:33 AM
>>>>>>> To: dev@lucene.apache.org
>>>>>>> Subject: accessing the query string from inside TokenFilter
>>>>>>>
>>>>>>> Dear list,
>>>>>>> while writing a TokenFilter for my analyzer chain, I need access to the
>>>>>>> query string from inside my TokenFilter for some comparison, but the
>>>>>>> filters work on a TokenStream and only get separate tokens.
>>>>>>> So far I haven't found any way to access the query string.
>>>>>>>
>>>>>>> It would be great to have such functionality in Lucene/Solr.
>>>>>>>
>>>>>>> Should I open a JIRA issue for it, or is there a wish list somewhere?
>>>>>>>
>>>>>>> Best regards
>>>>>>> Bernd
>>>>>>>
>>>>>>>
> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
>>>>>>> additional commands, e-mail: dev-help@lucene.apache.org
>>
>> --
>> *************************************************************
>> Bernd Fehling                Universitätsbibliothek Bielefeld
>> Dipl.-Inform. (FH)                        Universitätsstr. 25
>> Tel. +49 521 106-4060                   Fax. +49 521 106-4052
>> bernd.fehling@uni-bielefeld.de                33615 Bielefeld
>>
>> BASE - Bielefeld Academic Search Engine - www.base-search.net
>> *************************************************************
>>
>

