lucene-dev mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: accessing the query string from inside TokenFilter
Date Wed, 26 Oct 2011 12:33:07 GMT
On Wed, Oct 26, 2011 at 2:09 PM, Robert Muir <rcmuir@gmail.com> wrote:
> Use a queryparser that doesn't break on whitespace as a workaround?
> Or, we can start thinking about how to fix QueryParser
> (https://issues.apache.org/jira/browse/LUCENE-2605)

+1
>
> The bug is that QueryParser tries to be a Tokenizer and breaks on whitespace.
> Allowing tokenizer access to the query string would just mean that
> your tokenizer hacks around this by trying to be a QueryParser, too,
> making matters even worse!
>
>
> On Wed, Oct 26, 2011 at 8:05 AM, Bernd Fehling
> <bernd.fehling@uni-bielefeld.de> wrote:
>> OK, I think "query string" is a bit too specific. More generally,
>> what I need is access from inside a filter to the complete string
>> (not only the current token) being analyzed.
>>
>> A very dirty workaround would be a "collector filter" which collects all
>> tokens after WhitespaceTokenizer and makes it somehow available for
>> the following filters, or not?
>> So at least at the last run of incrementToken() I have the original string.
>>
>> Bernd
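A rough sketch of the "collector" idea, in plain Java without any Lucene classes: a stage that drains its upstream tokens into a buffer, joins them into one string, and then replays them. `CollectingStage` and `collected()` are hypothetical names used only for illustration. Unlike the description above, this variant buffers everything up front so that later stages could see the joined string from the first token on.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical stage that buffers all upstream tokens and replays them. */
public class CollectingStage implements Iterator<String> {
    private final Iterator<String> input;
    private final Deque<String> buffer = new ArrayDeque<>();
    private String collected; // stays null until the upstream has been drained

    public CollectingStage(Iterator<String> input) {
        this.input = input;
    }

    /** Drains the upstream once, keeping every token and the joined string. */
    private void fill() {
        if (collected != null) {
            return;
        }
        StringBuilder sb = new StringBuilder();
        while (input.hasNext()) {
            String token = input.next();
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(token);
            buffer.add(token);
        }
        collected = sb.toString();
    }

    /** The tokens joined by single spaces; original whitespace is not preserved. */
    public String collected() {
        fill();
        return collected;
    }

    @Override
    public boolean hasNext() {
        fill();
        return !buffer.isEmpty();
    }

    @Override
    public String next() {
        fill();
        return buffer.remove(); // throws NoSuchElementException when exhausted
    }
}
```

The sketch also shows the costs raised elsewhere in this thread: every token is held in memory at once, and the stage can only rebuild what the tokenizer was given. If the query parser has already split the query on whitespace, the collected string is a single fragment, not the full query.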
>>
>> On 26.10.2011 at 10:26, Uwe Schindler wrote:
>>>
>>> The input from StringReader does not help you:
>>> - in the case of QueryParser it is *not* the query string!!!
>>> - storing it in an attribute would blow up your heap for real documents
>>>
>>> Uwe
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>>
>>>> -----Original Message-----
>>>> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
>>>> Sent: Wednesday, October 26, 2011 10:06 AM
>>>> To: dev@lucene.apache.org
>>>> Subject: Re: accessing the query string from inside TokenFilter
>>>>
>>>> From what I can see in the debugger the analyzer chain is implemented as a
>>>> stack with last filter at the bottom and the first filter at the top.
>>>>
>>>> An analyzer query chain of:
>>>> charFilter: MappingCharFilterFactory
>>>> tokenizer : WhitespaceTokenizerFactory
>>>> filter    : PatternReplaceFilterFactory
>>>> filter    : LowerCaseFilterFactory
>>>> filter    : ShingleFilterFactory
>>>> filter    : SynonymFilterFactory
>>>>
>>>> has a chain of:
>>>> this.input(SynonymFilter) -->  input(ShingleFilter) -->
>>>> input(LowerCaseFilter) -->  input(PatternReplaceFilter) -->
>>>> input(WhitespaceTokenizer) -->  input(MappingCharFilter) -->
>>>> input(CharReader) -->  input(StringReader).str
>>>>
>>>> So I can always "see" the input of StringReader, but can I access it?
>>>>
>>>> Bernd
>>>>
>>>> On 26.10.2011 at 09:37, Chris Male wrote:
>>>>>
>>>>> We've also lost the full query string by the time the QP creates its
>>>>> TokenStream, right? Because the QP tokenizes on whitespace.
>>>>>
>>>>> On Wed, Oct 26, 2011 at 8:32 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>>>>
>>>>>> Hi Simon,
>>>>>>
>>>>>> The problem is the exchanged consumer/producer role. Once the
>>>>>> TokenStream calls clearAttributes() the attributes are gone, but the
>>>>>> query parser can only set the attribute *before* calling
>>>>>> incrementToken(), so you have no chance to get it, as the Tokenizer
>>>>>> clears it before any filter can read it (unless we use an attribute
>>>>>> whose clear() is a no-op, which would fail lots of tests, as it's a hack).
>>>>>>
>>>>>> Uwe
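To make the ordering problem concrete, here is a toy model in plain Java. None of these classes are Lucene's; `QueryAttribute`, `ToyTokenizer` and `ToyFilter` are hypothetical stand-ins that copy only the call order described above: the consumer sets a value, the tokenizer clears all attributes at the start of `incrementToken()`, and only then does the filter get to look.

```java
public class ClearAttributesDemo {

    /** Toy attribute; clear() resets it, as a normal attribute would. */
    static class QueryAttribute {
        String query;

        void clear() {
            query = null;
        }
    }

    /** Toy tokenizer that clears the attribute before emitting each token. */
    static class ToyTokenizer {
        final QueryAttribute attr;
        final String[] tokens;
        int pos = 0;
        String term;

        ToyTokenizer(QueryAttribute attr, String[] tokens) {
            this.attr = attr;
            this.tokens = tokens;
        }

        boolean incrementToken() {
            attr.clear(); // stands in for clearAttributes()
            if (pos >= tokens.length) {
                return false;
            }
            term = tokens[pos++];
            return true;
        }
    }

    /** Toy filter that reads the attribute after its input has advanced. */
    static class ToyFilter {
        final ToyTokenizer input;
        String seenQuery;

        ToyFilter(ToyTokenizer input) {
            this.input = input;
        }

        boolean incrementToken() {
            if (!input.incrementToken()) {
                return false;
            }
            seenQuery = input.attr.query; // already wiped by the tokenizer
            return true;
        }
    }

    /** Sets the attribute the way a consumer would, then pulls one token. */
    public static String queryAsSeenByFilter(String queryString) {
        QueryAttribute attr = new QueryAttribute();
        ToyFilter filter = new ToyFilter(new ToyTokenizer(attr, queryString.split("\\s+")));
        attr.query = queryString; // set *before* incrementToken()
        filter.incrementToken();
        return filter.seenQuery;
    }
}
```

In this model `queryAsSeenByFilter("foo bar")` returns `null`, because the value is cleared before the filter runs. Making `clear()` a no-op would let the value through, which is the hack mentioned above.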
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
>>>>>>> Sent: Wednesday, October 26, 2011 9:21 AM
>>>>>>> To: dev@lucene.apache.org
>>>>>>> Subject: Re: accessing the query string from inside TokenFilter
>>>>>>>
>>>>>>> What Uwe says is correct though. What we possibly could do is add
>>>>>>> a query attribute that is set in a query parser (you can do that
>>>>>>> yourself though).
>>>>>>>
>>>>>>> Not sure if it is worth it and if we should do it.
>>>>>>>
>>>>>>> simon
>>>>>>>
>>>>>>> On Wed, Oct 26, 2011 at 8:58 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> QueryParser and TokenStreams are clearly separated, there is no way
>>>>>>>> to get the query string from inside a TokenStream (and there cannot
>>>>>>>> be, because QP is a consumer of the TS, which is used not only for
>>>>>>>> query parsing). The only chance you have is to use a ThreadLocal
>>>>>>>> that you set before the query is parsed and then use it in the
>>>>>>>> TokenFilter.
>>>>>>>>
>>>>>>>> Uwe
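A minimal sketch of the ThreadLocal workaround, in plain Java with no Lucene dependency. `QueryStringHolder`, `tokenStartsQuery` and `parse` are hypothetical names: `parse` stands in for the code that calls the query parser, and `tokenStartsQuery` stands in for whatever comparison the custom TokenFilter would do in its `incrementToken()`. It assumes parsing and analysis run on the same thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-thread holder for the query string; not part of Lucene. */
public class QueryStringHolder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    public static void set(String queryString) {
        CURRENT.set(queryString);
    }

    public static String get() {
        return CURRENT.get();
    }

    public static void clear() {
        CURRENT.remove();
    }

    /** Stand-in for the check a filter would run against the full query. */
    public static boolean tokenStartsQuery(String token) {
        String query = get();
        return query != null && query.startsWith(token);
    }

    /** Stand-in for the caller: set the value, "parse", always clean up. */
    public static List<Boolean> parse(String queryString) {
        set(queryString);
        try {
            List<Boolean> flags = new ArrayList<>();
            for (String token : queryString.split("\\s+")) {
                flags.add(tokenStartsQuery(token));
            }
            return flags;
        } finally {
            clear(); // avoid leaking the value to the next request on this thread
        }
    }
}
```

The `finally` block matters when threads are pooled, as in a servlet container: without it a later request on the same thread could see a stale query string.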
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
>>>>>>>>> Sent: Wednesday, October 26, 2011 8:33 AM
>>>>>>>>> To: dev@lucene.apache.org
>>>>>>>>> Subject: accessing the query string from inside TokenFilter
>>>>>>>>>
>>>>>>>>> Dear list,
>>>>>>>>> while writing some TokenFilter for my analyzer chain I need access to
>>>>>>>>> the query string from inside of my TokenFilter for some comparison,
>>>>>>>>> but the Filters are working with a TokenStream and get separate Tokens.
>>>>>>>>> Currently I couldn't get any access to the query string.
>>>>>>>>>
>>>>>>>>> It would be great to have such a functionality in lucene/solr.
>>>>>>>>>
>>>>>>>>> Should I write a jira issue for it or is there somewhere a wish list?
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>> Bernd
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>> --
>>>> *************************************************************
>>>> Bernd Fehling                Universitätsbibliothek Bielefeld
>>>> Dipl.-Inform. (FH)                        Universitätsstr. 25
>>>> Tel. +49 521 106-4060                   Fax. +49 521 106-4052
>>>> bernd.fehling@uni-bielefeld.de                33615 Bielefeld
>>>>
>>>> BASE - Bielefeld Academic Search Engine - www.base-search.net
>>>> *************************************************************
> --
> lucidimagination.com


