lucene-dev mailing list archives

From DM Smith <dmsmith...@gmail.com>
Subject Re: accessing the query string from inside TokenFilter
Date Wed, 26 Oct 2011 12:28:53 GMT
+1, please fix the QP bug. The QueryParser should only identify query keywords and non-keywords.

On Oct 26, 2011, at 8:09 AM, Robert Muir <rcmuir@gmail.com> wrote:

> Use a query parser that doesn't break on whitespace as a workaround?
> Or we can start thinking about how to fix QueryParser
> (https://issues.apache.org/jira/browse/LUCENE-2605).
> 
> The bug is that QueryParser tries to be a Tokenizer and breaks on whitespace.
> Allowing tokenizer access to the query string would just mean that
> your tokenizer hacks around this by trying to be a QueryParser, too,
> making matters even worse!
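
The effect described above can be illustrated with a toy model in plain Java. This is not Lucene code, and every class and method name below is made up for illustration. A stand-in "analyzer" emits pairs of adjacent words, roughly as a shingle step might. When the parser splits on whitespace first, each analyzer call sees a single word, so no pair can ever form.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Lucene code: shows why pre-splitting on whitespace hides
// neighbouring words from the analysis chain.
final class WhitespaceSplitDemo {

    // Stand-in for an analyzer with a shingle step: emits each pair of adjacent words.
    static List<String> bigrams(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            out.add(words[i] + " " + words[i + 1]);
        }
        return out;
    }

    // Models a parser that splits on whitespace first and analyzes each chunk alone.
    static List<String> parserSplitsFirst(String query) {
        List<String> out = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            out.addAll(bigrams(chunk)); // each call sees exactly one word
        }
        return out;
    }

    // Models a parser that hands the whole text to the analyzer in one piece.
    static List<String> analyzerSeesAll(String query) {
        return bigrams(query);
    }

    public static void main(String[] args) {
        System.out.println(parserSplitsFirst("new york city")); // prints []
        System.out.println(analyzerSeesAll("new york city"));   // prints [new york, york city]
    }
}
```

The same reasoning applies to any filter that needs more than one word of context, such as multi-word synonyms.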
> 
> 
> On Wed, Oct 26, 2011 at 8:05 AM, Bernd Fehling
> <bernd.fehling@uni-bielefeld.de> wrote:
>> OK, I think "query string" is a bit too specific. More generally,
>> what I need is access from inside a filter to the complete string
>> (not only the current token) being analyzed.
>> 
>> A very dirty workaround would be a "collector filter" which collects all
>> tokens after WhitespaceTokenizer and somehow makes them available to
>> the following filters, or not?
>> Then, at least by the last call to incrementToken(), I would have the
>> original string.
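
A minimal sketch of such a "collector filter", written against a plain `Iterator<String>` rather than Lucene's TokenStream API. The class name `CollectingFilter` and its methods are hypothetical. It drains the upstream tokens on first use, rebuilds an approximation of the input by joining them with single spaces, and then replays the tokens.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical buffering filter: collects every upstream token before
// passing any of them on, so later stages can ask for the joined text.
final class CollectingFilter implements Iterator<String> {
    private final Iterator<String> input;
    private List<String> buffered; // null until the input has been drained
    private String fullText;
    private int pos = 0;

    CollectingFilter(Iterator<String> input) {
        this.input = input;
    }

    // Drains the upstream iterator once and caches the result.
    private void fill() {
        if (buffered != null) {
            return;
        }
        buffered = new ArrayList<>();
        while (input.hasNext()) {
            buffered.add(input.next());
        }
        fullText = String.join(" ", buffered);
    }

    // Tokens joined by single spaces; the original whitespace is not preserved.
    String fullText() {
        fill();
        return fullText;
    }

    @Override
    public boolean hasNext() {
        fill();
        return pos < buffered.size();
    }

    @Override
    public String next() {
        fill();
        if (pos >= buffered.size()) {
            throw new NoSuchElementException();
        }
        return buffered.get(pos++);
    }
}
```

Two caveats follow from the thread itself: the buffer grows with the input, which is the heap concern raised for real documents, and if the query parser has already split on whitespace, the filter only ever sees one chunk.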
>> 
>> Bernd
>> 
>> On 26.10.2011 at 10:26, Uwe Schindler wrote:
>>> 
>>> The input from StringReader does not help you:
>>> - in the case of QueryParser it is *not* the query string!!!
>>> - storing it in an attribute would blow up your heap for real documents
>>> 
>>> Uwe
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
>>>> Sent: Wednesday, October 26, 2011 10:06 AM
>>>> To: dev@lucene.apache.org
>>>> Subject: Re: accessing the query string from inside TokenFilter
>>>> 
>>>> From what I can see in the debugger the analyzer chain is implemented as a
>>>> stack with last filter at the bottom and the first filter at the top.
>>>> 
>>>> An analyzer query chain of:
>>>> charFilter: MappingCharFilterFactory
>>>> tokenizer : WhitespaceTokenizerFactory
>>>> filter    : PatternReplaceFilterFactory
>>>> filter    : LowerCaseFilterFactory
>>>> filter    : ShingleFilterFactory
>>>> filter    : SynonymFilterFactory
>>>> 
>>>> has a chain of:
>>>> this.input(SynonymFilter) -->  input(ShingleFilter) -->
>>>> input(LowerCaseFilter) -->  input(PatternReplaceFilter) -->
>>>> input(WhitespaceTokenizer) -->  input(MappingCharFilter) -->
>>>> input(CharReader) -->  input(StringReader).str
>>>> 
>>>> So I can always "see" the input of StringReader, but can I access it?
>>>> 
>>>> Bernd
>>>> 
>>>> On 26.10.2011 at 09:37, Chris Male wrote:
>>>>> 
>>>>> We've also lost the full query string by the time the QP creates its
>>>>> TokenStream, right? Because the QP tokenizes on whitespace.
>>>>> 
>>>>> On Wed, Oct 26, 2011 at 8:32 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>>>> 
>>>>>> Hi Simon,
>>>>>> 
>>>>>> The problem is the exchanged consumer/producer role. Once the
>>>>>> TokenStream calls clearAttributes() the attributes are gone, but the
>>>>>> query parser can only set the attribute *before* calling
>>>>>> incrementToken(), so you have no chance to get it, as the Tokenizer
>>>>>> clears it before any filter can read it (unless we use an attribute
>>>>>> whose clear() is a no-op, which would fail lots of tests, as it's a hack).
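
The ordering problem can be shown with a small simulation. This is a toy model that mimics the call order described above without using Lucene; `Attributes`, `ToyTokenizer`, `ToyFilter`, and the `"query"` and `"term"` keys are all invented for the sketch. The consumer stores the query before pulling a token, the tokenizer clears everything at the start of its `incrementToken()`, and the filter only gets to look afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the ordering problem; it mimics, but does not use, Lucene's API.
final class ClearAttributesDemo {

    // Shared per-stream state, standing in for a stream's attribute set.
    static final class Attributes {
        final Map<String, String> values = new HashMap<>();

        void clearAttributes() {
            values.clear();
        }
    }

    // Stand-in for a Tokenizer: clears all attributes, then sets the next term.
    static final class ToyTokenizer {
        private final Attributes atts;
        private final String[] words;
        private int pos = 0;

        ToyTokenizer(Attributes atts, String text) {
            this.atts = atts;
            this.words = text.trim().split("\\s+");
        }

        boolean incrementToken() {
            atts.clearAttributes(); // wipes whatever the consumer stored
            if (pos >= words.length) {
                return false;
            }
            atts.values.put("term", words[pos++]);
            return true;
        }
    }

    // Stand-in for a TokenFilter that hopes to read a "query" attribute.
    static final class ToyFilter {
        private final ToyTokenizer input;
        private final Attributes atts;
        String queryAsSeenByFilter;
        String termAsSeenByFilter;

        ToyFilter(ToyTokenizer input, Attributes atts) {
            this.input = input;
            this.atts = atts;
        }

        boolean incrementToken() {
            if (!input.incrementToken()) {
                return false;
            }
            // The filter can only look after the tokenizer has already run.
            queryAsSeenByFilter = atts.values.get("query");
            termAsSeenByFilter = atts.values.get("term");
            return true;
        }
    }

    // Plays the query parser: sets the attribute, then pulls the first token.
    static ToyFilter pullFirstToken(String query) {
        Attributes atts = new Attributes();
        ToyFilter filter = new ToyFilter(new ToyTokenizer(atts, query), atts);
        atts.values.put("query", query); // set *before* incrementToken()
        filter.incrementToken();
        return filter;
    }
}
```

In this model `pullFirstToken("foo bar").queryAsSeenByFilter` is `null` while `termAsSeenByFilter` is `"foo"`: the value set by the consumer never reaches the filter.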
>>>>>> 
>>>>>> Uwe
>>>>>> 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
>>>>>>> Sent: Wednesday, October 26, 2011 9:21 AM
>>>>>>> To: dev@lucene.apache.org
>>>>>>> Subject: Re: accessing the query string from inside TokenFilter
>>>>>>> 
>>>>>>> What Uwe says is correct though. What we could possibly do is add
>>>>>>> a query attribute that is set in a query parser (you can do that
>>>>>>> yourself, though).
>>>>>>>
>>>>>>> Not sure if it is worth it or whether we should do it.
>>>>>>> 
>>>>>>> simon
>>>>>>> 
>>>>>>> On Wed, Oct 26, 2011 at 8:58 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> QueryParser and TokenStreams are clearly separated, there is no way
>>>>>>>> to get the query string from inside a TokenStream (and there cannot
>>>>>>>> be, because QP is a consumer of the TS, which is used not only for
>>>>>>>> query parsing). The only chance you have is to use a ThreadLocal
>>>>>>>> that you set before the query is parsed and then use it in the
>>>>>>>> TokenFilter.
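
A minimal sketch of the ThreadLocal workaround in plain Java, assuming parsing and analysis run on the same thread. `QueryStringHolder`, `QueryAwareFilter`, and `ThreadLocalDemo` are hypothetical names, and the filter is a stand-in that handles one token at a time rather than a real Lucene TokenFilter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the query string of the current thread.
final class QueryStringHolder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void set(String query) {
        CURRENT.set(query);
    }

    static String get() {
        return CURRENT.get();
    }

    static void clear() {
        CURRENT.remove();
    }
}

// Stand-in for a TokenFilter: it is handed single tokens, but can look up
// the full query string stored for the current thread.
final class QueryAwareFilter {
    String process(String token) {
        String full = QueryStringHolder.get();
        boolean isWholeQuery = full != null && full.equals(token);
        return isWholeQuery ? token + "|whole" : token + "|part";
    }
}

final class ThreadLocalDemo {
    // Stand-in for the caller of the query parser.
    static List<String> parse(String query) {
        QueryStringHolder.set(query); // set before the query is parsed
        try {
            QueryAwareFilter filter = new QueryAwareFilter();
            List<String> out = new ArrayList<>();
            for (String token : query.trim().split("\\s+")) {
                out.add(filter.process(token));
            }
            return out;
        } finally {
            QueryStringHolder.clear(); // do not leak into reused (pooled) threads
        }
    }
}
```

Clearing the value in a `finally` block matters when threads are reused, since a stale query string would otherwise be visible to the next request handled by the same thread.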
>>>>>>>> 
>>>>>>>> Uwe
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
>>>>>>>>> Sent: Wednesday, October 26, 2011 8:33 AM
>>>>>>>>> To: dev@lucene.apache.org
>>>>>>>>> Subject: accessing the query string from inside TokenFilter
>>>>>>>>> 
>>>>>>>>> Dear list,
>>>>>>>>> while writing a TokenFilter for my analyzer chain I need access to
>>>>>>>>> the query string from inside my TokenFilter for some comparison,
>>>>>>>>> but the filters work with a TokenStream and get separate tokens.
>>>>>>>>> Currently I couldn't get any access to the query string.
>>>>>>>>>
>>>>>>>>> It would be great to have such functionality in Lucene/Solr.
>>>>>>>>>
>>>>>>>>> Should I write a JIRA issue for it or is there a wish list somewhere?
>>>>>>>>> 
>>>>>>>>> Best regards
>>>>>>>>> Bernd
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> --
>>>> *************************************************************
>>>> Bernd Fehling                Universitätsbibliothek Bielefeld
>>>> Dipl.-Inform. (FH)                        Universitätsstr. 25
>>>> Tel. +49 521 106-4060                   Fax. +49 521 106-4052
>>>> bernd.fehling@uni-bielefeld.de                33615 Bielefeld
>>>> 
>>>> BASE - Bielefeld Academic Search Engine - www.base-search.net
>>>> *************************************************************
>>>> 
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> lucidimagination.com
> 


