lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: accessing the query string from inside TokenFilter
Date Wed, 26 Oct 2011 08:26:03 GMT
The input from StringReader does not help you:
- in the case of QueryParser it is *not* the query string!!!
- storing it in an attribute would blow up your heap for real documents

Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
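
A minimal sketch of the ThreadLocal workaround suggested in the first
reply quoted below. It is an illustration only, not code from this thread:
the class QueryStringContext and its methods set/get/clear are hypothetical
names, and it relies on nothing but java.lang.ThreadLocal. The assumption is
that the calling code sets the raw query string right before invoking the
query parser and clears it afterwards, so that a custom TokenFilter can call
QueryStringContext.get() from inside incrementToken().

```java
// Hypothetical helper (not part of Lucene or Solr): holds the raw query
// string for the current thread while the query parser runs.
final class QueryStringContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<String>();

    private QueryStringContext() {}

    // Call right before handing the query string to the query parser.
    static void set(String rawQuery) {
        CURRENT.set(rawQuery);
    }

    // Returns the raw query string, or null if none was set on this thread
    // (for example when the same analyzer chain runs at index time).
    static String get() {
        return CURRENT.get();
    }

    // Call in a finally block after parsing, so that pooled threads do not
    // keep a stale value from an earlier request.
    static void clear() {
        CURRENT.remove();
    }
}
```

The intended pattern would be set(...), then the parse call inside a try
block, then clear() in the finally block. The filter has to treat a null
result as "no query available", because the same analyzer may also be used
for indexing documents.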


> -----Original Message-----
> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
> Sent: Wednesday, October 26, 2011 10:06 AM
> To: dev@lucene.apache.org
> Subject: Re: accessing the query string from inside TokenFilter
> 
> From what I can see in the debugger the analyzer chain is implemented as a
> stack with last filter at the bottom and the first filter at the top.
> 
> An analyzer query chain of:
> charFilter: MappingCharFilterFactory
> tokenizer : WhitespaceTokenizerFactory
> filter    : PatternReplaceFilterFactory
> filter    : LowerCaseFilterFactory
> filter    : ShingleFilterFactory
> filter    : SynonymFilterFactory
> 
> has a chain of:
> this.input(SynonymFilter) --> input(ShingleFilter) -->
> input(LowerCaseFilter) --> input(PatternReplaceFilter) -->
> input(WhitespaceTokenizer) --> input(MappingCharFilter) -->
> input(CharReader) --> input(StringReader).str
> 
> So I can always "see" the input of StringReader, but can I access it?
> 
> Bernd
> 
> Am 26.10.2011 09:37, schrieb Chris Male:
> > We've also lost the full query string by the time the QP creates its
> > TokenStream, right? Because the QP tokenizes on whitespace.
> >
> > On Wed, Oct 26, 2011 at 8:32 PM, Uwe Schindler<uwe@thetaphi.de>  wrote:
> >
> >> Hi Simon,
> >>
> >> The problem is the exchanged consumer/producer role. Once the
> >> TokenStream calls clearAttributes() the attributes are gone, but the
> >> query parser can only set the attribute *before* calling
> >> incrementToken(), so you have no chance to get them, as the Tokenizer
> >> cleared it before any filter can read it (unless we use an attribute
> >> whose clear() is a no-op, which would fail lots of tests, as it's a hack).
> >>
> >> Uwe
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: uwe@thetaphi.de
> >>
> >>
> >>> -----Original Message-----
> >>> From: Simon Willnauer [mailto:simon.willnauer@googlemail.com]
> >>> Sent: Wednesday, October 26, 2011 9:21 AM
> >>> To: dev@lucene.apache.org
> >>> Subject: Re: accessing the query string from inside TokenFilter
> >>>
> >>> What Uwe says is correct though. What we possibly could do is add
> >>> a query attribute that is set in a query parser (you can do that
> >>> yourself though).
> >>> Not sure if it is worth it and if we should do it.
> >>>
> >>> simon
> >>>
> >>> On Wed, Oct 26, 2011 at 8:58 AM, Uwe Schindler<uwe@thetaphi.de> wrote:
> >>>> Hi,
> >>>>
> >>>> QueryParser and TokenStreams are clearly separated, there is no way
> >>>> to get the query string from inside a TokenStream (and there cannot
> >>>> be, because QP is a consumer of the TS, which is used not only for
> >>>> query parsing). The only chance you have is to use a ThreadLocal
> >>>> that you set before the query is parsed and then use it in the
> >>>> TokenFilter.
> >>>>
> >>>> Uwe
> >>>>
> >>>> -----
> >>>> Uwe Schindler
> >>>> H.-H.-Meier-Allee 63, D-28213 Bremen
> >>>> http://www.thetaphi.de
> >>>> eMail: uwe@thetaphi.de
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Bernd Fehling [mailto:bernd.fehling@uni-bielefeld.de]
> >>>>> Sent: Wednesday, October 26, 2011 8:33 AM
> >>>>> To: dev@lucene.apache.org
> >>>>> Subject: accessing the query string from inside TokenFilter
> >>>>>
> >>>>> Dear list,
> >>>>> while writing some TokenFilter for my analyzer chain I need access to
> >>>>> the query string from inside of my TokenFilter for some comparison,
> >>>>> but the Filters are working with a TokenStream and get separate Tokens.
> >>>>> Currently I couldn't get any access to the query string.
> >>>>>
> >>>>> It would be great to have such a functionality in lucene/solr.
> >>>>>
> >>>>> Should I write a jira issue for it or is there somewhere a wish list?
> >>>>>
> >>>>> Best regards
> >>>>> Bernd
> >>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >>>>> For additional commands, e-mail: dev-help@lucene.apache.org
> >>>>
> >>>
> >>
> >
> >
> 
> --
> *************************************************************
> Bernd Fehling                Universitätsbibliothek Bielefeld
> Dipl.-Inform. (FH)                        Universitätsstr. 25
> Tel. +49 521 106-4060                   Fax. +49 521 106-4052
> bernd.fehling@uni-bielefeld.de                33615 Bielefeld
> 
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *************************************************************
> 



