lucene-java-user mailing list archives

From 小鱼儿 <ctengc...@gmail.com>
Subject Re: Question about PhraseQuery's capacity...
Date Fri, 10 Jan 2020 09:09:38 GMT
Hi Adrien,
     I think I may have made a mistake:
     There are two levels of processing in an Analyzer: one is the
Tokenizer, which is HMMChineseTokenizer, and the other is the Analyzer
itself, which may apply some filtering...
     I'm using Lucene's default interface to pass an Analyzer instance for
indexing, but I'm using only the Tokenizer to parse the raw query text and
build the Query.
     The weird thing is that there is a Lucene query-parser module, but it
interprets meta syntax such as AND/OR and fieldName:xxx, so I think it
cannot handle the raw query text directly?
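One way to get a phrase query from raw text without any query syntax being interpreted is org.apache.lucene.util.QueryBuilder, the lucene-core class that the classic QueryParser itself extends. The sketch below assumes Lucene 8.x; StandardAnalyzer is only a stand-in for the SmartChineseAnalyzer instance used at index time, and the class name RawTextPhraseQuery and field name "body" are hypothetical.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.QueryBuilder;

public class RawTextPhraseQuery {

    // Analyzes rawText with the given Analyzer (no AND/OR or field:value
    // syntax is interpreted) and builds a phrase query with the given slop.
    // Single-token input yields a TermQuery; input with no tokens yields null.
    static Query phraseFromRawText(Analyzer analyzer, String field, String rawText, int slop) {
        QueryBuilder builder = new QueryBuilder(analyzer);
        return builder.createPhraseQuery(field, rawText, slop);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in analyzer: in practice, pass the same Analyzer used for indexing.
        try (Analyzer analyzer = new StandardAnalyzer()) {
            Query q = phraseFromRawText(analyzer, "body", "Quick Brown Fox", 2);
            System.out.println(q);
        }
    }
}
```

Because the Analyzer (not a bare Tokenizer) does the splitting, query terms go through the same filters as the indexed terms.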
     But when I try to use the higher-level Analyzer.tokenStream() to split
the raw query text into separate terms, I run into a very confusing API:
TokenStream has no obvious method for getting the terms (the filtered
tokens), only the Attribute concept, which seems to be used only in Lucene
internals. Where can I find sample code that extracts the filtered tokens
from a TokenStream?
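Attributes are in fact the public way to consume a TokenStream: you register the attributes you want, then follow the reset() / incrementToken() / end() / close() sequence described in the TokenStream javadoc. A minimal sketch, assuming Lucene 8.x, with StandardAnalyzer standing in for the real analyzer and the class name ExtractTerms and field "body" being hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ExtractTerms {

    // Runs text through the Analyzer's full chain (tokenizer plus filters)
    // and collects the text of every token it emits, in order.
    static List<String> analyze(Analyzer analyzer, String field, String text) {
        List<String> terms = new ArrayList<>();
        // try-with-resources calls close(), the last step of the consumer contract.
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            // An attribute is a reusable view of the *current* token; fetch it once.
            CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                   // mandatory before the first incrementToken()
            while (ts.incrementToken()) { // advances to the next token, updating termAtt
                terms.add(termAtt.toString());
            }
            ts.end();                     // end-of-stream bookkeeping (final offset etc.)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) throws Exception {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            System.out.println(analyze(analyzer, "body", "Quick Brown Fox"));
            // prints [quick, brown, fox]
        }
    }
}
```

For the case in this thread, `new SmartChineseAnalyzer()` (package org.apache.lucene.analysis.cn.smart, in the lucene-analyzers-smartcn module) would replace StandardAnalyzer.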

Adrien Grand <jpountz@gmail.com> wrote on Fri, Jan 10, 2020 at 4:53 PM:

> It should match. My guess is that you might not be reusing the same
> positions as set by the analysis chain when creating the phrase query? Can
> you show us how you build the phrase query?
>
> On Fri, Jan 10, 2020 at 9:24 AM 小鱼儿 <ctengctsh@gmail.com> wrote:
>
> > I use SmartChineseAnalyzer for indexing, and add a document with a
> > TextField whose value is a long sentence that yields 18 terms when
> > analyzed.
> >
> > Then I use the same value to construct a PhraseQuery, setting slop to 2
> > and adding the 18 terms in order...
> >
> > I expect the search API to find this document, but it returns nothing.
> >
> > Where am I going wrong?
> >
>
>
> --
> Adrien
>
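Following Adrien's suggestion, a phrase query can take both its terms and their positions from the same Analyzer used at index time, by reading PositionIncrementAttribute alongside CharTermAttribute. If a filter drops a token, the next increment is larger than 1, and keeping that gap means slop is measured against the positions that were actually indexed. A minimal sketch, assuming Lucene 8.x; StandardAnalyzer is a stand-in, and the class name PositionalPhrase and field "body" are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

public class PositionalPhrase {

    // Builds a PhraseQuery whose terms AND positions come from the analysis
    // chain, so gaps left by filters are reproduced instead of collapsed.
    // Text that produces no tokens yields an empty PhraseQuery (matches nothing).
    static PhraseQuery build(Analyzer analyzer, String field, String text, int slop) {
        PhraseQuery.Builder builder = new PhraseQuery.Builder();
        builder.setSlop(slop);
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
            PositionIncrementAttribute posIncAtt =
                    ts.addAttribute(PositionIncrementAttribute.class);
            int position = -1;            // an initial increment of 1 lands on position 0
            ts.reset();
            while (ts.incrementToken()) {
                position += posIncAtt.getPositionIncrement();
                builder.add(new Term(field, termAtt.toString()), position);
            }
            ts.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return builder.build();
    }

    public static void main(String[] args) throws Exception {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            System.out.println(build(analyzer, "body", "Quick Brown Fox", 2));
        }
    }
}
```

The key design choice is that no step uses the bare Tokenizer: indexing and querying share one Analyzer, so term text and positions agree on both sides.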
