lucene-java-user mailing list archives

From Tincu Gabriel <tincu.gabr...@gmail.com>
Subject Re: What is the proper use of stop words in Lucene?
Date Thu, 24 Apr 2014 10:27:06 GMT
Hi there,
The StopFilterFactory can be used to produce StopFilters configured with the
stop words you want. Its constructor takes a Map<String,String>, and one of
the valid keys you can pass in that map is "enablePositionIncrements". If you
don't pass it, it defaults to true. Is this what you were looking for?
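
To make the position-increment behaviour concrete, here is a small plain-Java
sketch. It does not use Lucene at all: StopWordGapDemo, analyze, phraseMatches,
matches and the keepGaps flag are names made up for illustration, and the
matching logic is only a simplified stand-in for an exact phrase query. With
keepGaps = true, each removed stop word leaves a hole in the positions (the
effect of position increments being enabled); with keepGaps = false, the holes
are closed up, which is the behaviour you describe wanting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a toy model of token positions, not Lucene code.
final class StopWordGapDemo {

    /** A term together with the position it would be indexed at. */
    static final class PositionedToken {
        final String term;
        final int position;

        PositionedToken(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /**
     * Splits on whitespace and drops stop words. If keepGaps is true, every
     * dropped stop word still advances the position counter, leaving a gap.
     */
    static List<PositionedToken> analyze(String text, Set<String> stopWords, boolean keepGaps) {
        List<PositionedToken> tokens = new ArrayList<>();
        int position = -1;
        int pendingGap = 0;
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (stopWords.contains(word)) {
                if (keepGaps) {
                    pendingGap++;
                }
                continue;
            }
            position += 1 + pendingGap;
            pendingGap = 0;
            tokens.add(new PositionedToken(word, position));
        }
        return tokens;
    }

    /** True if the query terms occur in the document at the same relative positions. */
    static boolean phraseMatches(List<PositionedToken> doc, List<PositionedToken> query) {
        if (query.isEmpty()) {
            return false;
        }
        for (int start = 0; start + query.size() <= doc.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < query.size() && ok; i++) {
                PositionedToken d = doc.get(start + i);
                PositionedToken q = query.get(i);
                int docOffset = d.position - doc.get(start).position;
                int queryOffset = q.position - query.get(0).position;
                ok = d.term.equals(q.term) && docOffset == queryOffset;
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    /** Analyzes document and query the same way, then runs the phrase match. */
    static boolean matches(String doc, String query, Set<String> stopWords, boolean keepGaps) {
        return phraseMatches(analyze(doc, stopWords, keepGaps), analyze(query, stopWords, keepGaps));
    }

    public static void main(String[] args) {
        Set<String> english = new HashSet<>(Arrays.asList("is", "the"));
        // Gaps kept: "sky" sits three positions after "blue", so "blue sky" misses.
        System.out.println(matches("blue is the sky", "blue sky", english, true));
        // Gaps ignored: "blue" and "sky" become adjacent, so "blue sky" hits.
        System.out.println(matches("blue is the sky", "blue sky", english, false));

        Set<String> tibetan = new HashSet<>(Arrays.asList("po"));
        // Treating "po" as a stop word with gaps ignored lets either form find the other.
        System.out.println(matches("rin po che", "rin che", tibetan, false));
        System.out.println(matches("rin che", "rin po che", tibetan, false));
    }
}
```

In this toy model the "blue sky" query only finds "blue is the sky" when the
gaps are ignored on both the document and the query side, which is why the
setting matters at both index and query time.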


On Wed, Apr 23, 2014 at 12:36 PM, Chris Tomlinson <
chris.j.tomlinson@gmail.com> wrote:

> Hello,
>
> I've written several times now on the list with this question / problem
> and no one has yet replied so I don't know if the question is too
> wrong-headed or if there is simply no one reading the list that can comment
> on the question.
>
> The question that I'm trying to get answered is what is the correct way of
> ignoring stop word gaps in Lucene 4.4+?
>
> While we are using Lucene 4.4 embedded in eXist-db (exist-db.org), I
> think the question is a proper Lucene question and really has nothing to do
> with the fact that we're using it in an embedded manner.
>
> The problem to be solved is how to ignore stop word gaps in queries -
> without the user having to indicate where such gaps might occur at query
> time.
>
> Since Lucene 4.4,
> FilteringTokenFilter.setEnablePositionIncrements(false) is no longer
> available. None of the resources, such as "Lucene in Action", explain how
> to use Lucene to get the desired effect now that 4.4+ has removed the
> previous approach.
>
> Prior to Lucene 4.4 it was possible to setEnablePositionIncrements(false)
> so that during indexing and querying the number and position of stop word
> gaps would be ignored (as mentioned on pp 138-139 of "Lucene in Action").
>
> This meant that a document with a phrase such as:
>
>    blue is the sky
>
> with stop words "is" and "the" would be selected by the query:
>
>    blue sky
>
> This is what we want to achieve.
>
> Why? We are working with Tibetan and elisions are not uncommon so that,
> e.g.:
>
>    rin po che
>
> on some occasions might be shortened to
>
>    rin che
>
> and we would like to have a query of
>
>    rin po che
>
> or
>
>    rin che
>
> find all occurrences of
>
>    rin po che
>
> and
>
>    rin che
>
> without requiring the user to mark where elisions might occur.
>
> The
> org.apache.lucene.queryparser.flexible.standard.CommonQueryParserConfiguration
> provides a setEnablePositionIncrements method, but that does not seem to
> restore the query behavior described above, which was possible prior to
> Lucene 4.4.
>
> What is the proper way to ignore stop word gaps?
>
> Thank you,
> Chris
