lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Filters Vs queries - for terms more than 1024
Date Thu, 27 Jul 2017 09:33:27 GMT
BooleanQuery is subject to the 1024 limit on the number of clauses, so you
can't use it in that case. You should use TermsQuery/TermsFilter instead.
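
For readers wondering why the two approaches differ so much in memory use, here is a toy model in plain Java. It is not Lucene code: PostingsCursor, its 128-entry buffer, and both method names are made up for illustration. The assumption, in line with the stack trace quoted below (BooleanWeight.scorer ends up in BlockDocsEnum.<init> for each TermQuery clause), is that a large BooleanQuery keeps one postings reader open per clause, while a TermsFilter-style approach visits the terms one after another and collects the matches in a single bit set.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model (plain Java, not Lucene code) of two ways to OR many terms together. */
final class DisjunctionSketch {

    /** Hypothetical stand-in for a per-term postings reader that owns a decode buffer. */
    static final class PostingsCursor {
        static final int BUFFER_SIZE = 128;         // assumed size, for illustration only
        final int[] buffer = new int[BUFFER_SIZE];  // allocated once per open cursor
        final int[] docIds;

        PostingsCursor(int[] docIds) {
            this.docIds = docIds;
        }
    }

    /**
     * BooleanQuery-like behaviour: open one cursor per clause before matching starts.
     * Returns the total number of ints held in buffers while all cursors are open.
     */
    static long bufferIntsAllOpen(List<int[]> postings) {
        List<PostingsCursor> open = new ArrayList<>();
        for (int[] docIds : postings) {
            open.add(new PostingsCursor(docIds));
        }
        return (long) open.size() * PostingsCursor.BUFFER_SIZE;
    }

    /**
     * TermsFilter-like behaviour: walk the terms sequentially and OR their
     * doc IDs into one shared bit set, so memory depends on maxDoc, not on
     * the number of terms.
     */
    static BitSet unionSequential(List<int[]> postings, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        for (int[] docIds : postings) {
            for (int doc : docIds) {
                result.set(doc);
            }
        }
        return result;
    }
}
```

In this model the first method grows linearly with the number of clauses, which is what hurts at 100,000 clauses, while the second only ever holds one bit set. The price, as discussed further down the thread, is that the bit set approach always consumes every matching document of every term.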

On Wed, Jul 19, 2017 at 13:52, Kumaran Ramasubramanian <kums.134@gmail.com> wrote:

> Hi Adrien
>
>
> I have tried the suggestion from this link, BooleanQuery wrapped in
> ConstantScoreQuery:
>
> http://lucene.472066.n3.nabble.com/BooleanFilter-vs-BooleanQuery-performance-td4106920.html
>
> > If you want it fast, use BooleanQuery and wrap it with ConstantScoreQuery.
> > Then there is also no scoring done (in most cases, older BooleanQuery
> > sometimes still calculated the score).
>
>
>
>
> > 3. If I disable scoring using ConstantScoreQuery, is it possible to
> > give more than 1024 query clauses? I tried this, but I still get
> > java.lang.OutOfMemoryError. Why?
>
>
> > java.lang.OutOfMemoryError: Java heap space
> > at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.<init>(Lucene41PostingsReader.java:345)
> > at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docs(Lucene41PostingsReader.java:254)
> > at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs(SegmentTermsEnum.java:999)
> > at org.apache.lucene.index.TermsEnum.docs(TermsEnum.java:149)
> > at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:84)
> > at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
> > at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:164)
> > at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
> > at org.apache.lucene.search.FilteredQuery$FilterStrategy.filteredBulkScorer(FilteredQuery.java:504)
> > at org.apache.lucene.search.FilteredQuery$1.bulkScorer(FilteredQuery.java:150)
>
>
>
>
> If I use BooleanQuery and wrap it with ConstantScoreQuery, can I use
> 1 lakh (100,000) boolean clauses in the BooleanQuery?
>
>
>
>
>
> --
> Kumaran R
>
>
> On Wed, Jul 19, 2017 at 8:26 AM, Kumaran Ramasubramanian <kums.134@gmail.com> wrote:
>
> >
> >
> > Thank you Adrien :-)
> >
> >
> >
> > On 18-Jul-2017 3:21 PM, "Adrien Grand" <jpountz@gmail.com> wrote:
> >
> > Sorry for the confusion, I keep saying query in all cases because queries
> > and filters got merged in Lucene 5.0. If you are using BooleanFilter
> > rather than BooleanQuery with Lucene 4, then things should be mostly OK
> > if you have many clauses. But like TermsQuery, BooleanFilter always
> > consumes all matching documents from all its clauses. So if you intersect
> > it with a selective query, it is wasteful.
> >
> > On Tue, Jul 18, 2017 at 11:42, Kumaran Ramasubramanian <kums.134@gmail.com> wrote:
> >
> > > ​Hi Adrien,
> > >
> > > Thanks for your input...
> > >
> > > 1. using boolean filters is working for even 1lakh Filter Clauses in
> > > > booleanFilter... is there any consequence using filters in this case?
> > > shall
> > > > i proceed with this?
> > >
> > >
> > > ​code snippet i used for this statement 1.. ​
> > >
> > >                 for (int i = 0; i < 10
> > > > ​00​
> > > > 00; i++)
> > > >                 {
> > > >                     Term term = new Term("
> > > > ​key
> > > > "
> > > > ​+i​
> > > > , "
> > > > ​value
> > > > "
> > > > ​+i​
> > > > );
> > > >                     TermsFilter filter = new
> > > > ​​
> > > > TermsFilter(term);
> > > >                     FilterClause filterClause = new
> > FilterClause(filter,
> > > > BooleanClause.Occur.SHOULD);
> > > >                     boolFilter.add(filterClause);
> > > >                 }
> > >
> > >
> > >
> > > > Do you see any problem in using TermsFilter over TermsQuery?
> > > >
> > > > By the way, I will test with TermsQuery and let you know.
> > >
> > >
> > >
> > > ​--
> > > Kumaran ​R
> > >
> > >
> > >
> > >
> > > > On Tue, Jul 18, 2017 at 1:59 AM, Adrien Grand <jpountz@gmail.com> wrote:
> > >
> > > > > Could you use TermInSetQuery (TermsQuery in older Lucene versions)? It
> > > > > is worse at skipping over matches than a BooleanQuery but keeps memory
> > > > > usage low and disk access sequential, unlike large boolean queries.
> > > > >
> > > > > Otherwise you would probably need to rethink how you design your
> > > > > documents in order to be able to run simpler queries.
> > > >
> > > > > On Mon, Jul 17, 2017 at 16:28, Kumaran Ramasubramanian <kums.134@gmail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > > I am using Lucene 4.10.4.
> > > > > >
> > > > > > In Lucene search, I know there is a limit of 1024 on the number of
> > > > > > boolean query clauses, and I know we can increase this limit. But I
> > > > > > want to understand queries vs. filters in Lucene 4.10.4.
> > > > > >
> > > > > > I want to run queries with more than 1024 clauses. Relevance is not
> > > > > > needed for me. What are the best possible options?
> > > > > >
> > > > > > 1. Using boolean filters works even for 1 lakh (100,000) filter
> > > > > > clauses in BooleanFilter. Is there any consequence of using filters
> > > > > > in this case? Shall I proceed with this?
> > > > > >
> > > > > > 2. If I give very little memory to filters, the search still manages
> > > > > > to complete after many GC cycles. Why can't we do the same for query
> > > > > > clauses too? What is the actual technical reason for the 1024 limit
> > > > > > in boolean query?
> > > > > >
> > > > > > 3. If I disable scoring using ConstantScoreQuery, is it possible to
> > > > > > give more than 1024 query clauses? I tried this, but I still get
> > > > > > java.lang.OutOfMemoryError. Why?
> > > > > >
> > > > >
> > > > > > java.lang.OutOfMemoryError: Java heap space
> > > > > > at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.<init>(Lucene41PostingsReader.java:345)
> > > > > > at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docs(Lucene41PostingsReader.java:254)
> > > > > > at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs(SegmentTermsEnum.java:999)
> > > > > > at org.apache.lucene.index.TermsEnum.docs(TermsEnum.java:149)
> > > > > > at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:84)
> > > > > > at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
> > > > > > at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:164)
> > > > > > at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
> > > > > > at org.apache.lucene.search.FilteredQuery$FilterStrategy.filteredBulkScorer(FilteredQuery.java:504)
> > > > > > at org.apache.lucene.search.FilteredQuery$1.bulkScorer(FilteredQuery.java:150)
> > > > >
> > > > >
> > > > >
> > > > > Any pointers are much appreciated... Thank you..
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Kumaran R
> > > > >
> > > >
> > >
> >
> >
> >
>
