lucene-java-user mailing list archives

From Kumaran Ramasubramanian <kums....@gmail.com>
Subject Re: Filters Vs queries - for terms more than 1024
Date Wed, 19 Jul 2017 11:51:35 GMT
Hi Adrien,

I have tried BooleanQuery wrapped in ConstantScoreQuery, based on the
suggestion from this link:
http://lucene.472066.n3.nabble.com/BooleanFilter-vs-BooleanQuery-performance-td4106920.html

> If you want it fast, use BooleanQuery and wrap it with ConstantScoreQuery.
> Then there is also no scoring done (in most cases, older BooleanQuery
> sometimes still calculated the score).




> 3. If I disable scoring using ConstantScoreQuery, is it possible to give
> more than 1024 query clauses? I tried this, but I am still getting
> java.lang.OutOfMemoryError. Why?


> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.<init>(Lucene41PostingsReader.java:345)
>   at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docs(Lucene41PostingsReader.java:254)
>   at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs(SegmentTermsEnum.java:999)
>   at org.apache.lucene.index.TermsEnum.docs(TermsEnum.java:149)
>   at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:84)
>   at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
>   at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:164)
>   at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
>   at org.apache.lucene.search.FilteredQuery$FilterStrategy.filteredBulkScorer(FilteredQuery.java:504)
>   at org.apache.lucene.search.FilteredQuery$1.bulkScorer(FilteredQuery.java:150)




If I use BooleanQuery and wrap it with ConstantScoreQuery, can I use 1 lakh
(100,000) boolean clauses in the BooleanQuery?





--
Kumaran R
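
The stack trace above runs from ConstantScoreQuery through BooleanQuery down to one TermQuery scorer, and fails while a postings enum is being constructed. That is consistent with the wrapper removing scoring but not the per-clause scorers. The following is a minimal sketch of that memory pattern, not Lucene code: postings lists are modelled as plain sorted int arrays, and the class name, method names and the 128-int buffer size are illustrative assumptions. It contrasts one buffer per clause held at the same time with a single buffer reused while matches are collected into one bit set, which is roughly the trade-off described for TermsQuery further down the thread.

```java
import java.util.BitSet;

/** Toy model only: postings are sorted int arrays, not Lucene data structures. */
public class ClauseMemorySketch {

    // Assumed size of one decode buffer, in ints (illustrative value only).
    public static final int BUFFER_INTS = 128;

    /**
     * Disjunction style: every clause gets its own buffer before matching starts,
     * so the number of live buffers grows with the number of clauses.
     * Returns the number of buffer ints alive at the same time.
     */
    public static long unionWithPerClauseBuffers(int[][] postings, BitSet matches) {
        int[][] buffers = new int[postings.length][];
        for (int i = 0; i < postings.length; i++) {
            buffers[i] = new int[BUFFER_INTS]; // one allocation per clause
        }
        for (int i = 0; i < postings.length; i++) {
            copyThroughBuffer(postings[i], buffers[i], matches);
        }
        return (long) postings.length * BUFFER_INTS;
    }

    /**
     * Sequential style: clauses are visited one at a time and share one buffer;
     * matches accumulate in a single bit set.
     */
    public static long unionWithSharedBuffer(int[][] postings, BitSet matches) {
        int[] shared = new int[BUFFER_INTS];
        for (int[] list : postings) {
            copyThroughBuffer(list, shared, matches);
        }
        return BUFFER_INTS;
    }

    /** Reads doc IDs in buffer-sized chunks and marks them in the bit set. */
    private static void copyThroughBuffer(int[] list, int[] buffer, BitSet matches) {
        for (int start = 0; start < list.length; start += buffer.length) {
            int n = Math.min(buffer.length, list.length - start);
            System.arraycopy(list, start, buffer, 0, n);
            for (int j = 0; j < n; j++) {
                matches.set(buffer[j]);
            }
        }
    }

    public static void main(String[] args) {
        // Scaled down to keep the demo light; raise to 100_000 to mirror "1 lakh".
        int clauses = 10_000;
        int[][] postings = new int[clauses][];
        for (int i = 0; i < clauses; i++) {
            postings[i] = new int[] { i }; // each clause matches one document
        }
        BitSet a = new BitSet(clauses);
        BitSet b = new BitSet(clauses);
        long perClause = unionWithPerClauseBuffers(postings, a);
        long shared = unionWithSharedBuffer(postings, b);
        System.out.println("same matches: " + a.equals(b));
        System.out.println("buffer ints alive, per-clause: " + perClause);
        System.out.println("buffer ints alive, shared: " + shared);
    }
}
```

Both methods produce the same set of matching documents. Only the number of buffers alive at once differs, and in the per-clause variant it grows linearly with the clause count regardless of whether scores are computed.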

​

On Wed, Jul 19, 2017 at 8:26 AM, Kumaran Ramasubramanian <kums.134@gmail.com> wrote:

>
>
> Thank you Adrien :-)
>
>
>
> On 18-Jul-2017 3:21 PM, "Adrien Grand" <jpountz@gmail.com> wrote:
>
> Sorry for the confusion, I keep saying query in all cases because queries
> and filters got merged in Lucene 5.0. If you are using BooleanFilter rather
> than BooleanQuery with Lucene 4 then things should be mostly ok if you have
> many clauses. But like TermsQuery, BooleanFilter always consumes all
> matching documents from all its clauses. So if you intersect it with a
> selective query, it is wasteful.
>
> On Tue, Jul 18, 2017 at 11:42, Kumaran Ramasubramanian <kums.134@gmail.com> wrote:
>
> > Hi Adrien,
> >
> > Thanks for your input...
> >
> > > 1. Using boolean filters works even for 1 lakh (100,000) filter clauses in
> > > a BooleanFilter. Is there any consequence of using filters in this case?
> > > Shall I proceed with this?
> >
> >
> > Code snippet I used for statement 1:
> >
> >     for (int i = 0; i < 100000; i++)
> >     {
> >         Term term = new Term("key" + i, "value" + i);
> >         TermsFilter filter = new TermsFilter(term);
> >         FilterClause filterClause = new FilterClause(filter, BooleanClause.Occur.SHOULD);
> >         boolFilter.add(filterClause);
> >     }
> >
> >
> >
> > Do you see any problem in using TermsFilter over TermsQuery?
> >
> > BTW, I will test with TermsQuery and let you know.
> >
> >
> >
> > --
> > Kumaran R
> >
> >
> >
> >
> > On Tue, Jul 18, 2017 at 1:59 AM, Adrien Grand <jpountz@gmail.com> wrote:
> >
> > > Could you use TermInSetQuery (TermsQuery in older Lucene versions)? It is
> > > worse at skipping over matches than a BooleanQuery but keeps memory
> > > usage low and disk access sequential, unlike large boolean queries.
> > >
> > > Otherwise you would probably need to rethink how you design your documents
> > > in order to be able to run simpler queries.
> > >
> > > On Mon, Jul 17, 2017 at 16:28, Kumaran Ramasubramanian <kums.134@gmail.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I am using Lucene 4.10.4.
> > > >
> > > > In Lucene search, I know there is a limit of 1024 on the number of boolean
> > > > query clauses, and I know we can increase this limit. But I want to
> > > > understand queries vs. filters in Lucene 4.10.4.
> > > >
> > > > I want to make queries larger than 1024 clauses. Relevance is not needed
> > > > for me. What are the best possible options?
> > > >
> > > > 1. Using boolean filters works even for 1 lakh (100,000) filter clauses in
> > > > a BooleanFilter. Is there any consequence of using filters in this case?
> > > > Shall I proceed with this?
> > > >
> > > > 2. If I give very little memory for filters, the search still manages to
> > > > complete after many GC cycles. Why can't we do the same for query
> > > > clauses too? What is the actual technical reason for the 1024 limit on
> > > > boolean queries?
> > > >
> > > > 3. If I disable scoring using ConstantScoreQuery, is it possible to give
> > > > more than 1024 query clauses? I tried this, but I am still getting
> > > > java.lang.OutOfMemoryError. Why?
> > > >
> > > > java.lang.OutOfMemoryError: Java heap space
> > > >   at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.<init>(Lucene41PostingsReader.java:345)
> > > >   at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docs(Lucene41PostingsReader.java:254)
> > > >   at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs(SegmentTermsEnum.java:999)
> > > >   at org.apache.lucene.index.TermsEnum.docs(TermsEnum.java:149)
> > > >   at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:84)
> > > >   at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
> > > >   at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:164)
> > > >   at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
> > > >   at org.apache.lucene.search.FilteredQuery$FilterStrategy.filteredBulkScorer(FilteredQuery.java:504)
> > > >   at org.apache.lucene.search.FilteredQuery$1.bulkScorer(FilteredQuery.java:150)
> > > >
> > > >
> > > >
> > > > Any pointers are much appreciated... Thank you..
> > > >
> > > >
> > > >
> > > > --
> > > > Kumaran R
> > > >
> > >
> >
>
>
>
