lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Filters Vs queries - for terms more than 1024
Date Tue, 18 Jul 2017 09:50:49 GMT
Sorry for the confusion. I keep saying "query" in all cases because queries
and filters were merged in Lucene 5.0. If you are using BooleanFilter rather
than BooleanQuery with Lucene 4, then things should be mostly OK even if you
have many clauses. But like TermsQuery, BooleanFilter always consumes all
matching documents from all of its clauses, so if you intersect it with a
selective query, most of that work is wasted.
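
To make that last point concrete, here is a rough plain-Java sketch. It is
not Lucene code: the class FilterCostSketch, its method names, and the test
data are all made up for illustration, and binary search merely stands in
for the skipping a real postings iterator would do. It compares the work
done when every clause is fully consumed before intersecting (filter style)
with the work done when the selective side leads and the clauses are only
probed for its candidates.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.stream.IntStream;

public class FilterCostSketch {

    // Hypothetical test data: clause k matches every doc id d < maxDoc
    // with d % numClauses == k. Each array is sorted ascending.
    public static int[][] buildClauses(int numClauses, int maxDoc) {
        int[][] clauses = new int[numClauses][];
        for (int k = 0; k < numClauses; k++) {
            final int clause = k;
            clauses[k] = IntStream.range(0, maxDoc)
                    .filter(d -> d % numClauses == clause)
                    .toArray();
        }
        return clauses;
    }

    // Filter style: read every posting of every clause into a bit set,
    // then check the selective doc ids against it.
    // Returns the number of entries read (clause postings + selective docs).
    public static long materializeThenIntersect(int[][] clauses, int[] selective,
                                                int maxDoc, BitSet out) {
        long entriesRead = 0;
        BitSet union = new BitSet(maxDoc);
        for (int[] postings : clauses) {
            for (int doc : postings) {
                union.set(doc);
                entriesRead++;
            }
        }
        for (int doc : selective) {
            entriesRead++;
            if (union.get(doc)) {
                out.set(doc);
            }
        }
        return entriesRead;
    }

    // Query style: the selective side leads; each clause is only probed for
    // the lead's candidates, stopping at the first clause that matches.
    // Returns the number of probes (one probe = one binary search).
    public static long leadWithSelective(int[][] clauses, int[] selective, BitSet out) {
        long probes = 0;
        for (int doc : selective) {
            for (int[] postings : clauses) {
                probes++;
                if (Arrays.binarySearch(postings, doc) >= 0) {
                    out.set(doc);
                    break;
                }
            }
        }
        return probes;
    }

    public static void main(String[] args) {
        int maxDoc = 3000;
        int[][] clauses = buildClauses(3, maxDoc);
        int[] selective = {5, 2999}; // a very selective "query"

        BitSet eagerHits = new BitSet();
        BitSet lazyHits = new BitSet();
        long eager = materializeThenIntersect(clauses, selective, maxDoc, eagerHits);
        long lazy = leadWithSelective(clauses, selective, lazyHits);

        System.out.println("filter style, entries read: " + eager);
        System.out.println("query style, probes:        " + lazy);
        System.out.println("same hits: " + eagerHits.equals(lazyHits));
    }
}
```

Both methods return the same hits; only the amount of work differs. The
sketch ignores the per-clause memory that the second approach needs, which
is exactly what hurts when there are very many clauses.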

On Tue, Jul 18, 2017 at 11:42, Kumaran Ramasubramanian <kums.134@gmail.com>
wrote:

> Hi Adrien,
>
> Thanks for your input...
>
> > 1. Using boolean filters works even for 1 lakh (100,000) filter clauses
> > in a BooleanFilter. Is there any consequence of using filters in this
> > case? Shall I proceed with this?
>
>
> Code snippet I used for statement 1:
>
>     for (int i = 0; i < 100000; i++)
>     {
>         Term term = new Term("key" + i, "value" + i);
>         TermsFilter filter = new TermsFilter(term);
>         FilterClause filterClause = new FilterClause(filter,
>                 BooleanClause.Occur.SHOULD);
>         boolFilter.add(filterClause);
>     }
>
>
>
> Do you see any problem in using TermsFilter over TermsQuery?
>
> By the way, I will test with TermsQuery and let you know.
>
>
>
> --
> Kumaran R
>
>
>
>
> On Tue, Jul 18, 2017 at 1:59 AM, Adrien Grand <jpountz@gmail.com> wrote:
>
> > Could you use TermInSetQuery (TermsQuery in older Lucene versions)? It is
> > worse at skipping over matches than a BooleanQuery, but unlike large
> > boolean queries it keeps memory usage low and disk access sequential.
> >
> > Otherwise you would probably need to rethink how you design your
> > documents in order to be able to run simpler queries.
> >
> > On Mon, Jul 17, 2017 at 16:28, Kumaran Ramasubramanian
> > <kums.134@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I am using Lucene 4.10.4.
> > >
> > > In Lucene search, I know there is a limit of 1024 on the number of
> > > boolean query clauses, and I know we can increase this limit. But I
> > > want to understand queries vs. filters in Lucene 4.10.4.
> > >
> > > I want to run queries with more than 1024 clauses. Relevance is not
> > > needed for me. What are the best possible options?
> > >
> > > 1. Using boolean filters works even for 1 lakh (100,000) filter clauses
> > > in a BooleanFilter. Is there any consequence of using filters in this
> > > case? Shall I proceed with this?
> > >
> > > 2. If I give very little memory to filters, the search still manages to
> > > complete after many GC cycles. Why can we not do the same for query
> > > clauses too? What is the actual technical reason for the 1024 limit
> > > on boolean queries?
> > >
> > > 3. If I disable scoring using ConstantScoreQuery, is it possible to
> > > give more than 1024 query clauses? I tried this, but I still get
> > > java.lang.OutOfMemoryError. Why?
> > >
> > > > java.lang.OutOfMemoryError: Java heap space
> > > >   at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.<init>(Lucene41PostingsReader.java:345)
> > > >   at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docs(Lucene41PostingsReader.java:254)
> > > >   at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs(SegmentTermsEnum.java:999)
> > > >   at org.apache.lucene.index.TermsEnum.docs(TermsEnum.java:149)
> > > >   at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:84)
> > > >   at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
> > > >   at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:164)
> > > >   at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
> > > >   at org.apache.lucene.search.FilteredQuery$FilterStrategy.filteredBulkScorer(FilteredQuery.java:504)
> > > >   at org.apache.lucene.search.FilteredQuery$1.bulkScorer(FilteredQuery.java:150)
> > >
> > >
> > >
> > > Any pointers are much appreciated... Thank you..
> > >
> > >
> > >
> > > --
> > > Kumaran R
> > >
> >
>
