lucene-java-user mailing list archives

From 周洲 <zhou518z...@gmail.com>
Subject Re: Re: Many keywords problem
Date Tue, 08 May 2012 13:59:55 GMT
Let me leave, thanks~

2012/5/8 Li Li <fancyerii@gmail.com>

> But this only gets (term1 or term2 or term3 ...). You can't
> implement (term1 or term2 ...) and (term3 or term4) with this method.
> Maybe you should write your own Scorer to handle this kind of query.
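Li Li's suggestion of a custom Scorer can be illustrated outside Lucene. The following is a minimal sketch, not Lucene's Scorer API: it assumes each term's postings are available as a sorted `int[]` of document IDs, and the class and method names are hypothetical. Each OR group is merged document-at-a-time with a priority queue, and the two groups are then intersected by advancing whichever side is behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical model of (a1 or a2 ...) and (b1 or b2 ...) over sorted postings. */
final class GroupedBooleanMatcher {

    /** Iterates the union of several sorted postings lists in increasing doc-ID order. */
    static final class DisjunctionIterator {
        private final int[][] postings;
        // Each heap entry is {currentDoc, listIndex, positionInList}.
        private final PriorityQueue<int[]> heap =
                new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));

        DisjunctionIterator(int[][] postings) {
            this.postings = postings;
            for (int i = 0; i < postings.length; i++) {
                if (postings[i].length > 0) {
                    heap.add(new int[] {postings[i][0], i, 0});
                }
            }
        }

        /** Returns the smallest doc ID >= target, or Integer.MAX_VALUE when exhausted. */
        int advance(int target) {
            while (!heap.isEmpty() && heap.peek()[0] < target) {
                int[] top = heap.poll();
                int[] list = postings[top[1]];
                int pos = top[2] + 1;
                // Skip this term's remaining postings that are below target.
                while (pos < list.length && list[pos] < target) {
                    pos++;
                }
                if (pos < list.length) {
                    heap.add(new int[] {list[pos], top[1], pos});
                }
            }
            return heap.isEmpty() ? Integer.MAX_VALUE : heap.peek()[0];
        }
    }

    /** Returns the doc IDs matching (any term of groupA) and (any term of groupB). */
    static List<Integer> match(int[][] groupA, int[][] groupB) {
        DisjunctionIterator a = new DisjunctionIterator(groupA);
        DisjunctionIterator b = new DisjunctionIterator(groupB);
        List<Integer> hits = new ArrayList<>();
        int docA = a.advance(0);
        int docB = b.advance(0);
        while (docA != Integer.MAX_VALUE && docB != Integer.MAX_VALUE) {
            if (docA == docB) {
                hits.add(docA);
                docA = a.advance(docA + 1);
                docB = b.advance(docB + 1);
            } else if (docA < docB) {
                docA = a.advance(docB); // let the lagging group catch up
            } else {
                docB = b.advance(docA);
            }
        }
        return hits;
    }
}
```

For groupA = {{1, 4, 9}, {2, 9, 12}} and groupB = {{4, 5}, {12, 20}, {9}}, `match` returns [4, 9, 12]. A real Scorer would also accumulate scores; this model only decides which documents match.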
>
> On Tue, May 8, 2012 at 9:44 PM, Li Li <fancyerii@gmail.com> wrote:
> > A disjunction query is much slower than a conjunction query. That's why
> > many search engines use conjunction as the default.
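A rough way to see the gap, as an illustrative model rather than Lucene's actual scorers: an OR has to read every posting of every term, while an AND can walk the rarer list and skip ahead in the other. The sketch below assumes sorted `int[]` postings, uses a binary search as a stand-in for an index's skip data, and reports work through a hypothetical `inspected` counter.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative cost model: postings inspected by OR (union) vs AND (intersection). */
final class AndVsOr {
    /** Postings/probes inspected by the most recent call (hypothetical counter). */
    static int inspected;

    /** OR: merge both sorted lists; every posting is read exactly once. */
    static List<Integer> union(int[] a, int[] b) {
        inspected = 0;
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length || j < b.length) {
            if (j == b.length || (i < a.length && a[i] < b[j])) {
                out.add(a[i++]);
                inspected++;
            } else if (i == a.length || b[j] < a[i]) {
                out.add(b[j++]);
                inspected++;
            } else { // same document in both lists
                out.add(a[i++]);
                j++;
                inspected += 2;
            }
        }
        return out;
    }

    /** AND: walk the shorter list and binary-search forward in the longer one. */
    static List<Integer> intersect(int[] a, int[] b) {
        inspected = 0;
        int[] small = a.length <= b.length ? a : b;
        int[] large = a.length <= b.length ? b : a;
        List<Integer> out = new ArrayList<>();
        int lo = 0; // large[0..lo) holds only docs smaller than the current candidate
        for (int doc : small) {
            inspected++;
            int hi = large.length - 1;
            while (lo <= hi) { // lower-bound search in large[lo..]
                int mid = (lo + hi) >>> 1;
                inspected++;
                if (large[mid] < doc) {
                    lo = mid + 1;
                } else {
                    hi = mid - 1;
                }
            }
            if (lo < large.length && large[lo] == doc) {
                out.add(doc);
            }
        }
        return out;
    }
}
```

With a term that occurs once and a term that occurs in documents 0 to 999, the intersection inspects about ten entries while the union reads all 1,001.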
> > By the way, you say you have 5,000,000 documents. How many documents
> > match your query? Do you need to sort by relevance score, or do you only
> > need the matches, regardless of order?
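> > If you don't care about sorting, you may try using a filter, e.g.
> > Query allDocsQuery = new MatchAllDocsQuery(); // same as parser.parse("*:*")
> > TermsFilter cityFilter = new TermsFilter();
> > for (String term : terms) {
> >       cityFilter.addTerm(new Term("city", term));
> > }
> > // collect up to 100 hits (an arbitrary number); the filter decides what matches
> > TopDocs hits = searcher.search(allDocsQuery, cityFilter, 100);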
> >
> > I am not sure this method is faster than a boolean OR query.
> > In theory, BooleanScorer is a TAAT (term-at-a-time) method: it traverses
> > each term within a 2K-document window. BooleanScorer2 is a DAAT
> > (document-at-a-time) algorithm. BooleanScorer is faster than BooleanScorer2,
> > but it can't support required or exclusive (prohibited) clauses, and the
> > term count must be less than 32 (because it uses a 32-bit integer to
> > remember which terms hit).
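The term-at-a-time scheme described above can be sketched as a simplified model in the spirit of BooleanScorer, not its actual source: documents are processed in windows of 2048 IDs, each clause's postings for the window are drained in turn, and a per-document `int` records which clauses matched. Scores are omitted, postings are assumed to be sorted `int[]` arrays below `maxDoc`, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified term-at-a-time OR in the spirit of BooleanScorer (scores omitted). */
final class WindowedTaatOr {
    static final int WINDOW = 2048; // the "2K window" from the thread

    /** Returns {doc, clauseMask} pairs in doc order; postings must be sorted and < maxDoc. */
    static List<int[]> search(int[][] postings, int maxDoc) {
        if (postings.length > 32) {
            throw new IllegalArgumentException("an int mask can record at most 32 clauses");
        }
        int[] cursor = new int[postings.length]; // next unread posting of each clause
        int[] mask = new int[WINDOW];            // bucket per doc: which clauses hit it
        List<int[]> hits = new ArrayList<>();
        for (int base = 0; base < maxDoc; base += WINDOW) {
            int end = Math.min(base + WINDOW, maxDoc);
            // Term at a time: drain each clause's postings inside [base, end).
            for (int c = 0; c < postings.length; c++) {
                int[] list = postings[c];
                while (cursor[c] < list.length && list[cursor[c]] < end) {
                    mask[list[cursor[c]] - base] |= 1 << c;
                    cursor[c]++;
                }
            }
            // Emit this window's matches in doc order and reset the buckets.
            for (int d = 0; d < end - base; d++) {
                if (mask[d] != 0) {
                    hits.add(new int[] {base + d, mask[d]});
                    mask[d] = 0;
                }
            }
        }
        return hits;
    }
}
```

In this model the `int` mask is what caps the clause count; `Integer.bitCount(mask)` gives the number of matching clauses (the coord factor) for each hit.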
> > TermsFilter is similar to BooleanScorer: it traverses all the terms and uses
> > a bitset to mark the matching documents. If the number of matched documents
> > is very large, it may be faster than BooleanScorer2.
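The bitset approach attributed to TermsFilter can be modeled with `java.util.BitSet`: every term's postings are walked once and the matching bits are set, with no scoring and no per-document priority-queue work. This is a sketch with hypothetical names, assuming postings as `int[]` arrays of doc IDs below `maxDoc`; its cost is roughly one pass over all postings, which is why it can win when very many documents match.

```java
import java.util.BitSet;

/** Model of a TermsFilter-style OR: one pass over each term's postings into a bitset. */
final class TermsBitsetFilter {

    /** Sets the bit of every document containing at least one of the terms. */
    static BitSet build(int[][] postingsPerTerm, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] postings : postingsPerTerm) { // traverse every term once
            for (int doc : postings) {
                bits.set(doc);                   // mark the hit; nothing is scored
            }
        }
        return bits;
    }

    /** Lists matching doc IDs in increasing order. */
    static int[] matches(BitSet bits) {
        return bits.stream().toArray();
    }
}
```

Documents that contain several of the terms are simply marked once, so the result carries no information about how many terms matched.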
> >
> >
> > On Tue, May 8, 2012 at 6:54 PM, 齐保元 <qibaoyuan@126.com> wrote:
> >> Thanks for your reply, first of all. The many OR clauses are for monitoring
> terms. One scenario: if I want to know the cities of a province and the
> events that happen there, I may build a query like "(California or
> NewYork or SanFrancisco ... or SomePlace) and (Pollution or Criminal ... or
> Alcohol)". So the long query happens... I hope I have described the question
> clearly.
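The query shape above can be written out in Lucene's QueryParser syntax, where the boolean operators must be uppercase (AND, OR). The helper below is hypothetical and only builds the string; it assumes the terms contain no characters QueryParser treats as special, so no escaping is done. Parsing such a string yields an outer BooleanQuery with one nested BooleanQuery per parenthesized group.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that renders the thread's query shape in QueryParser syntax. */
final class GroupedQueryString {

    /** Renders (field:t1 OR field:t2 ...); terms are assumed to need no escaping. */
    static String orGroup(String field, List<String> terms) {
        return terms.stream()
                .map(t -> field + ":" + t)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    /** Renders (places...) AND (events...), i.e. a conjunction of two disjunctions. */
    static String build(String field, List<String> places, List<String> events) {
        return orGroup(field, places) + " AND " + orGroup(field, events);
    }
}
```

Because each group becomes its own BooleanQuery, groups of 200 and 400 terms each stay below BooleanQuery's default limit of 1,024 clauses.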
> >> At 2012-05-08 18:44:13, "Li Li" <fancyerii@gmail.com> wrote:
> >>>A disjunction (OR) query of so many terms is indeed slow.
> >>>Can you describe your real problem? Why do you need the disjunction
> >>>results of so many terms?
> >>>
> >>>
> >>>
> >>>On Sun, May 6, 2012 at 9:57 PM, qibaoyuan@126.com <qibaoyuan@126.com>
> wrote:
> >>>> Hi,
> >>>>       I have a problem searching for many keywords in about
> 5,000,000 documents. For example, the query may look like "(a1 or a2 or a3
> ... a200) and (b1 or b2 or b3 or b4 ... b400)". I found it takes a very
> long time (40 seconds) to get the answer when searching only one field (the
> Title field), and the JVM throws an OutOfMemoryError when searching more
> fields (the title field plus the content field). Any suggestions or good
> ideas to solve this problem? Thanks in advance.
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>
>
