lucene-java-user mailing list archives

From Patrick Diviacco <patrick.divia...@gmail.com>
Subject Re: how to get all documents in the results ?
Date Wed, 23 Mar 2011 08:08:13 GMT
Yeah, it is clear. However, I don't just want all documents: I still want to
perform my specific query and, below the relevant docs, list all the
non-relevant docs as well (I need this for subsequent steps).

However, it now seems to work. I've added a MatchAllDocsQuery to my
BooleanQuery, and I get the original results plus all remaining docs at the
bottom.

The only thing I don't understand is the fixed score of the non-relevant docs
(the ones appearing only now, with MatchAllDocsQuery).

This is the list (please ignore Time and Harvesine scores).

298736393 298736393 All-text score:9.775101 Time:1.0 Harvesine:1.0 true
298736393 298736445 All-text score:3.1505027 Time:1.0 Harvesine:1.0 true
298736393 298736527 All-text score:3.1505027 Time:1.0 Harvesine:1.0 true
298736393 298736583 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0 true
298736393 298736638 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0 true
298736393 298736709 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0 true
298736393 298736781 All-text score:3.1505027 Time:0.9999981 Harvesine:-5593.9307 true
298736393 298736848 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0 true
298736393 406412394 All-text score:0.62491775 Time:0.9983447 Harvesine:-5593.9307 false
298736393 361767007 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361767030 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361767061 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361767089 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361767103 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361767123 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361767146 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361492738 All-text score:0.13399276 Time:0.99982494 Harvesine:-5593.9307 false
298736393 361492799 All-text score:0.13399276 Time:0.99982494 Harvesine:-5593.9307 false
298736393 85899696 All-text score:0.061811533 Time:0.9680632 Harvesine:-2355.1875 false
298736393 103368776 All-text score:0.032769617 Time:0.855137 Harvesine:-5593.9307 false
298736393 101106927 All-text score:0.029139375 Time:0.8688775 Harvesine:-5593.9307 false
298736393 80793153 All-text score:0.0018638512 Time:0.99774164 Harvesine:-2201.7686 false
298736393 82661153 All-text score:0.0018638512 Time:0.98817354 Harvesine:-5593.9307 false
298736393 84079583 All-text score:0.0018638512 Time:0.9796918 Harvesine:-5593.9307 false
298736393 84079707 All-text score:0.0018638512 Time:0.9796918 Harvesine:-5593.9307 false
298736393 84079911 All-text score:0.0018638512 Time:0.9796918 Harvesine:-5593.9307 false
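
A possible explanation for the fixed score, sketched in plain Java rather
than with Lucene itself. It assumes the classic scoring behaviour in which a
BooleanQuery sums the scores of its matching SHOULD clauses and scales the sum
by the fraction of clauses that matched, and it assumes that MatchAllDocsQuery
contributes the same constant to every document. Under those assumptions, any
document matched only by the MatchAllDocsQuery clause gets the same score. The
class and method names are hypothetical and the numbers are made up.

```java
// Simplified model (an assumption, not Lucene code) of how a BooleanQuery of
// SHOULD clauses might combine per-clause scores: the sum of the matching
// clause scores, scaled by the fraction of clauses that matched.
class CombinedScoreSketch {

    // One entry per clause; null means the clause did not match the document.
    static double combine(Double[] clauseScores) {
        double sum = 0.0;
        int matched = 0;
        for (Double score : clauseScores) {
            if (score != null) {
                sum += score;
                matched++;
            }
        }
        if (matched == 0) {
            return 0.0; // no clause matched, so the document is not a hit
        }
        return sum * matched / clauseScores.length;
    }

    public static void main(String[] args) {
        double matchAll = 0.01; // hypothetical constant score of the match-all clause

        // Two different documents matched only by the match-all clause:
        // both calls print the same value.
        System.out.println(combine(new Double[] {null, null, matchAll}));
        System.out.println(combine(new Double[] {null, null, matchAll}));

        // A document that also matches a term clause scores higher.
        System.out.println(combine(new Double[] {0.8, null, matchAll}));
    }
}
```

If this model holds, the identical scores at the bottom of the list are just
the constant match-all contribution after scaling, so those docs have no
relevance ordering among themselves.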


On 23 March 2011 09:01, Anshum <anshumg@gmail.com> wrote:

> Hi Patrick,
> You *don't* need to add a MatchAllDocs query to anything. If you just want
> all docs, just pass it to the searcher.search function and you'd get all
> results.
> MatchAllDocsQuery is a Query just like BooleanQuery, except that MADQ
> matches all docs in the index. You wouldn't need to specify anything there.
> The below would work and get you all the docs in the index as the result
> (provided the limit you pass as the number-of-hits param is high enough)
>
> *Query query = new MatchAllDocsQuery();*
> *searcher.search(query.....);*
>
> Hope this clarifies your doubt.
>
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com
>
>
> On Wed, Mar 23, 2011 at 1:14 PM, Patrick Diviacco <
> patrick.diviacco@gmail.com> wrote:
>
> > The issue with
> >
> >
> > My confusion about MatchAllDocsQuery is that I cannot specify which terms
> > in which fields to search with it. I'm probably wrong.
> >
> > I currently have a BooleanQuery that I use to build the query with
> > several fields and several terms.
> >
> > Can I just pass MatchAllDocsQuery to the BooleanQuery.add method in order
> > to add all remaining docs?
> >
> > thanks
> >
> >
> >
> >
> > On 22 March 2011 12:42, Anshum <anshumg@gmail.com> wrote:
> >
> > > MatchAllDocs does not consider only a single field but all fields, i.e.
> > > it is equivalent to a *:* query.
> > >
> > > *1. *
> > > *****Snip****
> > > Query query = new MatchAllDocsQuery();
> > > TopDocs  td = is.search(query, ir.numDocs());
> > > ScoreDoc[ ] scoreDocs = td.scoreDocs;
> > > for(ScoreDoc scoreDoc:scoreDocs){
> > >
> > > ... Your code...
> > >
> > > }
> > > ****/Snip***
> > >
> > > *2. Collector approach:*
> > >
> > > Query query = new MatchAllDocsQuery();
> > > int hitsPerPage = ir.numDocs();
> > > TopScoreDocCollector collector =
> > >     TopScoreDocCollector.create(hitsPerPage, true);
> > > is.search(query, collector);
> > > ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
> > > for(ScoreDoc scoreDoc:scoreDocs){
> > >
> > > .. Your code..
> > >
> > > }
> > >
> > > The better part about using a collector-based approach is that you could
> > > do a lot in the collect step, i.e. everything that you'd do in the
> > > iteration post search could also be done while collecting here.
> > >
> > > Also, could you tell me exactly what it is that you are trying to
> > > achieve? There may be a completely different option that you haven't
> > > come across, which someone could advise on if they knew the exact intent.
> > >
> > > Hope this helps.
> > >
> > > --
> > > Anshum Gupta
> > > http://ai-cafe.blogspot.com
> > >
> > >
> > > On Tue, Mar 22, 2011 at 4:59 PM, Patrick Diviacco <
> > > patrick.diviacco@gmail.com> wrote:
> > >
> > > > 1. "all" docs
> > > >
> > > > 2. because matchalldocs only considers one field at once. I'm
> > > > searching over multiple fields instead.
> > > >
> > > > 3. could you tell me more about this ? It might be a solution!
> > > >
> > > >
> > > >
> > > > On 22 March 2011 12:18, Anshum <anshumg@gmail.com> wrote:
> > > >
> > > > > so a few things
> > > > > 1. are you looking to get 'all' documents or only docs matching
> > > > > your query?
> > > > > 2. if it's about fetching all docs, why not use the matchalldocs
> > > > > query?
> > > > > 3. did you try using a collector instead of topdocs?
> > > > >
> > > > > --
> > > > > Anshum Gupta
> > > > > http://ai-cafe.blogspot.com
> > > > >
> > > > >
> > > > > On Tue, Mar 22, 2011 at 4:46 PM, Patrick Diviacco <
> > > > > patrick.diviacco@gmail.com> wrote:
> > > > >
> > > > > > I don't think the link you suggested can help, but maybe I'm
> > > > > > wrong.
> > > > > >
> > > > > > Also, the parameter MAX_HITS is not useful: it just limits the
> > > > > > results, it doesn't add the non-relevant docs.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 22 March 2011 12:10, Anshum <anshumg@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Patrick,
> > > > > > > You may have a look at this, perhaps this will help you with it.
> > > > > > > Let me know if you're still stuck.
> > > > > > >
> > > > > > > http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Anshum Gupta
> > > > > > > http://ai-cafe.blogspot.com
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 22, 2011 at 4:10 PM, <karl.wright@nokia.com> wrote:
> > > > > > >
> > > > > > > > Not sure what your use case actually is, but it sounds like you
> > > > > > > > may be unclear how Lucene works.
> > > > > > > >
> > > > > > > > Each query clause you have will produce an iterator that walks
> > > > > > > > over the documents that match that clause.  All the documents
> > > > > > > > from the entire, root query get scored.  The scoring evaluation
> > > > > > > > per document is also related to the form of your query
> > > > > > > > expression hierarchy.
> > > > > > > >
> > > > > > > > So, MatchAllDocsQuery is exactly what you want if you want a
> > > > > > > > document iterator that includes all documents in the index.  You
> > > > > > > > can change how this is scored by extending MatchAllDocsQuery and
> > > > > > > > writing a custom scorer.
> > > > > > > > Karl
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: ext Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > > > > > > Sent: Tuesday, March 22, 2011 4:23 AM
> > > > > > > > To: java-user@lucene.apache.org
> > > > > > > > Subject: how to get all documents in the results ?
> > > > > > > >
> > > > > > > > I'm using the following code because I want to see the entire
> > > > > > > > collection in my query results:
> > > > > > > >
> > > > > > > > //adding wildcards-term to see all results
> > > > > > > > rest = new TermQuery(new Term("*","*"));
> > > > > > > > booleanQuery.add(rest, BooleanClause.Occur.SHOULD);
> > > > > > > >
> > > > > > > > But it doesn't work, I only see the relevant docs and not all
> > > > > > > > the other ones.
> > > > > > > > How can I get all documents ordered by relevance instead?
> > > > > > > >
> > > > > > > > ps. MatchAllDocsQuery is not a solution because I need to
> > > > > > > > specify my own custom query.
