lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Diviacco <patrick.divia...@gmail.com>
Subject Re: how to get all documents in the results ?
Date Wed, 23 Mar 2011 08:17:32 GMT
ok, yes, I've solved. Thanks for help.

On 23 March 2011 09:15, Anshum <anshumg@gmail.com> wrote:

> So functionally I am assuming you've achieved what you'd been aiming for.
> About the scores, the matchalldocs does score docs based on norm factors
> etc.
> therefore the score wouldn't be 0.
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com
>
>
> On Wed, Mar 23, 2011 at 1:38 PM, Patrick Diviacco <
> patrick.diviacco@gmail.com> wrote:
>
> > yeah it is clear. However I don't just want all documents, I still want
> to
> > perform my specific query and on the bottom of the relevant docs, to list
> > all not relevant docs as well. (I need this for successive steps).
> >
> > However now it seems to work. I've added MatchAllDocsQuery to my
> > BooleanQuery, and I get the original results plus all remaining docs on
> > bottom.
> >
> >  The only thing I don't understand is the fixed score of not relevant
> docs
> > (the ones appearing only now with MatchAllDocsQuery).
> >
> > This is the list (please ignore Time and Harvesine scores).
> >
> > 298736393 298736393 All-text score:9.775101 Time:1.0 Harvesine:1.0 true
> > 298736393 298736445 All-text score:3.1505027 Time:1.0 Harvesine:1.0 true
> > 298736393 298736527 All-text score:3.1505027 Time:1.0 Harvesine:1.0 true
> > 298736393 298736583 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0
> > true
> > 298736393 298736638 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0
> > true
> > 298736393 298736709 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0
> > true
> > 298736393 298736781 All-text score:3.1505027 Time:0.9999981
> > Harvesine:-5593.9307 true
> > 298736393 298736848 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0
> > true
> > 298736393 406412394 All-text score:0.62491775 Time:0.9983447
> > Harvesine:-5593.9307 false
> > 298736393 361767007 All-text score:0.44931924 Time:0.9994102
> > Harvesine:-5593.9307 false
> > 298736393 361767030 All-text score:0.44931924 Time:0.9994102
> > Harvesine:-5593.9307 false
> > 298736393 361767061 All-text score:0.44931924 Time:0.9994102
> > Harvesine:-5593.9307 false
> > 298736393 361767089 All-text score:0.44931924 Time:0.9994102
> > Harvesine:-5593.9307 false
> > 298736393 361767103 All-text score:0.44931924 Time:0.9994102
> > Harvesine:-5593.9307 false
> > 298736393 361767123 All-text score:0.44931924 Time:0.9994102
> > Harvesine:-5593.9307 false
> > 298736393 361767146 All-text score:0.44931924 Time:0.9994102
> > Harvesine:-5593.9307 false
> > 298736393 ?361492738 All-text score:0.13399276 Time:0.99982494
> > Harvesine:-5593.9307 false
> > 298736393 361492799 All-text score:0.13399276 Time:0.99982494
> > Harvesine:-5593.9307 false
> > 298736393 85899696 All-text score:0.061811533 Time:0.9680632
> > Harvesine:-2355.1875 false
> > 298736393 103368776 All-text score:0.032769617 Time:0.855137
> > Harvesine:-5593.9307 false
> > 298736393 101106927 All-text score:0.029139375 Time:0.8688775
> > Harvesine:-5593.9307 false
> > 298736393 80793153 All-text score:0.0018638512 Time:0.99774164
> > Harvesine:-2201.7686 false
> > 298736393 82661153 All-text score:0.0018638512 Time:0.98817354
> > Harvesine:-5593.9307 false
> > 298736393 84079583 All-text score:0.0018638512 Time:0.9796918
> > Harvesine:-5593.9307 false
> > 298736393 84079707 All-text score:0.0018638512 Time:0.9796918
> > Harvesine:-5593.9307 false
> > 298736393 84079911 All-text score:0.0018638512 Time:0.9796918
> > Harvesine:-5593.9307 false
> >
> >
> > On 23 March 2011 09:01, Anshum <anshumg@gmail.com> wrote:
> >
> > > Hi Patrick,
> > > You *don't* need to add a MatchAllDocs query to anything. If you just
> > want
> > > all docs, just pass it to the searcher.search function and you'd get
> all
> > > results.
> > > MatchAllDocs query is the same as BooleanQuery , just that MADQ matches
> > all
> > > docs in the index. You wouldn't need to specify anything there.
> > > The below would work and get you all the docs in the index as the
> result
> > > (provided you specify a limit high enough for the numDocs to match
> param)
> > >
> > > *Query query = new MatchAllDocsQuery();*
> > > *searcher.search(query.....);*
> > >
> > > Hope this clarifies your doubt.
> > >
> > > --
> > > Anshum Gupta
> > > http://ai-cafe.blogspot.com
> > >
> > >
> > > On Wed, Mar 23, 2011 at 1:14 PM, Patrick Diviacco <
> > > patrick.diviacco@gmail.com> wrote:
> > >
> > > > The issue with
> > > >
> > > >
> > > > My confusion about MatchAllDocsQuery is that I cannot specify which
> > terms
> > > > in
> > > > which fields to search with it. I'm probably wrong.
> > > >
> > > > I currently have a BooleanQuery, that I use to build the query with
> > > several
> > > > fields and several terms.
> > > >
> > > > Can I just pass MatchAllDocsQuery to BooleanQuery.add method in order
> > to
> > > > add
> > > > all remaining docs ?
> > > >
> > > > thanks
> > > >
> > > >
> > > >
> > > >
> > > > On 22 March 2011 12:42, Anshum <anshumg@gmail.com> wrote:
> > > >
> > > > > MatchAllDocs does not consider only a single field but all fields
> > i.e.
> > > it
> > > > > takes a *:* query.
> > > > >
> > > > > *1. *
> > > > > *****Snip****
> > > > > Query query = new MatchAllDocsQuery();
> > > > > TopDocs  td = is.search(query, ir.numDocs());
> > > > > ScoreDoc[ ] scoreDocs = td.scoreDocs;
> > > > > for(ScoreDoc scoreDoc:scoreDocs){
> > > > >
> > > > > ... Your code...
> > > > >
> > > > > }
> > > > > ****/Snip***
> > > > >
> > > > > *2. Collector approach:*
> > > > >
> > > > > Query query = new MatchAllDocsQuery();
> > > > > int hitsPerPage = ir.numDocs();
> > > > > TopScoreDocCollector collector =
> > > TopScoreDocCollector.create(hitsPerPage,
> > > > > true);
> > > > > is.search(query, collector);
> > > > > ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
> > > > > for(ScoreDoc scoreDoc:scoreDocs){
> > > > >
> > > > > .. Your code..
> > > > >
> > > > > }
> > > > >
> > > > > The better part about using a collector based approach is that you
> > > could
> > > > do
> > > > > a lot in the collect step i.e. everything that you'd do in the
> > > iteration
> > > > > post search could also be done while collecting here.
> > > > >
> > > > > Also, could you also tell me the exact case as to what is it that
> you
> > > are
> > > > > trying to achieve. You may have a completely different option that
> > you
> > > > > haven't read which someone could advice if they know the exact
> > intent.
> > > > >
> > > > > Hope this helps.
> > > > >
> > > > > --
> > > > > Anshum Gupta
> > > > > http://ai-cafe.blogspot.com
> > > > >
> > > > >
> > > > > On Tue, Mar 22, 2011 at 4:59 PM, Patrick Diviacco <
> > > > > patrick.diviacco@gmail.com> wrote:
> > > > >
> > > > > > 1. "all" docs
> > > > > >
> > > > > > 2. because matchalldocs only consider one field at once. I'm
> > > searching
> > > > > over
> > > > > > multiple fields instead.
> > > > > >
> > > > > > 3. could you tell me more about this ? It might be a solution!
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 22 March 2011 12:18, Anshum <anshumg@gmail.com> wrote:
> > > > > >
> > > > > > > so a few things
> > > > > > > 1. are you looking to get 'all' documents or only docs
matching
> > > your
> > > > > > query?
> > > > > > > 2. if its about fetching all docs, why not use the matchalldocs
> > > > query?
> > > > > > > 3. did you try using a collector instead of topdocs?
> > > > > > >
> > > > > > > --
> > > > > > > Anshum Gupta
> > > > > > > http://ai-cafe.blogspot.com
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 22, 2011 at 4:46 PM, Patrick Diviacco <
> > > > > > > patrick.diviacco@gmail.com> wrote:
> > > > > > >
> > > > > > > > I don't think the link you suggested can help, but
maybe I'm
> > > wrong.
> > > > > > > >
> > > > > > > > Also, the parameter MAX_HITS is not useful, it just
limit the
> > > > > results,
> > > > > > it
> > > > > > > > doesn't add the not relevant docs.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On 22 March 2011 12:10, Anshum <anshumg@gmail.com>
wrote:
> > > > > > > >
> > > > > > > > > Hi Patrick,
> > > > > > > > > You may have a look at this, perhaps this will
help you
> with
> > > it.
> > > > > Let
> > > > > > me
> > > > > > > > > know
> > > > > > > > > if you're still stuck up.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Anshum Gupta
> > > > > > > > > http://ai-cafe.blogspot.com
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Mar 22, 2011 at 4:10 PM, <karl.wright@nokia.com>
> > > wrote:
> > > > > > > > >
> > > > > > > > > > Not sure what your use case actually is,
but it sounds
> like
> > > you
> > > > > may
> > > > > > > be
> > > > > > > > > > unclear how Lucene works.
> > > > > > > > > >
> > > > > > > > > > Each query clause you have will produce
an iterator that
> > > walks
> > > > > over
> > > > > > > the
> > > > > > > > > > documents that match that clause.  All the
documents from
> > the
> > > > > > entire,
> > > > > > > > > root
> > > > > > > > > > query get scored.  The scoring evaluation
per document is
> > > also
> > > > > > > related
> > > > > > > > to
> > > > > > > > > > the form of your query expression hierarchy.
> > > > > > > > > >
> > > > > > > > > > So, MatchAllDocsQuery is exactly what you
want if you
> want
> > a
> > > > > > document
> > > > > > > > > > iterator that includes all documents in
the index.  You
> can
> > > > > change
> > > > > > > how
> > > > > > > > > this
> > > > > > > > > > is scored by extending MatchAllDocsQuery
and writing a
> > custom
> > > > > > scorer.
> > > > > > > > > >
> > > > > > > > > > Karl
> > > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: ext Patrick Diviacco [mailto:
> > > patrick.diviacco@gmail.com]
> > > > > > > > > > Sent: Tuesday, March 22, 2011 4:23 AM
> > > > > > > > > > To: java-user@lucene.apache.org
> > > > > > > > > > Subject: how to get all documents in the
results ?
> > > > > > > > > >
> > > > > > > > > > I'm using the following code because I want
to see the
> > entire
> > > > > > > > collection
> > > > > > > > > in
> > > > > > > > > > my query results:
> > > > > > > > > >
> > > > > > > > > > //adding wildcards-term to see all results
> > > > > > > > > > rest = new TermQuery(new Term("*","*"));
> > > > > > > > > > booleanQuery.add(rest, BooleanClause.Occur.SHOULD);
> > > > > > > > > >
> > > > > > > > > > But it doesn't work, I only see the relevant
docs and not
> > all
> > > > the
> > > > > > > other
> > > > > > > > > > ones.
> > > > > > > > > > How can I get all documents ordered by relevance
instead
> ?
> > > > > > > > > >
> > > > > > > > > > ps. MatchAllDocsQuery is not a solution
because I need to
> > > > specify
> > > > > > my
> > > > > > > > own
> > > > > > > > > > custom query.
> > > > > > > > > >
> > > > > > > > > >
> > > > > >
> > ---------------------------------------------------------------------
> > > > > > > > > > To unsubscribe, e-mail:
> > > > java-user-unsubscribe@lucene.apache.org
> > > > > > > > > > For additional commands, e-mail:
> > > > > java-user-help@lucene.apache.org
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message