lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Diviacco <patrick.divia...@gmail.com>
Subject Re: how to get all documents in the results ?
Date Wed, 23 Mar 2011 08:02:18 GMT
Sorry for spam. I've actually added the following line:

booleanQuery.add(new MatchAllDocsQuery(), BooleanClause.Occur.SHOULD);

and now I get all collection back. So it seems to work...
I'm a bit confused about the score of the not relevant docs:

i.e. the last result is
298736393 107419924 All-text score:0.0018638512

I was expecting the score to be 0, instead.

thanks




On 23 March 2011 08:44, Patrick Diviacco <patrick.diviacco@gmail.com> wrote:

> The issue with
>
>
> My confusion about MatchAllDocsQuery is that I cannot specify which terms
> in which fields to search with it. I'm probably wrong.
>
> I currently have a BooleanQuery, that I use to build the query with several
> fields and several terms.
>
> Can I just pass MatchAllDocsQuery to BooleanQuery.add method in order to
> add all remaining docs ?
>
> thanks
>
>
>
>
> On 22 March 2011 12:42, Anshum <anshumg@gmail.com> wrote:
>
>> MatchAllDocs does not consider only a single field but all fields i.e. it
>> takes a *:* query.
>>
>> *1. *
>> *****Snip****
>> Query query = new MatchAllDocsQuery();
>> TopDocs  td = is.search(query, ir.numDocs());
>> ScoreDoc[ ] scoreDocs = td.scoreDocs;
>> for(ScoreDoc scoreDoc:scoreDocs){
>>
>> ... Your code...
>>
>> }
>> ****/Snip***
>>
>> *2. Collector approach:*
>>
>> Query query = new MatchAllDocsQuery();
>> int hitsPerPage = ir.numDocs();
>> TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage,
>> true);
>> is.search(query, collector);
>> ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
>> for(ScoreDoc scoreDoc:scoreDocs){
>>
>> .. Your code..
>>
>> }
>>
>> The better part about using a collector based approach is that you could
>> do
>> a lot in the collect step i.e. everything that you'd do in the iteration
>> post search could also be done while collecting here.
>>
>> Also, could you also tell me the exact case as to what is it that you are
>> trying to achieve. You may have a completely different option that you
>> haven't read which someone could advice if they know the exact intent.
>>
>> Hope this helps.
>>
>> --
>> Anshum Gupta
>> http://ai-cafe.blogspot.com
>>
>>
>> On Tue, Mar 22, 2011 at 4:59 PM, Patrick Diviacco <
>> patrick.diviacco@gmail.com> wrote:
>>
>> > 1. "all" docs
>> >
>> > 2. because matchalldocs only consider one field at once. I'm searching
>> over
>> > multiple fields instead.
>> >
>> > 3. could you tell me more about this ? It might be a solution!
>> >
>> >
>> >
>> > On 22 March 2011 12:18, Anshum <anshumg@gmail.com> wrote:
>> >
>> > > so a few things
>> > > 1. are you looking to get 'all' documents or only docs matching your
>> > query?
>> > > 2. if its about fetching all docs, why not use the matchalldocs query?
>> > > 3. did you try using a collector instead of topdocs?
>> > >
>> > > --
>> > > Anshum Gupta
>> > > http://ai-cafe.blogspot.com
>> > >
>> > >
>> > > On Tue, Mar 22, 2011 at 4:46 PM, Patrick Diviacco <
>> > > patrick.diviacco@gmail.com> wrote:
>> > >
>> > > > I don't think the link you suggested can help, but maybe I'm wrong.
>> > > >
>> > > > Also, the parameter MAX_HITS is not useful, it just limit the
>> results,
>> > it
>> > > > doesn't add the not relevant docs.
>> > > >
>> > > >
>> > > >
>> > > > On 22 March 2011 12:10, Anshum <anshumg@gmail.com> wrote:
>> > > >
>> > > > > Hi Patrick,
>> > > > > You may have a look at this, perhaps this will help you with
it.
>> Let
>> > me
>> > > > > know
>> > > > > if you're still stuck up.
>> > > > >
>> > > >
>> > >
>> >
>> http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Anshum Gupta
>> > > > > http://ai-cafe.blogspot.com
>> > > > >
>> > > > >
>> > > > > On Tue, Mar 22, 2011 at 4:10 PM, <karl.wright@nokia.com>
wrote:
>> > > > >
>> > > > > > Not sure what your use case actually is, but it sounds like
you
>> may
>> > > be
>> > > > > > unclear how Lucene works.
>> > > > > >
>> > > > > > Each query clause you have will produce an iterator that
walks
>> over
>> > > the
>> > > > > > documents that match that clause.  All the documents from
the
>> > entire,
>> > > > > root
>> > > > > > query get scored.  The scoring evaluation per document is
also
>> > > related
>> > > > to
>> > > > > > the form of your query expression hierarchy.
>> > > > > >
>> > > > > > So, MatchAllDocsQuery is exactly what you want if you want
a
>> > document
>> > > > > > iterator that includes all documents in the index.  You
can
>> change
>> > > how
>> > > > > this
>> > > > > > is scored by extending MatchAllDocsQuery and writing a custom
>> > scorer.
>> > > > > >
>> > > > > > Karl
>> > > > > >
>> > > > > > -----Original Message-----
>> > > > > > From: ext Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
>> > > > > > Sent: Tuesday, March 22, 2011 4:23 AM
>> > > > > > To: java-user@lucene.apache.org
>> > > > > > Subject: how to get all documents in the results ?
>> > > > > >
>> > > > > > I'm using the following code because I want to see the entire
>> > > > collection
>> > > > > in
>> > > > > > my query results:
>> > > > > >
>> > > > > > //adding wildcards-term to see all results
>> > > > > > rest = new TermQuery(new Term("*","*"));
>> > > > > > booleanQuery.add(rest, BooleanClause.Occur.SHOULD);
>> > > > > >
>> > > > > > But it doesn't work, I only see the relevant docs and not
all
>> the
>> > > other
>> > > > > > ones.
>> > > > > > How can I get all documents ordered by relevance instead
?
>> > > > > >
>> > > > > > ps. MatchAllDocsQuery is not a solution because I need to
>> specify
>> > my
>> > > > own
>> > > > > > custom query.
>> > > > > >
>> > > > > >
>> > ---------------------------------------------------------------------
>> > > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > > > > For additional commands, e-mail:
>> java-user-help@lucene.apache.org
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message