lucene-java-user mailing list archives

From jian chen <chenjian1...@gmail.com>
Subject Re: Large queries
Date Sun, 16 Oct 2005 18:08:08 GMT
Hi, Trond,

It should be no problem for Lucene to handle 6 million documents.

For your query, it seems you want to run a disjunctive (OR'ed) query over
multiple terms, anywhere from 10 to 10000 of them. In the worst case, you
can fairly easily write your own query class to handle this, using the
TermDocs iterator class.

Say you want the documents that have one of 10 docIDs: you then have 10
TermDocs, one per term. You can do a multi-way (in this case, 10-way) merge
of these 10 TermDocs to generate the final list of doc ids.
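
To make the merge idea concrete, here is a minimal sketch in plain Java. It
does not use the Lucene API: each sorted int array stands in for the doc ids
that one term's TermDocs would return, and the names DocIdUnion, Cursor and
union are made up for illustration. It assumes each list is already sorted in
increasing order, and it uses a heap so that the smallest current doc id
across all lists is always taken next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: k-way union of sorted doc id lists.
final class DocIdUnion {

    // Cursor over one sorted list; a stand-in for a per-term TermDocs iterator.
    private static final class Cursor {
        final int[] docs;
        int pos = 0;

        Cursor(int[] docs) { this.docs = docs; }

        int current() { return docs[pos]; }

        // Moves to the next doc id; returns false when the list is exhausted.
        boolean advance() { return ++pos < docs.length; }
    }

    // Merges the sorted lists into one sorted list without duplicates.
    static List<Integer> union(int[][] postings) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
        for (int[] p : postings) {
            if (p.length > 0) {
                heap.add(new Cursor(p)); // skip terms that match no documents
            }
        }

        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();      // cursor with the smallest current doc id
            int doc = c.current();
            // The same doc can appear under several terms; emit it only once.
            if (result.isEmpty() || result.get(result.size() - 1) != doc) {
                result.add(doc);
            }
            if (c.advance()) {
                heap.add(c);             // re-insert at its new position
            }
        }
        return result;
    }
}
```

For example, union(new int[][] {{1, 4, 9}, {2, 4}, {9, 10}}) yields
[1, 2, 4, 9, 10]. With k terms and n postings in total, the heap keeps the
cost at roughly n log k comparisons, which matters more at 10000 terms than
at 10.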

I suggest you look at PhraseQuery and PhraseScorer to see how they do the
conjunctive merge to find the docs that contain all the terms. In your
case, instead of an intersection, you are doing a union of all the term
docs, right?

Maybe there is already some query class that comes with the Lucene package
that does this. However, the method I described should also help just in
case.

Cheers,

Jian


On 10/16/05, Trond Aksel Myklebust <tamyk@online.no> wrote:
>
> How does Lucene handle very large queries? I have 6 million documents,
> each of which has a "docID" field. There are 20000 distinct docIDs in
> total, so many documents share the same docID, which consists of a
> filename (name only, not path).
>
> Sometimes I must get all documents that have one of 10 docIDs, and
> sometimes I need to get all documents that have one of 10000 docIDs. Is
> there any other way than doing a query: docID:(file1 file2 file3 file4..) ?
>
>
>
> Trond A Myklebust
