lucene-java-user mailing list archives

From Lin Ma
Subject potential query performance issue
Date Fri, 15 Mar 2013 17:09:07 GMT
Hello guys,

Suppose I have one million documents, each with hundreds of features, and a
query that also has hundreds of features. I want to fetch the 1000 most
relevant documents, ranked by the dot product of the query features and the
document features (both are in the same feature space).

I am not sure how Lucene implements this internally. If it has to go through
all one million documents and compute the dot product with the query for each
one, then I am concerned about performance. I would appreciate it if anyone
could (1) confirm how Lucene works internally for this use case, and
(2) suggest any smart ideas for selecting the top 1000 documents more
efficiently.
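
To make the concern concrete, here is a minimal sketch in plain Python (not
Lucene code, and not a description of Lucene's internals). It assumes sparse
features stored as dicts mapping a feature id to a weight; the function names
and the sample data are hypothetical. The first function is the full scan over
every document; the second uses a simple inverted index so that only documents
sharing at least one feature with the query are scored.

```python
import heapq


def dot(query, doc):
    """Sparse dot product; both arguments map feature id -> weight."""
    # Iterate over the smaller map and look features up in the larger one.
    if len(query) > len(doc):
        query, doc = doc, query
    return sum(w * doc[f] for f, w in query.items() if f in doc)


def top_k_brute_force(query, docs, k=1000):
    """Score every document, then keep the k highest (score, doc_id) pairs."""
    scored = ((dot(query, feats), doc_id) for doc_id, feats in docs.items())
    return heapq.nlargest(k, scored)


def build_inverted_index(docs):
    """Map each feature id to a list of (doc_id, weight) postings."""
    index = {}
    for doc_id, feats in docs.items():
        for f, w in feats.items():
            index.setdefault(f, []).append((doc_id, w))
    return index


def top_k_inverted(query, index, k=1000):
    """Accumulate scores only for documents that share a query feature."""
    scores = {}
    for f, qw in query.items():
        for doc_id, dw in index.get(f, ()):
            scores[doc_id] = scores.get(doc_id, 0.0) + qw * dw
    return heapq.nlargest(k, ((s, d) for d, s in scores.items()))


# Tiny hypothetical data set: three documents, one query.
docs = {
    "a": {1: 1.0, 2: 2.0},
    "b": {2: 1.0, 3: 4.0},
    "c": {4: 5.0},
}
query = {2: 3.0, 3: 1.0}

brute = top_k_brute_force(query, docs, k=2)
fast = top_k_inverted(query, build_inverted_index(docs), k=2)
```

The two approaches differ in one respect: the full scan also returns documents
with a score of zero, while the inverted-index version never sees documents
that share no feature with the query. How much work the index saves depends on
how many documents each query feature appears in.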

thanks in advance,
