lucene-java-user mailing list archives

From Ranganath B N <>
Subject Ways to store and search tens of billions of text document content in one lucene index
Date Fri, 23 Jun 2017 06:24:56 GMT

Two ways I know to accomplish this:

1. Append several text documents (say 100), each tagged with its text document number and separated by demarcators, into one field of a Lucene document, and search with SpanNearQuery to retrieve the matching text documents. But if the matches are on the order of tens of thousands of Lucene documents, this takes a long time, because I think SpanNearQuery has to scan the concatenated text documents in the field to find the position of each matching span.
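A minimal sketch of this first approach might look like the following. The field name "body", the demarcator token "docsep", and the in-memory index are all illustrative choices, not anything prescribed by Lucene; the API shown (SpanNearQuery from org.apache.lucene.search.spans) is the classic spans package from Lucene 7/8.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ConcatenatedFieldSketch {

    // Index one Lucene document whose single "body" field packs three small
    // text documents, each prefixed with its id and separated by "docsep",
    // then count hits for a SpanNearQuery.
    public static long searchHits() throws Exception {
        Directory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(new StandardAnalyzer()));
        Document doc = new Document();
        doc.add(new TextField("body",
                "doc1 quick brown fox docsep doc2 lazy dog sleeps docsep doc3 quick red fox",
                Field.Store.YES));
        writer.addDocument(doc);
        writer.close();

        // "quick" within one position of "fox", in order; the matching span
        // positions would then be mapped back to the embedded document ids.
        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
        SpanNearQuery query = new SpanNearQuery(new SpanQuery[] {
                new SpanTermQuery(new Term("body", "quick")),
                new SpanTermQuery(new Term("body", "fox"))
        }, 1, true);
        TopDocs hits = searcher.search(query, 10);
        return hits.totalHits.value;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("hits: " + searchHits());
    }
}
```

The cost the author worries about shows up after this search: for every hit, the span positions must be walked to recover which embedded text documents the spans fall in.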

2. Instead of concatenating the text documents, create one field per text document (100 fields), with the corresponding text document numbers in another 100 fields, all in a single Lucene document. Then search for the terms with a DisjunctionMaxQuery consisting of 100 Boolean queries, one per text field, and use the Explanation object to find the matching text documents among the hit documents. But again, if there are 10,000 Lucene document matches, I need to execute 10,000 x 100 = 1,000,000 Explanation.isMatch() calls, which again takes a lot of time.
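The second approach could be sketched as below, shrunk to 3 fields instead of 100 to keep it short. The field-naming scheme ("text_0", "text_1", ...) is an assumption for illustration; the per-hit, per-field explain() loop at the end is exactly the step the author identifies as the bottleneck (hits x fields isMatch() calls).

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class PerFieldSketch {
    static final int FIELDS = 3; // 100 in the proposal; 3 keeps the sketch small

    // Index one Lucene document with one text document per field, search all
    // fields with a DisjunctionMaxQuery, then use explain() per field to
    // recover which embedded text documents actually matched.
    public static List<Integer> matchingFields(String term) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(new StandardAnalyzer()));
        Document doc = new Document();
        doc.add(new TextField("text_0", "quick brown fox", Field.Store.NO));
        doc.add(new TextField("text_1", "lazy dog", Field.Store.NO));
        doc.add(new TextField("text_2", "quick red fox", Field.Store.NO));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
        List<Query> perField = new ArrayList<>();
        for (int i = 0; i < FIELDS; i++) {
            perField.add(new TermQuery(new Term("text_" + i, term)));
        }
        TopDocs hits = searcher.search(new DisjunctionMaxQuery(perField, 0.0f), 10);

        // The bottleneck step: one explain() call per hit per field.
        List<Integer> matched = new ArrayList<>();
        for (ScoreDoc sd : hits.scoreDocs) {
            for (int i = 0; i < FIELDS; i++) {
                Explanation e = searcher.explain(
                        new TermQuery(new Term("text_" + i, term)), sd.doc);
                if (e.isMatch()) matched.add(i);
            }
        }
        return matched;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(matchingFields("fox"));
    }
}
```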

What strategies do you recommend for this task, "ways to store and search tens of billions of text documents' content in one Lucene index", so that I can accomplish it in optimal time?

Ranganath B. N.
