lucene-general mailing list archives

From Chris Manu <>
Subject A feasibility question
Date Fri, 07 Nov 2014 16:36:21 GMT


 I apologize for taking your time. I am not trained in this area, but
someone suggested that this software could do what I need, and I would
like to enquire whether it can.


 I need to match a series of titles (currently over 40k), each
contained in an individual cell of a worksheet, against the contents of rich
documents (e.g. Word, PDF). The search process would need to be automated,
since there will be several thousand titles and numerous documents. The
matching would be "fuzzy", since there may be some variation in
punctuation or a misused preposition.
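
To make the kind of matching described above concrete, here is a minimal sketch. It does not use Lucene; it uses Python's standard-library difflib as a stand-in, and it assumes the text of each page has already been extracted from the Word or PDF files. The names normalize, best_match and find_matches, the (document, page, text) tuple layout, and the threshold of 80 are all hypothetical choices made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def best_match(title, page_text):
    """Best similarity (0-100) between a title and any same-length word window."""
    title_words = normalize(title).split()
    page_words = normalize(page_text).split()
    if not title_words or not page_words:
        return 0.0
    target = " ".join(title_words)
    size = len(title_words)
    best = 0.0
    # Slide a window with as many words as the title across the page.
    for start in range(max(1, len(page_words) - size + 1)):
        window = " ".join(page_words[start:start + size])
        best = max(best, SequenceMatcher(None, target, window).ratio())
    return round(best * 100, 1)


def find_matches(titles, pages, threshold=80.0):
    """pages is a list of (document_name, page_number, text) tuples (assumed layout).

    Returns one (title, document_name, page_number, score) record per match.
    """
    results = []
    for title in titles:
        for doc, page_no, text in pages:
            score = best_match(title, text)
            if score >= threshold:
                results.append((title, doc, page_no, score))
    return results
```

This compares every title against every page, so it only illustrates the scoring idea; with 40k titles and many documents, an indexed search would likely be needed to keep the run time practical.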


The software would record the relevance of each match (i.e. a
percentage score), as well as the names of the documents and the page numbers
where the matches were found. This information would be saved in a format that
could be opened by Excel. Since there are likely to be multiple matches in the
same document or across documents, each match for each title would have its own


I would appreciate your assistance and I look forward to your


