lucene-general mailing list archives

From Chris Manu
Subject Feasibility
Date Thu, 01 Dec 2016 01:54:10 GMT

I want to start off by saying that I am not a programmer...and have very little knowledge
in this area.

What I would like to know is whether Apache would be capable of doing the following:

Take an extensive list (A) of unique strings of words (these are titles, anywhere from 4
to 30 words long) saved in either an Excel worksheet or a text file, and search for instances
where these can be found in PDF files (B) saved on a hard drive (over 100k files). The search
would need to use fuzzy matching rather than exact matching, and the output would be an
Excel file listing the unique string found (A), the name of the file in which the match was
made (B), the page number where the match was made, and the surrounding text on either side
of the match.

As well, would this be a complicated program, or one usable by novices coached in the process
necessary to input the title file (A) and direct the search to the relevant folder containing
the PDF files (B)?
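
To make the request concrete, here is a rough sketch in Python of the matching and reporting
steps only. It is an illustration under several assumptions, not a finished tool: the text of
each PDF page is assumed to have been extracted already (that needs a separate PDF library,
not shown), the report is written as CSV (which Excel can open) rather than a native Excel
file, and the names find_fuzzy_matches and write_report, the 0.85 threshold and the
five-word context are all made up for the example.

```python
import csv
import difflib
import io


def normalize(text):
    """Lowercase and collapse whitespace so line breaks inside a page do not block a match."""
    return " ".join(text.lower().split())


def find_fuzzy_matches(titles, pages_by_file, threshold=0.85, context_words=5):
    """Yield (title, file name, page number, surrounding text) for each fuzzy hit.

    pages_by_file maps a file name to a list of page texts, first page first.
    At most one hit (the best-scoring window) is reported per title and page.
    """
    for title in titles:
        title_words = normalize(title).split()
        if not title_words:
            continue  # skip blank titles
        n = len(title_words)
        target = " ".join(title_words)
        for file_name, pages in pages_by_file.items():
            for page_number, page_text in enumerate(pages, start=1):
                words = normalize(page_text).split()
                best_score, best_start = 0.0, None
                # Slide a window as long as the title across the page's words.
                for start in range(max(len(words) - n + 1, 1)):
                    window = " ".join(words[start:start + n])
                    score = difflib.SequenceMatcher(None, target, window).ratio()
                    if score > best_score:
                        best_score, best_start = score, start
                if best_start is not None and best_score >= threshold:
                    lo = max(best_start - context_words, 0)
                    hi = best_start + n + context_words
                    yield title, file_name, page_number, " ".join(words[lo:hi])


def write_report(rows, out):
    """Write the hits as CSV with a header row; out is any writable text stream."""
    writer = csv.writer(out)
    writer.writerow(["title", "file", "page", "context"])
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample data standing in for the title list (A) and extracted PDF text (B).
    titles = ["The Quick Brown Fox Jumps"]
    pages_by_file = {
        "a.pdf": [
            "nothing here at all",
            "we saw the quikc brown fox jumps over the lazy dog today",
        ]
    }
    report = io.StringIO()
    write_report(find_fuzzy_matches(titles, pages_by_file), report)
    print(report.getvalue())
```

Comparing every title against every page this way would likely be far too slow for over 100k
files. That is the part a search index such as Lucene, which supports fuzzy queries, is meant
to handle.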

I eagerly await (hopefully) an affirmative answer.

