lucene-general mailing list archives

From Xavier Morera <xav...@familiamorera.com>
Subject Re: Feasability
Date Thu, 01 Dec 2016 02:23:00 GMT
The answer is yes, but you would need to do some programming and
configuring.
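
To give a rough idea of what "some programming" could look like, here is a
minimal sketch of the workflow described in the question below. It is not
Lucene code and not a definitive implementation: it uses only Python's
standard library (difflib for fuzzy scoring) and assumes the text of every
PDF page has already been extracted into plain strings by some other tool
(Apache PDFBox or Apache Tika are possible choices). The function names, the
0.85 threshold and the 5-word context size are all hypothetical. For 100k
files a real index with fuzzy queries (Lucene has a FuzzyQuery class) would
likely scale far better than this brute-force scan.

```python
import csv
from difflib import SequenceMatcher


def find_fuzzy_matches(titles, documents, threshold=0.85, context_words=5):
    """Yield (title, file_name, page_number, score, context) per matching page.

    titles:    list of title strings (list A).
    documents: dict mapping file name -> list of page texts, page 1 first
               (assumed to be extracted from the PDFs beforehand).
    """
    for title in titles:
        title_words = title.lower().split()
        if not title_words:
            continue  # skip blank rows in the title list
        n = len(title_words)
        needle = " ".join(title_words)
        for file_name, pages in documents.items():
            for page_number, page_text in enumerate(pages, start=1):
                words = page_text.split()
                best = None  # (score, start index) of the best window so far
                # Slide a window of n words across the page and score each one.
                for start in range(max(1, len(words) - n + 1)):
                    window = " ".join(words[start:start + n]).lower()
                    score = SequenceMatcher(None, needle, window).ratio()
                    if score >= threshold and (best is None or score > best[0]):
                        best = (score, start)
                if best is not None:
                    score, start = best
                    lo = max(0, start - context_words)
                    hi = start + n + context_words
                    context = " ".join(words[lo:hi])
                    yield title, file_name, page_number, round(score, 2), context


def write_report(matches, out_file):
    """Write the matches as CSV, which Excel can open directly."""
    writer = csv.writer(out_file)
    writer.writerow(["title", "file", "page", "score", "context"])
    writer.writerows(matches)


if __name__ == "__main__":
    import sys

    # Made-up sample data, for illustration only.
    sample_docs = {
        "a.pdf": [
            "Intro page with nothing relevant here.",
            "As noted in the Annual Reprot on Water Qualty published last year levels rose",
        ]
    }
    write_report(
        find_fuzzy_matches(["Annual Report on Water Quality"], sample_docs),
        sys.stdout,
    )
```

The sketch reports at most one hit per title per page (the best-scoring
window). Writing CSV instead of a native Excel workbook keeps it free of
third-party dependencies.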

On Wed, Nov 30, 2016 at 7:54 PM, Chris Manu <chrismanu90@hotmail.com> wrote:

> Hello,
>
>
> I want to start off by saying that I am not a programmer...and have very
> little knowledge in this area.
>
>
> What I would like to know is whether Apache would be capable of doing the
> following:
>
> Take an extensive list (A) of strings of unique words (these are titles -
> anywhere from 4 words to 30) saved in either an Excel worksheet or in a
> text file and search for instances (B) where these can be found in PDF
> files saved on a hard drive (over 100k files). The search would need to be
> done using fuzzy logic rather than exact matching, and the output would be
> an Excel file listing the unique string found (A), the file name in which
> the match was made (B), the page number where the match was made, and the
> surrounding text on either side of the match. As well, would this be a
> complicated program, or would it be usable by novices coached in the
> process necessary to input the title file (A) and direct the search to the
> relevant folder containing the PDF files (B)?
>
>
> I eagerly await (hopefully) an affirmative answer.
>
>
> Cheers!
>
>


-- 

*Xavier Morera*

Entrepreneur | Author & Trainer | Consultant | Developer & Scrum Master

*www.xaviermorera.com <http://www.xaviermorera.com/>*

office:  (305) 600-4919

cel:     +506 8849-8866

skype: xmorera
Twitter <https://twitter.com/xmorera> | LinkedIn
<https://www.linkedin.com/in/xmorera> | Pluralsight Author
<http://www.pluralsight.com/author/xavier-morera>
