lucene-general mailing list archives

From Jeremy Branham <>
Subject Re: Feasability
Date Thu, 01 Dec 2016 13:43:53 GMT
Someone like this maybe?


Sent from my Sprint Phone.

------ Original message------
From: Chris Manu
Date: Wed, Nov 30, 2016 9:33 PM
Subject:Re: Feasability

Thank you for responding. So, theoretically, I would need to hire someone with Apache programming
experience to do this correctly (given that I know nothing about programming)? What type of experience
should I look for?

From: Xavier Morera <>
Sent: December 1, 2016 2:23 AM
Subject: Re: Feasability

The answer is yes, but you would need to do some programming and

On Wed, Nov 30, 2016 at 7:54 PM, Chris Manu <> wrote:

> Hello,
> I want to start off by saying that I am not a programmer...and have very
> little knowledge in this area.
> What I would like to know is whether Apache would be capable of doing the
> following:
> Take an extensive list (A) of strings of unique words (these are titles -
> anywhere from 4 words to 30) saved in either an Excel worksheet or in a
> text file and search for instances (B) where these can be found in PDF
> files saved on a hard drive (over 100k files). The search would need to be
> done using fuzzy logic rather than exact matching and the output would be
> an Excel file listing the unique string found (A), the file name in which
> the match was made (B), the page number where the match was made and the
> surrounding text on either side of it. As well, would this be a complicated
> program, or one usable by novices coached in the process necessary to input the
> title file (A) and direct the search to the relevant folder containing the
> PDF files (B)?
> I eagerly await (hopefully) an affirmative answer.
> Cheers!
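
To make the request above concrete, here is a minimal Python sketch of the matching step only. It is not Lucene or Solr code and not something anyone in this thread proposed. It assumes the PDF text has already been extracted, by some other tool, into one string per page, and it uses the standard library's difflib as a stand-in for fuzzy matching. The names best_window, find_fuzzy_matches and rows_to_csv, the 0.85 threshold and the 5-word context are all illustrative assumptions.

```python
import csv
import io
from difflib import SequenceMatcher


def best_window(title, page_text):
    """Return (score, start, end): the word window on the page most similar to the title."""
    target = " ".join(title.lower().split())
    words = page_text.split()
    n = len(target.split())
    best = (0.0, 0, 0)
    # Slide a window with as many words as the title across the page.
    for start in range(0, max(1, len(words) - n + 1)):
        window = " ".join(words[start:start + n]).lower()
        score = SequenceMatcher(None, target, window).ratio()
        if score > best[0]:
            best = (score, start, start + n)
    return best


def find_fuzzy_matches(titles, pages_by_file, threshold=0.85, context_words=5):
    """Return one row per (title, file, page) whose best window reaches the threshold.

    pages_by_file maps a file name to a list of page texts (assumed to be
    extracted from the PDFs beforehand).
    """
    rows = []
    for title in titles:
        for filename, pages in pages_by_file.items():
            for page_number, page_text in enumerate(pages, start=1):
                score, start, end = best_window(title, page_text)
                if score >= threshold:
                    words = page_text.split()
                    lo = max(0, start - context_words)
                    hi = min(len(words), end + context_words)
                    rows.append({
                        "title": title,
                        "file": filename,
                        "page": page_number,
                        "score": round(score, 2),
                        "context": " ".join(words[lo:hi]),
                    })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, which Excel can open."""
    buffer = io.StringIO()
    fields = ["title", "file", "page", "score", "context"]
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    titles = ["A Study of Fuzzy Search in Large Archives"]
    pages_by_file = {
        "report.pdf": [
            "Nothing relevant on this page at all.",
            "As noted in A Study of Fuzy Search in Large Archvies the results vary widely.",
        ]
    }
    print(rows_to_csv(find_fuzzy_matches(titles, pages_by_file)))
```

This compares every title against every page, so it only illustrates the shape of the output (title, file, page, surrounding text). For a collection on the order of 100k PDFs, a search index would likely be needed in front of a comparison like this.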


*Xavier Morera*

Entrepreneur | Author & Trainer | Consultant | Developer & Scrum Master


office:  (305) 600-4919

cel:     +506 8849-8866

skype: xmorera
Twitter | LinkedIn | Pluralsight Author
