From: Chris Manu
To: general@lucene.apache.org
Subject: An feasibility question
Date: Fri, 7 Nov 2014 16:36:21 +0000
Hello,

I apologize for taking up your time. I am not trained in this area, but someone suggested that this software could do what I need, and I would like to ask whether it can.

I need to match a series of titles (currently over 40,000), each contained in its own cell in a worksheet, against the contents of rich documents (e.g. Word, PDF). The search process would need to be automated, since there are many thousands of titles and numerous documents. The matching would need to be "fuzzy", because the wording may vary slightly, for example in punctuation or through a misused preposition.

For each match, the software would record its relevance (e.g. a percentage score), the name of the document, and the page number where the match was found. This information would be saved in a format that Excel can open. Since there are likely to be multiple matches within the same document or across documents, each match for each title would have its own row.

I would appreciate your assistance and look forward to your reply.

Cheers!