From: Peter Karman
Reply-To: peter@peknet.com
Date: Tue, 07 Feb 2012 21:08:08 -0600
To: lucy-user@incubator.apache.org
Subject: Re: [lucy-user] Complex search

Odisseu21 wrote on 2/3/12 8:56 AM:
> I am new to Lucy and looking for fast and elegant search solutions that are able to:
>
> - return an excerpt, HTML highlighted, around the MASTER_KEY_WORD
> - MASTER_KEY_WORD could be matched partially or not
> - it must be possible to define the size of the excerpt (before and after the MASTER_KEY_WORD, maybe in terms of number of words or lines)
> - optional keywords, called INC_KEY_WORD, must be present inside the excerpt, no matter the order
> - optional keywords, called EXC_KEY_WORD, must not be present inside the excerpt, no matter the order
> - combinations of INC_KEY_WORD and EXC_KEY_WORD are possible
>
> Example:
> apple (partial) -> MASTER_KEY_WORD
> + (bag + blue, girl) -> INC_KEY_WORD combo
> - (black + man, orange) -> EXC_KEY_WORD combo
>
> must return excerpts in which the string 'apple' exists (apple, apples, applebees, ...)
> and ('bag' AND 'blue') or 'girl'
> but not ('black' AND 'man') or 'orange' surrounding the master keyword 'apple'
>
> Today we are using Postgres queries and some Perl code to do that in millions of docs. We have good performance, for now.
>
> Is it possible to build such an algorithm using Lucy? Fast and easy, in one step?
> Or maybe Lucy will be used just to retrieve the excerpt surrounding the master keyword, with subsequent Perl code to apply the rest?

You can do most of the above with Lucy, though not in one step. Some post-processing for the INC_ and EXC_ key words would be necessary.

I use Search::Tools plus Lucy for this kind of thing, since Search::Tools will let me highlight and excerpt from the original document as well.

-- 
Peter Karman . http://peknet.com/ . peter@peknet.com
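The INC_/EXC_ post-processing step described in the reply could look roughly like the following. This is a minimal, hypothetical sketch in Python, not Lucy's or Search::Tools' API (neither is used or reproduced here). It assumes you already have the text of each document the search returned, splits that text on whitespace, treats MASTER_KEY_WORD as a case-insensitive prefix match, measures the excerpt size in words, and represents each keyword combo as a list of AND-groups, so "bag + blue, girl" becomes `[["bag", "blue"], ["girl"]]`. The names `excerpts` and `any_group_present` are illustrative only.

```python
import html
import re

NON_WORD = re.compile(r"\W+")  # strips punctuation, e.g. "Apples," -> "Apples"


def _norm(token):
    """Lowercase a whitespace-delimited token and drop punctuation."""
    return NON_WORD.sub("", token).lower()


def any_group_present(groups, words):
    """OR of ANDs: True if every term of at least one group is in `words`."""
    return any(all(term.lower() in words for term in group) for group in groups)


def excerpts(text, master, inc, exc, window=5):
    """Return HTML excerpts of `window` words on each side of every word that
    starts with `master` (prefix match, so "apple" also hits "applebees").

    `inc` and `exc` are lists of AND-groups: [["bag", "blue"], ["girl"]]
    means ('bag' AND 'blue') OR 'girl'. An empty `inc` imposes no requirement.
    """
    master = master.lower()
    tokens = text.split()
    norm = [_norm(t) for t in tokens]
    results = []
    for i, word in enumerate(norm):
        if not word.startswith(master):
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        words = set(norm[lo:hi])
        if inc and not any_group_present(inc, words):
            continue  # no INC_KEY_WORD combo is inside this window
        if any_group_present(exc, words):
            continue  # some EXC_KEY_WORD combo is inside this window
        parts = []
        for tok, n in zip(tokens[lo:hi], norm[lo:hi]):
            tok = html.escape(tok)  # escape before adding highlight markup
            parts.append(f"<strong>{tok}</strong>" if n.startswith(master) else tok)
        results.append(" ".join(parts))
    return results
```

With the example combos from the question, `excerpts("one girl ate Apples, today x x x x orange apple pie", "apple", [["bag", "blue"], ["girl"]], [["black", "man"], ["orange"]], window=2)` returns `["girl ate <strong>Apples,</strong> today x"]`: the second "apple" is dropped because its window has neither "girl" nor both "bag" and "blue". Nearby hits each produce their own, possibly overlapping, excerpt; merging them, stemming, and line-based windows are left out. The idea is that Lucy narrows the candidate documents first and this filter only runs on its hits.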