incubator-lucy-user mailing list archives

From "Odisseu21" <odisse...@gmail.com>
Subject [lucy-user] Complex search
Date Fri, 03 Feb 2012 14:56:59 GMT
I am new to Lucy and am looking for a fast and elegant search solution that is able to:

- return an HTML-highlighted excerpt around the MASTER_KEY_WORD
- match the MASTER_KEY_WORD either partially or exactly
- let me define the size of the excerpt (before and after the MASTER_KEY_WORD, perhaps
in terms of a number of words or lines)
- require optional keywords, called INC_KEY_WORD, to be present inside the excerpt, in any
order
- require optional keywords, called EXC_KEY_WORD, to be absent from the excerpt, in any
order
- combinations of INC_KEY_WORD and EXC_KEY_WORD are possible
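For illustration only, here is a minimal Python sketch of the first three requirements (an excerpt window measured in words, partial or exact matching, HTML highlighting). It does not use Lucy. It assumes that "partial" means a prefix match (apple, apples, applebees) and that excerpt size is counted in words; `extract_excerpts` and its parameters are hypothetical names.

```python
import re
from html import escape

# Tokenize on word characters; punctuation is dropped from the excerpt.
WORD_RE = re.compile(r"\w+")


def extract_excerpts(text, master, before=5, after=5, partial=True):
    """Return HTML excerpts of `before`/`after` words around each hit on `master`.

    Assumption: partial=True means any word *starting with* `master` matches;
    partial=False requires the whole word to equal `master`.
    Matching is case-insensitive.
    """
    words = WORD_RE.findall(text)
    key = master.lower()
    excerpts = []
    for i, word in enumerate(words):
        w = word.lower()
        hit = w.startswith(key) if partial else w == key
        if not hit:
            continue
        # Clamp the window to the document boundaries.
        start = max(0, i - before)
        end = min(len(words), i + after + 1)
        parts = [escape(x) for x in words[start:end]]
        # The matched word sits at position i - start inside the window.
        parts[i - start] = "<strong>" + parts[i - start] + "</strong>"
        excerpts.append(" ".join(parts))
    return excerpts
```

For example, `extract_excerpts("I ate apples and a blue bag", "apple", before=1, after=2)` yields one excerpt, `ate <strong>apples</strong> and a`. Counting in lines instead of words would only change how the window boundaries are computed.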

Example:
    apple (partial)          -> MASTER_KEY_WORD
    + (bag + blue, girl)     -> INC_KEY_WORD combo
    - (black + man, orange)  -> EXC_KEY_WORD combo

This must return excerpts around the master keyword 'apple', wherever the string 'apple'
occurs (apple, apples, applebees, ...), that contain ('bag' AND 'blue') OR 'girl',
but contain neither ('black' AND 'man') nor 'orange'.
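The combo notation in the example reads as an OR of AND-groups: ',' separates alternatives and '+' joins words that must co-occur. A minimal Python sketch of evaluating such combos against the words of one excerpt follows. The parsing rules are an interpretation of the example above, not a Lucy feature, and `parse_combo`, `combo_matches` and `excerpt_passes` are hypothetical names.

```python
def parse_combo(spec):
    """Parse e.g. "bag + blue, girl" into [["bag", "blue"], ["girl"]].

    Assumed notation: ',' separates alternatives (OR),
    '+' joins words that must all be present (AND).
    """
    return [
        [w.strip().lower() for w in alt.split("+") if w.strip()]
        for alt in spec.split(",")
        if alt.strip()
    ]


def combo_matches(words, groups):
    """True if every word of at least one AND-group occurs in `words`."""
    present = {w.lower() for w in words}
    return any(all(w in present for w in group) for group in groups)


def excerpt_passes(words, inc_spec, exc_spec):
    """The inclusion combo must match (if given); the exclusion combo must not."""
    inc = parse_combo(inc_spec) if inc_spec else []
    exc = parse_combo(exc_spec) if exc_spec else []
    if inc and not combo_matches(words, inc):
        return False
    if exc and combo_matches(words, exc):
        return False
    return True
```

With `inc_spec="bag + blue, girl"` and `exc_spec="black + man, orange"`, an excerpt containing "blue bag ... black cat ... apples" passes, because 'black' alone does not satisfy the ('black' AND 'man') exclusion group. An excerpt containing "orange" anywhere is rejected.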

Today we do this with Postgres queries and some Perl code over millions of documents, and
performance is good, for now.

Is it possible to build such an algorithm with Lucy, fast and easily, in one step?
Or would Lucy be used only to retrieve the excerpts surrounding the master keyword, with
subsequent Perl code applying the rest?
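To make the second option concrete, here is a rough Python sketch (Python for illustration, where the post would use Perl) of the post-filter stage of a two-step approach. It assumes that stage 1, an index search for the master keyword, hands back candidate documents as `(doc_id, text)` pairs. No Lucy API is shown or implied. Prefix matching and word-counted windows are assumptions, and `filter_excerpts`, `groups` and `any_group` are hypothetical names.

```python
import re
from html import escape

WORD_RE = re.compile(r"\w+")


def groups(spec):
    # "bag + blue, girl" -> [["bag", "blue"], ["girl"]]: ',' is OR, '+' is AND.
    if not spec:
        return []
    return [[w.strip().lower() for w in alt.split("+") if w.strip()]
            for alt in spec.split(",") if alt.strip()]


def any_group(present, gs):
    # True if all words of at least one group are present; False for no groups.
    return any(all(w in present for w in g) for g in gs)


def filter_excerpts(docs, master, inc="", exc="", before=5, after=5):
    """Stage 2 of a hypothetical two-step search.

    `docs` stands in for the stage-1 results: any iterable of (doc_id, text).
    Yields (doc_id, html_excerpt) for each prefix hit on `master` whose
    word window satisfies the inclusion combo and avoids the exclusion combo.
    """
    key, inc_g, exc_g = master.lower(), groups(inc), groups(exc)
    for doc_id, text in docs:
        words = WORD_RE.findall(text)
        for i, word in enumerate(words):
            if not word.lower().startswith(key):
                continue
            lo, hi = max(0, i - before), min(len(words), i + after + 1)
            window = words[lo:hi]
            present = {w.lower() for w in window}
            # INC/EXC are checked only inside the excerpt window.
            if inc_g and not any_group(present, inc_g):
                continue
            if any_group(present, exc_g):
                continue
            parts = [escape(w) for w in window]
            parts[i - lo] = "<strong>" + parts[i - lo] + "</strong>"
            yield doc_id, " ".join(parts)
```

Because the keyword checks are restricted to the window, a document such as "a girl with apples and then later an orange" yields an excerpt when `before=2, after=1` (the 'orange' falls outside the window) but not when the window is wide enough to include it. Whether stage 1 can already narrow candidates using the INC/EXC words depends on the index and query features available, which this sketch does not assume.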

thanks,
odisseu21
-----------------------------
"...Sed ut perspiciatis unde omnis iste natus error **EXC_KEY_WORD1** voluptatem accusantium doloremque laudantium,
totam rem aperiam, eaque **INC_KEY_WORD3** quae ab illo inventore EXC_KEY_WORD2 et quasi architecto beatae vitae dicta
sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit,
sed quia **INC_KEY_WORD2** magni **MASTER_KEY_WORD** eos qui **INC_KEY_WORD1** voluptatem sequi nesciunt.
Neque porro quisquam est, qui dolorem ipsum quia dolor sit **EXC_KEY_WORD3**, consectetur, adipisci velit,
sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem..."