incubator-lucy-user mailing list archives

From "Andrew S. Townley" <...@atownley.org>
Subject Re: [lucy-user] Feature question about Lucy vs. Ferret
Date Sat, 26 Feb 2011 13:35:44 GMT

On 25 Feb 2011, at 8:03 PM, Nathan Kurz wrote:

> On Fri, Feb 25, 2011 at 8:12 AM, Marvin Humphrey <marvin@rectangular.com> wrote:
>>  At the end of a search, you will only have documents and scores --
>> not sophisticated metadata about what part of the subquery matched and what
>> parts didn't and how much each matching part contributed to the score.
>> Keeping track of such metadata during the matching phase would be
>> prohibitively expensive.
> 
> It's only prohibitive if you don't need that data.  If you actually need
> it (as Andrew seems to), and are going to do it in post-processing
> anyway, it's just the cost of doing business.

Exactly.  Fulltext is only one of several indexing/searching mechanisms I have, and 99% of
the time, the only reason I'm going to use the fulltext index is to display the results to
humans.  Thanks to Web search engines, users have certain expectations of being able to see
the highlighted matches, so that's the standard use case I have.

I'm happy to have the option of setting a :your_performance_will_suck_do_you_really_want_to_do_this
flag to :yes_damnit in order to get the results I want, but I'd prefer to have the API for
dealing with the results as straightforward as possible -- oh, yeah, and I'll hardly ever
be storing the information being queried in the index itself as I've already got a place for
it to live, and it needs to be available to the other indexing methods too.
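
To make the use case concrete, here is a minimal sketch of the post-processing I have in mind, assuming the search returns only (doc_id, score) pairs and the text lives outside the index. The names DOCUMENT_STORE, highlight, and render_hits are hypothetical and are not part of Lucy or Ferret.

```python
import re

# Hypothetical external store: the index holds only doc IDs, not the text.
DOCUMENT_STORE = {
    "doc1": "Lucy is a fulltext search engine library.",
    "doc2": "Ferret is a search library for Ruby.",
}

def highlight(text, terms, pre="<b>", post="</b>"):
    """Wrap whole-word, case-insensitive matches of any term in pre/post."""
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: pre + m.group(0) + post, text)

def render_hits(hits, terms, store=DOCUMENT_STORE):
    """hits: iterable of (doc_id, score) pairs, as a search might return."""
    return [
        (doc_id, score, highlight(store[doc_id], terms))
        for doc_id, score in hits
    ]
```

This only re-matches literal terms against the fetched text; it does not know which part of a subquery matched, which is exactly the metadata under discussion.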

> My kick has been about making it easy to swap in non-TF/IDF scorers.
> I think part of doing so will be adding greater room for scratch data
> to Hits returned. My canonical example is that I want it to be
> possible to do alphabetical sorting of Hits by a category field.   At
> some point you need a collector that can see field values, which if
> you squint right is just a special case of what Andrew wants.
> 
> While I can see the argument that this is traditionally not the way
> that TF/IDF systems work, it's this potential for search/database
> hybridization that makes Lucy so attractive to me.
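
Nathan's sorting example might look roughly like the sketch below: a collector that records a field value as scratch data on each hit and then sorts alphabetically by it. Hit, FieldSortCollector, and the field_reader callable are all hypothetical names for illustration, not Lucy's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Hit:
    doc_id: int
    score: float
    # Room for extra per-hit data, e.g. field values seen at collection time.
    scratch: dict = field(default_factory=dict)

class FieldSortCollector:
    """Hypothetical collector that records one field value with each hit."""

    def __init__(self, field_name, field_reader):
        self.field_name = field_name
        # Assumed callable: (doc_id, field_name) -> field value.
        self.field_reader = field_reader
        self.hits = []

    def collect(self, doc_id, score):
        hit = Hit(doc_id, score)
        hit.scratch[self.field_name] = self.field_reader(doc_id, self.field_name)
        self.hits.append(hit)

    def sorted_hits(self):
        # Alphabetical by field value; ties broken by descending score.
        return sorted(
            self.hits,
            key=lambda h: (h.scratch[self.field_name], -h.score),
        )
```

The same scratch slot could carry match details instead of a category value, which is the sense in which the two requests overlap.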


Not knowing that much about TF/IDF systems, all I can agree with is the part about the fulltext/other
indexing hybrid approach being an essential part of information management in the future.

People thought things like Datablades/ORDBMS didn't make sense in RDBMS systems either until
vendors proved that you could essentially have your cake and eat it too from a performance
and flexibility perspective.  I see this as just part of the evolution of search technology
on the basis of realizations that the closed-world view of systems is a vestige of the past.
Given the potential of Lucy's approach, it seems sensible to try to design for
the future here too.

Again, thanks for all the discussion and information.

Cheers,

ast
--
Andrew S. Townley <ast@atownley.org>
http://atownley.org

