incubator-lucy-user mailing list archives

From "Andrew S. Townley" <...@atownley.org>
Subject Re: [lucy-user] Feature question about Lucy vs. Ferret
Date Wed, 23 Feb 2011 17:14:49 GMT
Hi Peter,

Thanks for the quick reply!

On 23 Feb 2011, at 4:49 PM, Peter Karman wrote:

> Hi Andrew,
> 
> Andrew S. Townley wrote on 02/23/2011 09:51 AM:
> 
>> 
>> 1) Can lucy store multi-field "documents" a la Ferret?
> 
> Yes.

Great.

>> 
>> 2) Can lucy give me the match result information I'm looking for within each document as part of the search hit information?
>> 
> 
> For highlighting and snippet extraction? yes.

Well, actually, I want it for more than that.  For my particular needs, I need to get the
field name where the match occurred in the document, and then I'd ideally like to have the
start offset into that field and the length of the match.

This is the core information I can't get right now from Ferret.  For example (never mind the
accuracy of the information here ;):

valkyrie$ irb
>> require 'ferret'
=> true
>> include Ferret
=> Object
>> index = Index::Index.new
=> #<Ferret::Index::Index:0x101342108 @default_input_field=:id, @mon_waiting_queue=[],
@qp=nil, @default_field=:*, @key=nil, @auto_flush=false, @mon_entering_queue=[], @open=true,
@dir=#<Ferret::Store::RAMDirectory:0x101342068>, @mon_count=0, @id_field=:id, @reader=nil,
@searcher=nil, @close_dir=true, @mon_owner=nil, @writer=nil, @options={:dir=>#<Ferret::Store::RAMDirectory:0x101342068>,
:analyzer=>#<Ferret::Analysis::StandardAnalyzer:0x101341e60>, :lock_retry_time=>2,
:default_field=>:*}>
>> index << {:title => "Fred flinstone", :description => "The cartoon series" }
=> nil
>> index << {:title => "The Flinstones", :description => "Fred flinstone's family" }
=> nil
>> index.search("flinstones")
=> #<struct Ferret::Search::TopDocs total_hits=1, hits=[#<struct Ferret::Search::Hit
doc=1, score=0.254271149635315>], max_score=0.254271149635315, searcher=#<Ferret::Search::Searcher:0x101314e60>>

The Ferret::Search::Hit gives me the document number and the score, but that's it.  Whatever
form the result list actually takes, I'd also like each hit to carry the information I mentioned.
If you weren't storing the offset information, then it would make sense for it not to be
available, but if you are, then I'd expect to have all of it right there.  I can't see how
there'd be a performance issue in providing this information.

I just want to make sure we're on the same page, as this is a critical feature for what I'm
trying to do.
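To make the request concrete, here is a minimal pure-Ruby sketch of the result shape being asked for: each hit carries, for every match, the field name, the start offset into that field, and the match length. `MatchInfo`, `HitWithMatches` and `find_matches` are hypothetical names, not part of Ferret's or Lucy's API, and the naive case-insensitive whole-word regex scan only stands in for a real index lookup. Offsets are character offsets into the stored field text.

```ruby
# Hypothetical record for one match: which field, where it starts, how long.
MatchInfo = Struct.new(:field, :offset, :length)

# Hypothetical hit that carries match details alongside the doc number.
HitWithMatches = Struct.new(:doc, :matches)

# Naive, case-insensitive whole-word scan over every stored field.
# This is NOT how Ferret or Lucy locate or score hits; it only shows
# the shape of the result being asked for.
def find_matches(docs, term)
  pattern = /\b#{Regexp.escape(term)}\b/i
  hits = []
  docs.each_with_index do |fields, doc_id|
    matches = []
    fields.each do |field, text|
      # Inside the scan block, Regexp.last_match is the current match.
      text.scan(pattern) do
        m = Regexp.last_match
        matches << MatchInfo.new(field, m.begin(0), m[0].length)
      end
    end
    hits << HitWithMatches.new(doc_id, matches) unless matches.empty?
  end
  hits
end

# The same two documents as in the irb session.
docs = [
  { title: "Fred flinstone", description: "The cartoon series" },
  { title: "The Flinstones", description: "Fred flinstone's family" }
]

find_matches(docs, "flinstones").each do |hit|
  hit.matches.each do |m|
    puts "doc=#{hit.doc} field=#{m.field} offset=#{m.offset} length=#{m.length}"
  end
end
# prints: doc=1 field=title offset=4 length=10
```

Searching "flinstones" reports only doc 1, the same hit Ferret returned above, but now with the field name and position attached.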

> 
>> 3) How would you relate the completeness/stability of the core C library and Ruby bindings?
>> 
> 
> Alas, here's the rub. There are no Ruby bindings at present. The core C
> code is stable and "complete" (for some value of "complete" -- i.e. it
> works). But to date there are only Perl bindings.
> 
> I posted about this on the Ferret list a while back, inviting Ruby
> developers to come have a look and help jump-start the Ruby
> implementation. I realize your project has some immediate needs; please
> also consider hanging around and helping us define the Ruby
> implementation. Subscribe to lucy-dev to get started.


Thanks for the information; that's unfortunate.  I appreciate the offer to help out, though.
It might be a while before I have any bandwidth, but depending on how things go, Lucy might be
the best long-term solution.

While digging around Google between my original note and now, I re-read the charter for Lucy.
One of the things that struck me was the comment about "implementing as much functionality in
high-level languages as possible".  What does this mean, exactly?

Part of the reason I ask has to do with the future of my own project.  Much of what I have
now will eventually be rewritten piecemeal in C++ and then wrapped via SWIG so I can have
Ruby and Java bindings, as well as use it natively in other environments that support C/C++.
Whatever route I end up taking for full-text search would need to support the same arrangement,
since I'd actually be using it more from the C++ code than from the Ruby code.

As that statement is phrased, it seems like this wouldn't really be possible.  It also seems
like there could be a lot of duplicated effort in creating each language binding.  Why was this
approach chosen rather than putting all the muscle in the C code and providing thin wrappers,
whether via SWIG or something more hand-tailored where appropriate?

I tried to dig through the lucy SVN repository via the web UI, but I couldn't really figure
out what's there.  The code generator framework you're using is something I haven't seen before,
but at least it explains why I couldn't find the Ruby bindings! :)

Anyway, thanks for the answers.  I'm currently tinkering with the Ferret internals, since it
seems like there ought to be a way to expose what I want (it's in the explain output), but
there's a lot of code, and I'm certainly no search engine expert!

Cheers,

ast
--
Andrew S. Townley <ast@atownley.org>
http://atownley.org

