incubator-lucy-dev mailing list archives

From "David Balmain"
Subject Re: Charmonizer
Date Thu, 19 Oct 2006 07:18:03 GMT
On 10/19/06, Marvin Humphrey wrote:
> On Oct 18, 2006, at 8:03 PM, David Balmain wrote:
> > For Ruby I can use rake, the make alternative. But I'm thinking about
> > Ferret at the moment.
> Forgive me, I don't understand why you make the distinction in that
> sentence between "Ruby" and "Ferret".  Is there a reason you could
> use rake with Lucy but not Ferret?

Sorry, I definitely wasn't very clear. I just don't want the straight C
code in Ferret to have a dependency on Ruby. As far as building Ferret
with Ruby bindings goes, I already use rake, so there is no problem.

> >> I can spec extra flags to CBuilder's compile() function if it turns out
> >> to be necessary.  However, CBuilder, by default, passes the same set
> >> of flags that were used when compiling the Perl executable (which are
> >> archived, along with a zillion other settings from Perl's Configure
> >> script, in the Config module).  On a RedHat 9 box I have access to,
> >> those flags include -D_LARGEFILE_SOURCE and -D_FILE_OFFSET_BITS=64,
> >> and I'm assuming that other Perl installations where LFS isn't the OS
> >> default also spec flags rather than defining macros within individual
> >> source files.
> >
> > Unfortunately these values are defined as macros in Ruby.
> Could we build a custom Charmonizer probe for Ruby then?
> static char ruby_largefiles_code[] = METAQUOTE
>      #include "ruby.h" /* or whatever the file is */
>      #include "_charm.h"
>      int main() {
>          Charm_Setup;
>          printf("%d", (int)sizeof(off_t));
>          return 0;
>      }
> METAQUOTE;

Good idea but I think I'll have to work on this.

#include <stdio.h>
#include "ruby.h" /* or whatever the file is */
int main() {
    printf("%d\n", _FILE_OFFSET_BITS);
    printf("%d\n", (int)sizeof(off_t));
    return 0;
}


:( I guess we could just check whether _FILE_OFFSET_BITS is defined
and equal to 64.

> > Any reason the native language needs to support LFS? If all access to
> > the index files is through Lucy, it shouldn't matter right?
> There are two levels of support we need to consider: whether the host
> language was compiled using LFS, and whether LFS is available at
> all.  I definitely want to avoid supporting systems that can't deal
> with large files at all because I don't want to have to think about
> how many bytes a file pointer might have every time I see one.  File
> pointers in Lucy should be 64-bit integers.  Period.
> As for the case where the host language may not support LFS, we
> might get away with it, but I'm not a big fan of the idea, because
> LFS bugs are really hard to test for and only bite you when you've
> already got a lot going on.  And stuff can hide in funny places like
> that stat() call example.
> We should make Charmonizer's implementation fail-safe, regardless.
> We can add a LargeFiles_try_macros() function which adds those
> #defines to the probe code.  We can start off just with
> _LARGEFILE_SOURCE and _FILE_OFFSET_BITS=64, getting into the more
> esoteric #defines if we get failure reports.
> How many Ruby installs are there without LFS?   I'd be shocked if
> there were more than a handful of old and decrepit ones.  Should we
> support old versions?  I don't think Ferret is, and I'd prefer not
> to.  KinoSearch supports only Perl 5.8.3 and later.

Not many, if any, on *nix-based systems, but I'm not sure about Windows.
The standard Ruby build on Windows doesn't have large file support.

> I propose that we probe for LFS in Ruby and bomb out if it's not
> there.  Then we add LargeFiles_try_macros() to ./charmonize and
> define -DLUCY_RUBY as a flag to enable it when compiling charmonize.c.
> #ifdef LUCY_RUBY
>      LargeFiles_try_macros();
> #endif
>      LargeFiles_run(conf_fh);
> > One other thing. Have you thought about detecting dirent.h in
> > charmonizer?
> We could add a Dirent module to Charmonizer, but I'm not sure I see
> immediate benefits.  We'll definitely need dirent.h for Lucy, because
> we need a way to list the contents of an FSDirectory/FSStore/
> FSInvIndex.  Fortunately, dirent.h is widely available.  Building
> Perl actually requires that it be available -- it's one of the few
> non-ANSI C modules Perl can't live without.

Well, unfortunately it's not available on VC6, which I need to use to
compile Ruby extensions. This is a bit of an issue in the Ruby
community at the moment.

> The thing is, the behavior of dirent.h is predictable enough for our
> purposes.  Some systems provide d_namlen as a struct member, but
> others don't, so if you want to write portable code you use
> strlen(entry->d_name).  I think that's the end of the story, isn't it?  We
> absolutely must have dirent.h, and we can write portable code for it
> without needing the sort of pre-compile-time probing Charmonizer
> provides.  We don't need to worry about other struct members that may
> or may not be there, and a couple of calls to strlen() on filenames
> won't be a performance concern.
> The only thing I can think of is whether readdir_r, the reentrant
> version of readdir, is always available.  That's something I don't
> know.  But I don't see anything in the AutoConf documentation about
> it, so I'd gather it's always there.
> I think we're closing in on the feature set Lucy needs Charmonizer to
> supply.  It'd be sorta nice to detect non-IEEE floats so we could
> throw a meaningful error at compile-time rather than just fail
> Similarity's tests on encode_norm/decode_norm.  But I don't think
> it's worth the effort since those systems are so rare, and I'm going
> to back-burner that one.
> Filepath handling is the one big feature left I think we ought to put
> in Charmonizer.  That sounds ambitious, but it doesn't have to be.
> Lucy basically only needs to know what the directory separator is,
> because all it ever needs to do is concatenate the filename onto the
> index directory.  Directory names ought to be normalized to full
> filepaths, but such paths are always going to have to be supplied by
> the user at the native level, so we can rely upon native routines for
> normalization.
> Since Charmonizer is only serving one master for now, its FilePath
> module can be cheesy and only supply one constant macro, DIR_SEP.

That sounds fine to me.

> > Are we going to need any directory reading functions in
> > Lucy? I use it to clear the directory when the IndexWriter create flag
> > is set to true but I guess this isn't really necessary.
> You also need it when you read an index which resides on the
> filesystem into a RAMDirectory/RAMStore/RAMInvIndex.

True, but we could simply use the segments file to see what files are
available. I guess it wasn't much code to make the dirent stuff I
needed available in VC6.
