lucy-user mailing list archives

From Nick Wellnhofer <wellnho...@aevum.de>
Subject Re: [lucy-user] C library - RAM index serialization/deserialization
Date Tue, 03 Apr 2018 13:26:43 GMT
On 02/04/2018 19:52, serkanmulayim@gmail.com wrote:
> I realized that it is not possible to serialize the RAM index directly
> to bytes. This is why I made some tests to copy all files in the RAM
> folder to an FS folder by iterating over the contents of the RAM folder.

Yes, you have to inspect the RAMFolder using the private API of Folder, 
FileHandle, InStream, etc. It's documented in the .cfh files in 
core/Lucy/Store, but it seems you've already figured this out.

     https://git1-us-west.apache.org/repos/asf?p=lucy.git;a=tree;f=core/Lucy/Store
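For the original goal of getting the whole index into a single byte string, the files collected this way still need some container format. A minimal, self-contained sketch of one possibility follows. It assumes the file names and contents have already been read out of the RAMFolder (that part depends on the private Store API and isn't shown), and `MemFile`, `pack_files`, `unpack_files` and `free_files` are hypothetical helpers, not part of Lucy. The layout (a file count, then for each file its name length, name, data length and data, with every length stored as a 64-bit little-endian integer) is an arbitrary choice for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory file (not a Lucy type): a path plus its raw bytes. */
typedef struct {
    char    *name;
    uint8_t *data;
    size_t   len;
} MemFile;

/* Write v as 8 little-endian bytes at buf + *pos and advance *pos. */
static void put_u64(uint8_t *buf, size_t *pos, uint64_t v) {
    for (int i = 0; i < 8; i++) {
        buf[(*pos)++] = (uint8_t)(v >> (8 * i));
    }
}

/* Read 8 little-endian bytes at buf + *pos into *out and advance *pos.
 * Returns 0 without reading if fewer than 8 bytes remain. */
static int get_u64(const uint8_t *buf, size_t buf_len, size_t *pos, uint64_t *out) {
    if (buf_len - *pos < 8) return 0;
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)buf[*pos + i] << (8 * i);
    }
    *pos += 8;
    *out = v;
    return 1;
}

/* Pack files as: count, then per file: name length, name, data length, data.
 * Returns a malloc'd buffer (its size in *out_len), or NULL if malloc fails. */
static uint8_t *pack_files(const MemFile *files, size_t count, size_t *out_len) {
    size_t total = 8;
    for (size_t i = 0; i < count; i++) {
        total += 8 + strlen(files[i].name) + 8 + files[i].len;
    }
    uint8_t *buf = malloc(total);
    if (buf == NULL) return NULL;
    size_t pos = 0;
    put_u64(buf, &pos, count);
    for (size_t i = 0; i < count; i++) {
        size_t name_len = strlen(files[i].name);
        put_u64(buf, &pos, name_len);
        memcpy(buf + pos, files[i].name, name_len);
        pos += name_len;
        put_u64(buf, &pos, files[i].len);
        if (files[i].len > 0) memcpy(buf + pos, files[i].data, files[i].len);
        pos += files[i].len;
    }
    *out_len = total;
    return buf;
}

/* Free an array of files produced by unpack_files. */
static void free_files(MemFile *files, size_t count) {
    for (size_t i = 0; i < count; i++) {
        free(files[i].name);
        free(files[i].data);
    }
    free(files);
}

/* Inverse of pack_files. Every length is checked against buf_len, so
 * truncated or malformed input yields NULL instead of an out-of-bounds read. */
static MemFile *unpack_files(const uint8_t *buf, size_t buf_len, size_t *out_count) {
    size_t pos = 0;
    uint64_t count, n;
    if (!get_u64(buf, buf_len, &pos, &count)) return NULL;
    /* Each file needs at least 16 bytes of length fields; reject absurd counts. */
    if (count > (buf_len - pos) / 16) return NULL;
    MemFile *files = calloc(count > 0 ? count : 1, sizeof(MemFile));
    if (files == NULL) return NULL;
    for (size_t i = 0; i < count; i++) {
        if (!get_u64(buf, buf_len, &pos, &n) || n > buf_len - pos) goto fail;
        files[i].name = malloc(n + 1);
        if (files[i].name == NULL) goto fail;
        memcpy(files[i].name, buf + pos, n);
        files[i].name[n] = '\0';
        pos += n;
        if (!get_u64(buf, buf_len, &pos, &n) || n > buf_len - pos) goto fail;
        files[i].data = malloc(n > 0 ? n : 1);
        if (files[i].data == NULL) goto fail;
        if (n > 0) memcpy(files[i].data, buf + pos, n);
        files[i].len = n;
        pos += n;
    }
    *out_count = count;
    return files;
fail:
    free_files(files, count);  /* calloc zeroed unused slots, so free(NULL) is harmless */
    return NULL;
}
```

Turning the unpacked entries back into a RAMFolder is left out here as well.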

> Why are there these virtual files? (I suspect that the virtual files
> hold the cfmeta.json values for optimization purposes, e.g. to avoid
> reading the cfmeta.json file over and over.) So my question is: is it
> possible to create the virtual files from the cfmeta.json values with an
> API call? Or do you have any other suggestions?

These are so-called "compound files", which consolidate multiple files into a 
single one to reduce the number of open file handles. At the filesystem 
level, there are only two files, cfmeta.json and cf.dat, but Lucy's Store API 
automatically reports the virtual files they contain. If you want to 
treat compound files as regular files, you have to check the Folder objects 
returned by Folder_Find_Folder. If one is a Lucy::Store::CompoundFileReader, 
call CFReader_Get_Real_Folder to get the actual RAMFolder or FSFolder:

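     /* `subfolder` is a Folder* returned by Folder_Find_Folder. A compound
      * file shows up as a CompoundFileReader; unwrap it to reach the
      * actual RAMFolder or FSFolder. */
     if (Folder_is_a(subfolder, COMPOUNDFILEREADER)) {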
         CompoundFileReader *cf_reader = (CompoundFileReader*)subfolder;
         subfolder = CFReader_Get_Real_Folder(cf_reader);
     }
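Conceptually, a compound file is one data blob (cf.dat) plus a table (cfmeta.json) recording where each consolidated file starts and how long it is, so a virtual file is just a slice of cf.dat. The self-contained C sketch below models only that lookup. `VirtualFileEntry` and `virtual_file_slice` are hypothetical names, not Lucy API, and the assumption that cfmeta.json stores an offset and a length per file should be checked against Lucy's CompoundFileWriter and CompoundFileReader sources before relying on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry (not a Lucy type), modeled on the per-file
 * offset and length assumed to be recorded in cfmeta.json. */
typedef struct {
    const char *name;
    uint64_t    offset;
    uint64_t    length;
} VirtualFileEntry;

/* Resolve a virtual file name to a slice of the cf.dat bytes.
 * Returns a pointer into cf_dat (no copy) and stores the slice length in
 * *out_len, or returns NULL if the name is unknown or the entry would
 * point outside the buffer. */
static const uint8_t *
virtual_file_slice(const uint8_t *cf_dat, size_t cf_dat_len,
                   const VirtualFileEntry *entries, size_t num_entries,
                   const char *name, size_t *out_len) {
    for (size_t i = 0; i < num_entries; i++) {
        if (strcmp(entries[i].name, name) != 0) continue;
        const VirtualFileEntry *e = &entries[i];
        if (e->offset > cf_dat_len || e->length > cf_dat_len - e->offset) {
            return NULL;  /* corrupt entry: slice would overrun cf.dat */
        }
        *out_len = (size_t)e->length;
        return cf_dat + e->offset;
    }
    return NULL;
}
```

In this model every virtual file is served from the same cf.dat buffer, which mirrors why compound files cut down on open file handles.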

After deserializing cfmeta.json and cf.dat into a RAMFolder, you'll have to 
recreate the CFReaders and replace the entry in the enclosing folder. Have a 
look at Folder_Consolidate to get the idea. However, the `entries` hash isn't 
exposed, so you probably can't do that without changing the Lucy source code.

As an alternative, you could try to change Lucy's behavior so that it doesn't 
create compound files for RAMFolders at all. Subclassing RAMFolder and 
overriding Folder_Consolidate with a no-op should work.

Nick
