lucy-user mailing list archives

From serkanmulayim@gmail.com <serkanmula...@gmail.com>
Subject Re: [lucy-user] C library - RAM index serialization/deserialization
Date Tue, 03 Apr 2018 17:26:57 GMT
Hi Nick,

Thank you very much for your response. That is exactly where I was stuck. I need to create
an entry for the folder with the CFReader, but there are no internal functions supporting
that. The Folder implementation fails because it checks for cfmeta.json.

Secondly, it does not seem that I can link a CFReader (Folder*) as an entry in the enclosing
folder, since as far as I can see there is no function to link a folder inside another
folder. Can you confirm?

If the above is not an option, I was thinking of subclassing RAMFolder. I do not think I need
to create a .cfh file for that; instead, I will create my own Folder implementation with a
static header file. I think this is the right approach (unless there is a better approach
that does not require subclassing, e.g. using InStreams).

Thanks again,
Serkan

On 2018/04/03 13:26:43, Nick Wellnhofer <wellnhofer@aevum.de> wrote: 
> On 02/04/2018 19:52, serkanmulayim@gmail.com wrote:
> > I realized that it is not possible to serialize the RAM index directly to bytes.
> > This is why I made some tests to copy all files in the RAM folder to an FS folder
> > by iterating over the contents of the RAM folder.
> 
> Yes, you have to inspect the RAMFolder using the private API of Folder, 
> FileHandle, InStream, etc. It's documented in the .cfh files in 
> core/Lucy/Store but it seems that you already figured this out.
> 
>      https://git1-us-west.apache.org/repos/asf?p=lucy.git;a=tree;f=core/Lucy/Store
> 
> > Why do these virtual files exist? (I suspect it is for optimization purposes,
> > e.g. so that the cfmeta.json file is not read over and over again, and that the
> > virtual files hold the cfmeta.json values.) So my question is: is it possible to
> > create the virtual files from the cfmeta.json values with an API call? Or do you
> > have any other suggestions?
> 
> These are so-called "compound files" used to consolidate multiple files into a 
> single one and reduce the number of open file handles. On the filesystem 
> level, there are two files cfmeta.json and cf.dat but Lucy's Store API 
> automatically returns information about the virtual files. If you want to 
> treat compound files as regular files, you have to check the Folder objects 
> returned by Folder_Find_Folder. If it's a Lucy::Store::CompoundFileReader, 
> call CFReader_Get_Real_Folder to get the actual RAMFolder or FSFolder:
> 
>      if (Folder_is_a(subfolder, COMPOUNDFILEREADER)) {
>          CompoundFileReader *cf_reader = (CompoundFileReader*)subfolder;
>          subfolder = CFReader_Get_Real_Folder(cf_reader);
>      }
> 
> After deserializing cfmeta.json and cf.dat into a RAMFolder, you'll have to 
> recreate the CFReaders and replace the entry in the enclosing folder. Have a 
> look at Folder_Consolidate to get the idea. But the `entries` hash isn't 
> exposed, so you probably can't do that without changes to the Lucy source code.
> 
> As an alternative, you could try to change Lucy's behavior to not create 
> compound files for RAMFolders at all. Subclassing RAMFolder and making 
> Folder_Consolidate a no-op should work.
> 
> Nick
> 
