lucene-solr-user mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: >1MB file to Zookeeper
Date Sat, 05 May 2012 12:39:05 GMT
ZK is not really designed for keeping large data files. From http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#Data+Access:
> ZooKeeper was not designed to be a general database or large object store.....If large
> data storage is needed, the usually pattern of dealing with such data is to store it on a
> bulk storage system, such as NFS or HDFS, and store pointers to the storage locations in ZooKeeper.

So perhaps we should think about adding K/V store support to ResourceLoader? If a file is
larger than 1 MB, a reference to the file is stored in ZK under the original resource name,
in a way that lets ResourceLoader tell that it is a reference and not the complete file. We
then make a simple LargeObjectStoreInterface (with get/put/del) which ResourceLoader uses to
fetch the complete file based on the reference. To start with we can make a
ZkLargeFileStoreImpl where the put(key,val) method chops up the file and stores it across
multiple 1 MB ZK nodes, and the get(key) method assembles all the parts and returns the
object. It would be good enough for most, but if you require something better you can easily
implement support for CouchDB, Voldemort or whatever.
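
To make the idea concrete, here is a rough sketch of the chunking part. It is only an
illustration: the interface name is shortened, the "LOBREF:" marker, the "/part-N" child
naming and the isReference() helper are all made up for this example, and a plain HashMap
stands in for the ZK nodes so the snippet runs on its own. A real ZkLargeFileStoreImpl would
write each chunk to a ZK node instead and would have to deal with partial writes and stale
parts.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical get/put/del interface, modelled on the proposal above. */
interface LargeObjectStore {
    String put(String key, byte[] value); // returns the reference to store under the resource name
    byte[] get(String reference);
    void del(String reference);
}

/** Splits values into fixed-size chunks. A HashMap stands in for ZK nodes (path -> data). */
class ChunkedStore implements LargeObjectStore {
    static final String REF_PREFIX = "LOBREF:"; // assumed marker, not an existing Solr convention

    private final Map<String, byte[]> nodes = new HashMap<>();
    private final int chunkSize;

    ChunkedStore(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    /** True if the bytes found under the original resource name are a reference, not the file. */
    static boolean isReference(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).startsWith(REF_PREFIX);
    }

    @Override
    public String put(String key, byte[] value) {
        int parts = (value.length + chunkSize - 1) / chunkSize; // ceiling division
        for (int i = 0; i < parts; i++) {
            int from = i * chunkSize;
            int to = Math.min(from + chunkSize, value.length);
            nodes.put(key + "/part-" + i, Arrays.copyOfRange(value, from, to));
        }
        // Reference layout: LOBREF:<number of parts>:<key>
        return REF_PREFIX + parts + ":" + key;
    }

    @Override
    public byte[] get(String reference) {
        String[] ref = parse(reference);
        int parts = Integer.parseInt(ref[0]);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < parts; i++) {
            byte[] chunk = nodes.get(ref[1] + "/part-" + i);
            if (chunk == null) throw new IllegalStateException("missing part " + i);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    @Override
    public void del(String reference) {
        String[] ref = parse(reference);
        int parts = Integer.parseInt(ref[0]);
        for (int i = 0; i < parts; i++) {
            nodes.remove(ref[1] + "/part-" + i);
        }
    }

    /** Splits a reference into { part count, key }. */
    private static String[] parse(String reference) {
        if (!reference.startsWith(REF_PREFIX)) {
            throw new IllegalArgumentException("not a reference: " + reference);
        }
        String rest = reference.substring(REF_PREFIX.length());
        int sep = rest.indexOf(':');
        if (sep < 0) throw new IllegalArgumentException("malformed reference: " + reference);
        return new String[] { rest.substring(0, sep), rest.substring(sep + 1) };
    }
}
```

ResourceLoader would then read the node under the original name, call something like
isReference() on the bytes, and either return them directly or hand the reference to the
store's get(). Swapping in another backend only means writing another implementation of the
interface.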

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 4. mai 2012, at 19:09, Yonik Seeley wrote:

> On Fri, May 4, 2012 at 12:50 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>> And how should we detect if data is compressed when
>>> reading from ZooKeeper?
>> 
>> I was thinking we could somehow use file extensions?
>> 
>> eg synonyms.txt.gzip - then you can use different compression algs depending on the
>> ext, etc.
>> 
>> We would want to try and make it as transparent as possible though...
> 
> At first I thought about adding a marker to the beginning of a file, but
> file extensions could work too, as long as the resource loader made it
> transparent (i.e. code would just need to ask for synonyms.txt, but the
> resource loader would search for synonyms.txt.gzip, etc, if the original
> name was not found)
> 
> Hmmm, but this breaks down for things like watches - I guess that's
> where putting the encoding inside the file would be a better option.
> 
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10

