accumulo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Running Accumulo on a standard file system, without Hadoop
Date Tue, 17 Jan 2017 15:59:15 GMT
On Mon, Jan 16, 2017 at 5:53 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>
> Dylan Hutchison wrote:
>>> You can configure HDFS to use the RawLocalFileSystem class for file://
>>> URIs, which is what is done for a majority of the integration tests.
>>> Beware that you must configure the RawLocalFileSystem, as the
>>> ChecksumFileSystem (default for file://) will fail miserably around
>>> WAL recovery.
>>>
>>> https://github.com/apache/accumulo/blob/master/test/src/main
>>> /java/org/apache/accumulo/test/BulkImportVolumeIT.java#L61
>>>
>>
>> Hi Josh, are you saying that the ChecksumFileSystem is required or
>> forbidden for WAL recovery?  Looking at the Hadoop code it seems that
>> LocalFileSystem wraps around a RawLocalFileSystem to provide checksum
>> capabilities.  Is that right?
>>
>
> Sorry I wasn't clearer: forbidden. If you use the RawLocalFileSystem, you
> should not see any issues. If you use the ChecksumFileSystem (which is the
> default), you *will* see issues.
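The configuration being discussed can be sketched as a core-site.xml override. This is a hedged sketch: `fs.file.impl` is the standard Hadoop property that selects the FileSystem implementation for file:// URIs, but verify the property against the Hadoop version in use.

```xml
<!-- core-site.xml fragment (sketch): map file:// URIs to the
     RawLocalFileSystem instead of the default checksummed
     LocalFileSystem, which breaks WAL recovery as described above. -->
<property>
  <name>fs.file.impl</name>
  <value>org.apache.hadoop.fs.RawLocalFileSystem</value>
</property>
```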

The ChecksumFileSystem does nothing for flush; that's why there are WAL
problems.  The RawLocalFileSystem pushes data to the OS (which may
buffer it in memory for a short period) when flush is called.  However,
RawLocalFileSystem does not offer a way to force data to disk.  So
with RawLocalFileSystem you can restart Accumulo processes w/o losing
data.  However, if the OS is restarted, then data may be lost.
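The flush-versus-force-to-disk distinction above is the same one the POSIX layer exposes. As a small illustration (a generic sketch, not Accumulo or Hadoop code): `flush()` hands data to the OS page cache, which survives a process restart but not an OS restart, while `os.fsync()` forces the data onto the disk itself.

```python
import os
import tempfile

# Write a pretend WAL entry to a temp file.
path = os.path.join(tempfile.mkdtemp(), "wal")
f = open(path, "w")
f.write("entry-1\n")

# flush(): data moves from the process buffer into OS buffers.
# This is analogous to RawLocalFileSystem's flush -- safe across a
# process restart, but lost if the OS goes down before writeback.
f.flush()

# fsync(): the OS is forced to push the data to durable storage.
# RawLocalFileSystem offers no way to request this.
os.fsync(f.fileno())
f.close()

print(open(path).read())  # → entry-1
```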
