accumulo-user mailing list archives

From Russ Weeks <rwe...@newbrightidea.com>
Subject Re: Anybody ever used the HDFS NFS Gateway?
Date Wed, 07 Oct 2015 02:25:52 GMT
Hi, Dylan,

Yeah, writing RFiles instead of using BatchWriters
(AccumuloFileOutputFormat vs. AccumuloOutputFormat) for efficiency and
atomicity of ingest ("improved" atomicity if that even makes sense).

I'm thinking about the NFS gateway just because the system that's producing
the CSV is kind of a black box to me. It doesn't speak Hadoop, as
Christopher alluded to, and I can't control its output format, but I can
direct its output to a filesystem that it perceives to be local.

My options are either an NFS write direct to HDFS via the gateway, or an
NFS write to a conventional filesystem that I control, followed by some
sort of inotify-driven migration from that server to HDFS.
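
For illustration only, the second option could be sketched roughly as below. This is a minimal sketch under stated assumptions, not a tested pipeline: it polls the spool directory instead of using inotify (the standard library has no inotify binding), it treats a file as complete once its mtime has been quiet for `settle_seconds`, and it shells out to `hdfs dfs -put`. The names `settled_files`, `put_to_hdfs`, `migrate_once` and any paths passed in are hypothetical.

```python
import os
import subprocess
import time


def settled_files(spool_dir, settle_seconds, now=None):
    """Return regular files in spool_dir not modified for settle_seconds."""
    now = time.time() if now is None else now
    ready = []
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        # Assumption: a quiet mtime means the black-box producer is done writing.
        if os.path.isfile(path) and now - os.path.getmtime(path) >= settle_seconds:
            ready.append(path)
    return ready


def put_to_hdfs(local_path, hdfs_dir):
    """Copy one file into HDFS using the stock `hdfs dfs -put` command."""
    subprocess.run(["hdfs", "dfs", "-put", local_path, hdfs_dir], check=True)


def migrate_once(spool_dir, hdfs_dir, settle_seconds=60, upload=put_to_hdfs, now=None):
    """Upload each settled file; delete the local copy only after upload succeeds."""
    moved = []
    for path in settled_files(spool_dir, settle_seconds, now):
        upload(path, hdfs_dir)  # raises on failure, so the local file is kept
        os.remove(path)
        moved.append(path)
    return moved
```

Running `migrate_once` from cron or a small loop would approximate the inotify-driven behaviour; swapping the polling for a real inotify watch would change only how `settled_files` is triggered.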

-Russ

On Tue, Oct 6, 2015 at 6:12 PM Dylan Hutchison <dhutchis@uw.edu> wrote:

> Hi Russ,
>   I'm curious what you have in mind.  Are you looking for a solution more
> efficient than running clients that read the CSV files and open
> BatchWriters?
>
> Regards, Dylan
>
> On Tue, Oct 6, 2015 at 4:56 PM, Christopher <ctubbsii@apache.org> wrote:
>
>> I haven't tried it, but it sounds like a cool use case. Might be a good
>> alternative to distcp, more interoperable with tools which don't speak
>> hadoop.
>>
>> On Tue, Oct 6, 2015, 18:41 Russ Weeks <rweeks@newbrightidea.com> wrote:
>>
>>> I hope this isn't too off-topic. Any opinions re. its
>>> completeness/quality/reliability?
>>>
>>> (The use case is, CSV files -> NFS -> HDFS -> Spark -> RFiles ->
>>> Accumulo. Relevance established!)
>>>
>>> Thanks,
>>> -Russ
>>>
>>
>
