flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Persist streams of data
Date Mon, 29 Sep 2014 16:21:32 GMT
Thanks Fabian for the support. See inline for answers:

On Mon, Sep 29, 2014 at 6:12 PM, Fabian Hueske <fhueske@apache.org> wrote:

> Hi,
>
> the right answer depends on (at least) two aspects:
>
> a) Do you have an actual streaming case or is it batch, i.e., does the
> data come from a potentially infinite stream or not? This basically
> determines the system to handle your data.
>   - Stream: I don't have much experience here, but Flink's new
> Streaming feature, Kafka or Flume might be worth looking at.
>   - Batch: A regular Flink job might work.
>

Stream: the triples are generated by an external program, in batches of some size.

> b) How do you want to access your data? This influences the format to store
> the data.
>       - Full scans of some columns (large fraction of tuples) -> Parquet
> or ORC in HDFS
>       - Point access to certain tuples (also subsets of columns, few or
> many tuples) -> HBase
>       - Always reading full tuples -> Avro, ProtoBufs in HDFS
>
Full scans of some columns. Is it possible to append a batch of rows to an
existing Parquet file, or do I need to create a new file for each batch?
In that case, can I read an entire directory containing those files at once?
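
To make the question concrete, here is a minimal sketch of the "one new file
per batch, then read the whole directory" layout. It is not Flink or Parquet
code: it uses plain tab-separated files and only the Python standard library,
and the names write_batch, read_all and the batch-*.tsv naming are
hypothetical, chosen for illustration. Parquet files are normally written once
and then closed, so the same directory layout would presumably apply there,
with a Parquet writer in place of the text writer.

```python
import tempfile
import uuid
from pathlib import Path


def write_batch(directory, triples):
    """Write one batch of (subject, predicate, object) triples to a NEW file.

    Existing files are never reopened or appended to. The batch is written
    under a temporary name and renamed once complete, so a reader scanning
    the directory never picks up a half-written file.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    final_path = directory / f"batch-{uuid.uuid4().hex}.tsv"  # hypothetical naming
    tmp_path = final_path.with_name(final_path.name + ".tmp")
    with tmp_path.open("w", encoding="utf-8") as out:
        for s, p, o in triples:
            out.write(f"{s}\t{p}\t{o}\n")
    tmp_path.rename(final_path)  # publish the finished batch
    return final_path


def read_all(directory):
    """Read every published batch file in the directory as one list of triples."""
    triples = []
    # The pattern ends in .tsv, so in-progress *.tsv.tmp files are skipped.
    for path in sorted(Path(directory).glob("batch-*.tsv")):
        with path.open(encoding="utf-8") as src:
            for line in src:
                s, p, o = line.rstrip("\n").split("\t")
                triples.append((s, p, o))
    return triples


if __name__ == "__main__":
    target = tempfile.mkdtemp()
    write_batch(target, [("ex:a", "ex:knows", "ex:b")])
    write_batch(target, [("ex:b", "ex:knows", "ex:c"), ("ex:c", "ex:knows", "ex:a")])
    print(len(read_all(target)), "triples read from", target)
```

The sketch assumes that subjects, predicates and objects contain no tabs or
newlines; real RDF literals would need proper escaping or a real serialization
format.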


> I don't know how much throughput these systems are able to handle though...
>
> Hope this helps,
> Fabian
>
> 2014-09-29 17:32 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>
>> Hi guys,
>>
>> in my use case I have bursts of data coming into my system (RDF triples
>> generated from a CSV that I need to process in a further step) and I was
>> trying to figure out the best way to save them on HDFS.
>> Do you suggest saving them in HBase, or using a serialization tool
>> like Avro, Parquet or similar? Do I need Flume as well, or is there a Flink
>> solution for that?
>>
>> Best,
>> Flavio
>>
>
