hadoop-mapreduce-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Low-latency queries, HDFS exclusively or should I go, e.g.: MongoDB?
Date Tue, 20 Jan 2015 15:27:28 GMT
Apache Spark supports integration with HBase (which has a REST API).

How much data do you want to store in this system?

Cheers

On Tue, Jan 20, 2015 at 3:40 AM, Alec Taylor <alec.taylor6@gmail.com> wrote:

> I am architecting a platform incorporating: recommender systems,
> information retrieval (ML), sequence mining, and Natural Language
> Processing.
>
> Additionally I have the generic CRUD and authentication components,
> with everything exposed RESTfully.
>
> For the storage layer(s), there are a few options which immediately
> present themselves:
>
> Generic CRUD layer (high speed needed here, though I suppose I could use
> Redis…)
>
> - Hadoop with HBase, perhaps with Phoenix for an elastic loose-schema
> SQL layer atop
> - Apache Spark (perhaps piping to HDFS)… ¿maybe?
> - MongoDB (or a similar document-store), a graph-database, or even
> something like Postgres
>
> Analytics layer (to enable Big Data / Data-intensive computing features)
>
> - Apache Spark
> - Hadoop with MapReduce and/or utilising some other Apache /
> non-Apache project with integration
> - Disco (from Nokia)
>
> ________________________________
>
> Should I prefer one layer (e.g. on HDFS) over multiple disparate
> layers? The advantage here is obvious, but I am certain there are
> disadvantages. (And yes, I know there are various ways, automated and
> manual, to push data from non-HDFS-backed stores to HDFS.)
>
> Also, as a bonus answer, which stack would you recommend for this
> user-network I'm building?
>
