hadoop-mapreduce-user mailing list archives

From Alec Taylor <alec.tayl...@gmail.com>
Subject Low-latency queries, HDFS exclusively or should I go, e.g.: MongoDB?
Date Tue, 20 Jan 2015 11:40:08 GMT
I am architecting a platform that incorporates recommender systems,
information retrieval (ML), sequence mining, and Natural Language
Processing.

Additionally, I have the generic CRUD and authentication components,
with everything exposed RESTfully.

For the storage layer(s), there are a few options which immediately
present themselves:

Generic CRUD layer (high speed needed here, though I suppose I could use Redis…)

- Hadoop with HBase, perhaps with Phoenix for an elastic loose-schema
SQL layer atop
- Apache Spark (perhaps piping to HDFS)… ¿maybe?
- MongoDB (or a similar document-store), a graph-database, or even
something like Postgres

Analytics layer (to enable Big Data / Data-intensive computing features)

- Apache Spark
- Hadoop with MapReduce and/or utilising some other Apache /
non-Apache project with integration
- Disco (from Nokia)

Should I prefer a single layer (e.g., on HDFS) over multiple disparate
layers? The advantage here is obvious, but I am certain there are
disadvantages. (And yes, I know there are various ways, both automated
and manual, to push data from non-HDFS-backed stores to HDFS.)

Also, as a bonus answer, which stack would you recommend for this
user-network I'm building?
