hadoop-user mailing list archives

From daemeon reiydelle <daeme...@gmail.com>
Subject Re: Hadoop or RDBMS
Date Mon, 13 Jul 2015 14:53:15 GMT
Based on the brief description, which includes the relatively "small"
number of records and the type of queries I can "imagine" the end customer
would make, my question would be: how ad hoc are the queries, versus how well
could they be handled by traditional RDBMS schemas?
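To make that question concrete: if users combine a known set of fields in arbitrary ways, a fixed schema with an index per filterable column usually copes fine. Below is a minimal sketch of that pattern, assuming Python's built-in sqlite3 as an in-memory stand-in for MySQL/Postgres; the table, column names, filter names and sample rows are all hypothetical, not taken from the original tool.

```python
import sqlite3

# Hypothetical filter names mapped to SQL fragments. Only whitelisted columns
# can be queried; user-supplied values are always bound as parameters.
FILTERS = {
    "min_employees": "employees >= ?",
    "activity_code": "activity_code = ?",
    "region_code": "region_code = ?",
    "legal_form": "legal_form = ?",
    "min_founded": "founded_year >= ?",
}

INDEXED_COLUMNS = ("employees", "activity_code", "region_code", "legal_form", "founded_year")


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the real RDBMS, with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE companies ("
        " id INTEGER PRIMARY KEY, employees INTEGER, activity_code TEXT,"
        " region_code TEXT, legal_form TEXT, founded_year INTEGER)"
    )
    # One index per commonly filtered column; the query planner chooses
    # which one to use for a given combination of criteria.
    for col in INDEXED_COLUMNS:
        conn.execute(f"CREATE INDEX idx_{col} ON companies({col})")
    conn.executemany(
        "INSERT INTO companies (employees, activity_code, region_code,"
        " legal_form, founded_year) VALUES (?, ?, ?, ?, ?)",
        [
            (12, "6201", "NH", "BV", 2001),
            (250, "6201", "ZH", "NV", 1987),
            (3, "4711", "NH", "VOF", 2012),
        ],
    )
    return conn


def count_selection(conn: sqlite3.Connection, criteria: dict) -> int:
    """Count companies matching whatever criteria the user picked."""
    clauses, params = [], []
    for name, value in criteria.items():
        if name not in FILTERS:
            raise ValueError(f"unknown filter: {name}")
        clauses.append(FILTERS[name])
        params.append(value)
    sql = "SELECT COUNT(*) FROM companies"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchone()[0]


if __name__ == "__main__":
    db = build_demo_db()
    print(count_selection(db, {"activity_code": "6201", "min_employees": 10}))  # prints 2
```

If the queries stay within such a whitelist of fields, this is the "well managed by a schema" case; if users need to filter on fields nobody anticipated, that is where the schema starts to hurt.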

Then I would be interested to understand the nature of your growth.

If commodity hardware/scalability is a driver, the size of the data
suggests a traditional schema-based RDBMS, perhaps with sharding (e.g.
sharded MySQL); Postgres also seems like it could scale well at the data
sizes you suggest. If you see both significant growth and a need for the
ultra-ad hoc capability of a no-schema solution, I would ask whether you have
considered Cassandra+Spark (acknowledging that the no-schema nature of the
repository drives quite a bit more data denormalization in C+S than in an
RDBMS).

Net net, perhaps sharded MySQL could be a middle ground?
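For a count-style selection, sharding mostly means scatter-gather: route each row to one shard, send the same COUNT to every shard, and add up the answers. A minimal sketch of that idea, assuming several in-memory sqlite3 databases stand in for separate MySQL servers; the modulo routing, shard count, schema and rows are hypothetical choices for illustration, not a recommendation for a specific setup.

```python
import sqlite3

SCHEMA = "CREATE TABLE companies (id INTEGER PRIMARY KEY, employees INTEGER, region_code TEXT)"


def make_shards(n: int) -> list:
    """Create n independent databases standing in for separate MySQL servers."""
    shards = []
    for _ in range(n):
        conn = sqlite3.connect(":memory:")
        conn.execute(SCHEMA)
        conn.execute("CREATE INDEX idx_region ON companies(region_code)")
        shards.append(conn)
    return shards


def shard_for(company_id: int, n: int) -> int:
    # Modulo routing on the primary key. Adding shards later means moving
    # rows; consistent hashing or a lookup table are common alternatives.
    return company_id % n


def insert_company(shards: list, company_id: int, employees: int, region_code: str) -> None:
    conn = shards[shard_for(company_id, len(shards))]
    conn.execute(
        "INSERT INTO companies VALUES (?, ?, ?)", (company_id, employees, region_code)
    )


def count_where(shards: list, region_code: str, min_employees: int = 0) -> int:
    # Scatter: run the same COUNT on every shard (sequentially here; a real
    # system would query shards in parallel). Gather: sum the partial counts,
    # which is correct because every row lives on exactly one shard.
    sql = "SELECT COUNT(*) FROM companies WHERE region_code = ? AND employees >= ?"
    return sum(conn.execute(sql, (region_code, min_employees)).fetchone()[0] for conn in shards)


if __name__ == "__main__":
    shards = make_shards(4)
    for row in [(1, 12, "NH"), (2, 250, "ZH"), (3, 3, "NH"), (4, 40, "NH"), (5, 7, "UT")]:
        insert_company(shards, *row)
    print(count_where(shards, "NH"))                    # prints 3
    print(count_where(shards, "NH", min_employees=10))  # prints 2
```

Counts combine trivially across shards; queries that need joins or global ordering across shards are where sharding gets harder.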


*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!”” - Hunter Thompson
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872*

On Mon, Jul 13, 2015 at 3:46 AM, James Peterzon | 123dm <james@123dm.nl> wrote:

> Hi there,
> We have built an (online) selection tool where marketers can select their
> target groups for marketing purposes, e.g. direct mail or telemarketing.
> Now we have been asked to build a similar selection tool based on a Hadoop
> database. This database contains about 35 million records (companies) with
> different fields to select on (Number of employees, Activity code,
> Geographical codes, Legal form code, Turnover figures, Year of
> establishment and so on).
> Performance is very important for this online app. If someone makes a
> selection with different criteria, the number of selected records should
> appear on screen within (milli)seconds.
> We are not sure Hadoop will be a good choice; in our opinion, fast results
> require a well-indexed relational database…
> Can anybody advise me?
> Thanks!
> Best regards,
> James Peterzon
