hadoop-mapreduce-user mailing list archives

From "Adaryl \"Bob\" Wakefield, MBA" <adaryl.wakefi...@hotmail.com>
Subject Data cleansing in modern data architecture
Date Sun, 20 Jul 2014 18:36:26 GMT
In the old world, data cleaning used to be a large part of the data warehouse load. Now that
we’re working in a schemaless environment, I’m not sure where data cleansing is supposed
to take place. NoSQL sounds fun because, in theory, you just drop everything in, but the transactional
systems that generate the data are still full of bugs and still create junk data.

My question is, where does data cleaning/master data management/CDI belong in a modern data
architecture? Before it hits Hadoop? After?

B.