hadoop-common-user mailing list archives

From Shashidhar Rao <raoshashidhar...@gmail.com>
Subject XML files in Hadoop
Date Sat, 03 Jan 2015 16:06:00 GMT
Hi,

Can someone suggest the best way to solve this use case?

1. XML files keep flowing in from an external system and need to be
stored in HDFS.
2. These files can be stored directly in a NoSQL database, e.g. any
NoSQL database that supports XML, or
3. These files need to be processed and stored in a database such as
HBase or Hive.
4. There won't be any updates, only reads. The data has to be retrieved
based on some queries, and a dashboard with some analytics has to be
built on top of it.

The XML files are huge, and the expected number of nodes is around 12.
I am stuck on the storage part: if, say, I convert the XML to JSON and
store it in HBase, the XML-to-JSON processing will be huge.

The workload will be read-only, with no updates.

Please suggest how to store these XML files.

Thanks
Shashi
