hadoop-common-user mailing list archives

From Wilm Schumacher <wilm.schumac...@gmail.com>
Subject Re: XML files in Hadoop
Date Sat, 03 Jan 2015 16:14:32 GMT
Hi,

How many XML files are you planning to store? It may be possible to
store them directly on HDFS and keep the metadata in HBase. That sounds
more reasonable to me.
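
As a rough illustration of the "file on HDFS, metadata in HBase" split, the
sketch below builds one metadata record per XML file. It uses only the Python
standard library and does not talk to HDFS or HBase. The function name
build_metadata, the "meta:" column names, and the hashed row key are all
assumptions made for this example, not part of any Hadoop or HBase API.

```python
import hashlib
import xml.etree.ElementTree as ET


def build_metadata(xml_bytes, hdfs_path):
    """Return (row_key, columns) describing one XML file.

    The row-key scheme and the column names are hypothetical; a real
    schema would depend on the queries the dashboard has to answer.
    """
    root = ET.fromstring(xml_bytes)
    # Assumption: hash the path so row keys are fixed-length and spread out.
    row_key = hashlib.sha256(hdfs_path.encode("utf-8")).hexdigest()
    columns = {
        "meta:hdfs_path": hdfs_path,               # where the raw file lives
        "meta:root_tag": root.tag,                 # document type hint
        "meta:size_bytes": str(len(xml_bytes)),    # raw size of the file
        "meta:child_count": str(len(root)),        # direct children of the root
    }
    return row_key, columns
```

The raw XML stays untouched on HDFS; only the small record would be written
to HBase, so no XML-to-JSON conversion of the full document is needed.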

If the number of XML files is too large (millions or billions), then you
can use Hadoop MapFiles to bundle files together, e.g. grouped by year or
month.
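
A minimal sketch of the grouping step, assuming each file comes with an
arrival timestamp. It only plans which file names would go into which
per-month bundle; actually writing the bundles would be done with Hadoop's
MapFile writer, which is not shown here. The function name plan_map_files and
the "YYYY-MM" bucket naming are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime


def plan_map_files(files):
    """Group (name, arrival_time) pairs into one bucket per month.

    Returns {"YYYY-MM": [name, ...]}. Names are sorted within each
    bucket because a MapFile expects keys to be appended in sorted
    order (assuming the file name is used as the key).
    """
    buckets = defaultdict(list)
    for name, arrived in files:
        buckets[arrived.strftime("%Y-%m")].append(name)
    # Sort the buckets by month and the names inside each bucket.
    return {month: sorted(names) for month, names in sorted(buckets.items())}


if __name__ == "__main__":
    incoming = [
        ("b.xml", datetime(2015, 1, 3)),
        ("a.xml", datetime(2015, 1, 2)),
        ("c.xml", datetime(2014, 12, 31)),
    ]
    for month, names in plan_map_files(incoming).items():
        print(month, names)
```

Grouping by year instead of month only changes the format string to "%Y".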

Regards,

Wilm

On 03.01.2015 at 17:06, Shashidhar Rao wrote:
> Hi,
>
> Can someone help me by suggesting the best way to solve this use case?
>
> 1. XML files keep flowing in from an external system and need to be
> stored in HDFS.
> 2. These files can be stored directly in a NoSQL database, e.g. any
> NoSQL database with XML support, or
> 3. these files need to be processed and stored in one of the databases
> (HBase, Hive, etc.).
> 4. There won't be any updates, only reads. Files have to be retrieved
> based on some queries, and a dashboard with some analytics has to be
> created.
>
> The XML files are huge, and the expected number of nodes is roughly
> 12.
> I am stuck on the storage part: if I convert the XML to JSON and store
> it in HBase, the conversion from XML to JSON will be a huge processing job.
>
> It will be only reading and no updates.
>
> Please suggest how to store these XML files.
>
> Thanks
> Shashi

