hbase-user mailing list archives

From Shashidhar Rao <raoshashidhar...@gmail.com>
Subject Re: Storing Json format in Hbase
Date Sun, 04 Jan 2015 16:02:30 GMT
Ayache,

In fact, my use case fits eXist-db: it is open source with no licence fee, and
after all my data is XML documents queried through XQuery and XPath. Thanks for
the suggestion.
One last question: what do you think of eXist-db? Can it scale well up to
50-100 terabytes or more of XML data in future? I could not find much on their
website.
Who is using eXist-db in production? Any idea?

Thanks
Shashi

On Sun, Jan 4, 2015 at 9:15 PM, Shashidhar Rao <raoshashidhar123@gmail.com>
wrote:

> Thanks a lot Ayache for the links
>
> On Sun, Jan 4, 2015 at 8:49 PM, Ayache Khettar <
> ayache.khettar@googlemail.com> wrote:
>
>> Hi
>>
>> HBase doesn't support querying XML with XPath. For that you will have to
>> consider an XML database such as eXist
>> (http://exist-db.org/exist/apps/homepage/index.html) or MarkLogic (which
>> requires a commercial licence). If you still want to use HBase, consider
>> storing metadata alongside the XML payload under the same row ID. You will
>> have to think about your queries first before deciding what metadata you
>> want to store. In one project I was involved in, we stored the metadata in
>> Apache Solr using the HBase Indexer, which provides near-real-time updates
>> (see the Cloudera product suite:
>> http://www.cloudera.com/content/cloudera/en/documentation/cloudera-search/v1-latest/Cloudera-Search-User-Guide/csug_use_hbase_indexer_service.html
>> ). So the XML payload ends up in HBase and the metadata goes into Apache
>> Solr, and you query against Solr as opposed to HBase.
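The payload-in-HBase, metadata-in-an-index pattern described above can be sketched in memory. This is a minimal illustration in Python, not the HBase or Solr client API: a plain dict stands in for the HBase table, another dict stands in for the Solr index, and the column names (`d:xml`, `m:...`) and metadata fields (`customer`, `status`) are hypothetical. It shows the query path: look up row keys in the index first, then fetch the XML payloads by row key.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Stand-in for an HBase table: row key -> {"family:qualifier": bytes}.
# Hypothetical layout: family "d" holds the raw XML, family "m" the metadata.
table = {}

# Stand-in for a Solr index: (field, value) -> set of row keys.
index = defaultdict(set)

# Hypothetical metadata fields; pick them from the queries you need to answer.
INDEXED_FIELDS = ("customer", "status")


def ingest(row_key, xml_payload):
    """Store the XML unchanged under row_key and index selected metadata."""
    root = ET.fromstring(xml_payload)
    row = {"d:xml": xml_payload.encode("utf-8")}
    for field in INDEXED_FIELDS:
        value = root.findtext(field)  # text of the first direct child <field>
        if value is not None:
            row["m:" + field] = value.encode("utf-8")
            index[(field, value)].add(row_key)
    table[row_key] = row


def query(field, value):
    """Query the index first, then fetch each payload by its row key."""
    row_keys = sorted(index.get((field, value), set()))
    return [table[k]["d:xml"].decode("utf-8") for k in row_keys]
```

For example, after `ingest("order-001", "<order><customer>acme</customer><status>open</status></order>")`, calling `query("status", "open")` returns that XML string. The sketch does not remove stale index entries when a row is rewritten, which is tolerable only for a mostly write-once workload like the one in this thread.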
>>
>> There are various ways to achieve what you want; it all comes down to the
>> choice of technology and the architecture drivers.
>>
>> all the best
>>
>> Ayache
>>
>>
>>
>>
>> On 4 January 2015 at 11:47, Shashidhar Rao <raoshashidhar123@gmail.com>
>> wrote:
>>
>> > Ayache and Chandrashekhar,
>> >
>> > You are correct; I am also reluctant to go for the JSON transformation.
>> > Storing XML in HBase without transformation to JSON would be a lot easier
>> > at the storing stage.
>> >
>> > But my concern is querying this XML data from HBase. Queries include
>> > aggregations, counts and joins, just to name a few. Can you please shed
>> > some light on how to query XML data from HBase? Is it possible to use
>> > XQuery or XPath?
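HBase stores the XML as opaque bytes and does not evaluate XPath or XQuery itself, so one option is to fetch the payloads (by row key or scan) and evaluate expressions on the client or inside a batch job. A minimal sketch, assuming the payloads are already fetched into a list of strings: Python's `xml.etree.ElementTree` supports only a limited XPath subset (no XQuery, no joins), and the element and attribute names (`item`, `sku`, `price`) are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_and_sum(payloads, path, attr):
    """Evaluate an ElementTree XPath-subset expression over each payload,
    counting the matching elements and summing a numeric attribute."""
    count = 0
    total = 0.0
    for payload in payloads:
        root = ET.fromstring(payload)
        for elem in root.findall(path):
            count += 1
            # Missing attributes are treated as 0 (an assumption for the sketch).
            total += float(elem.get(attr, "0"))
    return count, total
```

For example, `count_and_sum(docs, ".//item[@sku='a']", "price")` counts every `<item sku="a">` across all documents and sums their `price` attributes. Every payload has to be transferred and parsed, which is why this scales poorly compared with an XML database or a precomputed metadata index.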
>> >
>> > JSON transformation was considered because of MongoDB, as it supports
>> > JSON natively and seems to be good at analytics. Analytics would come at
>> > a later stage.
>> >
>> > Can you please share some insights into querying XML from HBase? Any
>> > links or examples would be helpful; I have been unable to find any.
>> >
>> > Thanks in advance
>> >
>> > Shashi
>> >
>> > On Sun, Jan 4, 2015 at 4:18 PM, Ayache Khettar <
>> > ayache.khettar@googlemail.com> wrote:
>> >
>> > > Hi
>> > >
>> > > You can store XML in HBase without any issue. It all depends on what
>> > > you do with the XML. To query the XML back, you will have to store its
>> > > metadata with it under the same row ID. I would go for JSON
>> > > transformation only if the downstream flow needs the payload in JSON
>> > > format.
>> > >
>> > > Ayache
>> > > On 4 January 2015 at 10:07, Chandrashekhar Kotekar <
>> > > shekhar.kotekar@gmail.com> wrote:
>> > >
>> > > > You can convert XML to JSON using a MapReduce program and then store
>> > > > the JSON in HBase, but you need to decide what your row key should be.
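The per-record conversion step mentioned above could look like the following minimal Python sketch. It shows only the transformation a mapper might apply to each document, not any Hadoop API, and the mapping rules are an assumption (there is no single standard XML-to-JSON mapping): attributes become `@name` keys, repeated child tags become lists, and text-only elements become plain strings.

```python
import json
import xml.etree.ElementTree as ET


def element_to_obj(elem):
    """Convert an Element to plain Python data under the assumed convention."""
    obj = {"@" + k: v for k, v in elem.attrib.items()}
    for child in elem:
        value = element_to_obj(child)
        if child.tag in obj:
            # Repeated tag: collect all occurrences into a list.
            existing = obj[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                obj[child.tag] = [existing, value]
        else:
            obj[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if not obj:
            return text  # text-only element becomes a plain string
        obj["#text"] = text
    return obj


def xml_to_json(xml_payload):
    """Serialize one XML document as a JSON string keyed by its root tag."""
    root = ET.fromstring(xml_payload)
    return json.dumps({root.tag: element_to_obj(root)}, sort_keys=True)
```

Note the lossy edges of this convention: a tag that appears once becomes a scalar while a repeated tag becomes a list, so consumers must handle both shapes, and mixed content (text interleaved with child elements) keeps only the leading text.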
>> > > >
>> > > > Another point you have to take into account is whether or not you want
>> > > > to search inside the JSON. If you do, HBase won't be the best option
>> > > > for you; you could probably switch to MongoDB or some other document
>> > > > store.
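Why row-key design matters and why searching inside the JSON is costly can be illustrated with a toy model. HBase keeps rows sorted by row key, so a get or a prefix scan touches only a contiguous key range, while a predicate on a field inside the stored value has to read and parse every row unless a secondary index exists. This Python sketch is an illustration only, not the HBase client API, and the `customer#order-id` row-key format is a hypothetical choice.

```python
import bisect
import json


class SortedTable:
    """Toy model of an HBase table: row keys kept in sorted order."""

    def __init__(self):
        self._keys = []    # sorted row keys
        self._values = {}  # row key -> JSON string (opaque to the "store")

    def put(self, row_key, value):
        if row_key not in self._values:
            bisect.insort(self._keys, row_key)
        self._values[row_key] = value

    def prefix_scan(self, prefix):
        """Visit only the contiguous run of keys that start with prefix."""
        i = bisect.bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append(self._keys[i])
            i += 1
        return out

    def full_scan_where(self, field, wanted):
        """Search inside the JSON value: every row is read and parsed.
        Returns the matching keys and the number of rows read."""
        rows_read = 0
        hits = []
        for key in self._keys:
            rows_read += 1
            doc = json.loads(self._values[key])
            if doc.get(field) == wanted:
                hits.append(key)
        return hits, rows_read
```

With keys such as `acme#0001`, all of one customer's documents are adjacent, so `prefix_scan("acme#")` stays cheap as the table grows, whereas `full_scan_where("status", "open")` always reads every row.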
>> > > >
>> > > > Hope it helps...
>> > > >
>> > > > Regards,
>> > > > Chandrashekhar
>> > > > On 04-Jan-2015 3:32 PM, "Shashidhar Rao" <raoshashidhar123@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > Can someone tell me whether the solution I am proposing is a
>> > > > > feasible option or not?
>> > > > >
>> > > > > 1. Large XML data is delivered by an external system.
>> > > > > 2. Convert it into JSON format.
>> > > > > 3. Store it in HBase. There will hardly be any updates, only
>> > > > >    retrieval. I looked at Hive but finally decided against it, as
>> > > > >    retrieval would be slow.
>> > > > > 4. I need to use a Hadoop NoSQL store, as all the other components
>> > > > >    use the Hadoop ecosystem.
>> > > > >
>> > > > > Can XML data be stored directly in HBase without any
>> > > > > transformation? (second question)
>> > > > >
>> > > > > Any suggestions on storing XML data in NoSQL? (Only open source, no
>> > > > > commercial NoSQL.)
>> > > > >
>> > > > > Thanks in advance
>> > > > >
>> > > > > Shashi
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
