hbase-user mailing list archives

From UsefullyWastedIce <pavelgu...@gmail.com>
Subject Re: Will hadoop work for what i am trying to achieve?
Date Thu, 14 Jan 2010 19:17:37 GMT

For the sake of discussion, let's say my XML file is 5GB. 

When I say loading, I mean actually propagating that data across HDFS.
When I process the data, I will be doing two things with it: filtering out
records that are irrelevant, and then modifying individual records by adding
additional information (joining them with other data). The final results
might be saved by the end user, and the end user may then want to come back
to them and perform additional processing.

XSLT processing would be ideal, but I've given up on it because I didn't
think it would work. I've done some tests with it on my local machine, and
in order to apply XSLT to an entire file, the entire file gets loaded into
memory, which is obviously not an option for a 5GB file.
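
For illustration, here is a minimal sketch of the filter-and-join step done
as a streaming pass, so that only one record is held in memory at a time
instead of the whole file. It is plain Python, not Hadoop or XSLT. The
element names (records, record, id, status), the lookup table and the helper
names are all hypothetical, and it assumes every record is a direct child of
the root element.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, tag="record"):
    """Yield one <record> element at a time without building the whole tree.

    Assumes each record is a direct child of the root element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            root.clear()  # drop records already handled so memory stays bounded


def filter_and_enrich(source, keep, extra_by_id):
    """Skip irrelevant records, then join the rest with extra data by id."""
    for rec in stream_records(source):
        if not keep(rec):
            continue
        row = {child.tag: child.text for child in rec}
        row.update(extra_by_id.get(row.get("id"), {}))
        yield row


# Hypothetical sample data standing in for the real (much larger) file.
SAMPLE = b"""<records>
  <record><id>1</id><status>active</status></record>
  <record><id>2</id><status>stale</status></record>
  <record><id>3</id><status>active</status></record>
</records>"""

# Hypothetical "other data" to join against, keyed by record id.
LOOKUP = {"1": {"region": "east"}, "3": {"region": "west"}}


def is_active(rec):
    return rec.findtext("status") == "active"


rows = list(filter_and_enrich(io.BytesIO(SAMPLE), is_active, LOOKUP))
for row in rows:
    print(row)
```

The same per-record shape (filter, then look up and merge) is what a map
step would do if the file were split on record boundaries. How to split XML
input that way in Hadoop is a separate question that this sketch does not
address.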



stack-3 wrote:
> 
> Please describe what your queries will be like and what you mean by
> "loading, processing, and returning the data"?  So your files are xml?
> What size?  Then you'd process them in user-time?  What kinda processing?
> xslt'ing?
> 
> St.Ack
> 
> On Thu, Jan 14, 2010 at 10:03 AM, UsefullyWastedIce
> <pavelgutin@gmail.com> wrote:
> 
>>
>> I've been doing a lot of research over the past few days, but haven't been
>> able to find out whether or not hadoop will work for what I am trying to
>> achieve.
>>
>> The data I have is initially in XML, and the user needs to be able to query
>> that data very quickly (response time should be in the 10 second range).
>> Since the amount of data can easily grow into gigabytes, just processing
>> it on the fly is not fast enough.
>>
>> What I was thinking of doing is loading that data into a hadoop cluster,
>> processing it, and then serving the result back to the user. There are many
>> tools I've looked at, and since most of them are running on top of hadoop,
>> I figured that this would be my biggest hurdle.
>>
>> Is 10 seconds a possible return time, including loading, processing and
>> returning the data?
>> --
>> View this message in context:
>> http://old.nabble.com/Will-hadoop-work-for-what-i-am-trying-to-achieve--tp27165540p27165540.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: http://old.nabble.com/Will-hadoop-work-for-what-i-am-trying-to-achieve--tp27165540p27166654.html
Sent from the HBase User mailing list archive at Nabble.com.

