hadoop-hdfs-user mailing list archives

From Charles Earl <charlesce...@me.com>
Subject Re: structured data split
Date Fri, 11 Nov 2011 14:42:44 GMT
Hi,
Please also feel free to contact me. I'm working with the STAR project at Brookhaven Lab, and
we are trying to build an MR workflow for analysis of particle data. I've done some preliminary
experiments running ROOT and other nuclear physics analysis software in MR and have been looking
at various file layouts.
Charles
On Nov 11, 2011, at 9:26 AM, Will Maier wrote:

> Hi Donal-
> 
> On Fri, Nov 11, 2011 at 10:12:44PM +0800, ?????? wrote:
>> My scenario is that I have lots of files from a High Energy Physics experiment.
>> These files are in binary format, about 2 GB each, but basically they are
>> composed of lots of "Events"; each Event is independent of the others. The
>> physicists use a C++ program called ROOT to analyze these files, and write the
>> output to a result file (using open(), read(), write()). I'm considering how to
>> store the files in HDFS, and use MapReduce to analyze them.
> 
> May I ask which experiment you're working on? We run an HDFS cluster at one of
> the analysis centers for the CMS detector at the LHC. I'm not aware of anyone
> using Hadoop's MR for analysis, though about 10 PB of LHC data is now stored in
> HDFS. For your/our use case, I think that you would have to implement a
> domain-specific InputFormat yielding Events. ROOT files would be stored as-is in
> HDFS.
> 
> In CMS, we mostly run traditional HEP simulation and analysis workflows using
> plain batch jobs managed by common schedulers like Condor or PBS. These of
> course lack some of the features of the MR schedulers (like location awareness),
> but have some advantages. For example, we run Condor schedulers that
> transparently manage workflows of tens of thousands of jobs on dozens of
> heterogeneous clusters across North America.
> 
> Feel free to contact me off-list if you have more HEP-specific questions about HDFS.
> 
> Thanks!
> 
> -- 
> 
> Will Maier - UW High Energy Physics
> cel: 608.438.6162
> tel: 608.263.9692
> web: http://www.hep.wisc.edu/~wcmaier/
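
The quoted suggestion of a domain-specific InputFormat yielding Events can be
illustrated with a small sketch. It is not Hadoop code and it does not read real
ROOT files: it assumes a hypothetical layout in which each event is a 4-byte
big-endian length prefix followed by its payload, and the names write_events,
index_events, make_splits and read_split are made up for illustration. The point
is only that splits are built from whole events, so no event straddles two map
tasks.

```python
import io
import struct
from typing import BinaryIO, Iterator, List, Tuple

# Hypothetical layout (NOT the ROOT format): 4-byte big-endian length + payload.
HEADER = struct.Struct(">I")

Entry = Tuple[int, int]  # (payload offset, payload length)


def write_events(events: List[bytes]) -> bytes:
    """Serialize events in the assumed length-prefixed layout."""
    out = io.BytesIO()
    for payload in events:
        out.write(HEADER.pack(len(payload)))
        out.write(payload)
    return out.getvalue()


def index_events(stream: BinaryIO) -> List[Entry]:
    """Scan the file once and record where each event payload lives."""
    index: List[Entry] = []
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of file
        (length,) = HEADER.unpack(header)
        index.append((stream.tell(), length))
        stream.seek(length, io.SEEK_CUR)  # skip the payload
    return index


def make_splits(index: List[Entry], target_bytes: int) -> List[List[Entry]]:
    """Group consecutive whole events into splits of roughly target_bytes."""
    splits: List[List[Entry]] = []
    current: List[Entry] = []
    size = 0
    for entry in index:
        if current and size + entry[1] > target_bytes:
            splits.append(current)
            current, size = [], 0
        current.append(entry)
        size += entry[1]
    if current:
        splits.append(current)
    return splits


def read_split(stream: BinaryIO, split: List[Entry]) -> Iterator[bytes]:
    """Yield the event payloads of one split, as a record reader would."""
    for offset, length in split:
        stream.seek(offset)
        yield stream.read(length)


if __name__ == "__main__":
    data = write_events([b"a" * 10, b"b" * 20, b"c" * 5, b"d" * 30])
    idx = index_events(io.BytesIO(data))
    for n, split in enumerate(make_splits(idx, target_bytes=32)):
        sizes = [len(e) for e in read_split(io.BytesIO(data), split)]
        print(n, sizes)
```

In a real job the index scan would have to understand the actual file format,
and the split boundaries would ideally be chosen with HDFS block locations in
mind; both are outside the scope of this sketch.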

