hadoop-hdfs-user mailing list archives

From: 臧冬松 <donal0...@gmail.com>
Subject: Re: structured data split
Date: Mon, 14 Nov 2011 08:32:09 GMT
Hi Charles,
Can you describe your MR workflow?
Do you use MR for reconstruction, analysis, or simulation jobs?
What's the layout of the input and output files: ROOT? NTuple?
How do you split the input and merge the results?

Thanks!
Donal
2011/11/11 Charles Earl <charlescearl@me.com>

> Hi,
> Please also feel free to contact me. I'm working with the STAR project at
> Brookhaven Lab, and we are trying to build an MR workflow for analysis of
> particle data. I've done some preliminary experiments running ROOT and
> other nuclear physics analysis software in MR and have been looking at
> various file layouts.
> Charles
> On Nov 11, 2011, at 9:26 AM, Will Maier wrote:
>
> > Hi Donal-
> >
> > On Fri, Nov 11, 2011 at 10:12:44PM +0800, 臧冬松 wrote:
> >> My scenario is that I have lots of files from a High Energy Physics
> >> experiment. These files are in binary format, about 2 GB each, but they
> >> are basically composed of lots of "Events", and each Event is independent
> >> of the others. The physicists use a C++ program called ROOT to analyze
> >> these files and write the output to a result file (using open(), read(),
> >> write()). I'm considering how to store the files in HDFS and use
> >> MapReduce to analyze them.
> >
> > May I ask which experiment you're working on? We run an HDFS cluster at
> > one of the analysis centers for the CMS detector at the LHC. I'm not aware
> > of anyone using Hadoop's MR for analysis, though about 10 PB of LHC data is
> > now stored in HDFS. For your/our use case, I think that you would have to
> > implement a domain-specific InputFormat yielding Events (a rough sketch of
> > such a format follows after the quoted thread); ROOT files would be stored
> > as-is in HDFS.
> >
> > In CMS, we mostly run traditional HEP simulation and analysis workflows
> > using plain batch jobs managed by common schedulers like Condor or PBS.
> > These of course lack some of the features of the MR schedulers (like
> > location awareness), but have some advantages. For example, we run Condor
> > schedulers that transparently manage workflows of tens of thousands of
> > jobs on dozens of heterogeneous clusters across North America.
> >
> > Feel free to contact me off-list if you have more HEP-specific questions
> > about HDFS.
> >
> > Thanks!
> >
> > --
> >
> > Will Maier - UW High Energy Physics
> > cel: 608.438.6162
> > tel: 608.263.9692
> > web: http://www.hep.wisc.edu/~wcmaier/
>
>
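
P.S. On the domain-specific InputFormat that Will mentioned: below is a minimal
sketch of what such a format might look like with Hadoop's mapreduce API. The
class names (EventInputFormat, EventRecordReader) and the fixed event size are
illustrative assumptions, not working CMS or STAR code; real ROOT events are
variable-length and compressed, so the plain byte-slicing here would have to be
replaced by ROOT-aware deserialization (e.g. via JNI or by handing the file to
an external ROOT process).

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

/**
 * Sketch of a domain-specific InputFormat that yields one Event per map record.
 * EVENT_SIZE is a placeholder; a real reader would ask ROOT for event boundaries.
 */
public class EventInputFormat extends FileInputFormat<LongWritable, BytesWritable> {

    static final int EVENT_SIZE = 64 * 1024; // hypothetical fixed record size

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Treat each ROOT file as a single split until event boundaries can be located.
        return false;
    }

    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new EventRecordReader();
    }

    /** Key = event index within the file, value = the event's raw bytes. */
    static class EventRecordReader extends RecordReader<LongWritable, BytesWritable> {
        private FSDataInputStream in;
        private long length;
        private long pos;
        private long eventIndex = -1;
        private final LongWritable key = new LongWritable();
        private final BytesWritable value = new BytesWritable();

        @Override
        public void initialize(InputSplit genericSplit, TaskAttemptContext context)
                throws IOException {
            FileSplit split = (FileSplit) genericSplit;
            Configuration conf = context.getConfiguration();
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(conf);
            in = fs.open(file);
            // Because the file is not splitable, the split starts at offset 0
            // and covers the whole file.
            length = split.getLength();
            pos = 0;
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (pos >= length) {
                return false;
            }
            int toRead = (int) Math.min(EVENT_SIZE, length - pos);
            byte[] buf = new byte[toRead];
            in.readFully(pos, buf, 0, toRead); // placeholder for real ROOT event decoding
            value.set(buf, 0, toRead);
            key.set(++eventIndex);
            pos += toRead;
            return true;
        }

        @Override public LongWritable getCurrentKey() { return key; }
        @Override public BytesWritable getCurrentValue() { return value; }
        @Override public float getProgress() {
            return length == 0 ? 1.0f : Math.min(1.0f, pos / (float) length);
        }
        @Override public void close() throws IOException {
            if (in != null) in.close();
        }
    }
}

A job would select it with job.setInputFormatClass(EventInputFormat.class), so
each map() call sees one Event rather than an arbitrary byte range of a 2 GB
file. Returning false from isSplitable() trades split-level parallelism for
simplicity until real event boundaries inside the file can be located.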
