From: rvs@apache.org
To: bigtop-commits@incubator.apache.org
Reply-To: bigtop-dev@incubator.apache.org
Subject: svn commit: r1298204 [12/36] - in /incubator/bigtop/branches/hadoop-0.23/bigtop-tests/test-artifacts/package: ./ src/main/groovy/org/apache/bigtop/itest/packagesmoke/ src/main/resources/ src/main/resources/apt/ src/main/resources/urpmi/ src/main/resour...
Date: Wed, 07 Mar 2012 23:22:44 -0000
Message-Id: <20120307232251.682F22388C35@eris.apache.org>
X-Mailer: svnmailer-1.0.8-patched
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

Copied: incubator/bigtop/branches/hadoop-0.23/bigtop-tests/test-artifacts/package/src/main/resources/urpmi/package_data.xml (from r1297315, incubator/bigtop/branches/hadoop-0.23/bigtop-tests/test-artifacts/package/src/main/resources/package_data_urpmi.xml)
URL: http://svn.apache.org/viewvc/incubator/bigtop/branches/hadoop-0.23/bigtop-tests/test-artifacts/package/src/main/resources/urpmi/package_data.xml?p2=incubator/bigtop/branches/hadoop-0.23/bigtop-tests/test-artifacts/package/src/main/resources/urpmi/package_data.xml&p1=incubator/bigtop/branches/hadoop-0.23/bigtop-tests/test-artifacts/package/src/main/resources/package_data_urpmi.xml&r1=1297315&r2=1298204&rev=1298204&view=diff
==============================================================================
--- incubator/bigtop/branches/hadoop-0.23/bigtop-tests/test-artifacts/package/src/main/resources/package_data_urpmi.xml (original)
+++ incubator/bigtop/branches/hadoop-0.23/bigtop-tests/test-artifacts/package/src/main/resources/urpmi/package_data.xml Wed Mar 7 23:22:40 2012
@@ -17,54 +17,12 @@

Removed entry: Mahout
  summary: A set of Java libraries for scalable machine learning.
  description: Mahout's goal is to build scalable machine learning libraries. By scalable we mean:
    scalable to reasonably large data sets, with core algorithms for clustering, classification and
    batch-based collaborative filtering implemented on top of Apache Hadoop using the map/reduce
    paradigm (contributions are not restricted to Hadoop-based implementations; contributions that
    run on a single node or on a non-Hadoop cluster are welcome as well, and the core libraries are
    highly optimized to give good performance for non-distributed algorithms too); scalable to
    support your business case, since Mahout is distributed under a commercially friendly Apache
    Software license; and a scalable community, because the goal of Mahout is to build a vibrant,
    responsive, diverse community to facilitate discussions not only on the project itself but also
    on potential use cases. Come to the mailing lists to find out more.
  url: http://mahout.apache.org
  alternatives: auto  /etc/mahout/conf  /etc/mahout/conf.dist  /etc/mahout/conf.dist
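The list archiver stripped the XML markup from the removed entries, so only their text content
survives in this excerpt. As a rough guide to how such an entry is laid out in package_data.xml,
here is a minimal sketch of the Mahout entry above; the tag names (metadata, summary, description,
url, alternatives, status, link, value, alt) and their nesting are assumptions inferred from the
surviving field order, not the verbatim file contents.

  <mahout>                                      <!-- hypothetical reconstruction -->
    <metadata>
      <summary>A set of Java libraries for scalable machine learning.</summary>
      <description>Mahout's goal is to build scalable machine learning libraries. ...</description>
      <url>http://mahout.apache.org</url>
    </metadata>
    <alternatives>
      <mahout-conf>
        <status>auto</status>                   <!-- symlink managed automatically by alternatives -->
        <link>/etc/mahout/conf</link>           <!-- generic config path the package points at -->
        <value>/etc/mahout/conf.dist</value>    <!-- concrete config directory shipped by the package -->
        <alt>/etc/mahout/conf.dist</alt>
      </mahout-conf>
    </alternatives>
  </mahout>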
Removed entry: Whirr
  summary: Scripts and libraries for running software services on cloud infrastructure.
  description: Whirr provides:
    * A cloud-neutral way to run services. You don't have to worry about the idiosyncrasies of
      each provider.
    * A common service API. The details of provisioning are particular to the service.
    * Smart defaults for services. You can get a properly configured system running quickly,
      while still being able to override settings as needed.
  url: http://incubator.apache.org/whirr

@@ -794,26 +752,12 @@

Removed entry: Flume
  summary: Flume is a reliable, scalable, and manageable distributed log collection application for
    collecting data such as logs and delivering it to data stores such as Hadoop's HDFS.
  description: Flume is a reliable, scalable, and manageable distributed data collection application
    for collecting data such as logs and delivering it to data stores such as Hadoop's HDFS. It can
    efficiently collect, aggregate, and move large amounts of log data. It has a simple but flexible
    architecture based on streaming data flows. It is robust and fault tolerant, with tunable
    reliability mechanisms and many failover and recovery mechanisms. The system is centrally
    managed and allows for intelligent dynamic management. It uses a simple, extensible data model
    that allows for online analytic applications.
  url: https://github.com/cloudera/flume
  requires: >=0.20.2+710, >=3.3.1+10, >=1.6

@@ -1096,33 +1040,8 @@

  groups: flume
  users: /var/run/flume  Flume  /sbin/nologin
  alternatives: auto  /etc/flume/conf  /etc/flume/conf.empty  /etc/flume/conf.empty

Removed entry: Flume master daemon
  summary: The flume master daemon is the central administration and data path control point for
    flume nodes.
  description: (same Flume description as above)
  url: https://github.com/cloudera/flume

@@ -1130,19 +1049,11 @@

  requires: /self, >=1.6
  services: 2345  stop  true
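The Flume packages above install an empty configuration directory (/etc/flume/conf.empty) behind
the /etc/flume/conf alternative, and the master and node daemons described here are pointed at it.
A minimal sketch of the kind of flume-site.xml a 0.9-era deployment might drop into that directory;
the flume.master.servers property and the hostname are assumptions for illustration, not part of
this commit.

  <?xml version="1.0"?>
  <configuration>
    <!-- Assumed example: tell every flume node which host runs the flume master. -->
    <property>
      <name>flume.master.servers</name>
      <value>flume-master.example.com</value>
    </property>
  </configuration>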
@@ -1151,11 +1062,6 @@

Removed entry: Flume node daemon
  summary: The flume node daemon is a core element of flume's data path and is responsible for
    generating, processing, and delivering data.
  description: (same Flume description as above)
  url: https://github.com/cloudera/flume

@@ -1163,19 +1069,11 @@

  requires: /self, >=1.6
  services: 2345  stop  true

@@ -1184,19 +1082,12 @@

Removed entry: Sqoop
  summary: Sqoop allows easy imports and exports of data sets between databases and the Hadoop
    Distributed File System (HDFS).
  description: Sqoop allows easy imports and exports of data sets between databases and the Hadoop
    Distributed File System (HDFS).
  url: http://www.cloudera.com
  requires: >=1.6

@@ -1341,93 +1232,20 @@

  alternatives: auto  /etc/sqoop/conf  /etc/sqoop/conf.dist  /etc/sqoop/conf.dist

Removed entry: Sqoop metastore
  summary: Shared metadata repository for Sqoop.
  description: Shared metadata repository for Sqoop. This optional package hosts a metadata server
    for Sqoop clients across a network to use.
  url: http://www.cloudera.com
  requires: /self
  services: 2345  stop  true
  groups: sqoop
  users: /var/lib/sqoop  Sqoop  /sbin/nologin

Removed entry: Oozie
  summary: Oozie is a system that runs workflows of Hadoop jobs.
  description: Oozie is a system that runs workflows of Hadoop jobs. Oozie workflows are actions
    arranged in a control-dependency DAG (Directed Acyclic Graph). Oozie coordinator functionality
    allows workflows to be started at regular intervals and when data becomes available in HDFS.
    An Oozie workflow may contain the following types of action nodes: map-reduce, map-reduce
    streaming, map-reduce pipes, pig, file-system, sub-workflows, java, hive, sqoop and ssh
    (deprecated). Flow control operations within the workflow can be done using decision, fork and
    join nodes; cycles in workflows are not supported. Actions and decisions can be parameterized
    with job properties, action output (e.g. Hadoop counters) and HDFS file information (file
    exists, file size, etc.). Formal parameters are expressed in the workflow definition as ${VAR}
    variables. A workflow application is an HDFS directory that contains the workflow definition
    (an XML file) and all the files needed to run its actions: JAR files for map/reduce jobs,
    shells for streaming map/reduce jobs, native libraries, Pig scripts, and other resource files.
    Running workflow jobs is done via command-line tools, a web services API or a Java API.
    Monitoring the system and workflow jobs can be done via a web console, the command-line tools,
    the web services API and the Java API. Oozie is a transactional system with built-in automatic
    and manual retry capabilities. In case of workflow job failure, the workflow job can be rerun
    skipping previously completed actions, and the workflow application can be patched before
    being rerun.
  url: http://www.cloudera.com

@@ -1436,15 +1254,7 @@

  requires: >=1.6, /self
  services: 2345  stop  true

@@ -1527,31 +1337,8 @@

  groups: oozie
  users: /var/run/oozie  Oozie User  /bin/false
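The Oozie entry describes workflows as DAGs of action nodes parameterized with ${VAR} variables. A
minimal workflow.xml sketch along those lines follows; the application name, node names and
parameter names (jobTracker, nameNode, inputDir, outputDir) are illustrative assumptions, not taken
from this commit.

  <workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.1">
    <start to="mr-node"/>
    <action name="mr-node">
      <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>  <!-- formal ${VAR} parameters, bound at submit time -->
        <name-node>${nameNode}</name-node>
        <configuration>
          <property>
            <name>mapred.input.dir</name>
            <value>${inputDir}</value>
          </property>
          <property>
            <name>mapred.output.dir</name>
            <value>${outputDir}</value>
          </property>
        </configuration>
      </map-reduce>
      <ok to="end"/>                              <!-- edges of the control-dependency DAG -->
      <error to="fail"/>
    </action>
    <kill name="fail">
      <message>map-reduce action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
  </workflow-app>

The workflow definition, together with its JARs and scripts, would be placed in an HDFS directory
and submitted with the command-line tools or the Java API, as the description above notes.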
Removed entry: Oozie client
  summary: Client for Oozie Workflow Engine
  description: Oozie client is a command-line client utility that allows remote administration and
    monitoring of workflows. Using this client utility you can submit workflows,
    start/suspend/resume/kill workflows and find out their status at any instant. Apart from such
    operations, you can also change the status of the entire system and get version information.
    This client utility also allows you to validate any workflows before they are deployed to the
    Oozie server.
  url: http://www.cloudera.com

@@ -2352,29 +2139,8 @@

  alternatives: auto  /etc/oozie/conf  /etc/oozie/conf.dist  /etc/oozie/conf.dist

Removed entry: ZooKeeper
  summary: A high-performance coordination service for distributed applications.
  description: ZooKeeper is a centralized service for maintaining configuration information, naming,
    providing distributed synchronization, and providing group services. All of these kinds of
    services are used in some form or another by distributed applications. Each time they are
    implemented there is a lot of work that goes into fixing the bugs and race conditions that are
    inevitable. Because of the difficulty of implementing these kinds of services, applications
    initially usually skimp on them, which makes them brittle in the presence of change and
    difficult to manage. Even when done correctly, different implementations of these services lead
    to management complexity when the applications are deployed.
  url: http://hadoop.apache.org/zookeeper/
  requires: >=1.6

@@ -2729,87 +2495,21 @@

  groups: zookeeper
  users: /var/run/zookeeper  ZooKeeper  /sbin/nologin
  alternatives: auto  /etc/zookeeper/conf  /etc/zookeeper/conf.dist  /etc/zookeeper/conf.dist

Removed entry: ZooKeeper server
  summary: The Hadoop ZooKeeper server
  description: This package starts the ZooKeeper server on startup.
  url: http://hadoop.apache.org/zookeeper/
  requires: /self
  services: 2345  stop  true

Removed entry: Pig
  summary: Pig is a platform for analyzing large data sets
  description: Pig is a platform for analyzing large data sets that consists of a high-level
    language for expressing data analysis programs, coupled with infrastructure for evaluating
    these programs. The salient property of Pig programs is that their structure is amenable to
    substantial parallelization, which in turn enables them to handle very large data sets. At the
    present time, Pig's infrastructure layer consists of a compiler that produces sequences of
    Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the
    Hadoop subproject). Pig's language layer currently consists of a textual language called
    Pig Latin, which has the following key properties:
    * Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly
      parallel" data analysis tasks. Complex tasks composed of multiple interrelated data
      transformations are explicitly encoded as data flow sequences, making them easy to write,
      understand, and maintain.
    * Optimization opportunities. The way in which tasks are encoded permits the system to optimize
      their execution automatically, allowing the user to focus on semantics rather than efficiency.
    * Extensibility. Users can create their own functions to do special-purpose processing.
  url: http://hadoop.apache.org/pig/
  requires: >=1.6

@@ -6711,34 +6411,11 @@

  alternatives: auto  /etc/pig/conf  /etc/pig/conf.dist  /etc/pig/conf.dist
Removed entry: Hive
  summary: Hive is a data warehouse infrastructure built on top of Hadoop
  description: Hive is a data warehouse infrastructure built on top of Hadoop that provides tools to
    enable easy data summarization, ad hoc querying and analysis of large datasets stored in Hadoop
    files. It provides a mechanism to put structure on this data, and it also provides a simple
    query language called Hive QL, based on SQL, which enables users familiar with SQL to query
    this data. At the same time, this language also allows traditional map/reduce programmers to
    plug in their custom mappers and reducers to do more sophisticated analysis that may not be
    supported by the built-in capabilities of the language.
  url: http://hadoop.apache.org/hive/
  requires: >=0.20.1, >=1.6

@@ -7151,39 +6828,14 @@

  alternatives: auto  /etc/hive/conf.dist  /etc/hive/conf  /etc/hive/conf.dist

Removed entry: HBase
  summary: HBase is the Hadoop database. Use it when you need random, realtime read/write access to
    your Big Data. This project's goal is the hosting of very large tables (billions of rows by
    millions of columns) atop clusters of commodity hardware.
  description: HBase is an open-source, distributed, column-oriented store modeled after Google's
    Bigtable ("Bigtable: A Distributed Storage System for Structured Data" by Chang et al.). Just
    as Bigtable leverages the distributed data storage provided by the Google File System, HBase
    provides Bigtable-like capabilities on top of Hadoop. HBase includes:
    * Convenient base classes for backing Hadoop MapReduce jobs with HBase tables
    * Query predicate push down via server side scan and get filters
    * Optimizations for real time queries
    * A high performance Thrift gateway
    * A RESTful web service gateway that supports XML, Protobuf, and binary data encoding options
    * Cascading source and sink modules
    * An extensible JRuby-based (JIRB) shell
    * Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX
  url: http://hadoop.apache.org/hbase/
  requires: >=0.20.2+700, >=3.3.1+8, >=1.6

@@ -7335,34 +6987,8 @@

  alternatives: auto  /etc/hbase/conf.dist  /etc/hbase/conf  /etc/hbase/conf.dist
  groups: hbase
  users: /var/run/hbase  HBase  /sbin/nologin

Removed entry: HBase documentation
  summary: HBase Documentation
  description: Documentation for HBase
  url: http://hadoop.apache.org/hbase/

@@ -9126,99 +8752,36 @@

Removed entry: HBase master
  summary: The Hadoop HBase master Server.
  description: HMaster is the "master server" for HBase. There is only one HMaster for a single
    HBase deployment.
  url: http://hadoop.apache.org/hbase/
  requires: /self
  services: 2345  stop  true

Removed entry: HBase RegionServer
  summary: The Hadoop HBase RegionServer server.
  description: HRegionServer makes a set of HRegions available to clients. It checks in with the
    HMaster. There are many HRegionServers in a single HBase deployment.
  url: http://hadoop.apache.org/hbase/
  requires: /self
  services: 2345  stop  false

Removed entry: HBase Thrift interface
  summary: The Hadoop HBase Thrift Interface
  description: ThriftServer is the class that starts up a Thrift server implementing the HBase API
    specified in the Hbase.thrift IDL file. "Thrift is a software framework for scalable
    cross-language services development. It combines a powerful software stack with a code
    generation engine to build services that work efficiently and seamlessly between C++, Java,
    Python, PHP, and Ruby. Thrift was developed at Facebook, and we are now releasing it as open
    source." For additional information, see http://developers.facebook.com/thrift/. Facebook has
    announced their intent to migrate Thrift into the Apache Incubator.
  url: http://hadoop.apache.org/hbase/
  requires: /self
  services: 2345  stop  false
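The HBase entry registers /etc/hbase/conf as an alternative and describes HBase as a Bigtable-like
store layered on HDFS. A minimal sketch of the kind of hbase-site.xml that would live under that
directory; the hostname, port and values are illustrative assumptions for a distributed setup, not
part of this commit.

  <?xml version="1.0"?>
  <configuration>
    <!-- Assumed example: keep HBase data in HDFS rather than the local filesystem. -->
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://namenode.example.com:8020/hbase</value>
    </property>
    <!-- Assumed example: run fully distributed, with ZooKeeper managed separately. -->
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
  </configuration>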
Removed entry: Hadoop
  summary: Hadoop is a software platform for processing vast amounts of data
  description: Hadoop is a software platform that lets one easily write and run applications that
    process vast amounts of data. Here's what makes Hadoop especially useful:
    * Scalable: Hadoop can reliably store and process petabytes.
    * Economical: It distributes the data and processing across clusters of commonly available
      computers. These clusters can number into the thousands of nodes.
    * Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the
      data is located. This makes it extremely fast.
    * Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys
      computing tasks based on failures.
    Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS). MapReduce divides
    applications into many small blocks of work. HDFS creates multiple replicas of data blocks for
    reliability, placing them on compute nodes around the cluster. MapReduce can then process the
    data where it is located.
  url: http://hadoop.apache.org/core/

@@ -10645,345 +10208,64 @@

  groups: hdfs, mapred
  users: hdfs    /usr/lib/hadoop  Hadoop HDFS       /bin/bash
         mapred  /usr/lib/hadoop  Hadoop MapReduce  /bin/bash
  alternatives: auto  /etc/hadoop/conf  /etc/hadoop/conf.empty  /etc/hadoop/conf.empty

Removed entry: Hadoop Pipes library (two identical entries)
  summary: Hadoop Pipes Library
  description: Hadoop Pipes Library
  url: http://hadoop.apache.org/core/
  requires: /self

Removed entry: Hadoop native compression libraries (two identical entries)
  summary: Native libraries for Hadoop Compression
  description: Native libraries for Hadoop compression
  url: http://hadoop.apache.org/core/
  requires: /self

Removed entry: Hadoop namenode daemon
  summary: The Hadoop namenode manages the block locations of HDFS files
  description: The Hadoop Distributed Filesystem (HDFS) requires one unique server, the namenode,
    which manages the block locations of files on the filesystem.
  url: http://hadoop.apache.org/core/
  requires: /self
  services: 2345  stop  false

Removed entry: Hadoop secondary namenode daemon
  summary: Hadoop Secondary namenode
  description: The Secondary Name Node periodically compacts the Name Node EditLog into a
    checkpoint. This compaction ensures that Name Node restarts do not incur unnecessary downtime.
  url: http://hadoop.apache.org/core/
  requires: /self
  services: 2345  stop  false

Removed entry: Hadoop datanode daemon
  summary: Hadoop Data Node
  description: The Data Nodes in the Hadoop Cluster are responsible for serving up blocks of data
    over the network to Hadoop Distributed Filesystem (HDFS) clients.
  url: http://hadoop.apache.org/core/
  requires: /self
  services: 2345  stop  false

Removed entry: Hadoop jobtracker daemon
  summary: Hadoop Job Tracker
  description: The jobtracker is a central service which is responsible for managing the
    tasktracker services running on all nodes in a Hadoop Cluster. The jobtracker allocates work to
    the tasktracker nearest to the data with an available work slot.
  url: http://hadoop.apache.org/core/
  requires: /self
  services: 2345  stop  false

Removed entry: Hadoop tasktracker daemon
  summary: Hadoop Task Tracker
  description: The tasktracker has a fixed number of work slots. The jobtracker assigns MapReduce
    work to the tasktracker that is nearest the data with an available work slot.
  url: http://hadoop.apache.org/core/
  requires: /self
  services: 2345  stop  false
Removed entry: Hadoop pseudo-distributed configuration
  summary: Hadoop installation in pseudo-distributed mode
  description: Installation of this RPM will set up your machine to run in pseudo-distributed mode,
    where each Hadoop daemon runs in a separate Java process.
  url: http://hadoop.apache.org/core/
  requires: /self, /self, /self, /self, /self, /self
  (see the pseudo-distributed configuration sketch at the end of this message)

@@ -11010,19 +10292,6 @@

Removed entry: Hadoop documentation
  summary: Hadoop Documentation
  description: Documentation for Hadoop
  url: http://hadoop.apache.org/core/

@@ -14164,3572 +13433,11 @@

Removed entry: Hadoop source
  summary: Source code for Hadoop
  description: The Java source code for Hadoop and its contributed packages. This is handy when
    trying to debug programs that depend on Hadoop.
  url: http://hadoop.apache.org/core/

[... 3404 lines stripped ...]
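The pseudo-distributed entry above describes a package that configures every Hadoop daemon to run
on one machine in separate JVMs. A minimal sketch of the kind of configuration such a package would
place under /etc/hadoop/conf on an 0.20-era install; the host and port values are assumptions, not
taken from this commit.

  <!-- core-site.xml: point HDFS clients at a namenode on the local machine (assumed port). -->
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:8020</value>
    </property>
  </configuration>

  <!-- hdfs-site.xml: a single datanode, so keep one replica per block. -->
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
  </configuration>

  <!-- mapred-site.xml: run the jobtracker locally as well (assumed port). -->
  <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:8021</value>
    </property>
  </configuration>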