hdt-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hdt Wiki] Update of "HDTProductExperience" by BobKerns
Date Tue, 26 Feb 2013 08:37:26 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hdt Wiki" for change notification.

The "HDTProductExperience" page has been changed by BobKerns:
http://wiki.apache.org/hdt/HDTProductExperience

New page:
= HDT Product Experience =

This page is intended to collect thoughts about the overall product experience, from a user's
point of view, for a mature product. It is hoped this will lead to a shared idealized vision
of things like packaging, distribution, scope, and feature set that might guide more practical
and concrete development plans.

Many of these ideas might require cooperation, design, and implementation efforts in conjunction
with other Hadoop projects. We might expect to provide leadership and some core technology
to ensure high commonality.

== Installation ==

 * The user should be able to install using a single update site for all Hadoop-related Eclipse
tools.
   * Not all components would need to be supplied by us.
   * It might (or might not) be better for the update site to be handled by Hadoop-Core.
   * It should at least be linked from all relevant projects.
 * Simple Eclipse install
 * Updates made available promptly.
 * Compatible with current and recent Eclipse versions.

== Configuration ==

 * The user should be able to obtain a cluster's configuration by entering the hostname of
the master namenode.
   * All Hadoop services ought to coordinate to publish the necessary configuration information,
downloadable via a single URL.
   * The cluster should be able to provide multiple alternative configurations for the user
to select from.
 * The user should be able to override cluster-provided information.
   * The user should be able to provide multiple overrides of cluster-provided information
for different purposes, and select where appropriate.
     * The exact definition of "where appropriate" may get tricky.
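
To make the override scheme concrete, here is a minimal sketch in Python. It assumes the cluster publishes its settings in the standard Hadoop configuration XML format (`<configuration>` containing `<property>` name/value pairs); the function names and the merge semantics (user values win over cluster-published values) are illustrative, not a committed design.

```python
import xml.etree.ElementTree as ET

def parse_cluster_config(xml_text):
    """Parse Hadoop-style configuration XML into a dict of name -> value."""
    root = ET.fromstring(xml_text)
    config = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            config[name] = prop.findtext("value")
    return config

def apply_overrides(cluster_config, user_overrides):
    """Return a merged view in which user-provided values take precedence."""
    merged = dict(cluster_config)
    merged.update(user_overrides)
    return merged

# Example: a cluster-published fragment plus one user override.
published = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://master:8020</value></property>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>"""

merged = apply_overrides(parse_cluster_config(published),
                         {"mapreduce.framework.name": "local"})
```

Multiple named override sets (one per purpose) would then reduce to selecting which `user_overrides` dict to merge.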

== Feature Set ==

This list is intended to cover broad features that users might expect to be available across
components.

In some cases, these duplicate functionality that is or might be provided via the web UI.
It's fine to integrate the web UI into HDT, but we should be looking for ways to link more
deeply into Eclipse. Errors in the log, for example, should take us to the Java source for
the classes mentioned, and failing tasks and jobs should lead us to the appropriate step in
the appropriate script editor.
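
The log-to-source linking above can be sketched simply: Java stack-trace frames carry exactly the class, source file, and line number an editor needs to open. A minimal (illustrative, not a committed implementation) frame parser:

```python
import re

# Matches one Java stack-trace frame, e.g.
#   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
FRAME_RE = re.compile(r"at\s+([\w.$]+)\.([\w$<>]+)\((\w+\.java):(\d+)\)")

def parse_frame(line):
    """Extract (class, method, file, line) from a stack-trace line, or None."""
    m = FRAME_RE.search(line)
    if not m:
        return None
    cls, method, src, lineno = m.groups()
    return cls, method, src, int(lineno)

frame = parse_frame("    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)")
```

Given the parsed class name, the plugin could resolve the type in the workspace or attached sources and jump to the reported line.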

  * Data browsing
    * Filesystem browsing, for all supported filesystems
    * Database browsing, for all supported databases
    * Facilities for viewing large data sets -- sampling, searching, random access. Should
''not'' run out of memory trying to look at files in HDFS.
  * Job submission, for all types of jobs
  * Job tracking, for all types of jobs
    * This would include drilling down into more complex composite jobs, for example Pig,
or the multiple jobs that make up a Hive select.
    * Ultimately, this drills down to tasks.
    * Counters
  * Log browsing, for logs from all types of jobs and tasks
  * Debugging
    * Remote attachment of Eclipse to tasks
    * Remote debugging of scripts (for example, pause after first intermediate join to examine
result before passing it to next step, etc.)
    * Counters
  * Cluster monitoring (perhaps in one compact display)
    * Service and node status
    * Cluster load monitoring
    * Filesystem available space display
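
On the large-data-set point above: one standard way to sample a file of unknown size without loading it is reservoir sampling, which keeps O(k) memory however long the record stream is. A hedged sketch (the function name is illustrative; in HDT this would read records from an HDFS input stream rather than a generator):

```python
import random

def reservoir_sample(records, k, rng=None):
    """Keep a uniform random sample of k records from a stream of unknown
    length, using O(k) memory (Algorithm R)."""
    rng = rng or random.Random()
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            # Replace an existing sample element with probability k/(i+1).
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = record
    return sample

# Example: sample 5 "lines" from a simulated 100,000-line file.
lines = (f"line-{i}" for i in range(100_000))
picked = reservoir_sample(lines, 5, random.Random(42))
```

Searching and random access would similarly rely on streaming and seeking rather than materializing the file.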

== Administration ==

Ideally, we would provide optional administration facilities. These would almost certainly
duplicate the web interfaces (which they should reuse). The advantages of including them in
Eclipse are

  * One-stop shopping
  * Tighter integration -- for example, if the service status display shows that a service
is down, it would be handy to be able to restart it in place, or at least be taken to the
appropriate admin page.
    * Taking you to the appropriate admin page would allow better integration with 3rd-party
tools such as Cloudera Manager.
  * Tools for managing filesystem space. Who's using what, where.

== Security ==

  * The tools should properly integrate, and be tested against, a locked-down secure install.
    * "Secure" may be overstating things now, but security will become an increasing concern
as clusters are used by increasingly heterogeneous user communities.
  * Support for multiple user credentials on a particular cluster as part of the user-selectable
customized configuration.
  * The tools should aid the user in identifying the security status of resources, and tracking
down failures due to security restrictions.
  * Tools could aid the user in determining whether jobs comply with defined security policies.
