www-infrastructure-issues mailing list archives

From "Brock Noland (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (INFRA-3672) import files into Hive Confluence
Date Tue, 21 Jun 2011 01:36:47 GMT

    [ https://issues.apache.org/jira/browse/INFRA-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052308#comment-13052308 ]

Brock Noland commented on INFRA-3672:
-------------------------------------

1. Yes, you are right. They are there; if you search for ContributorsMeetings you can find
it. I noticed this with the Language sub pages and figured I would just fix them up.

2. You are correct. I noticed one such change is this one:

http://wiki.apache.org/hadoop/Hive?action=diff&rev1=72&rev2=71

But looking in my dump, I do not have that anywhere:

{(monospaced)}
$ pwd
/Users/noland/tmp/hive-wiki-full
$ grep -R Jenkins .
$ 
{(monospaced)}

I took the version specified in the "current" file, and that revision does not have Jenkins;
it has Hudson:

{(monospaced)}
$ cat Hive/current 
00000061
$ cat Hive/revisions/00000061
= What is Hive =
[[http://hadoop.apache.org/hive/|Hive]] is a data warehouse infrastructure built on top of
[[.|Hadoop]]. It provides tools to enable easy data ETL, a mechanism to put structures on
the data, and the capability to querying and analysis of large data sets stored in Hadoop
files. Hive defines a simple SQL-like query language, called QL, that enables users familiar
with SQL to query the data. At the same time, this language also allows programmers who are
familiar with the MapReduce fromwork to be able to plug in their custom mappers and reducers
to perform more sophisticated analysis that may not be supported by the built-in capabilities
of the language.

Hive does not mandate read or written data be in the "Hive format"---there is no such thing.
Hive works equally well on Thrift, control delimited, or your specialized data formats.  Please
see [[/DeveloperGuide#File_Formats|File Format]] and [[http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook|SerDe]]
in [[/DeveloperGuide|Developer Guide]] for details.

= What Hive is NOT =
Hadoop is a batch processing system and Hadoop jobs tend to have high latency and incur substantial
overheads in job submission and scheduling. As a result - latency for Hive queries is generally
very high (minutes) even when data sets involved are very small (say a few hundred megabytes).
As a result it cannot be compared with systems such as Oracle where analyses are conducted
on a significantly smaller amount of data but the analyses proceed much more iteratively with
the response times between iterations being less than a few minutes. Hive aims to provide
acceptable (but not optimal) latency for interactive data browsing, queries over small data
sets or test queries. Hive also does not provide sort of data or query cache to make repeated
queries over the same data set faster.

Hive is not designed for online transaction processing and does not offer real-time queries
and row level updates. It is best used for batch jobs over large sets of immutable data (like
web logs). What Hive values most are scalability (scale out with more machines added dynamically
to the Hadoop cluster), extensibility (with MapReduce framework and UDF/UDAF/UDTF), fault-tolerance,
and loose-coupling with its input formats.

= Information =
 * General information about Hive
  * [[/GettingStarted|Getting Started]]
  * [[/Presentations|Presentations and Papers about Hive]]
  * [[/PoweredBy|A List of Sites and Applications Powered by Hive]]
  * [[http://hadoop.apache.org/hive/mailing_lists.html#Users|hive-users mailing list]]
  * Hive IRC Channel: ##hive at irc.freenode.net
 * For users:
  * [[/Tutorial|Hive Tutorial]]
  * [[/LanguageManual|HiveQL Language Manual (Queries, DML, DDL, and CLI)]]
  * [[/HivePlugins|Hive Plug-in Interfaces - User-Defined Functions and SerDes]]
  * [[/LanguageManual/UDF|Guide to Hive Operators and Functions]]
  * [[/HiveWebInterface|Hive Web Interface]]
  * [[/HiveClient|Hive Client (JDBC, ODBC, Thrift, etc)]]
 * For developers:
  * [[/HowToContribute|How to Contribute]]
  * [[/Development/ContributorsMeetings|Hive Contributors Meetings]]
  * [[/DeveloperGuide|Hive Developer Guide]]
  * [[/Performance|Hive Performance]]
  * [[/Design|Hive Architecture Overview]]
  * [[/DesignDocs|Hive Design Docs]]
  * [[/Roadmap|Roadmap/call to Add More Features]]
  * [[/HowToCommit|How to Commit]]
  * [[/HowToRelease|How to Release]]
 * For administrators:
  * [[/AdminManual/Installation|Installing Hive]]
  * [[/AdminManual/Configuration|Configuring Hive]]
  * [[/AdminManual/MetastoreAdmin|Setting up Metastore]]
  * [[/HiveWebInterface|Setting up Hive Web Interface]]
  * [[/AdminManual/SettingUpHiveServer|Setting up Hive Server (JDBC, ODBC, Thrift, etc)]]
  * [[/HiveAws|Hive on Amazon Web Services]]
 * Build Status:
  * [[http://hudson.zones.apache.org/hudson/view/Hive/|Hive builds]]
  * [[/HudsonBuild|HudsonBuild]]
 * [[/FAQ|FAQ]]
For more information, please see the official [[http://hadoop.apache.org/hive/|Hive website]].
{(monospaced)}
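
The same check could be scripted so that it is easy to repeat for other pages in the dump. What follows is only a minimal sketch in Python, assuming the layout seen in the transcript above: a page directory with a "current" file that names a revision file under "revisions/". The function names are hypothetical helpers written for this illustration; they are not part of MoinMoin.

```python
from pathlib import Path


def latest_revision_text(page_dir):
    """Return the text of the revision named in the page's "current" file."""
    page = Path(page_dir)
    # "current" holds the revision file name, e.g. 00000061, plus a newline.
    rev = (page / "current").read_text().strip()
    return (page / "revisions" / rev).read_text(encoding="utf-8", errors="replace")


def revisions_containing(page_dir, needle):
    """Return the names of all revision files whose text contains needle."""
    page = Path(page_dir)
    hits = []
    for rev_file in sorted((page / "revisions").iterdir()):
        text = rev_file.read_text(encoding="utf-8", errors="replace")
        if needle in text:
            hits.append(rev_file.name)
    return hits
```

Calling revisions_containing("Hive", "Jenkins") from the dump directory would list any revision that mentions Jenkins. An empty list would match the empty grep result shown earlier, suggesting the dump predates that wiki edit.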

> import files into Hive Confluence
> ---------------------------------
>
>                 Key: INFRA-3672
>                 URL: https://issues.apache.org/jira/browse/INFRA-3672
>             Project: Infrastructure
>          Issue Type: Task
>      Security Level: public(Regular issues) 
>          Components: Confluence
>            Reporter: John Sichi
>            Assignee: Gavin
>         Attachments: Hive-161742-2.xml.zip
>
>
> This is a companion to INFRA-3641.  We've converted the MoinMoin dump into Confluence
> using a test Confluence server, and then exported the space from there.  I don't think I have
> the necessary Confluence access for doing the import into the real Apache Confluence server,
> so I need some help with that.
> The export is in /home/jvs/Hive-204911-2.xml.zip
> It's OK if the import deletes any existing pages (there's currently only one, the Bylaws,
> and I've made my own backup of that).
> After the import is done, we'll need to apply changes from MoinMoin which took place
> since the dump, so if this request can be carried out soon, that would be helpful.
> Thanks!

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
