hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/Tutorial" by StevenWong
Date Sat, 18 Jun 2011 01:26:14 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/Tutorial" page has been changed by StevenWong:
http://wiki.apache.org/hadoop/Hive/Tutorial?action=diff&rev1=38&rev2=39

Comment:
Fix typo.

  ||string ||regexp_replace(string A, string B, string C) ||returns the string resulting from replacing all substrings in A that match the Java regular expression B (see [[http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html|Java regular expressions syntax]]) with C. For example, regexp_replace('foobar', 'oo<nowiki>|</nowiki>ar', '') returns 'fb' ||
  ||int ||size(Map<K,V>) ||returns the number of elements in the map type ||
  ||int ||size(Array<T>) ||returns the number of elements in the array type ||
- ||<type> ||cast(expr as <type>) ||converts the results of the expression expr
to <type> e.g. cast('1' as BIGINT) will convert the string '1' to it integral representation.
A null is returned if the conversion does not succeed. ||
+ ||<type> ||cast(expr as <type>) ||converts the results of the expression expr to <type>, e.g. cast('1' as BIGINT) will convert the string '1' to its integral representation. A null is returned if the conversion does not succeed. ||
  ||string ||from_unixtime(int unixtime) ||convert the number of seconds from unix epoch (1970-01-01
00:00:00 UTC) to a string representing the timestamp of that moment in the current system
time zone in the format of "1970-01-01 00:00:00" ||
  ||string ||to_date(string timestamp) ||Return the date part of a timestamp string: to_date("1970-01-01
00:00:00") = "1970-01-01" ||
  ||int ||year(string date) ||Return the year part of a date or a timestamp string: year("1970-01-01
00:00:00") = 1970, year("1970-01-01") = 1970 ||
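To make the regexp_replace and cast rows above concrete, here is a small Python sketch. It is an illustration only: Hive itself uses Java regular expressions and its own type-conversion code, but for the simple pattern in the example Python's `re` behaves the same, and returning None mirrors Hive returning NULL on a failed cast.

```python
import re

# Approximation of Hive's regexp_replace(A, B, C): replace every
# substring of A that matches regex B with C.
def regexp_replace(a, b, c):
    return re.sub(b, c, a)

# Approximation of cast(expr as BIGINT): return the integral value,
# or None (standing in for Hive's NULL) when conversion fails.
def cast_bigint(expr):
    try:
        return int(expr)
    except (TypeError, ValueError):
        return None

print(regexp_replace('foobar', 'oo|ar', ''))  # fb
print(cast_bigint('1'))                       # 1
print(cast_bigint('not a number'))            # None
```

The 'foobar' example works as in the table: 'oo' and 'ar' each match once and are replaced by the empty string, leaving 'fb'.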
@@ -461, +461 @@

   * Dynamic partition insert could potentially be a resource hog in that it could generate a large number of partitions in a short time. To guard against this, we define three parameters:
    * '''hive.exec.max.dynamic.partitions.pernode''' (default value being 100) is the maximum number of dynamic partitions that can be created by each mapper or reducer. If one mapper or reducer creates more than this threshold, a fatal error will be raised from the mapper/reducer (through a counter) and the whole job will be killed.
    * '''hive.exec.max.dynamic.partitions''' (default value being 1000) is the total number of dynamic partitions that can be created by one DML statement. If no single mapper/reducer exceeds its limit but the total number of dynamic partitions does, then an exception is raised at the end of the job before the intermediate data are moved to the final destination.
-   * '''hive.max.created.files''' (default value being 100000) is the maximum total number
of files created by all mappers and reducers. This is implemented by updating a Hadoop counter
by each mapper/reducer whenever a new file is created. If the total number is exceeding hive.max.created.files,
a fatal error will be thrown and the job will be killed.
+   * '''hive.exec.max.created.files''' (default value being 100000) is the maximum total number of files created by all mappers and reducers. This is implemented by updating a Hadoop counter from each mapper/reducer whenever a new file is created. If the total number exceeds hive.exec.max.created.files, a fatal error will be thrown and the job will be killed.
  
   * Another situation we want to protect against with dynamic partition insert is a user accidentally specifying all partitions as dynamic partitions without specifying any static partition, when the original intention was just to overwrite the sub-partitions of one root partition. We define another parameter, hive.exec.dynamic.partition.mode=strict, to prevent the all-dynamic-partition case. In strict mode, you have to specify at least one static partition. The default mode is strict. In addition, we have a parameter hive.exec.dynamic.partition=true/false to control whether to allow dynamic partition at all. The default value is false.
   * In Hive 0.6, dynamic partition insert does not work with hive.merge.mapfiles=true or hive.merge.mapredfiles=true, so it internally turns off the merge parameters. Merging files in dynamic partition inserts is supported in Hive 0.7 (see JIRA HIVE-1307 for details).
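The three limits described above amount to simple counter checks. The Python sketch below models that bookkeeping; the class and method names are hypothetical, not Hive's actual code, and Hive tracks these totals with Hadoop counters inside each mapper/reducer (checking the job-wide partition total only at the end of the job, whereas this sketch checks eagerly for simplicity).

```python
# Illustrative model of the dynamic-partition guards described above.
# Defaults mirror hive.exec.max.dynamic.partitions.pernode (100),
# hive.exec.max.dynamic.partitions (1000) and
# hive.exec.max.created.files (100000).
class DynamicPartitionGuard:
    def __init__(self, max_per_node=100, max_total=1000, max_files=100000):
        self.max_per_node = max_per_node
        self.max_total = max_total
        self.max_files = max_files
        self.per_node = {}  # partitions created by each mapper/reducer
        self.files = 0      # total files created by all tasks

    def create_partition(self, node_id):
        # One mapper/reducer registering a new dynamic partition.
        self.per_node[node_id] = self.per_node.get(node_id, 0) + 1
        if self.per_node[node_id] > self.max_per_node:
            raise RuntimeError("fatal: per-node dynamic partition limit exceeded")
        if sum(self.per_node.values()) > self.max_total:
            raise RuntimeError("total dynamic partition limit exceeded")

    def create_file(self):
        # Any mapper/reducer creating a new output file.
        self.files += 1
        if self.files > self.max_files:
            raise RuntimeError("fatal: created-files limit exceeded")
```

For example, a guard built with max_per_node=2 raises on the third partition created by the same task, matching the per-node behavior described above.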
