From: chenliang613@apache.org To: commits@carbondata.incubator.apache.org Date: Wed, 12 Apr 2017 11:51:29 -0000 Message-Id: <3d5a700f2c774598bf8177ef637ef0d6@git.apache.org> X-Mailer: ASF-Git Admin Mailer Subject: [01/54] incubator-carbondata-site git commit: Wip for Automating Documentation for Website archived-at: Wed, 12 Apr 2017 11:51:44 -0000 Repository: incubator-carbondata-site Updated Branches: refs/heads/asf-site bc1361dda -> 99fd49060 http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/markdown/release-guide.md ---------------------------------------------------------------------- diff --git a/src/site/markdown/release-guide.md b/src/site/markdown/release-guide.md deleted file mode 100644 index 50a0e8a..0000000 --- a/src/site/markdown/release-guide.md +++ /dev/null @@ -1,482 +0,0 @@ - - -# Apache CarbonData Release Guide - -Apache CarbonData periodically declares and publishes releases. - -Each release is executed by a _Release Manager_, who is selected among the CarbonData committers. - This document describes the process that the Release Manager follows to perform a release. Any - changes to this process should be discussed and adopted on the - [dev@ mailing list](mailto:dev@carbondata.incubator.apache.org). - -Please remember that publishing software has legal consequences. 
This guide complements the -foundation-wide [Product Release Policy](http://www.apache.org/dev/release.html) and [Release -Distribution Policy](http://www.apache.org/dev/release-distribution). - -## Decide to release - -Deciding to release and selecting a Release Manager is the first step of the release process. -This is a consensus-based decision of the entire community. - -Anybody can propose a release on the dev@ mailing list, giving a solid argument and nominating a -committer as the Release Manager (including themselves). There's no formal process, no vote -requirements, and no timing requirements. Any objections should be resolved by consensus before -starting the release. - -_Checklist to proceed to next step:_ - -1. Community agrees to release -2. Community selects a Release Manager - -## Prepare for the release - -Before your first release, you should perform one-time configuration steps. This will set up your - security keys for signing the artifacts and your access to the release repository. - -To prepare for each release, you should audit the project status in Jira and do the necessary -bookkeeping. Finally, you should tag a release. - -### One-time setup instructions - -#### GPG Key - -You need to have a GPG key to sign the release artifacts. Please be aware of the ASF-wide -[release signing guidelines](https://www.apache.org/dev/release-signing.html). If you don't have -a GPG key associated with your Apache account, please create one according to the guidelines. - -Determine your Apache GPG key and key ID, as follows: - -``` -gpg --list-keys -``` - -This will list your GPG keys. One of these should reflect your Apache account, for example: - -``` -pub 2048R/845E6689 2016-02-23 -uid Nomen Nescio -sub 2048R/BA4D50BE 2016-02-23 -``` - -Here, the key ID is the 8-digit hex string in the `pub` line: `845E6689`. - -Now, add your Apache GPG key to CarbonData's `KEYS` file in the `dev` and `release` repositories -at `dist.apache.org`. 
Follow the instructions listed at the top of these files. - -Configure `git` to use this key when signing code by giving it your key ID, as follows: - -``` -git config --global user.signingkey 845E6689 -``` - -You may drop the `--global` option if you'd prefer to use this key for the current repository only. - -You may wish to start `gpg-agent` to unlock your GPG key only once using your passphrase. -Otherwise, you may need to enter this passphrase several times. The setup of `gpg-agent` varies -based on operating system, but may be something like this: - -``` -eval $(gpg-agent --daemon --no-grab --write-env-file $HOME/.gpg-agent-info) -export GPG_TTY=$(tty) -export GPG_AGENT_INFO -``` - -#### Access to Apache Nexus - -Configure access to the [Apache Nexus repository](https://repository.apache.org), which is used to -stage the artifacts and promote them to Maven Central. - -1. Log in with your Apache account. -2. Confirm you have appropriate access by finding `org.apache.carbondata` under `Staging Profiles`. -3. Navigate to your `Profile` (top right dropdown menu of the page). -4. Choose `User Token` from the dropdown, then click `Access User Token`. Copy a snippet of the -Maven XML configuration block. -5. Insert this snippet twice into your global Maven `settings.xml` file, typically -`${HOME}/.m2/settings.xml`. The end result should look like this, where `TOKEN_NAME` and `TOKEN_PASSWORD` -are your secret tokens: - -``` -<settings> - <servers> - <server> - <id>apache.releases.https</id> - <username>TOKEN_NAME</username> - <password>TOKEN_PASSWORD</password> - </server> - <server> - <id>apache.snapshots.https</id> - <username>TOKEN_NAME</username> - <password>TOKEN_PASSWORD</password> - </server> - </servers> -</settings> -``` - -#### Create a new version in Jira - -When contributors resolve an issue in Jira, they tag it with the release that will contain -their changes. With the release currently underway, new issues should be resolved against a -subsequent future release. Therefore, you should create a release item for this subsequent -release, as follows: - -1. In Jira, navigate to `CarbonData > Administration > Versions`. -2. 
Add a new release: choose the next minor version number compared to the one currently -underway, select today's date as the `Start Date`, and choose `Add`. - -#### Triage release-blocking issues in Jira - -There could be outstanding release-blocking issues, which should be triaged before proceeding to -build the release. We track them by setting the `Fix Version` field even before the -issue is resolved. - -The list of release-blocking issues is available at the [version status page](https://issues.apache.org/jira/browse/CARBONDATA/?selectedTab=com.atlassian.jira.jira-projects-plugin:versions-panel). -Triage each unresolved issue with one of the following resolutions: - -* If the issue has been resolved and Jira was not updated, resolve it accordingly. -* If the issue has not been resolved and it is acceptable to defer until the next release, update - the `Fix Version` field to the new version you just created. Please consider discussing this - with stakeholders and the dev@ mailing list, as appropriate. -* If the issue has not been resolved and it is not acceptable to release until it is fixed, the - release cannot proceed. Instead, work with the CarbonData community to resolve the issue. - -#### Review Release Notes in Jira - -Jira automatically generates Release Notes based on the `Fix Version` applied to the issues. -Release Notes are intended for CarbonData users (not CarbonData committers/contributors). You -should ensure that Release Notes are informative and useful. - -Open the release notes from the [version status page](https://issues.apache.org/jira/browse/CARBONDATA/?selectedTab=com.atlassian.jira.jira-projects-plugin:versions-panel) -by choosing the release underway and clicking Release Notes. - -You should verify that the issues listed automatically by Jira are appropriate to appear in the -Release Notes. Specifically, issues should: - -* Be appropriately classified as `Bug`, `New Feature`, `Improvement`, etc. 
-* Represent noteworthy user-facing changes, such as new functionality, backward-incompatible -changes, or performance improvements. -* Have occurred since the previous release; an issue that was introduced and fixed between -releases should not appear in the Release Notes. -* Have an issue title that makes sense when read on its own. - -Adjust any of the above properties to improve the clarity and presentation of the Release Notes. - -#### Verify that a Release Build works - -Run `mvn clean install -Prelease` to ensure that the build processes that are specific to that -profile are in good shape. - -_Checklist to proceed to the next step:_ - -1. Release Manager's GPG key is published to `dist.apache.org`. -2. Release Manager's GPG key is configured in `git` configuration. -3. Release Manager has `org.apache.carbondata` listed under `Staging Profiles` in Nexus. -4. Release Manager's Nexus User Token is configured in `settings.xml`. -5. Jira release item for the subsequent release has been created. -6. There are no release-blocking Jira issues. -7. Release Notes in Jira have been audited and adjusted. - -### Build a release - -Use the Maven release plugin to tag and build release artifacts, as follows: - -``` -mvn release:prepare -``` - -Use the Maven release plugin to stage these artifacts on the Apache Nexus repository, as follows: - -``` -mvn release:perform -``` - -Review all staged artifacts. They should contain all relevant parts for each module, including -`pom.xml`, jar, test jar, source, etc. Artifact names should follow -[the existing format](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.carbondata%22) -in which the artifact name mirrors the directory structure. Carefully review any new artifacts. - -Close the staging repository on Nexus. When prompted for a description, enter "Apache CarbonData -x.x.x release". - -### Stage source release on dist.apache.org - -Copy the source release to the `dev` repository on `dist.apache.org`. - -1. 
If you have not already, check out the Incubator section of the `dev` repository on -`dist.apache.org` via Subversion. In a fresh directory: - -``` -svn co https://dist.apache.org/repos/dist/dev/incubator/carbondata -``` - -2. Make a directory for the new release: - -``` -mkdir x.x.x -``` - -3. Copy the CarbonData source distribution, hash, and GPG signature: - -``` -cp apache-carbondata-x.x.x-source-release.zip x.x.x -``` - -4. Add and commit the files: - -``` -svn add x.x.x -svn commit -``` - -5. Verify the files are [present](https://dist.apache.org/repos/dist/dev/incubator/carbondata). - -### Propose a pull request for website updates - -The final step of building a release candidate is to propose a website pull request. - -This pull request should update the following pages with the new release: - -* `src/main/webapp/index.html` -* `src/main/webapp/docs/latest/mainpage.html` - -_Checklist to proceed to the next step:_ - -1. Maven artifacts deployed to the staging repository of -[repository.apache.org](https://repository.apache.org) -2. Source distribution deployed to the dev repository of -[dist.apache.org](https://dist.apache.org/repos/dist/dev/incubator/carbondata/) -3. Website pull request to list the release. - -## Vote on the release candidate - -Once you have built and individually reviewed the release candidate, please share it for -community-wide review. Please review the foundation-wide [voting guidelines](http://www.apache.org/foundation/voting.html) -for more information. - -Start the review-and-vote thread on the dev@ mailing list. 
Here's an email template; please -adjust as you see fit: - -``` -From: Release Manager -To: dev@carbondata.incubator.apache.org -Subject: [VOTE] Apache CarbonData Release x.x.x - -Hi everyone, -Please review and vote on the release candidate for the version x.x.x, as follows: - -[ ] +1, Approve the release -[ ] -1, Do not approve the release (please provide specific comments) - -The complete staging area is available for your review, which includes: -* JIRA release notes [1], -* the official Apache source release to be deployed to dist.apache.org [2], which is signed with the key with fingerprint FFFFFFFF [3], -* all artifacts to be deployed to the Maven Central Repository [4], -* source code tag "x.x.x" [5], -* website pull request listing the release [6]. - -The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes. - -Thanks, -Release Manager - -[1] link -[2] link -[3] https://dist.apache.org/repos/dist/dist/incubator/carbondata/KEYS -[4] link -[5] link -[6] link -``` - -If there are any issues found in the release candidate, reply on the vote thread to cancel the vote. -There’s no need to wait 72 hours. Proceed to the `Cancel a Release (Fix Issues)` step below and -address the problem. -However, some issues don’t require cancellation. -For example, if an issue is found in the website pull request, just correct it on the spot and the -vote can continue as-is. - -If there are no issues, reply on the vote thread to close the voting. Then, tally the votes in a -separate email. Here’s an email template; please adjust as you see fit. - -``` -From: Release Manager -To: dev@carbondata.incubator.apache.org -Subject: [RESULT][VOTE] Apache CarbonData Release x.x.x - -I'm happy to announce that we have unanimously approved this release. - -There are XXX approving votes, XXX of which are binding: -* approver 1 -* approver 2 -* approver 3 -* approver 4 - -There are no disapproving votes. - -Thanks everyone! 
-``` - -While in incubation, the Apache Incubator PMC must also vote on each release, using the same -process as above. Start the review and vote thread on the `general@incubator.apache.org` list. - -``` -From: Release Manager -To: general@incubator.apache.org -Cc: dev@carbondata.incubator.apache.org -Subject: [VOTE] Apache CarbonData release x.x.x-incubating - -Hi everyone, -Please review and vote on the release candidate for the Apache CarbonData version x.x.x-incubating, - as follows: - -[ ] +1, Approve the release -[ ] -1, Do not approve the release (please provide specific comments) - -The complete staging area is available for your review, which includes: -* JIRA release notes [1], -* the official Apache source release to be deployed to dist.apache.org [2], -* all artifacts to be deployed to the Maven Central Repository [3], -* source code tag "x.x.x" [4], -* website pull request listing the release [5]. - -The Apache CarbonData community has unanimously approved this release [6]. - -As customary, the vote will be open for at least 72 hours. It is adopted by -a majority approval with at least three PMC affirmative votes. If approved, -we will proceed with the release. - -Thanks! - -[1] link -[2] link -[3] link -[4] link -[5] link -[6] lists.apache.org permalink to the vote result thread, e.g., https://lists.apache.org/thread -.html/32c991987e0abf2a09cd8afad472cf02e482af02ac35418ee8731940@%3Cdev.carbondata.apache.org%3E -``` - -If passed, close the voting and summarize the results: - -``` -From: Release Manager -To: general@incubator.apache.org -Cc: dev@carbondata.incubator.apache.org -Subject: [RESULT][VOTE] Apache CarbonData release x.x.x-incubating - -There are XXX approving votes, all of which are binding: -* approver 1 -* approver 2 -* approver 3 -* approver 4 - -There are no disapproving votes. - -We'll proceed with this release as staged. - -Thanks everyone! -``` - -_Checklist to proceed to the final step:_ - -1. 
Community votes to release the proposed release -2. While in incubation, the Apache Incubator PMC votes to release the proposed release - -## Cancel a Release (Fix Issues) - -Any issue identified during the community review and vote should be fixed in this step. - -To fully cancel a vote: - -* Cancel the current release and verify the version is back to the correct SNAPSHOT: - -``` -mvn release:cancel -``` - -* Drop the release tag: - -``` -git tag -d x.x.x -git push --delete apache x.x.x -``` - -* Drop the staging repository on Nexus ([repository.apache.org](https://repository.apache.org)) - - -Verify the version is back to the correct SNAPSHOT. - -Code changes should be proposed as standard pull requests and merged. - -Once all issues have been resolved, you should go back and build a new release candidate with -these changes. - -## Finalize the release - -Once the release candidate has been reviewed and approved by the community, the release should be - finalized. This involves the final deployment of the release to the release repositories, - merging the website changes, and announcing the release. - -### Deploy artifacts to Maven Central repository - -On Nexus, release the staged artifacts to the Maven Central repository. In the `Staging Repositories` - section, find the relevant release candidate `orgapachecarbondata-XXX` entry and click `Release`. - -### Deploy source release to dist.apache.org - -Copy the source release from the `dev` repository to the `release` repository at `dist.apache.org` -using Subversion. - -### Merge website pull request - -Merge the website pull request to list the release created earlier. - -### Mark the version as released in Jira - -In Jira, inside [version management](https://issues.apache.org/jira/plugins/servlet/project-config/CARBONDATA/versions), -hover over the current release and a settings menu will appear. Click `Release`, and select -today's date. - -_Checklist to proceed to the next step:_ - -1. 
Maven artifacts released and indexed in the - [Maven Central repository](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.carbondata%22) -2. Source distribution available in the release repository of - [dist.apache.org](https://dist.apache.org/repos/dist/release/incubator/carbondata/) -3. Website pull request to list the release merged -4. Release version finalized in Jira - -## Promote the release - -Once the release has been finalized, the last step of the process is to promote the release -within the project and beyond. - -### Apache mailing lists - -Announce on the dev@ mailing list that the release has been finished. - -Announce on the user@ mailing list that the release is available, listing major improvements and -contributions. - -While in incubation, announce the release on the Incubator's general@ mailing list. - -_Checklist to declare the process completed:_ - -1. Release announced on the user@ mailing list. -2. Release announced on the Incubator's general@ mailing list. -3. Completion declared on the dev@ mailing list. 
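The staging and finalizing steps above move a source distribution together with its hash and GPG signature. Before copying files between the `dev` and `release` repositories, the hash can be re-checked locally. The following is a minimal sketch under stated assumptions: the guide does not name the hash algorithm or the checksum file name, so SHA-512, GNU coreutils `sha512sum`, and a `<artifact>.sha512` file next to the artifact are all assumptions, and `verify_checksum` is a hypothetical helper rather than part of any Apache tooling.

```shell
#!/usr/bin/env bash
# Sketch only: verify_checksum is a hypothetical helper, not Apache tooling.
# Assumes GNU coreutils sha512sum and a "<artifact>.sha512" file, written in
# sha512sum's own output format, stored beside the artifact.
verify_checksum() {
  local artifact="$1"
  local dir name
  dir=$(dirname "$artifact")
  name=$(basename "$artifact")
  # Run inside the artifact's directory, because the checksum file records
  # a relative file name. Exit status is 0 only when the hash matches.
  ( cd "$dir" && sha512sum -c "$name.sha512" >/dev/null 2>&1 )
}
```

A call such as `verify_checksum x.x.x/apache-carbondata-x.x.x-source-release.zip` returns success only when the recorded hash matches. The signature can be checked separately with `gpg --verify <signature-file> <artifact>`.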
http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/markdown/supported-data-types-in-carbondata.md ---------------------------------------------------------------------- diff --git a/src/site/markdown/supported-data-types-in-carbondata.md b/src/site/markdown/supported-data-types-in-carbondata.md deleted file mode 100644 index 8f271e3..0000000 --- a/src/site/markdown/supported-data-types-in-carbondata.md +++ /dev/null @@ -1,41 +0,0 @@ - - -# Data Types - -#### CarbonData supports the following data types: - - * Numeric Types - * SMALLINT - * INT/INTEGER - * BIGINT - * DOUBLE - * DECIMAL - - * Date/Time Types - * TIMESTAMP - * DATE - - * String Types - * STRING - * CHAR - - * Complex Types - * arrays: ARRAY```` - * structs: STRUCT```` http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/markdown/troubleshooting.md ---------------------------------------------------------------------- diff --git a/src/site/markdown/troubleshooting.md b/src/site/markdown/troubleshooting.md deleted file mode 100644 index 9181d83..0000000 --- a/src/site/markdown/troubleshooting.md +++ /dev/null @@ -1,247 +0,0 @@ - - -# Troubleshooting -This tutorial is designed to provide troubleshooting for end users and developers -who are building, deploying, and using CarbonData. - -## Failed to load thrift libraries - - **Symptom** - - Thrift throws following exception : - - ``` - thrift: error while loading shared libraries: - libthriftc.so.0: cannot open shared object file: No such file or directory - ``` - - **Possible Cause** - - The complete path to the directory containing the libraries is not configured correctly. - - **Procedure** - - Follow the Apache thrift docs at [https://thrift.apache.org/docs/install](https://thrift.apache.org/docs/install) to install thrift correctly. 
- -## Failed to launch the Spark Shell - - **Symptom** - - The shell reports the following error : - - ``` - org.apache.spark.sql.CarbonContext$$anon$$apache$spark$sql$catalyst$analysis - $OverrideCatalog$_setter_$org$apache$spark$sql$catalyst$analysis - $OverrideCatalog$$overrides_$e - ``` - - **Possible Cause** - - The Spark Version and the selected Spark Profile do not match. - - **Procedure** - - 1. Ensure that your Spark version and the selected Spark profile match. - - 2. Use the following command : - - ``` - mvn -Pspark-2.1 -Dspark.version={yourSparkVersion} clean package - ``` - - Note : Refrain from using "mvn clean package" without specifying the profile. - -## Failed to execute load query on cluster. - - **Symptom** - - Load query failed with the following exception: - - ``` - Dictionary file is locked for updation. - ``` - - **Possible Cause** - - The carbon.properties file is not identical on all the nodes of the cluster. - - **Procedure** - - Follow the steps to ensure the carbon.properties file is consistent across all the nodes: - - 1. Copy the carbon.properties file from the master node to all the other nodes in the cluster. - For example, you can use scp to copy this file to all the nodes. - - 2. For the changes to take effect, restart the Spark cluster. - -## Failed to execute insert query on cluster. - - **Symptom** - - Load query failed with the following exception: - - ``` - Dictionary file is locked for updation. - ``` - - **Possible Cause** - - The carbon.properties file is not identical on all the nodes of the cluster. - - **Procedure** - - Follow the steps to ensure the carbon.properties file is consistent across all the nodes: - - 1. Copy the carbon.properties file from the master node to all the other nodes in the cluster. - For example, you can use scp to copy this file to all the nodes. - - 2. For the changes to take effect, restart the Spark cluster. 
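The copy step in the two procedures above can be double-checked before the cluster is restarted. The following is a minimal shell sketch, not part of CarbonData: `props_consistent` is a hypothetical helper that compares local copies of `carbon.properties` byte for byte with `cmp`. It assumes the per-node copies are already readable as local paths, for example after fetching them with scp.

```shell
#!/usr/bin/env bash
# Sketch only: props_consistent is a hypothetical helper, not a CarbonData tool.
# Usage: props_consistent REFERENCE_FILE OTHER_FILE...
# Returns 0 when every other file is byte-identical to the reference file.
props_consistent() {
  local reference="$1"
  shift
  local candidate
  for candidate in "$@"; do
    # cmp -s prints nothing and exits non-zero when the files differ
    # or when one of them cannot be read.
    if ! cmp -s "$reference" "$candidate"; then
      echo "MISMATCH: $candidate differs from $reference" >&2
      return 1
    fi
  done
  return 0
}
```

For instance, `props_consistent master/carbon.properties node1/carbon.properties node2/carbon.properties` stops at the first copy that differs from the master copy and names it on standard error.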
- -## Failed to connect to hiveuser with thrift - - **Symptom** - - We get the following exception : - - ``` - Cannot connect to hiveuser. - ``` - - **Possible Cause** - - The external process does not have permission to access hiveuser. - - **Procedure** - - Ensure that hiveuser in MySQL allows access from external processes. - -## Failure to read the metastore db during table creation. - - **Symptom** - - We get the following exception on trying to connect : - - ``` - Cannot read the metastore db - ``` - - **Possible Cause** - - The metastore db is dysfunctional. - - **Procedure** - - Remove the metastore db from the carbon.metastore in the Spark Directory. - -## Failed to load data on the cluster - - **Symptom** - - Data loading fails with the following exception : - - ``` - Data Load failure exeception - ``` - - **Possible Cause** - - The following issues can cause the failure : - - 1. The core-site.xml, hive-site.xml, yarn-site and carbon.properties are not consistent across all nodes of the cluster. - - 2. Path to hdfs ddl is not configured correctly in the carbon.properties. - - **Procedure** - - Follow the steps to ensure the following configuration files are consistent across all the nodes: - - 1. Copy the core-site.xml, hive-site.xml, yarn-site, and carbon.properties files from the master node to all the other nodes in the cluster. - For example, you can use scp to copy these files to all the nodes. - - Note : Set the path to hdfs ddl in carbon.properties in the master node. - - 2. For the changes to take effect, restart the Spark cluster. - - - -## Failed to insert data on the cluster - - **Symptom** - - Insertion fails with the following exception : - - ``` - Data Load failure exeception - ``` - - **Possible Cause** - - The following issues can cause the failure : - - 1. The core-site.xml, hive-site.xml, yarn-site and carbon.properties are not consistent across all nodes of the cluster. - - 2. 
Path to hdfs ddl is not configured correctly in the carbon.properties. - - **Procedure** - - Follow the steps to ensure the following configuration files are consistent across all the nodes: - - 1. Copy the core-site.xml, hive-site.xml, yarn-site, and carbon.properties files from the master node to all the other nodes in the cluster. - For example, you can use scp to copy these files to all the nodes. - - Note : Set the path to hdfs ddl in carbon.properties in the master node. - - 2. For the changes to take effect, restart the Spark cluster. - -## Failed to execute Concurrent Operations(Load,Insert,Update) on table by multiple workers. - - **Symptom** - - Execution fails with the following exception : - - ``` - Table is locked for updation. - ``` - - **Possible Cause** - - Concurrency not supported. - - **Procedure** - - A worker must wait for the running query to complete and for the table lock to be released before another query can succeed. - -## Failed to create a table with a single numeric column. - - **Symptom** - - Execution fails with the following exception : - - ``` - Table creation fails. - ``` - - **Possible Cause** - - Behavior not supported. - - **Procedure** - - At least one column that can be treated as a dimension is mandatory for table creation. http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/markdown/useful-tips-on-carbondata.md ---------------------------------------------------------------------- diff --git a/src/site/markdown/useful-tips-on-carbondata.md b/src/site/markdown/useful-tips-on-carbondata.md deleted file mode 100644 index b1ff903..0000000 --- a/src/site/markdown/useful-tips-on-carbondata.md +++ /dev/null @@ -1,180 +0,0 @@ - - -# Useful Tips -This tutorial guides you to create CarbonData Tables and optimize performance. 
-The following sections will elaborate on the above topics : - -* [Suggestions to create CarbonData Table](#suggestions-to-create-carbondata-table) -* [Configurations For Optimizing CarbonData Performance](#configurations-for-optimizing-carbondata-performance) - -## Suggestions to Create CarbonData Table - -Recently, CarbonData was used to analyze performance in the telecommunication field. -The results of the analysis for table creation with dimensions ranging from -10 thousand to 10 billion rows and 100 to 300 columns have been summarized below. - -The following table describes some of the columns from the table used. - - -**Table Column Description** - -| Column Name | Data Type | Cardinality | Attribution | -|-------------|---------------|-------------|-------------| -| msisdn | String | 30 million | Dimension | -| BEGIN_TIME | BigInt | 10 Thousand | Dimension | -| HOST | String | 1 million | Dimension | -| Dime_1 | String | 1 Thousand | Dimension | -| counter_1 | Numeric(20,0) | NA | Measure | -| ... | ... | NA | Measure | -| counter_100 | Numeric(20,0) | NA | Measure | - -Based on more than 50 CarbonData test cases, we have the following suggestions to enhance the query performance : - - - -* **Put the frequently-used column filter in the beginning** - - For example, if the MSISDN filter is used in most of the queries, put MSISDN in the first column. -The create table command can be modified as suggested below : - -``` - create table carbondata_table( - msisdn String, - ... - )STORED BY 'org.apache.carbondata.format' - TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,..', - 'DICTIONARY_INCLUDE'='...'); -``` - - Now the query with MSISDN in the filter will be more efficient. - - -* **Put the frequently-used columns in the order of low to high cardinality** - - If the table in the specified query has multiple columns which are frequently used to filter the results, it is suggested to put - the columns in the order of cardinality low to high. 
This ordering of frequently used columns improves the compression ratio and - enhances the performance of queries with filter on these columns. - - For example, if MSISDN, HOST and Dime_1 are frequently-used columns, the suggested column order is - Dime_1, HOST, MSISDN, because Dime_1 has the lowest cardinality and MSISDN the highest. - The create table command can be modified as suggested below : - -``` - create table carbondata_table( - Dime_1 String, - HOST String, - MSISDN String, - ... - )STORED BY 'org.apache.carbondata.format' - TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST..', - 'DICTIONARY_INCLUDE'='Dime_1..'); -``` - - -* **Put the Dimension type columns in order of low to high cardinality** - - If the columns used to filter are not frequently used, then it is suggested to order all the columns of dimension type in order of low to high cardinality. -The create table command can be modified as below : - -``` - create table carbondata_table( - Dime_1 String, - BEGIN_TIME bigint, - HOST String, - MSISDN String, - ... - )STORED BY 'org.apache.carbondata.format' - TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI..', - 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME..'); -``` - - -* **For measure type columns that do not require high accuracy, replace Numeric(20,0) data type with Double data type** - - For columns of measure type that do not require high accuracy, it is suggested to replace the Numeric data type with Double to enhance -query performance. The create table command can be modified as below : - -``` - create table carbondata_table( - Dime_1 String, - BEGIN_TIME bigint, - HOST String, - MSISDN String, - counter_1 double, - counter_2 double, - ... 
- counter_100 double - )STORED BY 'org.apache.carbondata.format' - TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI', - 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME'); -``` - The result of performance analysis of test-case shows reduction in query execution time from 15 to 3 seconds, thereby improving performance by nearly 5 times. - - -* **Columns of incremental character should be re-arranged at the end of dimensions** - - Consider the following scenario where data is loaded each day and the start_time is incremental for each load, it is -suggested to put start_time at the end of dimensions. - - Incremental values are efficient in using min/max index. The create table command can be modified as below : - -``` - create table carbondata_table( - Dime_1 String, - HOST String, - MSISDN String, - counter_1 double, - counter_2 double, - BEGIN_TIME bigint, - ... - counter_100 double - )STORED BY 'org.apache.carbondata.format' - TBLPROPERTIES ( 'DICTIONARY_EXCLUDE'='MSISDN,HOST,IMSI', - 'DICTIONARY_INCLUDE'='Dime_1,END_TIME,BEGIN_TIME'); -``` - - -* **Avoid adding high cardinality columns to dictionary** - - If the system has low memory configuration, then it is suggested to exclude high cardinality columns from the dictionary to -enhance load performance. Creation of dictionary for high cardinality columns at time of load will degrade load performance due to -excessive memory usage. - - By default CarbonData determines the cardinality at the first data load and allows for dictionary creation only if the cardinality is less than -1 million. - - -## Configurations for Optimizing CarbonData Performance - -Recently we did some performance POC on CarbonData for Finance and telecommunication Field. It involved detailed queries and aggregation -scenarios. 
After the completion of POC, some of the configurations impacting the performance have been identified and tabulated below : - -| Parameter | Location | Used For | Description | Tuning | -|----------------------------------------------|-----------------------------------|---------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| carbon.sort.intermediate.files.limit | spark/carbonlib/carbon.properties | Data loading | During the loading of data, local temp is used to sort the data. This number specifies the minimum number of intermediate files after which the merge sort has to be initiated. | Increasing the parameter to a higher value will improve the load performance. For example, when we increase the value from 20 to 100, it increases the data load performance from 35MB/S to more than 50MB/S. Higher values of this parameter consumes more memory during the load. | -| carbon.number.of.cores.while.loading | spark/carbonlib/carbon.properties | Data loading | Specifies the number of cores used for data processing during data loading in CarbonData. | If you have more number of CPUs, then you can increase the number of CPUs, which will increase the performance. 
For example if we increase the value from 2 to 4 then the CSV reading performance can increase about 1 times | -| carbon.compaction.level.threshold | spark/carbonlib/carbon.properties | Data loading and Querying | For minor compaction, specifies the number of segments to be merged in stage 1 and number of compacted segments to be merged in stage 2. | Each CarbonData load will create one segment, if every load is small in size it will generate many small file over a period of time impacting the query performance. Configuring this parameter will merge the small segment to one big segment which will sort the data and improve the performance. For Example in one telecommunication scenario, the performance improves about 2 times after minor compaction. | -| spark.sql.shuffle.partitions | spark/con/spark-defaults.conf | Querying | The number of task started when spark shuffle. | The value can be 1 to 2 times as much as the executor cores. In an aggregation scenario, reducing the number from 200 to 32 reduced the query time from 17 to 9 seconds. | -| num-executors/executor-cores/executor-memory | spark/con/spark-defaults.conf | Querying | The number of executors, CPU cores, and memory used for CarbonData query. | In the bank scenario, we provide the 4 CPUs cores and 15 GB for each executor which can get good performance. This 2 value does not mean more the better. It needs to be configured properly in case of limited resources. For example, In the bank scenario, it has enough CPU 32 cores each node but less memory 64 GB each node. So we cannot give more CPU but less memory. For example, when 4 cores and 12GB for each executor. It sometimes happens GC during the query which impact the query performance very much from the 3 second to more than 15 seconds. In this scenario need to increase the memory or decrease the CPU cores. | -| carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data loading | The buffer size to store records, returned from the block scan. 
| In limit scenario this parameter is very important. For example your query limit is 1000. But if we set this value to 3000 that means we get 3000 records from scan but spark will only take 1000 rows. So the 2000 remaining are useless. In one Finance test case after we set it to 100, in the limit 1000 scenario the performance increase about 2 times in comparison to if we set this value to 12000. | -| carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether use YARN local directories for multi-table load disk load balance | If this is set it to true CarbonData will use YARN local directories for multi-table load disk load balance, that will improve the data load performance. | - - - \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/markdown/user-guide-toc.md ---------------------------------------------------------------------- diff --git a/src/site/markdown/user-guide-toc.md b/src/site/markdown/user-guide-toc.md deleted file mode 100755 index 5771e10..0000000 --- a/src/site/markdown/user-guide-toc.md +++ /dev/null @@ -1,46 +0,0 @@ - -# User Guide -Welcome to Apache CarbonData. Apache CarbonData(incubating) is a new big data file format for faster interactive query using advanced columnar storage, index, compression and encoding techniques to improve computing efficiency, which helps in speeding up queries by an order of magnitude faster over PetaBytes of data. -This user guide provides a detailed description about the CarbonData and its features. - -Let's get started ! 
-
-* [Overview](overview-of-carbondata.md)
-    * Introduction
-    * Features
-    * [Data Types](supported-data-types-in-carbondata.md)
-    * [CarbonData File Structure](file-structure-of-carbondata.md)
-* [Installation Guide](installation-guide.md)
-    * Installing and Configuring CarbonData on Standalone Spark Cluster
-    * Installing and Configuring CarbonData on "Spark on YARN" Cluster
-* [Configuring CarbonData](configuration-parameters.md)
-    * System Configuration
-    * Performance Configuration
-    * Miscellaneous Configuration
-    * Spark Configuration
-* [Using CarbonData](using-carbondata.md)
-    * [Data Management](data-management.md)
-    * [DDL Operations on CarbonData](ddl-operation-on-carbondata.md)
-    * [DML Operations on CarbonData](dml-operation-on-carbondata.md)
-
-
-
-
-

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/markdown/using-carbondata.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/using-carbondata.md b/src/site/markdown/using-carbondata.md
deleted file mode 100644
index 83a3655..0000000
--- a/src/site/markdown/using-carbondata.md
+++ /dev/null
@@ -1,35 +0,0 @@
-# Using CarbonData
-This tutorial discusses the disciplines related to the management of data in Apache CarbonData.
-Each section below gives a brief introduction to the respective disciplines related to data
-management.
-
-## Data Management
-This section deals with the disciplines related to managing data in the application,
-focusing on conceptual details of operations such as loading, deleting, updating
-and compacting data.
-
-For complete details refer to [Data Management](data-management.md)
-
-## Data Definition Language Support
-This section deals with the aspects related to creation and modification of the structure of the database.
-It discusses in detail:
-
-* Table creation
-* Table deletion
-* Table description
-* Compaction
-
-For complete details refer to [DDL Operations on CarbonData](ddl-operation-on-carbondata.md)
-
-## Data Manipulation Language Support
-This section deals with the aspects related to data manipulation in the database. It discusses in detail selecting, loading and deleting data in a database.
-This manipulation comprises
-
-* Loading data into database tables
-* Retrieving existing data
-* Deleting data from existing tables
-* Deleting segments from existing tables
-* Updating data in existing tables
-
-For complete details refer to [DML Operations on CarbonData](dml-operation-on-carbondata.md)
-

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/pdf.xml
----------------------------------------------------------------------
diff --git a/src/site/pdf.xml b/src/site/pdf.xml
deleted file mode 100644
index 710e7c7..0000000
--- a/src/site/pdf.xml
+++ /dev/null
@@ -1,38 +0,0 @@
-
-
-  CarbonData Documentation
-  The Apache CarbonData Community
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-  ../../src/site/projectLogo/ApacheLogo.png
-  ../../src/site/projectLogo/CarbonDataLogo.png
-  Apache CarbonData
-  Ver 1.0
-  Documentation
-  Apache CarbonData
-  The Apache Software Foundation
-
-
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/projectLogo/ApacheLogo.png
----------------------------------------------------------------------
diff --git a/src/site/projectLogo/ApacheLogo.png b/src/site/projectLogo/ApacheLogo.png
deleted file mode 100644
index 9d25899..0000000
Binary files a/src/site/projectLogo/ApacheLogo.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/projectLogo/CarbonDataLogo.png
----------------------------------------------------------------------
diff --git a/src/site/projectLogo/CarbonDataLogo.png b/src/site/projectLogo/CarbonDataLogo.png
deleted file mode 100644
index bc09b23..0000000
Binary files a/src/site/projectLogo/CarbonDataLogo.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/4f8753c1/src/site/site.xml
----------------------------------------------------------------------
diff --git a/src/site/site.xml b/src/site/site.xml
deleted file mode 100644
index 997caa6..0000000
--- a/src/site/site.xml
+++ /dev/null
@@ -1,11 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-