From: David Novogrodsky
Date: Wed, 12 Nov 2014 13:52:25 -0600
To: user@hadoop.apache.org
Subject: ingesting unstructured data into Hadoop, problem creating tables using Hive

I am trying to ingest unstructured data into Hive so that it can be queried. I am following the steps in Tutorial Exercise 3, but I am having some problems: the table I created has no data in it. Here is a sample of the unstructured data:

560)211-5250 437)810-5830 04:35 21 May 2014 17:26:39
356)539-2237 889)650-7326 30:29 26 Feb 2014 11:56:08

The data is tab-delimited.
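Because the archived copy of this message shows the separators as spaces, it may be worth confirming that every line of CDRecords.txt really contains five tab-separated fields (callFrom, callTo, callDuration, date, timeOfCall). A minimal sketch, assuming a POSIX shell with `awk`; the two rows are re-typed from the sample above with the tab positions inferred from the five table columns (the date field itself contains spaces), and a temporary file stands in for ~/Desktop/CDRecords.txt:

```shell
#!/bin/sh
# Sample rows re-typed from the message, with literal tabs between the five
# fields (assumption: tab positions inferred from the table's five columns).
SAMPLE=$(mktemp)
trap 'rm -f "$SAMPLE"' EXIT
printf '%s\t%s\t%s\t%s\t%s\n' \
  '560)211-5250' '437)810-5830' '04:35' '21 May 2014' '17:26:39' \
  '356)539-2237' '889)650-7326' '30:29' '26 Feb 2014' '11:56:08' > "$SAMPLE"

# Print the number of tab-separated fields on each line; a well-formed record
# reports 5. The same awk command can be pointed at ~/Desktop/CDRecords.txt.
awk -F'\t' '{ print NR ": " NF " fields" }' "$SAMPLE"
```

A line that reports 1 field contains no tabs at all (for example, spaces were used instead), so a tab-based regex cannot match it.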

Here are the steps I am following:

1. a. Make the destination folder:
sudo -u hdfs hadoop fs -mkdir /user/cloudera/vector/callRecords

b. Copy the data into the destination folder:
sudo -u hdfs hadoop fs -copyFromLocal ~/Desktop/CDRecords.txt /user/cloudera/vector/callRecords/

2. Create the Hive table from the command line:

CREATE EXTERNAL TABLE intermediate_call_records (
callFrom STRING,
callTo STRING,
callDuration STRING,
date STRING,
timeOfCall STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\n",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
LOCATION '/user/cloudera/vector/callRecords';
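One detail in the table definition that may explain the empty result is the trailing `\n` in `input.regex`. Assuming the table is read with Hive's default text input format, each line reaches the SerDe with its newline already removed, and RegexSerDe accepts a row only when the pattern matches the whole line (Java's `Matcher.matches()`); a pattern ending in `\n` would then match nothing and the columns would come back NULL. The shell sketch below imitates that locally with `awk`, which also strips the newline from each record before matching (awk needs explicit `^` and `$` anchors to get whole-line matching). The sample rows and their tab positions are assumptions re-typed from the data above; this is an illustration, not a Hive test:

```shell
#!/bin/sh
# Two sample rows with tab separators (assumed layout, as in the sample data).
SAMPLE=$(mktemp)
trap 'rm -f "$SAMPLE"' EXIT
printf '%s\t%s\t%s\t%s\t%s\n' \
  '560)211-5250' '437)810-5830' '04:35' '21 May 2014' '17:26:39' \
  '356)539-2237' '889)650-7326' '30:29' '26 Feb 2014' '11:56:08' > "$SAMPLE"

# Same five-group pattern as input.regex, anchored to the whole line.
# awk removes the newline from each record ($0) before matching, much as
# Hadoop's line reader does before the SerDe sees the row.
echo "pattern without the trailing newline:"
awk '/^([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)$/ { n++ }
     END { print n + 0 " of " NR " lines match" }' "$SAMPLE"

echo "pattern with the trailing newline:"
awk '/^([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\n$/ { n++ }
     END { print n + 0 " of " NR " lines match" }' "$SAMPLE"
```

If the first count equals the number of lines and the second is 0, removing the final `\n` from `input.regex` would be a reasonable first thing to try.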

David Novogrodsky
david.novogrodsky@gmail.com
http://www.linkedin.com/in/davidnovogrodsky