Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of david.novogrodsky@gmail.com
 designates 209.85.216.42 as permitted sender)
MIME-Version: 1.0
From: David Novogrodsky <david.novogrodsky@gmail.com>
Date: Wed, 12 Nov 2014 13:52:25 -0600
Message-ID: 
 <CALUpvHWETKuHx8aBwp9bCdUXXP+qj+zS5kw_uotisfSvkF4bng@mail.gmail.com>
Subject: ingesting unstructured data into Hadoop, problem creating tables
 using Hive
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=089e0149bc06583f5f0507aebd06

--089e0149bc06583f5f0507aebd06
Content-Type: text/plain; charset=UTF-8

I am trying to ingest unstructured data into Hive so it can be queried.  I
am following the steps in Tutorial Exercise 3, but I am having some
problems: the table I create has no data in it.  Here is a sample of the
unstructured data:

560)211-5250 437)810-5830 04:35 21 May 2014 17:26:39
356)539-2237 889)650-7326 30:29 26 Feb 2014 11:56:08


The data is tab-delimited.


Here are the steps I am following:

1. a. Make the destination folder:
sudo -u hdfs hadoop fs -mkdir /user/cloudera/vector/callRecords

b. Copy the data into the destination folder:
sudo -u hdfs hadoop fs -copyFromLocal ~/Desktop/CDRecords.txt /user/cloudera/vector/callRecords/


2. Create the Hive table from the command line:

CREATE EXTERNAL TABLE intermediate_call_records (
callFrom STRING,
callTo STRING,
callDuration STRING,
date STRING,
timeOfCall STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\n",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
LOCATION '/user/cloudera/vector/callRecords';
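
One thing that can be checked outside Hive is whether the pattern matches
a sample row at all.  The following is a minimal sketch in Python, not
Hive itself: it uses Python's re module as a stand-in for the Java regex
engine that RegexSerDe uses, and it assumes that each row reaches the
SerDe without its line terminator and that the whole row must match the
pattern.  The sample row is built from the first record above, assuming
its five fields are separated by tabs.  PATTERN_WITHOUT_NEWLINE and
split_row are hypothetical names introduced only for this comparison.

```python
import re

# Pattern copied from the "input.regex" property above, trailing \n included.
PATTERN_WITH_NEWLINE = r"([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\n"

# Hypothetical variant for comparison: the same pattern without the trailing \n.
PATTERN_WITHOUT_NEWLINE = r"([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)"

# First sample record, ASSUMING tabs separate these five fields
# (callFrom, callTo, callDuration, date, timeOfCall).
SAMPLE_FIELDS = ["560)211-5250", "437)810-5830", "04:35", "21 May 2014", "17:26:39"]

# ASSUMPTION: the row is handed over without its line terminator.
SAMPLE_ROW = "\t".join(SAMPLE_FIELDS)


def split_row(pattern, row):
    """Return the captured groups if the whole row matches, else None."""
    match = re.fullmatch(pattern, row)
    return list(match.groups()) if match else None


if __name__ == "__main__":
    print(split_row(PATTERN_WITH_NEWLINE, SAMPLE_ROW))
    print(split_row(PATTERN_WITHOUT_NEWLINE, SAMPLE_ROW))
```

If those assumptions hold, the first call prints None and the second
prints the five fields, which would point at the trailing \n in
input.regex as one possible reason the rows do not parse.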


David Novogrodsky
david.novogrodsky@gmail.com
http://www.linkedin.com/in/davidnovogrodsky

--089e0149bc06583f5f0507aebd06--