hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mperformer <codevally.mail.l...@gmail.com>
Subject Re: Data are not displayed correctly on hive tables
Date Tue, 08 May 2012 22:22:03 GMT
Hi Mark

Thanks for the reply.

In the HDFS, the row looks like:

*1~Order Conf Req~<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<message scope="" type="">
    <messageId>0</messageId>
    <creatorId>0</creatorId>
    <creatorType/>
    <workspaceId>0</workspaceId>
    <workspaceType/>
    <createdTime>2006-06-01T17:59:09.413+10:00</createdTime>
    <status>0</status>
    <from/>
    <to/>
    <cc/>
    <bcc/>
    <nickname/><subject>Order Conf </subject>
    <body>Prod

Qty Amt

Dlvry Date

Price
</body>
    <smspreview/>
 <email bool="true">
<type/>
<body>Prod

Qty Amt

Dlvry Date

Price
</body>
 <footer/>
 </email>
 <voice bool="true">
 <type/>
 <header/>

<body/>
 <footer/>
 <other/>
 </voice><web
bool="false"><type></type><header></header><body></body><footer></footer></web>
 <notes/>
    <alert bool="false">
        <value/>
    </alert>
    <dlr bool="true">
        <type>ReportUnDelivered</type>
        <rule>Read</rule>
        <period>5
mins</period><publishToWeb>false</publishToWeb><expiryValue>0</expiryValue><expiryPeriod>0</expiryPeriod>
    </dlr>
</message>

~PROJECT~General Message~No description~null~true~2~28~0*

I used "~" as my field separator. But when I do a select query to my hive
table, it only shows the *"**<?xml version="1.0" encoding="UTF-8"
standalone="yes"?>" *part from XML data.*

*Seems there are new line character (\n) in my XML piece and it is the
default row terminator in HIVE. If this is the case, is there any way to
use another character as the row terminator?

Many Thanks.

/Roshan *

*
On Mon, May 7, 2012 at 11:38 PM, Mark Grover <mgrover@oanda.com> wrote:

> Hi Roshan,
> The following snippet summarizes the delimiters for your Hive table:
>         colelction.delim        \u0002
>        field.delim             \u0001
>        mapkey.delim            \u0003
>        serialization.format    \u0001
>
>
> Your fields are delimited by \u0001, collections are delimited by \u0002
> and the delimiter between the key and value in any maps is \u0003. Can you
> verify that your XML content doesn't contain any of these characters?
>
> If this still doesn't help, could you pick an affected row and share what
> the XML appears as in Hive and what it is expected to be?
>
> Good luck!
> Mark
>
> Mark Grover, Business Intelligence Analyst
> OANDA Corporation
>
> www: oanda.com www: fxtrade.com
>
> ----- Original Message -----
> From: "mperformer" <codevally.mail.list@gmail.com>
> To: user@hive.apache.org
> Sent: Sunday, May 6, 2012 11:34:55 PM
> Subject: Re: Data are not displayed correctly on hive tables
>
> Hi Mark
>
> Many thanks for your reply. Please find the below output.
>
> hive> describe formatted messagetemplate;
> OK
> # col_name data_type comment
>
> messagetemplateid bigint None
> messagetemplatename string None
> datacol string None
> messagetemplatetype string None
> messagetype string None
> messagetemplatedescription string None
> originatingtemplateid bigint None
> edited boolean None
> userid bigint None
> projectid bigint None
> responsetemplateid bigint None
>
> # Detailed Table Information
> Database: default
> Owner: root
> CreateTime: Mon May 07 12:06:59 EST 2012
> LastAccessTime: UNKNOWN
> Protect Mode: None
> Retention: 0
> Location: hdfs://app6:9100/mnt/hive-test/warehouse/messagetemplate
> Table Type: MANAGED_TABLE
> Table Parameters:
> comment This is the messagetemplate table
> transient_lastDdlTime 1336356473
>
> # Storage Information
> SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat: org.apache.hadoop.mapred.TextInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed: No
> Num Buckets: -1
> Bucket Columns: []
> Sort Columns: []
> Storage Desc Params:
> colelction.delim \u0002
> field.delim \u0001
> mapkey.delim \u0003
> serialization.format \u0001
> Time taken: 3.2 seconds
>
> Thanks again.
>
> ./Roshan.
>
>
> On Mon, May 7, 2012 at 1:06 PM, Mark Grover < mgrover@oanda.com > wrote:
>
>
> Could you share the output of the following command in Hive:
> describe formatted messagetemplate
>
> My hunch is that your Hive table is using a delimiter (e.g. '\t') that
> appears in the content of your XML.
>
> Mark Grover, Business Intelligence Analyst
> OANDA Corporation
>
> www: oanda.com www: fxtrade.com
>
>
>
> ----- Original Message -----
> From: "mperformer" < codevally.mail.list@gmail.com >
> To: user@hive.apache.org
> Sent: Sunday, May 6, 2012 8:34:27 PM
> Subject: Data are not displayed correctly on hive tables
>
> Hi
>
> I am using
>
> • Hadoop 0.20.2
> • Hive 0.8.1
> • Sqoop 1.4.1-incubating
>
>
>
> in my sample project. Currently I am importing data from PostgreSQL to
> Hive table using Sqoop. My database table in PostgreSQL has 4 columns and
> one column stores a bit large XML file as TEXT data type. The same column
> defined in HIVE as string, but after that column data is not importing and
> shows as null;
>
> Table structure in PostgreSQL
>
> CREATE TABLE public.messagetemplate (
> messagetemplateid BIGSERIAL,
> messagetemplatename TEXT,
> data TEXT,
> messagetemplatetype TEXT,
> CONSTRAINT pk_messagetemplate PRIMARY KEY(messagetemplateid)
> ) WITHOUT OIDS;
>
> Table structure in Hive
>
> hive> desc messagetemplate;
> OK
> messagetemplateid bigint
> messagetemplatename string
> data string
> messagetemplatetype string
>
>
> The data column store the XML file as text, but during the import to hive,
> all data are imported properly (checked the files in HDFS). But using HIVE
> select statement, it only shows small part from the XML text and the rest
> column (last column) is null.
>
> Could someone please help me to sort this out. Thanks.
>
>
>
>
>
>

Mime
View raw message