Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hive.apache.org
MIME-Version: 1.0
In-Reply-To: <CAENxBwzTL-GESVMDDk536u-5Dw6BgaAZ6yWT4psrX982rDRLFA@mail.gmail.com>
References: <CAJuNEJB72Q0BJK+Q22b-vNCQW0AugLmN+SHrzMm-AoZ6WUPoHw@mail.gmail.com>
 <CAENxBwzTL-GESVMDDk536u-5Dw6BgaAZ6yWT4psrX982rDRLFA@mail.gmail.com>
From: Nishanth S <nishanth.2884@gmail.com>
Date: Fri, 2 Jun 2017 11:01:10 -0600
Message-ID: <CAJuNEJB8SaYH-zfC5q7FUxaGk3ztq_4RjkO-fwj1bjKgScWW2Q@mail.gmail.com>
Subject: Re: Migrating Variable Length Files to Hive
To: user@hive.apache.org
Content-Type: multipart/alternative; boundary="f403045f1a88af47450550fd1971"
archived-at: Fri, 02 Jun 2017 17:01:23 -0000

--f403045f1a88af47450550fd1971
Content-Type: text/plain; charset="UTF-8"

Thanks, Edward. I am leaning towards using an array. My nested data does not
have a schema. It is a collection of strings, and the number of strings can
vary.
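
For illustration, here is a minimal sketch of how the array approach could be
prepared, not a tested migration script. It assumes the mainframe records have
already been exported as comma-separated text lines, that the first three
fields are the fixed columns, and that the Hive table is declared with
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '|'
and a fourth column of type ARRAY<STRING>. The function names and the '|'
delimiter are hypothetical choices made for this example.

```python
FIELD_SEP = ","  # assumed to match FIELDS TERMINATED BY in the Hive DDL
ITEM_SEP = "|"   # assumed to match COLLECTION ITEMS TERMINATED BY


def to_hive_row(fields, fixed=3):
    """Keep the first `fixed` fields as columns and pack the rest into one array column."""
    if len(fields) < fixed:
        raise ValueError(f"expected at least {fixed} fields, got {len(fields)}")
    for value in fields:
        # A delimiter inside a value would corrupt the row, so reject it early.
        if FIELD_SEP in value or ITEM_SEP in value or "\n" in value:
            raise ValueError(f"value contains a delimiter: {value!r}")
    head, tail = fields[:fixed], fields[fixed:]
    return FIELD_SEP.join(head + [ITEM_SEP.join(tail)])


def convert(lines):
    """Convert variable-length comma-separated lines into fixed 4-column rows."""
    return [
        to_hive_row(line.rstrip("\n").split(FIELD_SEP))
        for line in lines
        if line.strip()  # skip blank lines
    ]


if __name__ == "__main__":
    sample = ["T1,001,x\n", "T2,002,y,a,b,c\n"]
    for row in convert(sample):
        print(row)
```

Every output row has exactly four delimited fields, so the table schema stays
fixed while the number of strings in the last column varies per record. A
record with only three fields gets an empty fourth field.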


On Fri, Jun 2, 2017 at 10:41 AM, Edward Capriolo <edlinuxguru@gmail.com>
wrote:

>
>
> On Fri, Jun 2, 2017 at 12:07 PM, Nishanth S <nishanth.2884@gmail.com>
> wrote:
>
>> Hello hive users,
>>
>> We are looking at migrating files (less than 5 MB of data in total) with
>> variable record lengths from a mainframe system to Hive. You could think of
>> this as metadata. Each record can have from 3 to n columns, and the number
>> of columns depends on the record type. What would be the best strategy to
>> migrate this to Hive? I was thinking of converting these files into one
>> variable-length CSV file and then importing it into a Hive table. The Hive
>> table would consist of 4 columns, with the 4th column holding a
>> comma-separated list of the values from columns 4 to n. Are there
>> alternative or better approaches? I would appreciate any feedback on this.
>>
>> Thanks,
>> Nishanth
>>
>
> Hive supports complex types like List, Map, and Struct, and they can be
> arbitrarily nested. If the nested data has a schema, that may be your best
> option, potentially using the Thrift/Avro/Parquet/Protobuf support.
>
> Otherwise you can store the data as JSON and parse things out at read time
> using JSON UDFs.
>
> Edward
>


--f403045f1a88af47450550fd1971--