From: Nishanth S <nishanth.2884@gmail.com>
Date: Fri, 2 Jun 2017 11:01:10 -0600
Subject: Re: Migrating Variable Length Files to Hive
To: user@hive.apache.org

Thanks, Edward. I am leaning towards using an array. My nested data does not have a schema. It is a collection of strings, and the number of strings can vary.

On Fri, Jun 2, 2017 at 10:41 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> On Fri, Jun 2, 2017 at 12:07 PM, Nishanth S <nishanth.2884@gmail.com> wrote:
>
>> Hello Hive users,
>>
>> We are looking at migrating files (less than 5 MB of data in total) with
>> variable record lengths from a mainframe system to Hive. You could think
>> of this as metadata. Each of these records can have from 3 to n columns,
>> depending on the record type (each record type has a different number of
>> columns). What would be the best strategy to migrate this to Hive? I was
>> thinking of converting these files into one variable-length CSV file and
>> then importing it into a Hive table. The Hive table would consist of 4
>> columns, with the 4th column holding a comma-separated list of the values
>> from columns 4 to n. Are there alternative or better approaches? I would
>> appreciate any feedback on this.
>>
>> Thanks,
>> Nishanth
>
> Hive supports complex types like List, Map, and Struct, and they can be
> arbitrarily nested. If the nested data has a schema, that may be your best
> option, potentially using the Thrift/Avro/Parquet/Protobuf support.
>
> Otherwise, you can store the data as JSON and parse things out at read
> time using JSON UDFs.
>
> Edward
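
For anyone reading this thread later, the layout discussed here (three fixed columns plus a variable-length tail stored as an array of strings) could be prepared along the following lines. This is only a rough sketch, not something posted in the thread: the table name, column names, delimiters, and the helper functions `to_hive_line` and `convert` are all assumptions made up for illustration, and the input is assumed to be already converted to plain CSV. The fourth column of each record is written with a separate item delimiter so that Hive can read it as ARRAY<STRING>.

```python
import csv
import io

# Hypothetical delimiters; they must match the table's ROW FORMAT clause.
FIELD_SEP = "\t"
ITEM_SEP = "|"
# Assumption from the thread: every record has at least 3 columns.
FIXED_COLUMNS = 3

# Illustrative DDL only; the table and column names are invented.
HIVE_DDL = """
CREATE TABLE mainframe_metadata (
  record_type STRING,
  col2        STRING,
  col3        STRING,
  extra       ARRAY<STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\t'
  COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE
"""


def to_hive_line(fields):
    """Fold columns 4..n of one record into a single delimited array field."""
    if len(fields) < FIXED_COLUMNS:
        raise ValueError(f"expected at least {FIXED_COLUMNS} columns: {fields!r}")
    for value in fields:
        # This sketch does no escaping, so reject values that contain a delimiter.
        if FIELD_SEP in value or ITEM_SEP in value or "\n" in value:
            raise ValueError(f"value contains a delimiter: {value!r}")
    fixed = fields[:FIXED_COLUMNS]
    extra = fields[FIXED_COLUMNS:]
    return FIELD_SEP.join(fixed + [ITEM_SEP.join(extra)])


def convert(csv_text):
    """Convert variable-length CSV text into lines for the four-column table."""
    reader = csv.reader(io.StringIO(csv_text))
    return [to_hive_line(row) for row in reader if row]


if __name__ == "__main__":
    sample = "A,x,y,p,q\nB,u,v\n"
    for line in convert(sample):
        print(line)
```

A record with exactly three columns ends up with an empty fourth field; how Hive presents that (NULL versus an empty array) is worth checking on a small sample before loading everything.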