incubator-hcatalog-user mailing list archives

From agateaaa <>
Subject Attach metadata to files in HDFS
Date Wed, 27 Jun 2012 17:20:37 GMT

I am evaluating HCatalog and have a specific use case. I like the fact that
HCatalog gives a consistent interface to the data in HDFS
across different tools like Hive, Pig and MapReduce.

We want to be able to associate metadata with the log files that we are
currently storing on HDFS.

We are pulling in thousands of log files, and since the data in the log
files lacks certain fields, we end up adding those fields to the data
before ingesting it into HDFS for further processing.

I have read through the documentation, mailing lists and articles on
HCatalog that I could find, including [1] and [2] below,
which imply that it is possible to associate metadata with your data using

My questions are

1.) Can I define a schema and associate it with an individual file or a
group of files on HDFS?

2.) Can I change this metadata schema over time and not affect existing

3.) Are these metadata fields available in Pig scripts processing that data,
so that we could filter the data using fields in
the metadata defined for these files?

I have used Hive before, and one possible solution I see is to use
partitions to define the metadata fields, but I was
wondering if there is any other HCatalog way of defining this metadata
that does not involve partitions.
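
To make the partition idea concrete, here is a minimal sketch in plain Python
(it does not call HCatalog, Hive or Pig). It assumes the usual Hive-style
layout in which each partition is a key=value directory, so the metadata for
a file can be read back from its path. The paths, the field names "source"
and "dt", and both helper functions are hypothetical and only for
illustration.

```python
from pathlib import PurePosixPath


def partition_metadata(path):
    """Collect key=value directory components of a path into a dict."""
    meta = {}
    for part in PurePosixPath(path).parts:
        key, sep, value = part.partition("=")
        # Only components of the form key=value count as partition fields.
        if sep and key and value:
            meta[key] = value
    return meta


def filter_files(paths, **wanted):
    """Keep the paths whose partition fields match every wanted key/value."""
    return [
        p for p in paths
        if all(partition_metadata(p).get(k) == v for k, v in wanted.items())
    ]


# Hypothetical log files laid out in Hive-style partition directories.
LOG_FILES = [
    "/logs/source=web01/dt=2012-06-26/part-00000.log",
    "/logs/source=web01/dt=2012-06-27/part-00000.log",
    "/logs/source=app07/dt=2012-06-27/part-00000.log",
]

if __name__ == "__main__":
    for path in filter_files(LOG_FILES, dt="2012-06-27"):
        print(path, partition_metadata(path))
```

In this layout, adding a new metadata field means adding a new directory
level, which is part of why I am asking whether there is an alternative.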

Thanks in advance for your help,


