hadoop-common-user mailing list archives

From Franc Carter <franc.car...@sirca.org.au>
Subject Parts of a file as input
Date Tue, 27 Mar 2012 06:02:36 GMT

I'm very new to Hadoop and am working through how we may be able to apply
it to our data set.

One of the things that I am struggling with is understanding whether it is
possible to tell Hadoop that only parts of the input file will be
needed for a specific job. The reason I believe I may need this is that we
have two big dimensions in our data set. Queries may want only one of these
dimensions, and while some unneeded reading is unavoidable, there are cases
where reading the entire data set presents a very significant overhead.

Or have I just misunderstood something? ;-(



*Franc Carter* | Systems architect | Sirca Ltd

franc.carter@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215
