hadoop-common-dev mailing list archives

From Jason Venner <jason.had...@gmail.com>
Subject Re: What if an XML file cross boundary of HDFS chunks?
Date Sat, 31 Oct 2009 15:18:32 GMT
I use the StreamXmlRecordReader from the streaming contrib package, and it
works very well. Each key becomes the stanza you are looking for.
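
To illustrate how a tag-based record reader can cope with a stanza that
straddles a split boundary, here is a minimal sketch. It is not the
StreamXmlRecordReader source and uses no Hadoop classes: the class name
SplitXmlDemo, the method readRecords, and the sample data are all hypothetical,
and it works on character offsets in a String rather than byte offsets in an
HDFS file. The assumed rule is that a stanza belongs to the split in which its
begin tag starts, and the reader keeps reading past the end of the split until
it reaches the matching end tag.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitXmlDemo {

    /**
     * Returns every record whose begin tag starts inside [splitStart, splitEnd).
     * A record may extend past splitEnd; it is still read in full.
     */
    public static List<String> readRecords(String data, int splitStart, int splitEnd,
                                           String beginTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = splitStart;
        while (true) {
            // Find the next begin tag at or after the current position.
            int begin = data.indexOf(beginTag, pos);
            if (begin < 0 || begin >= splitEnd) {
                break; // no more records, or the next one belongs to a later split
            }
            // Read up to the end tag, even if it lies beyond splitEnd.
            int close = data.indexOf(endTag, begin + beginTag.length());
            if (close < 0) {
                break; // unterminated record: skip it in this sketch
            }
            int recordEnd = close + endTag.length();
            records.add(data.substring(begin, recordEnd));
            pos = recordEnd;
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "<doc><rec>alpha</rec><rec>beta</rec><rec>gamma</rec></doc>";
        int boundary = 28; // falls in the middle of "beta"

        // First split: owns "alpha" and "beta", because both begin tags start before 28.
        System.out.println(readRecords(data, 0, boundary, "<rec>", "</rec>"));
        // Second split: skips the tail of "beta" and owns only "gamma".
        System.out.println(readRecords(data, boundary, data.length(), "<rec>", "</rec>"));
    }
}
```

With this rule every stanza is emitted exactly once, regardless of where the
split boundary falls, so the file can stay splittable.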

On Sat, Oct 31, 2009 at 7:38 AM, Oliver B. Fischer <o.b.fischer@swe-blog.net> wrote:

> Hello Jeff,
>
> Does this mean that there is no programmatic way to define where a
> logical file will be split, independent of how its blocks are
> distributed in HDFS?
>
> Regards
>
> Oliver
>
> Jeff Zhang wrote:
> > Hi Steve,
> >
> > When you want to read XML, you should provide a custom InputFormat that
> > extends FileInputFormat.
> >
> > Override the isSplitable method so that a file is never split, which
> > means one XML file per mapper.
> >
> >
> >   protected boolean isSplitable(FileSystem fs, Path filename) {
> >     return false;
> >   }
>
>
> - --
> Oliver B. Fischer, Schönhauser Allee 64, 10437 Berlin
> Tel. +49 30 44793251, Mobil: +49 178 7903538
> Mail: o.b.fischer@swe-blog.net Blog: http://www.swe-blog.net


-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
