flink-dev mailing list archives

From Kostas Tzoumas <ktzou...@apache.org>
Subject Re: Read XML from HDFS
Date Wed, 15 Jul 2015 11:56:11 GMT
Perhaps there is also an existing HadoopInputFormat for XML that you might
be able to reuse for your purposes (Flink supports Hadoop input formats).

For example, there is an XmlInputFormat in the Apache Mahout codebase that
you could take a look at:
https://github.com/apache/mahout/blob/ad84344e4055b1e6adff5779339a33fa29e1265d/examples/src/main/java/org/apache/mahout/classifier/bayes/XmlInputFormat.java
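
The general idea behind a tag-based XML input format can be sketched in plain Java: scan the byte stream for a start tag, then copy bytes until the matching end tag, and emit each such span as one record. The following is a minimal illustration only, not Mahout's code and not a Flink or Hadoop InputFormat; the class name `XmlRecordScanner` and its methods are hypothetical. It assumes the start tag has no attributes and that records with the same tag name are not nested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
public class XmlRecordScanner {

    /**
     * Consumes bytes until `match` has been seen or the stream ends.
     * If `sink` is non-null, every consumed byte is copied into it.
     * Naive matcher: adequate for tags that start with '<' and contain
     * no further '<'. Returns true if the match was found.
     */
    public static boolean readUntilMatch(InputStream in, byte[] match,
                                         ByteArrayOutputStream sink) throws IOException {
        int matched = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (sink != null) {
                sink.write(b);
            }
            if ((byte) b == match[matched]) {
                matched++;
                if (matched == match.length) {
                    return true;
                }
            } else {
                // Restart; the current byte may itself begin a new match.
                matched = ((byte) b == match[0]) ? 1 : 0;
            }
        }
        return false;
    }

    /** Returns every span from startTag through endTag as one record. */
    public static List<String> records(InputStream in, String startTag,
                                       String endTag) throws IOException {
        byte[] start = startTag.getBytes(StandardCharsets.UTF_8);
        byte[] end = endTag.getBytes(StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        while (readUntilMatch(in, start, null)) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            buf.write(start, 0, start.length);
            if (!readUntilMatch(in, end, buf)) {
                break; // truncated record at end of input: drop it
            }
            out.add(new String(buf.toByteArray(), StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String xml = "<doc><page>a</page>\n<page>b</page></doc>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (String record : records(in, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

A real input format would additionally have to deal with file splits, so that a record straddling a split boundary is read exactly once; this sketch ignores that.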




On Wed, Jul 15, 2015 at 1:37 PM, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi Santosh,
>
> Yes, that is possible if you want to read a complete file without splitting
> it into records. However, you need to implement a custom InputFormat that
> extends Flink's FileInputFormat.
>
> If you want to split it into records, you need a character sequence that
> separates consecutive records. Depending on the schema and format of your
> data, this might not be possible. If you have such a delimiting character
> sequence, you can use Flink's DelimitedInputFormat.
>
> Cheers, Fabian
>
>
> 2015-07-15 12:15 GMT+02:00 santosh_rajaguru <sanit4u@gmail.com>:
>
> > Hi,
> >
> > Is there any way to read a complete XML string or file from HDFS using
> > Flink?
> >
> > Thanks and Regards,
> > Santosh
>
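
The point in the quoted reply about delimiting character sequences can be illustrated without Flink. The sketch below is plain Java and does not use Flink's DelimitedInputFormat; the class `DelimitedRecords` and its `split` method are hypothetical. It splits text at every occurrence of a delimiter and drops the delimiter itself. With one XML record per line, a newline works as the delimiter; with pretty-printed XML, the same delimiter cuts each record into fragments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
public class DelimitedRecords {

    /**
     * Splits `text` at each occurrence of `delimiter`. The delimiter is
     * dropped from the output and empty records are skipped.
     */
    public static List<String> split(String text, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> out = new ArrayList<>();
        int from = 0;
        int at;
        while ((at = text.indexOf(delimiter, from)) >= 0) {
            if (at > from) {
                out.add(text.substring(from, at));
            }
            from = at + delimiter.length();
        }
        if (from < text.length()) {
            out.add(text.substring(from)); // trailing record without delimiter
        }
        return out;
    }

    public static void main(String[] args) {
        // One record per line: "\n" separates records cleanly.
        String oneLine = "<r id=\"1\">a</r>\n<r id=\"2\">b</r>\n";
        System.out.println(split(oneLine, "\n").size()); // prints 2

        // Pretty-printed record: "\n" yields fragments, not records.
        String pretty = "<r>\n  <v>a</v>\n</r>\n";
        System.out.println(split(pretty, "\n").size()); // prints 3
    }
}
```

Using the closing tag as the delimiter (for example `</r>` followed by a newline) keeps a pretty-printed record in one piece, but because the delimiter is dropped, the closing tag would have to be appended again before parsing.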
