Mailing-List: contact dev-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@flink.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAAdrtT04pF1XTCmHfBeDDK0v6DdTDMKFfpSpOPYSps_dLBsUnA@mail.gmail.com>
References: <1436955336018-7023.post@n3.nabble.com>
	<CAAdrtT04pF1XTCmHfBeDDK0v6DdTDMKFfpSpOPYSps_dLBsUnA@mail.gmail.com>
Date: Wed, 15 Jul 2015 13:56:11 +0200
Message-ID: 
 <CAGWx-_vEDRj7rT4GHjtr+YK22gcLWgnjakLfuUHZS71HX5NpEQ@mail.gmail.com>
Subject: Re: Read XML from HDFS
From: Kostas Tzoumas <ktzoumas@apache.org>
To: "dev@flink.apache.org" <dev@flink.apache.org>
Content-Type: multipart/alternative; boundary=001a113596141dda19051ae8a4e8

--001a113596141dda19051ae8a4e8
Content-Type: text/plain; charset=UTF-8

Perhaps there is also an existing HadoopInputFormat for XML that you might
be able to reuse for your purposes (Flink supports Hadoop input formats).

For example, there is an XmlInputFormat in the Apache Mahout codebase that
you could take a look at:
https://github.com/apache/mahout/blob/ad84344e4055b1e6adff5779339a33fa29e1265d/examples/src/main/java/org/apache/mahout/classifier/bayes/XmlInputFormat.java
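
To give a rough idea of what a format like XmlInputFormat does, here is a
minimal, self-contained sketch in plain Java. It is not Mahout's or Flink's
code: the class name XmlRecordScanner and the method extract are made up for
illustration, and it assumes records are bracketed by a literal start tag and
a literal end tag that do not nest. It scans the input for the start tag,
collects characters until the end tag, and emits that span as one record.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Flink or Mahout).
final class XmlRecordScanner {

    /** Returns every span that begins with startTag and ends with endTag. */
    static List<String> extract(Reader in, String startTag, String endTag)
            throws IOException {
        List<String> records = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        boolean inside = false; // true while we are collecting a record
        int c;
        while ((c = in.read()) != -1) {
            buf.append((char) c);
            if (!inside) {
                // Outside a record: keep only a sliding window as long as the start tag.
                if (buf.length() > startTag.length()) {
                    buf.deleteCharAt(0);
                }
                if (startTag.contentEquals(buf)) {
                    inside = true; // buf now holds exactly the start tag
                }
            } else if (buf.length() >= endTag.length()
                    && buf.substring(buf.length() - endTag.length()).equals(endTag)) {
                // End tag reached: emit the record and start looking for the next one.
                records.add(buf.toString());
                buf.setLength(0);
                inside = false;
            }
        }
        return records; // an unterminated trailing record is dropped
    }

    public static void main(String[] args) throws IOException {
        String xml = "<root><page>a</page>junk<page>b</page></root>";
        for (String record : extract(new StringReader(xml), "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

A real input format additionally has to deal with file splits (a record may
start in one split and end in the next) and with byte-level encodings, which
this sketch ignores.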


On Wed, Jul 15, 2015 at 1:37 PM, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi Santosh,
>
> Yes, that is possible if you want to read a complete file without splitting
> it into records. However, you need to implement a custom InputFormat for
> that, one that extends Flink's FileInputFormat.
>
> If you want to split it into records, you need a character sequence that
> delimits two records. Depending on the schema and format of your data this
> might not be possible. If you have such a delimiting character sequence,
> you can use Flink's DelimitedInputFormat.
>
> Cheers, Fabian
>
>
> 2015-07-15 12:15 GMT+02:00 santosh_rajaguru <sanit4u@gmail.com>:
>
> > Hi,
> >
> > Is there any way to read the complete XML string or file from HDFS using
> > Flink?
> >
> > Thanks and Regards,
> > Santosh
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Read-XML-from-HDFS-tp7023.html
> > Sent from the Apache Flink Mailing List archive at Nabble.com.
> >
>

--001a113596141dda19051ae8a4e8--