Subject: Re: Read XML from HDFS
From: Kostas Tzoumas
To: dev@flink.apache.org
Date: Wed, 15 Jul 2015 13:56:11 +0200

Perhaps there is also an existing HadoopInputFormat for XML that you might
be able to reuse for your purposes (Flink supports Hadoop input formats).
For example, there is an XmlInputFormat in the Apache Mahout codebase that
you could take a look at:
https://github.com/apache/mahout/blob/ad84344e4055b1e6adff5779339a33fa29e1265d/examples/src/main/java/org/apache/mahout/classifier/bayes/XmlInputFormat.java

On Wed, Jul 15, 2015 at 1:37 PM, Fabian Hueske wrote:

> Hi Santosh,
>
> yes, that is possible if you want to read a complete file without
> splitting it into records. However, for that you need to implement a
> custom InputFormat that extends Flink's FileInputFormat.
>
> If you want to split it into records, you need a character sequence that
> delimits two records. Depending on the schema and format of your data,
> this might not be possible. If you have such a delimiting character
> sequence, you can use Flink's DelimitedInputFormat.
>
> Cheers, Fabian
>
>
> 2015-07-15 12:15 GMT+02:00 santosh_rajaguru:
>
> > Hi,
> >
> > Is there any way to read the complete XML string or file from HDFS
> > using Flink?
> >
> > Thanks and Regards,
> > Santosh
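To make the record-splitting idea concrete: an XmlInputFormat-style reader treats each record as the text between a start tag and an end tag, which is a two-part delimiter rather than the single character sequence DelimitedInputFormat expects. Below is a minimal, standalone Java sketch of that scan over an in-memory string. It assumes records are not nested and that the opening tag appears exactly as the given start tag (for example <page>, with no attributes). The class and method names (XmlRecordSplitter, splitRecords) are hypothetical and are not part of Flink or Mahout; inside Flink, the same logic would have to live in a custom FileInputFormat or in a Hadoop InputFormat used through Flink's Hadoop compatibility support.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical standalone sketch (not a Flink or Mahout API): extracts every
 * record that begins with startTag and ends with endTag, tags included.
 * Assumes records are not nested and the start tag matches literally.
 */
public class XmlRecordSplitter {

    /** Returns each startTag...endTag span in document order. */
    public static List<String> splitRecords(String xml, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(startTag, from);
            if (start < 0) {
                break; // no further record starts in the input
            }
            int end = xml.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // an unterminated record at the end of the input is dropped
            }
            int stop = end + endTag.length();
            records.add(xml.substring(start, stop));
            from = stop; // resume scanning right after this record
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<pages><page><id>1</id></page><page><id>2</id></page></pages>";
        // Note that "<pages>" does not match the start tag "<page>".
        for (String record : splitRecords(xml, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

Running main prints <page><id>1</id></page> and <page><id>2</id></page> on separate lines. Because the start tag is matched literally, elements carrying attributes or nested elements of the same name would need a more careful reader; this sketch only covers the simple case.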