Date: Fri, 11 Mar 2016 08:01:41 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: dev@apex.incubator.apache.org
Subject: [jira] [Commented] (APEXMALHAR-2008) Create hdfs file input module

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190608#comment-15190608 ]

ASF GitHub Bot commented on APEXMALHAR-2008:
--------------------------------------------

Github user DT-Priyanka commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-malhar/pull/207#discussion_r55799890

    --- Diff: library/src/main/java/com/datatorrent/lib/io/block/BlockReader.java ---
    @@ -0,0 +1,81 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied. See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package com.datatorrent.lib.io.block;
    +
    +import java.io.IOException;
    +import java.net.URI;
    +
    +import org.apache.hadoop.fs.FileSystem;
    +
    +import com.google.common.base.Splitter;
    +
    +/**
    + * BlockReader extends {@link FSSliceReader} to accept case insensitive uri
    + */
    +public class BlockReader extends FSSliceReader
    +{
    +  protected String uri;
    +
    +  @Override
    +  protected FileSystem getFSInstance() throws IOException
    +  {
    +    return FileSystem.newInstance(URI.create(uri), configuration);
    +  }
    +
    +  /**
    +   * Sets the uri
    +   *
    +   * @param uri of form hdfs://hostname:port/path/to/input
    +   */
    +  public void setUri(String uri)
    +  {
    +    this.uri = convertSchemeToLowerCase(uri);
    +  }
    +
    +  public String getUri()
    +  {
    +    return uri;
    +  }
    +
    +  /**
    +   * Converts Scheme part of the URI to lower case. Multiple URI can be comma separated. If no scheme is there, no
    +   * change is made.
    +   *
    +   * @param uri
    +   * @return String with scheme part as lower case
    +   */
    +  private String convertSchemeToLowerCase(String uri)
    +  {
    +    if (uri == null) {
    +      return null;
    +    }
    +    StringBuilder uriList = new StringBuilder();
    +    for (String f : Splitter.on(",").omitEmptyStrings().split(uri)) {
    --- End diff --

    We expect users to configure input file(s)/directories separated by ",". Should we parse it into a list in the set method? Does that have any advantage?

> Create hdfs file input module
> ------------------------------
>
>                 Key: APEXMALHAR-2008
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2008
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: Priyanka Gugale
>            Assignee: Priyanka Gugale
>            Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> To read HDFS files in parallel using Apex, we normally use the FileSplitter and FileReader modules. It would be a good idea to combine those operators as a unit in a module. Having a module will give us a readily usable set of operators to read HDFS files.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
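
For reference on the question raised in the review comment, here is a minimal sketch of the two options side by side: keeping the property as one comma-separated string versus parsing it into a list. It is not the method body from the pull request, which is truncated in the diff above. The class name `UriSchemeSketch` and the helpers `lowerCaseScheme` and `parseUris` are hypothetical, and plain `String.split` stands in for Guava's `Splitter` so the example has no dependencies. The scheme test (a `:` that appears before any `/`) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; not the BlockReader implementation from the PR.
class UriSchemeSketch
{
  // Lower-cases the scheme of a single URI. A string with no ':' before
  // its first '/' is treated as scheme-less and returned unchanged.
  static String lowerCaseScheme(String single)
  {
    int colon = single.indexOf(':');
    int slash = single.indexOf('/');
    boolean hasScheme = colon > 0 && (slash < 0 || colon < slash);
    if (!hasScheme) {
      return single;
    }
    return single.substring(0, colon).toLowerCase(Locale.ROOT) + single.substring(colon);
  }

  // Option A: parse once into a list (empty entries are skipped,
  // mirroring omitEmptyStrings() in the diff).
  static List<String> parseUris(String uris)
  {
    List<String> parsed = new ArrayList<>();
    if (uris == null) {
      return parsed;
    }
    for (String f : uris.split(",")) {
      if (!f.isEmpty()) {
        parsed.add(lowerCaseScheme(f));
      }
    }
    return parsed;
  }

  // Option B: keep a single comma-separated string, as the setter in the diff does.
  static String convertSchemeToLowerCase(String uris)
  {
    return uris == null ? null : String.join(",", parseUris(uris));
  }

  public static void main(String[] args)
  {
    // prints "hdfs://host:8020/in,/local/path,file:///tmp/x"
    System.out.println(convertSchemeToLowerCase("HDFS://host:8020/in,/local/path,File:///tmp/x"));
  }
}
```

Parsing into a list would spare downstream code from splitting the string again, while the string form keeps the property a single plain value; which one suits the module is the open question in the comment.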