Subject: Re: SQLContext load. Filtering files
From: Masf
To: Akhil Das
Cc: user@spark.apache.org
Date: Thu, 27 Aug 2015 12:51:35 +0200

Thanks Akhil, I will have a look.

I have a doubt regarding Spark Streaming and fileStream: if Spark Streaming
crashes and new files are created in the input folder while it is down, how
can I process those files when Spark Streaming is launched again?

Thanks.
Regards.
Miguel.

On Thu, Aug 27, 2015 at 12:29 PM, Akhil Das wrote:

> Have a look at Spark Streaming. You can make use of ssc.fileStream.
>
> Eg:
>
> val avroStream = ssc.fileStream[AvroKey[GenericRecord], NullWritable,
>   AvroKeyInputFormat[GenericRecord]](input)
>
> You can also specify a filter function as the second argument.
>
> Thanks
> Best Regards
>
> On Wed, Aug 19, 2015 at 10:46 PM, Masf wrote:
>
>> Hi.
>>
>> I'd like to read Avro files using this library:
>> https://github.com/databricks/spark-avro
>>
>> I need to load several files from a folder, not all of them. Is there
>> some functionality to filter the files to load?
>>
>> And... is it possible to know the names of the files loaded from a
>> folder?
>>
>> My problem is that I have a folder where an external process is inserting
>> files every X minutes; I need to process these files only once, and I
>> can't move, rename or copy the source files.
>>
>> Thanks
>> --
>> Regards
>> Miguel Ángel

--
Regards,
Miguel Ángel
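[Editor's note] A minimal sketch of the fileStream filter-function usage Akhil
describes above, assuming Spark Streaming 1.x and the Avro MapReduce input
format. The directory "/data/incoming", the 60-second batch interval, and the
".avro"-suffix filter are illustrative, not taken from the thread:

    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.NullWritable
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object AvroFileStreamSketch {
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf().setAppName("AvroFileStreamSketch")
        val ssc = new StreamingContext(sparkConf, Seconds(60))

        // Illustrative filter: only pick up files whose names end in ".avro",
        // skipping anything else the external process may drop in the folder.
        val onlyAvro = (path: Path) => path.getName.endsWith(".avro")

        // The filter is the second argument; newFilesOnly = true restricts the
        // stream to files that appear after the streaming context starts.
        val avroStream = ssc.fileStream[AvroKey[GenericRecord], NullWritable,
          AvroKeyInputFormat[GenericRecord]]("/data/incoming", onlyAvro, newFilesOnly = true)

        // Each record arrives as (AvroKey[GenericRecord], NullWritable).
        avroStream.map { case (key, _) => key.datum().toString }.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

Setting newFilesOnly to false is one way fileStream can also pick up files
already present in the directory when the context starts, though how far back
it looks is bounded by the stream's remember window, so it is not by itself a
complete answer to the crash-recovery question asked above.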