From dev-return-4856-archive-asf-public=cust-asf.ponee.io@mxnet.incubator.apache.org  Tue Nov 20 20:21:00 2018
Mailing-List: contact dev-help@mxnet.incubator.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:dev-help@mxnet.incubator.apache.org>
List-Unsubscribe: <mailto:dev-unsubscribe@mxnet.incubator.apache.org>
List-Post: <mailto:dev@mxnet.incubator.apache.org>
List-Id: <dev.mxnet.incubator.apache.org>
Reply-To: dev@mxnet.incubator.apache.org
Delivered-To: mailing list dev@mxnet.incubator.apache.org
MIME-Version: 1.0
References: <CACybZXJd1t6S_dHtSM8wp31ihNfemZ2bGZDJJWhDCBTfKUnUVw@mail.gmail.com>
 <CABC2O9qXjJoxi_UvLRenAxByWKAB1gE6hyCDCDXHYGwqhLRoaQ@mail.gmail.com>
 <CAMwxjUb2NcrVnd4yQChtcs=wDqGZVVuYH1jiD90r8i8VpkAQMw@mail.gmail.com> <CACybZXL-US_jWHXFJ3-oyhXGh=r5OT+xb=7i6CyYDSVZyGPAYg@mail.gmail.com>
In-Reply-To: <CACybZXL-US_jWHXFJ3-oyhXGh=r5OT+xb=7i6CyYDSVZyGPAYg@mail.gmail.com>
From: Gaurav Gireesh <gaurav.gireesh@gmail.com>
Date: Tue, 20 Nov 2018 11:20:18 -0800
Message-ID: <CACybZXJWmxHZJfYz5MfGuh2cYoenbc8+aL160mdiV6hkJkOjAg@mail.gmail.com>
Subject: Re: MXNet - Gluon - Audio
To: dev@mxnet.incubator.apache.org
Content-Type: multipart/alternative; boundary="0000000000004d049f057b1d8891"

--0000000000004d049f057b1d8891
Content-Type: text/plain; charset="UTF-8"

Hi All!
Following up on this PR:
https://github.com/apache/incubator-mxnet/pull/13241
I would appreciate comments or feedback on the API design:
https://cwiki.apache.org/confluence/display/MXNET/Gluon+-+Audio

The comments on the PR were mostly about *librosa* and whether its performance
would become a blocker if and when the designed API is tested with bigger ASR
models such as DeepSpeech 2 and DeepSpeech 3.
I would appreciate it if the community could share its expertise on how
audio data is currently loaded and how features are extracted for bigger ASR
models.
If anything in the design can be changed to improve performance, I'll be
happy to look into it.
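
To make the question concrete, here is a minimal sketch of the
load-then-extract flow, using only the Python standard library rather than
librosa or MXNet. It is not the code in the PR: the function names
(write_tone, load_wav, frame_rms) are hypothetical, the input is a synthetic
tone held in memory, and per-frame RMS energy merely stands in for richer
features.

```python
import io
import math
import struct
import wave


def write_tone(buf, freq_hz=440.0, seconds=0.5, rate=16000):
    """Write a mono 16-bit PCM sine tone into a file-like object (test input only)."""
    n = int(seconds * rate)
    samples = [
        int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n)
    ]
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % n, *samples))


def load_wav(buf):
    """Read mono 16-bit PCM and return (samples scaled to [-1, 1), sample rate)."""
    with wave.open(buf, "rb") as r:
        assert r.getnchannels() == 1 and r.getsampwidth() == 2
        rate = r.getframerate()
        n = r.getnframes()
        raw = r.readframes(n)
    samples = struct.unpack("<%dh" % n, raw)
    return [s / 32768.0 for s in samples], rate


def frame_rms(samples, frame_len=400, hop=160):
    """Slide a window over the signal and compute one RMS value per frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        feats.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return feats


# Round trip: synthesize a tone, load it back, extract per-frame features.
buf = io.BytesIO()
write_tone(buf)
buf.seek(0)
samples, rate = load_wav(buf)
features = frame_rms(samples)
```

The pure-Python loop in frame_rms is slow on long recordings. That per-sample
cost in the loading and feature-extraction path is the kind of overhead I
would like feedback on.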

Thanks and regards,
Gaurav Gireesh

On Thu, Nov 15, 2018 at 10:47 AM Gaurav Gireesh <gaurav.gireesh@gmail.com>
wrote:

> Hi Lai!
> Thank you for your comments!
> Below are the answers to your comments/queries:
> 1) That's a good suggestion. However, I have already added an example to
> the pull request:
> https://github.com/apache/incubator-mxnet/pull/13241/commits/eabb68256d8fd603a0075eafcd8947d92e7df27f
> .
> I would be happy to include a dataset similar to MNIST to support that. I
> have come across an example dataset used in a TensorFlow speech-related
> example here
> <https://www.tensorflow.org/tutorials/sequences/audio_recognition>. This
> could be included.
>
> 2) Thank you for the suggestion; I shall look into the FFT operator that
> you pointed out. However, other kinds of features, such as MFCC and mel
> features, are popular in audio feature extraction and would be useful if
> implemented. I am not sure whether we have operators for these.
>
> 3) The references look good too. I shall look into them. Thank you for
> bringing them to my notice.
>
> Regards,
> Gaurav
>
> On Tue, Nov 13, 2018 at 11:22 AM Lai Wei <royweilai@gmail.com> wrote:
>
>> Hi Gaurav,
>>
>> Thanks for starting this. I see the PR is out
>> <https://github.com/apache/incubator-mxnet/pull/13241> and have left some
>> initial review comments. Good work!
>>
>> In addition to Sandeep's queries, I have the following:
>> 1. Can we include a simple, classic audio dataset for users to directly
>> import and try out, like MNIST in vision? (e.g.:
>> http://pytorch.org/audio/datasets.html#yesno)
>> 2. Librosa provides some good audio feature extraction, so we can use it
>> for now. But it's slow, as you have to convert between ndarray and numpy.
>> In the long term, can we make the transforms use mxnet operators and change
>> your transforms to hybrid blocks? For example, the mxnet FFT operator
>> <https://mxnet.apache.org/api/python/ndarray/contrib.html?highlight=fft#mxnet.ndarray.contrib.fft>
>> can be used in a hybrid block transformer, which will be a lot faster.
>>
>> Some additional references on users already using mxnet for audio; we
>> should aim to make it easier and automate the file
>> load/preprocess/transform process.
>> 1. https://github.com/chen0040/mxnet-audio
>> 2. https://github.com/shuokay/mxnet-wavenet
>>
>> Looking forward to seeing this feature out.
>> Thanks!
>>
>> Best Regards
>>
>> Lai
>>
>>
>> On Tue, Nov 13, 2018 at 9:09 AM sandeep krishnamurthy <
>> sandeep.krishna98@gmail.com> wrote:
>>
>> > Thanks, Gaurav, for starting this initiative. The design document is
>> > detailed and gives all the information.
>> > Starting with this in "Contrib" is a good idea, since we expect a few
>> > rough edges and cleanups to follow.
>> >
>> > I had the following queries:
>> > 1. Is there any analysis comparing LibROSA with other libraries w.r.t.
>> > features, performance, and community usage in the audio data domain?
>> > 2. What is the recommendation for the LibROSA dependency? Make it part
>> > of the MXNet PyPI package, or ask the user to install it if required? I
>> > prefer the latter, similar to protobuf in ONNX-MXNet.
>> > 3. I see LibROSA is a fully Python-based library. Will this dependency
>> > block us in future use cases when we want to turn transformations into
>> > operators and allow for cross-language support?
>> > 4. In the performance design considerations, the performance difference
>> > between lazy=True and lazy=False is too scary (8 minutes to 4 hours!!).
>> > This requires some more analysis. If we know that turning a flag off/on
>> > causes a 24X performance degradation, should we provide that control to
>> > the user? What is the impact of this on memory usage?
>> > 5. I see LibROSA has an ISC license (
>> > https://github.com/librosa/librosa/blob/master/LICENSE.md), which says it
>> > is free to use with the same license notification. I am not sure if this
>> > is OK. I request other committers/mentors to advise.
>> >
>> > Best,
>> > Sandeep
>> >
>> > On Fri, Nov 9, 2018 at 5:45 PM Gaurav Gireesh <gaurav.gireesh@gmail.com>
>> > wrote:
>> >
>> > > Dear MXNet Community,
>> > >
>> > > I recently started looking into performing some simple multi-class
>> > > sound classification tasks with audio data and realized that, as a
>> > > user, I would like MXNet to have an out-of-the-box feature which
>> > > allows us to load audio data (at least 1 file format), extract
>> > > features (or apply some common transforms/feature extraction) and
>> > > train a model using the audio dataset.
>> > > This could be a first step towards building and supporting APIs
>> > > similar to what we have for "vision" related use cases in MXNet.
>> > >
>> > > Below is the design proposal :
>> > >
>> > > Gluon - Audio Design Proposal
>> > > <https://cwiki.apache.org/confluence/display/MXNET/Gluon+-+Audio>
>> > >
>> > > I would highly appreciate your taking the time to review and provide
>> > > feedback, comments, and suggestions on this.
>> > > Looking forward to your support.
>> > >
>> > >
>> > > Best Regards,
>> > >
>> > > Gaurav Gireesh
>> > >
>> >
>> >
>> > --
>> > Sandeep Krishnamurthy
>> >
>>
>
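
On the lazy=True / False question quoted above, here is a rough sketch of how
the two modes differ in how often the transform runs. It assumes lazy=True
means "apply the transform on each access" and lazy=False means "apply it to
every sample once, up front". ToyAudioDataset and count_transform_calls are
hypothetical names, not part of the proposed API, and the sketch does not
reproduce the timings reported in the design document.

```python
class ToyAudioDataset:
    """Hypothetical dataset whose `lazy` flag is modeled on the one in the thread."""

    def __init__(self, items, transform, lazy=True):
        self._transform = transform
        self._lazy = lazy
        # Eager mode pays the whole transform cost once, up front, and keeps
        # every transformed item in memory; lazy mode stores the raw items.
        self._items = list(items) if lazy else [transform(x) for x in items]

    def __len__(self):
        return len(self._items)

    def __getitem__(self, idx):
        item = self._items[idx]
        # Lazy mode re-runs the transform on every single access.
        return self._transform(item) if self._lazy else item


def count_transform_calls(lazy, epochs=3, n_items=4):
    """Iterate the dataset for several epochs and count transform invocations."""
    calls = 0

    def transform(x):
        nonlocal calls
        calls += 1
        return x * 2  # stand-in for an expensive feature extraction

    dataset = ToyAudioDataset(range(n_items), transform, lazy=lazy)
    for _ in range(epochs):
        for i in range(len(dataset)):
            dataset[i]
    return calls


lazy_calls = count_transform_calls(lazy=True)    # once per access, every epoch
eager_calls = count_transform_calls(lazy=False)  # once per item, at construction
```

With an expensive transform, the number of invocations grows with the number
of epochs in lazy mode but stays fixed in eager mode, at the cost of holding
all transformed samples in memory.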

--0000000000004d049f057b1d8891--