From: Gaurav Gireesh
Date: Tue, 20 Nov 2018 11:20:18 -0800
Subject: Re: MXNet - Gluon - Audio
To: dev@mxnet.incubator.apache.org

Hi All!
Following up on this PR: https://github.com/apache/incubator-mxnet/pull/13241

I would like some comments or feedback regarding the API design:
https://cwiki.apache.org/confluence/display/MXNET/Gluon+-+Audio

The comments on the PR were mostly about *librosa* and its performance
becoming a blocker if and when the designed API is tested with bigger ASR
models such as DeepSpeech 2 and DeepSpeech 3.
I would appreciate it if the community could share their expertise/knowledge
on how audio data loading and feature extraction are currently done for
bigger ASR models. If anything in the design can be changed/improved to
improve performance, I'll be happy to look into it.

Thanks and regards,
Gaurav Gireesh

On Thu, Nov 15, 2018 at 10:47 AM Gaurav Gireesh wrote:

> Hi Lai!
> Thank you for your comments!
> Below are the answers to your comments/queries:
>
> 1) That's a good suggestion. I have added an example in the pull request
> related to this:
> https://github.com/apache/incubator-mxnet/pull/13241/commits/eabb68256d8fd603a0075eafcd8947d92e7df27f
> I would be happy to include a dataset similar to MNIST to support that. I
> have come across an example dataset used in a TensorFlow speech-related
> example here. This could be included.
>
> 2) Thank you for the suggestion, I shall look into the FFT operator that
> you have pointed out. However, there are other kinds of features, like
> MFCC, mel spectrograms and so on, which are popular in audio feature
> extraction and will find utility if implemented. I am not sure if we have
> operators for these.
>
> 3) The references look good too. I shall look into them. Thank you for
> bringing them to my notice.
>
> Regards,
> Gaurav
>
> On Tue, Nov 13, 2018 at 11:22 AM Lai Wei wrote:
>
>> Hi Gaurav,
>>
>> Thanks for starting this. I see the PR is out and I have left some
>> initial reviews. Good work!
>>
>> In addition to Sandeep's queries, I have the following:
>> 1. Can we include a simple classic audio dataset that users can directly
>> import and try out, like MNIST in vision? (e.g.:
>> http://pytorch.org/audio/datasets.html#yesno)
>> 2. Librosa provides some good audio feature extraction, and we can use it
>> for now. But it's slow because you have to convert between NDArray and
>> NumPy.
>> In the long term, can we make the transforms use MXNet operators and
>> change your transforms to hybrid blocks? For example, the MXNet FFT
>> <https://mxnet.apache.org/api/python/ndarray/contrib.html?highlight=fft#mxnet.ndarray.contrib.fft>
>> operator can be used in a hybrid block transformer, which will be a lot
>> faster.
>>
>> Some additional references on users already using MXNet for audio; we
>> should aim to make this easier and automate the file
>> load/preprocess/transform process:
>> 1. https://github.com/chen0040/mxnet-audio
>> 2. https://github.com/shuokay/mxnet-wavenet
>>
>> Looking forward to seeing this feature out.
>> Thanks!
>>
>> Best Regards
>>
>> Lai
>>
>>
>> On Tue, Nov 13, 2018 at 9:09 AM sandeep krishnamurthy <
>> sandeep.krishna98@gmail.com> wrote:
>>
>> > Thanks, Gaurav, for starting this initiative. The design document is
>> > detailed and gives all the information.
>> > Starting by adding this to "Contrib" is a good idea, since we expect a
>> > few rough edges and cleanups to follow.
>> >
>> > I had the following queries:
>> > 1. Is there any analysis comparing LibROSA with other libraries w.r.t.
>> > features, performance, and community usage in the audio data domain?
>> > 2. What is the recommendation for the LibROSA dependency? Make it part
>> > of the MXNet PyPI package, or ask the user to install it if required?
>> > I prefer the latter, similar to protobuf in ONNX-MXNet.
>> > 3. I see LibROSA is a fully Python-based library. Will this dependency
>> > block us in future use cases, when we want to turn transformations into
>> > operators and allow for cross-language support?
>> > 4. In the performance design considerations, the performance difference
>> > between lazy=True and lazy=False is too scary (8 minutes vs. 4 hours!!).
>> > This requires some more analysis. If we know that flipping a flag causes
>> > a roughly 30X performance degradation, should we expose that control to
>> > the user at all? What is the impact of this on memory usage?
>> > 5. I see LibROSA uses the ISC license
>> > (https://github.com/librosa/librosa/blob/master/LICENSE.md), which says
>> > it is free to use provided the same license notice is kept. I am not
>> > sure if this is OK. I request other committers/mentors to weigh in.
>> >
>> > Best,
>> > Sandeep
>> >
>> > On Fri, Nov 9, 2018 at 5:45 PM Gaurav Gireesh wrote:
>> >
>> > > Dear MXNet Community,
>> > >
>> > > I recently started looking into performing some simple multi-class
>> > > sound classification tasks with audio data and realized that, as a
>> > > user, I would like MXNet to have an out-of-the-box feature that allows
>> > > us to load audio data (in at least one file format), extract features
>> > > (or apply some common transforms/feature extraction) and train a model
>> > > on the audio dataset.
>> > > This could be a first step towards building and supporting APIs
>> > > similar to what we have for "vision" use cases in MXNet.
>> > >
>> > > Below is the design proposal:
>> > >
>> > > Gluon - Audio Design Proposal
>> > >
>> > > I would highly appreciate your taking the time to review it and
>> > > provide feedback and comments/suggestions.
>> > > Looking forward to your support.
>> > >
>> > > Best Regards,
>> > >
>> > > Gaurav Gireesh
>> >
>> > --
>> > Sandeep Krishnamurthy
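The original proposal asks for loading audio data in at least one file format. As a minimal, dependency-free sketch (an illustration only, not the API in PR 13241), 16-bit PCM WAV can be read with Python's standard wave module. The function name load_wav_pcm16 is hypothetical, and down-mixing multi-channel audio to mono by averaging is an assumption of this sketch.

```python
import array
import io
import struct
import sys
import wave


def load_wav_pcm16(source):
    """Read 16-bit PCM WAV data into (samples, sample_rate).

    source: a file path or a binary file-like object (hypothetical helper).
    Channels are averaged to mono; samples are scaled to [-1.0, 1.0).
    """
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        n_channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":  # WAV sample data is stored little-endian
        pcm.byteswap()

    # Frames are interleaved (c0, c1, ..., c0, c1, ...); average each frame.
    scale = n_channels * 32768.0
    samples = [sum(pcm[i:i + n_channels]) / scale
               for i in range(0, len(pcm), n_channels)]
    return samples, sample_rate


# Usage: write a tiny mono WAV into memory, then read it back.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(struct.pack("<4h", 0, 16384, -16384, 32767))
buf.seek(0)
samples, sr = load_wav_pcm16(buf)
print(sr, samples[:3])  # 8000 [0.0, 0.5, -0.5]
```

Only the standard library is used here, so this path adds no dependency; richer formats (MP3, FLAC) would still need a decoder such as the one librosa relies on.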
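Lai's point 2 suggests moving transforms onto MXNet operators, such as the contrib FFT inside a hybrid block. To make concrete what such a transform computes, here is a pure-Python sketch of a magnitude spectrogram: framing, a periodic Hann window, and a recursive radix-2 Cooley-Tukey FFT. It is not MXNet code; in the proposed design the FFT step is the part that would be delegated to an operator. The function names and the frame_len=256 / hop=128 defaults are illustrative assumptions, and frame_len must be a power of two for this FFT.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrogram(samples, frame_len=256, hop=128):
    """Hann-windowed frames -> |FFT| of bins 0..frame_len // 2 per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        spectrum = fft([s * w for s, w in zip(chunk, window)])
        # Real input: bins above frame_len // 2 mirror the lower ones.
        frames.append([abs(c) for c in spectrum[:frame_len // 2 + 1]])
    return frames


# Usage: a tone sitting exactly on FFT bin 16 of a 256-sample frame.
tone = [math.cos(2 * math.pi * 16 * n / 256) for n in range(1024)]
spec = magnitude_spectrogram(tone)
print(len(spec), len(spec[0]))  # 7 129
```

The per-frame loop is exactly what makes a NumPy/librosa round trip costly at scale; batching all frames into one operator call is the motivation behind the hybrid-block suggestion.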
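Gaurav's reply mentions MFCC and mel features, for which MXNet may not yet have operators. The sketch below follows the usual pipeline from a power-spectrum frame: triangular mel filter bank, log, then a DCT-II. It assumes the HTK-style mel formula m = 2595 * log10(1 + f / 700); librosa's hz_to_mel defaults to the Slaney variant unless htk=True, so values will not match librosa exactly. No normalisation or liftering is applied, and all function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """n_mels + 2 frequencies in Hz, equally spaced on the mel scale."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_mels + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_mels + 2)]


def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters: n_mels rows of weights over n_fft // 2 + 1 bins."""
    f_max = sample_rate / 2.0 if f_max is None else f_max
    edges = mel_band_edges(n_mels, f_min, f_max)
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for i in range(n_mels):
        lo, centre, hi = edges[i], edges[i + 1], edges[i + 2]
        row = []
        for f in bin_freqs:
            if lo <= f <= centre:
                row.append((f - lo) / (centre - lo))  # rising edge
            elif centre < f <= hi:
                row.append((hi - f) / (hi - centre))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_from_power(power_frame, bank, n_coeffs, eps=1e-10):
    """Mel energies -> log -> unnormalised DCT-II; keep the first n_coeffs."""
    log_mel = [math.log(sum(w * p for w, p in zip(row, power_frame)) + eps)
               for row in bank]
    n = len(log_mel)
    return [sum(v * math.cos(math.pi * k * (j + 0.5) / n)
                for j, v in enumerate(log_mel))
            for k in range(n_coeffs)]


# Usage with a flat toy power spectrum (a real one would be |FFT|^2 of a frame).
bank = mel_filterbank(n_mels=40, n_fft=512, sample_rate=16000)
power = [1.0] * (512 // 2 + 1)
coeffs = mfcc_from_power(power, bank, n_coeffs=13)
print(len(bank), len(bank[0]), len(coeffs))  # 40 257 13
```

Every step here is a matrix product, an elementwise log, or a fixed cosine transform, which suggests these features could in principle be expressed with existing tensor operators rather than a Python library.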
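Sandeep's point 2 prefers asking users to install LibROSA only when they need it, as with protobuf in ONNX-MXNet. One common way to do that is to import the package lazily at the point of use and raise an actionable error if it is missing. This is a sketch; require_optional and load_audio are hypothetical names, and librosa.load is only reached when librosa is actually installed.

```python
import importlib


def require_optional(module_name, pip_name=None, feature="This feature"):
    """Import an optional dependency at the point of use.

    Raises ImportError with an install hint instead of failing when the
    core package is imported, so the core package does not depend on it.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        hint = pip_name or module_name
        raise ImportError(
            f"{feature} requires the optional package '{hint}'. "
            f"Install it with: pip install {hint}"
        ) from err


def load_audio(path):
    """Hypothetical loader: librosa is imported only when this is called."""
    librosa = require_optional("librosa", feature="Audio loading")
    return librosa.load(path)  # librosa.load returns (samples, sample_rate)
```

Keeping the import inside the function means users who never touch audio never pay for, or need, the dependency.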
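On the lazy=True / False question (Sandeep's point 4): in Gluon, Dataset.transform(fn, lazy=True) applies fn each time a sample is accessed, while lazy=False applies fn to every sample once, up front, and keeps the results. Assuming the flag in the design has the same meaning, the toy sketch below (a hypothetical ToyDataset class, not MXNet's implementation) counts feature-extraction calls over three epochs to make the trade-off concrete. The lazy variant recomputes features on every epoch; the eager variant computes them once but must hold every transformed sample in memory, which is where the memory-usage question comes in. It does not reproduce the 8-minute vs. 4-hour measurements from the design document.

```python
class ToyDataset:
    """Hypothetical Gluon-style dataset, for illustration only (not MXNet)."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, idx):
        return self._items[idx]

    def transform(self, fn, lazy=True):
        if lazy:
            # Lazy: wrap the dataset; fn runs again on every access.
            return _LazyTransformed(self, fn)
        # Eager: run fn once per sample now and keep every result in memory.
        return ToyDataset(fn(self[i]) for i in range(len(self)))


class _LazyTransformed(ToyDataset):
    """Applies fn on each __getitem__; nothing is cached."""

    def __init__(self, base, fn):
        self._base = base
        self._fn = fn

    def __len__(self):
        return len(self._base)

    def __getitem__(self, idx):
        return self._fn(self._base[idx])


calls = {"count": 0}


def expensive_feature(x):
    """Stand-in for a costly feature extractor; counts its invocations."""
    calls["count"] += 1
    return x * x


base = ToyDataset(range(100))
results = {}
for lazy in (True, False):
    calls["count"] = 0
    ds = base.transform(expensive_feature, lazy=lazy)
    for epoch in range(3):
        for i in range(len(ds)):
            _ = ds[i]
    results[lazy] = calls["count"]

print(results)  # {True: 300, False: 100}
```

Note that the Gluon docstring also warns that a stochastic fn (for example, random augmentation) needs lazy=True, otherwise every epoch sees the same result; that is one reason to keep the flag user-visible despite the cost difference.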