From: Yaron Gonen
To: user@hadoop.apache.org
Date: Tue, 11 Sep 2012 17:25:25 +0300
Subject: Re: Some general questions about DBInputFormat
In-Reply-To: <504F35EF.8050702@amd.com>
References: <504F35EF.8050702@amd.com>

Thanks for the fast response.
Nick, regarding locking a table: as far as I understood from the code, each
mapper opens its own connection to the DB. I didn't see any code in which the
job creates a transaction and passes it to the mappers. Did I miss something?
Again, thanks!

On Tue, Sep 11, 2012 at 4:00 PM, Nick Jones <nick.jones@amd.com> wrote:

> Hi Yaron
>
> Replies inline below.
>
> On 09/11/2012 07:41 AM, Yaron Gonen wrote:
>
>> Hi,
>> After reviewing the class's (not very complicated) code, I have some
>> questions I hope someone can answer:
>>
>>  * (more general question) Are there many use-cases for using
>>    DBInputFormat? Do most Hadoop jobs take their input from files or DBs?
>
> Bejoy's right, most jobs utilize data across HDFS or some other
> distributed architecture to feed M/R at a sufficient rate. DBInputFormat
> could be helpful in pulling pointers to other sources of data (e.g. file
> paths for filers where actual binary content is stored).
>
>>  * What happens when the database is updated during the mappers' data
>>    retrieval phase? Is there a way to lock the database before the
>>    data retrieval phase and release it afterwards?
>
> The whole job creates a transaction against the RDBMS that ensures
> consistent state throughout the job. Depending on the source and settings,
> this might entirely lock a table or lock the rows selected by the query.
>
>>  * Since all mappers open a connection to the same DBS, one cannot
>>    use hundreds of mappers. Is there a solution to this problem?
>
> Depends on the connection limits and the number of rows requested. I've
> found that the server suffered other problems before it hit connection
> count limitations.
>
>> Thanks,
>> Yaron
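
To make the "each mapper opens its own connection" point concrete, here is a minimal sketch of how a row-count-based input format could hand each mapper its own row range and its own paged query. It assumes the table is divided evenly by row count and read with LIMIT/OFFSET, which is roughly how the stock DBInputFormat is understood to behave; that has not been checked against a specific Hadoop release. The names `SplitSketch`, `RowRange`, `chunk` and `queryFor`, and the `employees` table, are hypothetical and are not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    /** Half-open row range [start, end) assigned to a single mapper (hypothetical type). */
    public static final class RowRange {
        public final long start;
        public final long end;

        public RowRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        public long length() {
            return end - start;
        }
    }

    /**
     * Divides rowCount rows into one range per mapper.
     * The last range absorbs the remainder of the integer division.
     */
    public static List<RowRange> chunk(long rowCount, int mappers) {
        if (rowCount < 0 || mappers < 1) {
            throw new IllegalArgumentException("rowCount must be >= 0 and mappers >= 1");
        }
        List<RowRange> ranges = new ArrayList<>();
        long chunkSize = rowCount / mappers;
        for (int i = 0; i < mappers; i++) {
            long start = i * chunkSize;
            long end = (i + 1 == mappers) ? rowCount : start + chunkSize;
            ranges.add(new RowRange(start, end));
        }
        return ranges;
    }

    /** Builds the paged query one mapper would run over its own connection. */
    public static String queryFor(String table, String orderBy, RowRange range) {
        return "SELECT * FROM " + table
                + " ORDER BY " + orderBy
                + " LIMIT " + range.length()
                + " OFFSET " + range.start;
    }

    public static void main(String[] args) {
        // Ten rows shared by three mappers: three independent queries.
        for (RowRange range : chunk(10, 3)) {
            System.out.println(queryFor("employees", "id", range));
        }
    }
}
```

Under these assumptions every range is fetched by a separate query at a separate moment, so a row inserted or deleted between two of those queries can shift the offsets and cause rows to be skipped or read twice unless the database itself provides a consistent snapshot. The number of ranges also equals the number of simultaneous connections, so lowering the mapper count is the direct way to stay under a connection limit.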