From: John Lilley <john.lilley@redpoint.net>
To: user@hadoop.apache.org
Subject: RE: Query mongodb
Date: Wed, 16 Jan 2013 14:56:59 +0000

Um, I think you and I are talking about the same thing, but maybe not?

Certainly HBase/MongoDB are HDFS-aware, so I would expect that if I am a client program running outside of the Hadoop cluster and I do a query, the database tools will construct query processing such that data is read and processed in an optimal fashion (using MapReduce?) before the aggregated information is shipped to me on the client side.

The question I was asking is a little different, although hopefully the answer is just as simple. Can I write a mapper/reducer that queries HBase/MongoDB and have MR schedule my mappers such that each mapper receives tuples that have been read in a locality-aware fashion?

john

From: Mohammad Tariq [mailto:dontariq@gmail.com]
Sent: Wednesday, January 16, 2013 7:47 AM
To: user@hadoop.apache.org
Subject: Re: Query mongodb

The MapReduce framework tries its best to run jobs on the nodes where the data is located; that is its fundamental nature. You don't have to do anything extra.

*I am sorry if I misunderstood the question.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
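For the HBase half of the question above, locality comes from the stock TableInputFormat: it creates one input split per table region and reports the hosting region server as the split's preferred location, so the scheduler can run each mapper next to the region it scans (when task trackers run on the region server nodes). A minimal sketch, assuming the HBase MapReduce API of that era; the table name "mytable" and the row-count logic are placeholders:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class HBaseLocalityScan {

    // TableInputFormat hands each mapper the rows of a single region, and
    // each split names the region's server as its preferred location.
    static class RowCountMapper extends TableMapper<Text, LongWritable> {
        private static final Text ROWS = new Text("rows");
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(ROWS, ONE);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-locality-scan");
        job.setJarByClass(HBaseLocalityScan.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // batch rows per RPC inside each mapper
        scan.setCacheBlocks(false);  // full scans shouldn't churn the block cache

        // One map task is created per region of "mytable" (placeholder name).
        TableMapReduceUtil.initTableMapperJob(
                "mytable", scan, RowCountMapper.class,
                Text.class, LongWritable.class, job);

        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}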
On Wed, Jan 16, 2013 at 8:10 PM, John Lilley <john.lilley@redpoint.net> wrote:

How does one schedule mappers to read MongoDB or HBase in a data-locality-aware fashion?

-john

From: Mohammad Tariq [mailto:dontariq@gmail.com]
Sent: Wednesday, January 16, 2013 3:29 AM
To: user@hadoop.apache.org
Subject: Re: Query mongodb

Yes. You can use the MongoDB-Hadoop adapter to achieve that. Through this adapter you can pull the data, process it, and push it back to your MongoDB-backed datastore by writing MR jobs.

It is also 100% possible to query HBase or JSON files, or anything else for that matter, stored in HDFS.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Wed, Jan 16, 2013 at 3:50 PM, Panshul Whisper <ouchwhisper@gmail.com> wrote:

Hello,

Is it possible, and if so how, to query MongoDB directly from Hadoop? Or is it possible to query HBase or JSON files stored in HDFS in a similar way to how we can query the JSON documents in MongoDB?

Suggestions please.

Thank you.
Regards,
Panshul.
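To make the MongoDB-Hadoop adapter suggestion above concrete, here is a minimal sketch of an MR job that reads a collection through the connector's MongoInputFormat and writes results back through MongoOutputFormat. The URIs, the demo.events / demo.status_counts collections, and the "status" field are all hypothetical; mongo.input.uri and mongo.output.uri are the connector's documented configuration keys:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;

public class MongoStatusCount {

    // MongoInputFormat presents each document's _id as the key and the
    // decoded document as a BSONObject value.
    static class StatusMapper extends Mapper<Object, BSONObject, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(Object id, BSONObject doc, Context ctx)
                throws IOException, InterruptedException {
            Object status = doc.get("status");  // hypothetical field name
            if (status != null) {
                ctx.write(new Text(status.toString()), ONE);
            }
        }
    }

    static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            // MongoOutputFormat turns each (key, value) pair into a document
            // in the output collection.
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder URIs: read demo.events, write demo.status_counts.
        conf.set("mongo.input.uri", "mongodb://localhost:27017/demo.events");
        conf.set("mongo.output.uri", "mongodb://localhost:27017/demo.status_counts");

        Job job = new Job(conf, "mongo-status-count");
        job.setJarByClass(MongoStatusCount.class);
        job.setMapperClass(StatusMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

One caveat on the locality question: the connector computes its splits from the collection's chunk ranges and reads them straight from the mongod processes over the network, so HDFS-style data locality generally does not apply unless the MongoDB shards happen to be co-located with the task tracker nodes.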