From: John Lilley <john.lilley@redpoint.net>
To: user@hadoop.apache.org
Subject: RE: Query mongodb
Date: Wed, 16 Jan 2013 14:56:59 +0000

Um, I think you and I are talking about the same thing, but maybe not?

Certainly HBase/MongoDB are HDFS-aware, so I would expect that if I am a client program running outside of the Hadoop cluster and I do a query, the database tools will construct query processing such that data is read and processed in an optimal fashion (using MapReduce?) before the aggregated information is shipped to me on the client side.

The question I was asking is a little different, although hopefully the answer is just as simple. Can I write a mapper/reducer that queries HBase/MongoDB and have MR schedule my mappers such that each mapper receives tuples that have been read in a locality-aware fashion?

john

From: Mohammad Tariq [mailto:dontariq@gmail.com]
Sent: Wednesday, January 16, 2013 7:47 AM
To: user@hadoop.apache.org
Subject: Re: Query mongodb

The MapReduce framework tries its best to run jobs on the nodes where the data is located; that is its fundamental nature. You don't have to do anything extra.

*I am sorry if I misunderstood the question.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
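For the HBase half of the question above, locality comes from the stock TableInputFormat: it creates one input split per table region and reports the hosting region server as the split's preferred location, so the scheduler can run each mapper next to the region it scans (when task trackers run on the region server nodes). A minimal sketch, assuming the HBase MapReduce API of that era; the table name "mytable" and the row-count logic are placeholders:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class HBaseLocalityScan {

    // TableInputFormat hands each mapper the rows of a single region, and
    // each split names the region's server as its preferred location.
    static class RowCountMapper extends TableMapper<Text, LongWritable> {
        private static final Text ROWS = new Text("rows");
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(ROWS, ONE);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase-locality-scan");
        job.setJarByClass(HBaseLocalityScan.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // batch rows per RPC inside each mapper
        scan.setCacheBlocks(false);  // full scans shouldn't churn the block cache

        // One map task is created per region of "mytable" (placeholder name).
        TableMapReduceUtil.initTableMapperJob(
                "mytable", scan, RowCountMapper.class,
                Text.class, LongWritable.class, job);

        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}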
On Wed, Jan 16, 2013 at 8:10 PM, John Lilley <john.lilley@redpoint.net> wrote:

How does one schedule mappers to read MongoDB or HBase in a data-locality-aware fashion?

-john

From: Mohammad Tariq [mailto:dontariq@gmail.com]
Sent: Wednesday, January 16, 2013 3:29 AM
To: user@hadoop.apache.org
Subject: Re: Query mongodb

Yes. You can use the MongoDB-Hadoop adapter to achieve that. Through this adapter you can pull the data, process it, and push it back to your MongoDB-backed datastore by writing MR jobs.

It is also 100% possible to query HBase or JSON files, or anything else for that matter, stored in HDFS.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Wed, Jan 16, 2013 at 3:50 PM, Panshul Whisper <ouchwhisper@gmail.com> wrote:

Hello,

Is it possible, and if so how, to query MongoDB directly from Hadoop? Or is it possible to query HBase or JSON files stored in HDFS in a similar way to how we can query the JSON documents in MongoDB?

Suggestions please.

Thank you.
Regards,
Panshul.
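To make the MongoDB-Hadoop adapter suggestion above concrete, here is a minimal sketch of an MR job that reads a collection through the connector's MongoInputFormat and writes results back through MongoOutputFormat. The URIs, the demo.events / demo.status_counts collections, and the "status" field are all hypothetical; mongo.input.uri and mongo.output.uri are the connector's documented configuration keys:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;

public class MongoStatusCount {

    // MongoInputFormat presents each document's _id as the key and the
    // decoded document as a BSONObject value.
    static class StatusMapper extends Mapper<Object, BSONObject, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(Object id, BSONObject doc, Context ctx)
                throws IOException, InterruptedException {
            Object status = doc.get("status");  // hypothetical field name
            if (status != null) {
                ctx.write(new Text(status.toString()), ONE);
            }
        }
    }

    static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            // MongoOutputFormat turns each (key, value) pair into a document
            // in the output collection.
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder URIs: read demo.events, write demo.status_counts.
        conf.set("mongo.input.uri", "mongodb://localhost:27017/demo.events");
        conf.set("mongo.output.uri", "mongodb://localhost:27017/demo.status_counts");

        Job job = new Job(conf, "mongo-status-count");
        job.setJarByClass(MongoStatusCount.class);
        job.setMapperClass(StatusMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

One caveat on the locality question: the connector computes its splits from the collection's chunk ranges and reads them straight from the mongod processes over the network, so HDFS-style data locality generally does not apply unless the MongoDB shards happen to be co-located with the task tracker nodes.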