Subject: Global information in mapreduce
From: "Ilya Vishnevsky"
Date: Mon, 19 Mar 2007 12:40:51 +0300

Hello! My question is about mapreduce.
Is it possible to pass some global information to the map function? For example, I have a set of words and a large set of documents. I want the map function to receive each document as its value and emit a (word, frequency) pair for each word in the set, where "frequency" is the frequency of that word in the document. To do this, the map function needs access to the set of words every time it runs. Is it possible to do that?
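
To make the question concrete, here is a minimal sketch of the computation described above, written in plain Java with no Hadoop classes. The class name `WordSetMapper`, the constructor that receives the word set, whitespace tokenization, case-insensitive matching, and treating "frequency" as a raw count are all assumptions made for illustration. How the word set would actually reach each map task in a real job is exactly the open question.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class, not part of Hadoop: it only illustrates the computation.
class WordSetMapper {
    // The "global information": the fixed set of words every map call needs.
    // Assumption: the words are supplied in lower case.
    private final Set<String> words;

    WordSetMapper(Set<String> words) {
        // Sorted copy so that the output order is deterministic.
        this.words = new TreeSet<>(words);
    }

    // Takes one document as the value and returns a (word, frequency) pair
    // for every word in the set, including words that do not occur (count 0).
    // Assumption: tokens are separated by whitespace.
    Map<String, Integer> map(String document) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : words) {
            counts.put(w, 0);
        }
        for (String token : document.toLowerCase().split("\\s+")) {
            if (counts.containsKey(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The same word set is shared by every call to map().
        WordSetMapper mapper = new WordSetMapper(Set.of("hadoop", "map"));
        for (String doc : List.of("map reduce map Hadoop", "nothing relevant here")) {
            System.out.println(mapper.map(doc));
        }
    }
}
```

In this sketch the word set is handed to the mapper object once, before any document is processed, and each call to `map` only reads it. The question is whether mapreduce offers an equivalent place to load such read-only data on every node that runs the map function.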