From: Joachim Van den Bogaert
To: user
Subject: Test methods IR task on real-time content
Date: Sun, 14 Apr 2013 11:11:14 +0200

Hi all,

I was wondering whether anyone has ever used information retrieval metrics on real-time big data with variable amounts of data. The main idea would be to test whether you can find relevant information for a given time frame for two data repositories: one baseline repository and one with extra content. The question here would be how to do this in a fair way: chances are that the extra content will contain more relevant documents than the baseline.
So how can you be sure that finding more relevant documents is really related to the quality of your search system and not to the size of your data repository?

Regards,
Joachim