From: Joachim Van den Bogaert <joachim@inqa.be>
Subject: Test methods IR task on real-time content
Date: Sun, 14 Apr 2013 11:11:14 +0200
Message-Id: <B3E44427-8AFF-4D4C-BB0D-B3536C2395E0@inqa.be>
To: user <user@hadoop.apache.org>

Hi all,

I was wondering whether anyone has ever applied information retrieval
metrics to real-time big data where the amount of data varies.

The main idea would be to test whether you can find relevant information
for a given time frame in two data repositories: one baseline repository
and one with extra content. The question is how to do this fairly:
chances are that the repository with extra content will contain more
relevant documents than the baseline. So how can you be sure that finding
more relevant documents is really due to the quality of your search
system and not to the size of your data repository?
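
To make the concern concrete, here is a minimal sketch in Python. The
document IDs, relevance judgments, and function names are all made up for
illustration. It only shows that a raw count of relevant hits can go up
for the larger repository while recall, which normalizes by the number of
relevant documents available, goes down:

```python
def relevant_found(retrieved_ids, relevant_ids):
    """Raw count of distinct relevant documents among the retrieved ones."""
    return len(set(retrieved_ids) & set(relevant_ids))


def recall(retrieved_ids, relevant_ids):
    """Share of the repository's relevant documents that were retrieved."""
    if not relevant_ids:
        return 0.0
    return relevant_found(retrieved_ids, relevant_ids) / len(set(relevant_ids))


# Hypothetical baseline repository: 4 relevant documents for the query.
baseline_relevant = {"b1", "b2", "b3", "b4"}
baseline_results = ["b1", "x1", "b2", "x2", "b3"]

# Hypothetical extended repository: the same 4 plus 6 extra relevant ones.
extended_relevant = baseline_relevant | {"e1", "e2", "e3", "e4", "e5", "e6"}
extended_results = ["b1", "e1", "x1", "e2", "b2", "x2", "e3", "x3"]

# Baseline: 3 relevant documents found, recall 0.75.
print(relevant_found(baseline_results, baseline_relevant),
      recall(baseline_results, baseline_relevant))

# Extended: 5 relevant documents found, but recall only 0.5.
print(relevant_found(extended_results, extended_relevant),
      recall(extended_results, extended_relevant))
```

Normalizing like this assumes you know how many relevant documents each
repository holds per time frame, which is exactly the hard part with
real-time content.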

Regards,
Joachim