hadoop-user mailing list archives

From: Joachim Van den Bogaert <joac...@inqa.be>
Subject: Test methods IR task on real-time content
Date: Sun, 14 Apr 2013 09:11:14 GMT
Hi all,

I was wondering whether anyone has ever applied information retrieval metrics to real-time
big data collections of varying size.

The main idea would be to test whether you can find relevant information for a given time
frame in two data repositories: a baseline repository and one with extra content. The question
is how to do this fairly, because the extra content will probably contain more relevant
documents than the baseline. So how can you be sure that finding more relevant documents is
really due to the quality of your search system and not to the size of your data repository?
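One common way to separate system quality from collection size, sketched here as an assumption and not as the poster's method, is to (a) normalize each score by the number of relevant documents the repository actually contains (recall, R-precision) instead of counting raw hits, and (b) re-score the extended run restricted to documents that also exist in the baseline repository. The function names (`precision_at_k`, `recall`, `r_precision`, `restrict`) and the toy document IDs and judgments below are hypothetical illustrations.

```python
from typing import AbstractSet, List, Sequence


def precision_at_k(ranking: Sequence[str], relevant: AbstractSet[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are judged relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for d in ranking[:k] if d in relevant) / k


def recall(ranking: Sequence[str], relevant: AbstractSet[str]) -> float:
    """Fraction of all relevant documents in the repository that were retrieved.

    Dividing by len(relevant) normalizes for how many relevant documents
    the repository holds, so a bigger repository is not rewarded per se.
    """
    if not relevant:
        return 0.0
    return len(set(ranking) & set(relevant)) / len(relevant)


def r_precision(ranking: Sequence[str], relevant: AbstractSet[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for d in ranking[:r] if d in relevant) / r


def restrict(ranking: Sequence[str], allowed: AbstractSet[str]) -> List[str]:
    """Keep only documents that also exist in `allowed`, preserving rank order."""
    return [d for d in ranking if d in allowed]


# Hypothetical toy data: the extended repository adds d6-d8 to the baseline.
baseline_docs = {"d1", "d2", "d3", "d4", "d5"}
extended_docs = baseline_docs | {"d6", "d7", "d8"}
relevant_baseline = {"d1", "d3"}
relevant_extended = relevant_baseline | {"d6", "d7"}

# Hypothetical rankings returned by the same search system on each repository.
run_baseline = ["d1", "d2", "d3", "d4"]
run_extended = ["d6", "d1", "d7", "d2", "d3"]

if __name__ == "__main__":
    # Raw hit counts favour the bigger repository (2 vs. 4 relevant found).
    print("raw hits:", len(set(run_baseline) & relevant_baseline),
          len(set(run_extended) & relevant_extended))
    # Normalized recall is equal, so the extra hits alone say nothing about quality.
    print("recall:", recall(run_baseline, relevant_baseline),
          recall(run_extended, relevant_extended))
    # Scoring the extended run only on baseline documents isolates the system.
    print("R-prec on baseline docs:",
          r_precision(run_baseline, relevant_baseline),
          r_precision(restrict(run_extended, baseline_docs), relevant_baseline))
```

In this toy case the extended repository yields twice as many relevant hits and a higher R-precision, yet recall is identical and the restricted R-precision equals the baseline's, which suggests the apparent gain comes from the added content rather than from better ranking. With real data this only works if relevance judgments cover both repositories consistently (for example, pooled judgments over the same queries and time frame).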
