lucene-openrelevance-dev mailing list archives

From Itamar Syn-Hershko
Subject OpenRelevance Viewer (Orev)
Date Thu, 08 Jul 2010 15:50:30 GMT
Hi all,

Following a discussion with Robert, I have started working on a viewer 
application intended to make viewing and judgment of corpora and topics 
as easy as possible. The intention is to make this development as rapid 
as it can possibly be. I'm building this with .NET (NHibernate / ASP.NET 

Below are several remarks and a high-level description. I'm interested 
in capturing some early feedback and ideas, but please note that my 
intention is to start with something functional first.

FILEFORMATS.txt defines the file structures, but since the viewer works 
against a DB, those will only be honored via export functions. 
See the attached image for the domain model.

A corpus DB entry points to a FS path (which could also be remote, via 
HTTP for example). The viewer, in turn, will load the files one by one, 
and each judgment will be saved with the Corpus ID, the Topic ID and a 
string representation of the document filename. The first two are 
integers, and the 
document ID is defined as a string, so document file-names can use a 
base-24 ID representation for generated corpora (i.e. exporting from a 

Contrary to what FILEFORMATS.txt states, a corpus will not reside in a 
gzipped file.
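
As a rough illustration of the judgment key described above, here is a 
minimal Python sketch (the viewer itself is being built with .NET, so 
this is not the actual implementation). The class names, the `verdict` 
field, the example path and the `document_path` helper are all 
assumptions made for the example; only the three-part key (integer 
corpus ID, integer topic ID, string document ID) comes from the 
description.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Corpus:
    corpus_id: int
    base_path: str  # FS path of the corpus (hypothetical example value below)


@dataclass(frozen=True)
class Judgment:
    corpus_id: int    # integer, as described
    topic_id: int     # integer, as described
    document_id: str  # string representation of the document filename
    verdict: str      # hypothetical field holding the judge's choice


def document_path(corpus: Corpus, document_id: str) -> str:
    """Join the corpus base path and the document filename."""
    return str(PurePosixPath(corpus.base_path) / document_id)


corpus = Corpus(corpus_id=1, base_path="/data/corpora/example")
judgment = Judgment(corpus_id=corpus.corpus_id, topic_id=7,
                    document_id="doc-0001.txt", verdict="relevant")
print(document_path(corpus, judgment.document_id))
```

Keeping the document ID as a plain string means the judgment table does 
not need to know how the file names of a given corpus were generated.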

The above approach may allow more than one person to judge the same 
document for the same topic at once, which is bad since it could waste 
the users' time (there is no need for double judgment). I'll probably 
have to resolve this by implementing a HiLo-like mechanism (or pooling), 
but I'm leaving this for later.
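
One possible reading of the pooling idea is to hand each judge a 
disjoint batch of pending documents, so no two judges see the same 
document for a topic. The Python sketch below is only an in-memory 
illustration of that idea; the `JudgmentPool` class, its `reserve` 
method and the batch size are hypothetical, and a real version would 
have to persist reservations in the DB and release abandoned batches.

```python
import threading
from collections import deque


class JudgmentPool:
    """Hands out disjoint batches of pending documents for one topic."""

    def __init__(self, document_ids, batch_size=3):
        self._pending = deque(document_ids)
        self._batch_size = batch_size
        self._lock = threading.Lock()  # guards concurrent reservations

    def reserve(self):
        """Return the next batch; a document is never handed out twice."""
        with self._lock:
            batch = []
            while self._pending and len(batch) < self._batch_size:
                batch.append(self._pending.popleft())
            return batch


pool = JudgmentPool(["a.txt", "b.txt", "c.txt", "d.txt"], batch_size=3)
first = pool.reserve()   # batch for one judge
second = pool.reserve()  # batch for another judge
```

Reserving in batches rather than one document at a time keeps the 
number of round trips to the shared pool low, which is the same 
trade-off HiLo makes for ID generation.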

The web application will allow users to submit new topics per language 
and to judge documents for a topic. The Judgment screen will show the 
topic at the top, navigation on the left, and the document in the rest 
of the screen. The user can choose "Relevant", "Irrelevant" or "Skip".

A user can filter by language, so that they see only the topics relevant 
to them. Language filtering can be applied using a language string 
("en-US") per topic and corpus.
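
A minimal Python sketch of the language filter, assuming each topic 
carries a language string such as "en-US". The `Topic` class, the sample 
topics and the `topics_for_language` function are hypothetical, and the 
case-insensitive comparison is an assumption of this example rather 
than something decided above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    topic_id: int
    title: str
    language: str  # language string, e.g. "en-US"


def topics_for_language(topics, language):
    """Return the topics whose language string matches, ignoring case."""
    wanted = language.lower()
    return [t for t in topics if t.language.lower() == wanted]


topics = [
    Topic(1, "Example topic A", "en-US"),
    Topic(2, "Example topic B", "he-IL"),
    Topic(3, "Example topic C", "en-us"),
]
english = topics_for_language(topics, "en-US")
print([t.topic_id for t in english])
```

The same comparison could be applied to corpora, since the language 
string is kept per topic and per corpus.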

That's about it for now; looking forward to some feedback.

