jackrabbit-users mailing list archives

From "Stefan Henß" <stefan.he...@googlemail.com>
Subject Automatically extracted Jackrabbit FAQs
Date Tue, 08 Mar 2011 20:54:13 GMT
Hi everybody,

I'm currently doing research for my bachelor thesis on how to 
automatically extract FAQs from unstructured data.

For this I've built a system that automatically performs the following 
steps:
- Load thousands of conversations from forums and mailing lists, 
ignoring their existing categories and treating all sources alike.
- Build a new categorization based solely on the conversations' texts 
(by clustering).
- Pick the best-modelled categories as the basis for one FAQ each.
- For each question (the first entry in a thread), find the best reply 
among its answers.
- Select the most relevant and well-formatted question/answer pairs for 
each FAQ.
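
To make the steps above more concrete, here is a minimal sketch in Python. 
It is not the actual system: the word-set tokenizer, the Jaccard 
similarity, the greedy clustering, the word-overlap reply scoring, and all 
names and thresholds (tokens, cluster, best_reply, build_faqs, 0.2, 
min_size) are assumptions chosen only for illustration.

```python
import re

def tokens(text):
    """Lowercase word set; an assumed stand-in for real preprocessing."""
    return set(re.findall(r"[a-z]+", text.lower()))

def thread_tokens(thread):
    """Words of the whole conversation (question plus all replies)."""
    return tokens(" ".join([thread["question"], *thread["replies"]]))

def jaccard(a, b):
    """Set overlap in [0, 1]; an assumed similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(threads, threshold=0.2):
    """Greedy single-pass clustering on conversation text only."""
    clusters = []  # each cluster is a list of threads
    for thread in threads:
        words = thread_tokens(thread)
        for members in clusters:
            # Compare against the first thread of each existing cluster.
            if jaccard(words, thread_tokens(members[0])) >= threshold:
                members.append(thread)
                break
        else:
            clusters.append([thread])
    return clusters

def best_reply(thread):
    """Pick the reply sharing the most words with the question."""
    question = tokens(thread["question"])
    return max(thread["replies"], key=lambda r: len(question & tokens(r)))

def build_faqs(threads, min_size=2):
    """One FAQ (list of question/answer pairs) per sufficiently large cluster."""
    faqs = []
    for members in cluster(threads):
        if len(members) < min_size:
            continue  # treat small clusters as poorly modelled
        faqs.append([(t["question"], best_reply(t))
                     for t in members if t["replies"]])
    return faqs

# Made-up sample data, not taken from any real list.
threads = [
    {"question": "How do I configure the Jackrabbit repository?",
     "replies": ["Edit repository.xml to configure the repository.",
                 "No idea."]},
    {"question": "Where do I configure the repository in Jackrabbit?",
     "replies": ["See repository.xml."]},
    {"question": "Is there a vegetarian lunch option?",
     "replies": ["Yes."]},
]
faqs = build_faqs(threads)
```

With this sample data the two repository questions end up in one FAQ and 
the unrelated thread is dropped because its cluster is too small.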

For the evaluation I'm interested in experts' perceptions of the 
results, e.g. whether the questions are relevant, correctly answered, etc.
I'll also be publishing a paper about the approach, so I'd be happy if 
you could rate one or two questions (using the stars on the details 
pages) to give me some statistics to present.


Here's the direct link to the Jackrabbit FAQs:
http://faqcluster.com/jackrabbit-node-jcr-repository-apache

(There are some other interesting FAQs as well at http://faqcluster.com/)


Thanks for your help

Stefan
