incubator-couchdb-user mailing list archives

From "Stefan Henß" <>
Subject Automatically extracted CouchDB FAQs
Date Wed, 23 Feb 2011 04:24:48 GMT
Hi everybody,

I'm currently doing research for my bachelor thesis on how to 
automatically extract FAQs from unstructured data.

For this I've built a system that automatically performs the following steps:
- Load thousands of conversations from forums and mailing lists (ignoring 
any categories they already have there).
- Build a categorization based solely on the conversations' texts (by …).
- Pick the best-modelled categories as the basis for one FAQ each.
- For each question (the first entry in a conversation), find the best 
reply among its answers.
- Select the most relevant and well-formatted question/answer pairs for 
each FAQ.
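A minimal sketch of how such a pipeline could be wired together, under loudly stated assumptions: the categorization method isn't specified in this message, so the sketch swaps in a naive stand-in (label each conversation by its most frequent content word). It also treats category size as a crude proxy for "best modelled", word overlap with the question as the reply score, and a minimum reply length as the "well formatted" filter. Every function name, parameter, and the stopword list is hypothetical; this is not the actual system.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; a real system would use a proper one.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "in", "is", "how",
             "i", "do", "for", "it", "on", "with", "my", "can", "what"}


def tokens(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9_]+", text.lower())
            if w not in STOPWORDS]


def categorize(conversations):
    """Stand-in categorizer (assumption): label each conversation by the
    most frequent content word across all of its messages."""
    categories = defaultdict(list)
    for conv in conversations:
        counts = Counter(tokens(" ".join(conv)))
        if counts:
            label = counts.most_common(1)[0][0]
            categories[label].append(conv)
    return categories


def best_reply(conv):
    """Pick the reply sharing the most words with the question (conv[0]).
    Returns (score, reply), or None if the conversation has no replies."""
    question = set(tokens(conv[0]))
    scored = [(len(question & set(tokens(r))), r) for r in conv[1:]]
    return max(scored, key=lambda s: s[0]) if scored else None


def build_faqs(conversations, n_faqs=2, pairs_per_faq=3, min_len=20):
    """Keep the largest categories (size as a crude proxy for how well
    they are modelled) and the top-scoring Q/A pairs in each."""
    categories = categorize(conversations)
    top = sorted(categories.items(), key=lambda kv: len(kv[1]),
                 reverse=True)[:n_faqs]
    faqs = {}
    for label, convs in top:
        pairs = []
        for conv in convs:
            picked = best_reply(conv)
            # Skip very short replies as a crude "well formatted" filter.
            if picked and len(picked[1]) >= min_len:
                pairs.append((picked[0], conv[0], picked[1]))
        pairs.sort(key=lambda p: p[0], reverse=True)
        faqs[label] = [(q, a) for _, q, a in pairs[:pairs_per_faq]]
    return faqs
```

Each conversation is assumed to be a plain list of message strings with the question first; the result maps a category label to its selected (question, answer) pairs.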

For the evaluation, I'd like to ask you to have a look at one or two 
FAQs and perhaps comment on how well the questions matched each FAQ's 
title, how relevant they were, etc.

Here's the direct link to the CouchDB FAQs:

And here is a fairly good example, in my opinion:

(There are some other interesting FAQs as well at …)

Thanks for your help

