couchdb-user mailing list archives

From "Eli Stevens (Gmail)" <wickedg...@gmail.com>
Subject Re: Automatically extracted CouchDB FAQs
Date Wed, 23 Feb 2011 20:10:18 GMT
Interesting project.  :)

I didn't get a very strong sense of correlation between the topic
categories and the questions in them.  For example,

http://faqcluster.com/couchdb-replication-couch-databases-database
"Questions & Answers about Couchdb, Couch, Replication, Databases and Database."

Had the following question:

http://faqcluster.com/question1996757514

"I'm looking for a recommendation for ruby gem that will enable me to
use couchdb from rails. I'd like to have couch documents be modeled by
ActiveRecord."

This didn't have any mention of replication (or databases), so I can
only guess that it was clustering on "couch" or "couchdb".

Do you do any screening of common terms from the clustering?  I'd
imagine that if you looked at the user@couchdb mailing list, you could
find a list of very common terms (like couch, couchdb, database, etc.)
and discard or ignore those when trying to cluster the messages (in
the same way that words like "the" and "and" shouldn't be used).
Basically, a per-mailing-list set of generic terms.
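
As a rough sketch of what I mean (the function names, the sample
messages, and the 0.75 cutoff are all made up for illustration; I don't
know how faqcluster actually tokenizes or clusters), you could count how
many messages each term appears in and drop the ones that show up almost
everywhere:

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def list_specific_stopwords(messages, max_doc_fraction=0.5):
    """Return terms that occur in more than max_doc_fraction of the messages."""
    doc_freq = Counter()
    for text in messages:
        # Use a set so each term counts once per message (document frequency).
        doc_freq.update(set(TOKEN_RE.findall(text.lower())))
    threshold = max_doc_fraction * len(messages)
    return {term for term, count in doc_freq.items() if count > threshold}


def tokens_for_clustering(text, stopwords):
    """Tokenize a message and drop the list-specific generic terms."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in stopwords]


# Hypothetical sample messages, standing in for a real list archive.
messages = [
    "How do I set up couchdb replication between two servers?",
    "Which ruby gem lets me use couchdb from rails?",
    "My couchdb view is slow to rebuild after updates",
    "Does couchdb replication work over https?",
]

stopwords = list_specific_stopwords(messages, max_doc_fraction=0.75)
rails_tokens = tokens_for_clustering(messages[1], stopwords)
```

With a real archive the cutoff would need tuning, and the result could be
merged with an ordinary English stop-word list before clustering.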

The questions and answers themselves seemed to be a nice, readable "I
have X problem" "here is an answer" pair, so that was cool.  :)

HTH,
Eli


On Tue, Feb 22, 2011 at 8:24 PM, Stefan Henß
<stefan.henss@googlemail.com> wrote:
> Hi everybody,
>
> I'm currently doing research for my bachelor thesis on how to automatically
> extract FAQs from unstructured data.
>
> For this I've built a system automatically performing the following:
> - Load thousands of conversations from forums and mailing lists (don't mind
> the categories there).
> - Build categorization solely based on the conversation's texts (by
> clustering).
> - Pick the best modelled categories as basis for one FAQ each.
> - For each question (first entry in a conversation) find the best reply from
> its answers.
> - Select the most relevant and well formatted question/answer-pairs for each
> FAQ.
>
> For the evaluation part, I'd like to ask you to have a look at one or two
> FAQs and maybe comment on how well the questions matched the FAQ's
> title, how relevant they were, etc.
>
>
> Here's the direct link to the CouchDB FAQs:
> http://faqcluster.com/couchdb-view-document-doc-couch
>
> And here is a quite good example, in my opinion:
> http://faqcluster.com/question1516894006
>
> (There are some other interesting FAQs as well at http://faqcluster.com/)
>
>
> Thanks for your help
>
> Stefan
>
