jackrabbit-users mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: Automatically extracted Jackrabbit FAQs
Date Mon, 14 Mar 2011 14:03:47 GMT
Hello Stefan,

On Tue, Mar 8, 2011 at 9:54 PM, Stefan Henß <stefan.henss@googlemail.com> wrote:
> Hi everybody,
> I'm currently doing research for my bachelor thesis on how to automatically
> extract FAQs from unstructured data.
> For this I've built a system automatically performing the following:
> - Load thousands of conversations from forums and mailing lists (don't mind
> the categories there, don't discriminate between sources).
> - Build new categorization solely based on the conversations' texts (by
> clustering).
> - Pick the best modelled categories as basis for one FAQ each.
> - For each question (first entry in a thread) find the best reply from its
> answers.
> - Select the most relevant and well formatted question/answer-pairs for each
> FAQ.
> For the evaluation I'm interested in experts' perceptions of the results,
> e.g. if the questions are relevant, correctly answered, etc.
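
The clustering step in the list above could be sketched roughly as follows. This is only an illustration under assumptions: the message does not say which algorithm or library was actually used, so the TF-IDF weighting, the greedy single-pass grouping, the 0.3 threshold, and every function name here are hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so terms present in every document keep a positive weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_threads(threads, threshold=0.3):
    """Greedy single-pass clustering (an assumed technique, not the author's).

    A thread joins the first cluster whose seed thread is at least
    `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of thread indices.
    """
    vectors = tfidf_vectors(threads)
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Calling `cluster_threads` with a list of thread texts returns groups of indices into that list. The source categories are never consulted, which matches the "don't discriminate between sources" point above. A real system would more likely use an established clustering library than this toy grouping.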

I think the clusters capture the correct set of emails pretty well, so
well done!

I assume the answers to the questions are correct because you can take
the second mail as the answer to the first, can't you? What is
confusing in the answers is that it is sometimes quite hard to see
where an answer starts and stops, perhaps because we usually comment
inline in emails.

Out of curiosity: What did you use for the clustering? Did you look at
or use Mahout for it?

> Also as I'll release a paper about the approach I'd be happy if you could
> rate one or two questions (stars on the details pages) so I'd have some
> statistics to present.


Will it be a publicly available release?

Regards Ard

> Here's the direct link to the Jackrabbit FAQs:
> http://faqcluster.com/jackrabbit-node-jcr-repository-apache
> (There are some other interesting FAQs as well at http://faqcluster.com/)
> Thanks for your help
> Stefan

