mahout-user mailing list archives

From Burcu Buyukkagnici <>
Subject mahout for enterprise search project
Date Tue, 15 Nov 2011 07:12:51 GMT
I'm new to this community. I want to use Mahout as a component of an
enterprise search project. The project is in the conceptual phase. My business
need is to find everything related to a task and reorganize
the output as a new view. The results should be actionable. The system
should also be integrated with software development environment tools:
Subversion, JIRA and Redmine, SharePoint blogs, wikis and people ( active
"Everything" means files, tools and people. Files are mostly text-based
(Word, PDF, source files); searching audio and video files is a later need.

Where do Mahout, Lucene/Solr and the UIMA framework fit in the following
scenario? And what are the system requirements to set up a development

X is a new project team member in a software development firm. Her project
is mainly a 10-year-old maintenance project; however, customers still make small
development requests on that platform. Her boss wants her to prepare a
software requirements specification (SRS) document for a new request. Since she
hasn't prepared an SRS before, she wants to find previously prepared
documents, and asks her colleagues to give her a sample.
Her friend gives her a sample from her local computer, based on a very old
version of the SRS. The company has a Windows file server and a new content
management system (portal); some projects also use Subversion and wikis to
store their docs.

   1. There should be a platform that can search files in all these
   2. The system should understand that an SRS is an outcome of the software
   requirements engineering or analysis process. It should also understand that
   SRS, software requirements specification and functional design description
   are similar terms.
   3. The company has manuals, templates and process definitions about
   requirements engineering, and has an SRS template which supersedes other
   versions. When searching, the system should list organizational docs first
   and then project docs related to SRSes.
   4. The project has different SRSes written over 10 years. So the
   system should list that specific project's SRS templates, indicating version
   conflicts between org. document templates and projects...
   5. The system should also list the people who were previously involved in
   the requirements engineering process, in that project first; then in other
   6. The system should also have a suggestion mechanism. It should
   know the domain of the project X is working on and its sub-parts. For example, X
   is working on an e-commerce project, and the new request is about mobile
   payments. In the same company but in a different project, a team is
   working on e-wallet projects for a bank. Based on her profile, the system
   should be able to suggest people, tools and outcomes from the other project
   that relate to the payments domain.
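
Requirements 2 and 3 can be sketched roughly as below. This is only a minimal illustration in plain Python, not Mahout, Lucene/Solr or UIMA code; the synonym group, the document titles and the "org"/"project" labels are all hypothetical examples.

```python
# Hypothetical synonym groups; in practice these would come from a curated thesaurus.
SYNONYMS = [
    {"srs", "software requirements specification", "functional design description"},
]

# Hypothetical documents as (title, kind), where kind is "org" or "project".
DOCS = [
    ("Project A SRS v1", "project"),
    ("Company SRS template", "org"),
    ("Functional design description for billing", "project"),
    ("Requirements engineering process manual", "org"),
]


def expand(query):
    """Return the query plus every term that shares a synonym group with it."""
    q = query.lower()
    terms = {q}
    for group in SYNONYMS:
        if q in group:
            terms |= group
    return terms


def search(query, docs=DOCS):
    """Match any expanded term as a substring; list organizational docs first."""
    terms = expand(query)
    hits = [(title, kind) for title, kind in docs
            if any(term in title.lower() for term in terms)]
    # sorted() is stable, so the original order is kept within each kind.
    return sorted(hits, key=lambda hit: hit[1] != "org")


if __name__ == "__main__":
    for title, kind in search("SRS"):
        print(kind, "-", title)
```

A real system would match on document content rather than titles, and would probably handle synonyms at index or query time inside the search engine rather than in application code.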

Identifying the domain and grouping the related docs, tools and people
in an existing system is nearly impossible to do manually. I want the system
to identify and cluster the related things itself, and also to learn and
improve its results from user feedback. In addition, some people should give input
to the system by classifying the concepts for it. For example:
I have organizational assets: documents, tools and people. The documents are
project docs and organizational docs, and they are related. This can serve as
guidance for the system.
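
To make the clustering idea concrete, here is a minimal sketch in plain Python. It is not how Mahout does it; it assumes a simple bag-of-words cosine similarity and a greedy single-pass grouping, and the threshold value and sample texts are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.3):
    """Greedy single-pass clustering: each text joins the first cluster
    whose seed text is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_vector, [member indices])
    for i, text in enumerate(texts):
        vec = vectorize(text)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    sample = [
        "mobile payments requirements for e-commerce checkout",
        "e-wallet payments requirements for bank",
        "subversion repository backup procedure",
    ]
    print(cluster(sample))
```

With the sample texts, the two payments-related descriptions end up in one group and the Subversion procedure in another. User feedback and the hand-made classification would be extra inputs on top of something like this.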

I think Carrot2 does something very similar to what I describe, but it has a
file limitation. Anyway, I need a roadmap to initiate a project like
this. Where should I start?

