ctakes-dev mailing list archives

From sandeep rg <sandeep.f...@gmail.com>
Subject Re: to involve in your development group
Date Mon, 22 Jul 2013 14:04:45 GMT
Sir,
I have gone through some of the medical records, such as bills and patient
details. Most of them are printed with dot-matrix printers, which makes it
very hard to extract their text from scanned images. I have tested some
professional software such as ABBYY FineReader, which also gave poor output.
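A minimal sketch, assuming the scan has already been decoded into an int[][] of gray values from 0 to 255: scanned pages are usually binarized before OCR, and Otsu's method is one standard way to pick a single global threshold by maximizing the between-class variance of the gray-level histogram. This technique is not proposed anywhere in the thread, and OtsuBinarizer, otsuThreshold and binarize are hypothetical names.

```java
/** Hypothetical helper: global binarization of a grayscale scan via Otsu's method. */
public class OtsuBinarizer {

    /** Returns the threshold t that best separates dark (<= t) from light (> t) pixels. */
    static int otsuThreshold(int[][] gray) {
        int[] hist = new int[256];
        long total = 0;
        for (int[] row : gray) {
            for (int v : row) {
                hist[v]++;                       // assumes 0 <= v <= 255
                total++;
            }
        }
        long sumAll = 0;
        for (int t = 0; t < 256; t++) sumAll += (long) t * hist[t];

        long weightDark = 0, sumDark = 0;
        double bestVariance = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            weightDark += hist[t];
            if (weightDark == 0) continue;       // no dark pixels yet
            long weightLight = total - weightDark;
            if (weightLight == 0) break;         // no light pixels left
            sumDark += (long) t * hist[t];
            double meanDark = (double) sumDark / weightDark;
            double meanLight = (double) (sumAll - sumDark) / weightLight;
            double diff = meanDark - meanLight;
            // Between-class variance (up to a constant factor).
            double between = (double) weightDark * weightLight * diff * diff;
            if (between > bestVariance) {
                bestVariance = between;
                best = t;
            }
        }
        return best;
    }

    /** Marks pixels at or below the threshold as ink (true). */
    static boolean[][] binarize(int[][] gray, int threshold) {
        boolean[][] ink = new boolean[gray.length][];
        for (int y = 0; y < gray.length; y++) {
            ink[y] = new boolean[gray[y].length];
            for (int x = 0; x < gray[y].length; x++) {
                ink[y][x] = gray[y][x] <= threshold;
            }
        }
        return ink;
    }
}
```

A single global threshold can struggle with uneven lighting or faded ribbons; locally adaptive thresholding is a common alternative in that case.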

But sir, I am confident I can do it; I just need more knowledge about image
processing. Can you suggest anyone on your team who is good at image
processing programming?
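Dot-matrix characters are built from separate dots, which OCR engines may not read as connected strokes. One common preprocessing step, offered here as an assumption rather than something tested in this thread, is a morphological closing (dilation followed by erosion) of the binarized page so that neighbouring dots merge into strokes. The sketch uses a 3x3 square structuring element on a boolean[][] image where true means ink; DotMatrixCloser is a hypothetical name, and the image is assumed to be rectangular and non-empty.

```java
/** Hypothetical helper: morphological closing to merge dot-matrix dots into strokes. */
public class DotMatrixCloser {

    /** 3x3 dilation: a pixel becomes ink if it or any neighbour is ink. */
    static boolean[][] dilate(boolean[][] img) {
        return apply(img, true);
    }

    /** 3x3 erosion: a pixel stays ink only if it and all neighbours are ink.
     *  Pixels outside the image count as background. */
    static boolean[][] erode(boolean[][] img) {
        return apply(img, false);
    }

    /** Closing = dilation then erosion; fills gaps narrower than the element. */
    static boolean[][] close(boolean[][] img) {
        return erode(dilate(img));
    }

    private static boolean[][] apply(boolean[][] img, boolean dilation) {
        int h = img.length, w = img[0].length;
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean any = false, all = true;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int ny = y + dy, nx = x + dx;
                        boolean v = ny >= 0 && ny < h && nx >= 0 && nx < w && img[ny][nx];
                        any |= v;
                        all &= v;
                    }
                }
                out[y][x] = dilation ? any : all;
            }
        }
        return out;
    }
}
```

Whether a 3x3 element is enough depends on the dot spacing and scan resolution; higher-DPI scans may need a larger element.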


On Thu, Jul 18, 2013 at 1:22 AM, sandeep rg <sandeep.foss@gmail.com> wrote:

> I have done the sequence diagram and made some small changes. Please go
> through it and tell me if anything more should be included.
>
>
> On Wed, Jul 17, 2013 at 9:37 PM, sandeep rg <sandeep.foss@gmail.com> wrote:
>
>> It is just a skeleton of the original proposal.
>>
>>
>> On Wed, Jul 17, 2013 at 9:31 PM, sandeep rg <sandeep.foss@gmail.com> wrote:
>>
>>> I have shared the sample work with you both. If any more details should
>>> be included, please tell me.
>>> The GUI design, schedule, and implementation flow chart still have to be
>>> added; they are under construction and will be uploaded within a few hours.
>>>
>>>
>>> On Wed, Jul 17, 2013 at 7:56 PM, Chen, Pei <Pei.Chen@childrens.harvard.edu>
>>> wrote:
>>>
>>>> pei.station@gmail.com
>>>>
>>>> > -----Original Message-----
>>>> > From: Mattmann, Chris A (398J) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>> > Sent: Wednesday, July 17, 2013 10:22 AM
>>>> > To: dev@ctakes.apache.org
>>>> > Subject: Re: to involve in your development group
>>>> >
>>>> > chris.mattmann@gmail.com
>>>> >
>>>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> > Chris Mattmann, Ph.D.
>>>> > Senior Computer Scientist
>>>> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> > Office: 171-266B, Mailstop: 171-246
>>>> > Email: chris.a.mattmann@nasa.gov
>>>> > WWW:  http://sunset.usc.edu/~mattmann/
>>>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> > Adjunct Assistant Professor, Computer Science Department
>>>> > University of Southern California, Los Angeles, CA 90089 USA
>>>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> >
>>>> > -----Original Message-----
>>>> > From: sandeep rg <sandeep.foss@gmail.com>
>>>> > Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>>> > Date: Wednesday, July 17, 2013 6:53 AM
>>>> > To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>>> > Subject: Re: to involve in your development group
>>>> >
>>>> > >Can you provide your Gmail ID so that I can share the proposal document
>>>> > >with you?
>>>> > >
>>>> > >
>>>> > >
>>>> > >On Tue, Jul 16, 2013 at 11:33 PM, sandeep rg <sandeep.foss@gmail.com>
>>>> > >wrote:
>>>> > >
>>>> > >> Sir,
>>>> > >> I will provide the proposal within two days. For now I am mainly going
>>>> > >> through the ASF-ICFOSS gateway, because if I go through their route and
>>>> > >> my proposal is selected, ICFOSS will provide us some support, such as
>>>> > >> certificates and a small amount of financial support.
>>>> > >>
>>>> > >> But the main thing is that I like programming: I like to explore new
>>>> > >> technologies in coding and to work hands-on with code. So even if my
>>>> > >> proposal is rejected, I would still like to work on your project as a
>>>> > >> volunteer, if you allow me.
>>>> > >>
>>>> > >> I am preparing the proposal now and will submit it within 2 days. Chris
>>>> > >> Mattmann helped me learn more about the proposal format.
>>>> > >>
>>>> > >>
>>>> > >> On Tue, Jul 16, 2013 at 8:12 PM, Chen, Pei
>>>> > >> <Pei.Chen@childrens.harvard.edu> wrote:
>>>> > >>
>>>> > >>> Chris/Sandeep,
>>>> > >>> According to ASF-ICFOSS, I believe the deadline for submitting
>>>> > >>> proposals is this coming Friday (July 19). After that, mentors will
>>>> > >>> have 2 weeks to review and score/accept them.
>>>> > >>> Just curious: are we planning to follow the same process here? Or,
>>>> > >>> since it's all volunteer work, technically Sandeep can still
>>>> > >>> contribute code to the community and participate in the dev group here.
>>>> > >>>
>>>> > >>> Looking forward to it.
>>>> > >>> --Pei
>>>> > >>>
>>>> > >>>
>>>> > >>> > -----Original Message-----
>>>> > >>> > From: sandeep rg [mailto:sandeep.foss@gmail.com]
>>>> > >>> > Sent: Monday, July 15, 2013 1:05 PM
>>>> > >>> > To: dev@ctakes.apache.org
>>>> > >>> > Subject: Re: to involve in your development group
>>>> > >>> >
>>>> > >>> > Sir,
>>>> > >>> > I have gone through most of the OCR technologies and reached a
>>>> > >>> > conclusion: I would like to use Apache Tika and JavaOCR for this
>>>> > >>> > purpose.
>>>> > >>> >
>>>> > >>> > Tesseract is an OCR tool that can extract text in multiple
>>>> > >>> > languages. It is implemented in C++, so it can be accessed from Java
>>>> > >>> > through native calls (JNI). There is also a Java wrapper, Tess4J, but
>>>> > >>> > reviews say that it has many bugs.
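Besides JNI or Tess4J, a third option is to run the tesseract command-line program as a separate process and read its output file. This is a minimal sketch, assuming the tesseract binary is on the PATH with the requested language data installed; it relies on the classic invocation "tesseract IMAGE OUTPUTBASE -l LANG", which writes the recognized text to OUTPUTBASE.txt. TesseractCli, buildCommand and ocr are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper that shells out to the tesseract CLI instead of using JNI. */
public class TesseractCli {

    /** Builds: tesseract <image> <outputBase> -l <lang> */
    static List<String> buildCommand(Path image, Path outputBase, String lang) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(image.toString());
        cmd.add(outputBase.toString());
        cmd.add("-l");
        cmd.add(lang);
        return cmd;
    }

    /** Runs tesseract and returns the text it wrote to <outputBase>.txt. */
    static String ocr(Path image, Path outputBase, String lang)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(image, outputBase, lang))
                .inheritIO()                 // show tesseract's own messages
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tesseract exited with code " + exit);
        }
        // Tesseract appends ".txt" to the output base name.
        Path txt = Paths.get(outputBase.toString() + ".txt");
        return new String(Files.readAllBytes(txt), StandardCharsets.UTF_8);
    }
}
```

Spawning a process per page costs more than an in-process call, but it keeps any native crash outside the JVM.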
>>>> > >>> >
>>>> > >>> > Apache Tika is developed in Java. It can extract text from .xls,
>>>> > >>> > Word, .txt, PDF, and many other formats, and it is easy to integrate
>>>> > >>> > into a project. I have just gone through how it is used.
>>>> > >>> >
>>>> > >>> > As for JavaOCR, it is good for extracting text from JPEG or scanned
>>>> > >>> > images. We can train it with various fonts: the more we train it, the
>>>> > >>> > higher its accuracy, but the slower it becomes. I didn't find any
>>>> > >>> > particular documentation for that.
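We have not checked how JavaOCR matches characters internally, so the following is only a generic illustration of the accuracy/speed trade-off described above: a brute-force nearest-template classifier over binary glyphs using Hamming distance. Each trained font adds templates, and each lookup compares the input against all of them, so recognition time grows linearly with the amount of training. TemplateMatcher and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical nearest-template glyph classifier (not JavaOCR's API). */
public class TemplateMatcher {

    static final class Template {
        final char label;
        final boolean[] pixels;   // flattened binary glyph of a fixed size
        Template(char label, boolean[] pixels) {
            this.label = label;
            this.pixels = pixels;
        }
    }

    private final List<Template> templates = new ArrayList<>();
    private int comparisonCount = 0;  // how many templates have been compared so far

    /** "Training" here just stores one more template per glyph/font sample. */
    void train(char label, boolean[] glyph) {
        templates.add(new Template(label, glyph));
    }

    /** Returns the label of the closest template; cost is one comparison per template. */
    char classify(boolean[] glyph) {
        if (templates.isEmpty()) throw new IllegalStateException("no templates trained");
        Template best = null;
        int bestDist = Integer.MAX_VALUE;
        for (Template t : templates) {
            comparisonCount++;
            int d = hamming(t.pixels, glyph);
            if (d < bestDist) {
                bestDist = d;
                best = t;
            }
        }
        return best.label;
    }

    int comparisons() {
        return comparisonCount;
    }

    private static int hamming(boolean[] a, boolean[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("size mismatch");
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) d++;
        }
        return d;
    }
}
```

Production engines may use extracted features or indexing to reduce this cost, but more stored samples still means more work per character.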
>>>> > >>> >
>>>> > >>> >
>>>> > >>> >
>>>> > >>> > On Sun, Jul 14, 2013 at 9:18 PM, sandeep rg <sandeep.foss@gmail.com>
>>>> > >>> > wrote:
>>>> > >>> >
>>>> > >>> > > Thanks a lot to both of you for your support. I will do my best to
>>>> > >>> > > find a solution for the JIRA issue, and I will share the proposal
>>>> > >>> > > with both of you.
>>>> > >>> > >
>>>> > >>> > >
>>>> > >>> > >
>>>> > >>> > > On Sun, Jul 14, 2013 at 1:46 AM, Chen, Pei
>>>> > >>> > > <Pei.Chen@childrens.harvard.edu> wrote:
>>>> > >>> > >
>>>> > >>> > >> Sandeep,
>>>> > >>> > >> It's great to have Chris on board as well; he was one of the
>>>> > >>> > >> coordinators of GSoC.
>>>> > >>> > >> Looking forward to it.
>>>> > >>> > >>
>>>> > >>> > >> Sent from my iPhone
>>>> > >>> > >>
>>>> > >>> > >> On Jul 13, 2013, at 12:24 PM, "Mattmann, Chris A (398J)"
>>>> > >>> > >> <chris.a.mattmann@jpl.nasa.gov> wrote:
>>>> > >>> > >>
>>>> > >>> > >> > Hi Sandeep,
>>>> > >>> > >> >
>>>> > >>> > >> > That is great news, and good job. OK, for some ideas about
>>>> > >>> > >> > developing your proposal: you may want to simply start with a
>>>> > >>> > >> > Google Doc and then share it with Pei. I'd be happy to help
>>>> > >>> > >> > co-mentor if Pei and you think it's useful too.
>>>> > >>> > >> >
>>>> > >>> > >> > Your proposal should likely cover:
>>>> > >>> > >> >
>>>> > >>> > >> > 1. Background - what's the state of CTAKES-189 and what is it
>>>> > >>> > >> >    trying to accomplish? (Include some figures, etc. along with
>>>> > >>> > >> >    your text.)
>>>> > >>> > >> >
>>>> > >>> > >> > 2. Approach - what are you going to do to solve CTAKES-189? Be
>>>> > >>> > >> >    specific, and try to break it down into smaller, easily
>>>> > >>> > >> >    reversible steps.
>>>> > >>> > >> >
>>>> > >>> > >> > 3. Schedule - how long will it take, and what is the schedule
>>>> > >>> > >> >    for achieving this?
>>>> > >>> > >> >
>>>> > >>> > >> > 4. Risks/etc. - any known risks, like are you taking a vacation
>>>> > >>> > >> >    anytime soon :) or are there other time constraints?
>>>> > >>> > >> >
>>>> > >>> > >> > 5. References, etc.
>>>> > >>> > >> >
>>>> > >>> > >> > HTH, and I'd be happy if you want to share the GDocs with me as
>>>> > >>> > >> > you develop it.
>>>> > >>> > >> >
>>>> > >>> > >> > Cheers!
>>>> > >>> > >> >
>>>> > >>> > >> > Chris
>>>> > >>> > >> >
>>>> > >>> > >> >
>>>> > >>> > >> > -----Original Message-----
>>>> > >>> > >> > From: sandeep rg <sandeep.foss@gmail.com>
>>>> > >>> > >> > Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>>> > >>> > >> > Date: Saturday, July 13, 2013 8:57 AM
>>>> > >>> > >> > To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>>> > >>> > >> > Subject: Re: to involve in your development group
>>>> > >>> > >> >
>>>> > >>> > >> >> I have also gone through the technologies available for OCR
>>>> > >>> > >> >> development; from those, I think Apache Tika and Tesseract are
>>>> > >>> > >> >> the best for solving the problem.
>>>> > >>> > >> >>
>>>> > >>> > >> >>
>>>> > >>> > >> >> On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg
>>>> > >>> > >> >> <sandeep.foss@gmail.com> wrote:
>>>> > >>> > >> >>
>>>> > >>> > >> >>> Hi Chris Mattmann,
>>>> > >>> > >> >>> I participated in the event coordinated by Luciano Resende:
>>>> > >>> > >> >>>
>>>> > >>> > >> >>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>>>> > >>> > >> >>>
>>>> > >>> > >> >>> From that I learned about open source, and I would like to work
>>>> > >>> > >> >>> on your project, cTAKES. I would like to fix the JIRA issue
>>>> > >>> > >> >>>
>>>> > >>> > >> >>> https://issues.apache.org/jira/browse/CTAKES-189
>>>> > >>> > >> >>>
>>>> > >>> > >> >>> Pei Chen accepted my request to be my mentor. Now I want to
>>>> > >>> > >> >>> submit a proposal to Apache about the project I am going to
>>>> > >>> > >> >>> work on. Can you help me prepare a proposal to be submitted
>>>> > >>> > >> >>> before the 18th of this July?
>>>> > >>> > >> >>>
>>>> > >>> > >> >>> On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A (398J)
>>>> > >>> > >> >>> <chris.a.mattmann@jpl.nasa.gov> wrote:
>>>> > >>> > >> >>>
>>>> > >>> > >> >>>> Hi Sandeep,
>>>> > >>> > >> >>>>
>>>> > >>> > >> >>>> I think the best thing to do is:
>>>> > >>> > >> >>>>
>>>> > >>> > >> >>>> 1. Create a JIRA issue here:
>>>> > >>> > >> >>>>    https://issues.apache.org/jira/browse/CTAKES
>>>> > >>> > >> >>>> 1a. You can register for a new account on JIRA.
>>>> > >>> > >> >>>> 2. Once your JIRA issue is created, feel free to start a
>>>> > >>> > >> >>>>    [DISCUSS] thread (e.g., with subject [DISCUSS] "some topic",
>>>> > >>> > >> >>>>    where "some topic" is perhaps the main idea you have) on
>>>> > >>> > >> >>>>    dev@ctakes.apache.org, referencing your issue and asking for
>>>> > >>> > >> >>>>    feedback.
>>>> > >>> > >> >>>> 3. Work with the Apache cTAKES PMC and committers to get your
>>>> > >>> > >> >>>>    patches and other items attached to your issue from #1
>>>> > >>> > >> >>>>    committed into the sources.
>>>> > >>> > >> >>>>
>>>> > >>> > >> >>>> Ideally, if 1-3 happen and it's a good interaction, you could
>>>> > >>> > >> >>>> earn the merit to become a PMC member or committer on the
>>>> > >>> > >> >>>> project, since Apache is built on meritocracy.
>>>> > >>> > >> >>>>
>>>> > >>> > >> >>>> Cheers,
>>>> > >>> > >> >>>> Chris
>>>> > >>> > >> >>>>
>>>> > >>> > >> >>>>
>>>> > >>> > >> >>>> -----Original Message-----
>>>> > >>> > >> >>>> From: sandeep rg <sandeep.foss@gmail.com>
>>>> > >>> > >> >>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>>> > >>> > >> >>>> Date: Thursday, July 11, 2013 11:30 AM
>>>> > >>> > >> >>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>>> > >>> > >> >>>> Subject: Re: to involve in your development group
>>>> > >>> > >> >>>>
>>>> > >>> > >> >>>>> Can you tell me what details I should include in a proposal?
>>>> > >>> > >> >>>>> Should I include all the implementation (technical) details
>>>> > >>> > >> >>>>> in the proposal?
>>>> > >>> > >> >>>>>
>>>> > >>> > >> >>>>>
>>>> > >>> > >> >>>>> On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A (398J)
>>>> > >>> > >> >>>>> <chris.a.mattmann@jpl.nasa.gov> wrote:
>>>> > >>> > >> >>>>>
>>>> > >>> > >> >>>>>> Dear Sandeep,
>>>> > >>> > >> >>>>>>
>>>> > >>> > >> >>>>>> Thanks for your interest in cTAKES. We would welcome your
>>>> > >>> > >> >>>>>> contribution and are happy to have your interest in the
>>>> > >>> > >> >>>>>> project.
>>>> > >>> > >> >>>>>>
>>>> > >>> > >> >>>>>> Cheers,
>>>> > >>> > >> >>>>>> Chris
>>>> > >>> > >> >>>>>>
>>>> > >>> > >> >>>>>>
>>>> > >>> > >> >>>>>> -----Original Message-----
>>>> > >>> > >> >>>>>> From: sandeep rg <sandeep.foss@gmail.com>
>>>> > >>> > >> >>>>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>>> > >>> > >> >>>>>> Date: Wednesday, July 10, 2013 11:01 AM
>>>> > >>> > >> >>>>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>>>> > >>> > >> >>>>>> Subject: Re: to involve in your development group
>>>> > >>> > >> >>>>>>
>>>> > >>> > >> >>>>>>> Sir,
>>>> > >>> > >> >>>>>>>
>>>> > >>> > >> >>>>>>> My name is Sandeep RG. I am a B.Tech graduate in computer
>>>> > >>> > >> >>>>>>> science, now doing a Java internship at a company.
>>>> > >>> > >> >>>>>>>
>>>> > >>> > >> >>>>>>> I have installed everything successfully and am now
>>>> > >>> > >> >>>>>>> downloading the resources; it takes too much time.
>>>> > >>> > >> >>>>>>>
>>>> > >>> > >> >>>>>>> I have gone through the suggested OCR technologies:
>>>> > >>> > >> >>>>>>> JavaOCR has some good user reviews.
>>>> > >>> > >> >>>>>>> Apache Tika can process many different formats.
>>>> > >>> > >> >>>>>>> Beyond those, there is Tesseract, which is also used for OCR.
>>>> > >>> > >> >>>>>>> Apache PDFBox is also used for text extraction, but only for
>>>> > >>> > >> >>>>>>> PDF files.
>>>> > >>> > >> >>>>>>> Now I am going through everything to find the best
>>>> > >>> > >> >>>>>>> technology among these.
>>>> > >>> > >> >>>>>>>
>>>> > >>> > >> >>>>>>>
>>>> > >>> > >> >>>>>>> On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei
>>>> > >>> > >> >>>>>>> <Pei.Chen@childrens.harvard.edu> wrote:
>>>> > >>> > >> >>>>>>>
>>>> > >>> > >> >>>>>>>> Hi Sandeep,
>>>> > >>> > >> >>>>>>>> I am delighted to work with you on this project.
>>>> > >>> > >> >>>>>>>>
>>>> > >>> > >> >>>>>>>> I was not sure if I understood you correctly: did you mean
>>>> > >>> > >> >>>>>>>> that you have already tried using cTAKES and its components?
>>>> > >>> > >> >>>>>>>> If not, you can do an svn checkout of the code and try
>>>> > >>> > >> >>>>>>>> running the debugger GUI from the command line (or Eclipse
>>>> > >>> > >> >>>>>>>> IDE); it lets you type in plain text and get back the
>>>> > >>> > >> >>>>>>>> different structured content (types) that cTAKES produces:
>>>> > >>> > >> >>>>>>>> export MAVEN_OPTS="-Xmx2g -Xms1g"
>>>> > >>> > >> >>>>>>>> mvn -PrunCVD compile
>>>> > >>> > >> >>>>>>>> From the guide:
>>>> > >>> > >> >>>>>>>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
>>>> > >>> > >> >>>>>>>>
>>>> > >>> > >> >>>>>>>> A bit of background:
>>>> > >>> > >> >>>>>>>> Apache cTAKES uses SVN for version control:
>>>> > >>> > >> >>>>>>>> https://svn.apache.org/repos/asf/ctakes/trunk/
>>>> > >>> > >> >>>>>>>> JIRA for issue tracking:
>>>> > >>> > >> >>>>>>>> https://issues.apache.org/jira/browse/ctakes
>>>> > >>> > >> >>>>>>>> Maven for building and dependency management.
>>>> > >>> > >> >>>>>>>> A lot of the developers use the Eclipse IDE for development.
>>>> > >>> > >> >>>>>>>> More info at ctakes.apache.org
>>>> > >>> > >> >>>>>>>>
>>>> > >>> > >> >>>>>>>> cTAKES is built on top of the Apache UIMA framework.
>>>> > >>> > >> >>>>>>>> Essentially, cTAKES is a collection of Annotators (Java
>>>> > >>> > >> >>>>>>>> classes) wired together into a pipeline.
>>>> > >>> > >> >>>>>>>> Its goal, in a nutshell, is to turn unstructured plain text
>>>> > >>> > >> >>>>>>>> into a structured/normalized form, and it is specially
>>>> > >>> > >> >>>>>>>> trained for medical notes.
>>>> > >>> > >> >>>>>>>> Right now, cTAKES expects its input in plain-text form, and
>>>> > >>> > >> >>>>>>>> it does not have an OCR component.
>>>> > >>> > >> >>>>>>>> CTAKES-189 (GSoC: implement OCR/tika to standardize text
>>>> > >>> > >> >>>>>>>> inputs) was an idea to allow cTAKES to take in any type of
>>>> > >>> > >> >>>>>>>> input (PDF, images, Word, XLS, etc.) and pass the text on
>>>> > >>> > >> >>>>>>>> for cTAKES processing.
>>>> > >>> > >> >>>>>>>> [I was originally thinking this could be done in some kind
>>>> > >>> > >> >>>>>>>> of preprocessing, or as an optional Annotator added at the
>>>> > >>> > >> >>>>>>>> beginning of a pipeline.] There may be some existing work
>>>> > >>> > >> >>>>>>>> that could potentially be reused: Apache Tika
>>>> > >>> > >> >>>>>>>> (https://issues.apache.org/jira/browse/TIKA-93) as well as
>>>> > >>> > >> >>>>>>>> some open source OCR toolkits (JavaOCR).
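The idea of a preprocessing step or an optional first Annotator can be sketched in plain Java. This is not the UIMA API: in cTAKES the real extension point would be a UIMA component such as a collection reader or an annotator extending JCasAnnotator_ImplBase, and the per-format extractors would wrap Tika or an OCR engine (not shown). Here Doc stands in for the CAS, and Stage, InputNormalizer and TokenAnnotator are hypothetical names; the normalizer picks an extractor by file extension so that downstream annotators only ever see plain text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical plain-Java stand-in for a pipeline with an input-normalizing first stage. */
public class NormalizingPipeline {

    /** Stand-in for a UIMA CAS: raw input, extracted text, and produced annotations. */
    static final class Doc {
        final String fileName;
        final byte[] raw;
        String text;                                   // filled in by InputNormalizer
        final List<String> annotations = new ArrayList<>();
        Doc(String fileName, byte[] raw) {
            this.fileName = fileName;
            this.raw = raw;
        }
    }

    interface Stage {
        void process(Doc doc);
    }

    /** First stage: route raw bytes to a text extractor chosen by file extension. */
    static final class InputNormalizer implements Stage {
        private final Map<String, Function<byte[], String>> extractors = new HashMap<>();

        InputNormalizer register(String ext, Function<byte[], String> extractor) {
            extractors.put(ext.toLowerCase(Locale.ROOT), extractor);
            return this;
        }

        @Override
        public void process(Doc doc) {
            int dot = doc.fileName.lastIndexOf('.');
            String ext = dot < 0 ? "" : doc.fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            Function<byte[], String> extractor = extractors.get(ext);
            if (extractor == null) {
                throw new IllegalArgumentException("no extractor for ." + ext);
            }
            doc.text = extractor.apply(doc.raw);
        }
    }

    /** Toy downstream annotator: records each whitespace-separated token span. */
    static final class TokenAnnotator implements Stage {
        @Override
        public void process(Doc doc) {
            String t = doc.text;
            int i = 0;
            while (i < t.length()) {
                while (i < t.length() && Character.isWhitespace(t.charAt(i))) i++;
                int start = i;
                while (i < t.length() && !Character.isWhitespace(t.charAt(i))) i++;
                if (i > start) doc.annotations.add("Token[" + start + "," + i + ")");
            }
        }
    }

    /** Runs the stages in order over one document. */
    static Doc run(List<Stage> pipeline, Doc doc) {
        for (Stage s : pipeline) s.process(doc);
        return doc;
    }
}
```

Keeping format handling in the first stage means the existing annotators need no changes; supporting a new format is one more register(...) call.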
>>>> > >>> > >> >>>>>>>>
>>>> > >>> > >> >>>>>>>> About Me:
>>>> > >>> > >> >>>>>>>> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
>>>> > >>> > >> >>>>>>>> http://www.linkedin.com/in/peistation
>>>> > >>> > >> >>>>>>>> http://people.apache.org/committer-index.html#chenpei
>>>> > >>> > >> >>>>>>>>
>>>> > >>> > >> >>>>>>>>> -----Original Message-----
>>>> > >>> > >> >>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
>>>> > >>> > >> >>>>>>>>> Sent: Tuesday, July 09, 2013 1:19 PM
>>>> > >>> > >> >>>>>>>>> To: dev@ctakes.apache.org
>>>> > >>> > >> >>>>>>>>> Subject: Re: to involve in your development group
>>>> > >>> > >> >>>>>>>>>
>>>> > >>> > >> >>>>>>>>> Thanks a lot for your support. I would like to work with
>>>> > >>> > >> >>>>>>>>> you.
>>>> > >>> > >> >>>>>>>>>
>>>> > >>> > >> >>>>>>>>> I have gone through the objectives of the software, used
>>>> > >>> > >> >>>>>>>>> the software, and gone through various components of the
>>>> > >>> > >> >>>>>>>>> project. Can you give me a starting point for learning
>>>> > >>> > >> >>>>>>>>> more about the coding part of the project?
>>>> > >>> > >> >>>>>>>>>
>>>> > >>> > >> >>>>>>>>> Can you also tell me more about the project and about
>>>> > >>> > >> >>>>>>>>> yourself?
>>>> > >>> > >> >>>>>>>>>
>>>> > >>> > >> >>>>>>>>>
>>>> > >>> > >> >>>>>>>>> On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei
>>>> > >>> > >> >>>>>>>>> <Pei.Chen@childrens.harvard.edu> wrote:
>>>> > >>> > >> >>>>>>>>>
>>>> > >>> > >> >>>>>>>>>> Hi Sandeep,
>>>> > >>> > >> >>>>>>>>>> Thank you for the interest. I just had a quick look at the
>>>> > >>> > >> >>>>>>>>>> ICFOSS pilot mentoring program and will be happy to serve
>>>> > >>> > >> >>>>>>>>>> as a mentor for your project proposal(s) if you are
>>>> > >>> > >> >>>>>>>>>> interested.
>>>> > >>> > >> >>>>>>>>>>
>>>> > >>> > >> >>>>>>>>>> --Pei
>>>> > >>> > >> >>>>>>>>>>
>>>> > >>> > >> >>>>>>>>>>> -----Original Message-----
>>>> > >>> > >> >>>>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
>>>> > >>> > >> >>>>>>>>>>> Sent: Monday, July 08, 2013 2:24 PM
>>>> > >>> > >> >>>>>>>>>>> To: dev@ctakes.apache.org
>>>> > >>> > >> >>>>>>>>>>> Subject: Re: to involve in your development group
>>>> > >>> > >> >>>>>>>>>>>
>>>> > >>> > >> >>>>>>>>>>> Sir,
>>>> > >>> > >> >>>>>>>>>>>
>>>> > >>> > >> >>>>>>>>>>> Details of the pilot mentoring programme with ICFOSS India
>>>> > >>> > >> >>>>>>>>>>> are given at the web address below:
>>>> > >>> > >> >>>>>>>>>>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>>>> > >>> > >> >>>>>>>>>>>
>>>> > >>> > >> >>>>>>>>>>> I am new to this community, so I need a mentor for the
>>>> > >>> > >> >>>>>>>>>>> project. It would be very helpful for me.
>>>> > >>> > >> >>>>>>>>>>>
>>>> > >>> > >> >>>>>>>>>>>
>>>> > >>> > >> >>>>>>>>>>> On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei
>>>> > >>> > >> >>>>>>>>>>> <Pei.Chen@childrens.harvard.edu> wrote:
>>>> > >>> > >> >>>>>>>>>>>
>>>> > >>> > >> >>>>>>>>>>>> Hi Sandeep,
>>>> > >>> > >> >>>>>>>>>>>> Welcome! I am not familiar with the details of
>>>> > >>> > >> >>>>>>>>>>>> icfoss-apache, but please, you are more than welcome to
>>>> > >>> > >> >>>>>>>>>>>> work on the code, and contributions will be greatly
>>>> > >>> > >> >>>>>>>>>>>> appreciated!
>>>> > >>> > >> >>>>>>>>>>>> There may be a learning curve, but feel free to let us
>>>> > >>> > >> >>>>>>>>>>>> know if you have any questions/issues.
>>>> > >>> > >> >>>>>>>>>>>> Thanks,
>>>> > >>> > >> >>>>>>>>>>>> Pei
>>>> > >>> > >> >>>>>>>>>>>>
>>>> > >>> > >> >>>>>>>>>>>>> -----Original Message-----
>>>> > >>> > >> >>>>>>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
>>>> > >>> > >> >>>>>>>>>>>>> Sent: Saturday, July 06, 2013 11:50 AM
>>>> > >>> > >> >>>>>>>>>>>>> To: dev@ctakes.apache.org
>>>> > >>> > >> >>>>>>>>>>>>> Subject: to involve in your development group
>>>> > >>> > >> >>>>>>>>>>>>>
>>>> > >>> > >> >>>>>>>>>>>>> My name is Sandeep. I am a B.Tech graduate. I participated
>>>> > >>> > >> >>>>>>>>>>>>> in a camp held in Kerala, India, in association with
>>>> > >>> > >> >>>>>>>>>>>>> icfoss-apache, called the youth mentoring programme and
>>>> > >>> > >> >>>>>>>>>>>>> coordinated by Luciano Resende.
>>>> > >>> > >> >>>>>>>>>>>>>
>>>> > >>> > >> >>>>>>>>>>>>> I like the project and would like to get involved in it
>>>> > >>> > >> >>>>>>>>>>>>> as a programmer. I have gone through your project and its
>>>> > >>> > >> >>>>>>>>>>>>> bug list. I would like to work on the bug "CTAKES-189:
>>>> > >>> > >> >>>>>>>>>>>>> GSoC: implement OCR/tika to standardize text inputs for
>>>> > >>> > >> >>>>>>>>>>>>> cTAKES". Can you allow me to work on that?
