ctakes-dev mailing list archives

From sandeep rg <sandeep.f...@gmail.com>
Subject Re: to involve in your development group
Date Tue, 23 Jul 2013 15:56:52 GMT
Thank you, Sean Finan, for your suggestion. I am now going through JAI;
I think it has more features than JavaOCR.
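
A minimal sketch of the kind of image preprocessing step discussed in the quoted messages below (faint dot-matrix scans that OCR tools read poorly). It is not taken from the thread and does not use the JAI or JavaOCR APIs: it relies only on java.awt.image.BufferedImage from the JDK. The class name BinarizeSketch, the method names, and the threshold of 128 are all hypothetical choices for illustration. The idea is simply to turn a grayscale or color scan into pure black and white before handing it to an OCR engine.

```java
import java.awt.image.BufferedImage;

public class BinarizeSketch {

    /** Approximate 0-255 brightness of a packed RGB pixel (BT.601 weights). */
    public static int luminance(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (299 * r + 587 * g + 114 * b) / 1000;
    }

    /** Returns a 1-bit copy of src: pixels darker than threshold become black, the rest white. */
    public static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                boolean dark = luminance(src.getRGB(x, y)) < threshold;
                out.setRGB(x, y, dark ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tiny synthetic "scan": one dark-gray and one light-gray pixel.
        BufferedImage scan = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        scan.setRGB(0, 0, 0x404040);
        scan.setRGB(1, 0, 0xC0C0C0);

        BufferedImage bw = binarize(scan, 128); // 128 is an arbitrary midpoint
        System.out.println(Integer.toHexString(bw.getRGB(0, 0) & 0xFFFFFF));
        System.out.println(Integer.toHexString(bw.getRGB(1, 0) & 0xFFFFFF));
    }
}
```

A fixed global threshold is the simplest possible choice. Real dot-matrix scans would likely need an adaptive threshold and some step that joins the separate dots into strokes, which this sketch does not attempt.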



On Mon, Jul 22, 2013 at 10:22 PM, Mattmann, Chris A (398J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi Sandeep,
>
> I'll try and review this today.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
> -----Original Message-----
> From: sandeep rg <sandeep.foss@gmail.com>
> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> Date: Monday, July 22, 2013 7:04 AM
> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> Subject: Re: to involve in your development group
>
> >Sir,
> >I have gone through some medical records, such as bills and patient
> >details. Most of them are printed on dot-matrix printers, which makes it
> >very hard to extract text from the scanned images. I also tested some
> >professional software, such as ABBYY FineReader, which gave poor output
> >as well.
> >
> >But sir, I am confident I can do it. I need more knowledge of image
> >processing, though. Can you suggest anyone on your team who is good at
> >image processing programming?
> >
> >
> >On Thu, Jul 18, 2013 at 1:22 AM, sandeep rg <sandeep.foss@gmail.com>
> >wrote:
> >
> >> I have done the sequence diagram and made some small changes. Please go
> >> through it and tell me if anything more should be included.
> >>
> >>
> >> On Wed, Jul 17, 2013 at 9:37 PM, sandeep rg
> >><sandeep.foss@gmail.com>wrote:
> >>
> >>> It is just a skeleton of the original proposal.
> >>>
> >>>
> >>> On Wed, Jul 17, 2013 at 9:31 PM, sandeep rg
> >>><sandeep.foss@gmail.com>wrote:
> >>>
> >>>> The sample work is shared with you both. If any more details should be
> >>>> included, please tell me.
> >>>> The GUI design, schedule, and implementation flow chart still have to be
> >>>> added; they are under construction and will be uploaded within a few hours.
> >>>>
> >>>>
> >>>> On Wed, Jul 17, 2013 at 7:56 PM, Chen, Pei <
> >>>> Pei.Chen@childrens.harvard.edu> wrote:
> >>>>
> >>>>> pei.station@gmail.com
> >>>>>
> >>>>> > -----Original Message-----
> >>>>> > From: Mattmann, Chris A (398J)
> >>>>>[mailto:chris.a.mattmann@jpl.nasa.gov]
> >>>>> > Sent: Wednesday, July 17, 2013 10:22 AM
> >>>>> > To: dev@ctakes.apache.org
> >>>>> > Subject: Re: to involve in your development group
> >>>>> >
> >>>>> > chris.mattmann@gmail.com
> >>>>> >
> >>>>> >
> >>>>> > -----Original Message-----
> >>>>> > From: sandeep rg <sandeep.foss@gmail.com>
> >>>>> > Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > Date: Wednesday, July 17, 2013 6:53 AM
> >>>>> > To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > Subject: Re: to involve in your development group
> >>>>> >
> >>>>> > >Can you provide your Gmail ID so I can share the proposal document with you?
> >>>>> > >
> >>>>> > >
> >>>>> > >
> >>>>> > >On Tue, Jul 16, 2013 at 11:33 PM, sandeep rg
> >>>>><sandeep.foss@gmail.com
> >>>>> >
> >>>>> > >wrote:
> >>>>> > >
> >>>>> > >> Sir,
> >>>>> > >> I will provide the proposal within two days. For now I am mainly going
> >>>>> > >> through the ASF-ICFOSS gateway, because if I go their way and my proposal
> >>>>> > >> is selected, ICFOSS will provide us some support, such as certificates
> >>>>> > >> and a small amount of financial support.
> >>>>> > >>
> >>>>> > >> But the main thing is that I like programming. I like to explore new
> >>>>> > >> technologies and to work hands-on with code. So even if my proposal is
> >>>>> > >> rejected, I would still like to work on your project as a volunteer, if
> >>>>> > >> you allow me.
> >>>>> > >>
> >>>>> > >> I am preparing the proposal now and will submit it within 2 days.
> >>>>> > >> Chris Mattmann helped me learn more about the proposal format.
> >>>>> > >>
> >>>>> > >>
> >>>>> > >> On Tue, Jul 16, 2013 at 8:12 PM, Chen, Pei
> >>>>> > >><Pei.Chen@childrens.harvard.edu
> >>>>> > >> > wrote:
> >>>>> > >>
> >>>>> > >>> Chris/Sandeep,
> >>>>> > >>> According to ASF-ICFOSS, I believe the deadline for submitting proposals
> >>>>> > >>> is this coming Friday (July 19).
> >>>>> > >>> After that point, mentors will have 2 weeks to review and score/accept.
> >>>>> > >>> Just curious, are we planning to follow the same process here? Or, since
> >>>>> > >>> it's all volunteer work, technically Sandeep can still contribute code to
> >>>>> > >>> the community and participate in the dev group here.
> >>>>> > >>>
> >>>>> > >>> Looking forward to it.
> >>>>> > >>> --Pei
> >>>>> > >>>
> >>>>> > >>>
> >>>>> > >>> > -----Original Message-----
> >>>>> > >>> > From: sandeep rg [mailto:sandeep.foss@gmail.com]
> >>>>> > >>> > Sent: Monday, July 15, 2013 1:05 PM
> >>>>> > >>> > To: dev@ctakes.apache.org
> >>>>> > >>> > Subject: Re: to involve in your development group
> >>>>> > >>> >
> >>>>> > >>> > Sir,
> >>>>> > >>> > I went through most of the OCR technologies and reached a conclusion: I
> >>>>> > >>> > would like to use Apache Tika and JavaOCR for this purpose.
> >>>>> > >>> >
> >>>>> > >>> > Tesseract is an OCR tool that can extract text in multiple languages.
> >>>>> > >>> > It is implemented in C++, so it can be accessed from Java through
> >>>>> > >>> > native function calls. There is also a Java tool, Tess4J, but reviews
> >>>>> > >>> > say that it has many bugs.
> >>>>> > >>> >
> >>>>> > >>> > Apache Tika is developed in Java. It can be used to extract text data
> >>>>> > >>> > from .xls, Word, txt, PDF, and many other formats. It is also easy to
> >>>>> > >>> > integrate into a project; I have just gone through how it is used.
> >>>>> > >>> >
> >>>>> > >>> > JavaOCR is good for extracting text from JPEG or scanned images. We can
> >>>>> > >>> > train it with various fonts: the more we train it, the higher its
> >>>>> > >>> > accuracy, but its speed decreases. I did not find any particular
> >>>>> > >>> > documentation for it.
> >>>>> > >>> >
> >>>>> > >>> >
> >>>>> > >>> >
> >>>>> > >>> > On Sun, Jul 14, 2013 at 9:18 PM, sandeep rg
> >>>>> > >>> > <sandeep.foss@gmail.com>
> >>>>> > >>> > wrote:
> >>>>> > >>> >
> >>>>> > >>> > > Thanks a lot to both of you for your support. I will do my best to find
> >>>>> > >>> > > a solution for the JIRA problem. I will share the proposal with both of you.
> >>>>> > >>> > >
> >>>>> > >>> > >
> >>>>> > >>> > >
> >>>>> > >>> > > On Sun, Jul 14, 2013 at 1:46 AM, Chen, Pei
> >>>>> > >>> > <Pei.Chen@childrens.harvard.edu
> >>>>> > >>> > > > wrote:
> >>>>> > >>> > >
> >>>>> > >>> > >> Sandeep,
> >>>>> > >>> > >> It's great to have Chris on board as well; he was one of the coordinators
> >>>>> > >>> > >> of GSoC.
> >>>>> > >>> > >> Looking forward to it.
> >>>>> > >>> > >>
> >>>>> > >>> > >> Sent from my iPhone
> >>>>> > >>> > >>
> >>>>> > >>> > >> On Jul 13, 2013, at 12:24 PM, "Mattmann, Chris A (398J)" <
> >>>>> > >>> > >> chris.a.mattmann@jpl.nasa.gov> wrote:
> >>>>> > >>> > >>
> >>>>> > >>> > >> > Hi Sandeep,
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > That is great news, and good job. OK, for some ideas about developing
> >>>>> > >>> > >> > your proposal, you may want to simply start with a Google Docs, and then
> >>>>> > >>> > >> > share it with Pei. I'd be happy to help co-mentor if Pei and you think
> >>>>> > >>> > >> > it's useful too.
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > Your proposal should likely cover:
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > 1. Background - what's the state of CTAKES-189 and what's it trying to
> >>>>> > >>> > >> >  accomplish (include some figures, etc. along with your text)
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > 2. Approach - what are you going to do to solve CTAKES-189. Be specific,
> >>>>> > >>> > >> >  and try to break it down into smaller, easily reversible steps
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > 3. Schedule - how long and what is the schedule for achieving this?
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > 4. Risks/etc. - any known risks like are you taking a vacation anytime
> >>>>> > >>> > >> >  soon :) or are there other time constraints?
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > 5. References, etc.
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > HTH and I'd be happy if you want to share the GDocs with me as you
> >>>>> > >>> > >> > develop it.
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > Cheers!
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > Chris
> >>>>> > >>> > >> >
> >>>>> > >>> > >> >
> >>>>> > >>> > >> >
> >>>>> > >>> > >> > -----Original Message-----
> >>>>> > >>> > >> > From: sandeep rg <sandeep.foss@gmail.com>
> >>>>> > >>> > >> > Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > >>> > >> > Date: Saturday, July 13, 2013 8:57 AM
> >>>>> > >>> > >> > To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > >>> > >> > Subject: Re: to involve in your development group
> >>>>> > >>> > >> >
> >>>>> > >>> > >> >> I have also gone through the technologies available for OCR development.
> >>>>> > >>> > >> >> From those, I think Apache Tika and Tesseract are the best for resolving
> >>>>> > >>> > >> >> the problem.
> >>>>> > >>> > >> >>
> >>>>> > >>> > >> >>
> >>>>> > >>> > >> >> On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg
> >>>>> > >>> > <sandeep.foss@gmail.com>
> >>>>> > >>> > >> >> wrote:
> >>>>> > >>> > >> >>
> >>>>> > >>> > >> >>> Hi Chris Mattmann,
> >>>>> > >>> > >> >>> I participated in the event coordinated by Luciano Resende:
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>> From that I learned about open source, and I would like to work on your
> >>>>> > >>> > >> >>> project, cTAKES. I would like to fix this JIRA issue:
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>> https://issues.apache.org/jira/browse/CTAKES-189
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>> Pei Chen accepted my request to be my mentor. Now I want to give Apache a
> >>>>> > >>> > >> >>> proposal about the project I am going to work on. Can you help me
> >>>>> > >>> > >> >>> prepare a proposal to be submitted before the 18th of July?
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>> On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A
> >>>>> (398J) <
> >>>>> > >>> > >> >>> chris.a.mattmann@jpl.nasa.gov> wrote:
> >>>>> > >>> > >> >>>
> >>>>> > >>> > >> >>>> Hi Sandeep,
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>> I think the best thing to do is:
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>> 1. Develop a JIRA issue here:
> >>>>> > >>> > >> >>>> https://issues.apache.org/jira/browse/CTAKES
> >>>>> > >>> > >> >>>> 1a. you can register for a new account on JIRA
> >>>>> > >>> > >> >>>> 2. Once your JIRA issue is created, feel free to start a [DISCUSS] thread
> >>>>> > >>> > >> >>>> (e.g., with subject [DISCUSS] "some topic" where "some topic" is perhaps
> >>>>> > >>> > >> >>>> the main idea you have) on dev@ctakes.apache.org, referencing your issue
> >>>>> > >>> > >> >>>> and asking for feedback
> >>>>> > >>> > >> >>>> 3. Work with the Apache cTAKES PMC and committers to get your patches and
> >>>>> > >>> > >> >>>> other items attached to your issue from #1 committed into the sources
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>> Ideally if 1-3 happen and it's a good interaction, Apache is built on
> >>>>> > >>> > >> >>>> meritocracy and you could possibly earn the merit to become a PMC member
> >>>>> > >>> > >> >>>> or committer on the project.
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>> Cheers,
> >>>>> > >>> > >> >>>> Chris
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>> -----Original Message-----
> >>>>> > >>> > >> >>>> From: sandeep rg <sandeep.foss@gmail.com>
> >>>>> > >>> > >> >>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > >>> > >> >>>> Date: Thursday, July 11, 2013 11:30 AM
> >>>>> > >>> > >> >>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > >>> > >> >>>> Subject: Re: to involve in your development group
> >>>>> > >>> > >> >>>>
> >>>>> > >>> > >> >>>>> Can you tell me what details I should include in a proposal? Should I
> >>>>> > >>> > >> >>>>> include all the implementation (technical) details in the proposal?
> >>>>> > >>> > >> >>>>>
> >>>>> > >>> > >> >>>>>
> >>>>> > >>> > >> >>>>> On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A
> >>>>> (398J)
> >>>>> > >>> > >> >>>>> < chris.a.mattmann@jpl.nasa.gov> wrote:
> >>>>> > >>> > >> >>>>>
> >>>>> > >>> > >> >>>>>> Dear Sandeep,
> >>>>> > >>> > >> >>>>>>
> >>>>> > >>> > >> >>>>>> Thanks for your interest in cTAKES. We would welcome your contribution
> >>>>> > >>> > >> >>>>>> and are happy to have your interest in the project.
> >>>>> > >>> > >> >>>>>>
> >>>>> > >>> > >> >>>>>> Cheers,
> >>>>> > >>> > >> >>>>>> Chris
> >>>>> > >>> > >> >>>>>>
> >>>>> > >>> > >> >>>>>>
> >>>>> > >>> > >> >>>>>>
> >>>>> > >>> > >> >>>>>> -----Original Message-----
> >>>>> > >>> > >> >>>>>> From: sandeep rg <sandeep.foss@gmail.com>
> >>>>> > >>> > >> >>>>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > >>> > >> >>>>>> Date: Wednesday, July 10, 2013 11:01 AM
> >>>>> > >>> > >> >>>>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >>>>> > >>> > >> >>>>>> Subject: Re: to involve in your development group
> >>>>> > >>> > >> >>>>>>
> >>>>> > >>> > >> >>>>>>> Sir,
> >>>>> > >>> > >> >>>>>>>
> >>>>> > >>> > >> >>>>>>> My name is Sandeep RG. I am a B.Tech graduate in computer science, now
> >>>>> > >>> > >> >>>>>>> doing an internship at a company, working in Java.
> >>>>> > >>> > >> >>>>>>>
> >>>>> > >>> > >> >>>>>>> I have installed everything successfully and am now downloading the
> >>>>> > >>> > >> >>>>>>> resources. It takes too much time.
> >>>>> > >>> > >> >>>>>>>
> >>>>> > >>> > >> >>>>>>> I have gone through the suggested OCR technologies.
> >>>>> > >>> > >> >>>>>>> JavaOCR has some good user reviews.
> >>>>> > >>> > >> >>>>>>> Apache Tika can process different types of formats.
> >>>>> > >>> > >> >>>>>>> Beyond that, there is Tesseract, which is also used for OCR.
> >>>>> > >>> > >> >>>>>>> Apache PDFBox is also used for text extraction, but only for PDF files.
> >>>>> > >>> > >> >>>>>>> Now I am going through everything to find the best technology among these.
> >>>>> > >>> > >> >>>>>>>
> >>>>> > >>> > >> >>>>>>>
> >>>>> > >>> > >> >>>>>>> On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei
> >>>>> > >>> > >> >>>>>>> <Pei.Chen@childrens.harvard.edu>wrote:
> >>>>> > >>> > >> >>>>>>>
> >>>>> > >>> > >> >>>>>>>> Hi Sandeep,
> >>>>> > >>> > >> >>>>>>>> I am delighted to work with you on this project.
> >>>>> > >>> > >> >>>>>>>>
> >>>>> > >>> > >> >>>>>>>> I was not sure if I understood you correctly: did you mean to say that
> >>>>> > >>> > >> >>>>>>>> you have already tried using cTAKES and its components?
> >>>>> > >>> > >> >>>>>>>> If not, you can do an svn checkout of the code and try running the
> >>>>> > >>> > >> >>>>>>>> debugger GUI from the command line (or the Eclipse IDE), which will allow
> >>>>> > >>> > >> >>>>>>>> you to type in plain text and get back the different structured content
> >>>>> > >>> > >> >>>>>>>> (types) that cTAKES produces:
> >>>>> > >>> > >> >>>>>>>> MAVEN_OPTS="-Xmx2g -Xms1g"
> >>>>> > >>> > >> >>>>>>>> mvn -PrunCVD compile
> >>>>> > >>> > >> >>>>>>>> From the guide:
> >>>>> > >>> > >> >>>>>>>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
> >>>>> > >>> > >> >>>>>>>>
> >>>>> > >>> > >> >>>>>>>> A bit of background:
> >>>>> > >>> > >> >>>>>>>> Apache cTAKES uses SVN for version control:
> >>>>> > >>> > >> >>>>>>>> https://svn.apache.org/repos/asf/ctakes/trunk/
> >>>>> > >>> > >> >>>>>>>> Jira for issue tracking:
> >>>>> > >>> > >> >>>>>>>> https://issues.apache.org/jira/browse/ctakes
> >>>>> > >>> > >> >>>>>>>> Maven for building and dependency management.
> >>>>> > >>> > >> >>>>>>>> A lot of the developers use the Eclipse IDE for their development.
> >>>>> > >>> > >> >>>>>>>> More info on ctakes.apache.org
> >>>>> > >>> > >> >>>>>>>>
> >>>>> > >>> > >> >>>>>>>> cTAKES is built on top of the Apache UIMA Framework. Essentially, cTAKES
> >>>>> > >>> > >> >>>>>>>> is a collection of Annotators (Java classes) wired together into a
> >>>>> > >>> > >> >>>>>>>> pipeline.
> >>>>> > >>> > >> >>>>>>>> Its goal in a nutshell is to turn unstructured plain text into a
> >>>>> > >>> > >> >>>>>>>> structured/normalized form, and it is specially trained for medical notes.
> >>>>> > >>> > >> >>>>>>>> Right now, the input cTAKES expects is plain text, and cTAKES does not
> >>>>> > >>> > >> >>>>>>>> have an OCR component.
> >>>>> > >>> > >> >>>>>>>> CTAKES-189 (GSoC: implement OCR/tika to standardize text inputs) was an
> >>>>> > >>> > >> >>>>>>>> idea to allow cTAKES to take in any type of input (PDF, Images, Word, XLS,
> >>>>> > >>> > >> >>>>>>>> etc.) and pass the text for cTAKES processing.
> >>>>> > >>> > >> >>>>>>>> [I was originally thinking this could be done in some kind of
> >>>>> > >>> > >> >>>>>>>> preprocessing, or an optional Annotator that could be added at the
> >>>>> > >>> > >> >>>>>>>> beginning of a pipeline]. There may be some existing work that could be
> >>>>> > >>> > >> >>>>>>>> potentially reused: Apache Tika (
> >>>>> > >>> > >> >>>>>>>> https://issues.apache.org/jira/browse/TIKA-93 ) as well as some open
> >>>>> > >>> > >> >>>>>>>> source OCR toolkits (JavaOCR).
> >>>>> > >>> > >> >>>>>>>>
> >>>>> > >>> > >> >>>>>>>> About Me:
> >>>>> > >>> > >> >>>>>>>> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
> >>>>> > >>> > >> >>>>>>>> http://www.linkedin.com/in/peistation
> >>>>> > >>> > >> >>>>>>>> http://people.apache.org/committer-index.html#chenpei
> >>>>> > >>> > >> >>>>>>>>
> >>>>> > >>> > >> >>>>>>>>> -----Original Message-----
> >>>>> > >>> > >> >>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
> >>>>> > >>> > >> >>>>>>>>> Sent: Tuesday, July 09, 2013 1:19 PM
> >>>>> > >>> > >> >>>>>>>>> To: dev@ctakes.apache.org
> >>>>> > >>> > >> >>>>>>>>> Subject: Re: to involve in your development group
> >>>>> > >>> > >> >>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>> Thanks a lot for your support. I would like to work with you.
> >>>>> > >>> > >> >>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>> I have gone through the objectives of the software, used the software,
> >>>>> > >>> > >> >>>>>>>>> and gone through various components of the project. Can you suggest a
> >>>>> > >>> > >> >>>>>>>>> starting point from which I can learn more about the coding part of
> >>>>> > >>> > >> >>>>>>>>> the project?
> >>>>> > >>> > >> >>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>> Can you also tell me more about the project and about yourself?
> >>>>> > >>> > >> >>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>> On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei
> >>>>> > >>> > >> >>>>>>>>> <Pei.Chen@childrens.harvard.edu>wrote:
> >>>>> > >>> > >> >>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>> Hi Sandeep,
> >>>>> > >>> > >> >>>>>>>>>> Thank you for the interest.  I just had a quick look at the ICFOSS
> >>>>> > >>> > >> >>>>>>>>>> pilot mentoring program and will be happy to serve as a mentor for
> >>>>> > >>> > >> >>>>>>>>>> your project proposal(s) if you are interested.
> >>>>> > >>> > >> >>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>> --Pei
> >>>>> > >>> > >> >>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>> -----Original Message-----
> >>>>> > >>> > >> >>>>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
> >>>>> > >>> > >> >>>>>>>>>>> Sent: Monday, July 08, 2013 2:24 PM
> >>>>> > >>> > >> >>>>>>>>>>> To: dev@ctakes.apache.org
> >>>>> > >>> > >> >>>>>>>>>>> Subject: Re: to involve in your development group
> >>>>> > >>> > >> >>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>> Sir,
> >>>>> > >>> > >> >>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>> Details of the pilot mentoring programme with ICFOSS, India, are given
> >>>>> > >>> > >> >>>>>>>>>>> at the web address below:
> >>>>> > >>> > >> >>>>>>>>>>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> >>>>> > >>> > >> >>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>> I am new to this community, so I need a mentor for the project. It
> >>>>> > >>> > >> >>>>>>>>>>> would be very helpful for me.
> >>>>> > >>> > >> >>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>> On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei
> >>>>> > >>> > >> >>>>>>>>>>> <Pei.Chen@childrens.harvard.edu>wrote:
> >>>>> > >>> > >> >>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>>> Hi Sandeep,
> >>>>> > >>> > >> >>>>>>>>>>>> Welcome!  I am not familiar with the details of icfoss-apache, but
> >>>>> > >>> > >> >>>>>>>>>>>> please, you are more than welcome to work on the code, and
> >>>>> > >>> > >> >>>>>>>>>>>> contributions will be greatly appreciated!
> >>>>> > >>> > >> >>>>>>>>>>>> There may be a learning curve, but feel free to let us know if you
> >>>>> > >>> > >> >>>>>>>>>>>> have any questions/issues.
> >>>>> > >>> > >> >>>>>>>>>>>> Thanks,
> >>>>> > >>> > >> >>>>>>>>>>>> Pei
> >>>>> > >>> > >> >>>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>>>> -----Original Message-----
> >>>>> > >>> > >> >>>>>>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
> >>>>> > >>> > >> >>>>>>>>>>>>> Sent: Saturday, July 06, 2013 11:50 AM
> >>>>> > >>> > >> >>>>>>>>>>>>> To: dev@ctakes.apache.org
> >>>>> > >>> > >> >>>>>>>>>>>>> Subject: to involve in your development group
> >>>>> > >>> > >> >>>>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>>>> My name is Sandeep. I am a B.Tech graduate. I participated in a camp
> >>>>> > >>> > >> >>>>>>>>>>>>> held in Kerala, India, in association with icfoss-apache, called the
> >>>>> > >>> > >> >>>>>>>>>>>>> youth mentoring programme, coordinated by Luciano Resende.
> >>>>> > >>> > >> >>>>>>>>>>>>>
> >>>>> > >>> > >> >>>>>>>>>>>>> I like the project and would like to be involved in it as a
> >>>>> > >>> > >> >>>>>>>>>>>>> programmer. I have gone through your project and through the bug
> >>>>> > >>> > >> >>>>>>>>>>>>> list. I would like to work on the bug "CTAKES-189: GSoC: implement
> >>>>> > >>> > >> >>>>>>>>>>>>> OCR/tika to standardize text inputs for cTAKES". Can you allow me
> >>>>> > >>> > >> >>>>>>>>>>>>> to work on that?
