ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: to involve in your development group
Date Tue, 16 Jul 2013 14:42:48 GMT
Chris/Sandeep,
According to ASF-ICFOSS, I believe the deadline for submitting proposals is this coming Friday
(July 19).
After that point, mentors will have 2 weeks to review and score/accept.
Just curious, are we planning to follow the same process here?  Or, since it's all volunteer
work, technically Sandeep can still contribute code to the community and participate in the
dev group here.

Looking forward to it.
--Pei


> -----Original Message-----
> From: sandeep rg [mailto:sandeep.foss@gmail.com]
> Sent: Monday, July 15, 2013 1:05 PM
> To: dev@ctakes.apache.org
> Subject: Re: to involve in your development group
> 
> Sir,
> I have gone through most of the OCR technologies and reached a conclusion. I
> would like to use Apache Tika and JavaOCR for this purpose.
> 
> Tesseract is an OCR tool that can be used to extract text in multiple
> languages. It is implemented in VC++, so it can be accessed from Java through
> native functions. Another tool, Tess4J, is available, but reviews say that it has
> many bugs.
> 
> Apache Tika is developed in Java. It can be used to extract text data
> from .xls, Word, .txt, PDF, and many other formats. It is also easy to integrate
> into a project. I have just gone through how it is implemented.
> 
> As for JavaOCR, it is good for extracting text from JPEG or scanned
> images. We can train it with various fonts. The more we train it, the better its
> accuracy, but its speed will decrease. I didn't find any particular
> documentation for it.
> 
> 
> 
> On Sun, Jul 14, 2013 at 9:18 PM, sandeep rg <sandeep.foss@gmail.com>
> wrote:
> 
> > Thanks a lot to both of you for your support. I will do my best to find a
> > solution for the JIRA problem. I will share the proposal with both of you.
> >
> >
> >
> > On Sun, Jul 14, 2013 at 1:46 AM, Chen, Pei <Pei.Chen@childrens.harvard.edu>
> > wrote:
> >
> >> Sandeep,
> >> It's great to have Chris on board as well; he was one of the coordinators
> >> of GSoC.
> >> Looking forward to it.
> >>
> >> Sent from my iPhone
> >>
> >> On Jul 13, 2013, at 12:24 PM, "Mattmann, Chris A (398J)" <
> >> chris.a.mattmann@jpl.nasa.gov> wrote:
> >>
> >> > Hi Sandeep,
> >> >
> >> > That is great news, and good job. OK, for some ideas about developing
> >> > your proposal, you may want to simply start with a Google Doc, and then
> >> > share it with Pei. I'd be happy to help co-mentor if Pei and you think
> >> > it's useful too.
> >> >
> >> > Your proposal should likely cover:
> >> >
> >> > 1. Background - what's the state of CTAKES-189 and what's it trying to
> >> >  accomplish (include some figures, etc. along with your text)
> >> >
> >> > 2. Approach - what are you going to do to solve CTAKES-189? Be specific,
> >> >  and try to break it down into smaller, easily reversible steps
> >> >
> >> > 3. Schedule - how long and what is the schedule for achieving this?
> >> >
> >> > 4. Risks/etc. - any known risks, like are you taking a vacation anytime
> >> >  soon :) or are there other time constraints?
> >> >
> >> > 5. References, etc.
> >> >
> >> > HTH and I'd be happy if you want to share the GDocs with me as you
> >> > develop it.
> >> >
> >> > Cheers!
> >> >
> >> > Chris
> >> >
> >> >
> >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> > Chris Mattmann, Ph.D.
> >> > Senior Computer Scientist
> >> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> > Office: 171-266B, Mailstop: 171-246
> >> > Email: chris.a.mattmann@nasa.gov
> >> > WWW:  http://sunset.usc.edu/~mattmann/
> >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> > Adjunct Assistant Professor, Computer Science Department
> >> > University of Southern California, Los Angeles, CA 90089 USA
> >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > -----Original Message-----
> >> > From: sandeep rg <sandeep.foss@gmail.com>
> >> > Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >> > Date: Saturday, July 13, 2013 8:57 AM
> >> > To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >> > Subject: Re: to involve in your development group
> >> >
> >> >> I have also gone through the technologies available for OCR
> >> >> development. From that, I think Apache Tika and Tesseract are the best
> >> >> for resolving the problem.
> >> >>
> >> >>
> >> >> On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg <sandeep.foss@gmail.com>
> >> >> wrote:
> >> >>
> >> >>> Hi Chris Mattmann,
> >> >>> I participated in the event coordinated by Luciano Resende:
> >> >>>
> >> >>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> >> >>>
> >> >>> From that I learned about open source, and I would like to work on your
> >> >>> project, cTAKES. I would like to fix this JIRA issue:
> >> >>>
> >> >>> https://issues.apache.org/jira/browse/CTAKES-189
> >> >>>
> >> >>> Pei Chen accepted my request to be my mentor. Now I want to submit a
> >> >>> proposal to Apache about the project I am going to work on. Can you
> >> >>> help me prepare a proposal to be submitted before the 18th of this July?
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A (398J) <
> >> >>> chris.a.mattmann@jpl.nasa.gov> wrote:
> >> >>>
> >> >>>> Hi Sandeep,
> >> >>>>
> >> >>>> I think the best thing to do is:
> >> >>>>
> >> >>>> 1. Develop a JIRA issue here:
> >> >>>> https://issues.apache.org/jira/browse/CTAKES
> >> >>>> 1a. you can register for a new account on JIRA
> >> >>>> 2. Once your JIRA issue is created, feel free to start a [DISCUSS]
> >> >>>> thread (e.g., with subject [DISCUSS] "some topic" where "some topic" is
> >> >>>> perhaps the main idea you have) on dev@ctakes.apache.org, referencing
> >> >>>> your issue and asking for feedback
> >> >>>> 3. Work with the Apache cTAKES PMC and committers to get your patches
> >> >>>> and other items attached to your issue from #1 committed into the
> >> >>>> sources
> >> >>>>
> >> >>>> Ideally if 1-3 happen and it's a good interaction, Apache is built on
> >> >>>> meritocracy and you could possibly earn the merit to become a PMC
> >> >>>> member or committer on the project.
> >> >>>>
> >> >>>> Cheers,
> >> >>>> Chris
> >> >>>>
> >> >>>>
> >> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >>>> Chris Mattmann, Ph.D.
> >> >>>> Senior Computer Scientist
> >> >>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> >>>> Office: 171-266B, Mailstop: 171-246
> >> >>>> Email: chris.a.mattmann@nasa.gov
> >> >>>> WWW:  http://sunset.usc.edu/~mattmann/
> >> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >>>> Adjunct Assistant Professor, Computer Science Department
> >> >>>> University of Southern California, Los Angeles, CA 90089 USA
> >> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> -----Original Message-----
> >> >>>> From: sandeep rg <sandeep.foss@gmail.com>
> >> >>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >> >>>> Date: Thursday, July 11, 2013 11:30 AM
> >> >>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >> >>>> Subject: Re: to involve in your development group
> >> >>>>
> >> >>>>> Can you tell me what details I should include in a proposal? Should I
> >> >>>>> include all the implementation (technical) details in the proposal?
> >> >>>>>
> >> >>>>>
> >> >>>>> On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A (398J) <
> >> >>>>> chris.a.mattmann@jpl.nasa.gov> wrote:
> >> >>>>>
> >> >>>>>> Dear Sandeep,
> >> >>>>>>
> >> >>>>>> Thanks for your interest in cTAKES. We would welcome your
> >> >>>>>> contribution and are happy to have your interest in the project.
> >> >>>>>>
> >> >>>>>> Cheers,
> >> >>>>>> Chris
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >>>>>> Chris Mattmann, Ph.D.
> >> >>>>>> Senior Computer Scientist
> >> >>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> >>>>>> Office: 171-266B, Mailstop: 171-246
> >> >>>>>> Email: chris.a.mattmann@nasa.gov
> >> >>>>>> WWW:  http://sunset.usc.edu/~mattmann/
> >> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >>>>>> Adjunct Assistant Professor, Computer Science Department
> >> >>>>>> University of Southern California, Los Angeles, CA 90089 USA
> >> >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> -----Original Message-----
> >> >>>>>> From: sandeep rg <sandeep.foss@gmail.com>
> >> >>>>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >> >>>>>> Date: Wednesday, July 10, 2013 11:01 AM
> >> >>>>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> >> >>>>>> Subject: Re: to involve in your development group
> >> >>>>>>
> >> >>>>>>> Sir,
> >> >>>>>>>
> >> >>>>>>> My name is Sandeep RG. I am a B.Tech graduate in computer science, now
> >> >>>>>>> doing an internship at a company, working in Java.
> >> >>>>>>>
> >> >>>>>>> I have installed everything successfully and am now downloading the
> >> >>>>>>> resources. It takes a lot of time.
> >> >>>>>>>
> >> >>>>>>> I have gone through the suggested OCR technologies.
> >> >>>>>>> JavaOCR has some good user reviews.
> >> >>>>>>> Apache Tika can process many different formats.
> >> >>>>>>> Besides those, there is Tesseract, which is also used for OCR.
> >> >>>>>>> Apache PDFBox is also used for text extraction, but only for PDF
> >> >>>>>>> files.
> >> >>>>>>> Now I am going through everything to find the best technology among
> >> >>>>>>> these.
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei
> >> >>>>>>> <Pei.Chen@childrens.harvard.edu> wrote:
> >> >>>>>>>
> >> >>>>>>>> Hi Sandeep,
> >> >>>>>>>> I am delighted to work with you on this project.
> >> >>>>>>>>
> >> >>>>>>>> I was not sure if I understood you correctly: did you mean to say
> >> >>>>>>>> that you have already tried using cTAKES and its components?
> >> >>>>>>>> If not, you can do an svn checkout of the code and try running the
> >> >>>>>>>> debugger GUI from the command line (or Eclipse IDE), which will allow
> >> >>>>>>>> you to type in plain text and get back the different structured
> >> >>>>>>>> content (types) that cTAKES produces:
> >> >>>>>>>> MAVEN_OPTS="-Xmx2g -Xms1g"
> >> >>>>>>>> mvn -PrunCVD compile
> >> >>>>>>>> From the guide:
> >> >>>>>>>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
> >> >>>>>>>>
> >> >>>>>>>> A bit of background:
> >> >>>>>>>> Apache cTAKES uses SVN for version control:
> >> >>>>>>>> https://svn.apache.org/repos/asf/ctakes/trunk/
> >> >>>>>>>> Jira for issue tracking:
> >> >>>>>>>> https://issues.apache.org/jira/browse/ctakes
> >> >>>>>>>> Maven for building and dependency management.
> >> >>>>>>>> A lot of the developers use Eclipse IDE for their development.
> >> >>>>>>>> More info on ctakes.apache.org
> >> >>>>>>>>
> >> >>>>>>>> cTAKES is built on top of the Apache UIMA Framework. Essentially,
> >> >>>>>>>> cTAKES is a collection of Annotators (Java classes) wired together
> >> >>>>>>>> into a pipeline.
> >> >>>>>>>> Its goal in a nutshell is to turn unstructured plain text into a
> >> >>>>>>>> structured/normalized form, and it is specially trained for medical
> >> >>>>>>>> notes.
> >> >>>>>>>> Right now, the input cTAKES expects is plain text, and cTAKES
> >> >>>>>>>> does not have an OCR component.
> >> >>>>>>>> CTAKES-189:GSoC:implement OCR/tika to standardize text inputs was an
> >> >>>>>>>> idea to allow cTAKES to take in any type of input (PDF, Images, Word,
> >> >>>>>>>> XLS, etc.) and pass the text for cTAKES processing.
> >> >>>>>>>> [I was originally thinking this could be done in some kind of
> >> >>>>>>>> preprocessing, or an optional Annotator that could be added at the
> >> >>>>>>>> beginning of a pipeline].  There may be some existing work that could
> >> >>>>>>>> be potentially reused: Apache Tika (
> >> >>>>>>>> https://issues.apache.org/jira/browse/TIKA-93 ) as well as some open
> >> >>>>>>>> source OCR toolkits (JavaOCR).
> >> >>>>>>>>
> >> >>>>>>>> About Me:
> >> >>>>>>>> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
> >> >>>>>>>> http://www.linkedin.com/in/peistation
> >> >>>>>>>> http://people.apache.org/committer-index.html#chenpei
> >> >>>>>>>>
> >> >>>>>>>>> -----Original Message-----
> >> >>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
> >> >>>>>>>>> Sent: Tuesday, July 09, 2013 1:19 PM
> >> >>>>>>>>> To: dev@ctakes.apache.org
> >> >>>>>>>>> Subject: Re: to involve in your development group
> >> >>>>>>>>>
> >> >>>>>>>>> Thanks a lot for giving me support. I would like to work with you.
> >> >>>>>>>>>
> >> >>>>>>>>> I have gone through the objectives of the software, used the
> >> >>>>>>>>> software, and gone through various components of the project. Can
> >> >>>>>>>>> you give me a starting point from which to learn more about the
> >> >>>>>>>>> coding part of the project?
> >> >>>>>>>>>
> >> >>>>>>>>> Can you tell me more about the project, and about yourself as well?
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei
> >> >>>>>>>>> <Pei.Chen@childrens.harvard.edu> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>> Hi Sandeep,
> >> >>>>>>>>>> Thank you for the interest.  I just had a quick look at the ICFOSS
> >> >>>>>>>>>> pilot mentoring program and will be happy to serve as a mentor for
> >> >>>>>>>>>> your project proposal(s) if you are interested.
> >> >>>>>>>>>>
> >> >>>>>>>>>> --Pei
> >> >>>>>>>>>>
> >> >>>>>>>>>>> -----Original Message-----
> >> >>>>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
> >> >>>>>>>>>>> Sent: Monday, July 08, 2013 2:24 PM
> >> >>>>>>>>>>> To: dev@ctakes.apache.org
> >> >>>>>>>>>>> Subject: Re: to involve in your development group
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Sir,
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Details of the pilot mentoring programme with India's ICFOSS are
> >> >>>>>>>>>>> given at the web address below:
> >> >>>>>>>>>>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> I am new to this community, so I need a mentor for the project. It
> >> >>>>>>>>>>> will be more helpful for me.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei
> >> >>>>>>>>>>> <Pei.Chen@childrens.harvard.edu> wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Hi Sandeep,
> >> >>>>>>>>>>>> Welcome!  I am not familiar with the details of icfoss-apache,
> >> >>>>>>>>>>>> but please, you are more than welcome to work on the code, and
> >> >>>>>>>>>>>> contributions will be greatly appreciated!
> >> >>>>>>>>>>>> There may be a learning curve, but feel free to let us know if
> >> >>>>>>>>>>>> you have any questions/issues.
> >> >>>>>>>>>>>> Thanks,
> >> >>>>>>>>>>>> Pei
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>> -----Original Message-----
> >> >>>>>>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
> >> >>>>>>>>>>>>> Sent: Saturday, July 06, 2013 11:50 AM
> >> >>>>>>>>>>>>> To: dev@ctakes.apache.org
> >> >>>>>>>>>>>>> Subject: to involve in your development group
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> My name is Sandeep. I am a B.Tech graduate. I participated in a
> >> >>>>>>>>>>>>> camp held in Kerala, India, in association with icfoss-apache,
> >> >>>>>>>>>>>>> called the youth mentoring programme, coordinated by Luciano
> >> >>>>>>>>>>>>> Resende.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> I like the project and would like to be involved in it as a
> >> >>>>>>>>>>>>> programmer. I have gone through your project and the bug list.
> >> >>>>>>>>>>>>> I would like to work on the bug
> >> >>>>>>>>>>>>> "CTAKES-189:GSoC:implement OCR/tika to standardize text inputs
> >> >>>>>>>>>>>>> for cTAKES". Can you allow me to work on that?
> >> >
> >>
> >
> >
