ctakes-dev mailing list archives

From sandeep rg <sandeep.f...@gmail.com>
Subject Re: to involve in your development group
Date Tue, 16 Jul 2013 18:03:58 GMT
Sir,

I will provide the proposal within two days. For now I am mainly going through
the ASF-ICFOSS gateway, because if I go that way and my proposal is selected,
ICFOSS will provide us some support, such as certificates and a small amount
of financial support.

But the main thing is that I like programming; I like to explore new
technologies and to work with code. So even if my proposal is rejected, I
would still like to work on your project as a volunteer, if you allow me.

I am preparing the proposal now and will submit it within 2 days. Chris
Mattmann helped me understand the proposal format.


On Tue, Jul 16, 2013 at 8:12 PM, Chen, Pei
<Pei.Chen@childrens.harvard.edu> wrote:

> Chris/Sandeep,
> According to ASF-ICFOSS, I believe the deadline for submitting proposals
> is this coming Friday (July 19).
> After which point, mentors will have 2 weeks to review and score/accept.
> Just curious, are we planning to follow the same process here?  Or, since
> it's all volunteer work, technically Sandeep can still contribute code to
> the community and participate in the dev group here.
>
> Looking forward to it.
> --Pei
>
>
> > -----Original Message-----
> > From: sandeep rg [mailto:sandeep.foss@gmail.com]
> > Sent: Monday, July 15, 2013 1:05 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: to involve in your development group
> >
> > Sir,
> > I have gone through most of the OCR technologies and reached a conclusion: I
> > would like to use Apache Tika and JavaOCR for this purpose.
> >
> > Tesseract is an OCR tool that can extract text in multiple languages. It is
> > implemented in C++, so it can be accessed from Java using native
> > functions. Another tool, Tess4J, is available, but reviews say that it has
> > many bugs.
> >
> > Apache Tika is developed in Java. It can be used to extract text data
> > from .xls, Word, .txt, PDF, and many other formats. It is also easy to
> > integrate into a project. I have only just gone through how it is
> > implemented.
> >
> > As for JavaOCR, it is good for extracting text from JPEG or scanned
> > images. We can train it with various fonts. The more we train it, the better
> > its accuracy, but its speed will decrease. I didn't find any particular
> > documentation for it.
> >
> >
> >
> > On Sun, Jul 14, 2013 at 9:18 PM, sandeep rg <sandeep.foss@gmail.com>
> > wrote:
> >
> > > Thanks a lot to both of you for your support. I will do my best to find a
> > > solution for the JIRA problem. I will share the proposal with both of you.
> > >
> > >
> > >
> > > On Sun, Jul 14, 2013 at 1:46 AM, Chen, Pei
> > > <Pei.Chen@childrens.harvard.edu> wrote:
> > >
> > >> Sandeep,
> > >> It's great to have Chris on board as well; he was one of the coordinators
> > >> of GSoC.
> > >> Looking forward to it.
> > >>
> > >> Sent from my iPhone
> > >>
> > >> On Jul 13, 2013, at 12:24 PM, "Mattmann, Chris A (398J)" <
> > >> chris.a.mattmann@jpl.nasa.gov> wrote:
> > >>
> > >> > Hi Sandeep,
> > >> >
> > >> > That is great news, and good job. OK, for some ideas about developing
> > >> > your proposal, you may want to simply start with a Google Doc, and then
> > >> > share it with Pei. I'd be happy to help co-mentor if Pei and you think
> > >> > it's useful too.
> > >> >
> > >> > Your proposal should likely cover:
> > >> >
> > >> > 1. Background - what's the state of CTAKES-189 and what's it trying to
> > >> >  accomplish (include some figures, etc. along with your text)
> > >> >
> > >> > 2. Approach - what are you going to do to solve CTAKES-189. Be specific,
> > >> >  and try to break it down into smaller, easily reversible steps
> > >> >
> > >> > 3. Schedule - how long and what is the schedule for achieving this?
> > >> >
> > >> > 4. Risks/etc. - any known risks like are you taking a vacation anytime
> > >> >  soon :) or are there other time constraints?
> > >> >
> > >> > 5. References, etc.
> > >> >
> > >> > HTH and I'd be happy if you want to share the GDocs with me as you
> > >> > develop it.
> > >> >
> > >> > Cheers!
> > >> >
> > >> > Chris
> > >> >
> > >> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > ++++++++
> > >> > Chris Mattmann, Ph.D.
> > >> > Senior Computer Scientist
> > >> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > >> > Office: 171-266B, Mailstop: 171-246
> > >> > Email: chris.a.mattmann@nasa.gov
> > >> > WWW:  http://sunset.usc.edu/~mattmann/
> > >> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > ++++++++
> > >> > Adjunct Assistant Professor, Computer Science Department
> > >> > University of Southern California, Los Angeles, CA 90089 USA
> > >> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > ++++++++
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > -----Original Message-----
> > >> > From: sandeep rg <sandeep.foss@gmail.com>
> > >> > Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> > >> > Date: Saturday, July 13, 2013 8:57 AM
> > >> > To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> > >> > Subject: Re: to involve in your development group
> > >> >
> > >> >> I have also gone through the technologies available for developing
> > >> >> OCR. From those, I think Apache Tika and Tesseract are best for resolving
> > >> >> the problem.
> > >> >>
> > >> >>
> > >> >> On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg
> > <sandeep.foss@gmail.com>
> > >> >> wrote:
> > >> >>
> > >> >>> Hi Chris Mattmann,
> > >> >>> I participated in the event coordinated by Luciano Resende:
> > >> >>>
> > >> >>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> > >> >>>
> > >> >>> From that I learned about open source, and I would like to work on your
> > >> >>> project, cTAKES. I would like to fix this JIRA issue:
> > >> >>>
> > >> >>> https://issues.apache.org/jira/browse/CTAKES-189
> > >> >>>
> > >> >>> Pei Chen accepted my request to be my mentor. Now I want to give a
> > >> >>> proposal to Apache about the project I am going to work on. Can you help
> > >> >>> me prepare a proposal to be submitted before the 18th of July?
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>> On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A (398J) <
> > >> >>> chris.a.mattmann@jpl.nasa.gov> wrote:
> > >> >>>
> > >> >>>> Hi Sandeep,
> > >> >>>>
> > >> >>>> I think the best thing to do is:
> > >> >>>>
> > >> >>>> 1. Develop a JIRA issue here:
> > >> >>>> https://issues.apache.org/jira/browse/CTAKES
> > >> >>>> 1a. you can register for a new account on JIRA
> > >> >>>> 2. Once your JIRA issue is created, feel free to start a [DISCUSS]
> > >> >>>> thread (e.g., with subject [DISCUSS] "some topic" where "some topic" is
> > >> >>>> perhaps the main idea you have) on dev@ctakes.apache.org, referencing
> > >> >>>> your issue and asking for feedback
> > >> >>>> 3. Work with the Apache cTAKES PMC and committers to get your patches
> > >> >>>> and other items attached to your issue from #1 committed into the
> > >> >>>> sources
> > >> >>>>
> > >> >>>> Ideally if 1-3 happen and it's a good interaction, Apache is built on
> > >> >>>> meritocracy and you could possibly earn the merit to become a PMC
> > >> >>>> member or committer on the project.
> > >> >>>>
> > >> >>>> Cheers,
> > >> >>>> Chris
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>> -----Original Message-----
> > >> >>>> From: sandeep rg <sandeep.foss@gmail.com>
> > >> >>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> > >> >>>> Date: Thursday, July 11, 2013 11:30 AM
> > >> >>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> > >> >>>> Subject: Re: to involve in your development group
> > >> >>>>
> > >> >>>>> Can you tell me what details I should include in a proposal? Do I
> > >> >>>>> need to include all the implementation (technical) details in the
> > >> >>>>> proposal?
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A (398J) <
> > >> >>>>> chris.a.mattmann@jpl.nasa.gov> wrote:
> > >> >>>>>
> > >> >>>>>> Dear Sandeep,
> > >> >>>>>>
> > >> >>>>>> Thanks for your interest in cTAKES. We would welcome your
> > >> >>>>>> contribution and are happy to have your interest in the project.
> > >> >>>>>>
> > >> >>>>>> Cheers,
> > >> >>>>>> Chris
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> -----Original Message-----
> > >> >>>>>> From: sandeep rg <sandeep.foss@gmail.com>
> > >> >>>>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> > >> >>>>>> Date: Wednesday, July 10, 2013 11:01 AM
> > >> >>>>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
> > >> >>>>>> Subject: Re: to involve in your development group
> > >> >>>>>>
> > >> >>>>>>> Sir,
> > >> >>>>>>>
> > >> >>>>>>> My name is Sandeep RG. I am a B.Tech graduate in computer science,
> > >> >>>>>>> now doing an internship at a company, working in Java.
> > >> >>>>>>>
> > >> >>>>>>> I have installed everything successfully and am now downloading the
> > >> >>>>>>> resources. It is taking too much time.
> > >> >>>>>>>
> > >> >>>>>>> I have gone through the suggested OCR technologies.
> > >> >>>>>>> JavaOCR has some good user reviews.
> > >> >>>>>>> Apache Tika is capable of processing different types of formats.
> > >> >>>>>>> Beyond that, there is Tesseract, which is also used for OCR.
> > >> >>>>>>> Apache PDFBox is also used for text extraction, but only for PDF
> > >> >>>>>>> files.
> > >> >>>>>>> I am now going through everything to find the best technology among
> > >> >>>>>>> these.
> > >> >>>>>>>
> > >> >>>>>>>
> > >> >>>>>>> On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei
> > >> >>>>>>> <Pei.Chen@childrens.harvard.edu> wrote:
> > >> >>>>>>>
> > >> >>>>>>>> Hi Sandeep,
> > >> >>>>>>>> I am delighted to work with you on this project.
> > >> >>>>>>>>
> > >> >>>>>>>> I was not sure if I understood you correctly- did you mean to say
> > >> >>>>>>>> that you have already tried using cTAKES and its components?
> > >> >>>>>>>> If not, you can do an svn checkout of the code and try running the
> > >> >>>>>>>> debugger GUI from the command line (or Eclipse IDE) that will allow
> > >> >>>>>>>> you to type in plain text and get back the different structured
> > >> >>>>>>>> content (types) that cTAKES produces:
> > >> >>>>>>>> MAVEN_OPTS="-Xmx2g -Xms1g"
> > >> >>>>>>>> mvn -PrunCVD compile
> > >> >>>>>>>> From the guide:
> > >> >>>>>>>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
> > >> >>>>>>>>
> > >> >>>>>>>> A bit of background:
> > >> >>>>>>>> Apache cTAKES uses SVN for version control:
> > >> >>>>>>>> https://svn.apache.org/repos/asf/ctakes/trunk/
> > >> >>>>>>>> Jira for issue tracking:
> > >> >>>>>>>> https://issues.apache.org/jira/browse/ctakes
> > >> >>>>>>>> Maven for building and dependency management.
> > >> >>>>>>>> A lot of the developers use Eclipse IDE for their development.
> > >> >>>>>>>> More info on ctakes.apache.org
> > >> >>>>>>>>
> > >> >>>>>>>> cTAKES is built on top of the Apache UIMA Framework. Essentially,
> > >> >>>>>>>> cTAKES is a collection of Annotators (Java classes) wired together
> > >> >>>>>>>> into a pipeline.
> > >> >>>>>>>> Its goal in a nutshell is to turn unstructured plain text into
> > >> >>>>>>>> structured/normalized form, and it is specially trained for medical
> > >> >>>>>>>> notes.
> > >> >>>>>>>> Right now, the input cTAKES expects is plain text, and cTAKES
> > >> >>>>>>>> does not have an OCR component.
> > >> >>>>>>>> CTAKES-189 (GSoC: implement OCR/tika to standardize text inputs) was
> > >> >>>>>>>> an idea to allow cTAKES to take in any type of input (PDF, Images,
> > >> >>>>>>>> Word, XLS, etc.) and pass the text for cTAKES processing.
> > >> >>>>>>>> [I was originally thinking this could be done in some kind of
> > >> >>>>>>>> preprocessing, or an optional Annotator that could be added at the
> > >> >>>>>>>> beginning of a pipeline].  There may be some existing work that
> > >> >>>>>>>> could be potentially reused: Apache Tika (
> > >> >>>>>>>> https://issues.apache.org/jira/browse/TIKA-93 ) as well as some
> > >> >>>>>>>> open source OCR toolkits (JavaOCR).
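
The preprocessing idea described above can be illustrated with a minimal Java sketch. It assumes a hypothetical InputRouter class that picks an extraction route from the file extension before any text reaches a plain-text pipeline. The class name, the Route enum, and the extension lists are illustrative assumptions; they are not part of cTAKES, Apache Tika, or JavaOCR, and no real library calls are made.

```java
import java.util.Locale;

// Hypothetical sketch: these names are assumptions, not cTAKES/Tika/JavaOCR APIs.
final class InputRouter {

    // Possible ways to turn an input file into plain text.
    enum Route { PLAIN_TEXT, DOCUMENT_PARSER, OCR, UNSUPPORTED }

    // Chooses a route from the file extension (case-insensitive).
    static Route routeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Route.UNSUPPORTED; // no extension to inspect
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "txt":
                return Route.PLAIN_TEXT;       // already what the pipeline expects
            case "pdf": case "doc": case "docx": case "xls":
                return Route.DOCUMENT_PARSER;  // would go to a document parser
            case "jpg": case "jpeg": case "png": case "tif": case "tiff":
                return Route.OCR;              // would go to an OCR toolkit
            default:
                return Route.UNSUPPORTED;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"note.txt", "report.pdf", "scan.JPEG", "README"}) {
            System.out.println(name + " -> " + routeFor(name));
        }
    }
}
```

Routing by extension is only a first approximation: a scanned PDF, for example, carries a document extension but would still need OCR, so a real preprocessing step would probably inspect the content as well.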
> > >> >>>>>>>>
> > >> >>>>>>>> About Me:
> > >> >>>>>>>> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
> > >> >>>>>>>> http://www.linkedin.com/in/peistation
> > >> >>>>>>>> http://people.apache.org/committer-index.html#chenpei
> > >> >>>>>>>>
> > >> >>>>>>>>> -----Original Message-----
> > >> >>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
> > >> >>>>>>>>> Sent: Tuesday, July 09, 2013 1:19 PM
> > >> >>>>>>>>> To: dev@ctakes.apache.org
> > >> >>>>>>>>> Subject: Re: to involve in your development group
> > >> >>>>>>>>>
> > >> >>>>>>>>> Thanks a lot for your support. I would like to work with you.
> > >> >>>>>>>>>
> > >> >>>>>>>>> I have gone through the objectives of the software, used the
> > >> >>>>>>>>> software, and gone through various components of the project. Can
> > >> >>>>>>>>> you give me a starting point from which to learn more about the
> > >> >>>>>>>>> coding part of the project?
> > >> >>>>>>>>>
> > >> >>>>>>>>> Can you tell me more about the project, and about yourself as well?
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei
> > >> >>>>>>>>> <Pei.Chen@childrens.harvard.edu> wrote:
> > >> >>>>>>>>>
> > >> >>>>>>>>>> Hi Sandeep,
> > >> >>>>>>>>>> Thank you for the interest.  I just had a quick look at the
> > >> >>>>>>>>>> ICFOSS pilot mentoring program and will be happy to serve as a
> > >> >>>>>>>>>> mentor for your project proposal(s) if you are interested.
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> --Pei
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>> -----Original Message-----
> > >> >>>>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
> > >> >>>>>>>>>>> Sent: Monday, July 08, 2013 2:24 PM
> > >> >>>>>>>>>>> To: dev@ctakes.apache.org
> > >> >>>>>>>>>>> Subject: Re: to involve in your development group
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Sir,
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Details of the programme, the pilot mentoring programme with India
> > >> >>>>>>>>>>> ICFOSS, are given at the web address below:
> > >> >>>>>>>>>>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> I am new to this community, so I need a mentor for the project. It
> > >> >>>>>>>>>>> would be very helpful for me.
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei
> > >> >>>>>>>>>>> <Pei.Chen@childrens.harvard.edu> wrote:
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>>> Hi Sandeep,
> > >> >>>>>>>>>>>> Welcome!  I am not familiar with the details of icfoss-apache,
> > >> >>>>>>>>>>>> but please- you are more than welcome to work on the code, and
> > >> >>>>>>>>>>>> contributions will be greatly appreciated!
> > >> >>>>>>>>>>>> There may be a learning curve, but feel free to let us know if
> > >> >>>>>>>>>>>> you have any questions/issues.
> > >> >>>>>>>>>>>> Thanks,
> > >> >>>>>>>>>>>> Pei
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>> -----Original Message-----
> > >> >>>>>>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
> > >> >>>>>>>>>>>>> Sent: Saturday, July 06, 2013 11:50 AM
> > >> >>>>>>>>>>>>> To: dev@ctakes.apache.org
> > >> >>>>>>>>>>>>> Subject: to involve in your development group
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> My name is Sandeep. I am a B.Tech graduate. I participated in a
> > >> >>>>>>>>>>>>> camp held in Kerala, India, in association with icfoss-apache,
> > >> >>>>>>>>>>>>> called the youth mentoring programme and coordinated by Luciano
> > >> >>>>>>>>>>>>> Resende.
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> I like the project and would like to get involved in it as a
> > >> >>>>>>>>>>>>> programmer. I have gone through your project and through the bug
> > >> >>>>>>>>>>>>> list. I would like to work on the bug
> > >> >>>>>>>>>>>>> "cTAKE-189:GSoC:implement OCR/tika to standardize text inputs
> > >> >>>>>>>>>>>>> for cTAKES". Can you allow me to work on that?
> > >> >
> > >>
> > >
> > >
>
