ctakes-dev mailing list archives

From sandeep rg <sandeep.f...@gmail.com>
Subject Re: to involve in your development group
Date Wed, 17 Jul 2013 16:07:05 GMT
It is just a skeleton of the original proposal.


On Wed, Jul 17, 2013 at 9:31 PM, sandeep rg <sandeep.foss@gmail.com> wrote:

> The sample work is shared with you both. If any more details should be
> included, please tell me.
> The GUI design, schedule, and implementation flowchart are still to be
> added; they are under construction and will be uploaded within a few hours.
>
>
> On Wed, Jul 17, 2013 at 7:56 PM, Chen, Pei <Pei.Chen@childrens.harvard.edu> wrote:
>
>> pei.station@gmail.com
>>
>> > -----Original Message-----
>> > From: Mattmann, Chris A (398J) [mailto:chris.a.mattmann@jpl.nasa.gov]
>> > Sent: Wednesday, July 17, 2013 10:22 AM
>> > To: dev@ctakes.apache.org
>> > Subject: Re: to involve in your development group
>> >
>> > chris.mattmann@gmail.com
>> >
>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > ++++++++
>> > Chris Mattmann, Ph.D.
>> > Senior Computer Scientist
>> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> > Office: 171-266B, Mailstop: 171-246
>> > Email: chris.a.mattmann@nasa.gov
>> > WWW:  http://sunset.usc.edu/~mattmann/
>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > ++++++++
>> > Adjunct Assistant Professor, Computer Science Department University of
>> > Southern California, Los Angeles, CA 90089 USA
>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > ++++++++
>> >
>> >
>> >
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: sandeep rg <sandeep.foss@gmail.com>
>> > Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > Date: Wednesday, July 17, 2013 6:53 AM
>> > To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > Subject: Re: to involve in your development group
>> >
>> > >Can you provide your Gmail ID so that I can share the proposal document with you?
>> > >
>> > >
>> > >
>> > >On Tue, Jul 16, 2013 at 11:33 PM, sandeep rg <sandeep.foss@gmail.com>
>> > >wrote:
>> > >
>> > >> Sir,
>> > >> I will provide the proposal within two days. For now I am mainly going
>> > >> through the ASF-ICFOSS gateway, because if I go that way and my
>> > >> proposal is selected, ICFOSS will provide some support to us, such as
>> > >> certificates and small financial support.
>> > >>
>> > >>
>> > >> But the main thing is that I like programming. I like to explore new
>> > >> technologies in coding and to work with code. So even if my proposal
>> > >> is rejected, I would still like to work on your project as a
>> > >> volunteer, if you allow me.
>> > >>
>> > >> I am now preparing the proposal and will submit it within 2 days.
>> > >> Chris Mattmann helped me learn more about the proposal format.
>> > >>
>> > >>
>> > >> On Tue, Jul 16, 2013 at 8:12 PM, Chen, Pei <Pei.Chen@childrens.harvard.edu> wrote:
>> > >>
>> > >>> Chris/Sandeep,
>> > >>> According to ASF-ICFOSS, I believe the deadline for submitting
>> > >>> proposals is this coming Friday (July 19).
>> > >>> After that point, mentors will have 2 weeks to review and
>> > >>> score/accept.
>> > >>> Just curious, are we planning to follow the same process here? Or,
>> > >>> since it's all volunteer work, technically Sandeep can still
>> > >>> contribute code to the community and participate in the dev group
>> > >>> here.
>> > >>>
>> > >>> Looking forward to it.
>> > >>> --Pei
>> > >>>
>> > >>>
>> > >>> > -----Original Message-----
>> > >>> > From: sandeep rg [mailto:sandeep.foss@gmail.com]
>> > >>> > Sent: Monday, July 15, 2013 1:05 PM
>> > >>> > To: dev@ctakes.apache.org
>> > >>> > Subject: Re: to involve in your development group
>> > >>> >
>> > >>> > Sir,
>> > >>> > I went through most of the OCR technologies and reached a
>> > >>> > conclusion. I would like to use Apache Tika and Java OCR for this
>> > >>> > purpose.
>> > >>> >
>> > >>> > Tesseract is an OCR tool that can extract text in multiple
>> > >>> > languages. It is implemented in VC++, so it can be accessed from
>> > >>> > Java through native functions. They provide another tool, tess4j,
>> > >>> > but reviews say that it has many bugs.
>> > >>> >
>> > >>> > Apache Tika is developed in Java. It can be used to extract text
>> > >>> > data from .xls, Word, txt, PDF, and many other formats. It is also
>> > >>> > easy to implement in the project. I have just gone through how it
>> > >>> > is implemented.
>> > >>> >
>> > >>> > As for JavaOCR, it is good for extracting text from JPEG or
>> > >>> > scanned images. We can train it with various fonts. The more we
>> > >>> > train it, the better its accuracy, but its speed will decrease. I
>> > >>> > didn't find any particular documentation for it.
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> > On Sun, Jul 14, 2013 at 9:18 PM, sandeep rg <sandeep.foss@gmail.com> wrote:
>> > >>> >
>> > >>> > > Thanks a lot for the support from both of you. I will do my best
>> > >>> > > to find a solution for the JIRA problem. I will share the
>> > >>> > > proposal with both of you.
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > > On Sun, Jul 14, 2013 at 1:46 AM, Chen, Pei <Pei.Chen@childrens.harvard.edu> wrote:
>> > >>> > >
>> > >>> > >> Sandeep,
>> > >>> > >> It's great to have Chris on board as well- he was one of the
>> > >>> > >> coordinators of GSoC.
>> > >>> > >> Looking forward to it.
>> > >>> > >>
>> > >>> > >> Sent from my iPhone
>> > >>> > >>
>> > >>> > >> On Jul 13, 2013, at 12:24 PM, "Mattmann, Chris A (398J)" <chris.a.mattmann@jpl.nasa.gov> wrote:
>> > >>> > >>
>> > >>> > >> > Hi Sandeep,
>> > >>> > >> >
>> > >>> > >> > That is great news, and good job. OK, for some ideas about
>> > >>> > >> > developing your proposal, you may want to simply start with a
>> > >>> > >> > Google Docs, and then share it with Pei. I'd be happy to help
>> > >>> > >> > co-mentor if Pei and you think it's useful too.
>> > >>> > >> >
>> > >>> > >> > Your proposal should likely cover:
>> > >>> > >> >
>> > >>> > >> > 1. Background - what's the state of CTAKES-189 and what's it
>> > >>> > >> >  trying to accomplish (include some figures, etc. along with
>> > >>> > >> >  your text)
>> > >>> > >> >
>> > >>> > >> > 2. Approach - what are you going to do to solve CTAKES-189. Be
>> > >>> > >> >  specific, and try to break it down into smaller, easily
>> > >>> > >> >  reversible steps
>> > >>> > >> >
>> > >>> > >> > 3. Schedule - how long and what is the schedule for achieving
>> > >>> > >> >  this?
>> > >>> > >> >
>> > >>> > >> > 4. Risks/etc. - any known risks like are you taking a vacation
>> > >>> > >> >  anytime soon :) or are there other time constraints?
>> > >>> > >> >
>> > >>> > >> > 5. References, etc.
>> > >>> > >> >
>> > >>> > >> > HTH and I'd be happy if you want to share the GDocs with me as
>> > >>> > >> > you develop it.
>> > >>> > >> >
>> > >>> > >> > Cheers!
>> > >>> > >> >
>> > >>> > >> > Chris
>> > >>> > >> >
>> > >>> > >> >
>> > >>> > >> > -----Original Message-----
>> > >>> > >> > From: sandeep rg <sandeep.foss@gmail.com>
>> > >>> > >> > Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > >>> > >> > Date: Saturday, July 13, 2013 8:57 AM
>> > >>> > >> > To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > >>> > >> > Subject: Re: to involve in your development group
>> > >>> > >> >
>> > >>> > >> >> I have also gone through the technologies available for
>> > >>> > >> >> developing OCR. From that, I think Apache Tika and Tesseract
>> > >>> > >> >> are best for resolving the problem.
>> > >>> > >> >>
>> > >>> > >> >>
>> > >>> > >> >> On Sat, Jul 13, 2013 at 9:02 PM, sandeep rg <sandeep.foss@gmail.com> wrote:
>> > >>> > >> >>
>> > >>> > >> >>> Hi Chris Mattmann,
>> > >>> > >> >>> I participated in the event coordinated by Luciano Resende:
>> > >>> > >> >>>
>> > >>> > >> >>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>> > >>> > >> >>>
>> > >>> > >> >>> From that I learned about open source, and I would like to
>> > >>> > >> >>> work on your project, cTAKES. I would like to fix this JIRA
>> > >>> > >> >>> issue:
>> > >>> > >> >>>
>> > >>> > >> >>> https://issues.apache.org/jira/browse/CTAKES-189
>> > >>> > >> >>>
>> > >>> > >> >>> Chen Pei accepted my request to be my mentor. Now I want to
>> > >>> > >> >>> give a proposal to Apache about the project I am going to
>> > >>> > >> >>> work on. Can you help me prepare a proposal to be submitted
>> > >>> > >> >>> before the 18th of this July?
>> > >>> > >> >>>
>> > >>> > >> >>>
>> > >>> > >> >>>
>> > >>> > >> >>>
>> > >>> > >> >>>
>> > >>> > >> >>>
>> > >>> > >> >>> On Sat, Jul 13, 2013 at 2:26 AM, Mattmann, Chris A (398J) <chris.a.mattmann@jpl.nasa.gov> wrote:
>> > >>> > >> >>>
>> > >>> > >> >>>> Hi Sandeep,
>> > >>> > >> >>>>
>> > >>> > >> >>>> I think the best thing to do is:
>> > >>> > >> >>>>
>> > >>> > >> >>>> 1. Develop a JIRA issue here:
>> > >>> > >> >>>> https://issues.apache.org/jira/browse/CTAKES
>> > >>> > >> >>>> 1a. you can register for a new account on JIRA
>> > >>> > >> >>>> 2. Once your JIRA issue is created, feel free to start a
>> > >>> > >> >>>> [DISCUSS] thread (e.g., with subject [DISCUSS] "some topic"
>> > >>> > >> >>>> where "some topic" is perhaps the main idea you have) on
>> > >>> > >> >>>> dev@ctakes.apache.org, referencing your issue and asking for
>> > >>> > >> >>>> feedback
>> > >>> > >> >>>> 3. Work with the Apache cTAKES PMC and committers to get your
>> > >>> > >> >>>> patches and other items attached to your issue from #1
>> > >>> > >> >>>> committed into the sources
>> > >>> > >> >>>>
>> > >>> > >> >>>> Ideally if 1-3 happen and it's a good interaction, Apache is
>> > >>> > >> >>>> built on meritocracy and you could possibly earn the merit to
>> > >>> > >> >>>> become a PMC member or committer on the project.
>> > >>> > >> >>>>
>> > >>> > >> >>>> Cheers,
>> > >>> > >> >>>> Chris
>> > >>> > >> >>>>
>> > >>> > >> >>>>
>> > >>> > >> >>>> -----Original Message-----
>> > >>> > >> >>>> From: sandeep rg <sandeep.foss@gmail.com>
>> > >>> > >> >>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > >>> > >> >>>> Date: Thursday, July 11, 2013 11:30 AM
>> > >>> > >> >>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > >>> > >> >>>> Subject: Re: to involve in your development group
>> > >>> > >> >>>>
>> > >>> > >> >>>>> Can you tell me what details I should include in a proposal?
>> > >>> > >> >>>>> Should I include all the implementation (technical) details
>> > >>> > >> >>>>> in the proposal?
>> > >>> > >> >>>>>
>> > >>> > >> >>>>>
>> > >>> > >> >>>>> On Thu, Jul 11, 2013 at 9:45 PM, Mattmann, Chris A (398J) <chris.a.mattmann@jpl.nasa.gov> wrote:
>> > >>> > >> >>>>>
>> > >>> > >> >>>>>> Dear Sandeep,
>> > >>> > >> >>>>>>
>> > >>> > >> >>>>>> Thanks for your interest in cTAKES. We would welcome your
>> > >>> > >> >>>>>> contribution and are happy to have your interest in the
>> > >>> > >> >>>>>> project.
>> > >>> > >> >>>>>>
>> > >>> > >> >>>>>> Cheers,
>> > >>> > >> >>>>>> Chris
>> > >>> > >> >>>>>>
>> > >>> > >> >>>>>>
>> > >>> > >> >>>>>> -----Original Message-----
>> > >>> > >> >>>>>> From: sandeep rg <sandeep.foss@gmail.com>
>> > >>> > >> >>>>>> Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > >>> > >> >>>>>> Date: Wednesday, July 10, 2013 11:01 AM
>> > >>> > >> >>>>>> To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>> > >>> > >> >>>>>> Subject: Re: to involve in your development group
>> > >>> > >> >>>>>>
>> > >>> > >> >>>>>>> Sir,
>> > >>> > >> >>>>>>>
>> > >>> > >> >>>>>>> My name is sandeep rg. I am a BTech graduate in computer
>> > >>> > >> >>>>>>> science, now doing an internship at a company, working in
>> > >>> > >> >>>>>>> Java.
>> > >>> > >> >>>>>>>
>> > >>> > >> >>>>>>> I have installed everything successfully and am now
>> > >>> > >> >>>>>>> downloading the resources. It takes too much time.
>> > >>> > >> >>>>>>>
>> > >>> > >> >>>>>>> I have gone through the suggested OCR technologies.
>> > >>> > >> >>>>>>> JavaOCR has some good user reviews.
>> > >>> > >> >>>>>>> Apache Tika can process different types of formats.
>> > >>> > >> >>>>>>> Beyond that, there is Tesseract, which is also used for
>> > >>> > >> >>>>>>> OCR.
>> > >>> > >> >>>>>>> Apache PDFBox is also used for text extraction, but only
>> > >>> > >> >>>>>>> for PDF files.
>> > >>> > >> >>>>>>> I am now going through everything to find the best
>> > >>> > >> >>>>>>> technology among these.
>> > >>> > >> >>>>>>>
>> > >>> > >> >>>>>>>
>> > >>> > >> >>>>>>> On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei <Pei.Chen@childrens.harvard.edu> wrote:
>> > >>> > >> >>>>>>>
>> > >>> > >> >>>>>>>> Hi Sandeep,
>> > >>> > >> >>>>>>>> I am delighted to work with you on this project.
>> > >>> > >> >>>>>>>>
>> > >>> > >> >>>>>>>> I was not sure if I understood you correctly- did you mean
>> > >>> > >> >>>>>>>> to say that you have already tried using cTAKES and its
>> > >>> > >> >>>>>>>> components?
>> > >>> > >> >>>>>>>> If not, you can do an svn checkout of the code and try
>> > >>> > >> >>>>>>>> running the debugger GUI from the command line (or Eclipse
>> > >>> > >> >>>>>>>> IDE), which will allow you to type in plain text and get
>> > >>> > >> >>>>>>>> back the different structured content (types) that cTAKES
>> > >>> > >> >>>>>>>> produces:
>> > >>> > >> >>>>>>>> MAVEN_OPTS="-Xmx2g -Xms1g"
>> > >>> > >> >>>>>>>> mvn -PrunCVD compile
>> > >>> > >> >>>>>>>> From the guide:
>> > >>> > >> >>>>>>>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
>> > >>> > >> >>>>>>>> A bit of background:
>> > >>> > >> >>>>>>>> Apache cTAKES uses SVN for version control:
>> > >>> > >> >>>>>>>> https://svn.apache.org/repos/asf/ctakes/trunk/
>> > >>> > >> >>>>>>>> Jira for issue tracking:
>> > >>> > >> >>>>>>>> https://issues.apache.org/jira/browse/ctakes
>> > >>> > >> >>>>>>>> Maven for building and dependency management.
>> > >>> > >> >>>>>>>> A lot of the developers use Eclipse IDE for their
>> > >>> > >> >>>>>>>> development.
>> > >>> > >> >>>>>>>> More info on ctakes.apache.org
>> > >>> > >> >>>>>>>>
>> > >>> > >> >>>>>>>> cTAKES is built on top of the Apache UIMA Framework.
>> > >>> > >> >>>>>>>> Essentially, cTAKES is a collection of Annotators (Java
>> > >>> > >> >>>>>>>> classes) wired together into a pipeline.
>> > >>> > >> >>>>>>>> Its goal in a nutshell is to turn unstructured plain text
>> > >>> > >> >>>>>>>> into structured/normalized form, and it is specially
>> > >>> > >> >>>>>>>> trained for medical notes.
>> > >>> > >> >>>>>>>> Right now- the input cTAKES expects would be in plain text
>> > >>> > >> >>>>>>>> form and cTAKES does not have an OCR component.
>> > >>> > >> >>>>>>>> cTAKE-189:GSoC:implement OCR/tika to standardize text
>> > >>> > >> >>>>>>>> inputs was an idea to allow cTAKES to take in any type of
>> > >>> > >> >>>>>>>> input (PDF, Images, Word, XLS, etc.) and pass the text for
>> > >>> > >> >>>>>>>> cTAKES processing.
>> > >>> > >> >>>>>>>> [I was originally thinking this could be done in some kind
>> > >>> > >> >>>>>>>> of preprocessing, or an optional Annotator that could be
>> > >>> > >> >>>>>>>> added in the beginning of a pipeline].  There may be some
>> > >>> > >> >>>>>>>> existing work that could be potentially reused: Apache
>> > >>> > >> >>>>>>>> Tika ( https://issues.apache.org/jira/browse/TIKA-93 ) as
>> > >>> > >> >>>>>>>> well as some open source OCR toolkits (JavaOCR).
>> > >>> > >> >>>>>>>>
>> > >>> > >> >>>>>>>> About Me:
>> > >>> > >> >>>>>>>> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
>> > >>> > >> >>>>>>>> http://www.linkedin.com/in/peistation
>> > >>> > >> >>>>>>>> http://people.apache.org/committer-index.html#chenpei
>> > >>> > >> >>>>>>>>
>> > >>> > >> >>>>>>>>> -----Original Message-----
>> > >>> > >> >>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
>> > >>> > >> >>>>>>>>> Sent: Tuesday, July 09, 2013 1:19 PM
>> > >>> > >> >>>>>>>>> To: dev@ctakes.apache.org
>> > >>> > >> >>>>>>>>> Subject: Re: to involve in your development group
>> > >>> > >> >>>>>>>>>
>> > >>> > >> >>>>>>>>> Thanks a lot for giving me support. I would like to work
>> > >>> > >> >>>>>>>>> with you.
>> > >>> > >> >>>>>>>>>
>> > >>> > >> >>>>>>>>> I have gone through the objectives of the software, used
>> > >>> > >> >>>>>>>>> the software, and gone through various components of the
>> > >>> > >> >>>>>>>>> project. Can you give me a starting point from which I
>> > >>> > >> >>>>>>>>> can learn more about the coding part of the project?
>> > >>> > >> >>>>>>>>>
>> > >>> > >> >>>>>>>>> Can you tell me more about the project, and about
>> > >>> > >> >>>>>>>>> yourself as well?
>> > >>> > >> >>>>>>>>>
>> > >>> > >> >>>>>>>>>
>> > >>> > >> >>>>>>>>> On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei <Pei.Chen@childrens.harvard.edu> wrote:
>> > >>> > >> >>>>>>>>>
>> > >>> > >> >>>>>>>>>> Hi Sandeep,
>> > >>> > >> >>>>>>>>>> Thank you for the interest.  I just had a quick look at
>> > >>> > >> >>>>>>>>>> the ICFOSS pilot mentoring program and will be happy to
>> > >>> > >> >>>>>>>>>> serve as a mentor for your project proposal(s) if you
>> > >>> > >> >>>>>>>>>> are interested.
>> > >>> > >> >>>>>>>>>>
>> > >>> > >> >>>>>>>>>> --Pei
>> > >>> > >> >>>>>>>>>>
>> > >>> > >> >>>>>>>>>>> -----Original Message-----
>> > >>> > >> >>>>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
>> > >>> > >> >>>>>>>>>>> Sent: Monday, July 08, 2013 2:24 PM
>> > >>> > >> >>>>>>>>>>> To: dev@ctakes.apache.org
>> > >>> > >> >>>>>>>>>>> Subject: Re: to involve in your development group
>> > >>> > >> >>>>>>>>>>>
>> > >>> > >> >>>>>>>>>>> Sir,
>> > >>> > >> >>>>>>>>>>>
>> > >>> > >> >>>>>>>>>>> Details of the Pilot mentoring programme with India
>> > >>> > >> >>>>>>>>>>> ICFOSS are given at the web address below:
>> > >>> > >> >>>>>>>>>>> http://community.apache.org/mentoringprogramme-icfoss-pilot.html
>> > >>> > >> >>>>>>>>>>>
>> > >>> > >> >>>>>>>>>>>
>> > >>> > >> >>>>>>>>>>> I am new to this community, so I need a mentor for the
>> > >>> > >> >>>>>>>>>>> project. It would be very helpful for me.
>> > >>> > >> >>>>>>>>>>>
>> > >>> > >> >>>>>>>>>>>
>> > >>> > >> >>>>>>>>>>> On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei <Pei.Chen@childrens.harvard.edu> wrote:
>> > >>> > >> >>>>>>>>>>>
>> > >>> > >> >>>>>>>>>>>> Hi Sandeep,
>> > >>> > >> >>>>>>>>>>>> Welcome!  I am not familiar with the details of
>> > >>> > >> >>>>>>>>>>>> icfoss-apache, but please- you are more than welcome
>> > >>> > >> >>>>>>>>>>>> to work on the code and contributions will be greatly
>> > >>> > >> >>>>>>>>>>>> appreciated!
>> > >>> > >> >>>>>>>>>>>> There may be a learning curve, but feel free to let
>> > >>> > >> >>>>>>>>>>>> us know if you have any questions/issues.
>> > >>> > >> >>>>>>>>>>>> Thanks,
>> > >>> > >> >>>>>>>>>>>> Pei
>> > >>> > >> >>>>>>>>>>>>
>> > >>> > >> >>>>>>>>>>>>> -----Original Message-----
>> > >>> > >> >>>>>>>>>>>>> From: sandeep rg [mailto:sandeep.foss@gmail.com]
>> > >>> > >> >>>>>>>>>>>>> Sent: Saturday, July 06, 2013 11:50 AM
>> > >>> > >> >>>>>>>>>>>>> To: dev@ctakes.apache.org
>> > >>> > >> >>>>>>>>>>>>> Subject: to involve in your development group
>> > >>> > >> >>>>>>>>>>>>>
>> > >>> > >> >>>>>>>>>>>>> My name is Sandeep. I am a BTech graduate. I
>> > >>> > >> >>>>>>>>>>>>> participated in a camp held in Kerala, India, in
>> > >>> > >> >>>>>>>>>>>>> association with icfoss-apache, called the youth
>> > >>> > >> >>>>>>>>>>>>> mentoring programme and coordinated by Luciano
>> > >>> > >> >>>>>>>>>>>>> Resende.
>> > >>> > >> >>>>>>>>>>>>>
>> > >>> > >> >>>>>>>>>>>>> I like the project and would like to get involved in
>> > >>> > >> >>>>>>>>>>>>> it as a programmer. I have gone through your project
>> > >>> > >> >>>>>>>>>>>>> and the bugs list. I would like to work on the bug
>> > >>> > >> >>>>>>>>>>>>> "cTAKE-189:GSoC:implement OCR/tika to standardize
>> > >>> > >> >>>>>>>>>>>>> text inputs for cTAKES". Can you allow me to work on
>> > >>> > >> >>>>>>>>>>>>> that?
>> > >>> > >> >
>> > >>> > >>
>> > >>> > >
>> > >>> > >
>> > >>>
>> > >>
>> > >>
>>
>>
>
