ctakes-dev mailing list archives

From sandeep rg <sandeep.f...@gmail.com>
Subject Re: to involve in your development group
Date Wed, 10 Jul 2013 18:01:27 GMT
Sir,

My name is Sandeep RG. I am a BTech graduate in computer science, currently
doing an internship at a company, working in Java.

I have installed everything successfully and am now downloading the
resources, which is taking a long time.

I have gone through the suggested OCR technologies.
JavaOCR has some good user reviews.
Apache Tika can process many different file formats.
In addition, there is Tesseract, which is also used for OCR.
Apache PDFBox is also used for text extraction, but only for PDF files.
I am now going through each of them to find out which technology fits best.
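
For illustration, here is a minimal Java sketch of the preprocessing idea
described in the quoted message below: choose an extractor by file type and
hand plain text on to cTAKES. The names InputNormalizer, TextExtractor,
register and toPlainText are hypothetical and are not part of cTAKES, Tika,
PDFBox or Tesseract. Only plain-text passthrough is implemented here; it is a
sketch under those assumptions, not a proposed final design.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical preprocessing step: converts an input file to plain text. */
public class InputNormalizer {

    /** Hypothetical interface: one implementation per supported format. */
    public interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    // Maps a lower-case file extension (without the dot) to its extractor.
    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    public InputNormalizer() {
        // Plain text needs no conversion; read it as UTF-8 and pass it through.
        register("txt", file ->
                new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }

    /** Registers an extractor, e.g. one backed by an OCR or PDF library. */
    public void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the lower-case extension of the file name, or "" if none. */
    public static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Looks up the extractor for the file type and returns the plain text. */
    public String toPlainText(Path file) throws IOException {
        TextExtractor extractor = byExtension.get(extensionOf(file));
        if (extractor == null) {
            throw new IllegalArgumentException(
                    "No extractor registered for: " + file);
        }
        return extractor.extract(file);
    }
}
```

An extractor backed by whichever library turns out to be the best fit could
then be plugged in through register(...) without changing the rest of the
pipeline.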


On Wed, Jul 10, 2013 at 12:52 AM, Chen, Pei
<Pei.Chen@childrens.harvard.edu> wrote:

> Hi Sandeep,
> I am delighted to work with you on this project.
>
> I was not sure if I understood you correctly. Did you mean to say that you
> have already tried using cTAKES and its components?
> If not, you can do an svn checkout of the code and try running the
> debugger GUI from the command line (or the Eclipse IDE), which will allow
> you to type in plain text and get back the different structured content
> (types) that cTAKES produces:
> export MAVEN_OPTS="-Xmx2g -Xms1g"
> mvn -PrunCVD compile
> From the guide:
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide
>
> A bit of background:
> Apache cTAKES uses SVN for version control:
> https://svn.apache.org/repos/asf/ctakes/trunk/
> Jira for issue tracking:
> https://issues.apache.org/jira/browse/ctakes
> Maven for building and dependency management.
> A lot of the developers use the Eclipse IDE for their development.
> More info at ctakes.apache.org
>
> cTAKES is built on top of the Apache UIMA Framework.  Essentially, cTAKES
> is a collection of Annotators (Java classes) wired together into a
> pipeline.
> Its goal in a nutshell is to turn unstructured plain text into a
> structured/normalized form, and it is specially trained for medical notes.
> Right now, the input cTAKES expects is plain text, and cTAKES does not
> have an OCR component.
> cTAKE-189:GSoC:implement OCR/tika to standardize text inputs was an idea
> to allow cTAKES to take in any type of input (PDF, images, Word, XLS, etc.)
> and pass the extracted text on for cTAKES processing.
> [I was originally thinking this could be done in some kind of
> preprocessing step, or as an optional Annotator that could be added at the
> beginning of a pipeline].  There may be some existing work that could
> potentially be reused: Apache Tika (
> https://issues.apache.org/jira/browse/TIKA-93 ) as well as some open
> source OCR toolkits (JavaOCR).
>
> About Me:
>
> http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P8.html
> http://www.linkedin.com/in/peistation
> http://people.apache.org/committer-index.html#chenpei
>
> > -----Original Message-----
> > From: sandeep rg [mailto:sandeep.foss@gmail.com]
> > Sent: Tuesday, July 09, 2013 1:19 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: to involve in your development group
> >
> > Thanks a lot for your support. I would like to work with you.
> >
> > I have gone through the objectives of the software, used the software and
> > gone through various components of the project. Can you give me a
> > starting point for learning more about the coding side of the project?
> >
> > Can you tell me more about the project and about yourself as well?
> >
> >
> > On Tue, Jul 9, 2013 at 1:14 AM, Chen, Pei
> > <Pei.Chen@childrens.harvard.edu> wrote:
> >
> > > Hi Sandeep,
> > > Thank you for the interest.  I just had a quick look at the ICFOSS
> > > pilot mentoring program and will be happy to serve as a mentor for
> > > your project
> > > proposal(s) if you are interested.
> > >
> > > --Pei
> > >
> > > > -----Original Message-----
> > > > From: sandeep rg [mailto:sandeep.foss@gmail.com]
> > > > Sent: Monday, July 08, 2013 2:24 PM
> > > > To: dev@ctakes.apache.org
> > > > Subject: Re: to involve in your development group
> > > >
> > > > Sir,
> > > >
> > > > Details of the pilot mentoring programme with ICFOSS, India, are
> > > > given at the web address below:
> > > >
> > > > http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> > > >
> > > >
> > > > I am new to this community, so I need a mentor for the project. It
> > > > would be very helpful for me.
> > > >
> > > >
> > > > On Mon, Jul 8, 2013 at 7:22 PM, Chen, Pei
> > > > <Pei.Chen@childrens.harvard.edu> wrote:
> > > >
> > > > > Hi Sandeep,
> > > > > Welcome!  I am not familiar with the details of icfoss-apache, but
> > > > > you are more than welcome to work on the code, and
> > > > > contributions will be greatly appreciated!
> > > > > There may be a learning curve, but feel free to let us know if you
> > > > > have any questions/issues.
> > > > > Thanks,
> > > > > Pei
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: sandeep rg [mailto:sandeep.foss@gmail.com]
> > > > > > Sent: Saturday, July 06, 2013 11:50 AM
> > > > > > To: dev@ctakes.apache.org
> > > > > > Subject: to involve in your development group
> > > > > >
> > > > > >  my name is sandeep.i am btech graduate.i had participated in
a
> > > > > > camp coordinated in kerala,India in association with
> > > > > > icfoss-apache called as
> > > > > youth
> > > > > > mentoring programme coordinated by Luciano resende.
> > > > > >
> > > > > >                                         i like the project and
> > > > > > like to
> > > > > involve in your project as a
> > > > > > programmer.i have gone through the your project and gone through
> > > > > > the bugs list.I like to work on the bug
> > > > > > "cTAKE-189:GSoC:implement OCR/tika to standardize text inputs
> > > > > > for cTAKES".can you allow me to
> > > work
> > > > on that?
> > > > >
> > >
>
