uima-user mailing list archives

From rohit14csu173@ncuindia.edu <rohit14csu...@ncuindia.edu>
Subject Re: Problem in running DUCC Job for Arabic Language
Date Fri, 06 Jul 2018 05:56:12 GMT
Yes, if I run the AE as a DUCC UIMA-AS service and send it CASes from a UIMA-AS client, it works
fine.
In fact the environment, i.e. the LANG setting, is the same for the UIMA-AS service and the DUCC job:

Environ[3] = LANG=en_IN
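
A LANG value without an explicit codeset (en_IN rather than en_IN.UTF-8) can leave the JVM's default charset at something other than UTF-8, depending on the JDK version and the locales installed on the node. As a rough diagnostic sketch (the class name CharsetReport is hypothetical and not part of DUCC), something like the following could be run or called from inside the CR or an annotator to log what that process actually sees:

```java
import java.nio.charset.Charset;

public class CharsetReport {
    // Summarizes the settings that decide how Java converts bytes to text by default.
    static String report() {
        return "LANG=" + System.getenv("LANG")
                + " file.encoding=" + System.getProperty("file.encoding")
                + " defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

If defaultCharset is reported as US-ASCII or ANSI_X3.4-1968, readers and writers created without an explicit charset in that process cannot represent Arabic characters.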

Even if I change it to LANG=ar, the Arabic text arriving in the JD is already
replaced with ??? (question marks), and the data arriving in the JD or CR is reported as ASCII
encoded.
I don't understand why this is happening.
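
One way literal question marks arise in Java is when a string is encoded with a charset that cannot represent its characters: String.getBytes substitutes the charset's replacement byte, which is '?' for US-ASCII, and the original characters cannot be recovered afterwards. The sketch below only illustrates that mechanism. Whether it is the actual cause inside the DUCC job is an assumption, and the class name AsciiLoss is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiLoss {
    // Encodes text to bytes and decodes it again with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // five Arabic letters ("marhaba")

        // UTF-8 can represent Arabic, so the round trip is lossless.
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic)); // true

        // US-ASCII cannot, so every letter becomes a question mark.
        System.out.println(roundTrip(arabic, StandardCharsets.US_ASCII)); // ?????
    }
}
```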

Best
Rohit 


On 2018/07/05 13:35:11, Eddie Epstein <eaepstein@gmail.com> wrote: 
> So if you run the AE as a DUCC UIMA-AS service and send it CASes from some
> UIMA-AS client it works OK? The full environment for every process that
> DUCC launches is available via ducc-mon under the Specification or
> Registry tab for that job or managed reservation or service. Please see if
> the LANG setting for the service is different from the LANG setting for the
> job.
> 
> One can also see the LANG setting for a Linux process ID by doing:
> 
> cat /proc/<pid>/environ
> 
> The LANG to be used for a DUCC process can be set by adding "LANG=xxx"
> to the --environment argument as needed
> 
> Thanks,
> Eddie
> 
> 
> 
> On Thu, Jul 5, 2018 at 6:47 AM, rohit14csu173@ncuindia.edu <
> rohit14csu173@ncuindia.edu> wrote:
> 
> > Hey,
> > Yes, you got it right: the first snippet runs in the CR, before the data goes
> > into the CAS.
> > The second snippet is in the first annotator, or analysis engine (AE), of
> > my aggregate descriptor.
> > I am fairly sure this is an issue with the CAS used by DUCC, because if I use
> > a DUCC service, where we send a CAS and receive the
> > same CAS back with added features, I get accurate results.
> >
> > The problem only occurs when submitting a job, where the CAS is generated
> > by DUCC.
> > This could also be an issue with the environment (language) of DUCC, because the
> > default language is English.
> >
> > Best Regards
> > Rohit
> >
> > On 2018/07/03 13:11:50, Eddie Epstein <eaepstein@gmail.com> wrote:
> > > Rohit,
> > >
> > > > Before sending the data into the JCas, if I force-encode it:
> > > >
> > > > String content2 = null;
> > > > content2 = new String(content.getBytes("UTF-8"), "ISO-8859-1");
> > > > jcas.setDocumentText(content2);
> > > >
> > >
> > > Where is this code, in the job CR?
> > >
> > >
> > >
> > > >
> > > > And when I get to my first annotator, I force-decode it:
> > > >
> > > > String content = null;
> > > > content = new String(jcas.getDocumentText().getBytes("ISO-8859-1"),
> > > > "UTF-8");
> > > >
> > >
> > > And is this in the first annotator of the job process, i.e. the CM?
> > >
> > > Please be as specific as possible.
> > >
> > > Thanks,
> > > Eddie
> > >
> >
> 
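
The two snippets quoted above form a pair: the first reinterprets the UTF-8 bytes of the text as ISO-8859-1 characters, and the second reverses that. Because ISO-8859-1 maps every byte value to exactly one character, the pair is lossless as long as nothing in between alters characters in the U+0080 to U+00FF range. A minimal standalone sketch of that round trip, with hypothetical class and method names and no UIMA dependency:

```java
import java.nio.charset.StandardCharsets;

public class Latin1Wrap {
    // Reinterprets the UTF-8 bytes of the text as ISO-8859-1 characters (the CR-side snippet).
    static String wrap(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses wrap(): recovers the bytes and decodes them as UTF-8 (the annotator-side snippet).
    static String unwrap(String wrapped) {
        return new String(wrapped.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // five Arabic letters
        String wrapped = wrap(arabic);
        System.out.println(wrapped.length());               // 10: each letter is two UTF-8 bytes
        System.out.println(unwrap(wrapped).equals(arabic)); // true
    }
}
```

Using the StandardCharsets constants instead of charset name strings avoids the checked UnsupportedEncodingException. Wrapping the text this way hides the symptom rather than the cause, so it is probably still worth finding where a non-UTF-8 charset is being applied.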
