asterixdb-dev mailing list archives

From Taewoo Kim <wangs...@gmail.com>
Subject Re: loading CSV records with comma in the value
Date Mon, 27 Jul 2015 06:25:52 GMT
@Chen: the format of your data file is not correct. According to the CSV
RFC, an opening quote must immediately follow the delimiter (,). In your
example, however, there is a white space between the delimiter and the
quote. I saw the following error message, which complains about the file
format. After I removed the white space after the delimiter, the load
worked fine. So, if you correct the file format, it should work.

At record: 1, field#: 2 - a quote enclosing a field needs to be placed in
the beginning of that field. [IOException]


[ { "id": 14i32, "authors": "John Smith, Mary Reeve" }
 ]
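
For illustration, here is a minimal Python sketch of the same rule. It uses
Python's standard csv module, not the AsterixDB parser, so it is only an
analogy: the csv module is lenient and silently splits the field where
AsterixDB raises the error above during a load. The helper names parse and
strip_space_before_quote are hypothetical, and the regex-based fix is naive
(it would also rewrite a delimiter, space, quote sequence that appears
inside a quoted value).

```python
import csv
import re


def parse(line, **kwargs):
    """Parse one CSV line with Python's csv module and return its fields."""
    return next(csv.reader([line], **kwargs))


def strip_space_before_quote(line, delimiter=",", quote='"'):
    """Remove spaces/tabs between a delimiter and the quote that follows it.

    Hypothetical helper for cleaning a data file before loading it.
    """
    pattern = re.escape(delimiter) + r"[ \t]+" + re.escape(quote)
    # A function replacement avoids backslash interpretation in the result.
    return re.sub(pattern, lambda m: delimiter + quote, line)


original = '14, "John Smith, Mary Reeve"'
fixed = strip_space_before_quote(original)

# The quote is not the first character of the field, so it is treated as
# ordinary data and the comma inside the name list splits the field.
print(parse(original))  # ['14', ' "John Smith', ' Mary Reeve"']

# With the quote directly after the delimiter, the field stays intact.
print(fixed)            # 14,"John Smith, Mary Reeve"
print(parse(fixed))     # ['14', 'John Smith, Mary Reeve']
```

The first result mirrors the truncated "authors" value reported for the
external dataset earlier in this thread.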



Best,
Taewoo

On Sun, Jul 26, 2015 at 10:47 PM, Chen Li <chenli@gmail.com> wrote:

> I added the following line
>
> ("quote"="\"")
>
> to the load statement, but the problem remains: it mistakenly used the
> "," in the "authors" field to break the record.
>
> @Taewoo: can you try the simple AQL example I included in this thread
> to see if it can parse the quoted field correctly?
>
> Chen
>
> On Sun, Jul 26, 2015 at 1:25 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
> > We have test cases for this case. They are located in
> > asterix-app/src/test/resources/runtimets/queries/load/.  The documentation
> > is in /asterix-doc/src/site/markdown/csv.md. The additional syntax for
> > CSV is fairly simple. You just have two additional parameters: "quote"
> > and "header". Refer to the file for more details.
> >
> >
> >
> > Best,
> > Taewoo
> >
> > On Sat, Jul 25, 2015 at 11:30 PM, Chen Li <chenli@gmail.com> wrote:
> >
> >> @Taewoo: I tried it and it has the same problem.  Do you have a test
> >> case for this feature?  Also do we have documentation for this syntax?
> >>
> >> Chen
> >>
> >> On Sat, Jul 25, 2015 at 10:52 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
> >> > The URL is https://asterixdb.ics.uci.edu/documentation/aql/primer.html.
> >> >
> >> >
> >> > It should look like this:
> >> >
> >> > ////
> >> > use dataverse pubs;
> >> >
> >> > create type PaperType as open {
> >> >    id: int32,
> >> >    authors: string
> >> > }
> >> >
> >> > create dataset Papers(PaperType) primary key id;
> >> >
> >> > load dataset Papers using localfs
> >> > (("path"="127.0.01:///Users/chenli/tmp/asterix-data/papers.csv"),
> >> >    ("format"="delimited-text"),
> >> >    ("delimiter"=","));
> >> >
> >> > for $paper in dataset('Papers')
> >> > return $paper;
> >> >
> >> >
> >> >
> >> > Best,
> >> > Taewoo
> >> >
> >> > On Sat, Jul 25, 2015 at 10:47 PM, Chen Li <chenli@gmail.com> wrote:
> >> >
> >> >> @Taewoo: can you send me the syntax or the documentation URL to show
> >> >> the syntax?
> >> >>
> >> >> Chen
> >> >>
> >> >> On Sat, Jul 25, 2015 at 3:27 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
> >> >> > Can you try to load it into an internal dataset? I think I
> >> >> > implemented handling for a comma inside a field value (as opposed
> >> >> > to the delimiter comma) when modifying the delimited data parser.
> >> >> > Chris also modified that part. If it doesn't work, I can look at
> >> >> > the issue.
> >> >> >
> >> >> > Best,
> >> >> > Taewoo
> >> >> >
> >> >> > On Sat, Jul 25, 2015 at 1:51 PM, Chen Li <chenli@gmail.com> wrote:
> >> >> >
> >> >> >> Not sure if this topic was discussed before.  I was trying to load an
> >> >> >> external CSV file using "," as the delimiter.  But the engine failed to
> >> >> >> read a file with the following single record:
> >> >> >>
> >> >> >> 14, "John Smith, Mary Reeve"
> >> >> >>
> >> >> >>
> >> >> >> use dataverse pubs;
> >> >> >>
> >> >> >> create type PaperType as open {
> >> >> >>    id: int32,
> >> >> >>    authors: string
> >> >> >> }
> >> >> >>
> >> >> >> create external dataset Papers(PaperType)
> >> >> >>    using localfs
> >> >> >> (("path"="127.0.01:///Users/chenli/tmp/asterix-data/papers.csv"),
> >> >> >>    ("format"="delimited-text"),
> >> >> >>    ("delimiter"=","));
> >> >> >>
> >> >> >> for $paper in dataset('Papers')
> >> >> >> return $paper;
> >> >> >>
> >> >> >> The following is the output, which shows that the comma in the authors
> >> >> >> field was incorrectly used to break the field.  Any idea about how to
> >> >> >> fix it?
> >> >> >>
> >> >> >> Output
> >> >> >> Results:
> >> >> >>
> >> >> >> { "id": 14, "authors": " \"John Smith" }
> >> >> >>
> >> >> >> Duration of all jobs: 0.091 sec
> >> >> >>
> >> >> >> Success: Query Complete
> >> >> >>
> >> >>
> >>
>
