druid-dev mailing list archives

From Ben Krug <ben.k...@imply.io>
Subject Re: druid can't parse string
Date Sun, 18 Jul 2021 19:08:29 GMT
Are you using the console, or an ingestion spec?  If you use a spec, you
might attach it.  If you're using the console, and if the strings have
commas in them, maybe .tsv would work, and you can create a file with a
different delimiter.  (In .tsv, you can choose the delimiter; it doesn't
have to be a tab.)  Or you can take a screenshot of what's happening and
attach that, it might help.
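
If you go the spec route, here is a rough sketch of the inputFormat section for a delimited file with a non-tab delimiter. The field names ("type", "delimiter", "columns", "findColumnsFromHeader") follow my reading of the native batch inputFormat docs, so check them against your Druid version. The column names and the "|" delimiter are made up for illustration.

```python
import json


def tsv_input_format(columns, delimiter="|", has_header=False):
    """Build the inputFormat part of a native batch ingestion spec.

    The column names passed in are hypothetical; use the ones in your file.
    """
    fmt = {"type": "tsv", "delimiter": delimiter}
    if has_header:
        # Let Druid take the column names from the first row of the file.
        fmt["findColumnsFromHeader"] = True
    else:
        # Without a header row, the column names must be listed explicitly.
        fmt["columns"] = list(columns)
    return fmt


# Print the JSON fragment to paste into the spec's ioConfig.
print(json.dumps(tsv_input_format(["timestamp", "page", "comment"]), indent=2))
```

Pick a delimiter that never occurs in your string values, otherwise you get the same column-splitting problem as with commas.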

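On the encoding question from earlier in the thread: before ingesting, it may be worth checking whether the CSV file is really UTF-8, and converting it if it is not. A minimal sketch follows. The "cp949" default is only my guess at what a Korean-locale spreadsheet export might produce; substitute whatever encoding your file actually uses. The function names are mine, not part of Druid.

```python
from pathlib import Path


def is_valid_utf8(path):
    """Return True if the whole file decodes cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reencode_to_utf8(src, dst, source_encoding="cp949"):
    """Rewrite src as UTF-8 at dst.

    source_encoding is an assumption; a wrong value either raises
    UnicodeDecodeError or silently produces wrong characters.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    # Work on bytes so that line endings are preserved exactly.
    Path(dst).write_bytes(text.encode("utf-8"))
```

If is_valid_utf8("data.csv") comes back False, run reencode_to_utf8("data.csv", "data-utf8.csv") and ingest the new file, still as CSV. There should be no need to switch to a different parser.
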
On Fri, Jul 16, 2021 at 11:25 AM Y H <yurim2220@gmail.com> wrote:

> Thanks!
> But I still have a problem.
>
> I succeeded in storing strings as UTF-8 with inline text ingestion. But when I
> try batch ingestion from a CSV file, the strings come out garbled.
>
> The problem seems to happen when the CSV is read. Should I convert the CSV
> file to a text file? And if I ingest batch data from a text file, what type of
> parser should I choose? (Still .*csv?)
>
>
>
> On Sat, Jul 17, 2021 at 1:46 AM Gian Merlino <gian@apache.org> wrote:
>
> > Including the original poster in case they are not on the dev list
> > themselves (hello!).
> >
> > On Fri, Jul 16, 2021 at 9:44 AM Gian Merlino <gian@apache.org> wrote:
> >
> >> Druid stores strings as UTF-8 and from a storage and query basis, it
> >> should work fine with any language. The
> >> "wikiticker-2015-09-12-sampled.json.gz" dataset used for the tutorial has
> >> strings in a variety of languages (check the "page" field):
> >> https://druid.apache.org/docs/latest/tutorials/index.html
> >>
> >> So I wonder if there is an encoding problem with reading your input data?
> >> If it's in a text format, it should be encoded as UTF-8 for Druid to be
> >> able to read it properly.
> >>
> >
> >>
> >> On Fri, Jul 16, 2021 at 7:51 AM Y H <yurim2220@gmail.com> wrote:
> >>
> >>> Hi, I am using Druid to develop an analytics web application,
> >>> and I found that Druid can't parse non-English text.
> >>>
> >>> [image: image.png]
> >>>
> >>> Is there a UTF-8 option, or some other way to parse strings correctly?
> >>>
> >>> I have attached my Druid environment file.
> >>> Please let me know how to parse strings correctly in Druid.
> >>>
> >>> Thanks.
> >>>
> >>>
> >>>
> >>> environment
> >>> ___________________________________________________
> >>> DRUID_XMS=1g
> >>> DRUID_MAXNEWSIZE=250m
> >>> DRUID_NEWSIZE=250m
> >>> DRUID_MAXDIRECTMEMORYSIZE=6172m
> >>>
> >>> druid_emitter_logging_logLevel=debug
> >>>
> >>> druid_extensions_loadList=["druid-stats","druid-histogram",
> >>> "druid-datasketches", "druid-lookups-cached-global",
> >>> "postgresql-metadata-storage", "druid-kafka-indexing-service",
> >>> "druid-kafka-extraction-namespace"]
> >>>
> >>> druid_zk_service_host=zookeeper
> >>>
> >>> # kafka config
> >>> listeners=PLAINTEXT://211.253.8.155:59092
> >>>
> >>>
> >>> # druid_metadata_storage_host=
> >>> druid_metadata_storage_type=postgresql
> >>>
> >>> druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
> >>> druid_metadata_storage_connector_user=druid
> >>> druid_metadata_storage_connector_password=FoolishPassword
> >>>
> >>> druid_coordinator_balancer_strategy=cachingCost
> >>>
> >>> druid_indexer_runner_javaOptsArray=["-server", "-Xmx1g", "-Xms1g",
> >>> "-XX:MaxDirectMemorySize=3g", "-Duser.timezone=UTC",
> >>> "-Dfile.encoding=UTF-8",
> >>> "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
> >>> druid_indexer_fork_property_druid_processing_buffer_sizeBytes=268435456
> >>>
> >>> druid_storage_type=local
> >>> druid_storage_storageDirectory=/opt/data/segments
> >>> druid_indexer_logs_type=file
> >>> druid_indexer_logs_directory=/opt/data/indexing-logs
> >>>
> >>> druid_processing_numThreads=2
> >>> druid_processing_numMergeBuffers=2
> >>>
> >>>
> >>> DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration
> >>> status="WARN"><Appenders><Console name="Console"
> >>> target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c -
> >>> %m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef
> >>> ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog"
> >>> additivity="false" level="DEBUG"><AppenderRef
> >>> ref="Console"/></Logger></Loggers></Configuration>
> >>>
> >>>
>
