jmeter-user mailing list archives

From Andrew Burton <>
Subject CSV files with UTF8 BOM
Date Thu, 19 Jul 2018 23:47:37 GMT
Hi list,

Is there any appetite for handling UTF-8 with BOM markers automatically
when loading CSV input files? These currently fail silently since the first
character in the file is the BOM marker, which means CSV files with headers
don't create the correct variable name.

I *know* that technically, the BOM variant isn't an official UTF variant,
but it is commonplace when exporting from MS SQL Server (which for a lot of
Windows-based users might be their way of generating data).

I know we can convert the encoding from UTF8 BOM to UTF8 using, e.g.
Notepad++ or dos2unix but this adds an extra step to fix a problem that a
lot of users would struggle to identify in the first place ("My data file
is not working, and it looks fine when I open it in Notepad!")

(SQL Server does provide an option to output Unicode but this is UTF16, not
UTF8, which is a whole other story).

I'd propose an additional step of identifying the file's encoding (using the
getEncoding() method in InputStreamReader) and, if it is UTF-8, checking
whether it has a BOM and, if so, handling it with the BOMInputStream class
in Apache commons-io (ref
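
As a rough illustration of the BOM handling described above, here is a
minimal sketch that uses only the JDK rather than commons-io's
BOMInputStream. The class and method names (BomSkipSketch, skipUtf8Bom,
firstLine) are hypothetical and not part of JMeter. One caveat worth
noting: InputStreamReader.getEncoding() reports the charset the reader was
opened with and does not inspect the file's bytes, so the sketch checks the
first three bytes (EF BB BF) directly.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class BomSkipSketch {

    // Hypothetical helper: returns a stream positioned just after a leading
    // UTF-8 BOM (bytes EF BB BF) if one is present, otherwise at the start.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() may return fewer.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: put the bytes back so the caller sees them.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    // Reads the first line (e.g. the CSV header) as UTF-8, ignoring any BOM.
    static String firstLine(byte[] data) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(data)), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "id,name\n1,alice\n".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[body.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(body, 0, withBom, 3, body.length);
        System.out.println(firstLine(withBom));
        System.out.println(firstLine(body));
    }
}
```

With this approach the header line comes out the same whether or not the
file starts with a BOM, so the first variable name would no longer carry an
invisible leading character.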

One other thing that might be useful is changing the input field of the
CSVDataSet for encoding to be a drop down list with only the charset values
supported by InputStreamReader (ref
The documentation doesn't list which encodings are valid (I had to dig
through the code to find the relevant handling class) and there's always
the risk of a typo.
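
To illustrate the drop-down idea, here is a small sketch of how the list of
choices could be built and how a typed value could be validated. It assumes
the names accepted by InputStreamReader match those known to
java.nio.charset.Charset on the running JVM; the class and method names
(CharsetChoicesSketch, charsetChoices, isValidEncoding) are hypothetical and
not part of JMeter.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.ArrayList;
import java.util.List;

class CharsetChoicesSketch {

    // Canonical names of every charset the running JVM supports, sorted,
    // suitable for populating a drop-down list.
    static List<String> charsetChoices() {
        return new ArrayList<>(Charset.availableCharsets().keySet());
    }

    // Returns true if the name (canonical or alias) resolves to a supported
    // charset; malformed, empty, or null names are treated as invalid.
    static boolean isValidEncoding(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetChoices().size() + " charsets available");
        System.out.println("UTF-8 valid: " + isValidEncoding("UTF-8"));
        System.out.println("UTF-9 valid: " + isValidEncoding("UTF-9"));
    }
}
```

The available set varies between JVMs, so the list would need to be built at
runtime rather than hard-coded.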

I'm happy to spend some time on this if it is something that the core devs
would find useful.


