PDF to CSV output adds strange Â character when opening in Excel


The problem is that our web service generates CSV files in UTF-8 encoding, but Excel cannot detect UTF-8 in a CSV file unless the file begins with a special Byte Order Mark (BOM). Without the BOM, Excel opens the CSV using the system's default ANSI code page (for example, Windows-1252). A character such as £, which UTF-8 stores as two bytes (0xC2 0xA3), is then read as two separate characters and displayed as Â£.
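To see the effect, the following C# sketch encodes £ as UTF-8 and then decodes the same bytes with a single-byte code page, roughly as Excel does when no BOM is present. It uses Latin-1 (ISO-8859-1) as a stand-in for the ANSI code page, because Latin-1 is built into .NET and maps bytes 0xC2 and 0xA3 to the same characters as Windows-1252; the code page Excel actually uses depends on the system settings.

```csharp
using System;
using System.Text;

// UTF-8 stores '£' (U+00A3) as two bytes: 0xC2 0xA3 (GetBytes does not add a BOM).
byte[] utf8Bytes = Encoding.UTF8.GetBytes("\u00A3");
Console.WriteLine(BitConverter.ToString(utf8Bytes)); // C2-A3

// A single-byte code page turns each byte into its own character.
// Latin-1 stands in for Excel's ANSI code page (Windows-1252 maps these two bytes the same way).
Encoding ansi = Encoding.GetEncoding("iso-8859-1");
string misread = ansi.GetString(utf8Bytes);

// misread is "\u00C2\u00A3", i.e. "Â£": the stray 'Â' comes from the lead byte 0xC2.
Console.WriteLine(misread.Length); // 2
```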

This should not be a problem if you import the CSV into a database that detects the encoding automatically, but if you need to work with the files in Excel, you should re-save them with the BOM.

The following C# code does this:

using System.IO;
using System.Text;

// Read the downloaded file into a string (without a BOM, File.ReadAllText assumes UTF-8)
string text = File.ReadAllText(@"c:\temp\ReconciledTransactionsReport (2).csv");
// Re-save with an explicit encoding. Encoding.UTF8 writes the UTF-8 BOM (EF BB BF) at the start of the file.
File.WriteAllText(@"c:\temp\fixed.csv", text, Encoding.UTF8);
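To confirm that a re-saved file actually starts with the BOM, a check like the following C# sketch can be used. `HasUtf8Bom` is a hypothetical helper, and the demo writes two files with illustrative names to the temp folder: one with `Encoding.UTF8`, which emits the BOM, and one with `new UTF8Encoding(false)`, which does not.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: true if the file begins with the UTF-8 BOM bytes EF BB BF.
// It reads the whole file, which is acceptable for typical report-sized CSVs.
static bool HasUtf8Bom(string path)
{
    byte[] data = File.ReadAllBytes(path);
    return data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
}

string csv = "Description,Amount\nRefund,\u00A310.00\n";
string withBom = Path.Combine(Path.GetTempPath(), "bom-demo-with.csv");
string withoutBom = Path.Combine(Path.GetTempPath(), "bom-demo-without.csv");

// Encoding.UTF8 writes the BOM; new UTF8Encoding(false) does not.
File.WriteAllText(withBom, csv, Encoding.UTF8);
File.WriteAllText(withoutBom, csv, new UTF8Encoding(false));

Console.WriteLine(HasUtf8Bom(withBom));    // True
Console.WriteLine(HasUtf8Bom(withoutBom)); // False
```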

To avoid creating an intermediate file, replace this line in your C# code:

webClient.DownloadFile(resultFileUrl, DestinationFile);

with the following:

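// Download the CSV into memory instead of saving it to disk first
byte[] bytes = webClient.DownloadData(resultFileUrl);
// Decode the UTF-8 bytes returned by the service
string text = Encoding.UTF8.GetString(bytes);
// Write the text back out; Encoding.UTF8 adds the BOM so Excel detects UTF-8
File.WriteAllText(DestinationFile, text, Encoding.UTF8);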
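As an alternative to decoding and re-encoding the text, the BOM can be written in front of the downloaded bytes unchanged. The C# sketch below assumes the service returns UTF-8 without a BOM; `SaveWithUtf8Bom` is a hypothetical helper, and it skips the prefix when the data already starts with a BOM so the file never gets two of them.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: writes the UTF-8 BOM followed by the raw bytes,
// unless the data already begins with a BOM.
static void SaveWithUtf8Bom(byte[] data, string path)
{
    byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF
    bool alreadyHasBom = data.Length >= bom.Length
        && data[0] == bom[0] && data[1] == bom[1] && data[2] == bom[2];

    using FileStream output = File.Create(path);
    if (!alreadyHasBom)
    {
        output.Write(bom, 0, bom.Length);
    }
    output.Write(data, 0, data.Length);
}

// Stand-in for webClient.DownloadData(resultFileUrl): UTF-8 bytes without a BOM.
byte[] downloaded = Encoding.UTF8.GetBytes("Description,Amount\nRefund,\u00A310.00\n");
string target = Path.Combine(Path.GetTempPath(), "bom-demo-download.csv");
SaveWithUtf8Bom(downloaded, target);

byte[] saved = File.ReadAllBytes(target);
Console.WriteLine(saved.Length - downloaded.Length); // 3
```

With the variables from this article, the call would be `SaveWithUtf8Bom(webClient.DownloadData(resultFileUrl), DestinationFile);`. Passing the bytes through unchanged also means that any invalid UTF-8 sequences are kept as-is rather than replaced with U+FFFD, as `Encoding.UTF8.GetString` would do.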