Incorrect result when reading a Unicode file with fread in C++
I am trying to load the content of a file saved on disk into a string. The file is .CS code generated in Visual Studio, so I am assuming it is saved in UTF-8 encoding. This is what I'm doing:
FILE *fConnect = _wfopen(connectFilePath, _T("r,ccs=UTF-8"));
if (!fConnect)
return;
fseek(fConnect, 0, SEEK_END);
lSize = ftell(fConnect);
rewind(fConnect);
LPTSTR lpContent = (LPTSTR)malloc(sizeof(TCHAR) * lSize + 1);
fread(lpContent, sizeof(TCHAR), lSize, fConnect);
But the result is strange: the first part (about half) is the contents of the .CS file, and then strange characters like 췍 췍췍 췍췍 췍췍 췍췍 췍췍 췍췍 췍췍 췍췍 췍췍 췍췍 췍췍 appear. So I think I am reading the content incorrectly. How do I do it right? Thank you, and I look forward to your answers!
ftell(), fseek() and fread() work on bytes, not characters. In a Unicode build, TCHAR is at least 2 bytes, so you are allocating and reading twice as much memory as you should.
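To illustrate the difference between bytes and characters, here is a minimal sketch that counts the code points in a UTF-8 byte string. The helper name count_utf8_code_points is hypothetical, and the sketch assumes the input is well-formed UTF-8. For any text containing non-ASCII characters, the result is smaller than the byte count that ftell() would report.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a well-formed UTF-8 string.
// Every code point has exactly one byte that is NOT a continuation byte
// (continuation bytes have the bit pattern 10xxxxxx).
std::size_t count_utf8_code_points(const std::string &utf8) {
    std::size_t count = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80)
            ++count;
    }
    assert(count <= utf8.size()); // never more characters than bytes
    return count;
}
```

For example, the word "café" encoded as UTF-8 occupies 5 bytes but contains only 4 code points, so a buffer sized from the byte count is never too small for the decoded text.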
I've never seen fopen() or _wfopen() support the "ccs" attribute (it is a Microsoft-specific extension, not part of standard C). A more portable approach is to use "rb" as the read mode, read the raw bytes into memory, and then decode them once they are all available, for example:
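FILE *fConnect = _wfopen(connectFilePath, L"rb"); // binary mode: no newline or encoding translation
if (!fConnect)
    return;
fseek(fConnect, 0, SEEK_END);
lSize = ftell(fConnect); // file size in bytes
rewind(fConnect);
LPBYTE lpContent = (LPBYTE) malloc(lSize);
size_t nRead = fread(lpContent, 1, lSize, fConnect); // number of bytes actually read
fclose(fConnect); // close the file handle, not the buffer
// ... decode the first nRead bytes of lpContent as needed ...
free(lpContent);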
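As a portable sketch of the same idea, the following reads the whole file as raw bytes into a std::string and removes a leading UTF-8 byte order mark, which files saved by Visual Studio may begin with. The helper names read_file_bytes and strip_utf8_bom are hypothetical, and the sketch uses fopen() with a narrow path instead of _wfopen() so that it is not tied to Windows. Decoding the bytes to wide characters is left out.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: reads an entire file as raw bytes ("rb" mode).
// Returns false if the file cannot be opened or a read error occurs.
bool read_file_bytes(const char *path, std::string &out) {
    assert(path != nullptr);
    std::FILE *f = std::fopen(path, "rb");
    if (!f)
        return false;
    out.clear();
    char chunk[4096];
    std::size_t n;
    // fread returns the number of bytes actually read; 0 means EOF or error.
    while ((n = std::fread(chunk, 1, sizeof chunk, f)) > 0)
        out.append(chunk, n);
    bool ok = !std::ferror(f);
    std::fclose(f);
    return ok;
}

// Hypothetical helper: drops a leading UTF-8 byte order mark (EF BB BF), if present.
void strip_utf8_bom(std::string &bytes) {
    if (bytes.compare(0, 3, "\xEF\xBB\xBF") == 0)
        bytes.erase(0, 3);
}
```

Reading in chunks avoids relying on ftell() for the size, and a std::string tracks its own length, so no manual null terminator is needed.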
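Does the string contain the entire contents of the .cs file followed by additional funny characters? It is probably just not null-terminated, as fread() will not do this automatically. Keep in mind that in text mode fread() may return fewer characters than the file has bytes, so the value it returns is the most reliable position for the terminator, and the buffer needs room for one extra character. You need to set the character following the string content to zero: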
lpContent[lSize] = 0;
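A minimal sketch of that termination step, using a plain char buffer for simplicity, might look like the following. The helper name read_terminated is hypothetical. It places the terminator at the count returned by fread() and allocates one extra element for it; for a TCHAR buffer the equivalent allocation would be sizeof(TCHAR) * (lSize + 1).

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: reads up to `size` bytes from `f` into a malloc'd
// buffer and null-terminates it. The caller must free() the result.
// Returns nullptr if allocation fails.
char *read_terminated(std::FILE *f, std::size_t size, std::size_t *read_out) {
    assert(f != nullptr);
    char *buf = static_cast<char *>(std::malloc(size + 1)); // +1 for the terminator
    if (!buf)
        return nullptr;
    std::size_t n = std::fread(buf, 1, size, f); // n may be less than size
    buf[n] = '\0'; // terminate after the bytes actually read
    if (read_out)
        *read_out = n;
    return buf;
}
```

Because the terminator goes at the count actually read, any unread tail of the buffer is never interpreted as text.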