Archived Forum Post

Question:

ActiveX Charset and HashStringENC / PostXML

Mar 27 '13 at 14:57

Hi Matt

I've been having a bit of a charset nightmare that I think I have got to the bottom of (appears to now work) but I am just looking for some clarification.

I am using Visual DataFlex (VDF), which uses the OEM character set.

If I have some UTF-8 data in a VDF string and pass it to HashStringENC with the charset set to UTF-8, I get the 'wrong' hash back (according to the other end, when they do the same calculation). If I use ibm437, I get the correct one. This seems to match the help, which says the data is automatically converted to the specified charset. But since my string already contains things like £ as the UTF-8 bytes C2 A3, I think it must be getting converted a second time internally. So my question is: there does not appear to be a "no conversion" option, so is ibm437 the right choice for me, working in OEM?

It is a similar story with PostXML, but this time the help says the xmlCharset parameter should match the encoding. Again, if I specify UTF-8, things don't work when the data contains £ signs, but if I specify ibm437 they do seem to, so again it looks like some sort of conversion is happening.

I'm not saying there is anything wrong with the conversions, just that I'm having a few too many of them :o)

Thanks in advance


Answer

Bump.

I'm sure this isn't high on the todo list, but I would appreciate a little guidance with this.

Thanks


Answer

I really don't know anything about Visual DataFlex, but my guess is that it's similar to VB6. Strings are objects, and characters are stored internally (within the string object) using Unicode (utf-16). However, VDF is probably only capable of displaying OEM chars.

Any string passed to an ActiveX component, regardless of programming language, is ALWAYS passed as a Unicode string (utf-16). In C++ terms, it is a "BSTR". Because strings are always passed as utf-16, if you want to hash a string, you have to specify the exact byte representation to be used for each character. Is it utf-8? ANSI? utf-16? etc.
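
To make that concrete, here is a rough sketch in Python (not the ActiveX API itself; hashlib and SHA-256 just stand in for whatever hash is actually being used, and hash_hex is a made-up helper name). It assumes VDF widens its OEM bytes to utf-16 using code page 437 when it calls the component, which would explain the symptoms described in the question but is not something confirmed here.

```python
import hashlib


def hash_hex(text: str, charset: str) -> str:
    """Encode text with the given charset, then SHA-256 those bytes.

    Hypothetical helper for illustration only; it mimics the idea of
    "convert to the charset, then hash", not any real component's API.
    """
    return hashlib.sha256(text.encode(charset)).hexdigest()


# One character, three different byte representations, so three hashes.
pound = "\u00a3"                         # the pound sign
utf8_bytes = pound.encode("utf-8")       # b'\xc2\xa3'
ansi_bytes = pound.encode("cp1252")      # b'\xa3'
utf16_bytes = pound.encode("utf-16-le")  # b'\xa3\x00'

# The situation in the question: the VDF string already holds the
# utf-8 bytes C2 A3. ASSUMPTION: the host treats them as OEM (cp437)
# characters when it builds the utf-16 string for the call.
raw = b"\xc2\xa3"
as_seen_by_component = raw.decode("cp437")  # two characters, not one

# Charset "utf-8": the two characters are encoded again (double encoding).
double_encoded = as_seen_by_component.encode("utf-8")  # b'\xe2\x94\xac\xc3\xba'

# Charset "ibm437": the OEM widening is undone, giving the original bytes.
round_tripped = as_seen_by_component.encode("cp437")   # b'\xc2\xa3'

if __name__ == "__main__":
    print("utf-8  :", utf8_bytes.hex(), hash_hex(pound, "utf-8")[:16])
    print("ANSI   :", ansi_bytes.hex(), hash_hex(pound, "cp1252")[:16])
    print("utf-16 :", utf16_bytes.hex(), hash_hex(pound, "utf-16-le")[:16])
    print("double-encoded:", double_encoded.hex())
    print("round-tripped :", round_tripped.hex())
```

Under that assumption, specifying ibm437 is effectively the "no conversion" option for a string that already holds utf-8 bytes: it hands the component the same bytes the VDF string started with, so the hash matches what the other end computes over the utf-8 data.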