Original poster Published at: 15:35:05 2016-01-24
string s = "你好"; cout << s.size() << endl; outputs 4, no problem. But with string s = "hello"; cout << s.size() << endl; the output is 5. Logically, one English character corresponds to two bytes, so shouldn't the output be 10?
#1 Score: 0 Reply to: 16:05:34 2016-01-24
"XXX" is a multi-byte (narrow) string literal and L"XXX" is a wide-character literal. Alternatively you can use _T("XXX"), which means something different depending on whether the project is built as multi-byte or as Unicode.
In addition, when using Unicode, string also needs to be replaced with wstring.
#2 Score: 0 Reply to: 16:21:47 2016-01-24
size() of string and wstring strictly returns the number of characters (elements of type char or wchar_t), not the number of bytes.
In a string, one Chinese character counts as two English characters.
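To make the point in #2 concrete, here is a minimal sketch. It assumes a hypothetical helper named byte_count (not part of the standard library). size() counts elements, and the byte count is that number multiplied by the element size.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: bytes occupied by the character data of a
// std::basic_string, not counting the terminating null.
template <typename CharT>
std::size_t byte_count(const std::basic_string<CharT>& s) {
    // size() is the number of CharT elements, so scale by the element size.
    return s.size() * sizeof(CharT);
}
```

With this, byte_count(std::string("hello")) is 5, while byte_count(std::wstring(L"hello")) is 5 * sizeof(wchar_t), which is 10 on Windows where wchar_t is 2 bytes. size() is 5 in both cases.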
#3 Score: 0 Reply to: 16:39:58 2016-01-24
Unicode has more than 65536 characters in total, so it is impossible to represent every possible Unicode character with only 2 bytes.
#4 Score: 0 Reply to: 17:01:21 2016-01-24
The rare character and everything after it were swallowed outright. It seems even this tried-and-tested forum system cannot cope with a slightly unusual Unicode character.
This year's Oxford Dictionaries Word of the Year, FACE WITH TEARS OF JOY, is actually a Unicode character. Its code point is U+1F602, and in general it takes 4 bytes to encode.
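The 4-byte figure can be checked at compile time. The sketch below assumes a C++11 or later compiler and the usual 2-byte char16_t and 4-byte char32_t. The constant names are made up for illustration.

```cpp
#include <cstddef>

// Byte counts for U+1F602, excluding each literal's terminating null.
constexpr std::size_t kUtf8Bytes  = sizeof(u8"\U0001F602") - sizeof(u8"");
constexpr std::size_t kUtf16Bytes = sizeof(u"\U0001F602")  - sizeof(u"");
constexpr std::size_t kUtf32Bytes = sizeof(U"\U0001F602")  - sizeof(U"");

// Number of UTF-16 code units needed for the same character.
constexpr std::size_t kUtf16Units = kUtf16Bytes / sizeof(char16_t);
```

On mainstream platforms all three byte counts come out as 4: four UTF-8 bytes, two 16-bit UTF-16 units, or one 32-bit UTF-32 unit.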
#5 Score: 0 Reply to: 17:28:51 2016-01-24
So Microsoft was short-sighted: it went from ANSI to Unicode, and the result was still not enough. utf8 on linux/unix ...
#6 Score: 0 Reply to: 10:53:23 2016-01-25
For reference only:
#pragma comment(lib, "user32")
#7 Score: 0 Reply to: 14:44:49 2016-01-25
UCS-2 is a fixed 2 bytes; UTF-16 is variable-length.
"UTF-16 developed from an earlier fixed-width 16-bit encoding known as UCS-2 (for 2-byte Universal Character Set) once it became clear that a fixed-width 2-byte encoding could not encode enough characters to be truly universal."
#8 Score: 0 Reply to: 00:09:03 2016-01-29
When using VS2010, the default is Unicode.
That only means the Unicode version of the Windows API is used.
It has nothing whatsoever to do with the C++ standard library.
C++ string corresponds to ANSI (narrow characters).
C++ wstring corresponds to wide characters.
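#include <iostream>
#include <string>
using namespace std;
int main(int argc, char* argv[])
{
    wstring s = L"Hello";
    cout << s.size() << endl;   // five wide characters
    wstring s1 = L"你好";
    cout << s1.size() << endl;  // two wide characters
    return 0;
}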
The output result is 5, 2.
This proves that size() is just the number of characters (wide characters here).
Besides, if changing a compiler option could change the meaning of the same function in the standard library,
then it would not be a standard library......
This is a historical legacy: Microsoft started with ANSI and then found that it was not enough.
So it rewrote the Windows API with Unicode but did not want to change the function names, which is why there is a compiler option.
The compiler's Unicode setting just flips a macro switch. Don't read too much into it.
The C++ standard library has not been acquired by Microsoft....
#ifdef UNICODE
#define OutputDebugString  OutputDebugStringW
#else
#define OutputDebugString  OutputDebugStringA
#endif // !UNICODE
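The same pattern can be sketched portably. Everything named Demo... or DEMO_... below is hypothetical and only stands in for a real A/W function pair and the _T macro; none of it is Windows API. The point is that the macro only picks which function name is used, while std::string stays a sequence of char either way.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for an "A"/"W" API pair; not real Windows functions.
inline std::size_t DemoLengthA(const char* s) {
    return std::char_traits<char>::length(s);
}
inline std::size_t DemoLengthW(const wchar_t* s) {
    return std::char_traits<wchar_t>::length(s);
}

// Hypothetical switch mirroring how UNICODE selects the W or the A name.
#ifdef DEMO_UNICODE
#define DemoLength DemoLengthW
#define DEMO_T(x) L##x
#else
#define DemoLength DemoLengthA
#define DEMO_T(x) x
#endif

// Resolves to DemoLengthW(L"hello") or DemoLengthA("hello").
inline std::size_t generic_length() { return DemoLength(DEMO_T("hello")); }

// Not touched by the switch: std::string is always basic_string<char>.
inline std::size_t narrow_size() { return std::string("hello").size(); }
```

Compiling with or without -DDEMO_UNICODE changes which function generic_length() calls, but both functions return 5 in either build.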
#9 Score: 0 Reply to: 00:56:12 2016-01-29
Unicode comes in 16-bit and 32-bit forms.
When it was first proposed, Unicode was 16 bits only.
Each character was 16 bits, and it was originally thought that this could represent all characters.
But then came 32-bit Unicode.
16 bits could no longer represent everything, so a method was introduced that uses two 16-bit words to represent a 32-bit value.
Things accumulate little by little.
What was right at the outset may not be right now, just like the millennium bug.
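The "two 16-bit words" method is the UTF-16 surrogate pair. A minimal sketch of the arithmetic, using a hypothetical function name to_utf16 (not a standard library function):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-16 code units.
std::u16string to_utf16(std::uint32_t cp) {
    // Only valid scalar values: up to U+10FFFF, excluding the surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        std::uint32_t v = cp - 0x10000;  // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return out;
}
```

For U+1F602 this yields the two units 0xD83D and 0xDE02, while a character such as U+4F60 still takes a single unit.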
#10 Score: 0 Reply to: 04:35:11 2016-01-29
utf16 was the first version of Unicode.
So utf8 was not widely used on old versions of Windows, including in software and development tools.
There is no very complicated reason.
So as long as you use a new version of VC and of the OS, you will not have this trouble.
On a new version of the OS, Notepad lets you choose utf8 encoding when saving a file.
In a new version of VC using the new C++ standard, char is equivalent to utf8.
#11 Score: 0 Reply to: 13:04:41 2016-01-29
utf-16 is variable-length!!!
Also, the claim in the post above is flawed: a Chinese character is not necessarily 2 bytes. It depends on the encoding; in utf8, for example, it is not.
UTF-8 is 1-6 bytes in its original design (1-4 bytes since RFC 3629).
The idea that a Chinese character is 2 bytes probably comes from some Chinese C/C++ textbooks. Some older versions of VC seem to use GB2312 encoding, which results in 2 bytes per Chinese character.
In addition, GB2312 does not belong to the Unicode character set.
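How many bytes UTF-8 needs depends only on the code point. A minimal sketch, assuming the RFC 3629 limit of 4 bytes and a hypothetical function name utf8_length:

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes UTF-8 (as limited by RFC 3629) uses for one code point.
// Returns 0 for values that are not valid Unicode scalar values.
constexpr std::size_t utf8_length(std::uint32_t cp) {
    return cp < 0x80 ? 1                             // ASCII
         : cp < 0x800 ? 2
         : (cp >= 0xD800 && cp <= 0xDFFF) ? 0        // surrogates are not characters
         : cp < 0x10000 ? 3                          // rest of the BMP, incl. most CJK
         : cp <= 0x10FFFF ? 4                        // supplementary planes
         : 0;
}
```

So the letter h takes 1 byte, the Chinese character U+4F60 takes 3 bytes rather than 2, and U+1F602 takes 4.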
#12 Score: 0 Reply to: 12:38:53 2016-01-31
You get points...
#13 Score: 0 Reply to: 15:07:30 2016-02-01