
Escape sequences in string literals

3 comments, last by Deyja 18 years, 10 months ago
In C++, the result of (std::string("\xFA") == std::string("·")) is true. Somewhere along the way, the \xFA is converted to the actual character ·. I'm not sure whether the compiler or the preprocessor does this, but I suspect it is the preprocessor. The preprocessor would have to expand such an escape for character literals, so it makes sense that it would handle them in strings too. Now then, here's the rub: I have to explicitly disable the expansion of other escape sequences, such as \n and \". The preprocessor must simply ignore these, so that AngelScript can still parse the string literal. What I need is a complete list of all the escape sequences AngelScript already parses. Some of them I don't have to worry about - \t, for example, should work fine in AS as either \t or the actual ASCII code for tab.
In C++ escape sequences are handled by the compiler, not the pre-processor.
Character literals are handled by the preprocessor, and escape sequences work inside them.
My guess is that escape sequences in strings are converted by the C++ compiler, not the preprocessor. This is because C++ compilers normally only accept ASCII characters, i.e. values below 128 (at least I should think so). If the preprocessor did the conversion, it would be possible to insert illegal characters into the strings that the C++ compiler wouldn't accept.

The preprocessor on the other hand takes care of character literals, as they are converted to a numeric constant.
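
One way to probe this is with a small test, given here as a minimal sketch rather than a definitive answer. The macro SPELLING_OF and the three function names are made up for illustration. The stringizing operator shows what the preprocessor has in hand for a string literal (the escape is still spelled out), while an #if expression is a place where the preprocessor does have to evaluate a character literal by itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper macro (not from this thread): '#' turns the argument's
// source spelling into a string literal during preprocessing.
#define SPELLING_OF(x) #x

// What the preprocessor has in hand: the six characters  "\x41"
// (quotes included), with the escape sequence still spelled out.
inline std::string preprocessor_view() { return SPELLING_OF("\x41"); }

// What the compiler produces from the same literal: one byte, value 0x41.
inline std::string compiler_view() { return "\x41"; }

// Inside an #if expression the preprocessor has to evaluate the character
// literal, escape sequence included, to decide which branch to keep.
#if '\x41' == 65
inline bool preprocessor_evaluated_char() { return true; }
#else
inline bool preprocessor_evaluated_char() { return false; }
#endif
```

On the mainstream compilers I'd expect preprocessor_view() to be six characters long and compiler_view() to be one, which fits the guess that string escapes are left for the compiler.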

In AngelScript the following escape sequences are currently handled (as can be seen in the script manual [wink]):

sequence   value   description
\0         0       null character
\\         92      back-slash
\"         34      double quotation mark
\n         10      new line feed
\r         13      carriage return
\xFF       0xFF    FF should be exchanged for the hexadecimal number representing the byte value wanted


FYI: The code that handles escape sequences in strings is this:

int asCBuilder::RegisterConstantString(const char *cstr, int len)
{
    asCArray<char> str;
    str.Allocate(len, false);

    for( int n = 0; n < len; n++ )
    {
        if( cstr[n] == '\\' )
        {
            ++n;
            if( n == len ) return -1;

            if( cstr[n] == '"' )
                str.PushLast('"');
            else if( cstr[n] == 'n' )
                str.PushLast('\n');
            else if( cstr[n] == 'r' )
                str.PushLast('\r');
            else if( cstr[n] == '0' )
                str.PushLast('\0');
            else if( cstr[n] == '\\' )
                str.PushLast('\\');
            else if( cstr[n] == 'x' || cstr[n] == 'X' )
            {
                ++n;
                if( n == len ) break;

                int val = 0;
                if( cstr[n] >= '0' && cstr[n] <= '9' )
                    val = cstr[n] - '0';
                else if( cstr[n] >= 'a' && cstr[n] <= 'f' )
                    val = cstr[n] - 'a' + 10;
                else if( cstr[n] >= 'A' && cstr[n] <= 'F' )
                    val = cstr[n] - 'A' + 10;
                else
                    continue;

                ++n;
                if( n == len )
                {
                    str.PushLast((char)val);
                    break;
                }

                if( cstr[n] >= '0' && cstr[n] <= '9' )
                    val = val*16 + cstr[n] - '0';
                else if( cstr[n] >= 'a' && cstr[n] <= 'f' )
                    val = val*16 + cstr[n] - 'a' + 10;
                else if( cstr[n] >= 'A' && cstr[n] <= 'F' )
                    val = val*16 + cstr[n] - 'A' + 10;
                else
                {
                    str.PushLast((char)val);
                    continue;
                }

                str.PushLast((char)val);
            }
            else
                continue;
        }
        else
            str.PushLast(cstr[n]);
    }

    return module->AddConstantString(str.AddressOf(), str.GetLength());
}


In my opinion the preprocessor shouldn't try to convert any escape sequences inside string constants. Otherwise you'd have to disable conversion of escape sequences such as \x0A (= \n), \x22 (= \"), and \x5C (= \\) as well.
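
To make that concern concrete, here is a rough sketch of what goes wrong if a preprocessor expands \x22 inside a string literal before the script compiler sees it. Everything in it (hex_digit_value, naive_expand_hex, count_unescaped_quotes) is a hypothetical stand-in written for this illustration, not code from the preprocessor or from AngelScript, and it only looks at two-digit hex escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
inline int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical naive pass: replaces every \xHH (exactly two hex digits) in
// the source text with the raw byte and copies everything else unchanged.
inline std::string naive_expand_hex(const std::string& src) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 3 < src.size() && src[i + 1] == 'x' &&
            hex_digit_value(src[i + 2]) >= 0 && hex_digit_value(src[i + 3]) >= 0) {
            out += static_cast<char>(hex_digit_value(src[i + 2]) * 16 +
                                     hex_digit_value(src[i + 3]));
            i += 3;  // skip 'x' and both digits
        } else if (src[i] == '\\' && i + 1 < src.size()) {
            out += src[i];      // keep any other escape as written,
            out += src[++i];    // including an escaped backslash
        } else {
            out += src[i];
        }
    }
    return out;
}

// Counts double quotes that are not protected by a preceding backslash,
// i.e. the ones a later parser would treat as string delimiters.
inline std::size_t count_unescaped_quotes(const std::string& text) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\\') { ++i; continue; }  // skip the escaped character
        if (text[i] == '"') ++count;
    }
    return count;
}
```

Feeding it the source text "say \x22hi\x22" (two delimiting quotes) gives "say "hi"" (four), so the script parser would see the string end right after "say ". The same kind of breakage applies to \x5C and \x0A.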


Okay. I should have tested with the latest version. In 1.10.x it seems that \x## isn't being converted. I was going to suggest that it was. As of right now, the only escape the preprocessor will handle inside string literals is \", and only so it doesn't think that quote terminates the string. Character literals will support all those escapes. Additionally, they support 'literal escapes', which AngelScript might not. Whereas escapes such as \n must be converted to an entirely different ASCII code, ones such as \" do not. \" is handled by default code that simply removes the slash. Therefore, you could have a character literal such as '\y', and its value would be 'y'. I also support \t, as it is usually more recognizable in code than the actual tab character, which is a different size in different editors and may only be a single space depending on where it is in the line. I'm beginning to carry on; like C++, multi-character character literals will compile, but only the first character is used. 'youallsuck' would equal 'y'.

Just for comparison.
static char* parseEscapeSequence(char* start, char* end, Lexem& out)
{
    if (start == end) return start;
    if (*start != '\\') return start; //Why was this called?
    ++start;
    if (start == end) return start;
    if (out.type == STRING) //Ignore the escape sequence!
    {                       //Don't need to worry about hex-escapes.
        out.value += '\\';
        out.value += *start;
        return ++start;
    }
    else //must be a character literal.
    {
        //Non-literal escapes
        if (*start == 'n') { out.value += '\n'; return ++start; }
        if (*start == 't') { out.value += '\t'; return ++start; }
        if (*start == 'r') { out.value += '\r'; return ++start; }
        if (*start == '0')
        {
            out.value += '\0';
            //out.value.resize(out.value.size()+1);
            //out.value[out.value.size()-1] = '\0';
            return ++start;
        }
        if (*start == 'x') return parseHexEscape(start,end,out);
        //Literal escape - Just get rid of the slash.
        out.value += *start;
        return ++start;
    }
}
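
For anyone who wants to try the character-literal rules without the Lexem type or parseHexEscape, here is a standalone sketch of the behaviour described above. The names decode_char_escape and char_literal_value are invented for this example, hex escapes are left out on purpose, and returning '\0' for an empty literal is my own assumption.

```cpp
#include <cassert>
#include <string>

// Maps the character after a backslash to its value. Anything not listed is
// treated as a 'literal escape': the slash is dropped and the character kept.
inline char decode_char_escape(char c) {
    switch (c) {
        case 'n': return '\n';
        case 't': return '\t';
        case 'r': return '\r';
        case '0': return '\0';
        default:  return c;  // e.g. \" -> ", \\ -> \, \y -> y
    }
}

// Value of a character literal given the text between the single quotes.
// Only the first character (or first escape) counts; the rest is ignored.
// Hex escapes (\x##) are not handled in this sketch.
inline char char_literal_value(const std::string& body) {
    if (body.empty()) return '\0';  // assumption: empty literal yields 0
    if (body[0] == '\\' && body.size() > 1) return decode_char_escape(body[1]);
    return body[0];
}
```

With this, the body \y comes out as 'y' and the body youallsuck also comes out as 'y', matching the two cases mentioned in the post.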
