What are Encodings?
April 13, 2011
Posted by on
Few days back I got an email query on one of blog post related to WCF encodings. Query was asking for help to understand things better, especially around the encoding part. While not being a SME on encodings, I thought of sharing few thoughts.
Encoding is a word I am sure most of us would have come across. If you are a developer this would sound even more true, with terms like Base 64 encoding / HTML encoding popping up more than often. In simplest terms Encoding converts / maps your input data (character set) to another character set. The output character set varies depending on the Encoding style you choose. While there could be many motivations around using encodings, most frequent is to prevent unintentional loss of data. Let’s see couple of examples
a) I had given an example in one of my earlier blog post on ASP.NET security which used HTML encoding to prevent loss of string data being interpreted by browser as Script. Second line of code below keeps the string data intact while the former is interpreted as Script
Response.Write(Server.HtmlEncode(“alert(‘Niraj’)”)); // Or HttpUtility.HtmlEncode
b) Consider another scenario where we want to insert the below string in SQL Server
string Key = “S” + Convert.ToChar(0x0) + “!@#$”;
Now the problem with above is the NULL character (Convert.ToChar(0x0)), which Databases generally consider as end of the string. So in order to insert the string in Database you may as well encode it. Below line of code shows how to encode /decode strings using Base 64 .NET APIs
//To Encode, UTF8 is backward compatible with ASCII
As most you are aware ASP.NET ViewState too is encoded using Base 64; to experiment just copy of view state of your page and try to decode it using above APIs.
Coming to my blog post on WCF encodings I had mentioned that Base 64 encoding (used by Text Encoding) inflates the size of string data. I found wiki explaining this quite clearly (see the Examples section) and generally Base 64 inflates the size of data by 33%. That’s where you might prefer using binary encoding in non-interoperable scenarios and MTOM while transferring large chunks of binary data.
Hope this helps🙂 .