Microsoft Knowledge Base Article
This article contents is Microsoft Copyrighted material.
©2005-©2007 Microsoft Corporation. All rights reserved.
Terms
of Use |
Trademarks
Article ID: 232580 - Last Review: December 7, 2005 - Revision: 4.3
Description of storing UTF-8 data in SQL Server
This article was previously published under Q232580
SUMMARY
Some applications (especially those that are Web based) must deal with Unicode data that is encoded with the UTF-8 encoding method. SQL Server 7.0 and SQL Server 2000 use a different Unicode encoding (UCS-2) and do not recognize UTF-8 as valid character data. This article discusses some options for dealing with this situation.
MORE INFORMATION
Unicode data can be encoded in many different ways. UCS-2 and UTF-8 are two common ways to store bit patterns that represent Unicode characters. Microsoft Windows NT, SQL Server, Java, COM, and the SQL Server ODBC driver and OLEDB provider all internally represent Unicode data as UCS-2.
The options for using SQL Server 7.0 or SQL Server 2000 as a backend server for an application that sends and receives Unicode data that is encoded as UTF-8 include:
- If your application uses Active Server Pages (ASP) and you are using Internet Information Server (IIS) 5.0 and Microsoft Windows 2000, you can add "<% Session.Codepage=65001 %>" to your server-side ASP script. This instructs IIS to convert all dynamically generated strings (example: Response.Write) from UCS-2 to UTF-8 automatically before sending them to the client.
If you do not want to enable sessions, you can alternatively use the server-side directive "<%@ CodePage=65001 %>".
Any UTF-8 data sent from the client to the server via GET or POST is also converted to UCS-2 automatically. The Session.Codepage property is the recommended method to handle UTF-8 data within a web application. This Codepage setting is not available on IIS 4.0 and Windows NT 4.0. For additional information, see the following article in the Microsoft Knowledge Base: 254313Â
(http://kbalertz.com/Feedback.aspx?kbNumber=254313/EN-US/
)
Error Message: Active Server Pages Error 'ASP 0203' Invalid Code
- Translate to and from UCS-2 or UTF-8 as appropriate within the application. Sample code for this type of conversion is located at the Unicode Consortium's site:
A high-level description of the algorithm to convert UCS-2 to UTF-8 can be found in the Internet Request For Comments document RFC2279.
On Windows NT or Windows 2000, you may use the Win32 functions MultiByteToWideChar and WideCharToMultiByte to convert UTF-8 to and from UCS-2 by passing the constant CP_UTF8 (65001) as the first parameter to the functions. - Modify the application to use UCS-2 instead of UTF-8 encoding.
- Store the actual UTF-8 data on the server using the
BINARY/VARBINARY/IMAGE columns. Storing UTF-8 data on SQL Server
means that you can not use SQL Server to sort or find ranges of
these values as if the data were valid character data. The types of
operations on columns containing UTF-8 data that would not return
expected results include "ORDER BY", greater-than ">" and less-than
"<" comparisons, and the built-in SQL Server string manipulation
functions such as SUBSTRING().
However, equality comparisons, will work as long as the strings being compared are equivalent at a byte level. Note that if you store UTF-8 data in SQL Server you should not use character columns (CHAR/NCHAR/VARCHAR and so forth). UTF-8 is not valid character data to SQL Server, and by
storing non-character data in character columns you risk
encountering problems such as the issues discussed in the
following Microsoft Knowledge Base articles:
155723Â
(http://kbalertz.com/Feedback.aspx?kbNumber=155723/EN-US/
)
INF: SQL Server Truncation of a DBCS String
234748Â
(http://kbalertz.com/Feedback.aspx?kbNumber=234748/EN-US/
)
PRB: SQL Server ODBC Driver Converts Language Events to Unicode
If you are considering this option, keep in mind that if you ever need to access UTF-8 data stored inside SQL Server from any application other than a Web browser (for example, from a non-Web-based ODBC application) you will need to do a conversion from UTF-8 to UCS-2 within this application as ODBC, OLEDB, COM, Win32 API calls, VB, and C runtime string manipulation functions do not work with UTF-8 data. This moves the burden of translation to a different application. - If your requirements do not include the need to store data from a combination of languages that cannot be satisfied by a single code page, you may not need to use Unicode.
Unicode support was introduced to SQL Server starting with SQL Server 7.0. Since SQL Server 6.5 does not support the storing of Unicode data, the only options for SQL Server 6.5 are outlined in step 4 and step 5.
APPLIES TO
- Microsoft SQL Server 2000 Standard Edition
- Microsoft SQL Server 6.5 Standard Edition
- Microsoft SQL Server 7.0 Standard Edition
- Microsoft SQL Server 2005 Standard Edition
- Microsoft SQL Server 2005 Express Edition
- Microsoft SQL Server 2005 Developer Edition
- Microsoft SQL Server 2005 Enterprise Edition
- Microsoft SQL Server 2005 Workgroup Edition
Community Feedback System
Very often, it takes hours to solve a problem. Very often, you've looked high
and low, and have tried a lot of solutions. When you finally found it, chances
are, it was because someone else helped you. Here's your chance to give back.
Use our community feedback tool to let others know what worked for you and what
didn't.
Please also understand that the community feedback system is not warranted to be
correct, it's simply a system that we've built to let people try and help each
other. If something in a feedback response doesn't make sense to you, or you're
not comfortable making changes that the feedback talks about (like registry
edits), please consult a professional.
Thank you for using kbAlertz.com Feedback System.
-- Scott Cate
Be the first to leave feedback, to help others about this knowledge base
article.
(Optional) Name
(Optional)
Public URL Or Email
Comments
No
HTML -- Text Only Please