The xml property of the DOMDocument object does not return the encoding attribute for the XML data, even if a specific encoding is specified in the XML.
Because the xml property always returns the data as a Unicode string, the returned data is UTF-16 encoded. This means that the original encoding is no longer valid and is filtered out.
This behavior is by design.
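For contrast, the following is a minimal VBScript sketch, not part of the original sample, that shows the xml property next to the save method of the DOMDocument object. The save method writes the document to a file using the encoding named in the XML declaration. The sketch assumes it is run under Windows Script Host and that a file named "test.xml" declares an encoding. The output file name "copy.xml" is hypothetical.

```vbscript
' Sketch only: "copy.xml" is a hypothetical output file name.
Dim xmldoc
Set xmldoc = CreateObject("Msxml2.DOMDocument")
xmldoc.async = False

If xmldoc.load("test.xml") Then
    ' The xml property returns a Unicode string, so the
    ' encoding attribute is not included in this text.
    MsgBox xmldoc.xml

    ' The save method serializes the document to a file by using
    ' the encoding that is named in the XML declaration.
    xmldoc.save "copy.xml"
Else
    MsgBox "Load failed: " & xmldoc.parseError.reason
End If
```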
If a newer version of MSXML has been installed in side-by-side mode, you must explicitly use the Globally Unique Identifiers (GUIDs) or ProgIDs for that version to run the sample code. For example, MSXML version 4.0 can only be installed in side-by-side mode. For additional information about the code changes that are required to run the sample code with the MSXML 4.0 parser, click the following article number to view the article in the Microsoft Knowledge Base:
305019 (http://kbalertz.com/Feedback.aspx?kbNumber=305019/EN-US/) INFO: MSXML 4.0 Specific GUIDs and ProgIds
Steps To Reproduce Behavior
- Create an XML file ("test.xml") similar to the following text that specifies a particular encoding, in this case "windows-1252":
<?xml version="1.0" encoding="windows-1252"?>
<root>Hello</root>
- Create an HTML page that contains the following script:
<HTML>
<BODY>
<script language="vbscript">
Set xmldoc = CreateObject("Msxml2.DOMDocument")
xmldoc.async = false
xmldoc.load("test.xml")
MsgBox xmldoc.xml
</script>
</BODY>
</HTML>
- Execute the script, and note the XML that is displayed.
Results
The XML data that is displayed in the message box looks similar to the following:
<?xml version="1.0"?>
<root>Hello</root>
Note that the encoding attribute has been removed.
However, the original value of this attribute is still stored in the DOMDocument, and can be retrieved by using an XMLDOMProcessingInstruction object. Usually, the encoding information is contained in the XML declaration at the beginning of the XML file, which becomes the first node of the DOMDocument.
To retrieve the encoding information, retrieve the first node (item 0) of the DOMDocument object, which, in this case, is a processing instruction node, and then get the text value of the corresponding "encoding" attribute.
The following Microsoft VBScript example displays the value "windows-1252" if xmldoc refers to a DOMDocument object that was created by using the XML data from the preceding example:
Dim encoding
encoding = xmldoc.childNodes(0).Attributes.getNamedItem("encoding").Text
MsgBox encoding
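The one-line lookup above assumes that the first node is an XML declaration that carries an encoding attribute. The following is a more defensive sketch for documents where that might not hold. It is an illustration only: GetDeclaredEncoding is a hypothetical helper name, not part of MSXML, and the constant is defined locally with the DOM node type value for processing instructions (7).

```vbscript
' Sketch only: GetDeclaredEncoding is a hypothetical helper, not an MSXML API.
Const NODE_PROCESSING_INSTRUCTION = 7

Function GetDeclaredEncoding(doc)
    Dim node, attr
    GetDeclaredEncoding = ""    ' returned when no encoding is declared

    ' An empty document has no first node to inspect.
    If doc.childNodes.length = 0 Then Exit Function

    Set node = doc.childNodes(0)

    ' The XML declaration, when present, is exposed as a
    ' processing instruction node whose name is "xml".
    If node.nodeType <> NODE_PROCESSING_INSTRUCTION Then Exit Function
    If node.nodeName <> "xml" Then Exit Function

    ' The declaration may omit the encoding attribute.
    Set attr = node.attributes.getNamedItem("encoding")
    If Not attr Is Nothing Then GetDeclaredEncoding = attr.text
End Function

Dim xmldoc
Set xmldoc = CreateObject("Msxml2.DOMDocument")
xmldoc.async = False
If xmldoc.load("test.xml") Then
    MsgBox GetDeclaredEncoding(xmldoc)
Else
    MsgBox "Load failed: " & xmldoc.parseError.reason
End If
```

With this approach, a document that has no XML declaration, or a declaration without an encoding attribute, yields an empty string instead of a run-time error.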
The following is an example of how to retrieve the value in Microsoft Visual C++:
IXMLDOMProcessingInstructionPtr pInst = pXMLDoc->GetchildNodes()->Getitem(0);
_bstr_t bstrEncoding = pInst->Getattributes()->getNamedItem("encoding")->Gettext();