XML Unicode Root API

cstoneba · May 3, 2023, 2:12pm

I am hitting our BES root API at https://server:52311/api/computer/. One of the property values contains the Unicode character U+0014, which is not a valid character in XML 1.0. It does seem to be a supported character in XML 1.1: https://en.wikipedia.org/wiki/Valid_characters_in_XML#

I’m still trying to work out a solution, but is the BigFix root API only able to return XML 1.0 documents, or does it also support XML 1.1?
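For anyone checking their own data, the allowed ranges can be tested directly. The sketch below is a minimal Python check based on the `Char` production in the XML 1.0 specification. The function names `is_xml10_char` and `find_invalid_xml10` are made up for illustration and are not part of any BigFix tooling, and the sample value is hypothetical.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D)  # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_xml10(text: str) -> list:
    """List (index, 'U+XXXX') for every character XML 1.0 does not allow."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml10_char(ch)
    ]


# Hypothetical property value with DC4 (U+0014) embedded in it.
print(find_invalid_xml10("before\x14after"))  # [(6, 'U+0014')]
```

Running this over a property value before it is written into a report is a cheap way to locate the offending computer record.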

JasonWalker · May 3, 2023, 3:07pm

I haven’t managed to find an online XML validator that accepts U+0014 even in an XML 1.1 document. It appears to be a device control character…are you sure this would be valid in your use-case even if it were returned in a 1.1 document?

I’m curious what sort of computer property would even return that character as part of its value?

JasonWalker · May 3, 2023, 3:14pm

Just for reference, I used a simple query to get a response from the root server, which is returned with an XML 1.0 declaration but contains the invalid Unicode character in the results. Then I changed the declaration to 1.1 and tried running it through several online validators that claim to have 1.1 support, but all of them rejected the character.
The document I ended up with is:

<?xml version="1.1" encoding="UTF-8"?>
<BESAPI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="BESAPI.xsd">
	<Query Resource="character 20">
		<Result>
			<Answer type="string"></Answer>
		</Result>
		<Evaluation>
			<Time>0.29ms</Time>
			<Plurality>Singular</Plurality>
		</Evaluation>
	</Query>
</BESAPI>

(The unprintable character is here in the post, but it cannot be displayed. The document includes the query, though, so you could reproduce this pretty easily.

If you send this query through a browser to api/query, the browser complains that the result page is invalid XML, but I could “View Source” to get the XML response.)
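The same rejection can be seen outside a browser. Below is a minimal Python sketch, assuming the standard-library `xml.etree.ElementTree` parser and a trimmed-down stand-in for the response above (the real document carries more attributes and elements). The DC4 character is written as the escape `\x14` so it survives copy and paste, and `try_parse` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the API response, with DC4 (U+0014) in the answer.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<BESAPI><Query Resource="character 20">'
    '<Result><Answer type="string">\x14</Answer></Result>'
    '</Query></BESAPI>'
).encode("utf-8")


def try_parse(data: bytes) -> str:
    """Return 'ok' if the document parses, else the parser's error message."""
    try:
        ET.fromstring(data)
        return "ok"
    except ET.ParseError as exc:
        return f"parse error: {exc}"


print(try_parse(doc))
```

The exact error text depends on the underlying parser version, so the sketch only reports it rather than matching on it.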

cstoneba · May 3, 2023, 3:24pm

It’s within a Managed Property that includes the computer’s serial number. I suspect someone probably rebuilt the computer and manually entered a bad character in the Serial number.

https://server:52311/api/computer/xxxxxx

I reproduced it by making a property that reads a text file and putting the bad character in that file. The Console displays the value successfully.

Here are the contents: beforeafter
The bad character is DC4 (decimal 20, hex 14).

JasonWalker · May 3, 2023, 3:30pm

I’d suspect the Console is simply stripping or replacing the invalid character as part of its XML processing…you might have to do the same in whatever script is retrieving your API results. I’m most familiar with Python, which has options in ElementTree or lxml to replace or discard the invalid characters.

Better option is to fix the data at the source, sure, but you may need to handle it in your processing script.
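As a rough illustration of handling it in the processing script, here is a minimal Python sketch that removes everything outside the XML 1.0 `Char` ranges before handing the text to `xml.etree.ElementTree`. The helper names `strip_invalid_xml10` and `parse_api_response`, and the sample payload, are assumptions made for this example. It strips the raw text itself rather than relying on a parser option.

```python
import re
import xml.etree.ElementTree as ET

# Matches any character outside the XML 1.0 Char production.
_INVALID_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml10(text: str, replacement: str = "") -> str:
    """Drop (or replace) characters that an XML 1.0 parser would reject."""
    return _INVALID_XML10.sub(replacement, text)


def parse_api_response(raw: bytes) -> ET.Element:
    """Decode a response body, clean it, and parse it into an element tree."""
    text = raw.decode("utf-8", errors="replace")
    return ET.fromstring(strip_invalid_xml10(text).encode("utf-8"))


# Hypothetical response body containing a literal DC4 (U+0014).
raw = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<BESAPI><Query><Result>'
    b'<Answer type="string">before\x14after</Answer>'
    b'</Result></Query></BESAPI>'
)

root = parse_api_response(raw)
print(root.findtext(".//Answer"))  # beforeafter
```

Passing a visible `replacement` such as `"?"` instead of the empty string makes it easier to spot which records were altered.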

I don’t think there’s a way to make the server API return the response as XML 1.1, but I’m not sure this character even processes correctly in a 1.1 document anyway?

JasonWalker · May 3, 2023, 3:32pm

(There shouldn’t even be a way to enter that Unicode character at the keyboard, at least in Windows, short of holding the ALT key and typing the Unicode value on the numeric keypad.)

cstoneba · May 3, 2023, 3:39pm

Hi, you are correct. We might just drop it from the results when we process the response if we detect invalid Unicode characters. In testing in our IDE, the bad character seems to get encoded when the document is XML 1.1 rather than 1.0.
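One possible explanation for that behavior, offered with some caution: XML 1.1 admits the C0 control characters other than NUL, but only when they are written as numeric character references such as `&#x14;`, never as literal characters. That would also fit the validator rejections earlier in the thread. A minimal Python sketch of that encoding follows. `encode_restricted_xml11` is a made-up name, the ranges are taken from the `RestrictedChar` production in the XML 1.1 specification, and a consumer would still need a parser that actually implements XML 1.1.

```python
import re

# RestrictedChar ranges from XML 1.1: legal only as character references.
_RESTRICTED_XML11 = re.compile(
    r"[\x01-\x08\x0B\x0C\x0E-\x1F\x7F-\x84\x86-\x9F]"
)


def encode_restricted_xml11(text: str) -> str:
    """Rewrite restricted control characters as hex character references."""
    return _RESTRICTED_XML11.sub(
        lambda m: f"&#x{ord(m.group()):X};", text
    )


print(encode_restricted_xml11("before\x14after"))  # before&#x14;after
```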

I may still create an Idea to have the REST API updated to output XML 1.1 documents, just as a best practice.