String Encoding & Decoding

The IFC exchange format "STEP physical file" only allows characters represented by decimal value 32 to 126 from the code table in ISO 8859-1. Any other character, like some Western characters, like the German "Umlaut", Greek or Cyrillic letters, or Asian characters, has to be encoded before being exchanged as part of a string value.

The rules for decoding and encoding are defined in ISO10303-21: "Industrial automation systems and integration — Product data representation and exchange — Part 21: Implementation methods: Clear text encoding of the exchange structure". A short summary and guideline is included in the IFC Implementation Guide.

Example: The following encodings define the character "Upper A umlaut" Ä - the hexadecimal character code is xC4 (decimal 196)

'\S\D' character code of D = x44 (decimal 68) added to x80 (128) is  x44 + x80 (68+128) = xC4 (196); since Ä is defined in ISO 8859-1 it is the default code page and no \P encoding is required.
'\PA\\S\D' same as above, but the \PA\ directive at the begin of the string explicitly defines that the value of xC4 (196) is taken from ISO 8859-1
'\X\C4' character code xC4 as 8-bit character code found in ISO 10646 (first 255 characters - also referred to as "row 0")
'\X2\00C4\X0\' character code xC4 as 16-bit character x00C4 in ISO 10646 (Unicode)