IFC String Encoding

String Encoding & Decoding

The IFC exchange format, the “STEP physical file”, allows only the characters with decimal codes 32 to 126 from the ISO 8859-1 code table to appear directly in a string. Any other character, such as a German umlaut, a Greek or Cyrillic letter, or an Asian character, has to be encoded before it is exchanged as part of a string value. This encoding is used in IFC up to IFC4.x. In the future, IFC will adopt the UTF-8 encoding.

The rules for decoding and encoding are defined in ISO 10303-21, “Industrial automation systems and integration - Product data representation and exchange - Part 21: Implementation methods: Clear text encoding of the exchange structure”. A short summary and guideline is included in the IFC Implementation Guide.
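As an illustration of the encoding direction, the following is a minimal Python sketch, not a reference implementation. It assumes the ISO 10303-21 control directives `\X\` (8-bit code), `\X2\...\X0\` (run of 16-bit codes) and `\X4\...\X0\` (run of 32-bit codes), and it assumes that apostrophes and backslashes inside a string are doubled. The names `encode_step_string` and `_kind` are hypothetical and chosen only for this example.

```python
from itertools import groupby


def _kind(ch: str) -> str:
    """Classify a character by the escape form this sketch uses for it."""
    cp = ord(ch)
    if 32 <= cp <= 126:
        return "plain"   # may appear directly in the string
    if cp <= 0xFF:
        return "x"       # 8-bit form:  \X\hh
    if cp <= 0xFFFF:
        return "x2"      # 16-bit form: \X2\hhhh...\X0\
    return "x4"          # 32-bit form: \X4\hhhhhhhh...\X0\


def encode_step_string(text: str) -> str:
    """Encode text for use inside a STEP string (without the enclosing apostrophes)."""
    parts = []
    # Consecutive characters of the same kind share one \X2\ or \X4\ run.
    for kind, group in groupby(text, key=_kind):
        chars = list(group)
        if kind == "plain":
            # Assumption: apostrophe and backslash are escaped by doubling.
            parts.extend({"'": "''", "\\": "\\\\"}.get(ch, ch) for ch in chars)
        elif kind == "x":
            parts.extend("\\X\\%02X" % ord(ch) for ch in chars)
        elif kind == "x2":
            parts.append("\\X2\\" + "".join("%04X" % ord(ch) for ch in chars) + "\\X0\\")
        else:
            parts.append("\\X4\\" + "".join("%08X" % ord(ch) for ch in chars) + "\\X0\\")
    return "".join(parts)


print(encode_step_string("\u00c4"))  # prints \X\C4
```

The sketch never emits the `\S\` or `\P` forms. A writer only has to produce one valid encoding per character, while a reader has to accept all of them.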

Example: The following encodings all represent the character “capital A with umlaut”, Ä, whose hexadecimal character code is xC4 (decimal 196).

Characters        Description
'\S\D'            The character code of D, x44 (decimal 68), is added to x80 (128): x44 + x80 = xC4 (68 + 128 = 196). Because Ä is defined in ISO 8859-1, which is the default code page, no \P directive is required.
'\PA\\S\D'        Same as above, but the \PA\ directive at the beginning of the string explicitly states that the value xC4 (196) is taken from ISO 8859-1.
'\X\C4'           Character code xC4 as an 8-bit character code in ISO 10646 (the first 256 characters, codes 0 to 255, also referred to as “row 0”).
'\X2\00C4\X0\'    Character code xC4 as the 16-bit character x00C4 in ISO 10646 (Unicode).

 
