
Unicode followup

A question was raised about what the new implementation of Unicode support means for developers who create Clarion add-on products:

Does having THREE encoding options mean every 3rd party product will need to include 3 DLL/DLL and 3 LIB/LIB libraries?

And the short answer is NO, no need for that.  Read on for more details.

The screenshot in the post “new Unicode implementation” shows three Project Setting encoding options: “ANSI, UTF-8, Unicode”. These project-level encoding settings are for the compiler, which needs to know how to treat string literals. The compiler uses the following rules to determine how to handle string literals:

  • If the source file is encoded as ANSI, string literals without the U specifier before the apostrophe are taken as is. Unicode string literals (with U before the apostrophe) are converted by the compiler to Unicode using the codepage value set by the pragma define(codepage=>n).
  • If the source file has UTF-8 or UTF-16 encoding, Unicode string literals are taken as is. ANSI string literals (without U before the apostrophe) are converted by the compiler to ANSI using the codepage value set by the pragma define(codepage=>n).
  • The default codepage used by the compiler for ANSI<->Unicode conversions (when not specified by the pragma define(codepage=>n)) is CP_ACP.
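
The rules above can be modeled in a few lines. This is an illustrative Python sketch, not Clarion compiler code: the function name `compile_literal`, its parameters, and the use of cp1252 as a stand-in for CP_ACP are all assumptions made for the example.

```python
# Illustrative model of the literal-handling rules; NOT the Clarion compiler.
# Assumption: cp1252 stands in for CP_ACP (the real value depends on the system).
DEFAULT_CODEPAGE = "cp1252"


def compile_literal(raw: bytes, source_encoding: str, has_u_prefix: bool,
                    codepage: str = DEFAULT_CODEPAGE):
    """Return (kind, value): kind is 'ansi' (bytes) or 'unicode' (str).

    raw             -- the literal's bytes exactly as stored in the source file
    source_encoding -- 'ansi', 'utf-8' or 'utf-16' (hypothetical labels)
    has_u_prefix    -- True if the literal has U before the apostrophe
    codepage        -- models the pragma define(codepage=>n)
    """
    if source_encoding == "ansi":
        if has_u_prefix:
            # ANSI source, Unicode literal: convert using the codepage.
            return "unicode", raw.decode(codepage)
        # ANSI source, ANSI literal: taken as is.
        return "ansi", raw

    # UTF-8 or UTF-16 source: the file content is already Unicode text.
    text = raw.decode(source_encoding)
    if has_u_prefix:
        # Unicode literal: taken as is.
        return "unicode", text
    # ANSI literal: convert to ANSI using the codepage.
    return "ansi", text.encode(codepage)


# The same byte means different characters under different codepages.
print(compile_literal(b"\xe0", "ansi", True, "cp1252"))
print(compile_literal(b"\xe0", "ansi", True, "cp1251"))
```

The two calls at the end show why the codepage matters for the conversion: a single ANSI byte maps to a Latin letter under one codepage and to a Cyrillic letter under another.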

Clarion 11 – new Unicode implementation

The C11 RTL implements new internationalization code based entirely on the OS locale and codepage settings. All windows and controls in the new RTL are created and processed using the Unicode variant of Windows API functions. All text drawing also uses Unicode. C11 introduces the new USTRING data type (Unicode analog of CSTRING) and adds official support for the BSTRING data type.

The new internationalization code in the RTL supports conversion between ANSI and Unicode strings on the basis of the system codepage and locale. There are also two new built-in functions TOANSI and TOUNICODE that allow conversions that are not based on the current codepage.
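
To illustrate why conversions that do not depend on the current codepage are useful, here is a small Python sketch. The helper names `to_ansi` and `to_unicode` are hypothetical and only mirror the idea behind TOANSI and TOUNICODE; the real Clarion signatures are not given in this post.

```python
# Hypothetical helpers that mirror the idea of TOANSI / TOUNICODE:
# the caller names the codepage instead of relying on the system default.

def to_ansi(text: str, codepage: str) -> bytes:
    """Convert Unicode text to ANSI bytes in the given codepage."""
    return text.encode(codepage)


def to_unicode(data: bytes, codepage: str) -> str:
    """Convert ANSI bytes in the given codepage to Unicode text."""
    return data.decode(codepage)


russian = "Привет"
greek = "Γειά"

# Each string is stored as ANSI in its own codepage...
ru_bytes = to_ansi(russian, "cp1251")
gr_bytes = to_ansi(greek, "cp1253")

# ...and converts back correctly when the same codepage is named.
print(to_unicode(ru_bytes, "cp1251"))
print(to_unicode(gr_bytes, "cp1253"))

# Decoding with a different codepage silently yields the wrong text.
print(to_unicode(ru_bytes, "cp1252"))
```

With an explicit codepage on every call, one program can handle text from several charsets regardless of which codepage the system is set to.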

There is a new Project level setting to tell the compiler what encoding to expect: ANSI, UTF-8, or Unicode.

While Clarion has supported Unicode for a long time, you were limited to the system locale setting in the “Regional and Language Options”. C11 allows as many different charsets as you need. The test program for this post mixes several charsets and uses the “U” specifier to tell the compiler that a static string is Unicode text (the program also uses the TOUNICODE function).

[Image: test program]

[Image: output of running the test program]


For more on the technical details read on:
The string stack supports Unicode strings; both ANSI and Unicode strings are handled by the same string stack. If a string expression has the USTRING or BSTRING type (a USTRING or BSTRING variable, a Unicode string literal, the result of a function returning the *BSTRING, *USTRING or USTRING type, or any concatenation in which at least one operand is a Unicode expression), then the corresponding element of the string stack is processed as Unicode.

If none of the above applies, the string is assumed to be an ANSI string (with optimization for numbers). The LEN() function is now a compiler intrinsic. It returns the number of wide chars in the top element’s value if it is Unicode, or the number of ANSI characters if the top element is ANSI.
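
The promotion rule and the two ways LEN() counts can be pictured with a toy model. This Python sketch is not the RTL's string stack: the `StackItem` class, the `concat` and `length` helpers, the cp1252 codepage, and the reading of "wide chars" as UTF-16 code units are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumption: cp1252 stands in for the current system codepage.
CODEPAGE = "cp1252"


@dataclass
class StackItem:
    """Toy string-stack element: Unicode text (str) or ANSI bytes."""
    is_unicode: bool
    value: object  # str when is_unicode is True, bytes otherwise


def concat(a: StackItem, b: StackItem) -> StackItem:
    """Concatenate; the result is Unicode if at least one operand is Unicode."""
    if a.is_unicode or b.is_unicode:
        left = a.value if a.is_unicode else a.value.decode(CODEPAGE)
        right = b.value if b.is_unicode else b.value.decode(CODEPAGE)
        return StackItem(True, left + right)
    return StackItem(False, a.value + b.value)


def length(item: StackItem) -> int:
    """Wide chars (assumed UTF-16 code units) for Unicode, bytes for ANSI."""
    if item.is_unicode:
        return len(item.value.encode("utf-16-le")) // 2
    return len(item.value)


ansi = StackItem(False, b"abc")
uni = StackItem(True, "дом")

mixed = concat(ansi, uni)
print(mixed.is_unicode, mixed.value, length(mixed))
```

Under the UTF-16 assumption, a character outside the Basic Multilingual Plane counts as two wide chars, so the length of a Unicode string is not always the number of visible characters.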

* BSTRINGs were designed for use in API functions; they are not suitable for usage as USE variables.

The previous internationalization settings (CLACASE, CLACOLSEQ and CLADIGRAPH) are still supported, but are considered deprecated.

The LOCALE function in C11 supports the following additional parameters:
1) 'CLALCID=n' or 'CLALCID=Windows' or 'CLALCID="ll-cc"'
Changes the default locale in the Clarion RTL.

If the value is “Windows”, the default Windows user locale is used.
If the parameter has the form “ll-cc”, it can be one of the following:
“EN-US” – USA English, default sorting
“EN-GB” – British English, default sorting
“ES-ES” – Spain Spanish, default sorting
“DE-DE” – Germany German, default sorting
“FR-FR” – France French, default sorting
“IT-IT” – Italy Italian, default sorting
“NL-NL” – Netherlands Dutch, default sorting
“RU-RU” – Russia Russian, default sorting
“ES-MX” – Mexico Spanish, default sorting
“PT-PT” – Portugal Portuguese, default sorting
“EN-AU” – Australia English, default sorting
“FR-CA” – Canadian French, default sorting
“EN-CA” – Canadian English, default sorting
“EN-ZA” – South Africa English, default sorting
“PT-BR” – Brazilian Portuguese, default sorting
“ES-AR” – Argentina Spanish, default sorting
“JA-JP” – Japan Japanese, default sorting
“KO-KR” – Korea Korean, default sorting

The locale string settings are case insensitive.
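
As a way to picture the three accepted forms of the CLALCID value, here is a small Python sketch. It is not RTL code: the function name `classify_clalcid` is hypothetical, and it assumes that `n` is a decimal number and that the case-insensitivity also applies to the word Windows.

```python
# Locale names listed above, stored in upper case for case-insensitive lookup.
KNOWN_LOCALES = {
    "EN-US", "EN-GB", "ES-ES", "DE-DE", "FR-FR", "IT-IT", "NL-NL", "RU-RU",
    "ES-MX", "PT-PT", "EN-AU", "FR-CA", "EN-CA", "EN-ZA", "PT-BR", "ES-AR",
    "JA-JP", "KO-KR",
}


def classify_clalcid(value: str):
    """Classify the text after 'CLALCID=' into one of the three forms.

    Returns ('lcid', int), ('windows', None) or ('name', 'LL-CC').
    Raises ValueError for anything else.
    """
    v = value.strip()
    if v.isdigit():
        # Assumption: n is a decimal number.
        return ("lcid", int(v))
    if v.lower() == "windows":
        return ("windows", None)
    if len(v) >= 2 and v[0] == '"' and v[-1] == '"':
        name = v[1:-1].upper()  # locale names are case insensitive
        if name in KNOWN_LOCALES:
            return ("name", name)
    raise ValueError(f"unsupported CLALCID value: {value!r}")


print(classify_clalcid('"de-de"'))
print(classify_clalcid("Windows"))
```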

2) 'CLACODEPAGE=n' or 'CLACODEPAGE=Windows' or a codepage name enclosed in double quotes

Changes the default codepage in the Clarion RTL. If the value is Windows, the current default Windows user codepage is used. If the parameter is a string enclosed in double quotes, it must be one of the following (case insensitive):
– “THAI”
– “UTF7”
– “UTF8”

3) 'CLADOWNAME=s' or 'CLADOWNAME=Windows'
Changes the default full names of the days of the week. If the parameter is “Windows”, the names of the days of the week from the default locale are used. Otherwise the parameter must be a list (enclosed in double quotes) of names to use.

4) 'CLADOW=s' or 'CLADOW=Windows'
Changes the default abbreviations of the days of the week. If the parameter is “Windows”, the abbreviations from the default locale are used. Otherwise the parameter must be a list (enclosed in double quotes) of abbreviations to use.

CLALCID and CLACODEPAGE (or SYSTEM{PROP:Locale} and SYSTEM{PROP:Codepage}) are replacements for the CLASYSTEMCHARSET, CLACASE, CLACOLSEQ and CLADIGRAPH parameters of the LOCALE/ENV files and for the corresponding SYSTEM properties. The old parameters/properties are still supported, but locale and codepage are preferable, and all new programs should use them.