OS/2 Warp 4 and up include APIs for Unicode support, referred to as the Universal Language Support (ULS) functions. Unfortunately, they are very poorly documented.

The Warp 4 Toolkit contains cursory INF reference documentation (UNIAPI.INF) for ULS, but it is now severely outdated. The 4.5x Toolkit includes updated documentation in HTML format (UNIAPI.HTM), but it is woefully incomplete, very poorly formatted, and contains some blatant errors.

Consequently, I have embarked upon a series of projects to fix this state of affairs, including updated API reference documentation and an introductory programming guide. Both are available below.

I have also undertaken to create a comprehensive list of all codepages supported by OS/2. These codepages may be used in the ULS API conversion functions. The latest version of this list is available, in various formats, at the bottom of this page.

In addition, I have released a REXX library which provides access to parts of the ULS API through REXX functions.

Finally, I have been working on some sample programs to illustrate how to use parts of the Unicode API.

Updated Programmer's Reference

First, I have undertaken a major revamping of the ULS API reference documentation. My document is based on IBM's UNIAPI.HTM from the 4.52 Toolkit, but with many improvements:

  • The HTML has been completely reformatted to be significantly clearer and more readable. (Compatibility with older web browsers has nonetheless been preserved.)

  • A section describing possible API return codes has been added.

  • Several incorrect, incomplete, poorly-expressed or just plain misleading function descriptions have been fixed. In addition, important missing information has been added (such as the entire function description for UniStrToUcs(), which in the IBM version was actually an erroneous copy-paste of an entirely different function).

  • Various clarifications have been made, and in some cases helpful comments have been added.

  • Some of the sample code has been rewritten or replaced with code that is more illustrative of the function in question (and that actually works).

  • Descriptions of the ULS keyboard functions and data types (which were inexplicably missing from the latest toolkit documentation) have been restored.

This document remains a work in progress, as I continue to find areas for improvement. See below for the change history.

The legal status of this document is a bit muddy. As it is a direct derivation of IBM's own documentation, I do not claim any particular rights over it. As far as I am concerned, it may be freely redistributed and/or modified. IBM's own legal terms regarding modifying or redistributing this documentation are unclear.

Download

You can download the HTML documentation as a ZIP file, or read it online.

Release History

2006-09-29

  • Several important corrections to the descriptions of UniStrToUcs() and UniStrFromUcs().
  • Added more information to UniQueryUconvObject(), including the 4-byte character identifier in the first[] parameter.

2006-09-01

  • Added descriptions of Unicode keyboard functions and data types.
  • Added some missing information on character classification functions and values, and fixed several errors.
  • Fixed incorrect sample code for UniQueryChar().

2006-08-27

  • Fixed several errors in the codepage specifier descriptions for UniCreateUconvObject(), plus a couple of other tweaks.
  • Replaced sample code for UniQueryUconvObject().
  • Moved description of uconv_attribute_t parameters out of UniSetUconvObject() section into the structure description where they belong.
  • Several corrections and improvements to Data Types section.

2006-08-18

  • Improved description and example code for UniDeleteUserLocale().
  • Added several missing LocaleItem values to UniQueryLocaleItem().
  • Some wording improvements.

2006-07-20

  • First public release.

New Programmer's Guide

My other project is to write a programming guide or tutorial document that describes how to properly use the Unicode APIs.

This document is in OS/2 INF format. The IPF source is included, to make it easier for anyone who wants to suggest amendments (or for translation, if there's anyone that ambitious).

This document is entirely my own work, and may be redistributed freely.

Download

Release History

2010-07-01 (v1.4)

  • Added more information about outputting Unicode text under PM and GPI, including a note about the requirement for Unicode fonts, as well as sample code for direct UCS-2 output.
  • Improved and expanded the description of conversion specifier options; in particular, added an explanation of the behaviour of the "path" parameter.
  • Added a section discussing conversion buffer length.
  • Added a note about parameter differences between the conversion functions.
  • Added a note about behaviour when referencing a locale name which exists as both a user and a system locale.
  • Corrected Shequel symbol to New Shequel symbol in footnote for codepage 862.

2007-02-21 (v1.3)

  • Clarified use of the length parameters in the section describing conversion functions.

2006-10-10 (v1.21 / WarpStock 2006 release)

  • Corrected description of codepage 916.

2006-09-29 (v1.2)

  • A few minor corrections.
  • Reworked the codepage listing, including many corrections and improvements.

2006-09-15 (v1.1)

  • Added information about the level of Unicode supported.
  • Corrected description of CHS_HANGUEL attribute.
  • Corrected description of locale item naming conventions.
  • Corrected order of UniStr*Ucs() parameters.
  • Corrected value ranges of UCS planes.
  • Added a section describing character sets supported by the API (Unicode 2.x).
  • Added Appendix B, listing all known OS/2 codepage numbers.
  • Added several items to the references section.
  • Other miscellaneous improvements.

2006-09-01 (v1.0)

  • Initial release.

List of OS/2 Codepages

This table attempts to list every codepage, including numeric aliases, known to modern OS/2 systems. Keep in mind that not all codepages may be available on all systems (depending on the installation options and/or operating system version).

The codepages listed in this table may be used in conjunction with the ULS codepage conversion functions. Many of them are not available for use as system or PM codepages, and will not be listed by the WinQueryCpList() function.
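
To give a rough idea of what a codepage conversion through Unicode looks like, here is a minimal sketch in Python rather than C. It does not call the ULS API at all: Python's built-in codecs stand in for the ULS conversion objects, the codec names (cp850, cp437, cp1252) are Python's own names rather than ULS codepage specifiers, and the helper convert_codepage() is hypothetical. Only the two-step shape carries over: decode the source bytes to Unicode, then encode the result into the target codepage.

```python
def convert_codepage(data: bytes, source_cp: str, target_cp: str) -> bytes:
    """Convert bytes between two codepages, using Unicode as the pivot.

    Assumption: Python codecs are used here purely as a stand-in for the
    OS/2 ULS conversion functions, which this sketch does not call.
    """
    # Step 1: source codepage -> Unicode text.
    text = data.decode(source_cp)
    # Step 2: Unicode text -> target codepage. Characters that the target
    # codepage cannot represent are replaced with "?".
    return text.encode(target_cp, errors="replace")


if __name__ == "__main__":
    dos_bytes = b"caf\x82"  # "café" in codepage 850
    print(convert_codepage(dos_bytes, "cp850", "cp1252"))  # b'caf\xe9'
    print(convert_codepage(dos_bytes, "cp850", "utf-8"))   # b'caf\xc3\xa9'
```

The replacement behaviour in step 2 is a design choice of this sketch; how a real conversion handles unmappable characters depends on the options given to the conversion object.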

Explanation of Fields

Codepage
Lists the codepage number. ("OS2UGL" is a special codepage which has no number, and is thus identified by name.)
Description
A brief description of the codepage, including the language or character set standard(s) covered, and any additional information about the encoding format used.

Codepages prefixed with 'IBM' indicate encodings based on modifications of the standard DOS 8-bit ASCII layout, and are not recommended for cross-platform interchange.

Codepages prefixed with 'ISO' are official ISO standards, and may be used for interchange with other systems.

Compatibility
Indicates the underlying encoding of Latin text on which the codepage is based.
  • "ASCII" indicates a PC codepage compatible with the 7-bit displayable ASCII character set.
  • "EBCDIC" indicates an IBM mainframe (System 370/390/iSeries) codepage based on EBCDIC.
  • "Other" indicates a codepage which is not byte-for-byte compatible with either ASCII or EBCDIC. (By definition this includes all fixed-width double-byte codepages.)
Bytes/Character
Indicates the number of bytes used by the codepage to represent a single character. Fixed-width codepages will show a single integer value; variable-width codepages will show a range.
Process
Indicates whether or not the codepage may be used as an OS/2 process codepage (through the CODEPAGE setting in CONFIG.SYS). This does not take into account the system's COUNTRY setting, which may impose additional restrictions on which codepages may actually be used in this way.
PM
Indicates whether or not the codepage may be used as a display codepage within Presentation Manager (either through WinSetCp() or GPI font attributes).
Notes
Additional information about the codepage.
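
The Compatibility and Bytes/Character fields can be made more concrete with a small experiment. The sketch below is in Python and uses Python's own codecs (cp850, cp037, cp500, cp932, utf-16-be) as stand-ins for OS/2 codepages, which is an assumption; the function names are hypothetical, and the values in the actual table come from the codepage definitions, not from probing like this. In particular, the measured byte range depends entirely on which sample characters are supplied.

```python
import string

# The 7-bit displayable ASCII characters (0x20 through 0x7E).
ASCII_PRINTABLE = "".join(chr(c) for c in range(0x20, 0x7F))
# Letters and digits only; assumed here to be encoded identically
# across EBCDIC codepages, unlike some punctuation.
ALNUM = string.ascii_letters + string.digits


def compatibility(codec: str) -> str:
    """Classify a Python codec as 'ASCII', 'EBCDIC', or 'Other'."""
    try:
        if ASCII_PRINTABLE.encode(codec) == ASCII_PRINTABLE.encode("ascii"):
            return "ASCII"
        if ALNUM.encode(codec) == ALNUM.encode("cp037"):
            return "EBCDIC"
    except UnicodeEncodeError:
        pass  # the codec cannot even represent the probe characters
    return "Other"


def bytes_per_character(codec: str, samples: str):
    """Return (min, max) encoded length over the sample characters,
    or None if none of them can be encoded."""
    widths = set()
    for ch in samples:
        try:
            widths.add(len(ch.encode(codec)))
        except UnicodeEncodeError:
            continue  # skip characters outside this codepage
    if not widths:
        return None
    return min(widths), max(widths)


if __name__ == "__main__":
    for name in ("cp850", "cp037", "utf-16-be"):
        print(name, compatibility(name))
    print(bytes_per_character("cp932", "A\u3042"))
```

A fixed-width codepage yields the same minimum and maximum, while a variable-width one yields a range, which is how the table presents the two cases.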

Download

  • Read online (72 kB HTML document)
  • Download in OpenOffice 1.0 format (11 kB spreadsheet)
  • Download in CSV format (11 kB tab-delimited ASCII file)
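
For anyone who wants to process the tab-delimited version programmatically, the following Python sketch shows one way to load it. Everything about the file layout here is an assumption: that there is a header row, that the columns are named and ordered as in the Explanation of Fields above, and that the flag columns hold "Yes" or "No". The two sample rows are placeholders written for this example and are not copied from the actual list; load_codepages() is a hypothetical helper.

```python
import csv
import io

# Placeholder data for illustration only. The header names, the Yes/No
# flag values, and both rows are assumptions, NOT the real file contents.
SAMPLE = (
    "Codepage\tDescription\tCompatibility\tBytes/Character\tProcess\tPM\tNotes\n"
    "850\tPlaceholder single-byte entry\tASCII\t1\tYes\tYes\t\n"
    "1200\tPlaceholder double-byte entry\tOther\t2\tNo\tNo\t\n"
)


def load_codepages(text: str) -> dict:
    """Parse tab-delimited text into a dict keyed by codepage identifier.

    Keys are kept as strings because at least one entry ("OS2UGL")
    is identified by name rather than by number.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Codepage"]: row for row in reader}


if __name__ == "__main__":
    table = load_codepages(SAMPLE)
    ascii_based = [cp for cp, row in table.items()
                   if row["Compatibility"] == "ASCII"]
    print(ascii_based)
```

To read the real file, pass its contents to load_codepages() and adjust the column names to whatever the header row actually contains.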

REXX Universal Language Support library — RXULS.DLL

REXX Universal Language Support (RxULS) provides a REXX interface to selected parts of the OS/2 Universal Language Support API (ULS).

Using RxULS, it becomes possible to do the following from REXX:

  • Search or transform text strings according to locale-specific rules.
  • Query locale information.
  • Convert text strings from one codepage to another, including to or from Unicode encodings such as UTF-8 and UCS-2.
  • Access Unicode-formatted clipboard text.

See the documentation for details.

Files

rxuls_052.zip RXULS.DLL with documentation, examples, and source.

Sample Programs

CPMAP
CPMAP is a simple program that can display a complete character map of any OS/2 codepage (even those which are not available for use as system or PM codepages).

I originally wrote this program to help me create the list of codepages (above). However, it serves as a useful illustration of some of the Unicode APIs. (It can also be handy for debugging fonts.)

In some ways, this program is similar to Ken Borgendale's ShowCP program, although it evolved quite independently. Unlike ShowCP, CPMAP can display any installed codepage, not just a selected few. (Conversely, though, it lacks both print support and a glyph-details mode.)

For best results, CPMAP should be used in conjunction with a Unicode outline font. It defaults to using Times New Roman MT 30, which is included in recent versions of OS/2 (Warp Server for e-business and later), and is also available with some versions of Java 1.1.8.

CPMAP is made available under a BSD-style license; see the documentation for details.

Files

cpmap_11.zip Program and source code included.

Unicode Clipboard Demonstration
This is a very simple program that demonstrates how to implement support for the "text/unicode" clipboard format used by the Mozilla family of applications.

The user interface consists only of an MLE (editor) control. If you paste text that was copied from Mozilla (or some other program that supports the same format) into the MLE, it will be converted from UCS-2 (Unicode) into the current codepage. Conversely, text which you copy from the MLE will be converted from the current codepage into UCS-2 format.

No other functionality (loading, saving, printing, etc.) is supported. This program is intended to demonstrate clipboard support, and nothing more.

The source code is included, along with a Makefile for the IBM C Compiler (version 3.x). It may be considered public domain code, and may be freely used for any purpose, commercial or otherwise.
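
The two conversions that the demonstration performs on paste and copy can be sketched as follows. This is an illustration in Python, not an excerpt from the program's C source, and it rests on several assumptions: that the clipboard data is little-endian UCS-2 terminated by a null character, that codepage 850 is the current codepage, and that unconvertible characters should become "?". Both function names are hypothetical, and Python's utf-16-le codec is used as an approximation of UCS-2.

```python
def paste_from_unicode(ucs2_data: bytes, codepage: str = "cp850") -> bytes:
    """Convert UCS-2 clipboard data into text in the given codepage."""
    # Assumption: little-endian UCS-2; utf-16-le is a close stand-in.
    text = ucs2_data.decode("utf-16-le")
    # Assumption: the data is null-terminated; keep only what precedes
    # the first null character.
    text = text.split("\x00", 1)[0]
    # Characters missing from the codepage are replaced with "?".
    return text.encode(codepage, errors="replace")


def copy_to_unicode(cp_data: bytes, codepage: str = "cp850") -> bytes:
    """Convert codepage text into null-terminated UCS-2 clipboard data."""
    text = cp_data.decode(codepage)
    return (text + "\x00").encode("utf-16-le")


if __name__ == "__main__":
    clip = copy_to_unicode(b"caf\x82")  # "café" in codepage 850
    print(clip)
    print(paste_from_unicode(clip))
```

A round trip through both functions returns the original bytes whenever every character exists in the chosen codepage; text containing other characters loses them on paste.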

Files

clipuni.zip

I have a few other programs that use the ULS APIs: