Sorting in PeopleTools

This chapter provides an overview of sort orders and discusses how to:

Set the sort order.
Force a binary sort in Structured Query Language (SQL).
Sort in COBOL.
Sort in Structured Query Reports (SQR).

Understanding Sort Orders

This section discusses:

Sorting overview.
PeopleTools sorts.
Database-level SQL ORDER BY sorts.
PeopleTools in-memory sorts.
Binary sorts.

Sorting Overview

Sorting data in English is reasonably simple given the well-defined sorting rules of the language. Additionally, most character sets are based on the ASCII standard, which allocates characters to numerical codes in English alphabetical order. Therefore, when sorting ASCII data by its binary representation, you automatically get a sort that makes sense in English; data is sorted from A to Z and numerics are sorted from 0 to 9.

However, sorting non-English languages is significantly more complex; some languages have special rules for sorting characters with diacritic marks; others, such as Japanese and Thai, can be sorted in several different orders depending on the usage or context of the sort.

In linguistic circles, sorting is also known as collation. In this book, the two terms are used interchangeably.

A sort order identifies how PeopleTools assembles, compares, and displays data. For example, a sort order specifies whether A is less than, equal to, or greater than Z. The simplest way of sorting data in a computer system is to sort it in the order that the characters appear in the character set. This is known as a binary sort, because it sorts the numerical codes of each character as they are stored in memory without any special sorting cases or linguistic considerations. A binary sort works well for sorting English language data; English sorting rules can be implemented as a binary sort as long as the underlying character set is laid out from A to Z. English characters in US-ASCII, EBCDIC, and Unicode are all laid out in this fashion, so a binary sort on data in any of these character sets is sufficient for sorting English data.

However, when sorting data in languages other than English, you must consider how to sort:

Characters with diacritic marks (such as á, ñ, and ö): after the letter Z, or after the base character without the diacritic.

In most character sets, these characters appear after the letter Z in binary order; however, in most languages they sort after the base form of the character.
Special characters and ligatures (such as æ and œ) and representative characters (such as ß) in some European languages.

In most cases, these characters must be expanded to their full form before being sorted. For example, æ is often expanded to ae when sorted.
Ideographic languages.

The Chinese, Japanese, and Korean languages use a large repertoire of characters in their written languages, such that it would be impossible for the average person to remember an arbitrary sort order. Instead, several schemes exist for sorting Chinese, Japanese, and Korean characters, including sorting based on core, common parts of each character (radicals), or by counting the number of brush or pen strokes that it takes to write the character (stroke count).

Unfortunately, most of these sorting schemes are language-specific and sometimes even specific to a country in which a language is spoken. For example, the German sharp-S character (ß) is sorted in Germany as if it were written as SS, but in Austria it is sorted as if it were written as SZ. Other times, multiple sort orders can be in use in a single country. In Spain, it is common to sort the ch character sequence after cz but before d. However, in more recent times many Spanish organizations have reverted to sorting the ch sequence as individual characters between cg and ci. Which is correct depends on personal or organizational preference.

In the majority of cases where data is sorted in PeopleTools, the host database management system performs the sort through an ORDER BY clause in SQL, and the result is displayed directly to the user. It is therefore critical, when you create a database, that you select an appropriate sort order based on the languages that you plan to maintain in the database, the countries in which you plan to operate, and any specific preferences or policies that your organization maintains relating to sorting (such as which Spanish ch sort is preferred). Most database systems require you to choose a sort order during database creation, because it affects the way that SQL indexes are stored on disk to optimize sorting performance.

Note. Due to limitations in most database packages and for performance reasons, PeopleTools supports only one sort order per database.

In some cases, PeopleTools also sorts data in memory and must use internal tables to determine the appropriate order for character data. This is particularly prevalent when sorting drop-down lists on pages displaying translate values and within PeopleSoft Query. It is therefore important that you not only create the database with the appropriate sort order, but you also tell PeopleTools which sort order you have chosen for the database so it can emulate this sort for lists that it sorts in memory.

Note. Sorting in PeopleTools is case sensitive (for example, the lowercase letter a sorts after the uppercase letter Z) and accent sensitive (for example, the accented letter á’ is considered distinct from the unaccented letter a). Sorting in PeopleTools is also kana sensitive for Japanese data, meaning that certain forms of Japanese characters (Hiragana and Katakana) are considered distinct rather than equivalent. PeopleTools does not support case-insensitive, accent-insensitive, or kana-insensitive sorting.

PeopleTools Sorts

Many different components of PeopleTools sort character data, most of which rely on the database system to perform sorting by using a SQL ORDER BY clause. However, PeopleTools does perform some sorting in memory.

See PeopleTools In-Memory Sorts.

Other parts of PeopleTools support only a binary sort for logistical or performance reasons. The following table indicates each of the common areas of PeopleTools that presents sorted lists of data to the user and the mechanism that each uses to perform the sort:

Functional Area	Sorting Engine
Component search dialog boxes.	Database-level SQL ORDER BY sort.
Scroll buffers.	Database-level SQL ORDER BY sort.
SQL operations in PeopleSoft Query, PS/n Vision, Crystal, and SQR.	Database-level SQL ORDER BY sort.
SQL functions that are called from PeopleCode (SQL Objects, ExecSQL) containing BETWEEN, <, >, MIN, MAX, and so on.	Database-level SQL ORDER BY sort.
PeopleCode ScrollSelect( ) / ScrollSelectNew( ), RowScrollSelect( ), and RollScrollSelectNew( ) functions.	Database-level SQL ORDER BY sort.
Column-specific re-sorts in PeopleSoft Pure Internet Architecture (by clicking on a column heading).	PeopleTools in-memory sort.
PeopleCode SortScroll( ).	PeopleTools in-memory sort.
Drop-down list boxes of translate values.	PeopleTools in-memory sort.
PeopleCode binary comparison functions ( <, >, =).	Binary sort.
%BINARYSORT meta-SQL functions.	Binary sort.
Greater than and less than COBOL operations.	Binary sort.
Greater than and less than comparisons in SQR.	Binary sort.

Database-Level SQL ORDER BY Sorts

PeopleTools relies on the sorting capabilities of the host database system for each functional area that is identified in the table in the previous section as using a database-level SQL ORDER BY sort. In this case, the SQL statement that is used to retrieve the data to be displayed to the user is coded to include an ORDER BY statement, and the sort order of the database determines in what order data is retrieved. Obviously, this relies on the database system being correctly configured for the appropriate linguistic sort that you determined is required for your database.

Each database management system has a different mechanism of determining the sort order for SQL ORDER BY statements. For example, Microsoft SQL Server requires the user to select a sort order when the server is installed, or when each database is created. Oracle enables the database administrator to specify the appropriate sort order in the init.ora parameter file.

See Enterprise PeopleTools 8.49 Installation Guide for your database platform.

PeopleTools In-Memory Sorts

To avoid round-trips to the database, PeopleTools performs some sorting in memory, typically of short lists, such as a list of translate values. PeopleTools supports a significant number of linguistic sorts for in-memory sorting. The sort order that is used for PeopleTools in-memory sorting is controlled by the sort order option on the PeopleTools Options page. While PeopleTools cannot emulate exactly each sort order that is offered by your database vendor, it provides an appropriate sort order for most popular business languages. Select the sort order that most closely corresponds to the sort order of your database. Conflicting sort order selection between the database and PeopleTools results in lists of values that are sorted by PeopleTools appearing in a significantly different order than lists of values that are sorted by the database.

See Setting the Sort Order.

Binary Sorts

When comparing values in PeopleCode, SQR, and COBOL that use less than (<), greater than (>), or other character comparison operators, a binary sort is used. The specific sort that is performed depends on the character set of data in memory at the time. For example, all binary sorts that are performed in PeopleCode, SQR syntax, and Unicode COBOL take place based on the Unicode binary order. Binary sorts that are performed in non-Unicode COBOL take place in the non-Unicode character set of the batch server. Binary sorts that are performed by the %BINARYSORT PeopleCode meta-SQL function take place in the character set of the database engine.

Therefore, you should be careful not to write code that depends on the sorting of data by using binary operators matching the sorting of data by the database as the result of sorting operators in SQL statements.

For example, the following PeopleCode statement is performed as a Unicode binary comparison, and á is located in the Unicode tables after the character z, so it always returns True:

if 'z' < 'á' then. . .

However, when the same comparison is run in SQL (as in the following example), the database’s sort order determines which character is greater. In a database that is configured for binary sorting, it returns true; however, in a database that is configured for French sorting (where á is sorted after a but before b), it returns false.

SELECT . . .
WHERE 'z' < 'á'

It is important that these functional areas use binary sorting instead of attempting to perform a linguistically-sensitive sort because:

Any linguistically-sensitive sort that is performed in memory can be only an approximation of the sort that the database system would perform in the same situation given the large number of sort orders that are provided by database vendors and the significant variations to these orders in minor versions of the database software.

Do not assume that a binary sort that is performed in PeopleTools will match the sort of the same characters in a SQL statement that uses the less than, greater than, or BETWEEN operators.
Performing a guaranteed database-compatible sort for each comparison in PeopleCode would require a round-trip to the database to perform the sorting and would affect the performance of PeopleCode operations.

Setting the Sort Order

As noted previously in this chapter, some components of PeopleTools cannot rely on the database to sort data and must do so in memory. The sort order option on the PeopleTools Options page enables you to select which sort order should be used by PeopleTools when sorting data in memory.

Generally, you should set this option soon after you have completed the installation of the database; choose the option that most closely approximates the sort order that you selected when creating the database.

To set the sort order:

Select PeopleTools, Utilities, Administration, PeopleTools Options.
Select an option from the Sort Order Option drop-down list box.
Click Save.

Forcing a Binary Sort in SQL

When writing PeopleCode and other procedural logic, it is sometimes expected that sorting a list of data in memory produces the same results as sorting the same list in the database through a SQL ORDER BY statement. When working in some languages, such as English, whose sorting logic is relatively simple, this equivalence may be taken for granted.

However, when running PeopleTools against a database that is using a linguistic sort, it is likely that a greater than, less than, or between comparison of values in memory may produce different results than if the same comparison were performed by the database.

Take, for instance, the following PeopleCode syntax:

If START_NAME < END_NAME then 
   Error("Start name must be less than end name");
End-If

Because all character comparisons in PeopleCode are performed based on the binary value of the character in Unicode (and not on the linguistic weight of the character), this code may produce unexpected results in languages where binary sorting is not sufficient. For example, if START_NAME had a value of Über and END_NAME had a value of Zeifang, this code produces the error as in the Unicode character set, the uppercase U with umlaut (Ü) appears after the uppercase Z. However, in a German sort, Ü should precede Z. If the database were created with a German sort order, this would be reflected by the database's sort if the same statement were reflected in SQL that is called from PeopleCode:

SQLExec("SELECT 'X' FROM PSLOCK WHERE :1 < :2"
        ,START_NAME, END_NAME, &X);
 
If &X != 'X' then
   Error("Start name must be less than end name");
End-If

The example shows that when you use a linguistically sorted database, these string comparisons may return different results when they are run through the database by using SQL than when they are performed in PeopleCode. Of course numerical values and dates always sort equivalently—this behavior is limited to the sorting of characters and character strings.

In some situations, you may want to disable the linguistic sort that is performed by the database on a statement-by-statement level and have the database perform the comparison at a binary level.

To force a SQL query to return sort-sensitive results based on a binary sort instead of a linguistic sort, a meta-SQL token, %BINARYSORT, is provided. Use this token is to wrap each column in an ORDER BY, less than, greater than, or BETWEEN operation where a binary comparison is required. For example, to return all employee names, ordered by last name in binary ordering, use the following SQL:

SELECT NAME FROM PS_PERSONAL_DATA
ORDER BY %BINARYSORT(NAME)

This may be useful if you are building an ordered array of names in memory that you plan to parse or manipulate with PeopleCode less than or greater than operators.

Similarly, to fetch a list of names from the database where the NAME field is greater than START_NAME by using a binary comparison that is parallel to that used in PeopleTools, use the following syntax:

SELECT NAME FROM PS_PERSONAL_DATA
WHERE %BINARYSORT(NAME) < %BINARYSORT(:1)

Similar constructs can be used with the BETWEEN predicate.

Note. The %BINARYSORT meta-SQL token ensures that the database evaluates the column that is wrapped by the token based on its binary value in the character set of the database. The sorting of this representation matches its binary sort position in PeopleTools only if the character set of the database contains the character in the same order as Unicode, which is used for binary representation of data in PeopleTools memory. Therefore, if you are running a US-ASCII, Latin-1 (ISO 8859-1), or Unicode database, the sorting of all alphabetic characters matches the Unicode sort in memory. However, if you are running a database that is encoded in EBCDIC or in Japanese Shift-JIS, the order may be markedly different, and you cannot rely on the %BINARYSORT meta-SQL token to match the binary order that is used within PeopleTools.

Sorting in COBOL

Character string comparisons that are performed in COBOL are performed based on the binary representation of each character in the character set that is being used by the COBOL program. Typically, this matches the character set of the database. Use the %BINARYSORT meta-SQL token to wrap any ORDER BY clause or string comparison that is performed by the SQL statements that are called by the program that you want to sort the same as it would sort in COBOL memory.

Sorting in SQR

Like PeopleCode, all string comparisons and sorting in SQR are performed based on a Unicode binary sort. However, because SQR does not process PeopleTools meta-SQL tokens, you cannot use %BINARYSORT from SQR. You should make allowances for SQR programs to not depend on having memory sorting match the sort that the database performs in ORDER BY and similar SQL clauses.