%Paradigm: Sorting. Aug 1995, revised Feb 1996 %C.G. van der laan, cgl@rc.service.rig.nl \input blue.tex \loadtocmacros \loadindexmacros %necessary for sorting examples \tolerance500\hbadness=499\hfuzz=5pt \bluetitle Paradigms: Sorting \blueissue \maps{96}2 \everyverbatim{\emc} \def\on{\bgroup\afterassignment\doold\count0=} \def\doold{\oldstyle\the\count0\egroup} \beginscript \bluehead BLUe's Design IX Hi folks. A strong and {\em unique\/} point of BLUe's format system is its indexing on the fly. Be it for a total document or just for a chapter. One of the requisites for indexing on the fly is the possibility to sort within \TeX. Sorting has always been an important topic in computer science. In \TeX{} I needed sorting on several occasions especially for sorting numbers such as citation lists, words such as addresses, and index entries. This note is devoted to paradigms encountered while implementing and applying sorting in \TeX. Sorting can be characterized by \item- the set to be sorted (numbers, word. etc.) \item- the addressing of elements of the set \item- the ordering for the set \item- the comparison operation, and \item- the exchange operation. \smallbreak To do some sorting of your own please load from \bluetex{} the index macros via \cs{loadindexmacros}. Below parts have been extracted from that collection of macros to make this note as intelligible as possible. \cs{ea} is my shortcut for \cs{expandafter}. \bluehead Linear sorting A simple sorting method is repeatedly searching for the smallest element. In the example below the set is defined as a def with list element tag |\\|. \thisverbatim{\unmc} \beginverbatim \def\lst{\\\ia\\\ib\\\ic} \def\ia{314} \def\ib{1} \def\ic{27} % \def\dblbsl#1{\ifnum#1<\min\let\min=#1\fi} % \loop\ifx\empty\lst\expandafter\break\fi \def\\{\let\\=\dblbsl\let\min=} %space \lst%find minimum \min%typeset minimum {\def\\#1{\ifx#1\min \else\nx\\% \nx#1\fi}\xdef\lst{\lst}}% \pool%Inspired upon van der Goot's %Midnight macros. \def\loop#1\pool{#1\loop#1\pool} \def\break#1\pool{} !endverbatim The coding implements the looping of the basic steps \item- find minimum (via \cs{lst}, and suitable definition of \DeK's list element tag |\\|) \item- typeset minimum (via \cs{min}) \item- delete minimum from the list (via another appropriate definition of the list element tag. \smallbreak Remark. The kludge for using \cs{ifx} instead of \cs{ifnum} in the deletion part is necessary because \TeX{} inserts a \cs{relax}. \bluehead Sorting in an array If we adopt array addressing in \TeX{} for the elements to be sorted then we can implement bubble sort in \TeX{} too.\ftn {The above example of linear sorting can be seen as sorting in a so-called associative array.} \bluesubhead Array addressing When we think of associating values to (index) numbers\Dash |1| $\rightarrow$ |\value{1}| \Dash then we are talking about an array. A mapping of the \IN{} on \dots for example \IN. The \cs{value} control sequence can be implemented as follows. \beginverbatim \def\value#1{\csname#1\endcsname} !endverbatim The writing to the array elements can be done via \beginverbatim \def\1{} \def\2{}... !endverbatim In general this must be done via \beginverbatim \ea\def\csname\endcsname{} !endverbatim \bluesubsubhead To get the hang of it The reader must be aware of the differences between \item- the index number, $\langle k\rangle$ \item- the counter variable \cs{k}, with the value $\langle k\rangle$ as index number \item- the control sequences |\|$, k=1, 2, \dots, n$, with as replacement texts the items to be sorted. \smallbreak When we have |\def\3{4}| |\def\4{5}| |\def\5{6}| then \\ \def\3{4}\def\4{5}\def\5{6} |\3| yields {\bf\3}, \\ |\csname\3\endcsname| yields {\bf\csname\3\endcsname}, and \\ |\csname\csname\3\endcsname\endcsname| yields {\bf\csname\csname\3\endcsname\endcsname}. Similarly, when we have\\ \cs{k3} |\def\3{name}| |\def\name{action}| then \\ \def\3{name}\def\name{action}\k=3{} |\the\k| yields {\bf\the\k}, \\ |\csname\the\k\endcsname| yields {\bf\csname\the\k\endcsname}, and\\ |\csname\csname\the\k\endcsname\endcsname| yields {\bf\csname\csname\the\k\endcsname\endcsname}.\ftn{Confusing, but powerful.} To exercise shortcut notation the last can be denoted by |\value{\value{\the\k}}|. Another \cs{csname...} will execute \cs{action}, which can be whatever you provided as replacement text.\ftn {My other uses of the \cs{csname} construction are: to let \TeX{} accept an outer defined macro name in a replacement text, to check whether a name has already been defined, and to mimic a switch selector.} \bluehead Bubble sort This process looks repeatedly for the biggest element which is swapped to the end. This is done for the complete array, the array of size $n-1$ et cetera. The pseudo code reads as follows. \beginpascal for n:= upb downto 2 do begin for k:= n-1 downto 1 do if a[n]. %Result: \1<=\2<=...<=\. {\loop\ifnum1<\n{\k\n \loop\ifnum1<\k \advance\k-1 \cmp{\deref\k}{\deref\n}% \ifnum\status=1 \xch\k\n\fi \repeat}\advance\n-1 \repeat}}%end \bubblesort %with auxiliaries \def\deref#1{\csname\the#1\endcsname} \let\cmp\cmpn %from blue.tex or provide %\def\cmp#1#2{%Comparison. Yields % \status=0, 1, 2 for =, >, < %{...} %\def\xch#1#2{%exchange %#1, #2 counter variables %{...} !endverbatim \bluehead Heap sort We can organize the array as a heap. A heap is an ordered tree. Loosely speaking for each node the siblings are smaller or equal than the node. The process consists of two main steps \item- creation of a heap \item- sorting the heap \smallbreak with a sift operation to be used in both. In comparison with my earlier release of the code in \maps{92}2, I adapted the notation with respect to sorting in {\em non-decreasing\/} order.\ftn {It is true that the reverse of the comparison operation would do, but it seemed more consistent to me to adapt the notation of the heap concept with the smallest elements at the bottom.} What is a heap? A sequence $a_1, a_2, \dots, a_n$, is a heap if $a_k\ge a_{2k} \wedge a_k\ge a_{2k+1}, k=1, 2, \dots, n\div2$, and because $a_{n+1}$ is undefined, the notation is simplified by defining $a_k>a_{n+1}, k= 1, 2, \dots , n$. \\ A tree and one of its heap representations of $2, 6, 7, 1, 3, 4$ read $$\thisbintree{\tophns10ex} \beginbintree{00}2{11}6{12}7{21}1{22}3{23}4 2\endbintree \kern-4ex\raise13ex\hbox{$\buildrel heap\over \longrightarrow$} \thisbintree{\tophns10ex}\kern-4ex \beginbintree{00}7{11}6{12}4{21}3{22}2{23}1 2\endbintree$$ In PASCAL-like notation the algoritm, for sorting the array a[1:n], reads {\parindent0pt \beginpascal (*heap creation*) l := n div 2 + 1; while l <> 1 do begin l := l-1; sift(a, l, n) end; (*sorting*) r := n; while r <> 1 do begin swap(a[1], a[r]); r := r-1; sift(a, 1, r) end; (*sift arg1 through arg2*) j:= arg1; while 2j >= arg2 and (a[j] < a[2j] or a[j] < a[2j+1]) do begin mi := 2j + if a[2j] > a[2j+1] then 0 else 1; exchange(a[j], a[mi]); j := mi end; \endpascal \smallskip} \bluesubhead Purpose Sorting values given in an array. \bluesubhead Input The values are stored in the control sequences \cs{1}, \dots, |\|. The counter |\n| must contain the value $\langle n\rangle$. The parameter for comparison, \cs{cmp}, must be \cs{let}-equal to \item- \cs{cmpn}, for numerical comparison, \item- \cs{cmpw}, for word comparison, \item- \cs{cmpaw}, for word comparison obeying the ASCII ordering, or \item- a comparison macro of your own. \smallbreak \bluesubhead Output The sorted array \cs{1}, \cs{2}, \dots |\|, with \\ \cs{value1} $\le$ \cs{value2} $\le$ \dots $\le$ \cs{value}$\langle n\rangle$. \bluesubhead Source \thisverbatim{\unmc} \beginverbatim %Non-descending sorting \def\heapsort{%data in \1 to \n \r\n\heap\ic1 {\loop\ifnum1<\r\xch\ic\r \advance\r-1 \sift\ic\r \repeat}} % \def\heap{%Transform \1..\n into heap \lc\n\divide\lc2{}\advance\lc1 {\loop\ifnum1<\lc\advance\lc-1 \sift\lc\n\repeat}} % \def\sift#1#2{%#1, #2 counter variables \jj#1\uone#2\advance\uone1 \goontrue {\loop\jc\jj \advance\jj\jj \ifnum\jj<\uone \jjone\jj \advance\jjone1 \ifnum\jj<#2 \cmpval\jj\jjone \ifnum2=\status\jj\jjone\fi\fi \cmpval\jc\jj\ifnum2>\status\goonfalse\fi \else\goonfalse\fi \ifgoon\xch\jc\jj\repeat}} % \def\cmpval#1#2{%#1, #2 counter variables %Result: \status= 0, 1, 2 if %values pointed by % #1 =, >, < #2 \ea\let\ea\aone\csname\the#1\endcsname \ea\let\ea\atwo\csname\the#2\endcsname \cmp\aone\atwo} % \def\cmpn#1#2{%#1, #2 must expand into %numbers %Result: \status= 0, 1, 2 if % \val{#1} =, >, < \val{#2}. \ifnum#1=#2\global\status0 \else \ifnum#1>#2\global\status1 \else \global\status2 \fi\fi} % \def\xch#1#2{%#1, #2 counter variables \edef\aux{\csname\the#1\endcsname}\ea \xdef\csname\the#1\endcsname{\csname \the#2\endcsname}\ea \xdef\csname\the#2\endcsname{\aux}}. %with auxiliaries \newcount\n\newcount\lc\newcount\r \newcount\ic\newcount\uone \newcount\jc\newcount\jj\newcount\jjone \newif\ifgoon !endverbatim Explanation. \item{}\cs{heapsort} The values given in \cs{1},\dots|\|, are sorted in non-descending order. \item{}\cs{heap} The values given in \cs{1},\dots|\|, are rearranged into a heap. \item{}\cs{sift} The first element denoted by the first (counter) argument has disturbed the heap. Sift rearranges the part of the array denoted by its two arguments, such that the heap property holds again. \item{}\cs{cmpval} The values denoted by the counter values, supplied as arguments, are compared. \smallbreak \blueexample Numbers, words \cs{cmpn}, and \cs{cmpw} stand for compare numbers and words. \cs{prtn}, and \cs{prtw} stand for print numbers and words, and work the way you expect. \cs{accdef} takes care that accents are properly defined. \beginverbatim \def\1{314}\def\2{1}\def\3{27}\n3 \let\cmp\cmpn\heapsort \beginquote\prtn,\endquote % \def\1{ab}\def\2{c}\def\3{aa}\n3 \let\cmp\cmpaw\heapsort \beginquote\prtw,\endquote and \def\1{j\ij}\def\2{ge\"urm}\def\3{gar\c con} \def\4{\'el\`eve}\n4 \let\cmp\cmpw {\accdef\heapsort} \beginquote\prtw\endquote !endverbatim yields \def\1{314}\def\2{1}\def\3{27}\n=3 {\let\cmp\cmpn\heapsort \beginquote\prtn,\endquote % \def\1{ab}\def\2{c}\def\3{aa}\n=3 \let\cmp\cmpaw\heapsort \beginquote\prtw,\endquote and \def\1{j\ij}\def\2{ge\"urm}\def\3{gar\c con}\def\4{\'el\`eve}\n=4 \let\cmp=\cmpw{\accdef\heapsort} \beginquote\prtw.\endquote } \bluehead Quick sort The quick sort algorithm has been discussed in many places, The following code is borrowed from Bentley.\ftn{Programming Pearls, Addison-Wesley. It contains also diagrams which keep track of the invariants.} \beginpascal procedure QSort(low,up); if low=T*) for I:=low+1 to up do if X[I]|, \dots, |\|. \bluesubhead Input The values are stored in |\|, \dots, |\|, with $1\le low\le up\le n$. The parameter for comparison, \cs{cmp}, must be \cs{let}-equal to \item- \cs{cmpn}, for number comparison, \item- \cs{cmpw}, for word comparison, \item- \cs{cmpaw},for word comparison obeying the ASCII ordering, or \item- a comparison macro of your own. \smallbreak \bluesubhead Output The sorted array |\|, \dots, |\|, with \\ \cs{va}$\langle low\rangle \le \dots \le{}$ \cs{val}$\langle up\rangle$. \bluesubhead Source \thisverbatim{\unmc} \beginverbatim \def\quicksort{%Values given in %\low,...,\up are sorted, non-descending. %Parameters: \cmp, comparison. \ifnum\low<\up\else\brk\fi %\refval, a reference value selected %at random. \m\up\advance\m-\low%Size-1 of array part \ifnum10<\m\rnd\multiply\m\rndval \divide\m99 \advance\m\low \xch\low\m \fi \ea\let\ea\refval\csname\the\low\endcsname \m\low\k\low\let\refvalcop\refval {\loop\ifnum\k<\up\advance\k1 \ea\let\ea\oneqs\csname\the\k\endcsname \cmp\refval\oneqs\ifnum1=\status \global\advance\m1 \xch\m\k\fi \let\refval\refvalcop \repeat}\xch\low\m {\up\m\advance\up-1 \quicksort}% \low\m\advance\low1 \quicksort} % \def\brk#1\quicksort{\fi} !endverbatim Explanation. At each level the array is partitioned into two parts. After partitioning the left part contains values less than the reference value and the right part contains values greater than or equal to the reference value. Each part is again partitioned via a recursive call of the macro. The array is sorted when all parts are partitioned. In the \TeX{} coding the reference value as estimate for the mean value is determined via a random selection of one of the elements.\ftn {If the array is big enough. I chose rather arbitrarily \on10{} as threshold.} Reid's \cs{rnd} has been used. The random number is mapped into the range [$\,low:up\,$], via the linear transformation $\hbox{\cs{low}}+(\hbox{\cs{up}}-\hbox{\cs{low}})* \hbox{\cs{rndval}}/99$.\ftn {Note that the number is guaranteed within the range.} The termination of the recursion is coded in a \TeX{} peculiar way. First, I coded the infinite loop. Then I inserted the condition for termination with the \cs{fi} on the same line, and not enclosing the main part of the macro. On termination the invocation \cs{brk} gobbles up all the tokens at that level to the end, to its separator \cs{quicksort}, and inserts its replacement text, a new \cs{fi}, to compensate for the gobbled \cs{fi}. \bluesubhead Auxiliaries Sorting is parameterized by comparison and exchanging. Also needed is a random number generator. The latter is not supplied here. \thisverbatim{\unmc} \beginverbatim \def\cmpn#1#2{%#1, #2 must expand into %numbers %Result: \status= 0, 1, 2 if % \val{#1} =, >, < \val{#2}. \ifnum#1=#2\global\status0 \else \ifnum#1>#2\global\status1 \else \global\status2 \fi\fi} % \def\xch#1#2{%#1, #2 counter variables \edef\aux{\csname\the#1\endcsname}\ea \xdef\csname\the#1\endcsname{\csname \the#2\endcsname}\ea \xdef\csname\the#2\endcsname{\aux}} !endverbatim \bluesubhead Ordering The ordering is parameterized in the ordering table. \blueexample Numbers, words \cs{cmpn}, and \cs{cmpw} stand for compare numbers and words. \cs{prtn}, and \cs{prtw} stand for print numbers and words, and work the way you expect. \cs{accdef} takes care that accents are properly defined. \beginverbatim \def\1{314}\def\2{1}\def\3{27}\n3 \low1\up\n\let\cmp\cmpn \quicksort \beginquote\prtn,\endquote % \def\1{ab}\def\2{c}\def\3{aa} \def\4{\ij}\def\5{ik}\def\6{z}\def\7{a}\n7 \low1\up\n\let\cmp\cmpw \quicksort \beginquote\prtw,\endquote and \def\1{j\ij}\def\2{ge\"urm}\def\3{gar\c con} \def\4{\'el\`eve}\n4 \low1\up\n\let\cmp\cmpw {\accdef\quicksort} \beginquote\prtw.\endquote !endverbatim yields \def\1{314}\def\2{1}\def\3{27}\n3 {\low1\up\n\let\cmp\cmpn \quicksort \beginquote\prtn,\endquote % \def\1{ab}\def\2{c}\def\3{aa} \def\4{\ij}\def\5{ik}\def\6{z}\def\7{a}\n7 \low1\up\n\let\cmp\cmpw \quicksort \beginquote\prtw,\endquote and \def\1{j\ij}\def\2{ge\"urm}\def\3{gar\c con} \def\4{\'el\`eve}\n=4 \low=1\up=\n\let\cmp=\cmpw {\accdef\quicksort} \beginquote\prtw.\endquote } \bluehead Use I needed sorting within \TeX{} for indexing and for sorting address labels. \bluesubhead Sorting address labels Suppose we wish to sort addresses on the secondary key membership number. In order to do so the index must point to the name of the database entry and the name must point to its membership number, that is $$\vbox{\hbox{$1\,2\,\ldots \rightarrow$ |\|${}_x\,$ |\|$_y\,\ldots \rightarrow$ ||${}_x\,$ ||$_y\,\ldots$\hss}} $$ This can be coded as follows. \beginverbatim \loadindexmacros % \def\lst#1#2{\advance\k1 \ea\def\csname\the\k\endcsname{#1}% \ea\def\ea#1\gobbletono#2} \def\gobbletono#1\no{} \k0 \input toy.dat %The test database \n\k %number of items Membershipno unsorted: \1, \2, ... % \let\cmp\cmpn\sort Sorted on membershipno: \1, \2, ... !endverbatim The amazing thing is that we don't have to do much extra because the name will expand to the number, which will be used in the comparison. I used that \cs{no} was the last element of the database entry, but that is not essential. Each database entry consist of a triple \cs{lst}, |\|, and entry proper within braces. \bluesubsubhead Typesetting Now we have to redirect the pointer from the name away from the number to the complete entry, that is $$\vbox{\hbox{$1\,2\,\ldots \rightarrow$|\|$_1\,$|\|$_2\,\dots \rightarrow$|{entry}|$_1\,$|{entry}|$_2\,\ldots$\hss}} $$ This is done as follows. \beginverbatim \def\lst#1#2{\def#1{#2}} \input toy.dat \1 \2 \3 \4 \5 \6 !endverbatim \bluesubhead Sorting index entries One of the processes in preparing an index is sorting the Index Reminders, IRs. This is again a sorting process on secondary keys, even tertiary keys. Given the sorting macros we just have to code the special comparison macro in compliance with \cs{cmpw}: compare two `values' specified by \cs{def}s. Let us call this macro \cs{cmpir}.\ftn{Mnemonics: compare index reminders} Each value is composed of \item- a word (action: word comparison) \item- a digit (action: number comparison), and \item- a page number (action: (page) number comparison). \smallbreak The macros read as follows. \thisverbatim{\unmc\catcode`!=12 \catcode`*=0 } \beginverbatim \def\cmpir#1#2{%#1, #2 defs %Result: \status= 0, 1, 2 if % \val{#1} =, >, < \val{#2} \ea\ea\ea\decom\ea#1\ea;#2.} % \def\decom#1 !#2 #3;#4 !#5 #6.{% \def\one{#1}\def\four{#4}\cmpaw\one\four \ifnum0=\status%Compare second key \ifnum#2<#5\global\status2 \else \ifnum#2>#5\global\status1 \else %Compare third key \ifnum#3<#6\global\status2 \else\ifnum#3>#6\global\status1 \fi \fi \fi \fi \fi} *endverbatim Explanation. I needed a two-level approach. The values are decomposed into their components by providing them as arguments to \cs{decom}.\ftn {Mnemonics: decompose. In each comparison the defs are `dereferenced,' that is their replacement texts are passed over. This is a standard \TeX nique: a triad of \cs{ea}s, and the hop-overs to the second argument.} The macro picks up the components \item- the primary keys, the $\langle word\rangle$ \item- the secondary keys, the $\langle digit\rangle$, and \item- the tertiary keys, the $\langle page\,number\rangle$. \smallbreak It compares the primary keys, and if necessary successively the secondary and the tertiary keys. The word comparison is done via the already available macro \cs{cmpaw}. To let this work with \cs{sort}, we have to \cs{let}-equal the \cs{cmp} parameter to \cs{cmpir}. \bluehead Sorting in the mouth Alan Jeffrey and Bernd Raichle have provided macros for this. The following variant of the linear sorting given at the beginning of this note is inspired upon Bernd's `Quick Sort in the Mouth,' Euro\TeX~\on94. The idea is that a sequence is split in its smallest element and the rest by an invoke of \cs{fifo}. The rest is treated recursively as a similar sequence. Another example of (multiple) nested FIFO. \thisverbatim{\unmc} \beginverbatim \def\fifo#1%accumulated rest #2%smallest #3%next {\ifx\ofif#3 #2\ofif{#1}\fi \ifnum#3<#2 \p{\fifo{#1{#2}}{#3}}\else \q{\fifo{#1{#3}}{#2}}\fi} %repeat or terminate \def\ofif#1\fi#2\fi{\fi \if*#1*\endsort\fi \fifo{}#1\ofif} %auxiliaries \def\p#1\else#2\fi{\fi#1} \def\q#1\fi{\fi#1} %terminator \def\endsort#1\ofif{\fi} %test \fifo{}3{123}8{1943}\ofif !endverbatim To assure yourself that it is all done in the mouth \cs{write} the test.\ftn {I don't know how to ensure correctness. It is tricky to get the braces right. I used \cs{tracingmacros=1}.} However, in sorting within \TeX{} I prefer a uniform approach not in the least parameterized over the ordering table. Have fun, and all the best \makesignature \pasteuptoc \endscript