\documentclass[../generics]{subfiles}
\begin{document}
\chapter{Monoids}\label{monoids}
\lettrine{M}{onoids are algebraic structures} that most programmers encounter at some point without even realizing it. We begin by describing finitely-presented monoids, and then show that they embed into Swift generics. Next, we encounter the so-called ``word problem'' and take a brief detour into computability theory. This motivates the concept of a rewrite system, which restricts our domain, but provides a computable algorithm to solve the word problem. The theory of rewrite systems will eventually lead to the implementation of generic signature queries and requirement minimization.
But first, all the math. Abstract algebra is the area of mathematics concerned with various generalizations of the familiar concepts of addition and multiplication of numbers. To study these generalizations, we have four key tools at our disposal:
\begin{itemize}
\item We can introduce an abstraction, defining some algebraic operations, and axioms obeyed by those operations.
\item We can prove that some \emph{specific} object models this abstraction, hopefully showing that our abstraction is not completely vacuous in the process.
\item We can prove theorems from our operations and axioms, whose consequences give us results about \emph{all} objects which model this abstraction.
\item We can further classify our objects by introducing more specific abstractions which add additional axioms, drawing a distinction between those objects which satisfy the new axioms and those that do not. These refined abstractions describe more structure, allowing us to prove more theorems, but this in turn narrows the domain of objects we can abstract over.
\end{itemize}
Now, compare with what we do as Swift programmers:
\begin{itemize}
\item We can define protocols which model various concepts in our problem domain.
\item We can declare that a specific concrete type conforms to the protocol, providing an implementation of each protocol requirement.
\item We can reuse code across all conforming types by implementing generic algorithms on top of our protocol.
\item We can classify our conforming types by using protocol inheritance to introduce additional requirements which are only satisfied by some concrete types.
\end{itemize}
To make these matters concrete, take the standard library \texttt{Collection} protocol. We can conform a hypothetical \texttt{LinkedList} type to this protocol. Then, we can write a \texttt{randomElement()} generic function, and use it with both the standard library \texttt{Array} and our own \texttt{LinkedList}. Finally, we can consider the \texttt{RandomAccessCollection} protocol, which adds richer indexing operations on top of those provided by \texttt{Collection}. An implementation of \texttt{randomElement()} generic over \texttt{RandomAccessCollection} might run in constant time, instead of linear time. However, our strawman \texttt{LinkedList} cannot correctly implement this protocol, so our new algorithm is limited to \texttt{Array} and similar data types.
Of course, in the real world, Swift programmers do more than abstract over collection types; similarly, our description of a mathematician's job is overly simplistic. Still, this view of the world will serve us well for now. Let's define the abstraction we will be working with, and look at a specific example of this abstraction. This definition is standard, and can be found in many texts, such as \cite{semigroup}.
\begin{definition}
\IndexDefinition{monoid}%
\index{binary operation}%
\IndexDefinition{identity element}%
\IndexDefinition{associative operation}%
\index{set}%
A \emph{monoid} \index{$\cdot$}\index{$\cdot$!z@\igobble|seealso{monoid}}$(M,\, \cdot,\, \varepsilon)$ is a set of elements $M$ together with a binary operation $\cdot$ and an element $\varepsilon\in M$, which together satisfy the following three axioms:
\begin{itemize}
\item The set $M$ is \emph{closed} under the binary operation: for all $x, y \in M$, $x\cdot y\in M$.
\item The binary operation is \emph{associative}: for all $x, y, z \in M$, $x\cdot(y\cdot z)=(x\cdot y)\cdot z$.
\item The fixed element $\varepsilon$ is the \emph{identity}: for all $x\in M$, $x\cdot \varepsilon=\varepsilon\cdot x=x$.
\end{itemize}
We use two convenient bits of shorthand. When the binary operation and identity are implicit from context, we can write $M$ instead of $(M,\,\cdot,\,\varepsilon)$. We also sometimes omit the symbol ``$\cdot$'' in equations, and simply write $xy$ instead of $x\cdot y$.
\end{definition}
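The three axioms can be checked mechanically on any finite candidate. The following Python sketch (purely illustrative, not part of the text's formalism) brute-forces the closure, associativity, and identity axioms; the example uses addition modulo 5, a small finite monoid:

```python
from itertools import product

def check_monoid(elements, op, identity):
    """Brute-force check of the three monoid axioms on a finite set."""
    elems = list(elements)
    # Closure: x . y must land back in the set.
    closed = all(op(x, y) in elements for x, y in product(elems, repeat=2))
    # Associativity: x . (y . z) == (x . y) . z.
    assoc = all(op(x, op(y, z)) == op(op(x, y), z)
                for x, y, z in product(elems, repeat=3))
    # Identity: e . x == x . e == x.
    ident = all(op(x, identity) == x and op(identity, x) == x for x in elems)
    return closed and assoc and ident

# Addition modulo 5 on {0, 1, 2, 3, 4}: a finite monoid with identity 0.
print(check_monoid({0, 1, 2, 3, 4}, lambda x, y: (x + y) % 5, 0))  # True
```

Subtraction modulo 5 fails both the associativity and identity checks, so the same function returns `False` for it.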
\begin{example}
\index{natural numbers}%
Let's take our set of elements to be the natural numbers $\mathbb{N}$, the operation to be addition, and the identity element to be zero. Elementary properties of arithmetic show this satisfies the axioms of a monoid:
\begin{itemize}
\item The sum of two elements of $\mathbb{N}$ is again an element of $\mathbb{N}$.
\item The operation is associative: for all $x,y,z\in\mathbb{N}$, $(x+y)+z=x+(y+z)$.
\item We have an identity element: for all $x\in\mathbb{N}$, $x+0=0+x=x$.
\end{itemize}
Another rule of arithmetic states that each non-zero element of $\mathbb{N}$ can be expressed as a sum of $1$'s, for example $3=1+1+1$, $5=1+1+1+1+1$, and so on. Let's write $a$ instead of $1$, $\varepsilon$ instead of $0$, and $\cdot$ instead of $+$. Now, the equation $3+5=8$ becomes:
\[aaa\cdot aaaaa=aaaaaaaa\]
This notation is more cumbersome, but hold tight; it is more consistent with what follows next. Let's denote this monoid by $\{a\}^*$. We will see shortly this is the \emph{free monoid on one generator}.
\end{example}
The axioms of a monoid are very general, and countless other mathematical objects can be equipped with the structure of a monoid. On the other hand, since these axioms are so general, there isn't a whole lot we can say that is true of \emph{all} monoids. One thing we can do is show that the \index{natural numbers}natural numbers ``act'' on any monoid in a straightforward way.
\begin{definition}
Let $x\in M$ be an element of a monoid $M$, and let $n\in\mathbb{N}$. We define the ``$n$th power'' of $x$, denoted as $x^n$, as follows:
\[
x^n = \begin{cases}
\varepsilon&\text{(if $n=0$)}\\
x&\text{(if $n=1$)}\\
x^{n-1}\cdot x&\text{(if $n>1$)}
\end{cases}
\]
\end{definition}
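The recursive definition of $x^n$ translates directly into code. This sketch (an illustration, not anything from the text) takes the monoid operation and identity as parameters, and demonstrates the power operation in the free monoid $\{a\}^*$ modeled as Python strings:

```python
def power(x, n, op, identity):
    """The n-th power of x in a monoid, following the recursive definition."""
    if n == 0:
        return identity                              # x^0 = identity
    if n == 1:
        return x                                     # x^1 = x (redundant, kept for clarity)
    return op(power(x, n - 1, op, identity), x)      # x^n = x^(n-1) . x

# In the free monoid {a}^*, modeled as strings under concatenation:
print(power("a", 3, lambda u, v: u + v, ""))  # aaa
```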
(The $n=1$ case is actually redundant with the other two, but is included for clarity.) With this notation, we can state a result which holds in all monoids.
\begin{proposition}\label{monoid exp law}
Let $x\in M$ be an element of a monoid $M$, and let $m,n\in\mathbb{N}$. Then the above \index{action}action of $\mathbb{N}$ on $M$ satisfies something akin to ``the law of exponents'':
\[x^m\cdot x^n=x^{m+n}\]
\end{proposition}
\begin{proof}
\index{induction}
\index{proof by induction|see{induction}}
We show this to be true using induction, a fundamental property of the arithmetic of $\mathbb{N}$ we encountered in Section~\ref{generic signature validity}. First, we prove the base case, that $x^m\cdot x^0=x^{m+0}$. Then, we assume the inductive hypothesis, $x^m\cdot x^{n-1}=x^{m+n-1}$, and show that $x^m\cdot x^n=x^{m+n}$. By induction over $\mathbb{N}$, this proves the result for all $n\in\mathbb{N}$.
The base case follows from the definition of $x^0$, the identity element axiom of $M$, and the fact that $m=m+0$ in $\mathbb{N}$:
\[x^m\cdot x^0=x^m\cdot\varepsilon=x^m=x^{m+0}\]
Now, for the inductive step, we assume that the inductive hypothesis holds:
\[x^m\cdot x^{n-1}=x^{m+n-1}\]
We multiply both sides by $x$:
\[(x^m\cdot x^{n-1})\cdot x =x^{m+n-1}\cdot x\]
Using the fact that the binary operation $\cdot$ of $M$ is associative together with the definition of $x^n$, the left-hand side is equal to:
\[
(x^m\cdot x^{n-1})\cdot x = x^m\cdot (x^{n-1}\cdot x) = x^m\cdot x^{n-1+1}=x^m\cdot x^n
\]
Finally, using the definition of $x^n$, the right-hand side is equal to:
\[
x^{m+n-1}\cdot x = x^{m+n-1+1}=x^{m+n}
\]
Therefore the inductive hypothesis implies that $x^m\cdot x^n=x^{m+n}$. Since the base case also holds, this concludes our proof.
\end{proof}
As this result holds in all monoids, we know it holds true in $\{a\}^*$ as well. Also, $\{a\}^*$ has the very special property that every element can be written as $a^n$ for some $n\in\mathbb{N}$. So given two arbitrary elements $a^m$, $a^n\in \{a\}^*$, our proposition shows that $a^m\cdot a^n=a^{m+n}$, which is what we expect since this monoid models the \index{natural numbers}natural numbers under addition. This also shows that the monoid operation on $\{a\}^*$ is \index{commutative operation}\emph{commutative}, that is, $x\cdot y=y\cdot x$ for all $x$, $y\in\{a\}^*$.
\medskip
Now we've defined our algebraic structure, shown a specific example of this structure, and proved a result that holds in all such structures. Recall that the next step in our little program is gaining a deeper understanding of an algebraic structure by introducing further classification.
A typical abstract algebra course spends little time on monoids; one quickly moves on to \emph{groups}, which add the axiom that for each element $x$, there is an inverse element $x^{-1}$ such that $x\cdot x^{-1}=x^{-1}\cdot x=\varepsilon$. There is a lot more we can say about groups than monoids, all because of this one additional axiom. The natural numbers under addition are not a group, but if we instead consider the set of all integers $\mathbb{Z}$, positive and negative, then we see that for each $n\in\mathbb{Z}$, $n+(-n)=(-n)+n=0$. Thus, $\mathbb{Z}$ is a group under addition.
\index{group}
\index{ring}
\index{field}
\index{integers}
\index{rational numbers}
\index{complex numbers}
\index{quaternions}
\index{matrices}
\index{polynomials}
After groups, the next level of structure one might build is the \emph{ring}. A ring is a set $R$ with \emph{two} binary operations, denoted $+$ and $\cdot$, and two special elements, $0$~and~$1$. The ring axioms state that $(R,\,+,\,0)$ is a group, $(R,\,\cdot,\,1)$ is a monoid, and $+$ and $\cdot$ are related via the \emph{distributive law}: $a(b+c)=ab+ac$, $(a+b)c=ac+bc$. With the usual addition and multiplication operations, $\mathbb{Z}$ becomes a ring. If we further require multiplication to be commutative, and that every non-zero element has a multiplicative inverse, we get to the study of \emph{fields}. The rational numbers $\mathbb{Q}$ are an example of a field. This tower of abstractions describes complex numbers, quaternions, matrices, polynomials, modular arithmetic, and other ``number-like'' things which commonly arise in various applications of mathematics.
We're going in a rather different direction. Our domain of discourse is Swift generic signatures and the type parameters they generate, so we don't need the theory of groups, rings, or fields. Instead, in what follows, we're going to concern ourselves with \emph{free monoids}, \emph{finitely-presented monoids}, and \emph{rewrite systems}. We will use these abstractions to build an algebra that models the behavior of Swift generic signatures.
\begin{definition}
\IndexDefinition{free monoid}%
\IndexDefinition{term}%
\IndexDefinition{generating set}%
\index{alphabet!z@\igobble|seealso{generating set}}%
Given some \index{set}set $A$, the \emph{free monoid} over $A$, denoted $A^*$, is the set of all ordered sequences of elements from $A$. The binary operation concatenates two sequences together, and the identity element $\varepsilon$ is the empty sequence. The set $A$ is called the \emph{alphabet} or \emph{generating set} of $A^*$; the elements of $A^*$ are called \emph{terms}. The set $A$ may be finite or infinite in general, but for our purposes, we take it to be finite.
\end{definition}
\begin{example}
\IndexDefinition{trivial monoid}%
\index{monoid isomorphism}%
\index{empty set}%
The free monoid over the empty set $\varnothing$ contains just the identity element and nothing else. To see why, notice that the only way to form an ordered sequence whose elements are drawn from the empty set is if the sequence itself is empty.
\end{example}
\begin{example}
\index{commutative operation}%
The free monoid over two elements $\{a,b\}^*$ consists of all finite strings made up of $a$ and $b$. Two typical elements of $\{a,b\}^*$ are $abba$ and $ba$, and $abba\cdot ba=abbaba$. Note that $abba\cdot ba\neq ba\cdot abba$, so unlike $\{a\}^*$, the free monoid over two elements is no longer commutative: in general, $x\cdot y \neq y \cdot x$. However, Proposition~\ref{monoid exp law} still holds; for example, if we let $x:=abb$, a quick computation shows that $x^2\cdot x^3=x^3\cdot x^2=x^5$.
\end{example}
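We can confirm both observations concretely. Modeling terms of $\{a,b\}^*$ as Python strings (concatenation is \texttt{+}, the identity is the empty string), this hypothetical sketch checks the law of exponents for $x:=abb$ and exhibits the failure of commutativity:

```python
def power(x, n):
    """n-th power in a free monoid of strings: n-fold concatenation."""
    return x * n  # Python string repetition is exactly repeated concatenation

x = "abb"
# Proposition (law of exponents) holds even without commutativity:
print(power(x, 2) + power(x, 3) == power(x, 5))  # True
# But the monoid itself is not commutative:
print("abba" + "ba" == "ba" + "abba")  # False
```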
\section{Finitely-Presented Monoids}\label{finitely presented monoids}
The free monoids are, in a sense, the ``most general'' objects which satisfy the monoid axioms.\footnote{The idea of a free object can actually be formalized using category theory.} In a free monoid $A^*$, the only propositions we can prove are those that follow directly from the monoid axioms, together with the fact that every term has a \emph{unique} representation as a sequence of elements from the generating set $A$. If the generating set is finite, the terms of a free monoid have an appealing interpretation as finite strings drawn from a finite alphabet.
\index{finitely-presented monoid}Finitely-presented monoids start with a \index{set}\index{generating set}finite generating set and also define the monoid operation to be concatenation of terms, but they have some additional structure in the form of \index{relation}\emph{relations}. We use the following notation to describe a finitely-presented monoid:
\[\langle \underbrace{a_1,\,\ldots,\, a_m}_{\text{generators}};\, \underbrace{u_1 \sim v_1,\,\ldots,\, u_n \sim v_n}_{\text{relations}}\rangle\]
Each relation is an \index{ordered pair}ordered pair of terms $(u,\,v)$ with the following interpretation. Suppose we have a term containing $u$ as a subterm; we can write such a term as $xuy$. The relation allows us to ``replace'' $u$ with $v$ to get the term $xvy$. We then say that $xuy$ is equivalent to $xvy$, denoted $xuy\sim xvy$. Either one of $x$ or $y$ can also be the \index{empty term}empty term $\varepsilon$, in which case we get equivalences like $u\sim v$, $xu\sim xv$, or $uy\sim vy$. Relations are symmetric, so we can also go backwards, replacing a subterm $v$ with $u$. We're going to formalize this in the next section, but this is good enough for now.
A few words on notation:
\begin{itemize}
\item If we have two finite sets $A := \{a_1,\ldots, a_m\}$ and $R := \{(u_1,v_1),\,\ldots,\,(u_n,v_n)\}$ where $u_i$, $v_i\in A^*$, we can also write the monoid as $\AR$.
\item The relations here are not to be confused with the concept of a relation over a set from Section~\ref{typeparams}. Indeed, the full set of relations $R$, being a finite subset of $A^*\times A^*$, can be thought of as a relation over $A^*$, but we won't use this interpretation here and leave the two concepts wholly separate.
\item We only use $x=y$ to mean that $x$ and $y$ are equal in $A^*$; that is, they have the same exact spelling as terms. To denote equivalence with respect to the set of relations $R$, we always write $x\sim y$.
\end{itemize}
Every free monoid with a finite generating set can be seen as a finitely-presented monoid if we take the set of relations to be empty, in which case $x\sim y$ if and only if $x=y$. Also, any relation where both sides are equal, like $(u,\,u)$, doesn't give us anything useful and can be discarded. Now, let's look at more interesting examples of finitely-presented monoids which actually have non-trivial relations.
\begin{example}
Consider the finite set $\{0,1,2,3,4\}$ and the binary operation $\oplus$ given by the following table, where $m\oplus n$ is the remainder after dividing $m+n$ by 5; that is, $m\oplus n\equiv m+n\pmod 5$:
\begin{quote}
\begin{tabular}{c|c|c|c|c|c}
$\oplus$&0&1&2&3&4\\
\hline
0&0&1&2&3&4\\
\hline
1&1&2&3&4&0\\
\hline
2&2&3&4&0&1\\
\hline
3&3&4&0&1&2\\
\hline
4&4&0&1&2&3
\end{tabular}
\end{quote}
Notice that 0 is the identity element in the above table. Once again, every non-zero element can be written as a sum of 1's, so we're going to write $a$ instead of $1$, $\varepsilon$ instead of $0$, and $\cdot$ instead of $\oplus$. Let's call this monoid $M$ for now.
\index{modular arithmetic}
Compare this monoid $M$ with the free monoid $\{a\}^*$. In the free monoid, $a^3\cdot a^4=a^7$, so in $M$, $a^3\cdot a^4\sim a^7$. On the other hand, our table above tells us that $3\oplus 4=2$. So it is also true that $a^3\cdot a^4\sim a^2$. Thus, $a^2\sim a^7$. Indeed, more generally, we have
\[a^m\sim a^n\qquad\text{if $m\equiv n \pmod 5$}\]
It so happens that all the equivalences that are true in $M$ are a consequence of the single relation $a^5\sim \varepsilon$. For example, to see that $a^2\sim a^7$, note that $a^7=a^2\cdot a^5$; replacing $a^5$ with $\varepsilon$ gives us $a^2\cdot\varepsilon=a^2$. Thus, $M$ is a finitely-presented monoid with a single relation:
\[M := \langle a;\,a^5\sim\varepsilon\rangle\]
We can say a couple more things about $M$. A visual inspection of the table shows that every element has an inverse, making this monoid into a group. For example, $a^2\cdot a^3=a^5$, but $a^5\sim\varepsilon$, so the inverse of $a^2$ is $a^3$ (and vice versa). Finally, the binary operation is \index{commutative operation}commutative. A group with a commutative operation is said to be \emph{Abelian}. So $M$ is a finite \index{group}\index{Abelian group}\index{Abelian group!z @\igobble|seealso{commutative operation}}Abelian group.
\end{example}
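The claim that the single relation $a^5\sim\varepsilon$ generates all equivalences in $M$ can be illustrated by a reduction procedure: repeatedly replace an occurrence of $a^5$ with the empty term. This is an informal Python sketch, not an algorithm from the text:

```python
def reduce_mod5(term):
    """Rewrite a term of <a; a^5 ~ e> toward a shortest form by
    repeatedly applying the relation a^5 => e, i.e. deleting 'aaaaa'."""
    while "aaaaa" in term:
        term = term.replace("aaaaa", "", 1)
    return term

# a^3 . a^4 = a^7 ~ a^2, matching 3 (+) 4 = 2 in the table above:
print(reduce_mod5("a" * 3 + "a" * 4))    # aa
print(reduce_mod5("a" * 5) == "")        # True
```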
\iffalse
\begin{example}
Take a regular hexagon centered on the origin, and consider the linear transformations which map the outline of the hexagon back to itself. These are called the \emph{symmetries} of the hexagon. Each symmetry defines a permutation of the sides of the hexagon, and the composition of two symmetries is another symmetry. Two examples of such symmetries are rotation clockwise by $60^{\circ}$, and a reflection across the $x$ axis. In fact, every symmetry of the hexagon can be expressed as a composition of these two. Let's call the rotation $s$ and the reflection $t$, these two fundamental symmetries have the following properties:
\begin{itemize}
\item If we perform six rotations in a row, we end up where we started.
\end{example}
\fi
\begin{example}\label{bicyclic}
Consider strings made up of only the two characters ``\texttt{(}'' and ``\texttt{)}''. Then given two such strings where all the parentheses are balanced, such as ``\texttt{(()(()))}'' and ``\texttt{()((()))}'', we can transform one into another by repeatedly inserting or deleting two adjacent characters ``\texttt{()}''. An unbalanced string, such as ``\texttt{)(}'' cannot be transformed into a balanced string or vice versa. However, two unbalanced strings can still be equivalent if they have the same number of mismatched parentheses, for example ``\texttt{)(()(}'' and ``\texttt{)()((}''. Now, write $a$ instead of ``\texttt{(}'', $b$ instead of ``\texttt{)}'', and $\varepsilon$ instead of the empty string. This is the \IndexDefinition{bicyclic monoid}\emph{bicyclic monoid}:
\[\langle a, b;\; ab\sim\varepsilon\rangle\]
Let's again refer to this monoid as $M$, and also let $A:=\{a,b\}$ be our alphabet. As in the free monoid $A^*$, all elements of $M$ are finite sequences of $a$ and $b$, but we also have this relation $ab\sim\varepsilon$. Consider the term $baaba$, which contains $ab$ as a subterm. Since $ab\sim\varepsilon$, we have the following equivalence:
\[baaba=ba\cdot ab\cdot a\sim ba\cdot\varepsilon\cdot a=b\cdot aa = baa\]
We've ``deleted'' the occurrence of $ab$ in the middle without changing the ``meaning'' of the term. We can insert $ab$ anywhere in the middle of a term as well:
\[baa=b\cdot aa=b\cdot\varepsilon\cdot aa\sim b\cdot ab\cdot aa=babaa\]
This shows that the terms $baaba$, $baa$ and $babaa$ are all equivalent as elements of $M$. We will encounter the bicyclic monoid again in subsequent sections. More discussion of its algebraic properties can be found in \cite{semigroup}.
\end{example}
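The parenthesis-matching intuition suggests a reduction procedure for the bicyclic monoid as well: repeatedly delete an adjacent pair $ab$. As a sketch (again not from the text), this lets us check that $baaba$, $baa$, and $babaa$ all reduce to the same string, and in fact every term reduces to one of the shape $b^m a^n$:

```python
def reduce_bicyclic(term):
    """Reduce a term of the bicyclic monoid <a, b; ab ~ e> by
    repeatedly deleting an adjacent pair 'ab' (a matched '()')."""
    while "ab" in term:
        term = term.replace("ab", "", 1)
    return term

# The three equivalent terms from the example all reduce to 'baa',
# a term of the shape b^m a^n with no 'ab' left to delete:
print(reduce_bicyclic("baaba"), reduce_bicyclic("baa"), reduce_bicyclic("babaa"))
```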
\begin{example}
Not every ``elementary'' monoid is finitely presented, one example being the \index{natural numbers}natural numbers under multiplication. This follows from four deep properties of the natural numbers:
\begin{itemize}
\item Every non-zero \index{prime number}natural number has a unique representation as a product of prime numbers (\emph{The Fundamental Theorem of Arithmetic}).
\item There are infinitely many prime numbers (\emph{Euclid's Theorem}).
\item Multiplication of natural numbers is \index{commutative operation}commutative.
\item Multiplication by zero always yields zero.
\end{itemize}
These give us an \emph{infinite} presentation of $(\mathbb{N},\,\times,\,1)$: we need a generator $z$ representing zero, and an infinite set of generators $\{p_i\}_{i\geq 0}$ to represent the prime numbers. Then, we add the relation $z\cdot z\sim z$ to encode that zero multiplied by zero is zero, followed by two infinite sets of relations relating the primes with each other and zero:
\begin{itemize}
\item For all $i$, $j\in\mathbb{N}$, $p_i\cdot p_j\sim p_j\cdot p_i$ makes multiplication commutative.
\item For all $i\in\mathbb{N}$, $z\cdot p_i\sim z$ and $p_i\cdot z\sim z$ makes $z$ into the \index{zero element}zero element.
\end{itemize}
In fact, no ``smaller'' presentation exists. It is perhaps surprising to contrast this with $(\mathbb{N},\,+,\,0)$, which is just the free monoid on one generator.
\end{example}
\section{The Rewrite Graph}\label{rewrite graph}
Having informally sketched out finitely-presented monoids, we now wish to make precise what is meant when we say that a relation like $ab\sim\varepsilon$ allows us to transform $baaba$ into $baa$ and then $baa$ into $babaa$. We follow \cite{SQUIER1994271} in associating a directed graph with the description of a finitely-presented monoid $\AR$. While the aforesaid paper calls this graph the \index{derivation graph|see{rewrite graph}}derivation graph, we prefer to call it the \emph{rewrite graph}, to avoid confusion with derivations in the sense of Section~\ref{derived req}.
The \IndexDefinition{rewrite graph}rewrite graph explicitly encodes transformations of terms as paths in this graph. The vertices are the terms of the free monoid $A^*$, and the edge set is defined in such a way that two vertices $x$ and $y$ are joined by a path if and only if $x\sim y$ under the set of relations $R$. This provides a more constructive definition than the classical approach of starting from the intersection of all equivalence relations that contain $R$. The rewrite graph also forms the theoretical basis of the rewrite system minimization algorithm in Chapter~\ref{rqm minimization}. We begin by defining the structure of the edge set.
\begin{definition}
Given an alphabet $A$ and a set of relations $R$, a \IndexDefinition{rewrite step}\emph{rewrite step} is an \index{ordered tuple}ordered 4-tuple of terms $(x,\,u,\,v,\,y)$ such that $x$, $y\in A^*$ and either $(u,\,v)\in R$ or $(v,\,u)\in R$. We write this rewrite step as \index{$\Rightarrow$}\index{$\Rightarrow$!z@\igobble|seealso{rewrite path, rewrite step}}$x(u\Rightarrow v)y$. The terms $x$ and $y$ are called the left and right \emph{whiskers}, respectively. If $x=\varepsilon$, we write the rewrite step as $(u\Rightarrow v)y$, and if $y=\varepsilon$, we similarly write it as $x(u\Rightarrow v)$. Of course, it may be the case that $x=y=\varepsilon$, in which case the rewrite step is simply denoted $(u\Rightarrow v)$.
\end{definition}
Intuitively, the rewrite step $x(u\Rightarrow v)y$ represents the transformation of $xuy$ into $xvy$ via the relation $u\sim v$; it leaves the whiskers $x$ and $y$ unchanged, while rewriting $u$ to $v$ ``in the middle.'' We can draw a picture:
\[
\begin{array}{|c|c|c|}
\hline
\multicolumn{3}{|c|}{xuy}\\
\hline
\hline
&u&\\
x&\Downarrow&y\\
&v&\\
\hline
\hline
\multicolumn{3}{|c|}{xvy}\\
\hline
\end{array}
\]
Note that unlike relations, rewrite steps encode the \emph{direction} of transformation.
\begin{definition}
Given an alphabet $A$ and a set of relations $R$, the rewrite graph of $A$ and $R$ has as vertices the terms of $A^*$, and as edges the collection of all possible rewrite steps with respect to $A$ and $R$; that is, for each relation $(u,\,v)\in R$, and every pair of terms $x$, $y\in A^*$, the edge set contains a complementary pair of rewrite steps:
\begin{itemize}
\item A rewrite step $x(u\Rightarrow v)y$, representing the transformation of $xuy$ to $xvy$.
\item A rewrite step $x(v\Rightarrow u)y$, representing the transformation of $xvy$ to $xuy$.
\end{itemize}
\end{definition}
For rewrite steps to function as edges in a directed graph, we also need a pair of mappings associating a source and destination vertex with each rewrite step.
\begin{definition}
If $s$ is the rewrite step $x(u\Rightarrow v)y$, we define the \emph{source} and \emph{destination} of $s$ as the following two terms of $A^*$:
\begin{gather*}
\Src(s) := xuy\\
\Dst(s) := xvy
\end{gather*}
We also have a more concise pictorial notation for visualizing a rewrite step as an edge between two vertices:
\begin{quote}
\begin{tikzcd}[column sep=huge]
xuy\arrow[r, Rightarrow, "x(u\Rightarrow v)y"] &xvy
\end{tikzcd}
\end{quote}
\end{definition}
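The 4-tuple encoding of a rewrite step, with its source and destination maps, is small enough to sketch directly. This is a hypothetical Python model (terms as strings), not part of the formal development:

```python
from collections import namedtuple

# A rewrite step x(u => v)y as an ordered 4-tuple of terms (strings).
Step = namedtuple("Step", ["x", "u", "v", "y"])

def src(s):
    return s.x + s.u + s.y   # Src(s) = xuy

def dst(s):
    return s.x + s.v + s.y   # Dst(s) = xvy

# The step ba(ab => e)a from the bicyclic monoid:
s = Step(x="ba", u="ab", v="", y="a")
print(src(s), dst(s))  # baaba baa
```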
\begin{example}
Returning to the bicyclic monoid, when we transformed $baaba$ into $baa$, we performed this rewrite step:
\begin{quote}
\begin{tikzcd}[column sep=huge]
baaba\arrow[r, Rightarrow, "ba(ab\Rightarrow \varepsilon)a"] & baa
\end{tikzcd}
\end{quote}
Similarly, to transform $baa$ into $babaa$, we performed this rewrite step:
\begin{quote}
\begin{tikzcd}[column sep=huge]
baa\arrow[r, Rightarrow, "b(\varepsilon \Rightarrow ab)aa"] & babaa
\end{tikzcd}
\end{quote}
\end{example}
\begin{definition}
\IndexDefinition{inverse rewrite step}
The \emph{inverse} of a rewrite step $s:=x(u\Rightarrow v)y$, denoted $s^{-1}$, is defined as $x(v\Rightarrow u)y$; that is, we swap the $u$ and the $v$. Note that $\Src(s^{-1})=\Dst(s)$, and likewise $\Dst(s^{-1})=\Src(s)$. We can draw a diagram showing a rewrite step and its inverse:
\begin{quote}
\begin{tikzcd}[column sep=huge]
xuy\arrow[r, Rightarrow, "x(u\Rightarrow v)y", bend left] \arrow[r, Leftarrow, "x(v\Rightarrow u)y"', bend right] & xvy
\end{tikzcd}
\end{quote}
\end{definition}
\begin{definition}
\IndexDefinition{rewrite step whiskering}
If $s:=x(u\Rightarrow v)y$ and $z\in A^*$, we define the left and right \emph{whiskering} action of $z$ on $s$:\index{$\star$}\index{$\star$!z@\igobble|seealso{rewrite path whiskering}}
\begin{gather*}
z\star s := zx(u\Rightarrow v)y\\
s\star z := x(u\Rightarrow v)yz
\end{gather*}
Whiskering produces a rewrite step which leaves a longer left or right whisker unchanged:
\begin{quote}
\begin{tikzcd}
zxuy\arrow[r, Rightarrow, "z\star s"]& zxvy\\
xuy\arrow[r, Rightarrow, "s"]& xvy\\
xuyz\arrow[r, Rightarrow, "s\star z"]& xvyz
\end{tikzcd}
\end{quote}
A computation shows whiskering is related to the monoid operation of $A^*$ in two ways. First, we have:
\begin{gather*}
\Src(z\star s)=z\cdot \Src(s)\\
\Dst(z\star s)=z\cdot \Dst(s)\\[\medskipamount]
\Src(s\star z)=\Src(s)\cdot z\\
\Dst(s\star z)=\Dst(s)\cdot z
\end{gather*}
But also:
\begin{gather*}
z\star (z^\prime \star s)=(z\cdot z^\prime) \star s\\
(s\star z)\star z^\prime=s\star (z\cdot z^\prime)
\end{gather*}
Finally, by expanding both sides of the below equality and recalling that the monoid operation is associative, we see that left and right whiskering actions are compatible:
\[(z\star s)\star z^\prime = z\star(s\star z^\prime)\]
\end{definition}
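The identities relating inverses and whiskering to $\Src$ and $\Dst$ can be checked mechanically. This sketch restates the small 4-tuple encoding of a rewrite step so it runs on its own; the helper names are our own, not the text's:

```python
from collections import namedtuple

# A rewrite step x(u => v)y with Src = xuy and Dst = xvy (terms as strings).
Step = namedtuple("Step", ["x", "u", "v", "y"])
def src(s): return s.x + s.u + s.y
def dst(s): return s.x + s.v + s.y

def inverse(s):
    """x(u => v)y inverted is x(v => u)y: swap u and v."""
    return Step(s.x, s.v, s.u, s.y)

def lwhisker(z, s):
    """z * s = zx(u => v)y: extend the left whisker."""
    return Step(z + s.x, s.u, s.v, s.y)

def rwhisker(s, z):
    """s * z = x(u => v)yz: extend the right whisker."""
    return Step(s.x, s.u, s.v, s.y + z)

s = Step("ba", "ab", "", "a")                   # ba(ab => e)a
print(src(inverse(s)) == dst(s))                # Src(s^-1) = Dst(s): True
print(src(lwhisker("b", s)) == "b" + src(s))    # Src(z * s) = z . Src(s): True
print(dst(rwhisker(s, "b")) == dst(s) + "b")    # Dst(s * z) = Dst(s) . z: True
```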
These operations have an important property, which follows immediately from the definition of the edge set.
\begin{proposition}\label{edge set closed}
Given an alphabet $A$ and set of relations $R$, let $S$ be the edge set of the rewrite graph of $A$ and $R$. Then, if $s\in S$ and $z\in A^*$, we also have $s^{-1}$, $z\star s$ and $s\star z\in S$. That is, the edge set is closed under inverses and whiskering.
\end{proposition}
\begin{example}
In the bicyclic monoid, inverting the rewrite step $ba(ab\Rightarrow\varepsilon)a$ gives us $ba(\varepsilon\Rightarrow ab)a$. The original rewrite step defined a transformation from the term $baaba$ to $baa$; its inverse is a transformation from $baa$ to $baaba$. If we whisker the rewrite step $ba(ab\Rightarrow\varepsilon)a$ on the left by $b$, we get a transformation from $bbaaba$ to $bbaa$:
\[b\star ba(ab\Rightarrow\varepsilon)a=bba(ab\Rightarrow\varepsilon)a\]
\end{example}
Now, we are ready to introduce rewrite paths, which are paths in the rewrite graph.
\begin{definition}
\IndexDefinition{rewrite path}%
A \emph{rewrite path} is an ordered pair consisting of an initial term $t\in A^*$ followed by a sequence of zero or more rewrite steps $\{s_i\}_{1\leq i\leq n}$ which satisfy the following conditions:
\begin{enumerate}
\item If the rewrite path contains at least one rewrite step, the source of the first rewrite step must equal the initial term: $\Src(s_1)=t$.
\item If the rewrite path contains at least two steps, the source of each subsequent step must equal the destination of the preceding step: $\Src(s_{i+1})=\Dst(s_i)$ for $1\leq i<n$.
\end{enumerate}
Notice how this definition is actually identical to the definition of a path in a \index{directed graph}directed graph from Section~\ref{type parameter graph}. Like rewrite steps, rewrite paths have a source and destination:
\begin{gather*}
\Src(p):=t\\
\Dst(p):=\begin{cases}
\Dst(s_n)&\text{(if $n>0$)}\\
t&\text{(if $n=0$)}
\end{cases}
\end{gather*}
If the sequence of rewrite steps in a rewrite path is empty, we say it is an \index{$1_t$|see{empty rewrite path}}\IndexDefinition{empty rewrite path}empty rewrite path. Each term $t\in A^*$ has a unique empty rewrite path, denoted $1_t$, which starts and ends at this term; thus $\Src(1_t)=\Dst(1_t)=t$.
If a rewrite path $p$ is non-empty, we can denote it as the composition of a series of rewrite steps: $p=s_1\circ\cdots\circ s_n$. We can also visualize a non-empty rewrite path like so:
\begin{quote}
\begin{tikzcd}[column sep=large]
\Src(p)\arrow[r, "s_1", Rightarrow]&\cdots\arrow[r, Rightarrow, "\cdots"]&\cdots\arrow[r, "s_n", Rightarrow]&\Dst(p)
\end{tikzcd}
\end{quote}
\end{definition}
\begin{example}
In the bicyclic monoid, the transformation from $baaba$ to $babaa$ can be expressed as a rewrite path composed of two rewrite steps, with source $baaba$ and destination $babaa$:
\[ba(ab\Rightarrow\varepsilon)a\circ b(\varepsilon\Rightarrow ab)aa\]
Note that our rewrite path is well-formed; the destination of the first rewrite step, $baa$, is also the source of the second rewrite step. The diagram makes this clear:
\begin{quote}
\begin{tikzcd}[column sep=huge]
baaba\arrow[r, "ba(ab\Rightarrow\varepsilon)a", Rightarrow]&baa\arrow[r, "b(\varepsilon\Rightarrow ab)aa", Rightarrow]&babaa
\end{tikzcd}
\end{quote}
\end{example}
\begin{definition}
We can \IndexDefinition{inverse rewrite path}invert a rewrite path to get a new rewrite path which performs the same transformation backwards. The \index{empty rewrite path}empty rewrite path is its own inverse. For a non-empty rewrite path, we invert each rewrite step and perform them in reverse order; the initial term of the inverse is the destination of the original rewrite path:
\[p^{-1}:=\begin{cases}
1_t&\text{(if $p=1_t$)}\\
s_n^{-1}\circ\cdots\circ s_1^{-1}&\text{(otherwise)}
\end{cases}
\]
The above implies that the source and destination of the inverse rewrite path is the destination and source of the original rewrite path:
\begin{gather*}
\Src(p^{-1})=\Dst(p)\\
\Dst(p^{-1})=\Src(p)
\end{gather*}
When drawn as a diagram, the inverse path $p^{-1}$ is $p$ but ``backwards in time'':
\begin{quote}
\begin{tikzcd}[column sep=large]
\Src(p^{-1})\arrow[r, "s_n^{-1}", Rightarrow]&
\cdots\arrow[r, "\cdots", Rightarrow]&
\cdots\arrow[r, "s_1^{-1}", Rightarrow]&
\Dst(p^{-1})
\end{tikzcd}
\end{quote}
\end{definition}
\begin{example}
In the bicyclic monoid, we can invert our rewrite path to get a transformation from $babaa$ to $baaba$:
\[b(ab\Rightarrow\varepsilon)aa\circ ba(\varepsilon\Rightarrow ab)a\]
Or as a diagram:
\begin{quote}
\begin{tikzcd}[column sep=huge,row sep=large]
babaa\arrow[r, "b(ab\Rightarrow \varepsilon)aa", Rightarrow]&baa\arrow[r, Rightarrow, "ba(\varepsilon\Rightarrow ab)a"]&baaba
\end{tikzcd}
\end{quote}
\end{example}
\begin{definition}
The left and right \IndexDefinition{rewrite path whiskering}whiskering actions generalize to rewrite paths by applying the monoid operation on the left or right of the initial term, and then whiskering each rewrite step; note that this preserves the compatibility condition from the definition of a rewrite path. If the path is empty, whiskering produces another empty rewrite path on a new term:
\begin{gather*}
z\star (1_t) = 1_{z\cdot t}\\
(1_t)\star z = 1_{t\cdot z}
\end{gather*}
If the path is non-empty, we can understand whiskering as a sort of distributive law:
\begin{gather*}
z\star (s_1\circ\cdots\circ s_n) = (z\star s_1)\circ\cdots\circ (z\star s_n)\\
(s_1\circ\cdots\circ s_n)\star z = (s_1\star z)\circ\cdots\circ (s_n\star z)
\end{gather*}
In the form of a diagram, we have $z\star p$:
\begin{quote}
\begin{tikzcd}[column sep=large]
z\cdot\Src(p)\arrow[r, "z\star s_1", Rightarrow]&z\cdots\arrow[r, Rightarrow]&z\cdots\arrow[r, "z\star s_n", Rightarrow]&z\cdot \Dst(p)
\end{tikzcd}
\end{quote}
Similarly, $p\star z$:
\begin{quote}
\begin{tikzcd}[column sep=large]
\Src(p)\cdot z\arrow[r, "s_1\star z", Rightarrow]&\cdots z\arrow[r, Rightarrow]&\cdots z\arrow[r, "s_n\star z", Rightarrow]&\Dst(p)\cdot z
\end{tikzcd}
\end{quote}
As with rewrite steps, the left and right whiskering actions on rewrite paths are compatible with each other:
\[(z\star p)\star z^\prime = z\star(p\star z^\prime)\]
We also have the following identities:
\begin{gather*}
\Src(z\star p)=z\cdot\Src(p)\\
\Dst(z\star p)=z\cdot\Dst(p)\\[\medskipamount]
\Src(p\star z)=\Src(p)\cdot z\\
\Dst(p\star z)=\Dst(p)\cdot z
\end{gather*}
Our theory has a certain consistency: if we think of a rewrite step $s$ as a one-element rewrite path, all of the above definitions remain unambiguous under either interpretation.
\end{definition}%
\begin{definition}
\IndexDefinition{rewrite path composition}%
If $p_1$ and $p_2$ are two rewrite paths with $\Dst(p_1)=\Src(p_2)$, we define their \emph{composition} \index{$\circ$}\index{$\circ$!z@\igobble|seealso{rewrite path}}$p_1 \circ p_2$ as the rewrite path consisting of the initial term of $p_1$, followed by the rewrite steps of $p_1$, and then $p_2$. This gives us a new rewrite path whose source and destination are the source and destination of $p_1$ and $p_2$, respectively:
\begin{gather*}
\Src(p_1 \circ p_2) = \Src(p_1)\\
\Dst(p_1 \circ p_2) = \Dst(p_2)
\end{gather*}
Shown as a diagram, composition joins two paths (here the ``$=$'' is not an edge in the graph, but denotes equality of terms):
\begin{quote}
\begin{tikzcd}
\Src(p_1)\arrow[r, Rightarrow]&\cdots\arrow[r, Rightarrow]&(\Dst(p_1)=\Src(p_2))\arrow[r, Rightarrow]&\cdots\arrow[r, Rightarrow]&\Dst(p_2)
\end{tikzcd}
\end{quote}
Empty rewrite paths \index{empty rewrite path}act as the identity with respect to composition:
\begin{gather*}
(1_{\Src(p)})\circ p = p\\
p\circ (1_{\Dst(p)}) = p
\end{gather*}
The inverse of the composition is the composition of the inverses, but flipped:
\[(p_1 \circ p_2)^{-1} = p_2^{-1} \circ p_1^{-1}\]
Rewrite path composition is compatible with whiskering:
\begin{gather*}
z\star (p_1 \circ p_2) = (z\star p_1) \circ (z\star p_2)\\
(p_1 \circ p_2)\star z = (p_1\star z) \circ (p_2\star z)
\end{gather*}
Finally, rewrite path composition is associative; if $p_3$ is another rewrite path such that $\Dst(p_2)=\Src(p_3)$, then $p_1 \circ (p_2 \circ p_3) = (p_1 \circ p_2) \circ p_3$.
\end{definition}
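A rewrite path and its three operations can be sketched in the same illustrative style, with a path represented as an initial term plus a list of steps. This Python sketch (again, the representation is ours) checks the chaining condition and verifies that inversion and whiskering produce valid paths, using the bicyclic-monoid path from the earlier example:

```python
# A rewrite path is (initial_term, [steps]); each step (x, u, v, y)
# rewrites the term xuy to xvy. All names here are illustrative.

def step_src(s):
    x, u, v, y = s
    return x + u + y

def step_dst(s):
    x, u, v, y = s
    return x + v + y

def is_path(p):
    """Check the chaining condition from the definition."""
    t, steps = p
    cur = t
    for s in steps:
        if step_src(s) != cur:
            return False
        cur = step_dst(s)
    return True

def path_src(p):
    return p[0]

def path_dst(p):
    t, steps = p
    return step_dst(steps[-1]) if steps else t

def compose(p1, p2):
    assert path_dst(p1) == path_src(p2)
    return (path_src(p1), p1[1] + p2[1])

def invert(p):
    t, steps = p
    return (path_dst(p), [(x, v, u, y) for (x, u, v, y) in reversed(steps)])

def lwhisker(z, p):
    t, steps = p
    return (z + t, [(z + x, u, v, y) for (x, u, v, y) in steps])

# The two-step bicyclic path from baaba to babaa:
p = ("baaba", [("ba", "ab", "", "a"), ("b", "", "ab", "aa")])
assert is_path(p) and path_src(p) == "baaba" and path_dst(p) == "babaa"

# Inversion reverses the path; whiskering distributes over the steps:
assert is_path(invert(p)) and path_src(invert(p)) == "babaa"
assert is_path(lwhisker("b", p)) and path_src(lwhisker("b", p)) == "bbaaba"
```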
Just as the set of rewrite steps is closed under inverses and whiskering, the set of rewrite paths is closed under inverses, whiskering and composition.
\begin{proposition}
Given an alphabet $A$ and set of relations $R$, let $P$ be the set of paths in the rewrite graph of $A$ and $R$. This set $P$ has the following properties:
\begin{enumerate}
\item If $(u_i,\,v_i)\in R$, there is a one-element rewrite path $(u_i\Rightarrow v_i)\in P$.
\item (Closure under inversion) If $p\in P$, then $p^{-1}\in P$.
\item (Closure under composition) If $p_1$, $p_2\in P$ with $\Dst(p_1)=\Src(p_2)$, then $p_1\circ p_2\in P$.
\item (Closure under whiskering) If $p\in P$ and $z\in A^*$, then $z\star p$, $p\star z\in P$.
\end{enumerate}
\end{proposition}
\begin{proof}
This is an immediate consequence of Proposition~\ref{edge set closed}.
\end{proof}
We are almost ready to actually define our equivalence relation $\sim$. Recall equivalence relations and equivalence classes from Section~\ref{reducedtypes}. While the elements of a free monoid are terms, the elements of a finitely-presented monoid are equivalence classes of terms. For the set of equivalence classes to have a monoid structure, we need our equivalence relation to be compatible with term concatenation.
\begin{definition}
\IndexDefinition{translation-invariant relation}%
A relation $R$ on a monoid $M$ is \emph{translation-invariant} if for any ordered pair $(x,y)\in R$ and for any $z\in M$, we have $(zx,zy)\in R$ and $(xz,yz)\in R$.
\end{definition}
\begin{example}
The standard ``less-than'' relation $<$ on \index{natural numbers}$\mathbb{N}$ is a \index{linear order}linear order which is also translation-invariant if we view $\mathbb{N}$ as a monoid under addition; for example, since $5<7$, we also have $(5+2)<(7+2)$.
\end{example}
As we said earlier, two terms of $A^*$ are equivalent by $\sim$ if a path joins them in the rewrite graph of $A$ and $R$. Now we have the tools to state this formally.
\begin{proposition}\label{monoid congruence}
Given an alphabet $A$ and set of relations $R$, let $P$ be the set of paths in the rewrite graph of $A$ and $R$. Define the relation \index{$\sim$}\index{$\sim$!z@\igobble|seealso{monoid congruence}}$\sim$ on $A^*$ such that $x\sim y$ if there is a rewrite path $p\in P$ with $x=\Src(p)$ and $y=\Dst(p)$. Then $\sim$ is a translation-invariant \index{equivalence relation}equivalence relation on $A^*$.
\end{proposition}
\begin{proof}
We prove each condition in turn using the properties of rewrite paths.
\begin{enumerate}
\item (Reflexivity) For each $x\in A^*$, there is an \index{empty rewrite path}empty rewrite path $1_x\in P$. Since $\Src(1_x)=\Dst(1_x)=x$, it follows that $x\sim x$.
\item (Symmetry) Assume $x\sim y$. Then there is a $p\in P$ such that $x=\Src(p)$ and $y=\Dst(p)$. Since $P$ is closed under taking inverses, $p^{-1}\in P$. But $y=\Src(p^{-1})$ and $x=\Dst(p^{-1})$, so we have $y\sim x$.
\item (Transitivity) Assume $x\sim y$ and $y\sim z$. Then there is a $p_1\in P$ with $x=\Src(p_1)$ and $y=\Dst(p_1)$, and a $p_2\in P$ with $y=\Src(p_2)$ and $z=\Dst(p_2)$. The composition $p_1\circ p_2$ is valid since $\Dst(p_1)=\Src(p_2)$, and $P$ is closed under composition so $p_1\circ p_2 \in P$. Now, from the definition of rewrite path composition, we also know that
\begin{gather*}
\Src(p_1\circ p_2)=\Src(p_1)=x\\
\Dst(p_1\circ p_2)=\Dst(p_2)=z
\end{gather*}
Thus, $x\sim z$.
\item (Translation invariance) If $x\sim y$ and $z\in A^*$, we have a $p\in P$ such that $x=\Src(p)$ and $y=\Dst(p)$. Since $P$ is closed under whiskering, $z\star p\in P$. Recall that
\begin{gather*}
\Src(z\star p)=z\cdot \Src(p)=zx\\
\Dst(z\star p)=z\cdot \Dst(p)=zy
\end{gather*}
This shows that $zx \sim zy$. The proof that $xz \sim yz$ is similar.
\end{enumerate}
\end{proof}
A translation-invariant equivalence relation is also known as a \IndexDefinition{monoid congruence}\emph{monoid congruence}.
Given a term $x\in A^*$ and a set of relations $R$, the \emph{equivalence class} of $x$ is the set of all $y\in A^*$ such that $x\sim y$.
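The definition of $\sim$ also suggests a direct, if inefficient, procedure: to decide whether $x\sim y$, search the rewrite graph for a path from $x$ to $y$. Here is a hedged Python sketch for the bicyclic monoid; since the rewrite graph is infinite, we cap the term length to keep the breadth-first search finite (the cap, and all names, are our own choices for this illustration):

```python
from collections import deque

# Breadth-first search in the rewrite graph of the bicyclic monoid
# <a, b; ab = ε>. Edges apply ab => ε or ε => ab at any position.
# We cap term length so the search space is finite; the cap is an
# arbitrary choice for this illustration.

def neighbors(t, max_len=8):
    out = set()
    for i in range(len(t) - 1):
        if t[i:i+2] == "ab":            # forward step: delete "ab"
            out.add(t[:i] + t[i+2:])
    if len(t) + 2 <= max_len:
        for i in range(len(t) + 1):     # backward step: insert "ab"
            out.add(t[:i] + "ab" + t[i:])
    return out

def equivalent(x, y, max_len=8):
    seen, queue = {x}, deque([x])
    while queue:
        t = queue.popleft()
        if t == y:
            return True
        for n in neighbors(t, max_len):
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return False

assert equivalent("babaa", "baaba")   # the running example
assert equivalent("ab", "")           # ab ~ ε
assert not equivalent("a", "b")       # distinct equivalence classes
```

Note that a `False` answer is only conclusive up to the length cap; a genuine decision procedure for this monoid comes from the normal forms discussed below.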
\begin{example}
Here is a table listing some equivalence classes and their representatives in the bicyclic monoid; each row is a distinct equivalence class, and the terms within each row are equivalent:
\begin{quote}
\begin{tabular}{l}
\toprule
$\varepsilon$, $ab$, $aabb$, $abaabb$, \ldots\\
\midrule
$a$, $aab$, $aaabb$, $aabab$, $aaabbab$, \ldots\\
\midrule
$b$, $bab$, $baabb$, $babab$, $baabbab$, \ldots\\
\midrule
$ba$, $baab$, $abba$, $abbaab$, \ldots\\
\midrule
$baa$, $babaa$, $baaba$, $babaaba$, \ldots\\
\midrule
\ldots\\
\bottomrule
\end{tabular}
\end{quote}
Indeed, the bicyclic monoid has infinitely many equivalence classes, and each equivalence class contains infinitely many terms.
Since the elements of the bicyclic monoid are actually equivalence classes, the monoid operation also operates on equivalence classes, and not terms. What does this mean? Well, let's define $x := baba$, $x^\prime := ba$, $y := a$, $y^\prime := aba$. From the above table, we see that $x$ and $x^\prime$ belong to the same equivalence class (the 4th row in the table), and so do $y$ and $y^\prime$ (the 2nd row). It turns out that $xy$ and $x^\prime y^\prime$ are also both equivalent, appearing in the 5th row of the table; in fact, we already knew about this specific equivalence earlier when we exhibited a rewrite path from $xy=babaa$ to $x^\prime y^\prime=baaba$.
If we didn't have this rewrite path already though, we could derive one. We first write down a rewrite path $p_x$ from $baba$ to $ba$, and another rewrite path $p_y$ from $a$ to $aba$:
\begin{gather*}
p_x := b(ab\Rightarrow \varepsilon)a\\
p_y := (\varepsilon\Rightarrow ab)a
\end{gather*}
If we whisker $p_x$ on the right by $y$, we get a rewrite path $p_x\star a$ going from $babaa$ to $baa$:
\[b(ab\Rightarrow\varepsilon)aa\]
Then, we whisker $p_y$ on the left by $x^\prime$, and get a rewrite path $ba\star p_y$ from $baa$ to $baaba$:
\[ba(\varepsilon\Rightarrow ab)a\]
Their composition is exactly the same as our earlier rewrite path from $babaa$ to $baaba$:
\[b(ab\Rightarrow\varepsilon)aa\circ ba(\varepsilon\Rightarrow ab)a\]
The next proposition proves that this construction actually works in general.
\end{example}
\begin{proposition}
\index{equivalence class}%
The monoid operation on the equivalence classes of $\sim$ does not depend on the choice of representatives; that is, if $x\sim x^\prime$ and $y\sim y^\prime$, then $xy\sim x^\prime y^\prime$. (Another way of saying this is that $\cdot$ is \index{well-defined mapping}\emph{well-defined} on the equivalence classes of $\sim$.)
\end{proposition}
\begin{proof}
Let $P$ be the set of rewrite paths in the rewrite graph of $A$ and $R$, as before. Since $x\sim x^\prime$ and $y\sim y^\prime$, we have a pair of rewrite paths $p_x,\,p_y\in P$ where:
\begin{gather*}
\Src(p_x)=x\\
\Dst(p_x)=x^\prime\\[\medskipamount]
\Src(p_y)=y\\
\Dst(p_y)=y^\prime
\end{gather*}
We also know that $P$ is closed under whiskering, so $p_x\star y\in P$ is a rewrite path from $xy$ to $x^\prime y$:
\begin{gather*}
\Src(p_x\star y)=\Src(p_x)\cdot y=xy\\
\Dst(p_x\star y)=\Dst(p_x)\cdot y=x^\prime y
\end{gather*}
And similarly $x^\prime\star p_y\in P$ is a rewrite path from $x^\prime y$ to $x^\prime y^\prime$:
\begin{gather*}
\Src(x^\prime \star p_y)=x^\prime \cdot\Src(p_y)=x^\prime y\\
\Dst(x^\prime \star p_y)=x^\prime \cdot\Dst(p_y)=x^\prime y^\prime
\end{gather*}
Now, we define $p_{xy}:=(p_x\star y) \circ (x^\prime\star p_y)$; this is valid, because $\Src(x^\prime \star p_y)=\Dst(p_x\star y)$.
Also, $P$ is closed under composition, so $p_{xy}\in P$. We have:
\begin{gather*}
\Src(p_{xy}) = \Src(p_x\star y) = xy\\
\Dst(p_{xy}) = \Dst(x^\prime \star p_y) = x^\prime y^\prime
\end{gather*}
Thus $xy\sim x^\prime y^\prime$. We can also prove this with a diagram as shown in Figure~\ref{monoid operation well defined}.
\end{proof}
\begin{figure}\captionabove{The monoid operation on the equivalence classes of $\sim$ is well-defined}\label{monoid operation well defined}
\[
\begin{array}{c}
\begin{tikzcd}
x\arrow[d, Rightarrow]\\
\cdots\arrow[d, Rightarrow]\\
x^\prime
\end{tikzcd}\\
\\
\begin{tikzcd}
y\arrow[d, Rightarrow]\\
\cdots\arrow[d, Rightarrow]\\
y^\prime
\end{tikzcd}
\end{array} \qquad \xrightarrow{\text{whiskering}} \qquad
\begin{array}{c}
\begin{tikzcd}
x\cdot y\arrow[d, Rightarrow]\\
\cdots \arrow[d, Rightarrow]\\
x^\prime\cdot y
\end{tikzcd}\\
\\
\begin{tikzcd}
x^\prime\cdot y\arrow[d, Rightarrow]\\
\cdots\arrow[d, Rightarrow]\\
x^\prime\cdot y^\prime
\end{tikzcd}
\end{array} \qquad \xrightarrow{\text{composition}} \qquad
\begin{tikzcd}
x\cdot y\arrow[d, Rightarrow]\\
\cdots\arrow[d, Rightarrow]\\
x^\prime\cdot y\arrow[d, Rightarrow]\\
\cdots\arrow[d, Rightarrow]\\
x^\prime\cdot y^\prime
\end{tikzcd}
\]
\end{figure}
Here is the culmination of everything we've learned so far.
\begin{definition} A \IndexDefinition{finitely-presented monoid}\emph{finitely-presented monoid} $\langle A;\, R\rangle$ with alphabet $A:=\{a_1,\,\ldots,\,a_n\}$ and relations $R:=\{(u_1,\,v_1),\,\ldots,\,(u_m,\,v_m)\}$ is constructed from the \index{free monoid}free monoid $A^*$ and \index{rewrite graph}rewrite graph of $A$ and $R$:
\begin{itemize}
\item The elements of $\langle A;\, R\rangle$ are the \index{equivalence class}equivalence classes of $A^*$ under the equivalence relation $\sim$ from Proposition~\ref{monoid congruence}.
\item The monoid operation of $\langle A;\, R\rangle$ picks an arbitrary representative term from each of the two equivalence classes, concatenates the terms using the monoid operation on $A^*$, and returns the equivalence class of the result.
\item The identity element of $\langle A;\, R\rangle$ is the equivalence class of the \index{identity element}identity element $\varepsilon\in A^*$.
\end{itemize}
\end{definition}
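In the bicyclic monoid, the single rule $ab\Rightarrow\varepsilon$ always shortens a term, and no application of the rule can overlap another, so repeatedly deleting $ab$ terminates in a unique term containing no $ab$, necessarily of the form $b^j a^i$. Taking that uniqueness on faith for now (it is the subject of the theory of confluent rewriting systems), this normal form identifies the equivalence class, and a short Python sketch can reproduce the rows of the table above:

```python
# Reduce a term of the bicyclic monoid <a, b; ab = ε> to its normal
# form by repeatedly deleting "ab". Because the single rule cannot
# overlap itself, every term has a unique normal form b^j a^i, and
# two terms are equivalent iff their normal forms are equal.

def normal_form(t: str) -> str:
    while "ab" in t:
        t = t.replace("ab", "", 1)   # apply one rewrite step
    return t

# Each row of the table collapses to a single normal form:
rows = [
    ("", ["ab", "aabb", "abaabb"]),
    ("a", ["aab", "aaabb", "aabab", "aaabbab"]),
    ("b", ["bab", "baabb", "babab", "baabbab"]),
    ("ba", ["baab", "abba", "abbaab"]),
    ("baa", ["babaa", "baaba", "babaaba"]),
]
for rep, others in rows:
    assert normal_form(rep) == rep   # representatives are already reduced
    assert all(normal_form(t) == rep for t in others)
```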
\section{The Swift Connection}\label{monoidsasprotocols}
In this section, we will show that any finitely-presented monoid defines a Swift protocol in a natural way. We make use of the \index{derived requirement}derived requirements formalism we defined in Section~\ref{derived req}, and further results from Section~\ref{generic signature validity}.
\paragraph{Free monoids.} The protocol below---or more precisely, the protocol generic signature $G_\texttt{P}$---models the \index{free monoid}free monoid $\{a,b\}^*$ over the two-element alphabet $\{a,b\}$:
\begin{Verbatim}
protocol P {
associatedtype A: P
associatedtype B: P
}
\end{Verbatim}
We first define a mapping $\varphi\colon\{a,b\}^*\rightarrow G_\texttt{P}$:
\begin{itemize}
\item $\varphi$ maps the identity element $\varepsilon$ to $\texttt{Self}$; that is, $\varphi(\varepsilon):=\texttt{Self}$.
\item $\varphi$ maps a term $ta$ to a \index{dependent member type}dependent member type with base type $\varphi(t)$ and associated type \texttt{[P]A}; that is, $\varphi(ta):=\texttt{$\varphi(t)$.[P]A}$.
\item $\varphi$ maps a term $tb$ to a dependent member type with base type $\varphi(t)$ and associated type \texttt{[P]B}; that is, $\varphi(tb):=\texttt{$\varphi(t)$.[P]B}$.
\end{itemize}
The above definition is exhaustive since every term in $\{a,b\}^*$ is either the identity, or the concatenation of a shorter term with $a$ or $b$. For example, the four distinct terms of length 2 map to the following type parameters:
\begin{gather*}
\varphi(aa)=\texttt{Self.[P]A.[P]A}\\
\varphi(ab)=\texttt{Self.[P]A.[P]B}\\
\varphi(ba)=\texttt{Self.[P]B.[P]A}\\
\varphi(bb)=\texttt{Self.[P]B.[P]B}
\end{gather*}
In fact, while the definition of $\varphi$ is ``syntactic,'' every type parameter produced by $\varphi$ is also semantically valid in $G_\texttt{P}$, and every such type parameter also conforms to \texttt{P}:
\begin{proposition}\label{monoid type lemma} If $t\in \{a,b\}^*$ and $\texttt{T}:=\varphi(t)$, then $G_\texttt{P}\vDash\texttt{T}$ and $G_\texttt{P}\vDash\ConfReq{T}{P}$.
\end{proposition}
\begin{proof}
We proceed by \index{induction}induction on the length of $t$. If $|t|=0$, then $t=\varepsilon$, and $\varphi(\varepsilon)=\texttt{Self}$ by definition. We can derive \texttt{Self} from $G_\texttt{P}$ with a single \IndexStep{GenSig}\textsc{GenSig} derivation step:
\begin{gather*}
\vdash\texttt{Self}\tag{1}
\end{gather*}
Likewise for $\ConfReq{Self}{P}$:
\begin{gather*}
\vdash\ConfReq{Self}{P}\tag{1}
\end{gather*}
In the inductive step, we have a term of non-zero length, which must have the form $ta$ or $tb$ for some shorter term $t$. Let $\texttt{T}:=\varphi(t)$. By the inductive hypothesis, $G_\texttt{P}\vDash\texttt{T}$ and $G_\texttt{P}\vDash\ConfReq{T}{P}$. Now, we can extend the derivation of $\ConfReq{T}{P}$ with a \IndexStep{Member}\textsc{Member} derivation step to obtain a derivation of \texttt{T.[P]A} or \texttt{T.[P]B} as required:
\begin{gather*}
\ldots\vdash\ConfReq{U}{P}\tag{1}\\
(1)\vdash\texttt{U.[P]A}\tag{2}\\
(1)\vdash\texttt{U.[P]B}\tag{3}
\end{gather*}
We similarly obtain a derivation of $\ConfReq{T.[P]A}{P}$ or $\ConfReq{T.[P]B}{P}$ from $\ConfReq{T}{P}$ by adding a \IndexStep{ReqSig}\textsc{ReqSig} derivation step for the requirement $\ConfReq{Self.[P]A}{P}$ or $\ConfReq{Self.[P]B}{P}$ of \texttt{P}, followed by a \IndexStep{Conf}\textsc{Conf} step:
\begin{gather*}
\ldots\vdash\ConfReq{T}{P}\tag{1}\\
\vdash\ConfReq{Self.[P]A}{P}\tag{2}\\
\vdash\ConfReq{Self.[P]B}{P}\tag{3}\\
(1),\,(2)\vdash\ConfReq{T.[P]A}{P}\tag{4}\\
(1),\,(3)\vdash\ConfReq{T.[P]B}{P}\tag{5}
\end{gather*}
This completes the induction.
\end{proof}
The next step is to define an associative \index{binary operation}binary operation ``$\cdot$'' on $G_\texttt{P}$ and show that $(G_\texttt{P},\,\cdot,\,\texttt{Self})$ is a monoid. Consider the two type parameters \texttt{Self.[P]A.[P]A} and \texttt{Self.[P]B.[P]B}. Intuitively, we can drop ``\texttt{Self}'' from the right-hand side, and then concatenate this onto the left-hand side, as a string:
\[(\texttt{Self.[P]A.[P]A})\cdot(\texttt{Self.[P]B.[P]B})=\texttt{Self.[P]A.[P]A.[P]B.[P]B}\]
Manipulation of string representations is unsatisfying, though; we would like our theory to be more rigorous than that. In fact, we can formalize this operation in one of two equivalent ways.
The first approach uses the \index{type substitution}type substitution algebra. We build a \index{protocol substitution map}protocol substitution map from the left-hand side, which we know conforms to \texttt{P}, and then apply it to the right-hand side, which we can decompose as a \index{conformance path}conformance path:
\begin{gather*}
\texttt{Self.[P]B.[P]B}\otimes\Sigma_{\ConfReq{Self.[P]A.[P]A}{P}}\\
\qquad {} = \AssocType{[P]B} \otimes \AssocConf{Self.[P]B}{P} \otimes \ConfReq{Self}{P} \otimes \Sigma_{\ConfReq{Self.[P]A.[P]A}{P}}\\
\qquad {} = \AssocType{[P]B} \otimes \AssocConf{Self.[P]B}{P} \otimes \ConfReq{Self.[P]A.[P]A}{P}\\
\qquad {} = \texttt{Self.[P]A.[P]A.[P]B.[P]B}
\end{gather*}
More generally, for two type parameters \texttt{U} and \texttt{V} of $G_\texttt{P}$, we define:
\[\texttt{U}\cdot\texttt{V}:=\texttt{V}\otimes\Sigma_{\ConfReq{U}{P}}\]
It is easy to see that \Index{protocol Self type@protocol \texttt{Self} type}\texttt{Self} acts as the \index{identity element}identity element on the left, since $\Sigma_{\ConfReq{Self}{P}}$ is the \index{identity substitution map}identity substitution map of $G_\texttt{P}$:
\[
\texttt{Self}\cdot\texttt{T} = \texttt{T}\otimes\Sigma_{\ConfReq{Self}{P}} = \texttt{T}
\]
Similarly, it acts as the identity element on the right, since applying a protocol substitution map to \texttt{Self} projects the conforming type:
\[
\texttt{T}\cdot\texttt{Self} = \texttt{Self}\otimes\Sigma_{\ConfReq{T}{P}} = \texttt{T}
\]
We won't prove that ``$\cdot$'' is associative, but this can be shown with enough effort by considering the conformance path decomposition, as above.
Another approach to formalizing ``$\cdot$'' is to use Lemma~\ref{subst lemma}. This has the advantage that it avoids the type substitution algebra entirely, instead being defined in terms of the more restricted idea of \index{formal substitution}formal substitution in the derived requirements formalism. Suppose we are given two type parameters \texttt{U} and \texttt{V} of $G_\texttt{P}$. By Proposition~\ref{monoid type lemma}, we know that $G_\texttt{P}\vDash\ConfReq{U}{P}$ and $G_\texttt{P}\vDash\texttt{V}$. Thus, the conditions of Lemma~\ref{subst lemma} are satisfied, so we can define $\texttt{U}\cdot\texttt{V}$ as the valid type parameter obtained by performing a formal substitution of \texttt{Self} with \texttt{U} in \texttt{V}.
The last question is then, in what sense is $(G_\texttt{P},\,\cdot,\,\texttt{Self})$ ``equivalent'' to the free monoid $\{a,b\}^*$? To answer this, we return to the mapping $\varphi$. This is actually a special kind of mapping, which we define now.
\begin{definition}
\index{homomorphism}%
\IndexDefinition{monoid homomorphism}%
\IndexDefinition{monoid isomorphism}%
A function $f\colon M\rightarrow N$ is a \emph{monoid homomorphism} if it maps the identity of $M$ to the identity of $N$ and is compatible with the binary operations of $M$ and $N$; that is, $f(\varepsilon)=\varepsilon$, and for all $x$, $y\in M$, $f(x\cdot y)=f(x)\cdot f(y)$. Furthermore, if $f$ has an inverse $f^{-1}$ such that $f^{-1}(f(x))=x$ for all $x\in M$ and $f(f^{-1}(y))=y$ for all $y\in N$, then $f$ is a \emph{monoid isomorphism}, in which case we say $M$ and $N$ are \emph{isomorphic}.
\end{definition}
To see that $\varphi$ is a monoid isomorphism, we consider each part of the definition. We know that $\varphi$ maps the identity element $\varepsilon$ of $\{a,b\}^*$ to the identity element \texttt{Self} of $G_\texttt{P}$. It also respects the monoid operation. For example,
\begin{gather*}
\varphi(aa)\cdot\varphi(bb)\\
\qquad {}=(\texttt{Self.[P]A.[P]A})\cdot(\texttt{Self.[P]B.[P]B})\\
\qquad {}=\texttt{Self.[P]A.[P]A.[P]B.[P]B}\\
\qquad {}=\varphi(aabb)
\end{gather*}
More generally, we claim without proof that $\varphi(uv)=\varphi(u)\cdot\varphi(v)$, for all $u$, $v\in\{a,b\}^*$. We can also define an inverse mapping $\PhiInv\colon G_\texttt{P}\rightarrow\{a,b\}^*$:
\begin{gather*}
\PhiInv(\texttt{Self}):=\varepsilon\\
\PhiInv(\texttt{T.[P]A}):=\PhiInv(\texttt{T})\cdot a\\
\PhiInv(\texttt{T.[P]B}):=\PhiInv(\texttt{T})\cdot b
\end{gather*}
Thus, $\varphi$ is a monoid isomorphism, and $(G_\texttt{P},\,\cdot,\,\texttt{Self})$ is isomorphic to $\{a,b\}^*$. Also, nothing in our construction depends on there being exactly two associated type declarations in \texttt{P}. The protocol \texttt{N} from Section~\ref{recursive conformances} defines a set of type parameters isomorphic to the free monoid $\{a\}^*$ over the one-element alphabet, which is just the set of \index{natural numbers}natural numbers $\mathbb{N}$ under addition; this justifies the name \texttt{N} we've been using for this protocol throughout! Similarly, adding a third associated type \texttt{C} to \texttt{P}, together with an \index{associated conformance requirement}associated conformance requirement $\ConfReq{Self.C}{P}_\texttt{P}$, gives us a monoid isomorphic to $\{a,b,c\}^*$. For any finite alphabet $A$, we can similarly obtain a protocol whose type parameters form a monoid isomorphic to $A^*$.
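The mapping $\varphi$ and its inverse are easy to transcribe if we stand in for type parameters with their printed string form (an informal representation of our own, not the compiler's). The sketch below also implements the ``$\cdot$'' operation as the ``drop \texttt{Self} and concatenate'' manipulation described earlier, and checks the homomorphism property on small examples:

```python
# Transcribe the mapping φ between terms of {a,b}* and the printed
# form of type parameters of G_P, treating type parameters as plain
# strings. This is an illustration, not the compiler's representation.

SUFFIX = {"a": ".[P]A", "b": ".[P]B"}

def phi(term: str) -> str:
    out = "Self"                       # φ(ε) = Self
    for ch in term:
        out += SUFFIX[ch]              # φ(ta) = φ(t).[P]A, etc.
    return out

def phi_inv(tp: str) -> str:
    assert tp.startswith("Self")
    rest = tp[len("Self"):]
    # Each ".[P]A" contributes an a, each ".[P]B" a b.
    return "".join("a" if c == "A" else "b"
                   for c in rest if c in "AB")

def concat(u_tp: str, v_tp: str) -> str:
    """The monoid operation on G_P: replace V's 'Self' with U."""
    return u_tp + v_tp[len("Self"):]

assert phi("aa") == "Self.[P]A.[P]A"
assert phi_inv(phi("abba")) == "abba"           # round trip
# φ is a homomorphism: φ(uv) = φ(u)·φ(v)
assert phi("aa" + "bb") == concat(phi("aa"), phi("bb"))
assert concat(phi("aa"), "Self") == phi("aa")   # Self is the identity
```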
\paragraph{Finitely-presented monoids.}
Now, we generalize our construction of a free monoid $A^*$ to a \index{finitely-presented monoid}finitely-presented monoid $\AR$. We define a protocol \texttt{P} containing a series of associated types corresponding to the elements of $A$, as before. Then, if $R$ has at least one element, we attach a \Index{where clause@\texttt{where} clause}trailing \texttt{where} clause to \texttt{P}, made up of \index{same-type requirement}same-type requirements constructed from elements of $R$: for each $(u_i,\,v_i)\in R$, we apply $\varphi$ to both $u_i$ and $v_i$ to form $\FormalReq{$\varphi(u_i)$ == $\varphi(v_i)$}_\texttt{P}$. If $R$ is empty, we get the same protocol as before, because the finitely-presented monoid $\langle A;\varnothing\rangle$ is isomorphic to the free monoid $A^*$.
These same-type requirements of \texttt{P} give the protocol generic signature $G_\texttt{P}$ a non-trivial \index{equivalence class}equivalence class structure under the \index{reduced type equality}reduced type equality relation. We now show that this is the \emph{same} equivalence class structure as $\AR$ by arguing two facts:
\begin{enumerate}
\item The mapping $\varphi$ is \index{well-defined mapping}well-defined. That is, if $x\sim y$ as elements of $\AR$, then $\varphi(x)$ and $\varphi(y)$ must also have the same reduced type.
\item The mapping $\varphi$ is an \index{monoid isomorphism}isomorphism. That is, if $\varphi(x)$ and $\varphi(y)$ are equivalent type parameters, then it is also the case that $x\sim y$ as elements of $\AR$.
\end{enumerate}
We used an informal argument to show that the above holds in the absence of same-type requirements, when our protocol modeled the free monoid $A^*$. A finitely-presented monoid $\AR$ requires more care. We recall that two terms $x$, $y\in A^*$ are equivalent under $\sim$ if there exists a \index{rewrite path}rewrite path $p$ with $\Src(p)=x$ and $\Dst(p)=y$, and two type parameters \texttt{X} and \texttt{Y} are equivalent in $G_\texttt{P}$ if we can \index{derived requirement}derive a same-type requirement $\FormalReq{X == Y}$ from $G_\texttt{P}$. Thus, every rewrite path of $\AR$ must correspond to a derivation of a same-type requirement of $G_\texttt{P}$, and vice versa. We prove this correspondence with a pair of propositions.
\begin{theorem}\label{path to derivation}
If $x\sim y$ in $\AR$, then $G_\texttt{P}\vDash\FormalReq{X == Y}$, where $\texttt{X}:=\varphi(x)$ and $\texttt{Y}:=\varphi(y)$.
\end{theorem}
\begin{proof}
We are given a rewrite path $p$ with $\Src(p)=x$ and $\Dst(p)=y$, and we must construct a derivation of $\FormalReq{X == Y}$ in $G_\texttt{P}$.
If the \index{empty rewrite path}rewrite path is empty, that is, $p=1_t$ for some $t\in A^*$, we construct a derivation of $\texttt{T}:=\varphi(t)$ using Proposition~\ref{monoid type lemma}, and then extend it with an \IndexStep{Equiv}\textsc{Equiv} rule to get a derivation of $\FormalReq{T == T}$:
\begin{gather*}
\ldots\vdash\texttt{T}\tag{1}\\
(1)\vdash\FormalReq{T == T}\tag{2}
\end{gather*}
If the rewrite path contains more than one rewrite step, we derive a same-type requirement for each rewrite step, and combine them into one with a sequence of \IndexStep{Equiv}\textsc{Equiv} derivation steps:
\begin{itemize}
\item The left-hand side of the first same-type requirement corresponds to the source of the first rewrite step, which is the source of the rewrite path, and is therefore the type parameter \texttt{X}.
\item The left-hand side of each subsequent requirement is identical to the right-hand side of the previous requirement.
\item Finally, the right-hand side of the final requirement corresponds to the destination of the final rewrite step, which is the destination of the rewrite path, and is therefore the type parameter \texttt{Y}.
\end{itemize}
It remains to consider the case of a single rewrite step. Suppose we are given a rewrite step $t(u\Rightarrow v)w$, with $t$, $u$, $v$, $w\in A^*$. We first define the following type parameters (the ``\texttt{Self.}'' is explicit in all but the first one, because for notational convenience we want to write down type parameters like \texttt{T.U} and such, which wouldn't make sense if \texttt{U} included the ``\texttt{Self.}'' part):
\begin{gather*}
\texttt{T} := \varphi(t)\\
\texttt{Self.U} := \varphi(u)\\
\texttt{Self.V} := \varphi(v)\\
\texttt{Self.W} := \varphi(w)
\end{gather*}
Now, we construct a derivation as follows:
\begin{itemize}
\item First, we derive $\ConfReq{T}{P}$ using Proposition~\ref{monoid type lemma}.
\item The terms $u$ and $v$ are related by a relation of the form $(u,v)$ or $(v,u)\in R$. This corresponds to an explicit same-type requirement of \texttt{P}, either $\FormalReq{Self.U == Self.V}$ or $\FormalReq{Self.V == Self.U}$.
We introduce this requirement using a \textsc{ReqSig} derivation step, and combine it with the derivation of $\ConfReq{T}{P}$ using a \textsc{Conf} step, to get a derivation of $\FormalReq{T.U == T.V}$ or $\FormalReq{T.V == T.U}$.
If the requirement has the form $\FormalReq{T.V == T.U}$, we also add an \textsc{Equiv} step, flipping it around to get $\FormalReq{T.U == T.V}$. We now have a derivation corresponding to the rewrite step $t(u\Rightarrow v)$.
\item Next, we construct a derivation of $\texttt{Self.W}$ using Proposition~\ref{monoid type lemma}.
\item Finally, we make use of Lemma~\ref{subst lemma}, and combine the derivation of $\FormalReq{T.U == T.V}$ with the derivation of \texttt{Self.W}, to get a derivation of $\FormalReq{T.U.W == T.V.W}$. This corresponds to the rewrite step $t(u\Rightarrow v)w$.
\end{itemize}
Thus, every rewrite path has a corresponding derived same-type requirement.
\end{proof}
Now, we prove the other direction.
\begin{theorem}\label{derivation to path}
If $G_\texttt{P}\vDash\FormalReq{X == Y}$, then $x\sim y$ in $\AR$, where $x:=\PhiInv(\texttt{X})$ and $y:=\PhiInv(\texttt{Y})$.
\end{theorem}
\begin{proof}
We are given a derivation of $\FormalReq{X == Y}$. We will show that $x\sim y$ by constructing a rewrite path $p$ corresponding to $\FormalReq{X == Y}$, meaning its source and destination are the terms corresponding to the left-hand and right-hand sides of the requirement, respectively; that is, $\Src(p)=x$ and $\Dst(p)=y$.
We proceed by \index{structural induction}structural induction on derivations. It suffices to only consider the derivation steps that produce new same-type requirements. At each step, the inductive hypothesis gives us a rewrite path corresponding to every same-type requirement on the left-hand side of $\vdash$, if there are any. We must then use these rewrite paths to construct a new rewrite path which corresponds to the same-type requirement on the right-hand side of $\vdash$. We do this by making use of the \index{rewrite path composition}composition and \index{rewrite path whiskering}whiskering operations on rewrite paths.
Each case deals with one, two, or three type parameters, denoted \texttt{T}, \texttt{U}, and \texttt{V}; and possibly an associated type declaration \texttt{[P]A}. For concision, we write the corresponding terms of $A^*$ as lowercase italic letters; also note that $|a|=1$, so in fact $a\in A$:
\begin{gather*}
t:=\PhiInv(\texttt{T})\\
u:=\PhiInv(\texttt{U})\\
v:=\PhiInv(\texttt{V})\\
a:=\PhiInv(\texttt{Self.[P]A})\qquad\mbox{(for (5))}
\end{gather*}
There are five cases to handle (the protocol generic signature $G_\texttt{P}$ does not have explicit same-type requirements, so we do not need to consider \IndexStep{GenSig}\textsc{GenSig} derivation steps here):
\begin{enumerate}
\item A \IndexStep{Conf}\textsc{Conf} step with a same-type requirement from the requirement signature of \texttt{P}; $\ConfReq{T}{P},\,\FormalReq{Self.U == Self.V}\vdash\FormalReq{T.U == T.V}$.
\item An \IndexStep{Equiv}\textsc{Equiv} step of the first kind; $\texttt{T}\vdash\FormalReq{T == T}$.
\item An \textsc{Equiv} step of the second kind; $\FormalReq{T == U}\vdash\FormalReq{U == T}$.
\item An \textsc{Equiv} step of the third kind; $\FormalReq{T == U},\,\FormalReq{U == V}\vdash\FormalReq{T == V}$.
\item A \IndexStep{Member}\textsc{Member} step; $\ConfReq{U}{P},\,\FormalReq{T == U}\vdash\FormalReq{T.[P]A == U.[P]A}$.
\end{enumerate}
For (1), the explicit same-type requirement $\FormalReq{Self.U == Self.V}$ in the requirement signature of \texttt{P} corresponds to the relation $(u, v)$ or $(v, u)$ of $R$. Take $p:=t(u\Rightarrow v)$. This corresponds to $\FormalReq{T.U == T.V}$, because $\Src(p)=tu$ and $\Dst(p)=tv$.
For (2), take $p:=1_t$, the identity rewrite path at $t$. This corresponds to $\FormalReq{T == T}$, since $\Src(p)=\Dst(p)=t$.
For (3), we have a rewrite path $p^\prime$ for $\FormalReq{T == U}$, where $\Src(p^\prime)=t$ and $\Dst(p^\prime)=u$. Take $p := (p^\prime)^{-1}$. This corresponds to $\FormalReq{U == T}$, because $\Src(p)=u$ and $\Dst(p)=t$.
For (4), we have a pair of rewrite paths $p_1$ and $p_2$, with $\Src(p_1)=t$, $\Dst(p_1)=\Src(p_2)=u$, and $\Dst(p_2)=v$. We compose the two rewrite paths, and take $p:=p_1\circ p_2$. This corresponds to $\FormalReq{T == V}$, because $\Src(p)=\Src(p_1)=t$, and $\Dst(p)=\Dst(p_2)=v$.
For (5), we have a rewrite path $p^\prime$, with $\Src(p^\prime)=t$ and $\Dst(p^\prime)=u$. We whisker the rewrite path on the right, and take $p:=p^\prime\star a$. This corresponds to $\FormalReq{T.[P]A == U.[P]A}$, because $\Src(p)=ta$ and $\Dst(p)=ua$.
\end{proof}
Let's try this out on the \index{bicyclic monoid}bicyclic monoid $\langle a,\,b;\,ab\sim\varepsilon\rangle$. If we apply $\varphi$ to both sides of the relation $(ab,\varepsilon)$, we get a same-type requirement $\FormalReq{Self == Self.A.B}$. Let's add this requirement to our protocol \texttt{P}:
\begin{Verbatim}
protocol P {
associatedtype A: P
associatedtype B: P where Self == Self.A.B
}
\end{Verbatim}
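The translation from a presentation to a protocol declaration is entirely mechanical, so we can sketch it in a few lines of Python. This is purely illustrative (the helper name \texttt{protocol\_from\_presentation} is ours, and nothing like it exists in the compiler); it emits a protocol in the style of the declaration above, attaching all same-type requirements to the final associated type:

```python
def protocol_from_presentation(generators, relations):
    """Translate a monoid presentation <A; R> into Swift protocol source.

    Each generator becomes an associated type conforming to P, and each
    relation (u, v) becomes the same-type requirement phi(u) == phi(v),
    where phi maps a term to a type parameter rooted in Self.
    """
    def phi(term):
        # phi(epsilon) = Self; phi(a1 a2 ... an) = Self.A1.A2...An
        return "Self" + "".join("." + a.upper() for a in term)

    body = [f"  associatedtype {g.upper()}: P" for g in generators]
    if relations:
        reqs = ", ".join(f"{phi(u)} == {phi(v)}" for u, v in relations)
        # attach all requirements to the last associated type, as in the text
        body[-1] += " where " + reqs
    return "\n".join(["protocol P {"] + body + ["}"])

# The bicyclic monoid <a, b; ab ~ epsilon>:
print(protocol_from_presentation(["a", "b"], [("ab", "")]))
```

Running this on the bicyclic presentation reproduces the protocol above, up to the order of the two sides of the requirement.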
We previously claimed that the terms $baaba$ and $babaa$ belong to the same equivalence class in the bicyclic monoid, and we exhibited the following rewrite path:
\[
ba(ab\Rightarrow\varepsilon)a \circ
b(\varepsilon\Rightarrow ab)aa
\]
We now show the type parameters \verb|Self.B.A.A.B.A| and \verb|Self.B.A.B.A.A| belong to the same equivalence class, using Theorem~\ref{path to derivation} to translate our rewrite path into a derivation. The rewrite path consists of two rewrite steps. The first rewrite step, $ba(ab\Rightarrow\varepsilon)a$, proves an equivalence between $baaba$ and $baa$. We build up the derived same-type requirement $\FormalReq{Self.B.A.A.B.A == Self.B.A.A}$ in piecemeal fashion; first, we prove $\ConfReq{Self.[P]B.[P]A}{P}$, corresponding to the prefix $ba$:
\begin{gather*}
\vdash\ConfReq{Self}{P}\tag{1}\\
\vdash\ConfReq{Self.[P]B}{P}_\texttt{P}\tag{2}\\
(1),\,(2)\vdash\ConfReq{Self.[P]B}{P}\tag{3}\\
\vdash\ConfReq{Self.[P]A}{P}_\texttt{P}\tag{4}\\
(3),\,(4)\vdash\ConfReq{Self.[P]B.[P]A}{P}\tag{5}
\end{gather*}
We then introduce the same-type requirement corresponding to $(\varepsilon\Rightarrow ab)$ and combine it with $\ConfReq{Self.[P]B.[P]A}{P}$ using a \IndexStep{Conf}\textsc{Conf} step, to get a derivation for the rewrite step $ba(\varepsilon\Rightarrow ab)$:
\begin{gather*}
\vdash\FormalReq{Self == Self.[P]A.[P]B}_\texttt{P}\tag{6}\\
(5),\,(6)\vdash\FormalReq{Self.[P]B.[P]A == Self.[P]B.[P]A.[P]A.[P]B}\tag{7}
\end{gather*}
We invert it with an \IndexStep{Equiv}\textsc{Equiv} step, to get a derivation for the rewrite step $ba(ab\Rightarrow \varepsilon)$:
\begin{gather*}
(7)\vdash\FormalReq{Self.[P]B.[P]A.[P]A.[P]B == Self.[P]B.[P]A}\tag{8}
\end{gather*}
We derive $\ConfReq{Self.[P]B.[P]A.[P]A.[P]B}{P}$, corresponding to the term $baab$:
\begin{gather*}
(4),\,(5)\vdash\ConfReq{Self.[P]B.[P]A.[P]A}{P}\tag{9}\\
(2),\,(9)\vdash\ConfReq{Self.[P]B.[P]A.[P]A.[P]B}{P}\tag{10}
\end{gather*}
Finally, we add a \IndexStep{Member}\textsc{Member} step, and get a derivation corresponding to the actual rewrite step $ba(ab\Rightarrow\varepsilon)a$:
\begin{gather*}
(8),\,(10)\vdash\FormalReq{Self.[P]B.[P]A.[P]A.[P]B.[P]A == Self.[P]B.[P]A.[P]A}\tag{11}
\end{gather*}
The second rewrite step, $b(\varepsilon\Rightarrow ab)aa$, proves an equivalence between $baa$ and $babaa$. We can reuse some of the above derivation steps here. We first build a derivation for the rewrite step $b(\varepsilon\Rightarrow ab)$:
\begin{gather*}
(3),\,(6)\vdash\FormalReq{Self.[P]B == Self.[P]B.[P]A.[P]B}\tag{12}\\
(12)\vdash\FormalReq{Self.[P]B.[P]A == Self.[P]B.[P]A.[P]B.[P]A}\tag{13}
\end{gather*}
Finally, we add a \IndexStep{Member}\textsc{Member} derivation step, which gives us the derived same-type requirement $\FormalReq{Self.B.A.A == Self.B.A.B.A.A}$:
\[
(5),\,(13)\vdash\FormalReq{Self.[P]B.[P]A.[P]A == Self.[P]B.[P]A.[P]B.[P]A.[P]A}\tag{14}
\]
Now that we have a derived same-type requirement for both rewrite steps in our rewrite path, we can combine them with a \IndexStep{Equiv}\textsc{Equiv} step:
\begin{gather*}
(11),\,(14)\vdash[\texttt{Self.[P]B.[P]A.[P]A.[P]B.[P]A ==}\\
\qquad\qquad\qquad\qquad\texttt{Self.[P]B.[P]A.[P]B.[P]A.[P]A}]\tag{15}
\end{gather*}
We're done. While this derivation is quite a bit longer than our rewrite path, the translation process was entirely mechanical. The same procedure can be applied to any rewrite path. Now, here is the important thing---equivalence of type parameters is something the compiler must be able to \emph{compute}, via the \IndexDefinition{areReducedTypeParametersEqual()@\texttt{areReducedTypeParametersEqual()}}\texttt{areReducedTypeParametersEqual()} generic signature query.
Consider this program together with protocol \texttt{P} as above:
\begin{Verbatim}
extension P {
func testBicyclicMonoid() {
sameType(Self.B.A.A.B.A.self, Self.B.A.B.A.A.self) // ok
sameType(Self.A.A.A.self, Self.B.B.B.self) // error
}
}
\end{Verbatim}
By type checking calls to \texttt{sameType()}, we can tell if two type parameters are equivalent. A call type checks successfully only if both arguments are \index{metatype type}metatypes with the same \index{instance type}instance type. Inside the body of \texttt{testBicyclicMonoid()}, the first call is accepted because \verb|Self.B.A.A.B.A| and \verb|Self.B.A.B.A.A| are the same type, as we've already seen. However, the second call is rejected because \verb|Self.A.A.A| and \verb|Self.B.B.B| are not equivalent; there is no derived same-type requirement relating them, and thus no rewrite path in our monoid.
\section{The Word Problem}\label{word problem}
We showed that any finitely-presented monoid $\AR$ can be mechanically translated into a Swift protocol declaration, and we can write a program which asks the type checker to decide if two terms $x$ and $y$ are equivalent as elements of $\AR$. Our construction works with the examples of monoids we've seen so far---free monoids, the integers modulo $n$, the bicyclic monoid---and also countless others. An adventurous reader might try the \index{dihedral group}\emph{dihedral group of order~12},
\[\langle s,\,t;\; s^6\sim\varepsilon,\, t^2\sim\varepsilon,\, tst\sim s^5\rangle\]
or the \index{plactic monoid}\emph{plactic monoid of order 2},
\[\langle a,\,b;\; aba\sim baa,\, bba\sim bab\rangle\]
We can translate these into protocol declarations, and play with our \texttt{sameType()} function to our heart's content. We are then led to ask if this always works, or if we must somehow restrict which protocol declarations are accepted by the compiler. As it turns out, this is actually the well-known \IndexDefinition{word problem}\emph{word problem}:
\begin{quote}
Given a description of a finitely-presented monoid $\AR$, and a pair of terms $x$, $y\in A^*$, is there an effectively-computable algorithm to determine if $x\sim y$?
\end{quote}
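To make the question concrete, here is a brute-force search in Python (purely illustrative; nothing like this exists in the compiler). It treats each relation as a bidirectional rewrite and searches breadth-first for a rewrite path. Since the word problem is only semi-decidable in general, we must impose an arbitrary search budget, and a negative answer within that budget proves nothing:

```python
from collections import deque

def neighbors(term, relations):
    """All terms reachable from `term` by one rewrite step, using each
    relation (u, v) in either direction."""
    for u, v in relations:
        for lhs, rhs in ((u, v), (v, u)):
            for i in range(len(term) - len(lhs) + 1):
                if term[i:i + len(lhs)] == lhs:
                    yield term[:i] + rhs + term[i + len(lhs):]

def equivalent(x, y, relations, max_expansions=10_000):
    """Breadth-first search for a rewrite path from x to y. True means a
    path was found; False only means none was found within the budget."""
    bound = max(len(x), len(y)) + 4   # prune terms that grow too long
    seen, queue = {x}, deque([x])
    expansions = 0
    while queue and expansions < max_expansions:
        t = queue.popleft()
        if t == y:
            return True
        expansions += 1
        for n in neighbors(t, relations):
            if n not in seen and len(n) <= bound:
                seen.add(n)
                queue.append(n)
    return False

bicyclic = [("ab", "")]
print(equivalent("baaba", "babaa", bicyclic))   # True: a path exists
print(equivalent("aaa", "bbb", bicyclic))       # False within the budget
```

Note the length bound is itself a heuristic: a genuine rewrite path may pass through intermediate terms of any length, which is precisely why this search is not a decision procedure.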
This question was first posed in the early 20th century by \index{Axel Thue}Axel Thue, who studied various formalisms defined by the replacement of substrings within larger strings via a fixed set of rules. Thue described what later became known as \index{Thue system}\index{Thue system!z@\igobble|seealso{finitely-presented monoid}}\emph{Thue~systems} and \index{semi-Thue system}\index{semi-Thue system!z@\igobble|seealso{rewrite system}}\emph{semi-Thue~systems} in a 1914 paper \cite{thue_translation}. A Thue system is essentially a finitely-presented monoid, while a semi-Thue system is a string rewrite system, something we will define in the next section.
This brings us back to our previous discussion of computability, Turing machines and the halting problem from Section~\ref{tag systems}. In a 1947 paper, \index{Emil Post}Emil L.~Post showed that the halting problem reduces to the word problem with a specially-constructed Thue system whose rewrite steps correspond to the state transitions of a \index{Turing machine}Turing machine. In such a Thue system, an initial string is equivalent to a final string if and only if a Turing machine evaluation from the initial string reaches the final state. This shows that the word problem is undecidable, for if we could solve the word problem for an arbitrary Thue system, we could then solve the \index{halting problem}halting problem by asking if two specially-crafted strings are equivalent.
When we showed that termination checking for type substitution is an \index{undecidable problem}undecidable problem, we used recursive conformance requirements, introduced by \cite{se0157}, to encode arbitrary computation. Now, we see that reduced type equality is also undecidable; the present construction relies on the aforesaid recursive conformance requirements, together with same-type requirements in protocol \texttt{where} clauses \cite{se0142}.
Not only is the word problem undecidable when given an arbitrary monoid presentation, but we can construct a \emph{specific} finitely-presented monoid where it is undecidable. One way is via Turing's construction of a \index{universal Turing machine}\emph{universal machine}, which takes the \index{standard description}standard description of another machine as input, and simulates the execution of this machine step by step. In today's lingo, a universal machine is a ``Turing machine emulator'' or ``\index{metacircular interpreter}metacircular interpreter.'' Applying Post's construction to a universal Turing machine gives us a specific concrete monoid presentation in which the word problem cannot be solved. A universal Turing machine defines a rather ``large'' monoid presentation with many symbols and relations, so we might ask if all ``small'' monoid presentations have decidable word problems. While ``small'' is subjective of course, an example found by \index{Grigori Tseitin}G.~S.~Tseitin in 1958 certainly qualifies~\cite{undecidablesemigroup}.
\begin{theorem}\label{undecidablemonoid}
The word problem is undecidable in the finitely-presented monoid below; no algorithm can determine in finite time whether two arbitrary input terms are equivalent:
\begin{align*}
\langle a,\,b,\,c,\,d,\,e;\,&ac\sim ca,\,ad\sim da,\,bc\sim cb,\,bd\sim db,\\
&ce\sim eca,\,de\sim edb,\,cca\sim ccae\rangle
\end{align*}
\end{theorem}
In fact, this monoid is a universal machine of sorts, and a more universal ``word problem interpreter'' is described in \cite{universalsemigroup}. Here is the corresponding protocol declaration:
\begin{Verbatim}
protocol P {
associatedtype A: P
associatedtype B: P
associatedtype C: P
associatedtype D: P
associatedtype E: P
where A.C == C.A, A.D == D.A,
B.C == C.B, B.D == D.B,
C.E == E.C.A, D.E == E.D.B,
C.C.A == C.C.A.E
}
\end{Verbatim}
The Swift compiler will reject this declaration and diagnose an error; the realization that this must be so \index{history}led to the development of the Requirement Machine, as mentioned in Section~\ref{type parameter graph}. The error message hints at our next topic:
\begin{quote}
\begin{verbatim}
error: cannot build rewrite system for protocol;
rule length limit exceeded
\end{verbatim}
\end{quote}
Indeed, we can implement an algorithm for solving the word problem in a suitably-restricted class of finitely-presented monoids. One such subclass contains those monoids which can be presented by a \index{rewrite system}\emph{convergent rewrite system}. The next section will show that a convergent rewrite system always has a decidable word problem.
\paragraph{The big picture.} We will revisit the encoding of a finitely-presented monoid as a Swift protocol in Example~\ref{double encoding}. To actually implement Swift generics via a rewrite system, however, we need the opposite: given a generic signature and some set of protocol declarations, we must translate this whole mess into a finitely-presented monoid. If we just ``invert'' the construction from the previous section, we would find ourselves with a very restricted form of generics which is quite useless in practice:
\begin{itemize}
\item A generic signature would just be a protocol generic signature: a single generic parameter conforming to a single protocol.
\item Every associated type of every protocol would conform to its parent protocol, and no other protocol.
\item The only other kind of requirement which could be stated is a same-type requirement in a protocol \texttt{where} clause.
\end{itemize}
Chapter~\ref{symbols terms rules} shows that by taking a suitable alphabet and set of relations, a finitely-presented monoid can describe Swift's generics system in all its glory: multiple generic parameters, conformance requirements to multiple protocols, different kinds of associated type requirements inside protocols, superclass requirements, and so on. In this section we showed that the undecidability of the word problem prevents us from accepting \emph{all} syntactically well-formed generic signatures and protocol declarations. We must be able to compute the reduced type equality relation in those generic signatures we \emph{do} accept, and most reasonable generic signatures fit within the restrictions of our model.
\section{Rewrite Systems}\label{rewritesystemintro}
As the \index{word problem}word problem is \index{undecidable problem}undecidable, there is no general algorithm that can find a rewrite path between a pair of terms in an arbitrary finitely-presented monoid. An intuitive way to think of this is that since the relations $(u, v)$ are bidirectional equivalences, a rewrite path might have to go ``up'' and ``down'' and make use of either or both rewrite steps $x(u\Rightarrow v)y$ and $x(v\Rightarrow u)y$ for a given relation $(u, v)\in R$; the intermediate terms visited along the way might be arbitrarily long:
\begin{quote}
\begin{tikzcd}
s\arrow[rd, Rightarrow]&&\cdots\arrow[rd, Rightarrow]&&t\\
&\cdots\arrow[ru, Rightarrow]&&\cdots\arrow[ru, Rightarrow]
\end{tikzcd}
\end{quote}
Now, instead of relations $(u,v)$, we might consider unidirectional \IndexDefinition{rewrite rule}\emph{rewrite rules} $u\Rightarrow v$. A rewrite rule allows us to replace the left-hand side $u$ with the right-hand side $v$ when $u$ appears as a subterm of another term $xuy$, but not vice versa; that is, we can form the rewrite step $x(u\Rightarrow v)y$ but not $x(v\Rightarrow u)y$. With this restriction, we have the inkling of a computable algorithm.
\begin{algorithm}[Term reduction]\label{term reduction algo}
As input, takes a term $t\in A^*$ and a list of rewrite rules $R:=\{(u_1,\,v_1),\,\ldots,\,(u_n,\,v_n)\}$, and outputs a rewrite path $p$.
\begin{enumerate}
\item Set $p\leftarrow 1_t$.
\item If $t=xu_iy$ for some $x$, $y\in A^*$ and $(u_i,\,v_i)\in R$, update $p\leftarrow p\circ x(u_i\Rightarrow v_i)y$ and $t\leftarrow xv_iy$. Go back to Step~2.
\item Otherwise, return $p$.
\end{enumerate}
\end{algorithm}
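The algorithm is easy to sketch in Python (an illustration only, assuming the rules are already oriented so that reduction terminates):

```python
def reduce_term(t, rules):
    """Term reduction: try each rule in order, rewriting the leftmost
    occurrence of the first left-hand side that matches, and repeat until
    no rule applies. Termination is only guaranteed if the rules are
    oriented with respect to a reduction order."""
    path = []
    while True:
        for u, v in rules:
            i = t.find(u)
            if i != -1:
                # the rewrite step x(u => v)y, recorded as (x, u, v, y)
                path.append((t[:i], u, v, t[i + len(u):]))
                t = t[:i] + v + t[i + len(u):]
                break
        else:
            return t, path   # no left-hand side occurs: t is reduced

# The bicyclic monoid's relation, oriented as the rule ab => (empty):
term, path = reduce_term("baaba", [("ab", "")])
print(term)   # both baaba and babaa reduce to baa
```

The returned list of steps is exactly the rewrite path $p$ in the algorithm's statement, with each tuple naming the whiskering context and the rule applied.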
The algorithm outputs a rewrite path $p$ where $\Src(p)=t$, and $\Dst(p)$ is some term $t^\prime$ that does not contain the left-hand side of any rewrite rule as a subterm. We say that $t^\prime$ is a \IndexDefinition{reduced term}\index{irreducible term|see{reduced term}}\emph{reduced term}, and that $t$ \emph{reduces} to $t^\prime$. Suppose we wish to find a rewrite path from $s$ to $t$. We first \emph{orient} each relation $(u, v)$ to get a rewrite rule $u\Rightarrow v$ or $v\Rightarrow u$. Then we apply Algorithm~\ref{term reduction algo} to $s$ and $t$ to get a pair of rewrite paths $p_s$ and $p_t$, where $\Src(p_s)=s$, $\Src(p_t)=t$, and both $\Dst(p_s)$ and $\Dst(p_t)$ are reduced terms. If it so happens that $\Dst(p_s)=\Dst(p_t)$, then we know that $s\sim t$, and that $p:= p_s \circ p_t^{-1}$ is actually a rewrite path from $s$ to $t$:
\begin{quote}
\begin{tikzcd}
s\arrow[rd, Rightarrow]&&&&t\arrow[ld, Leftarrow]\\
&\cdots\arrow[rd, Rightarrow]&&\cdots\arrow[ld, Leftarrow]\\
&&s^\prime = t^\prime
\end{tikzcd}
\end{quote}
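This ``reduce both sides and compare'' strategy can be expressed directly in Python (again an illustration only; as discussed below, a mismatch of reduced terms is conclusive only when the rewrite rules are suitably well-behaved):

```python
def reduce_term(t, rules):
    """Rewrite until no rule's left-hand side occurs in t."""
    while True:
        for u, v in rules:
            i = t.find(u)
            if i != -1:
                t = t[:i] + v + t[i + len(u):]
                break
        else:
            return t

def same_class(s, t, rules):
    """Compare the reduced forms of s and t. A match proves s ~ t; a
    mismatch disproves it only for a sufficiently well-behaved rule set."""
    return reduce_term(s, rules) == reduce_term(t, rules)

# With the bicyclic monoid's relation oriented as ab => (empty):
print(same_class("baaba", "babaa", [("ab", "")]))   # True
```

Unlike the breadth-first search over bidirectional relations, this procedure always terminates when the rules are oriented by a reduction order, because each rewrite strictly decreases the term.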
This looks like an appealing solution to the word problem, but we must overcome three difficulties before it can work:
\begin{enumerate}
\item Our algorithm does not actually guarantee termination as written. For example, consider these rules:
\begin{gather*}
ab\Rightarrow bc\\
ba\Rightarrow ab\\
c\Rightarrow a
\end{gather*}
Now, an attempt to reduce the term $ab$ gets stuck:
\[ab\rightarrow bc\rightarrow ba\rightarrow ab\rightarrow\ldots\]
\item Our algorithm is under-specified; there may be more than one rewrite rule that can apply at any given step. Typically, term reduction reduces the left-most possible subterm. However, without further conditions, the irreducible term we get may depend on these choices.
\item Indeed, if $s$ reduces to $s^\prime$, $t$ reduces to $t^\prime$, and $s^\prime=t^\prime$, then it is true that $s\sim t$. Unfortunately, \emph{a~priori} there is no reason for the converse to hold. It can happen that $s^\prime\neq t^\prime$ where both $s^\prime$ and $t^\prime$ are reduced, and yet, $s^\prime\sim t^\prime$.
\end{enumerate}
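The second and third difficulties are easy to witness concretely. In the following Python sketch, the two rules are hypothetical, chosen only for illustration; the term $aba$ reduces to two distinct reduced terms depending on which applicable rule we pick first:

```python
def reduce_with_strategy(t, rules, pick):
    """Reduce t to a reduced term, using `pick` to choose among all
    applicable (position, lhs, rhs) matches at each step."""
    while True:
        matches = [(i, u, v)
                   for u, v in rules
                   for i in range(len(t) - len(u) + 1)
                   if t[i:i + len(u)] == u]
        if not matches:
            return t   # no left-hand side occurs: t is reduced
        i, u, v = pick(matches)
        t = t[:i] + v + t[i + len(u):]

# Two hypothetical rules, orientable by length but not well-behaved:
rules = [("ab", "a"), ("ba", "b")]
print(reduce_with_strategy("aba", rules, min))   # leftmost match first: "aa"
print(reduce_with_strategy("aba", rules, max))   # rightmost match first: "a"
```

Both \texttt{"aa"} and \texttt{"a"} are reduced terms, yet $aba$ rewrites to each of them, so here two equivalent terms have distinct reduced forms, and comparing reduced forms would wrongly report $aa\not\sim a$.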
We wish to construct sufficiently ``well-behaved'' rewrite rules from the relations of a \index{finitely-presented monoid}finitely-presented monoid, in a manner that avoids these difficulties. When we succeed---and we know this is not possible in every case---we can solve the word problem using Algorithm~\ref{term reduction algo}.
What we're looking for here is the theory of \index{string rewrite system}\index{string rewrite system!z@\igobble|seealso{rewrite system}}\emph{string rewriting}; our rewrite rules operate on ``flat'' sequences of symbols from a finite alphabet. More general rewrite systems can also be considered; \index{term rewrite system}\index{term rewrite system!z@\igobble|seealso{rewrite system}}\emph{term rewriting} is the study of techniques for simplifying ``tree-like'' expressions, of which strings are a special case. An in-depth treatment of string rewriting can be found in \cite{book2012string}, while the more general theory of term rewriting is discussed in \cite{andallthat} and \cite{formalmans6}. (All of our terms are strings in this setting, but we avoid calling them strings, to prevent confusion with the ``string'' data type in programming.)
\paragraph{Termination.} We will now consider each of our three ``problems'' in turn. To convert a relation $u\sim v$ into a rewrite rule, we need to pick one of the two orientations $u\Rightarrow v$ or $v\Rightarrow u$. The choice of orientation is not completely arbitrary; we already saw an example where term reduction gets stuck in a loop. Another possibility is that term reduction produces an infinite sequence of terms that never repeats; for example, with the single rewrite rule $a\Rightarrow aa$, attempting to reduce $a$ produces the following:
\[a\rightarrow aa\rightarrow aaa\rightarrow aaaa\rightarrow\ldots\]
To rule out both possibilities, we choose a suitable partial order, and orient each rewrite rule in such a way that term reduction produces a smaller term at each step. We can formalize this as follows.
\begin{definition}\label{reduction order def}
If $A$ is some alphabet, a \emph{reduction order} $<$ on the free monoid $A^*$ is a \index{translation-invariant relation}translation-invariant, well-founded, partial order. A \IndexDefinition{rewrite system}\emph{rewrite system} is a finitely-presented monoid $\AR$ together with a reduction order $<$, where the relations are oriented with respect to this order, meaning that $v<u$ for all $(u,v)\in R$. Note that if our reduction order is not a \index{linear order}linear order, it may happen that we have a \IndexDefinition{non-orientable relation}\emph{non-orientable} relation $u\sim v\in R$ where $u\not< v$ and $v\not< u$. We cannot build a rewrite system in this case.
\end{definition}
We defined well-founded orders in Section~\ref{reducedtypes}, where we saw that the existence of a reduced type, as the smallest element of an equivalence class of type parameters, was contingent upon the type parameter order being a well-founded order. We're going to re-state this definition here.
\begin{definition}
A \index{partial order}partial order $<$ over a set $S$ is \index{well-founded order}\emph{well-founded} if $S$ does not have an infinite descending chain; that is, there does not exist an infinite sequence of elements $x_i\in S$ such that:
\[\ldots <x_n<\ldots <x_3<x_2<x_1\]
\end{definition}
Translation invariance ensures that as long as each rewrite rule is oriented, term reduction produces a smaller term at each step, because if $u<v$, then it is also true that $xuy<xvy$. Well-foundedness ensures that since the term becomes smaller at each step, the term reduction must \IndexDefinition{terminating reduction relation}terminate after a finite number of steps.
\begin{definition}
The \index{term length}\emph{length} of a term $x\in A^*$ is the number of elements of $A$ appearing in $x$, counting duplicates. (Once again, this is defined on the terms of a free monoid, not the equivalence classes of a finitely-presented monoid.) For example, in $\{a,b,c\}^*$, $|\varepsilon|=0$, $|c|=1$, $|ab|=2$, $|aaa|=3$, and so on.
\end{definition}
Perhaps the simplest possible reduction order compares term length, so $x<y$ if $|x|<|y|$. Under the length order, length-preserving relations such as $(ab, ba)$ are non-orientable and cannot be considered.
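Orienting a set of relations by the length order can be sketched as follows (illustrative Python; the helper name is ours). Relations whose two sides have equal length are flagged as non-orientable:

```python
def orient_by_length(relations):
    """Orient each relation so the longer side becomes a rule's left-hand
    side. Equal-length relations are non-orientable under the length order."""
    rules, non_orientable = [], []
    for u, v in relations:
        if len(u) > len(v):
            rules.append((u, v))
        elif len(v) > len(u):
            rules.append((v, u))
        else:
            non_orientable.append((u, v))
    return rules, non_orientable

# The relations cc ~ c, ca ~ a, a ~ ba all orient cleanly (the third one
# flips), while a length-preserving relation like ab ~ ba does not:
print(orient_by_length([("cc", "c"), ("ca", "a"), ("a", "ba"), ("ab", "ba")]))
```

A non-empty second component means the length order is insufficient for this presentation, and a finer reduction order would be needed to build a rewrite system.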
\begin{example} Consider the following monoid presentation:
\[\langle a,\,b,\,c;\,cc\sim c,\,ca\sim a,\,a\sim ba\rangle\]
Using the length order as our reduction order, we're going to show that $acca\sim bcaa$ by constructing a rewrite path from $acca$ to $bcaa$. Each of our relations above equates a term of length 1 and length 2, so our reduction order gives us the following rewrite rules:
\begin{align*}
cc&\Rightarrow c\\
ca&\Rightarrow a\\
ba&\Rightarrow a
\end{align*}
First, we reduce the term $acca$ using Algorithm~\ref{term reduction algo}. We apply the first rewrite rule to replace the subterm $cc$ with $c$, leaving us with the term $aca$. This transformation is described by the rewrite step $a(cc\Rightarrow c)a$. Then, we apply the second rewrite rule to replace the subterm $ca$ with $a$, leaving us with $aa$. This is described by the rewrite step $a(ca\Rightarrow a)$. The term $aa$ cannot be reduced any further using our rewrite rules. We now have a rewrite path from $acca$ to $aa$, which we will call $p_1$:
\[p_1 := a(cc\Rightarrow c)a\circ a(ca\Rightarrow a)\]
Or in the form of a diagram:
\begin{quote}
\begin{tikzcd}[column sep=large]
acca\arrow[r, Rightarrow, "a(cc\Rightarrow c)a"]&aca\arrow[r, Rightarrow, "a(ca\Rightarrow a)"]&aa