---
created: 2026-04-08 08:52
course: "[[29593850 - Automationtheory]]"
topic: "#languages #string #character #kleene #regularExpressions"
related: "[[29593850 - Automationtheory]]"
type: lecture
status: 🔴
tags:
  - university
---

## 📌 Summary

> [!abstract]
> Overview of lecture 1 on `Wednesday, 2026/Apr/08`

---

## 📝 Content

### Alphabets

An alphabet is a non-empty, finite set of characters (or _letters_ or _symbols_). It is denoted by $Sigma$.

$Sigma = {a, b}$
> Alphabet $Sigma$ contains the characters $a$ and $b$.

$Sigma = {a, ..., z, A, ..., Z, 0, ..., 9}$
> The usual alphabet for writing text.

### Strings

A word (or _string_) is a finite sequence $w = a_1 a_2 ... a_n$ of characters from $Sigma$.

> [!CONVENTION]
> We will use lowercase letters to describe strings that are part of a language.

> [!EXAMPLE]
> $"aa", "ab", "bba"$ and $"baab"$ are strings over $Sigma = {a, b}$.

#### Length of a string

The _length_ $abs(x)$ of a string $x = a_1 ... a_n$ is the number of characters it contains, so $abs(x) = n$.

#### Empty String

The empty string is denoted by $epsilon$. It is the neutral element of concatenation.
-> $abs(epsilon) = 0$

### Concatenation

Strings can be concatenated, which means that one string is appended to another.

For strings $x = a_1 ... a_n$ and $y = b_1 ... b_m$ over alphabets $Sigma_x$ and $Sigma_y$, their _concatenation_ over the alphabet $Sigma = Sigma_x union Sigma_y$ is the string

$$x circle.small y = x y = a_1 a_2 ... a_n b_1 b_2 ... b_m$$
> This string has length $abs(x y) = n + m$.

> [!EXAMPLE]
> $x = "apple"$
> $y = "pie"$
> $x circle.small y = "applepie"$

Brackets do _not matter_: concatenation is associative, but it is **not** commutative (in general $x y eq.not y x$).
> $(x circle.small y) circle.small z = x circle.small (y circle.small z)$

Concatenating any string with the empty string $epsilon$ yields the string itself.
> $x circle.small epsilon = x = epsilon circle.small x$

### Exponentiation

The $n^"th"$ power $x^n$ of a string $x$ is the concatenation of $n$ copies of $x$.
> $x^0 := epsilon$
> $x^n := x^(n-1) circle.small x$ for $n in NN$, $n >= 1$

> [!Example]
> $x^4 = x x x x$
> $(a b)^3 = a b a b a b$

### Reversing / Mirroring

For a string $x = a_1 a_2 ... a_(n-1) a_n$ of length $n$, its _mirrored string_ is given by

$$ x^("Rev") = a_n a_(n-1)...a_2 a_1$$

### Substrings

A string $x$ is a _substring_ of a string $y$ if $y = u x v$, where $u$ and $v$ can be arbitrary strings.
- If $u = epsilon$ then $x$ is a _prefix_ of $y$.
- If $v = epsilon$ then $x$ is a _suffix_ of $y$.

For strings $x$ and $y$, the quantity $abs(y)_x$ is the number of occurrences of $x$ as a substring of $y$.

### Kleene Star

The Kleene star (or _Kleene operator_ or _Kleene closure_) of an alphabet $Sigma$ is denoted by $Sigma^*$. It is the set of all strings that can be generated by concatenating arbitrarily many characters of $Sigma$, which makes it an infinite set.
> $Sigma^* := union.big_(n>=0) A_n$
> where $A_n$ is the set of all strings of length $n$ over $Sigma$

#### Remarks

- The same character can be used multiple times.
- The empty string $epsilon$ is also part of $Sigma^*$.

> [!Example]
> ${a, b}^* = {epsilon, a, b, "aa", "ab", "ba", "bb", "aaa", "aab", ...}$

> [!FACT]
> - The set $Sigma^*$ is infinite, since we defined $Sigma$ to be non-empty.
> - It is _countable_ and has the same cardinality as the set $NN$ of natural numbers.

#### Kleene Plus

The _Kleene Plus_ of an alphabet $Sigma$ is given by $Sigma^+ = Sigma^* backslash {epsilon}$

#### Lemma group structure

The structure $(Sigma^*, circle.small)$ induced by the Kleene star is a monoid, that is, a semigroup with a neutral element.

> [!PROOF]
> - Associativity of $circle.small$ has been shown in the section on concatenation.
> - Existence of a neutral element ($epsilon$) has been shown there as well.
> - Closure under $circle.small$: Let $x in Sigma^*$ and $y in Sigma^*$ be two strings over the alphabet $Sigma$. Then $x circle.small y = x y in Sigma^*$.

### Formal Languages

A formal _language_ over the alphabet $Sigma$ is a subset $L$ of $Sigma^*$.

### Finite representation of languages

**Goal:** Represent a language using _finite_ information

#### Using set notation

$S = {a^n b^m bar n, m >= 0} = {epsilon, a, b, "aa", "ab", ...}$
> This is very inefficient.

#### Using regular expressions

A _regular expression_ $r$ over an alphabet $Sigma$ is defined recursively:
- $emptyset, epsilon$ and each $a in Sigma$ are regular expressions, which represent the languages $L(emptyset) = emptyset, L(epsilon) = {epsilon}$ and $L(a) = {a}$
- If $r$ and $s$ are regular expressions then
	- $(r+s)$
	- !! Complete from lecture notes

### Regular languages

A language $L$ that can be described by a regular expression $r$ (i.e. $L(r) = L$) is called _regular_.
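
To make the link between $Sigma^*$, set notation and regular expressions more concrete, here is a minimal Python sketch. It enumerates the finite part of $Sigma^*$ up to a given length for $Sigma = {a, b}$ and keeps only the strings that belong to $S = {a^n b^m bar n, m >= 0}$. The names `SIGMA`, `strings_up_to` and `in_language` are made up for this example. The sketch also assumes the usual star operator for regular expressions, which the recursive definition above does not spell out yet, and it uses the syntax of Python's `re` module rather than the formal notation of the lecture.

```python
import re
from itertools import product

SIGMA = ("a", "b")  # the example alphabet from the notes


def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        # product(..., repeat=n) enumerates A_n, the strings of length n;
        # for n = 0 it yields one empty tuple, i.e. the empty string epsilon.
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# Assumption: the pattern a*b* (Python `re` syntax) is meant to describe
# the language S = {a^n b^m | n, m >= 0}.
PATTERN = re.compile(r"a*b*")


def in_language(word):
    """Return True if the whole word matches the pattern."""
    return PATTERN.fullmatch(word) is not None


members = [w for w in strings_up_to(SIGMA, 2) if in_language(w)]
print(members)  # ['', 'a', 'b', 'aa', 'ab', 'bb']
```

The empty Python string `''` plays the role of $epsilon$ here. The string `"ba"` is in $Sigma^*$ but is filtered out, because no $b$ may come before an $a$ in $S$. Only a finite slice of $Sigma^*$ can ever be listed this way, which is why a finite description such as a regular expression is useful.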