uni_notes/00 Inbox/29593852 - Strings.md
2026-04-08 10:11:50 +02:00


---
created: 2026-04-08 08:52
course: "[[29593850 - Automationtheory]]"
topic: "#languages #string #character #kleene #regularExpressions"
related: "[[29593850 - Automationtheory]]"
type: lecture
status: 🔴
tags:
- university
---
## 📌 Summary
> [!abstract]
> Overview of lecture 1 on `Wednesday, 2026/Apr/08`
---
## 📝 Content
### Alphabets
An alphabet is a non-empty, finite set of characters (also called _letters_ or _symbols_). It is denoted by $Sigma$.
$Sigma = {a, b}$
> Alphabet $Sigma$ contains the characters $a$ and $b$.
$Sigma = {a, ..., z, A, ..., Z, 0, ..., 9}$
> usual alphabet for writing text
### Strings
A word (or _string_) is a finite sequence $w = a_1 a_2 ... a_n$ of characters from $Sigma$.
> [!CONVENTION]
> We will use small letters to describe strings that are part of a language.
> [!EXAMPLE]
> $"aa", "ab", "bba"$ and $"baab"$ are strings over $Sigma = {a, b}$.
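
The definition can be checked mechanically. The following is a minimal sketch, assuming characters are modelled as one-character Python strings and an alphabet as a Python `set`; the helper name `is_string_over` is hypothetical and not part of the lecture.

```python
def is_string_over(w, sigma):
    """Return True if every character of w belongs to the alphabet sigma."""
    return set(w) <= set(sigma)


sigma = {"a", "b"}

# The strings from the example are all strings over {a, b}.
for w in ["aa", "ab", "bba", "baab"]:
    assert is_string_over(w, sigma)

# "abc" uses the character c, which is not in the alphabet.
assert not is_string_over("abc", sigma)
```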
#### Length of a string
The _length_ $abs(x)$ of a string $x = a_1 ... a_n$ is the number of characters it contains: $abs(x) = n$.
#### Empty String
The empty string is denoted by $epsilon$. It is the neutral element of concatenation.
-> $abs(epsilon) = 0$
### Concatenation
Strings can be concatenated, that is, one string is appended to another.
For strings $x = a_1 ... a_n$ and $y = b_1 ... b_m$ over alphabets $Sigma_x$ and $Sigma_y$, their _concatenation_ over the alphabet $Sigma = Sigma_x union Sigma_y$ is the string
$$x circle.small y = x y = a_1 a_2 ... a_n b_1 b_2 ... b_m$$
> This string is of the length $abs(x y) = n + m$
> [!EXAMPLE]
> $x = "apple"$
> $y = "pie"$
> $x circle.small y = "applepie"$
The order of evaluation (the placement of brackets) does _not matter_: concatenation is associative. It is **not** commutative, however; in general $x y eq.not y x$.
> $(x circle.small y) circle.small z = x circle.small (y circle.small z)$
Any string concatenated with the empty string $epsilon$ will result in itself.
> $x circle.small epsilon = x = epsilon circle.small x$
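
These properties can be tried out directly. The snippet below is a small sketch, assuming strings are modelled as Python `str` values and concatenation as the `+` operator; the third string `z = "chart"` is an arbitrary value chosen for illustration.

```python
x = "apple"
y = "pie"
z = "chart"   # arbitrary third string, only used for the associativity check
epsilon = ""  # the empty string

# Concatenation appends y to x, and the lengths add up.
xy = x + y
assert xy == "applepie"
assert len(xy) == len(x) + len(y)

# Associative: the placement of brackets does not matter.
assert (x + y) + z == x + (y + z)

# Not commutative: swapping the operands gives a different string here.
assert x + y != y + x

# The empty string is the neutral element and has length 0.
assert x + epsilon == x == epsilon + x
assert len(epsilon) == 0
```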
### Exponentiation
The $n^"th"$ power $x^n$ of a string $x$ is the concatenation of $n$ copies of $x$. It is defined recursively:
> $x^0 := epsilon$
> $x^n := x^(n-1) circle.small x$ for $n in NN$
> [!Example]
> $x^4 = x x x x$
> $(a b)^3 = a b a b a b$
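
The recursive definition translates almost literally into code. This is a sketch under the assumption that strings are Python `str` values; the function name `power` is hypothetical.

```python
def power(x, n):
    """Return x^n, following x^0 = epsilon and x^n = x^(n-1) . x."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    if n == 0:
        return ""  # epsilon
    return power(x, n - 1) + x


assert power("ab", 3) == "ababab"
assert power("x", 4) == "xxxx"
assert power("ab", 0) == ""

# Python's built-in repetition operator computes the same string.
assert power("ab", 3) == "ab" * 3
```

The length behaves as expected: `len(power(x, n))` equals `n * len(x)`.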
### Reversing / Mirroring
For a string $x = a_1 a_2 ... a_(n-1) a_n$ of length $n$, its _mirrored string_ is given by
$$ x^("Rev") = a_n a_(n-1)...a_2 a_1$$
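
As a quick illustration, mirroring can be sketched with Python's slice notation. The function name `reverse` is hypothetical.

```python
def reverse(x):
    """Return the mirrored string a_n ... a_1 of x = a_1 ... a_n."""
    return x[::-1]


assert reverse("bba") == "abb"
assert reverse("baab") == "baab"  # this string reads the same in both directions
assert reverse("") == ""          # the empty string is its own mirror image

# Mirroring twice gives back the original string.
assert reverse(reverse("applepie")) == "applepie"
```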
### Substrings
A string $x$ is a _substring_ of a string $y$ if $y = u x v$, where $u$ and $v$ can be arbitrary strings.
- If $u = epsilon$ then $x$ is a _prefix_ of $y$.
- If $v = epsilon$ then $x$ is a _suffix_ of $y$.
For strings $x$ and $y$, the quantity $abs(y)_x$ is the number of occurrences of $x$ as a substring of $y$.
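
The three notions and the occurrence count can be sketched as follows. All function names are hypothetical. One assumption is made loudly: `occurrences` counts every starting position at which $x$ occurs, so overlapping occurrences are counted separately; the notes do not state how overlaps are treated.

```python
def is_substring(x, y):
    """True if y = u x v for some strings u and v."""
    return x in y


def is_prefix(x, y):
    """True if y = x v for some string v (u is the empty string)."""
    return y.startswith(x)


def is_suffix(x, y):
    """True if y = u x for some string u (v is the empty string)."""
    return y.endswith(x)


def occurrences(x, y):
    """Count the start positions i with y[i:i+len(x)] == x.

    Assumption: overlapping occurrences are counted separately.
    """
    return sum(1 for i in range(len(y) - len(x) + 1) if y[i:i + len(x)] == x)


assert is_substring("lep", "applepie")
assert is_prefix("apple", "applepie")
assert is_suffix("pie", "applepie")
assert not is_prefix("pie", "applepie")

assert occurrences("ab", "abab") == 2
assert occurrences("aa", "aaa") == 2  # positions 0 and 1 overlap
assert occurrences("c", "abab") == 0
```

Note that Python's built-in `str.count` counts non-overlapping occurrences only, so `"aaa".count("aa")` is 1 rather than 2.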
### Kleene Star
The _Kleene star_ (or _Kleene operator_ or _Kleene closure_) of an alphabet $Sigma$ is denoted by $Sigma^*$. It yields the infinite set of strings made up of the characters of $Sigma$.
$Sigma^*$ is the set of all strings that can be generated by arbitrary concatenation of characters from $Sigma$.
> $Sigma^* := union.big_(n>=0) A_n$
> where $A_n$ is the set of all strings of length $n$ over $Sigma$
#### Remarks
- The same character can be used multiple times.
- The empty string $epsilon$ is also part of $Sigma^*$.
> [!Example]
> For $Sigma = {a, b}$: $Sigma^* = {epsilon, a, b, "aa", "ab", "ba", "bb", "aaa", "aab", ...}$
> [!FACT]
> - The set $Sigma^*$ is infinite, since we defined $Sigma$ to be non-empty.
> - It is _countable_ and has the same cardinality as the set $NN$ of natural numbers
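
Since $Sigma^*$ is infinite, it cannot be stored as a whole, but it can be enumerated. The generator below is a sketch that walks through $A_0, A_1, A_2, ...$ in turn, assuming characters are one-character Python strings; the name `kleene_star` is hypothetical. Every string appears at some finite position of this enumeration, which is one way to see why the set is countable.

```python
from itertools import count, islice, product


def kleene_star(sigma):
    """Yield every string over sigma, shortest first (A_0, then A_1, ...)."""
    letters = sorted(sigma)
    for n in count(0):
        # product(..., repeat=n) yields all n-tuples of letters, i.e. A_n.
        for chars in product(letters, repeat=n):
            yield "".join(chars)


# Take only the first nine strings of the infinite enumeration.
first = list(islice(kleene_star({"a", "b"}), 9))
print(first)
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', 'aab']
```

The empty Python string `''` at the front plays the role of $epsilon$.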
#### Kleene Plus
The _Kleene Plus_ of an alphabet $Sigma$ is the set of all non-empty strings over $Sigma$: $Sigma^+ = Sigma^* backslash {epsilon}$.
#### Lemma group structure
_Lemma:_ The structure $(Sigma^*, circle.small)$ induced by the Kleene star is a monoid, that is, a semigroup with a neutral element.
> [!PROOF]
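> - Associativity of $circle.small$ has been shown (see Concatenation).
> - Existence of a neutral element has been shown: $epsilon in Sigma^*$ satisfies $x circle.small epsilon = x = epsilon circle.small x$.
> - Closure under $circle.small$: Let $x in Sigma^*$ and $y in Sigma^*$ be two strings over the alphabet $Sigma$. Then $x circle.small y = x y in Sigma^*$, since $x y$ is again a finite sequence of characters from $Sigma$.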
### Formal Languages
A formal _language_ over the alphabet $Sigma$ is a subset $L$ of $Sigma^*$.
### Finite representation of languages
**Goal:** Represent a language using _finite_ information
#### Using set notation
$S = {a^n b^m | n, m >= 0} = {epsilon, a, b, "aa", "ab", ...}$
> This is very inefficient.
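
Although $S$ is infinite, the set-builder condition is finite information: it can be turned into a membership test. The following is a sketch, assuming strings are Python `str` values; the function name `in_S` is hypothetical.

```python
def in_S(w):
    """True if w = a^n b^m for some n, m >= 0."""
    n = len(w) - len(w.lstrip("a"))  # length of the leading block of a's
    rest = w[n:]                     # everything after that block
    return all(c == "b" for c in rest)


assert in_S("")        # n = m = 0 gives epsilon
assert in_S("a")
assert in_S("b")
assert in_S("aabbb")
assert not in_S("ba")  # an a after a b is not allowed
assert not in_S("abc") # c is not a character of the alphabet
```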
#### Using regular expressions
A _regular expression_ $r$ over an alphabet $Sigma$ is defined recursively:
- $emptyset$, $epsilon$ and each $a in Sigma$ are regular expressions, which represent the languages $L(emptyset) = emptyset$, $L(epsilon) = {epsilon}$ and $L(a) = {a}$.
- If $r$ and $s$ are regular expressions then
- $(r+s)$
<mark style="background: #FF5582A6;"> - !! Complete from lecture notes</mark>
### Regular languages
A language $L$ that can be described by a regular expression $r$ (i.e. $L(r) = L$) is called _regular_.
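
As an illustration, the language $S = {a^n b^m | n, m >= 0}$ from above can be tested with Python's `re` module. This is only a sketch under stated assumptions: it relies on the standard concatenation and star operators of regular expressions, which the recursive definition in these notes does not list yet, and the names `PATTERN` and `in_L` are hypothetical. Python's pattern syntax also differs from the formal notation: union is written `|` rather than `+`, and the module offers extensions such as backreferences that go beyond regular languages.

```python
import re

# Python pattern standing in for a formal regular expression that
# describes S = {a^n b^m | n, m >= 0}: any number of a's, then any number of b's.
PATTERN = re.compile(r"a*b*")


def in_L(w):
    """True if the whole string w matches the pattern."""
    return PATTERN.fullmatch(w) is not None


assert in_L("")
assert in_L("aabbb")
assert in_L("bbb")
assert not in_L("ba")
assert not in_L("abab")
```

`fullmatch` is used instead of `match` or `search` because membership in $L(r)$ concerns the whole string, not just some part of it.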