# FLC - lesson 03 ##### Luca Oddone Breveglieri ###### 8 october 2015 ## Regular Expressions __Regular Languages__ are the simplest family of formal languages. Can be defined in different ways: - algebrically - through generating grammars - through recognizer machines __Regular Expression__ (*regexp*) is a string $r$ defined with the terminal alphabet $\Sigma$ containing some *metasymbols* ##### Possible cases: $r,s,t$ are regexp 1. $r=\varnothing$ 2. $r=a, \;\;a\in \Sigma$ 3. $r=s\cup t\;\; \text{or}\;\; r=s|t$ 4. $r=s.t\;\; \text{or}\;\; r=st$ 5. $r=s^*$ ##### Precedence of operators 1. star * 2. concatenation . 3. union $\cup$ or | ##### Derived operators - __Power__ $e^h=ee...e_{n-times}$ - __Repetition__ from k to n>k times $[a]_k^n=a^k\cup a^{k+1}\cup ...\cup a^n$ - __Optionality__ $[a]=(\varepsilon\cup a)$ - __Interval__ of an ordered set $(0...9)(a...z)(A...Z)$ By admitting the use of set-theoretic operations __Intersection__, __Complement__ and __Difference__ we obtain the so called __Extended regular expressions__. ##### Regular languages A language is __regular__ if it is generated by a *regular expression* regexp|language ---|--- $\varepsilon \;\text{or}\;\varnothing$|$\{\varepsilon\}$ $a\in\Sigma$|$\{a\}$ $s \cup t \; \text{or} \; s \mid t$ | $L_s \cup L_t$ $s.t \; \text{or} \; st$ | $L_s.L_t$ $s^*$|${L_s}^*$ ##### example >The regexp $e$ that generates the language of integers, possibly signed $e={(+\cup - \cup\varepsilon)dd}^*$ $L_e=\{+,-,\varepsilon\}\{d\}\{d\}^*$ #### Sets of languages __Family of regular languages__ or __REG__: Is the collection of all regular languages. __Family of finite languages__ or __FIN__: Is the collection of all languages of finite cardinality. Every __finite language__ is *regular* as it is the union of a finite number of finite strings. But family of __regular languages__ contains also languages of *infinite cardinality*. $$FIN\subset REG$$ #### Subexpression of a regexp 1. Consider a *fully parenthesized* regexp $$ e = (a\cup (bb))^*(c^+ \cup (a \cup (bb)))$$ 2. Number every alphabetic symbol that occurs in the regexp $$ e_N = (a_1\cup (b_2b_3))^*(c_4^+ \cup (a_5 \cup (b_6b_7)))$$ 3. Isolate the subexpressions and put them into evidence $$ (a_1\cup (b_2b_3))^*\;\;\;\;c_4^+ \cup (a_5 \cup (b_6b_7))$$ $$ a_1\cup (b_2b_3)\;\;\;\;\;\;c_4^+\;\;\;\;a_5 \cup (b_6b_7)$$ $$ a_1\;\;\;\;b_2b_3\;\;\;\;\;\;c_4^\;\;\;\;a_5 \;\;(b_6b_7)$$ $$ a_1\;\;\;\;b_2\;\;\;b_3\;\;\;\;\;\;c_4^\;\;\;\;a_5 \;\;\;b_6\;\;\;b_7$$ < #### Different parenthesization #### Derivation #### Ambiguity #### Closure of the $REG$ family > A family of languages is said to be closed with respect to an operator $\theta$ if the result language belongs to the same family as the operand languages. The $REG$ family is closed with respect to: - concatenation - union - star therefore it is closed also with respect to derived operators - cross - repetition - optionality Regular languages are also closed with respect to __mirroring__ This means that combining two regular languages with one of the said operators, we obtain another regular language. #####The family $REG$ of the regular languages is the smallest family of languages such that both the following properties hold: - $REG$ contains all finite languages - $REG$ is closet w.r.t conatenation, union, and star. The $LIB$ family of the context-free languages is closed w.r.t. concatenation, union and star but is not the smallest of such family, in fact: $$REG \subset LIB$$