|
@@ -7,32 +7,25 @@
|
|
|
__Regular Languages__ are the simplest family of formal languages.
|
|
|
Can be defined in different ways:
|
|
|
- algebrically
|
|
|
-- through generating grammars
|
|
|
-- through recognizer machines
|
|
|
+- through __generating grammars__
|
|
|
+- through __recognizer machines__
|
|
|
|
|
|
-__Regular Expression__ (*regexp*) is a string $r$ defined with the terminal alphabet $\Sigma$
|
|
|
-containing some *metasymbols*
|
|
|
-##### Possible cases:
|
|
|
+__Regular Expression__ or *regexp*
|
|
|
+is a string $r$ defined over the elements of a terminal alphabet $\Sigma$, using one of the following *metasymbols* $\;\cdot\;\cup\;\varnothing\;(\;)$
|
|
|
+#### regexp cases:
|
|
|
$r,s,t$ are regexp
|
|
|
1. $r=\varnothing$
|
|
|
2. $r=a, \;\;a\in \Sigma$
|
|
|
3. $r=s\cup t\;\; \text{or}\;\; r=s|t$
|
|
|
4. $r=s.t\;\; \text{or}\;\; r=st$
|
|
|
5. $r=s^*$ <!--*-->
|
|
|
-##### Precedence of operators
|
|
|
-1. star *
|
|
|
-2. concatenation .
|
|
|
+#### Precedence of operators
|
|
|
+1. star $*$
|
|
|
+2. concatenation $\;\cdot$
|
|
|
3. union $\cup$ or |
|
|
|
+<!--*-->
|
|
|
|
|
|
-##### Derived operators
|
|
|
-- __Power__ $e^h=ee...e_{n-times}$
|
|
|
-- __Repetition__ from k to n>k times $[a]_k^n=a^k\cup a^{k+1}\cup ...\cup a^n$
|
|
|
-- __Optionality__ $[a]=(\varepsilon\cup a)$
|
|
|
-- __Interval__ of an ordered set $(0...9)(a...z)(A...Z)$
|
|
|
-
|
|
|
-By admitting the use of set-theoretic operations __Intersection__, __Complement__ and __Difference__ we obtain the so called __Extended regular expressions__.
|
|
|
-
|
|
|
-##### Regular languages
|
|
|
+#### Regular languages
|
|
|
|
|
|
A language is __regular__ if it is generated by a *regular expression*
|
|
|
|
|
@@ -44,23 +37,27 @@ $s \cup t \; \text{or} \; s \mid t$ | $L_s \cup L_t$
|
|
|
$s.t \; \text{or} \; st$ | $L_s.L_t$
|
|
|
$s^*$|${L_s}^*$
|
|
|
|
|
|
-##### example
|
|
|
+##### examples
|
|
|
>The regexp $e$ that generates the language of integers, possibly signed
|
|
|
-$e={(+\cup - \cup\varepsilon)dd}^*$ $L_e=\{+,-,\varepsilon\}\{d\}\{d\}^*$
|
|
|
+$e={(+\cup - \cup\varepsilon)dd}^*\;\;\;L_e=\{+,-,\varepsilon\}\{d\}\{d\}^*$
|
|
|
+>The language over the binary alphabet $\{a,b\}$, such that the number of $a$ in every string is odd and every string contains at least one $b$
|
|
|
+$A_p=b^*({ab}^*{ab}^*)^*\;\;A_d=b^*{ab}^*({ab}^*{ab}^*)^*$
|
|
|
+$L=A_pbA_d | A_dbA_p$
|
|
|
+<!--*-->
|
|
|
|
|
|
-#### Sets of languages
|
|
|
-__Family of regular languages__ or __REG__:
|
|
|
-Is the collection of all regular languages.
|
|
|
+### Sets of languages
|
|
|
+__Family of regular languages__ or $REG$:
|
|
|
+Is the collection of all *regular languages*.
|
|
|
|
|
|
-__Family of finite languages__ or __FIN__:
|
|
|
-Is the collection of all languages of finite cardinality.
|
|
|
+__Family of finite languages__ or $FIN$:
|
|
|
+Is the collection of all *languages of finite cardinality*.
|
|
|
|
|
|
Every __finite language__ is *regular* as it is the union of a finite number of finite strings.
|
|
|
|
|
|
But family of __regular languages__ contains also languages of *infinite cardinality*.
|
|
|
$$FIN\subset REG$$
|
|
|
|
|
|
-#### Subexpression of a regexp
|
|
|
+### Subexpression of a regexp
|
|
|
1. Consider a *fully parenthesized* regexp
|
|
|
$$ e = (a\cup (bb))^*(c^+ \cup (a \cup (bb)))$$
|
|
|
2. Number every alphabetic symbol that occurs in the regexp
|
|
@@ -70,11 +67,67 @@ $$ (a_1\cup (b_2b_3))^*\;\;\;\;c_4^+ \cup (a_5 \cup (b_6b_7))$$
|
|
|
$$ a_1\cup (b_2b_3)\;\;\;\;\;\;c_4^+\;\;\;\;a_5 \cup (b_6b_7)$$
|
|
|
$$ a_1\;\;\;\;b_2b_3\;\;\;\;\;\;c_4^\;\;\;\;a_5 \;\;(b_6b_7)$$
|
|
|
$$ a_1\;\;\;\;b_2\;\;\;b_3\;\;\;\;\;\;c_4^\;\;\;\;a_5 \;\;\;b_6\;\;\;b_7$$
|
|
|
-<<!--*-->
|
|
|
-#### Different parenthesization
|
|
|
-#### Derivation
|
|
|
-#### Ambiguity
|
|
|
-#### Closure of the $REG$ family
|
|
|
+<!--*-->
|
|
|
+### Different parenthesization
|
|
|
+And hence different interpretations of a regexp
|
|
|
+$$f=(a\cup bb)^*(c^+\cup a\cup bb)$$
|
|
|
+$$f=(a\cup bb)^*(c^+\cup (a\cup bb))$$
|
|
|
+$$f=(a\cup bb)^*((c^+\cup a)\cup bb)$$
|
|
|
+<!--*-->
|
|
|
+Choosing one of the *interpretations*,
|
|
|
+I obtain a new regexp defining a smaller language that the original one.
|
|
|
+
|
|
|
+### Derivation
|
|
|
+Between two regexp $e'$ and $e''$:
|
|
|
+$$e'\Rightarrow e''$$
|
|
|
+If the two regexp $e'$ and $e''$ can be factored as
|
|
|
+$$e'=\alpha\beta\gamma \;\;\; e''=\alpha\delta\gamma$$
|
|
|
+With
|
|
|
+- $\beta$ subexpression of $e'$
|
|
|
+- $\gamma$ subexpression of $e''$
|
|
|
+- $\gamma$ is a *choice* of $\beta$
|
|
|
+##### examples
|
|
|
+- one step derivation
|
|
|
+$$a^*\cup b^+\Rightarrow a^*\;\;\;a^*\cup b^+\Rightarrow b^+$$
|
|
|
+- multiple derivation
|
|
|
+$$a^*\cup b^+\Rightarrow a^*\Rightarrow\varepsilon\;\;\;a^*\cup b^+\overset{2}{\Rightarrow} \varepsilon$$
|
|
|
+
|
|
|
+### Equivalence
|
|
|
+Two regexps are __equivalent__ if they define the same language
|
|
|
+
|
|
|
+The language defined by a *derived regexp* is contained in the language defined by the *deriving regexp*
|
|
|
+
|
|
|
+### Ambiguity
|
|
|
+A string may be obtained by two derivations differing not only in the order of choices
|
|
|
+$$(a\cup b)^*a(a\cup b)^*$$
|
|
|
+$$(a\cup b)^*a(a\cup b)^* \Rightarrow (a\cup b)a(a\cup b)^*\Rightarrow aa(a\cup b)^*\Rightarrow aa\varepsilon\Rightarrow aa$$
|
|
|
+$$(a\cup b)^*a(a\cup b)^* \Rightarrow \varepsilon a(a\cup b)^*\Rightarrow \varepsilon a(a\cup b)\Rightarrow \varepsilon aa\Rightarrow aa$$
|
|
|
+<!--*-->
|
|
|
+
|
|
|
+A regexp is __ambiguous__ if the corresponding marked (with numbers on letters) regexp $f_\#$ generates two marked strings $x$ and $y$ such that, removing the indices, we obtain the same string (sufficient condition, not necessary)
|
|
|
+##### examples
|
|
|
+$$(aa|ba)^*a|b(aa|ba)^*\;\; \text{is ambiguous}$$
|
|
|
+in fact the marking
|
|
|
+$$(a_1a_2|b_3a_4)^*a_5|b_6(a_7a_8|b_9a_{10})^*$$
|
|
|
+Generates the two strings
|
|
|
+$$b_3a_4a_5\;\;\;b_6a_7a_8$$
|
|
|
+That project ambiguously to the same phrase: $\;\;\;baa$
|
|
|
+
|
|
|
+### Other operators
|
|
|
+__Basic operators:__ union, concatenation, star
|
|
|
+
|
|
|
+__Derived operators:__
|
|
|
+- __power__ $e^h = \underset{h\;times}{ee...e}$
|
|
|
+- __cross__ $e^+ = {ee}^*$
|
|
|
+<!--*-->
|
|
|
+- __Repetition__ from k to n>k times $[a]_k^n=a^k\cup a^{k+1}\cup ...\cup a^n$
|
|
|
+
|
|
|
+- __Optionality__ $[a]=(\varepsilon\cup a)$
|
|
|
+- __Interval__ of an ordered set $(0...9)(a...z)(A...Z)$
|
|
|
+
|
|
|
+By admitting the use of set-theoretic operations __Intersection__, __Complement__ and __Difference__ we obtain the so called __Extended regular expressions__.
|
|
|
+
|
|
|
+### Closure of the $REG$ family to operations
|
|
|
|
|
|
> A family of languages is said to be closed with respect to an operator $\theta$ if the result language belongs to the same family as the operand languages.
|
|
|
|
|
@@ -92,9 +145,39 @@ Regular languages are also closed with respect to __mirroring__
|
|
|
|
|
|
This means that combining two regular languages with one of the said operators, we obtain another regular language.
|
|
|
|
|
|
-#####The family $REG$ of the regular languages is the smallest family of languages such that both the following properties hold:
|
|
|
+##### The family $REG$ of the regular languages is the smallest family of languages such that both the following properties hold:
|
|
|
- $REG$ contains all finite languages
|
|
|
- $REG$ is closet w.r.t conatenation, union, and star.
|
|
|
|
|
|
The $LIB$ family of the context-free languages is closed w.r.t. concatenation, union and star but is not the smallest of such family, in fact:
|
|
|
$$REG \subset LIB$$
|
|
|
+
|
|
|
+### Linguistic Abstraction
|
|
|
+Linguistic abstraction transforms the phrases of a real language to a simple form, called __abstract representation__
|
|
|
+
|
|
|
+Compiler design makers reference to the *abstract language* to process, not to the *effective language*
|
|
|
+
|
|
|
+Artificial languages (e.g. programming languages) contain few abstract structures, one of these is __lists__ which can be easily modeled by regexps
|
|
|
+
|
|
|
+## Effective and abstracts lists
|
|
|
+
|
|
|
+A __list__ is a sequence of *elements* with number not fixed in advance
|
|
|
+An __element__ can be a __terminal__ (alphabetic symbol) or a __compound object__ (for instance, a list itself)
|
|
|
+
|
|
|
+##### example:
|
|
|
+List with separators and start-marker of end-marker
|
|
|
+$$ie(se)^*f\;\;\;i[e(se)^*]f$$
|
|
|
+
|
|
|
+### Substitution
|
|
|
+Operation that replaces the terminal characters of the source language with the phrases of the destination languages
|
|
|
+##### example
|
|
|
+$$L\subset \Sigma^*\;\;\;L_b\subset\Delta^*$$
|
|
|
+$$x\in L \;\;\;x=a_1a_2...a_n\;\;\text{and for some}\;\;\;a_i=b$$
|
|
|
+Substituting the language $L_b$ to the letter b in the string x produces a language over the alphabet $(\Sigma\setminus \{b\})\cup \Delta$
|
|
|
+$$\{y|y=y_1y_2...y_n\wedge (ifa_i\neq b\;\text{then}\;y_i=a_i\;\text{else}\;y_i\in L_b)\}$$
|
|
|
+
|
|
|
+### Nested lists (or precedence lists)
|
|
|
+Lists may contain atomic objects as elements, but also other lists of lower level
|
|
|
+##### example
|
|
|
+>livello 1: begin istr1;istr2;...;istrn end
|
|
|
+livello 2: STAMPA(var1,var2,...,varn)
|