Regular Languages are the simplest family of formal languages.
Can be defined in different ways:
Regular Expression or regexp
is a string $r$ defined over the elements of a terminal alphabet $\Sigma$, using one of the following metasymbols $\;\cdot\;\cup\;\varnothing\;(\;)$
$r,s,t$ are regexp
A language is regular if it is generated by a regular expression
regexp | language |
---|---|
$\varepsilon \;\text{or}\;\varnothing$ | ${\varepsilon}$ |
$a\in\Sigma$ | ${a}$ |
$s \cup t \; \text{or} \; s \mid t$ | $L_s \cup L_t$ |
$s.t \; \text{or} \; st$ | $L_s.L_t$ |
$s^*$ | ${L_s}^*$ |
The regexp $e$ that generates the language of integers, possibly signed
$e={(+\cup - \cup\varepsilon)dd}^\;\;\;L_e={+,-,\varepsilon}{d}{d}^$
The language over the binary alphabet ${a,b}$, such that the number of $a$ in every string is odd and every string contains at least one $b$
$A_p=b^({ab}^{ab}^)^\;\;A_d=b^{ab}^({ab}^{ab}^)^$
$L=A_pbA_d | A_dbA_p$ <!---->
Family of regular languages or $REG$:
Is the collection of all regular languages.
Family of finite languages or $FIN$:
Is the collection of all languages of finite cardinality.
Every finite language is regular as it is the union of a finite number of finite strings.
But family of regular languages contains also languages of infinite cardinality.
$$FIN\subset REG$$
And hence different interpretations of a regexp
$$f=(a\cup bb)^(c^+\cup a\cup bb)$$
$$f=(a\cup bb)^(c^+\cup (a\cup bb))$$
$$f=(a\cup bb)^((c^+\cup a)\cup bb)$$
<!---->
Choosing one of the interpretations,
I obtain a new regexp defining a smaller language that the original one.
Between two regexp $e'$ and $e''$: $$e'\Rightarrow e''$$ If the two regexp $e'$ and $e''$ can be factored as $$e'=\alpha\beta\gamma \;\;\; e''=\alpha\delta\gamma$$ With
Two regexps are equivalent if they define the same language
The language defined by a derived regexp is contained in the language defined by the deriving regexp
A string may be obtained by two derivations differing not only in the order of choices $$(a\cup b)^a(a\cup b)^$$ $$(a\cup b)^a(a\cup b)^ \Rightarrow (a\cup b)a(a\cup b)^\Rightarrow aa(a\cup b)^\Rightarrow aa\varepsilon\Rightarrow aa$$ $$(a\cup b)^a(a\cup b)^ \Rightarrow \varepsilon a(a\cup b)^\Rightarrow \varepsilon a(a\cup b)\Rightarrow \varepsilon aa\Rightarrow aa$$ <!---->
A regexp is ambiguous if the corresponding marked (with numbers on letters) regexp $f_#$ generates two marked strings $x$ and $y$ such that, removing the indices, we obtain the same string (sufficient condition, not necessary)
$$(aa|ba)^a|b(aa|ba)^\;\; \text{is ambiguous}$$ in fact the marking $$(a_1a_2|b_3a_4)^a_5|b_6(a_7a_8|b9a{10})^$$ Generates the two strings $$b_3a_4a_5\;\;\;b_6a_7a_8$$ That project ambiguously to the same phrase: $\;\;\;baa$
Basic operators: union, concatenation, star
Derived operators:
Repetition from k to n>k times $[a]_k^n=a^k\cup a^{k+1}\cup ...\cup a^n$
Optionality $[a]=(\varepsilon\cup a)$
Interval of an ordered set $(0...9)(a...z)(A...Z)$
By admitting the use of set-theoretic operations Intersection, Complement and Difference we obtain the so called Extended regular expressions.
A family of languages is said to be closed with respect to an operator $\theta$ if the result language belongs to the same family as the operand languages.
The $REG$ family is closed with respect to:
therefore it is closed also with respect to derived operators
Regular languages are also closed with respect to mirroring
This means that combining two regular languages with one of the said operators, we obtain another regular language.
The $LIB$ family of the context-free languages is closed w.r.t. concatenation, union and star but is not the smallest of such family, in fact: $$REG \subset LIB$$
Linguistic abstraction transforms the phrases of a real language to a simple form, called abstract representation
Compiler design makers reference to the abstract language to process, not to the effective language
Artificial languages (e.g. programming languages) contain few abstract structures, one of these is lists which can be easily modeled by regexps
A list is a sequence of elements with number not fixed in advance
An element can be a terminal (alphabetic symbol) or a compound object (for instance, a list itself)
List with separators and start-marker of end-marker $$ie(se)^f\;\;\;i[e(se)^]f$$
Operation that replaces the terminal characters of the source language with the phrases of the destination languages
$$L\subset \Sigma^\;\;\;L_b\subset\Delta^$$ $$x\in L \;\;\;x=a_1a_2...a_n\;\;\text{and for some}\;\;\;a_i=b$$ Substituting the language $L_b$ to the letter b in the string x produces a language over the alphabet $(\Sigma\setminus {b})\cup \Delta$ $${y|y=y_1y_2...y_n\wedge (ifa_i\neq b\;\text{then}\;y_i=a_i\;\text{else}\;y_i\in L_b)}$$
Lists may contain atomic objects as elements, but also other lists of lower level
livello 1: begin istr1;istr2;...;istrn end
livello 2: STAMPA(var1,var2,...,varn)