lesson_03.md 3.5 KB

FLC - lesson 03

Luca Oddone Breveglieri
8 october 2015

Regular Expressions

Regular Languages are the simplest family of formal languages.
Can be defined in different ways:

  • algebrically
  • through generating grammars
  • through recognizer machines

Regular Expression (regexp) is a string $r$ defined with the terminal alphabet $\Sigma$
containing some metasymbols

Possible cases:

$r,s,t$ are regexp

  1. $r=\varnothing$
  2. $r=a, \;\;a\in \Sigma$
  3. $r=s\cup t\;\; \text{or}\;\; r=s|t$
  4. $r=s.t\;\; \text{or}\;\; r=st$
  5. $r=s^$ <!---->

    Precedence of operators
  6. star *

  7. concatenation .

  8. union $\cup$ or |

Derived operators
  • Power $e^h=ee...e_{n-times}$
  • Repetition from k to n>k times $[a]_k^n=a^k\cup a^{k+1}\cup ...\cup a^n$
  • Optionality $[a]=(\varepsilon\cup a)$
  • Interval of an ordered set $(0...9)(a...z)(A...Z)$

By admitting the use of set-theoretic operations Intersection, Complement and Difference we obtain the so called Extended regular expressions.

Regular languages

A language is regular if it is generated by a regular expression

regexp language
$\varepsilon \;\text{or}\;\varnothing$ ${\varepsilon}$
$a\in\Sigma$ ${a}$
$s \cup t \; \text{or} \; s \mid t$ $L_s \cup L_t$
$s.t \; \text{or} \; st$ $L_s.L_t$
$s^*$ ${L_s}^*$
example

The regexp $e$ that generates the language of integers, possibly signed
$e={(+\cup - \cup\varepsilon)dd}^$ $L_e={+,-,\varepsilon}{d}{d}^$

Sets of languages

Family of regular languages or REG:
Is the collection of all regular languages.

Family of finite languages or FIN:
Is the collection of all languages of finite cardinality.

Every finite language is regular as it is the union of a finite number of finite strings.

But family of regular languages contains also languages of infinite cardinality.
$$FIN\subset REG$$

Subexpression of a regexp

  1. Consider a fully parenthesized regexp $$ e = (a\cup (bb))^*(c^+ \cup (a \cup (bb)))$$
  2. Number every alphabetic symbol that occurs in the regexp $$ e_N = (a_1\cup (b_2b_3))^*(c_4^+ \cup (a_5 \cup (b_6b_7)))$$
  3. Isolate the subexpressions and put them into evidence $$ (a_1\cup (b_2b_3))^\;\;\;\;c_4^+ \cup (a_5 \cup (b_6b_7))$$ $$ a_1\cup (b_2b_3)\;\;\;\;\;\;c_4^+\;\;\;\;a_5 \cup (b_6b_7)$$ $$ a_1\;\;\;\;b_2b_3\;\;\;\;\;\;c_4^\;\;\;\;a_5 \;\;(b_6b_7)$$ $$ a_1\;\;\;\;b_2\;\;\;b_3\;\;\;\;\;\;c_4^\;\;\;\;a_5 \;\;\;b_6\;\;\;b_7$$ <<!---->

    Different parenthesization

    Derivation

    Ambiguity

    Closure of the $REG$ family

A family of languages is said to be closed with respect to an operator $\theta$ if the result language belongs to the same family as the operand languages.

The $REG$ family is closed with respect to:

  • concatenation
  • union
  • star

therefore it is closed also with respect to derived operators

  • cross
  • repetition
  • optionality

Regular languages are also closed with respect to mirroring

This means that combining two regular languages with one of the said operators, we obtain another regular language.

#####The family $REG$ of the regular languages is the smallest family of languages such that both the following properties hold:

  • $REG$ contains all finite languages
  • $REG$ is closet w.r.t conatenation, union, and star.

The $LIB$ family of the context-free languages is closed w.r.t. concatenation, union and star but is not the smallest of such family, in fact: $$REG \subset LIB$$