\documentclass{beamer}
%\setbeamercovered{transparent}
\usetheme{poul}
%\usetheme{Madrid}
\usepackage[utf8]{inputenc}
\usepackage[svgpath=images/]{svg}
\usepackage{graphicx}
\graphicspath{ {images/} }
\usepackage[hyphenbreaks]{breakurl}
\usepackage{hyperref}
\def\UrlBreaks{\do\/\do-}
%Information to be included in the title page:
\title{Backup and (hopefully) Restore}
\author{Andrea Gussoni}
\institute{P.O.u.L.}
\date{23 March 2017}
\titlegraphic{\includesvg[height=1.5cm]{logowhite}}
\begin{document}
\frame{\titlepage}
\begin{frame}
\frametitle{Why do we need backups?}
Bad things can and do happen:
\begin{itemize}
\item You may accidentally drop your computer.
\item The disk may be damaged by vibrations during the daily commute.
\item The computer holding the only copy of your thesis
may be stolen.
\item After some time the disk may simply stop working because of ageing.
\item But often the main cause of data loss is the thing sitting between the keyboard and the chair.
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Why do we need backups?}
\begin{center}
\includegraphics[width=0.7\textwidth]{gitlab}
\end{center}
\footnotetext{\url{https://twitter.com/gitlabstatus/status/826591961444384768}}
\end{frame}
\begin{frame}
\frametitle{What are backups?}
\begin{block}{Definition}
The copying and archiving of computer data so that it may be
used to restore the original after a data loss event.
\end{block}
\end{frame}
\begin{frame}
\frametitle{What to back up?}
It is important to distinguish what needs to be backed up from what
does not.\\\pause
This obviously depends on the setup you are using (native services, containers, VMs, etc.).
\end{frame}
\begin{frame}
\frametitle{A general guideline}
Must:
\begin{itemize}
\item /home
\end{itemize}
\vfill
At your discretion:
\begin{itemize}
\item /etc
\item /var
\end{itemize}
\vfill
Not necessary\footnote{If these folders contain something important, you are probably doing something wrong in your setup.}:
\begin{itemize}
\item /proc /sys /tmp
\item /dev /mnt /media
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Backup types}
Backups can be:
\begin{itemize}
\item \textbf{full}: a complete backup of all files and folders starting from a root node.
\item \textbf{incremental}: contains the changes since the last backup of any type (full or incremental).
\item \textbf{differential}: contains the changes since the last full backup.
\end{itemize}
\end{frame}
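\begin{frame}
\frametitle{Backup types: a worked comparison}
A rough sketch of the trade-off between the last two types, under assumptions made only for this illustration: one full backup of size $F$ on day $0$, one backup per day afterwards, day $i$ changes $c_i$ bytes, and changes on different days never touch the same data.\\
Total storage after $k$ days, keeping every archive:
\[
S_{\mathrm{inc}}(k) = F + \sum_{i=1}^{k} c_i
\qquad
S_{\mathrm{diff}}(k) = F + \sum_{j=1}^{k} \sum_{i=1}^{j} c_i
\]
Restoring day $k$ needs the full backup plus all $k$ incremental archives, but only the full backup plus the latest differential archive.
\end{frame}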
\begin{frame}
\frametitle{Backup Media}
\begin{itemize}
\item Hard disks (HDD).
\item Solid-state drives (SSD).
\item Optical media: DVDs, Blu-ray.
\item Flash drives.
\item Cloud\footnote{Remember that there is no cloud, just other people's computers.}.
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{dd}
\textbf{dd} is a powerful tool that can copy essentially anything that is a file or a block device. It is commonly used for disk cloning.\\
Usage example:
\begin{itemize}
\item \textit{dd if=/dev/sdX of=/dev/sdY \&\& sync\footnote{\textit{sync} waits until the data has actually been written, which avoids corrupted copies}}
\begin{itemize}
\item \textbf{if:} input file/device
\item \textbf{of:} output file/device
\end{itemize}
\end{itemize}
\vfill\pause
\begin{alertblock}{Caution}
Since \textbf{dd} often requires \textit{sudo} privileges to run, mistyping a device name can wipe the contents of your primary hard disk. Always double check the arguments before pressing enter.
\end{alertblock}
\end{frame}
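\begin{frame}[fragile]
\frametitle{dd: a cautious sketch}
A minimal sketch of the cloning workflow above, assuming GNU coreutils \textbf{dd}. The options \textit{bs}, \textit{conv=fsync} and \textit{status=progress} are additions not shown on the previous slide; \textit{/dev/sdX}, \textit{/dev/sdY} and \textit{disk.img} are placeholders.
\small
\begin{verbatim}
# Check device names before copying anything.
lsblk
# Clone a whole disk into an image file.
sudo dd if=/dev/sdX of=disk.img bs=4M \
        conv=fsync status=progress
# Write the image back (overwrites /dev/sdY).
sudo dd if=disk.img of=/dev/sdY bs=4M \
        conv=fsync status=progress
\end{verbatim}
\end{frame}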
\begin{frame}
\frametitle{GNU ddrescue}
\textbf{ddrescue} is an enhanced version of dd that tries to rescue the good parts in case of read errors. It can be useful for recovering data from a drive with some damaged sectors.\\
Usage example:
\begin{itemize}
\item \textit{ddrescue [options] /dev/sdX outfile mapfile}
\begin{itemize}
\item \textbf{mapfile:} a human-readable text file that ddrescue uses to manage the copy
\end{itemize}
\end{itemize}\pause
\begin{alertblock}{Caution}
For the rescued data to be correct, both dd and ddrescue are best used on unmounted devices.
\end{alertblock}\pause
\begin{block}{Tip}
ddrescue can also be useful when trying to get sectors reallocated on a drive with a few unreadable sectors. Wiping the drive with ddrescue should make the drive reallocate the bad sectors.
\end{block}
\end{frame}
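\begin{frame}[fragile]
\frametitle{GNU ddrescue: a two-pass sketch}
One possible way to use the mapfile, following the two-pass pattern suggested in the GNU ddrescue manual. The file names \textit{disk.img} and \textit{disk.map} are placeholders, and the retry count is an arbitrary choice.
\small
\begin{verbatim}
# First pass: copy the easy areas, skip scraping.
sudo ddrescue -n /dev/sdX disk.img disk.map
# Second pass: direct disk access, retry the
# bad areas up to 3 times.
sudo ddrescue -d -r3 /dev/sdX disk.img disk.map
\end{verbatim}
\normalsize
Because both passes share the same mapfile, the second run only works on the areas the first one could not read.
\end{frame}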
\begin{frame}
\frametitle{rsync}
Often described as an advanced version of cp.
\begin{exampleblock}{Pros}
\begin{itemize}
\item (unlike plain cp) can preserve hard and symbolic links, file permissions and ownership, modification times, etc.
\item designed to be network efficient: it only transfers the changes to files.
\item easy to use.
\end{itemize}
\end{exampleblock}
\begin{alertblock}{Cons}
\begin{itemize}
\item no storage encryption.
\end{itemize}
\end{alertblock}
\end{frame}
\begin{frame}
\frametitle{rsync: usage}
\begin{itemize}
\item rsync -Pr source destination
\begin{itemize}
\item \textbf{P:} keep partially transferred files if the transfer is interrupted, and show progress.
\item \textbf{r:} recurse into directories.
\end{itemize}
\vfill
\pause
\item rsync source host:destination\footnote{But please don't do this: \textit{rsync -av -{}-delete source host:$\sim$/}}
\begin{itemize}
\item uses ssh by default; it can also be forced with the -e ssh option.
\end{itemize}
\vfill
\pause
\item rsync -aAXv -{}-exclude=\{...\} /* /backup folder
\begin{itemize}
\item back up /* while preserving symlinks, ACLs, extended attributes and other file properties.
\end{itemize}
\end{itemize}
\end{frame}
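\begin{frame}[fragile]
\frametitle{rsync: try a dry run first}
A small sketch of why the footnote on the previous slide is a warning: \verb|--delete| removes from the destination everything that is missing from the source. The \verb|--dry-run| option is an addition to the commands shown so far; \textit{source/} and \textit{host:backup/} are placeholders.
\small
\begin{verbatim}
# Show what would change, without touching anything.
rsync -av --dry-run --delete source/ host:backup/
# Same command for real, once the list looks right.
rsync -av --delete source/ host:backup/
\end{verbatim}
\normalsize
The trailing slash on \textit{source/} copies the contents of the directory rather than the directory itself.
\end{frame}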
\begin{frame}
\frametitle{rsnapshot: rsync automated}
rsnapshot produces automated, periodic system snapshots.
\vfill
\begin{exampleblock}{Pros}
\begin{itemize}
\item preserves hard and symbolic links, file permissions and ownership, modification times, etc.
\item network efficient.
\item each snapshot looks like a full system backup; unchanged files are hard links to the previous snapshot.
\item easy to use.
\end{itemize}
\end{exampleblock}
\vfill
\begin{alertblock}{Cons}
\begin{itemize}
\item no storage encryption.
\end{itemize}
\end{alertblock}
\end{frame}
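\begin{frame}[fragile]
\frametitle{rsnapshot: a configuration sketch}
A minimal sketch of what a setup might look like, not a complete configuration. The paths and the retention count are hypothetical, and the fields in \textit{rsnapshot.conf} must be separated by tabs, which this slide cannot show.
\small
\begin{verbatim}
# /etc/rsnapshot.conf (fields separated by TABs)
snapshot_root   /backup/snapshots/
retain  daily   7
backup  /home/  localhost/

# Check the configuration, preview, then run.
rsnapshot configtest
rsnapshot -t daily
rsnapshot daily
\end{verbatim}
\normalsize
The last command is the one usually scheduled with cron.
\end{frame}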
\begin{frame}
\frametitle{duplicity}
duplicity produces encrypted, incremental backups in tar format.
\begin{exampleblock}{Pros}
\begin{itemize}
\item preserves hard and symbolic links, file permissions and ownership, modification times, etc.
\item network efficient.
\item incremental backups.
\item supports storage encryption with GPG.
\item easy to use.
\end{itemize}
\end{exampleblock}
\end{frame}
\begin{frame}
\frametitle{duplicity: usage}
\begin{itemize}
\item duplicity /home/user scp://user@host//backup/directory
\vfill\pause
\item duplicity [restore] scp://user@host//backup/directory /home/user
\vfill\pause
\item duplicity full /home/user scp://user@host//backup/directory
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{duplicity: usage}
\begin{itemize}
\item duplicity list-current-files scp://user@host//backup/directory
\begin{itemize}
\item list the files contained in the backup.
\end{itemize}
\vfill\pause
\item duplicity [restore] -t 3D scp://user@host//backup/directory /home/user
\begin{itemize}
\item restore the files as they were at the given time (here, 3 days ago).
\end{itemize}
\vfill\pause
\item duplicity remove-older-than 30D scp://user@host//backup/directory
\begin{itemize}
\item delete backup sets older than the given period; duplicity only lists them unless -{}-force is added.
\end{itemize}
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Demo}
\begin{center}
{\Huge Demo!}
\end{center}
\end{frame}
\begin{frame}
\frametitle{Last but not Least}
\begin{itemize}
\item When you use duplicity with encryption enabled, always remember to back up the GPG keys you use to encrypt and sign the backup.\\
If you lose them you won't be able to restore the backup.\pause
\item Always check that the backup is actually taking place; don't just assume that everything is working because you followed the suggested guide exactly.\pause
\item Always test that the backup really works by trying to restore it. You'll be surprised how often backup procedures are not really working, and unfortunately, if you do not test them, you'll only notice when the files are gone.
\end{itemize}
\end{frame}
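\begin{frame}[fragile]
\frametitle{Testing a restore: a sketch}
One possible way to act on the advice above, reusing the backup URL from the usage slides. The scratch directory, the key file name and \textit{KEYID} are placeholders, and the scratch directory is assumed not to exist yet.
\small
\begin{verbatim}
# Restore into a scratch directory, not over live data.
duplicity restore scp://user@host//backup/directory \
        /tmp/restore-test
# Compare the restored tree with the live one.
diff -r /home/user /tmp/restore-test
# Keep a copy of the secret key somewhere safe.
gpg --export-secret-keys --armor KEYID > backup-key.asc
\end{verbatim}
\normalsize
Files changed since the last backup will show up as differences; the point is that the restore completes and the data is readable.
\end{frame}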
\begin{frame}
\frametitle{Hi again GitLab}
\begin{center}
\includegraphics[width=0.9\textwidth,height=0.5\textheight]{gitlab2}
\end{center}
\footnotetext{\url{https://docs.google.com/document/d/1GCK53YDcBWQveod9kfzW-VCxIABGiryG7_z_6jHdVik/pub}}
%\pause
%\footnotesize{I don't want to put shame on GitLab for this incident, but only to use it as a case study.\\ In fact I think that the incident has been managed really well by the GitLab Team.\\
%Instead of starting blaming each other and finding silly excuses as usually happens in cases like this, they have been really open from the beginning about the problem and put as a priority the restore of the functionality of the service.}
\end{frame}
\begin{frame}
\frametitle{Before the Backup}
A different approach to data protection is to use RAID (\textit{Redundant Array of Independent Disks}).\\
\pause
In general, what we try to obtain with RAID is:
\begin{itemize}
\item Survival of the system if a disk failure happens.
\item Under certain conditions, better performance than in the single-disk case.
\end{itemize}
\footnotetext{For further information you can visit \url{https://www.digitalocean.com/community/tutorials/an-introduction-to-raid-terminology-and-concepts}}
\end{frame}
\begin{frame}
\frametitle{RAID Configurations}
\begin{figure}
\centering
\includesvg[width = 75pt]{RAID_0}
\includesvg[width = 75pt]{RAID_1}\\
\includesvg[width = 150pt]{RAID_5}
\end{figure}
\end{frame}
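\begin{frame}
\frametitle{RAID Configurations: usable capacity}
As a rough guide to the three layouts shown, assume $n$ identical disks of size $s$ (an idealisation; real arrays lose a little space to metadata):
\begin{itemize}
\item RAID 0 (striping): capacity $n \cdot s$, survives no disk failure.
\item RAID 1 (mirroring): capacity $s$, survives up to $n-1$ failures.
\item RAID 5 (striping with parity, $n \ge 3$): capacity $(n-1) \cdot s$, survives one failure.
\end{itemize}
For example, with $n = 4$ disks of $s = 2$~TB: RAID 0 gives $8$~TB, RAID 1 gives $2$~TB and RAID 5 gives $(4-1) \cdot 2 = 6$~TB.
\end{frame}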
\begin{frame}
\frametitle{Problems}
RAID can help in the event of a disk failure, but it doesn't protect us against \textbf{Silent Data Corruption}.\\\pause
To address this problem, new-generation filesystems such as ZFS and Btrfs have been created. Typical features of these filesystems are:
\begin{itemize}
\item Copy-on-write.
\item Deduplication.
\item Data \& metadata checksums.
\item Integrated RAID.
\item Volume management.
\item Snapshots.
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Snapshots}
\begin{itemize}
\item Snapshots are particularly useful because they give us an (almost) instant copy of the state of a volume, which we can later restore, archive somewhere, etc.\\\pause
\item So we can use them to make potentially risky modifications to a system and restore the previous state with little effort.\\\pause
\item Remember that having a separate \textit{classical} backup is always useful, in particular for the important data of our applications.\pause
\item RAID is not a backup.
\end{itemize}
\end{frame}
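\begin{frame}[fragile]
\frametitle{Snapshots: a sketch}
A short sketch of what taking a snapshot before a risky change might look like. It assumes \textit{/home} is a Btrfs subvolume and that \textit{/home/.snapshots} already exists; the ZFS dataset \textit{tank/home} and the snapshot name are hypothetical.
\small
\begin{verbatim}
# Btrfs: read-only snapshot of the /home subvolume.
sudo btrfs subvolume snapshot -r /home \
        /home/.snapshots/before-upgrade
# ZFS: snapshot a dataset, roll back if needed.
sudo zfs snapshot tank/home@before-upgrade
sudo zfs rollback tank/home@before-upgrade
\end{verbatim}
\end{frame}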
\begin{frame}
\frametitle{References}
\begin{itemize}
\item \url{https://wiki.archlinux.org/index.php/Full_system_backup_with_rsync}
\item \url{https://wiki.archlinux.org/index.php/Duplicity}
\item \url{http://duplicity.nongnu.org/}
\item \url{https://www.digitalocean.com/community/tutorials/how-to-use-duplicity-with-gpg-to-securely-automate-backups-on-ubuntu}
\item \url{https://github.com/zertrin/duplicity-backup.sh}
\end{itemize}
\end{frame}
\begin{frame}
\frametitle{Special Thanks}
I used the material from the previous editions of the course as a reference and starting point for this presentation.\\
Special thanks to \textit{Valeria Mazzola}\footnote{\url{https://slides.poul.org/2016/corsi-linux-avanzati/Backup_and_Restore.pdf}} and \textit{Federico Amedeo Izzo}\footnote{\url{https://filesystem.izzo.ovh/}} for the slides of the two previous editions of this talk.
\end{frame}
\begin{frame}
\frametitle{License}
\begin{center}
{\Huge Thank you!}
\vfill
\includesvg[height=1.5cm]{by-sa}\\
{\footnotesize These slides are published under a Creative Commons Attribution-ShareAlike 4.0 license.}
\end{center}
\end{frame}
\end{document}