# Advanced Operating Systems and Embedded Systems - exercise session 1
#### Davide Zoni
###### 13 October 2016

## Embedded Systems - Multicore Design

### Course structure
Multi-core architectures need an interconnection layer to be fully exploited. The course has three parts:
- On-chip communication
- Architectural simulation: used to test architectural solutions without designing a complete RTL architecture.
- RTL design, verification and simulation

This course will focus mainly on power requirements. Reliability and timing will not be covered in depth for lack of time.

__Exam structure__
- Written exam: 23 points
- Project + presentation: 9 points

## Multi-core history
Between 2000 and 2005, CPU development began hitting the "frequency wall", along with limits on power consumption and single-thread performance. The market still demanded more processing power, and the industry responded by developing multi-core architectures.

#### Market and Energy Efficiency
There are two conflicting needs. The market asks for devices that can run the same applications regardless of the underlying hardware. From the technology viewpoint, however, mobile phones have limited resources (battery, for example), so raising performance is not easy.

__Reference book__: Low Power Methodology Manual, ARM & Synopsys, 2007

Halving the size of a transistor does not halve its power consumption, so shrinking transistors increases power density.

## Reliability

#### Escape bugs
Bugs that ship with the device because of design errors and make it produce results that differ from the specification. Simpler architectures have fewer escape bugs. The number of known escape bugs usually grows over time as users discover them.

#### Hard faults
Hardware damage caused by improper usage, e.g. overvolting.

To test an architecture exhaustively for escape bugs, you would have to try every input in every possible state, because CPUs are usually not *stateless*. A design with 5 million flip-flops has 2^(5 million) states.
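The arithmetic below shows why exhaustive testing is out of reach. This minimal Python sketch uses illustrative function names, and its rate of 10^9 states per second is an assumption, not a figure from the lecture. The time estimate is only a lower bound, because it visits each state once and ignores the inputs that must also be tried in every state.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def num_states(flip_flops: int) -> int:
    """Each flip-flop stores one bit, so N flip-flops give 2**N distinct states."""
    return 2 ** flip_flops

def state_count_digits(flip_flops: int) -> int:
    """Decimal digits of 2**N, computed as floor(N * log10(2)) + 1 without building 2**N."""
    return math.floor(flip_flops * math.log10(2)) + 1

def exhaustive_test_years(flip_flops: int, states_per_second: float) -> float:
    """Years needed to visit every state once at the given (assumed) rate.

    Lower bound only: it ignores the inputs to try in each state.
    Meant for small N, since 2**N must fit in a float.
    """
    return num_states(flip_flops) / states_per_second / SECONDS_PER_YEAR

if __name__ == "__main__":
    # Size of the state count for the 5-million flip-flop design above
    print(state_count_digits(5_000_000))
    # Even a tiny 64-bit state space at an assumed 1e9 states per second
    print(exhaustive_test_years(64, 1e9))
```

The number 2^(5 million) has about 1.5 million decimal digits. Even 64 flip-flops would need roughly 585 years at 10^9 states per second. This is why verification is done on partitions and on the most common states.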
Usually __bug inspection__ is done by partitioning the design and checking that the most common states are bug-free. This testing is constrained by time-to-market, but the verified portion can be expanded later.

### Multi-cores
In general, multiple applications that each use few resources are more common than a single application that uses many resources.

## On-chip interconnects
The main types of traffic on *on-chip interconnects* are usually:
- Data from *load* and *store* operations
- Cache coherence messages

### Different Architectures
- __Point-to-point__: used by Intel since 2010 in i5 CPUs. It is fast and simple, but the number of links grows as n\*(n-1) with the number of cores n.
- __Bus__: simple, but it usually needs a split architecture. For example, one bus connects CPU+L1 to L2 and a slower one connects L2 to memory.
- __Crossbar__: non-blocking as long as no two senders target the same destination (or vice versa). A crossbar is essentially a *multi-bus* architecture, and it can be *full* or partial. The more buses it uses, the higher its power consumption.
- __Network-on-chip__

## On-chip Bus Architecture
The bus is managed by an __arbiter__, which diagrams usually do not show. In practice the bus is not the single line/wire shown in theory. There are three different implementations:
- Tri-state
- AND-OR
- Mux: the most common. It scales better, but it has no native broadcast support, so sending a broadcast message (e.g. for snooping) costs more power.

Ideally, the bus is placed between the cores to reduce wire length. The bus must also avoid __deadlock__ situations.

A __bridge__ is a piece of hardware that interfaces two different communication protocols.
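The bus arbiter decides, each cycle, which one of the requesting masters may drive the bus. The notes do not name an arbitration policy. The following Python sketch therefore assumes a simple round-robin scheme, and the class and method names are hypothetical. It models only the grant decision. Bus timing, split transactions and deadlock avoidance are left out.

```python
from typing import List, Optional

class RoundRobinArbiter:
    """Grants a shared bus to at most one master per cycle (assumed round-robin policy)."""

    def __init__(self, num_masters: int) -> None:
        self.num_masters = num_masters
        # Start as if the last master had just been served, so master 0 goes first.
        self.last_grant = num_masters - 1

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the granted master, or None if nobody requests the bus."""
        if len(requests) != self.num_masters:
            raise ValueError("expected one request line per master")
        # Search starting just after the previous winner, wrapping around.
        for offset in range(1, self.num_masters + 1):
            candidate = (self.last_grant + offset) % self.num_masters
            if requests[candidate]:
                self.last_grant = candidate
                return candidate
        return None

if __name__ == "__main__":
    arbiter = RoundRobinArbiter(3)
    all_requesting = [True, True, True]
    print([arbiter.arbitrate(all_requesting) for _ in range(4)])  # [0, 1, 2, 0]
```

Rotating the highest priority to the master just after the last winner keeps any master from starving, which a fixed-priority arbiter cannot guarantee. The cost is slightly more state, because the arbiter must remember the last grant.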