Information for Prospective Students

Last Updated: January 2023. Note that thesis and project lists are subject to change.

Undergraduate Students

Are you an outstanding undergraduate student looking to see what research is all about? Or are you perhaps thinking about graduate studies? There are two main ways you can get involved in my group.

  • ECE 499: this is the suggested way for 4th year students. ECE 499 can be taken as an elective course in either the 4A or 4B term. It consists of a research project supervised by an ECE faculty member. I am always looking for excellent students with a good average and an interest in either the 20s (computer architecture) or 50s (computer software) areas. If you are interested in taking ECE 499 with me in the upcoming semester, please send me an email with ECE499 in the subject line, a description of your interests and a brief academic record.
  • Undergraduate Research Assistantship (URA): I sometimes offer URAs based on need. A URA requires you to work six hours per week on a specific project. It does not count towards your academic requirements, but you get paid. This opportunity is more suitable for second and third year students willing to work on a more scope-limited project. The list of available projects is subject to change and I do not offer URAs every term, so if you are interested, your best option is to contact me directly. Please note that only students with an average of 80% or higher are eligible, and make sure that you can commit to an extra six hours of work every week before applying.

Here are some examples of past ECE 499 / URA projects:

  • Porting applications to PREM: the student refactored a complex embedded application (image processing) according to the MISRA-C standard for safety-critical embedded systems, and converted it to the PRedictable Execution Model (PREM).
  • Implementing cache locking on an embedded platform: the student ported our framework for cache partitioning and locking in COTS systems to a small ARM-based embedded board.
  • Multicore OS porting: the student ported FreeRTOS, a popular open-source real-time OS, to a dual-core embedded board.

Graduate Students

My group has openings at both the MASc and PhD levels, to perform research on hardware/software architectures for the next generation of embedded systems (cyber-physical systems). I am looking for students with an interest in one or more of the following broad areas:

  • Operating Systems
  • Compilers
  • Computer Architecture
  • Performance, Timing Analysis and System Optimization
  • Real-Time Scheduling and Resource Management

If you plan to apply to ECE at U Waterloo and you would like to work in my group, you are encouraged to contact me by email. Point out why you are interested in my research and why you think you would be a good addition to the team. If I like your background, I will usually ask you to apply for admission in the next available term (the university admits students in the Fall, Winter and Spring terms). Once applications are released to faculty for review, if I like your application, I will usually ask you for a phone/Skype interview.

Please note that admission to U Waterloo is highly competitive. I cannot positively answer all requests. Past research experience in my core areas is not required for MASc students (however, expertise based on university courses and projects certainly helps), but it is a requirement when seeking direct admission to the PhD program. Many students in the department are first admitted to the MASc program and then move to the PhD program with the agreement of their advisor.

The following are some ideas for the type of Master/PhD thesis that you would likely undertake in my group; note that I strongly encourage independent and original research by all students.

Real-Time OS for Systems-on-Chip

By integrating all system components on a single die, the System-on-Chip (SoC) paradigm promises to greatly increase reliability and reduce costs, packaging size and power consumption of embedded systems. The explosion of the smart-phone market is a prime testament to this trend, but SoCs are also becoming more and more popular in real-time systems such as those employed in the avionic, automotive and medical industries.

Unfortunately, current real-time Operating Systems (OSs) are ill-suited to modern SoC architectures, because they rely on a critical assumption: the CPU is the only active component in the system. However, SoCs typically contain a variety of different processors, such as GPUs, packet processors, compressors, etc., all of which are active components able to initiate communication and access memory. Hence, traditional CPU-based protection and isolation mechanisms are no longer sufficient.

The goal of this thesis would be to design new OS abstractions and mechanisms to support strict isolation and timing predictability for applications running on multiple, heterogeneous processors on an SoC. Possible research topics include: 1) memory and cache partitioning; 2) task execution model; 3) driver model; 4) allocation of services; 5) virtualization and memory protection; 6) security guarantees for CPS. Key objectives would be predictable performance and ease of certification for safety-critical applications. Key skills include OS and system software design, but also real-time scheduling (not necessarily by the same student).

Predictable Compilation for Parallel Programs

The goal of this thesis would be to extend automatic program transformation according to the PRedictable Execution Model (PREM) to parallel programs. In particular, we envision that affine computational kernels could be analyzed according to the polyhedral model and automatically transformed to execute according to the load - execution - unload PREM paradigm.
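
As a rough illustration of the load - execution - unload structure mentioned above, the following Python sketch tiles a simple element-wise kernel into fixed-size intervals. It is only a toy model under stated assumptions: the function name `prem_map`, the `tile_size` parameter and the use of a Python list as a stand-in for a scratchpad buffer are all hypothetical, and no polyhedral analysis or automatic transformation is performed.

```python
from typing import Callable, List


def prem_map(data: List[float],
             kernel: Callable[[float], float],
             tile_size: int) -> List[float]:
    """Apply `kernel` to each element, one tile at a time.

    Each tile goes through three phases, mimicking the
    load / execute / unload structure (hypothetical toy model).
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")

    result = [0.0] * len(data)
    for start in range(0, len(data), tile_size):
        end = min(start + tile_size, len(data))

        # Load phase: copy the tile from "main memory" into a local
        # buffer (assumed stand-in for a scratchpad or locked cache).
        local = data[start:end]

        # Execute phase: compute on the local buffer only; no access
        # to `data` or `result` happens here.
        local = [kernel(x) for x in local]

        # Unload phase: write the results back to "main memory".
        result[start:end] = local

    return result
```

For example, `prem_map([1, 2, 3, 4, 5], lambda x: 2 * x, tile_size=2)` processes the input in three intervals. The point of the structure is that main memory is touched only in the load and unload phases, so those phases can be scheduled separately from computation.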

A first main complexity is how to optimize the program over multiple loop levels. A second main complexity is how to optimize across distributed loops. Both situations are very common in computational kernels used, for example, in neural networks. A third main complexity is how to optimize the load / unload phases over non-regular memory access patterns; we envision that this issue could be ameliorated by generating custom DMA units through high-level synthesis (HLS) on an FPGA.

We are interested in targeting multiple architectures found in modern SoCs, including: 1) arrays of cores using two-level scratchpad memory; 2) GPUs; 3) AI cores.

Novel Predictable Architectures for Real-Time SoCs

High-performance computer architectures are normally designed to optimize average-case performance. However, such optimizations often rely on speculative features, such as prefetching and request reordering, which can adversely affect worst-case scenarios. For this reason, it has been difficult to provide timing guarantees on the latency of memory requests in modern SoC platforms.

As a consequence, researchers in the real-time domain have devised a set of architectures (including caches, buses, main memory controllers...) specifically designed to provide tight latency bounds. However, such bounds are generally achieved by disabling most of the optimizations targeted at average performance. Hence, existing designs exhibit a fundamental trade-off between average performance and worst-case guarantees.

Our goal for this research is to overcome this trade-off by designing architectures that provide tight latency bounds with minimal performance degradation. In particular, we have recently introduced the Duetto paradigm, which achieves this result by pairing a conventional memory arbiter with a real-time one. The goal for this thesis would be to demonstrate that the paradigm can be extended throughout the memory hierarchy and can support application-specific configuration. Activities would involve both architectural design based on cycle-accurate simulation and RTL design (not necessarily by the same student), with the goal of demonstrating the paradigm on a RISC-V based platform.
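
To give a feel for what pairing a conventional arbiter with a real-time one could look like, here is a deliberately simplified Python sketch. It is not the Duetto design: the switching rule (fall back to the real-time arbiter when the oldest request is close to its latency bound), the `guard` margin, the row-hit-first conventional policy and all names below are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    core: int       # issuing core (hypothetical field)
    arrival: int    # cycle at which the request arrived
    row_hit: bool   # whether it would hit an open DRAM row


def conventional_pick(queue: List[Request]) -> int:
    # Assumed performance-oriented policy: prefer the first row hit,
    # otherwise fall back to the oldest request.
    for i, req in enumerate(queue):
        if req.row_hit:
            return i
    return 0


def realtime_pick(queue: List[Request]) -> int:
    # Assumed predictable policy: always serve the oldest request
    # (the queue is kept in arrival order).
    return 0


def paired_pick(queue: List[Request], now: int,
                latency_bound: int, guard: int) -> Tuple[int, str]:
    """Choose a request index and report which arbiter decided."""
    if not queue:
        raise ValueError("queue is empty")
    oldest_wait = now - queue[0].arrival
    # Assumed switching rule: hand control to the real-time arbiter
    # once the oldest request is within `guard` cycles of its bound.
    if oldest_wait + guard >= latency_bound:
        return realtime_pick(queue), "real-time"
    return conventional_pick(queue), "conventional"
```

With a queue holding an old row-miss request followed by a newer row-hit request, the sketch serves the row hit while the old request has slack and switches to the old request once its waiting time approaches `latency_bound`. This captures the intended trade-off only at a cartoon level; an actual design would need a sound latency analysis behind the switching decision.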

Timing Analysis and System Optimization for Complex Integrated Architectures

Modern Cyber-Physical Systems (CPS) are complex, integrated architectures. Timing analysis is crucial to ensure that the computation performed in the cyber part of the system (electronic components) correctly interacts with the physical world. Unfortunately, such analysis is made more complex by the presence of multiple cyber and physical resources shared both among hardware components and software applications. Such shared resources include processing cycles, interconnection bandwidth, cache space, memory bandwidth, power consumption and many more.

To avoid over-pessimism in the timing analysis, it is essential to properly configure and partition shared resources among software partitions / virtual machines. The key idea is that we can avoid worst-case scenarios through a careful assignment of shared resources. For example, CPU scheduling could be altered to prevent two memory-intensive tasks from running at the same time on a multi-core. The two key goals of this thesis would be to: 1) study optimization algorithms to best allocate resources based on profiling information about executed applications; 2) leverage less-pessimistic timing analysis based on the introduced resource isolation.
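
The scheduling example above can be sketched in a few lines of Python. This is a minimal greedy illustration, assuming unit-length tasks, a time-slotted multi-core and a boolean `memory_intensive` flag that would in practice come from profiling; the `Task` type and `schedule_slots` function are hypothetical and do not represent an actual optimization algorithm from the group.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    name: str
    memory_intensive: bool  # assumed to come from profiling


def schedule_slots(tasks: List[Task], cores: int) -> List[List[str]]:
    """Assign unit-length tasks to time slots on `cores` cores so that
    at most one memory-intensive task runs in any slot."""
    if cores <= 0:
        raise ValueError("cores must be positive")

    mem = [t for t in tasks if t.memory_intensive]
    cpu = [t for t in tasks if not t.memory_intensive]

    slots: List[List[str]] = []
    while mem or cpu:
        slot: List[str] = []
        # At most one memory-intensive task per slot.
        if mem:
            slot.append(mem.pop(0).name)
        # Fill the remaining cores with compute-bound tasks.
        while cpu and len(slot) < cores:
            slot.append(cpu.pop(0).name)
        slots.append(slot)
    return slots
```

For instance, with two cores and the tasks `fft` and `copy` (memory-intensive) plus `ctrl` and `log` (compute-bound), the sketch pairs each memory-intensive task with a compute-bound one instead of running `fft` and `copy` side by side. A timing analysis could then assume no memory contention between memory-intensive tasks, at the cost of possibly leaving cores idle when compute-bound tasks run out.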