Final Year Project Initial Report

(rev. 5)










Submitted for the

BSc Degree in Software Engineering

April 2000


















A Modular Architecture Independent Multiprocessing Capable Hardware Abstraction Layer


Niall Douglas



Index of Contents


Chapter

1: Introduction

2: The design of the hardware

3: The design of the HAL

4: The testing of the HAL

5: Conclusions

6: Bibliography


Chapter 1: Introduction

My final year project is to construct a modular, architecture-independent, multiprocessing-capable Hardware Abstraction Layer.

A Hardware Abstraction Layer (or HAL for short) is the collection of the lowest layers of an operating system or device driver which encapsulate all architecture-dependent code. In other words, you can completely port an operating system or device driver to another architecture by rewriting some or all of the HAL of that operating system or driver.

Basing an operating system or device driver on a HAL is desirable because porting it to other architectures becomes considerably more straightforward: the work necessary is contained entirely within one area of the source. Microsoft Windows NT and Linux are two major operating systems which use HALs, and most real-time embedded operating systems also use one, e.g. Cygnus' eCOS.

However, despite this increasing use of HALs, there is currently no standard imposed on their design or capabilities. The TRON Association's ITRON specification lays out certain minimum standards required of operating systems, but there is no minimum standard required of a HAL.

This project aims to construct a definitive HAL which provides all the facilities required even of the most demanding of operating systems. Hence the HAL will be:

Modular: The HAL will be composed of self-contained pieces of code called modules. Each module presents a defined API to the code which uses it, so one module can be exchanged for another without that code being aware of the change. For example, if one had code running on an ARM7 processor and wanted to get it running on an ARM9, one would simply replace the ARM7-based module with the ARM9 one.

Architecture-independent: Obviously, a HAL’s sole purpose is to abstract hardware details away from an operating system in such a way that only the HAL needs to be rewritten to port the OS to a new platform. However, through a layered and modular design strategy, this HAL also aims to minimise the amount of the HAL which needs rewriting across similar hardware.

Multiprocessing capable: The HAL provides support for working with multiple processors within the same system. Distributed and shared memory architectures should be supported.


Weeks 1-6: Evaluate possibilities of design and general structure of HAL
Week 6: Submit initial report detailing intended design and testing of the HAL
Weeks 6-12: Write all uniprocessor code. Get uC/OS running on the HAL in a single-processor environment
Week 12: Submit portfolio
Weeks 12-16: Exams, Christmas break and Millennium celebrations
Weeks 16-25: Get project working on multi-processor hardware, implement PCI bus code & hence multiprocessing support
Week 25: Have HAL & uC/OS correctly running across multiple processors on real hardware. Submit project.


Chapter 2: The design of the hardware


There were three main constraints on the choice of development hardware:

For an impoverished student, cost is probably the most pressing. While uniprocessor hardware is cheap and plentiful, multiprocessor hardware is expensive and rare. x86-based architectures are the most common and therefore the cheapest. However, developing from scratch on that architecture seems foolish given that, within the embedded systems arena, there are quite a number of considerably newer and superior architectures available which are also cheap. Unfortunately, these embedded architectures rarely offer multiprocessor solutions cheaply.

After conducting an extensive survey of the available options, I chose the ARM architecture as it offers a multitude of implementations, an established market presence, a simple RISC-based architecture, a flat 32-bit address space and a range of development tools. I also have extensive industrial experience in using the ARM architecture, so the learning curve would be much less steep.

As for the multiprocessor configuration in which to use the ARM, I considered two choices: the traditional uniform memory architecture (UMA), where there is a single data store used by all processors, or a non-uniform memory architecture (NUMA), where the memory is distributed across all processors.











The UMA and NUMA architectures respectively

As both architectures allow for the access of any memory location from any processor, the design of the software is made considerably easier. Also, the distributed memory form, where some memory is very fast for a particular processor to access (i.e. on-card memory) and the remaining memory is slower (i.e. off-card memory reached over the processor connection bus), offers the advantage of reducing the necessary memory bandwidth and hence substantially increasing scalability. In other words, if the HAL can handle distributed memory architectures it will gain a significant speed increase over a single memory architecture whilst remaining compatible with such an architecture. Hence it seemed sensible to choose a distributed NUMA memory architecture for the development hardware.

Intel make a companion chip for their StrongARM processor, an integrated "motherboard on a chip" device called the 21285. This provides a memory controller, interrupt controller, DMA controller, timers, a 16550-compatible UART and PCI bus interface logic. Using this device one can construct a cheap and fast minimal computer on a standard PCI card. Intel make just such a card, called the EBSA-285. It remains to be seen if this card shall be used in the development of this project.

A layout of the proposed hardware (courtesy of Intel)

Many of these cards can be plugged into a single PCI bus, which yields the proposed distributed memory architecture with a peak bandwidth of about 132 MB/sec between processors (a 32-bit channel at 33 MHz, per the PCI 2.1 specification). This is ample for our needs.
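As a rough check on the bandwidth figure, the arithmetic can be sketched in C. A 32-bit bus moves 4 bytes per clock, so at a nominal 33,000,000 Hz the theoretical peak is 132 million bytes per second, of the same order as the figure quoted above. The function name is purely illustrative and not part of any real API. Sustained throughput on a real bus will be lower because of arbitration and non-burst cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical peak transfer rate of a parallel bus in bytes per second,
 * assuming one full-width transfer on every clock cycle.
 * Hypothetical helper, not part of any real API. */
uint64_t bus_peak_bytes_per_sec(uint32_t width_bits, uint32_t clock_hz)
{
    assert(width_bits % 8u == 0u);   /* whole bytes only */
    return (uint64_t)(width_bits / 8u) * clock_hz;
}
```

Calling bus_peak_bytes_per_sec(32, 33000000) gives 132,000,000 bytes per second.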

This solution of using an Intel SA-110 microprocessor with 21285 footbridge is cheap, easily available and an ideal solution for this project's development hardware needs.

One final advantage of using the 21285 is its provision of a JTAG debug port. However, the use of this requires some very expensive hardware (the ARM Multi-ICE development toolkit), so unless this can be borrowed from someone, traditional serial-based debugging tools will have to be used. These are expected to be the ARM SDT’s debugger called Angel and possibly the older ARM debugger called Demon (Angel does not debug RTOSes well, whereas Demon does despite being much older and deprecated).


Chapter 3: The design of the HAL



There are two main choices of development software for the HAL: ARM’s own SDT, or the free GNU tools distributed by Cygnus.

The ARM SDT is an industry standard embedded software design system with a tried and trusted ANSI-compliant Norcroft C compiler whose code-base dates back to the mid-1980s. However, it also costs in the region of US$4500.

The GNU solution uses tools originally developed for UNIX desktop application development and has only very recently been retargeted as an embedded solution. Even worse, its ARM support is at best described as functional – there are only a handful of working examples like eCOS. However, it is completely free.

ARM supply an evaluation version of their SDT which is fully functional but times out after sixty days of use. Given the poor state of the only other alternative, it was decided that using the evaluation version of the SDT was the best choice.

One advantage of the ARM SDT is its ARMulator, a simple emulated ARM system. This can be used to develop the HAL long before any actual hardware is available. It is expected that great use of the ARMulator shall be made during the course of this project, especially as ICE-based debugging tools are unlikely to be available due to their prohibitive cost.

One major point of reference which is expected to be used is the DEC uHAL, written by the Digital StrongARM team during the development of the StrongARM and the 21285 footbridge before their takeover by Intel. uHAL contains substantial amounts of 21285 control code, including the PCI logic control code. Hence development of the multiprocessing control code should be made much easier.

Design Goals

The HAL should be composed of a number of core modules, a number of build-time extension modules and a number of run-time extension modules which provide support for various peripherals. Some of these modules can be multiply included, depending on what those modules represent – so for example, a 16550 serial controller extension module could be included many times so that multiple 16550 controllers can be accessed. Similar things could apply for IDE, SCSI or video controllers.
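To illustrate what "multiply included" might look like in practice, here is a minimal sketch in C of a module table in which the same module type is registered once per device. All names (hal_module_t, hal_module_register, the "uart16550" string) and the table size are assumptions made for this example only; the real HAL design may differ considerably.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HAL_MAX_MODULES 8   /* arbitrary limit for this sketch */

/* One registered instance of an extension module (hypothetical layout). */
typedef struct {
    const char *name;   /* module type, e.g. "uart16550" */
    uintptr_t   base;   /* base address of the device this instance drives */
} hal_module_t;

static hal_module_t hal_modules[HAL_MAX_MODULES];
static size_t       hal_module_count;

/* Register another instance; returns its index, or -1 if the table is full. */
int hal_module_register(const char *name, uintptr_t base)
{
    assert(name != NULL);
    if (hal_module_count >= HAL_MAX_MODULES)
        return -1;
    hal_modules[hal_module_count].name = name;
    hal_modules[hal_module_count].base = base;
    return (int)hal_module_count++;
}

/* Count how many instances of a given module type are registered. */
size_t hal_module_instances(const char *name)
{
    size_t n = 0;
    for (size_t i = 0; i < hal_module_count; i++)
        if (strcmp(hal_modules[i].name, name) == 0)
            n++;
    return n;
}
```

Registering "uart16550" twice with two different base addresses gives two independent instances of the same module code, which is the behaviour described above for multiple 16550 controllers.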

The ability to reconfigure itself dependent on detected hardware is desirable as this allows a single image to run on different architectures. However, the ability to generate an image which is as small and as fast as possible should not be excluded either.

Routines to abstract the detail of accessing other processors’ memory or performing context switches should also be provided, along with basic synchronisation functions which can use processor features to implement local processor semaphores (in fact, the ARM implements only one multiprocessor synchronisation primitive: the atomic swap instruction, SWP, which performs a load and a store as one indivisible operation).
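As an illustration of how a simple lock can be built from nothing more than an atomic load-and-store, the following C sketch uses the C11 atomic_exchange operation as a stand-in for the ARM instruction. This is an assumption made so that the example compiles on any host; the hal_lock_* names are hypothetical and do not describe the HAL's eventual API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* 0 = free, 1 = held. */
typedef struct { atomic_int held; } hal_lock_t;

void hal_lock_init(hal_lock_t *l)
{
    assert(l != NULL);
    atomic_init(&l->held, 0);
}

/* Atomically swap 1 into the lock; we own it only if the old value was 0. */
bool hal_lock_try(hal_lock_t *l)
{
    return atomic_exchange(&l->held, 1) == 0;
}

/* Spin until the lock is obtained. */
void hal_lock_acquire(hal_lock_t *l)
{
    while (!hal_lock_try(l))
        ;   /* busy-wait until the holder releases */
}

void hal_lock_release(hal_lock_t *l)
{
    atomic_store(&l->held, 0);
}
```

A second hal_lock_try on a held lock fails, because the swap returns the 1 already stored there; after hal_lock_release it succeeds again.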

Layers of the proposed HAL

In addition to the modular design, a two-layer Application Programming Interface (API) is desirable. Using this, a simple OS could be made completely portable if it used only the upper layer, which is intended to provide all necessary operations which are common across most modern processors and architectures. The lower layer, though, remains available to OS code should it need the extra processor- and architecture-specific control.
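The split between the two layers can be sketched in C as follows. The lower layer exposes an ARM-specific operation (reading and writing the status register, simulated here by a plain variable so that the sketch runs on a host), while the upper layer offers interrupt control in terms that would make sense on any processor. All function and type names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* ---- Lower layer: ARM-specific. The status register is simulated by a
 * plain variable here so that the sketch runs on a host machine. ---- */
#define ARM_CPSR_I_BIT 0x80u   /* IRQ-disable bit of the ARM CPSR */
static uint32_t sim_cpsr;

uint32_t arm_cpsr_read(void)        { return sim_cpsr; }
void     arm_cpsr_write(uint32_t v) { sim_cpsr = v; }

/* ---- Upper layer: names and types an OS could use unchanged on any port. ---- */
typedef uint32_t hal_irq_state_t;

int hal_irq_enabled(void)
{
    return (arm_cpsr_read() & ARM_CPSR_I_BIT) == 0u;
}

/* Disable interrupts and return the previous state for a later restore. */
hal_irq_state_t hal_irq_disable(void)
{
    hal_irq_state_t old = arm_cpsr_read();
    arm_cpsr_write(old | ARM_CPSR_I_BIT);
    assert(!hal_irq_enabled());
    return old;
}

/* Put the interrupt-disable bit back as it was, leaving other bits alone. */
void hal_irq_restore(hal_irq_state_t old)
{
    uint32_t cur = arm_cpsr_read();
    arm_cpsr_write((cur & ~ARM_CPSR_I_BIT) | (old & ARM_CPSR_I_BIT));
}
```

A portable OS would call only hal_irq_disable and hal_irq_restore. Code needing finer control could call the arm_cpsr_* routines directly, at the cost of being tied to the ARM.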


Core Modules

Some of the core modules guaranteed to exist currently are:

The idea is to encapsulate all variations between the different ARM processors within the first three core modules, so that the HAL could be ported to an ARM9 processor purely by rewriting those modules. The core, MMU and cache modules are kept separate because, for example, an ARM6 MMU is identical to an ARM7 MMU although the cache and core control are different. Hence a minimum of reconfiguration would be necessary to port between different ARMs.

The DebugIO module provides serial I/O based debug support. This module can be "attached" to any UART device driver module, and hence debug I/O may be performed through a multitude of media, although for this project it is likely to be just the 16550 UART and the Angel Debug Monitor UART (see below).
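One way the "attach" mechanism could work is sketched below in C: the debug module holds a pointer to whichever UART driver it has been given and sends every character through it. The hal_uart_t interface, the debugio_* names and the fake UART (which simply captures output in a buffer in place of real hardware) are all assumptions for the purpose of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal interface a UART driver module is assumed to export. */
typedef struct {
    void (*putc)(char c);
} hal_uart_t;

static const hal_uart_t *debug_uart;   /* UART currently attached, if any */

/* Direct all subsequent debug output to the given UART driver. */
void debugio_attach(const hal_uart_t *uart)
{
    assert(uart != NULL && uart->putc != NULL);
    debug_uart = uart;
}

/* Write a string to the attached UART; returns the characters sent. */
size_t debugio_puts(const char *s)
{
    size_t n = 0;
    if (debug_uart == NULL)
        return 0;
    for (; s[n] != '\0'; n++)
        debug_uart->putc(s[n]);
    return n;
}

/* A fake UART which captures output in a buffer, standing in for hardware. */
static char   fake_buf[32];
static size_t fake_len;

static void fake_putc(char c)
{
    if (fake_len < sizeof fake_buf - 1)   /* keep the final byte as NUL */
        fake_buf[fake_len++] = c;
}

const hal_uart_t fake_uart = { fake_putc };

const char *fake_uart_output(void) { return fake_buf; }
```

Because DebugIO sees only the hal_uart_t interface, swapping the fake UART for a 16550 driver or an Angel-based one would need no change to the debug code itself.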

The upper HAL API layer will encapsulate the PCI bus controller functionality within one module, whilst the specific 21285 control code will be contained in a separate module within the lower API layer. Hence a different PCI bus controller could be used merely by implementing a new module while the generic PCI bus controller module remains unchanged. The upper HAL API layer will also encapsulate access to other processors’ memory within another module. Hence a single memory architecture could be handled simply by replacing this module with a very simple one which merely implemented memory protection if desired.
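The idea of swapping the remote-memory module can be sketched in C as two implementations of one interface. In this toy model each processor's on-card memory is a small array; the distributed version addresses the array belonging to the named processor, while the single-memory version ignores the processor number and uses one shared bank. The interface, names and sizes are all hypothetical, and a real implementation would go through the PCI bus rather than a local array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCPUS      2
#define LOCAL_SIZE 64

/* Simulated per-processor memory (stands in for on-card RAM). */
static uint8_t local_mem[NCPUS][LOCAL_SIZE];

/* Interface through which the OS touches another processor's memory. */
typedef struct {
    void (*read)(unsigned cpu, uint32_t off, void *dst, uint32_t len);
    void (*write)(unsigned cpu, uint32_t off, const void *src, uint32_t len);
} hal_remote_mem_t;

/* Distributed-memory implementation: each processor has its own bank. */
static void numa_read(unsigned cpu, uint32_t off, void *dst, uint32_t len)
{
    assert(cpu < NCPUS && len <= LOCAL_SIZE && off <= LOCAL_SIZE - len);
    memcpy(dst, &local_mem[cpu][off], len);
}

static void numa_write(unsigned cpu, uint32_t off, const void *src, uint32_t len)
{
    assert(cpu < NCPUS && len <= LOCAL_SIZE && off <= LOCAL_SIZE - len);
    memcpy(&local_mem[cpu][off], src, len);
}

const hal_remote_mem_t hal_remote_numa = { numa_read, numa_write };

/* Single-memory implementation: the processor number is ignored and
 * every access goes to bank 0, which all processors share. */
static void uma_read(unsigned cpu, uint32_t off, void *dst, uint32_t len)
{
    (void)cpu;
    assert(len <= LOCAL_SIZE && off <= LOCAL_SIZE - len);
    memcpy(dst, &local_mem[0][off], len);
}

static void uma_write(unsigned cpu, uint32_t off, const void *src, uint32_t len)
{
    (void)cpu;
    assert(len <= LOCAL_SIZE && off <= LOCAL_SIZE - len);
    memcpy(&local_mem[0][off], src, len);
}

const hal_remote_mem_t hal_remote_uma = { uma_read, uma_write };
```

Code written against hal_remote_mem_t would run unchanged on either kind of hardware; only the table it is handed differs.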

Relationship of HAL modules


Extension Modules

Some of the extension modules guaranteed to exist currently are:

The 16550-compatible UART module is primarily for debug access as having a UART per processor means simultaneous debug of multiple processors is made possible. The 21285 contains a 16550-compatible UART so no additional hardware will be required.

Compatibility with a serial debugger is made possible through the ARM SDT Angel debug monitor UART module. The ARM SDT’s main debugger is called Angel and it can operate either through an ICE interface or through an installable Angel debug monitor which "hosts" the debuggee code. The Angel UART module makes the appropriate "magic" calls to allow emulation of a serial UART without disrupting the debug protocol running over the UART.


Expected design issues

The majority of the HAL design is expected to be reasonably straightforward. However, there are two substantial areas which are expected to present problems during the construction of the project:

  1. The PCI interface code
  2. Cache coherency

The PCI interface code

I have no prior experience of working with the PCI bus. The full PCI 2.1 specification describes an extremely capable bus which provides practically every facility one could demand of a generic bus interface. Options exist for triggering multiple types of interrupt on remote PCI devices, for performing 64-bit DMA transfers across the bus, and even for making one device act as bus arbiter and so effectively become the central control processor of the system (even if only temporarily). The 21285 core logic controller allows most PCI 2.1 operations to be carried out, and if an effective multiprocessing solution is to be implemented then every facility provided by the 21285 controller will need to be used (e.g. background DMA operations). In addition, register-level access to the 21285 is no trivial matter, so it is expected that considerable time will be spent getting the PCI code working.

Cache coherency

The Intel StrongARM processor is a Harvard architecture processor, so it has separate instruction and data caches; moreover, the data cache operates in write-back mode. Worse still, the proposed hardware design contains no support for hardware-based cache coherency (indeed, neither does the StrongARM itself), so coherency will have to be implemented entirely in software.

The simplest method of doing this is to disable the data caches on all processors – however given the severe performance penalties associated with this, it is hoped a better solution can be arrived at. Part of this project will be to devise an optimum multiprocessing solution.
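To show why a write-back cache causes trouble and what a software remedy has to do, here is a toy model in C. It is emphatically not the solution this project will arrive at, merely one well-known discipline: the processor producing data "cleans" (writes back) its cache line before sharing, and the processor consuming data "invalidates" its own copy before reading. The model has a single shared word and a one-line cache per processor; all names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPUS 2

static uint32_t shared_word;   /* the one memory location both CPUs use */

/* A one-line write-back data cache per processor (toy model). */
static struct { bool valid, dirty; uint32_t value; } cache[NCPUS];

uint32_t cpu_read(unsigned cpu)
{
    assert(cpu < NCPUS);
    if (!cache[cpu].valid) {            /* miss: fill from memory */
        cache[cpu].value = shared_word;
        cache[cpu].valid = true;
        cache[cpu].dirty = false;
    }
    return cache[cpu].value;
}

void cpu_write(unsigned cpu, uint32_t v)
{
    assert(cpu < NCPUS);
    cache[cpu].value = v;               /* write-back: memory not updated yet */
    cache[cpu].valid = true;
    cache[cpu].dirty = true;
}

/* Write a dirty line back to memory (done by the producer before sharing). */
void cache_clean(unsigned cpu)
{
    assert(cpu < NCPUS);
    if (cache[cpu].valid && cache[cpu].dirty) {
        shared_word = cache[cpu].value;
        cache[cpu].dirty = false;
    }
}

/* Discard the line so that the next read refetches (done by the consumer). */
void cache_invalidate(unsigned cpu)
{
    assert(cpu < NCPUS);
    cache[cpu].valid = false;
}
```

In this model, a value written by processor 0 stays invisible to processor 1 until processor 0 has cleaned its line and processor 1 has invalidated its own; omitting either step leaves processor 1 reading stale data. The cost of performing these operations at the right moments, and only then, is the heart of the problem.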


Chapter 4: The testing of the HAL


Part of the project will be to port a small Real-Time Operating System (RTOS) which supports:

To test the port, I shall also write an extremely simple test program which does the following:

To extend this, I shall conduct some form of benchmarking to see how my cache coherency system scales with the number of processors.

One of the most important properties of any HAL is that the operating system built upon it be portable to most other architectures simply through a rewrite of the HAL. Hence the RTOS will exclusively use the upper layer provided by the HAL.


The choice of the small real-time operating system

Much thought was given towards the implementation of the test operating system. It was decided that given that the project’s focus is upon the HAL rather than the OS, a public-domain RTOS should be chosen and ported to the HAL. This would also be helpful in evaluating how well the HAL fits its design criteria in a real-world test.

Probably the most popular public-domain RTOS is uC/OS, originally a real-time kernel written by Jean J. Labrosse for the x86 processor and published in the "Embedded Systems Programming" magazine in 1992. It has since been ported to various other processors, including the ARM6 and the StrongARM under Digital uHAL.

In 1998 Labrosse released his sequel book "MicroC/OS-II The Real-Time Kernel", which featured uC/OS II, an RTOS based on the original but more portable and with more features. A further advantage of uC/OS II is that it remains free for use by educational establishments, so no licensing problems would arise. After extensive evaluation of uC/OS II, it was decided to use this as the HAL test harness RTOS.

Some reconstruction of uC/OS II will be necessary as its task switcher merely switches processor contexts: it does not remap memory or change any other part of the environment (hence this form of multitasking is far more like multithreading under Win32). I will extend uC/OS II, using the appropriate upper layer HAL APIs, to provide elementary process control code and hence a test of the HAL’s multiprocessor support.


Chapter 5: Conclusions

At this early stage (30th October 1999), the future looks bright. The hardware is currently in final testing by a third party and is expected to be available at the latest by the start of the second semester. Much of the uniprocessor software development can be carried out during the first semester on cheap alternative hardware such as the Sharp LH77790 or the Cirrus Logic CL-PS7111 integrated microcontrollers. There should be no reason hardware-wise why this project cannot be completed.

For the first time within the industry, a definitive standard in Hardware Abstraction Layers will be set. There should be no conceivable conventional operating system which could not be ported to this HAL, and it is hoped that any standards the HAL sets will be reflected in operating systems to come. It remains to be seen if this or any other standard catches on, but if it does then another small step towards standardising the embedded systems market will have been made.

Niall Douglas

30th October 1999



Chapter 6: Bibliography


MicroC/OS-II The Real-Time Kernel (pub. October 1998, ISBN: 0-87930-543-6), by Jean J. Labrosse. Web ref:

uHAL for the DEC StrongARM, by Dave Rusling at the Digital Equipment Corporation’s StrongARM division. Web ref:

ARM reference materials, by ARM Ltd. Web ref:

eCOS public-domain embedded RTOS, by Cygnus. Web ref:

The ITRON project, by the TRON association. Web ref:

The LH77790 microcontroller documentation, by Sharp Ltd. Web ref:

The CL-PS7111 microcontroller documentation, by Cirrus Logic. Web ref: