Sunday, December 15, 2013

System Cache



The system cache is responsible for a great deal of the system performance improvement of today's PCs. The cache is a buffer of sorts between the very fast processor and the relatively slow memory that serves it. (The memory is not really slow, it's just that the processor is much faster.) The presence of the cache allows the processor to do its work while waiting for memory far less often than it otherwise would.
There are in fact several different "layers" of cache in a modern PC, each acting as a buffer for recently-used information to improve performance, but when "the cache" is mentioned without qualifiers, it normally refers to the "secondary" or "level 2" cache that is placed between the processor and system RAM. The various levels of cache are covered here as part of the discussion of the theory and operation of cache (since many of the principles are the same). However, most of the focus of this section is on the level 2 system cache.

Role of Cache in the PC
In early PCs, the various components had one thing in common: they were all really slow :^). The processor was running at 8 MHz or less, and taking many clock cycles to get anything done. It wasn't very often that the processor would be held up waiting for the system memory, because even though the memory was slow, the processor wasn't a speed demon either. In fact, on some machines the memory was faster than the processor.
In the 15 or so years since the invention of the PC, every component has increased in speed a great deal. However, some have increased far faster than others. Memory, and memory subsystems, are now much faster than they were, by a factor of 10 or more. However a current top of the line processor has performance over 1,000 times that of the original IBM PC!
This disparity in speed growth has left us with processors that run much faster than everything else in the computer. This means that one of the key goals in modern system design is to ensure that to whatever extent possible, the processor is not slowed down by the storage devices it works with. Slowdowns mean wasted processor cycles, where the CPU can't do anything because it is sitting and waiting for information it needs. We want it so that when the processor needs something from memory, it gets it as soon as possible.
The best way to keep the processor from having to wait is to make everything that it uses as fast as it is. Wouldn't it be best just to have memory, system buses, hard disks and CD-ROM drives that just went as fast as the processor? Of course it would, but there's this little problem called "technology" that gets in the way. :^)
Actually, it's technology and cost; a modern 2 GB hard disk costs less than $200 and has a latency (access time) of about 10 milliseconds. You could implement a 2 GB hard disk in such a way that it would access information many times faster; but it would cost thousands, if not tens of thousands of dollars. Similarly, the highest speed SRAM <../../ram/types_SRAM.htm> available is much closer to the speed of the processor than the DRAM <../../ram/types_DRAM.htm> we use for system memory, but it is cost prohibitive in most cases to put 32 or 64 MB of it in a PC.
There is a good compromise to this however. Instead of trying to make the whole 64 MB out of this faster, expensive memory, you make a smaller piece, say 256 KB. Then you find a smart algorithm (process) that allows you to use this 256 KB in such a way that you get almost as much benefit from it as you would if the whole 64 MB was made from the faster memory. How do you do this? The short answer is by using this small cache of 256 KB to hold the information most recently used by the processor. Computer science shows that in general, a processor is much more likely to need again information it has recently used, compared to a random piece of information in memory. This is the principle behind caching.


"Layers" of Cache
There are in fact many layers of cache in a modern PC. This does not even include looking at caches included on some peripherals, such as hard disks. Each layer is closer to the processor and faster than the layer below it. Each layer also caches the layers below it, due to its increased speed relative to the lower levels:
Level                   Devices Cached
Level 1 Cache           Level 2 Cache, System RAM, Hard Disk / CD-ROM
Level 2 Cache           System RAM, Hard Disk / CD-ROM
System RAM              Hard Disk / CD-ROM
Hard Disk / CD-ROM      --
What happens in general terms is this. The processor requests a piece of information. The first place it looks is in the level 1 cache, since it is the fastest. If it finds it there (called a hit on the cache), great; it uses it with no performance delay. If not, it's a miss and the level 2 cache is searched. If it finds it there (level 2 "hit"), it is able to carry on with relatively little delay. Otherwise, it must issue a request to read it from the system RAM. The system RAM may in turn either have the information available or have to get it from the still slower hard disk or CD-ROM. The mechanics of how the processor (really the chipset controlling the cache and memory) "looks" for the information in these various places are discussed here <func.htm>.
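To make the search order concrete, here is a minimal sketch in Python. It is only an illustration: the function name find_level, the sample addresses, and the use of plain sets to stand in for each layer are my own assumptions, and real hardware does not search its layers this way in software.

```python
def find_level(address, levels):
    """Search each layer in order, fastest first; return the name of the first hit."""
    for name, contents in levels:
        if address in contents:
            return name
    raise LookupError(f"address {address:#x} not found at any level")


# Hypothetical contents: each slower layer holds everything the faster ones do.
levels = [
    ("level 1 cache", {0x10}),
    ("level 2 cache", {0x10, 0x20}),
    ("system RAM", {0x10, 0x20, 0x30}),
    ("hard disk", {0x10, 0x20, 0x30, 0x40}),
]

for address in (0x10, 0x20, 0x30, 0x40):
    print(f"{address:#x} found in {find_level(address, levels)}")
```

The further down the list a request has to travel before it is satisfied, the longer the processor waits.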
It is important to realize just how slow some of these devices are compared to the processor. Even the fastest hard disks have an access time measuring around 10 milliseconds. If it has to wait 10 milliseconds, a 200 MHz processor will waste 2 million clock cycles! And CD-ROMs are generally at least 10 times slower. This is why using caches to avoid accesses to these slow devices is so crucial.
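The arithmetic behind that figure is easy to check. The sketch below is just a calculator; the function name wasted_cycles is made up for illustration, and the 100 millisecond CD-ROM figure is an assumption taken from the "at least 10 times slower" estimate above.

```python
def wasted_cycles(clock_hz, wait_microseconds):
    """Clock cycles that tick by while the processor waits for a device."""
    return clock_hz * wait_microseconds // 1_000_000


CLOCK_HZ = 200_000_000  # the 200 MHz processor from the text

hard_disk = wasted_cycles(CLOCK_HZ, 10_000)   # 10 ms access time
cd_rom = wasted_cycles(CLOCK_HZ, 100_000)     # assumed 100 ms (10 times slower)

print(f"hard disk: {hard_disk:,} cycles")
print(f"CD-ROM:    {cd_rom:,} cycles")
```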
Caching actually goes even beyond the level of the hardware. For example, your web browser uses caching itself, in fact, two levels of caching! Since loading a web page over the Internet is very slow for most people, the browser will hold recently-accessed pages to save it having to re-access them. It checks first in its memory cache and then in its disk cache to see if it already has a copy of the page you want. Only if it does not find the page will it actually go to the Internet to retrieve it.

Level 1 (Primary) Cache
Level 1 or primary cache is the fastest memory on the PC. It is, in fact, built directly into the processor itself. This cache is very small, generally from 8 KB to 64 KB, but it is extremely fast; it runs at the same speed as the processor. If the processor requests information and can find it in the level 1 cache, that is the best case, because the information is there immediately and the system does not have to wait. The level 1 cache is discussed in more detail here <../../cpu/arch/int/comp_Cache.htm>, in the section on processors.
Note: Level 1 cache is also sometimes called "internal" cache since it resides within the processor.

Level 2 (Secondary) Cache
The level 2 cache is a secondary cache to the level 1 cache, and is larger and slightly slower. It is used to catch recent accesses that are not caught by the level 1 cache, and is usually 64 KB to 2 MB in size. Level 2 cache is usually found either on the motherboard or a daughterboard that inserts into the motherboard. Pentium Pro processors actually have the level 2 cache in the same package as the processor itself (though it isn't in the same circuit where the processor and level 1 cache are) which means it runs much faster than level 2 cache that is separate and resides on the motherboard. Pentium II processors are in the middle; their cache runs at half the speed of the CPU.
Note: Level 2 cache is also sometimes called "external" cache since it resides outside the processor. (Even on Pentium Pros... it is on a separate chip in the same package as the processor.)

Disk Cache
A disk cache is a portion of system memory used to cache reads and writes to the hard disk. In some ways this is the most important type of cache on the PC, because the greatest differential in speed between the layers mentioned here is between the system RAM and the hard disk. While the system RAM is slightly slower than the level 1 or level 2 cache, the hard disk is much slower than the system RAM.
Unlike the level 1 and level 2 cache memory, which are entirely devoted to caching, system RAM is used partially for caching but of course for other purposes as well. Disk caches are usually implemented using software (like DOS's SmartDrive). They are discussed in more detail in the section on hard disk performance <../../hdd/perf/ext_Caching.htm>.

Peripheral Cache
Much like the hard disk, other devices can be cached using the system RAM as well. CD-ROMs are the most common device cached other than hard disks, particularly due to their very slow initial access time <../../cd/perf_Access.htm>, measured in the tens to hundreds of milliseconds (which is an eternity to a computer). In fact, in some cases CD-ROM drives are cached to the hard disk, since the hard disk, despite its slow speed, is still much faster than a CD-ROM drive is.
Function and Operation of the System Cache
This section discusses the principles behind the design of cache memory, and explains how the secondary (level 2) cache works in detail. This will give you a much better understanding of how the cache works and what the issues are in its design--at least I hope it will, because that was my primary goal in writing this. I was frustrated as I put the site together with my inability to find anything on the 'net that really explained how the cache worked.
This section is focused on the secondary cache, but in fact, the function of the primary (level 1) cache built into modern processors is in many ways identical: in terms of how associativity works, how the cache is organized, how the system checks for hits, etc. However, many of the implementation details are different.
Note: This is an advanced section with some potentially confusing concepts. I make use of examples in order to hopefully make sure the explanations make sense. You will find this section most helpful if you read all the subsections it contains in order. You may also find reading the section explaining system memory operation and timing <../../ram/timing.htm> instructive. This page also makes extensive reference to memory addresses and locations, and binary numbers. If you are not familiar with binary mathematics, you may want to read this introductory page on the subject <../../../intro/works/comput.htm>.

Why Caching Works
Cache is in some ways a really amazing technology. A 512 KB level 2 cache, caching 64 MB of system memory, can supply the information that the processor requests 90-95% of the time. Think about the ratios here: the level 2 cache is less than 1% of the size of the memory it is caching, but it is able to register a "hit" on over 90% of requests. That's pretty efficient, and is the reason why caching is so important.
The reason that this happens is due to a computer science principle called locality of reference. It states basically that even within very large programs with several megabytes of instructions, only small portions of this code generally get used at once. Programs tend to spend large periods of time working in one small area of the code, often performing the same work many times over and over with slightly different data, and then move to another area. This occurs because of "loops", which are what programs use to do work many times in rapid succession.
Just as one example (there are many), let's suppose you start up your word processor and open your favorite document. The word processor program at some point must read the file and then print on the screen the text it finds. This is done (in very simplified terms) using code similar to this:
·         Open document file.
·         Open screen window.
·         For each character in the document:
                    ·         Read the character.
                    ·         Store the character into working memory.
                    ·         Write the character to the window if the character is part of the first page.
·         Close the document file.
The loop is of course the three instructions that are done "for each character in the document". These instructions will be repeated many thousands of times, and there are hundreds or thousands of loops like these in the software you use. Every time you hit "page down" on your keyboard, the word processor must clear the screen, figure out which characters to display next, and then run a similar loop to copy them from memory to the screen. Several loops are used when you tell it to save the file to the hard disk.
This example shows how caching improves performance when dealing with program code, but what about your data? Not surprisingly, access to data (your work files, etc.) is similarly repetitive. When you are using your word processor, how many times do you scroll up and down looking at the same text over and over, as you edit it? The system cache holds much of this information so that it can be loaded more quickly the second, third, and next times that it is needed.

How Caching Works
In the example in the previous section a loop was used to read characters from a file, store them in working memory, and then write them to the screen. The first time each of these instructions (read, store, write) is executed, it must be loaded from relatively slow system memory (assuming it is in memory, otherwise it must be read from the hard disk which is much, much slower even than the memory).
The cache is programmed (in hardware) to hold recently-accessed memory locations in case they are needed again. So each of these instructions will be saved in the cache after being loaded from memory the first time. The next time the processor wants to use the same instruction, it will check the cache first, see that the instruction it needs is there, and load it from cache instead of going to the slower system RAM. The number of instructions that can be buffered this way is a function of the size and design of the cache.
Let's suppose that our loop is going to process 1,000 characters and the cache is able to hold all three instructions in the loop (which sounds obvious, but isn't always, due to cache mapping techniques). This means that 999 of the 1,000 times these instructions are executed, they will be loaded from the cache, or 99.9% of the time. This is why caching is able to satisfy such a large percentage of requests for memory even though it has a capacity that is often less than 1% the size of the system RAM.
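You can count the hits and misses for this loop with a few lines of Python. This is a sketch under simplifying assumptions: the cache is idealized (big enough for the whole loop, with no mapping conflicts), each instruction is treated as one cacheable item, and the three instruction addresses are invented for the example.

```python
def run_loop(instruction_addresses, iterations):
    """Execute the loop body repeatedly, counting cache hits and misses."""
    cache = set()  # idealized cache: holds every address it has seen
    hits = misses = 0
    for _ in range(iterations):
        for address in instruction_addresses:
            if address in cache:
                hits += 1
            else:
                misses += 1          # first use: load from system memory
                cache.add(address)   # and keep a copy for next time
    return hits, misses


# Hypothetical addresses of the read, store and write instructions.
hits, misses = run_loop([0x100, 0x104, 0x108], iterations=1000)
print(f"hits: {hits}, misses: {misses}, hit ratio: {hits / (hits + misses):.1%}")
```

Each of the three instructions misses once, on the first pass through the loop, and hits on the other 999 passes.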

Parts of the Level 2 Cache
The level 2 cache consists of two main components. These are not usually physically located in the same chips, but represent logically how the cache works. These parts of the cache are:
·         The Data Store: This is where the cached information is actually kept. When reference is made to "storing something in the cache" or "retrieving something from the cache", this is where the actual data goes to or comes from. When someone says that the cache is 256 KB or 512 KB, they are referring to the size of the data store. The larger the store, the more information that can be cached and the more likelihood of the cache being able to satisfy a request, all else being equal.
·         The Tag RAM: This is a small area of memory used by the cache to keep track of where in memory the entries in the data store belong. The size of the tag RAM--and not the size of the data store--controls how much of main memory can be cached.
In addition to these memory areas there is, of course, the cache controller circuitry. Most of the work of controlling the level 2 cache on a modern PC is performed by the system chipset <../chip/index.htm>.

Structure of the Data Store
Many people think of the cache as being organized as a large sequence of bytes (8 bits each). In fact, on a modern fifth-generation or later PC, the level 2 cache is organized as a set of long cache lines, each containing 32 bytes (256 bits). This means that each time the cache is written to or read from, a transfer of 32 bytes takes place; there is no way to read or write just 1 byte. This is done mainly for performance reasons. At the very least, you can't have less than 64 bits per line of cache, because the data bus on a Pentium or later PC is 64 bits wide <../../cpu/arch/ext_DataSize.htm>. The data store is 256 bits wide because memory is accessed in four-read bursts, and 4 times 64 is 256.
Let's take the case of a 512 KB cache (data store). If we wanted to mentally envision how this memory is structured, instead of seeing a single long column with 524,288 (512 K) individual rows, we should instead see 32 columns and 16,384 (16 K) rows. Each access to the data store is a line (row), and the cache has 16,384 different addresses.
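The row and column counts fall straight out of a division. A small sketch, with the function name data_store_shape chosen just for this example:

```python
def data_store_shape(cache_bytes, line_bytes=32):
    """Return (rows, columns): the number of cache lines and the bytes per line."""
    rows = cache_bytes // line_bytes
    return rows, line_bytes


rows, columns = data_store_shape(512 * 1024)
print(f"{rows:,} rows of {columns} bytes ({columns * 8} bits) each")
```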

Cache Mapping and Associativity
A very important factor in determining the effectiveness of the level 2 cache relates to how the cache is mapped to the system memory. What this means in brief is that there are many different ways to allocate the storage in our cache to the memory addresses it serves. Let's take as an example a system with 512 KB of L2 cache and 64 MB of main memory. The burning question is: how do we decide how to divvy up the 16,384 cache lines in our cache amongst the "huge" 64 MB of memory?
There are three different ways that this mapping can generally be done. The choice of mapping technique is so critical to the design that the cache is often named after this choice:
·         Direct Mapped Cache: The simplest way to allocate the cache to the system memory is to determine how many cache lines there are (16,384 in our example) and just chop the system memory into the same number of chunks. Then each chunk gets the use of one cache line. This is called direct mapping. So if we have 64 MB of main memory addresses, each cache line would be shared by 4,096 memory addresses (64 M divided by 16 K).
·         Fully Associative Cache: Instead of hard-allocating cache lines to particular memory locations, it is possible to design the cache so that any line can store the contents of any memory location. This is called fully associative mapping.
·         N-Way Set Associative Cache: "N" here is a number, typically 2, 4, 8 etc. This is a compromise between the direct mapped and fully associative designs. In this case the cache is broken into sets where each set contains "N" cache lines, let's say 4. Then, each memory address is assigned a set, and can be cached in any one of those 4 locations within the set that it is assigned to. In other words, within each set the cache is associative, and thus the name.
This design means that there are "N" possible places that a given memory location may be in the cache. The tradeoff is that there are "N" times as many memory locations competing for the same "N" lines in the set. Let's suppose in our example that we are using a 4-way set associative cache. So instead of a single block of 16,384 lines, we have 4,096 sets with 4 lines in each. Each of these sets is shared by 16,384 memory addresses (64 M divided by 4 K) instead of 4,096 addresses as in the case of the direct mapped cache. So there is more to share (4 lines instead of 1) but more addresses sharing it (16,384 instead of 4,096).
Conceptually, the direct mapped and fully associative caches are just "special cases" of the N-way set associative cache. You can set "N" to 1 to make a "1-way" set associative cache. If you do this, then there is only one line per set, which is the same as a direct mapped cache because each memory address is back to pointing to only one possible cache location. On the other hand, suppose you make "N" really large; say, you set "N" to be equal to the number of lines in the cache (16,384 in our example). If you do this, then you only have one set, containing all of the cache lines, and every memory location points to that huge set. This means that any memory address can be in any line, and you are back to a fully associative cache.
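The numbers for all three cases come from the same two divisions, which is another way of seeing that they are one design with different values of "N". Here is a sketch; the function name mapping is my own, and it assumes sizes that divide evenly.

```python
def mapping(memory_bytes, cache_bytes, line_bytes, ways):
    """Return (number of sets, memory addresses sharing each set)."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    addresses_per_set = memory_bytes // sets
    return sets, addresses_per_set


MEMORY = 64 * 1024 * 1024   # 64 MB of main memory
CACHE = 512 * 1024          # 512 KB cache
LINE = 32                   # 32-byte cache lines

for ways in (1, 4, 16384):  # direct mapped, 4-way, fully associative
    sets, shared = mapping(MEMORY, CACHE, LINE, ways)
    print(f"N={ways:>6}: {sets:>6,} sets, {shared:>10,} addresses per set")
```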

Comparison of Cache Mapping Techniques
There is a critical tradeoff in cache performance that has led to the creation of the various cache mapping techniques described in the previous section. In order for the cache to have good performance you want to maximize both of the following:
·         Hit Ratio: You want to increase as much as possible the likelihood of the cache containing the memory addresses that the processor wants. Otherwise, you lose much of the benefit of caching because there will be too many misses.
·         Search Speed: You want to be able to determine as quickly as possible if you have scored a hit in the cache. Otherwise, you lose a small amount of time on every access, hit or miss, while you search the cache.
Now let's look at the three cache types and see how they fare:
·         Direct Mapped Cache: The direct mapped cache is the simplest form of cache and the easiest to check for a hit. Since there is only one possible place that any memory location can be cached, there is nothing to search; the line either contains the memory information we are looking for, or it doesn't.
Unfortunately, the direct mapped cache also has the worst performance, because again there is only one place that any address can be stored. Let's look again at our 512 KB level 2 cache and 64 MB of system memory. As you recall this cache has 16,384 lines (assuming 32-byte cache lines) and so each one is shared by 4,096 memory addresses. In the absolute worst case, imagine that the processor needs 2 different addresses (call them X and Y) that both map to the same cache line, in alternating sequence (X, Y, X, Y). This could happen in a small loop if you were unlucky. The processor will load X from memory and store it in cache. Then it will look in the cache for Y, but Y uses the same cache line as X, so it won't be there. So Y is loaded from memory, and stored in the cache for future use. But then the processor requests X, and looks in the cache only to find Y. This conflict repeats over and over. The net result is that the hit ratio here is 0%. This is a worst case scenario, but in general the performance is worst for this type of mapping.
·         Fully Associative Cache: The fully associative cache has the best hit ratio because any line in the cache can hold any address that needs to be cached. This means the problem seen in the direct mapped cache disappears, because there is no dedicated single line that an address must use.
However (you knew it was coming), this cache suffers from problems involving searching the cache. If a given address can be stored in any of 16,384 lines, how do you know where it is? Even with specialized hardware to do the searching, a performance penalty is incurred. And this penalty occurs for all accesses to memory, whether a cache hit occurs or not, because it is part of searching the cache to determine a hit. In addition, more logic must be added to determine which of the various lines to use when a new entry must be added (usually some form of a "least recently used" algorithm is employed to decide which cache line to use next). All this overhead adds cost, complexity and execution time.
·         N-Way Set Associative Cache: The set associative cache is a good compromise between the direct mapped and fully associative caches. Let's consider the 4-way set associative cache. Here, each address can be cached in any of 4 places. This means that in the example described in the direct mapped cache description above, where we accessed alternately two addresses that map to the same cache line, they would now map to the same cache set instead. This set has 4 lines in it, so one could hold X and another could hold Y. This raises the hit ratio from 0% to near 100%! Again an extreme example, of course. As for searching, since the set only has 4 lines to examine this is not very complicated to deal with, although the controller does have to do this small search, and it also requires additional circuitry to decide which cache line to use when saving a fresh read from memory. Again, some form of LRU (least recently used) algorithm is typically used.
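The X, Y, X, Y example can be simulated directly. The sketch below models an N-way set associative cache with strict LRU replacement; that is one reasonable reading of "some form of LRU", not a description of any particular chipset. The class and function names are invented, and Y is placed exactly 512 KB above X so that the two addresses collide in the direct mapped case.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy N-way set associative cache; ways=1 gives a direct mapped cache."""

    def __init__(self, num_lines, ways, line_bytes=32):
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = num_lines // ways
        # One ordered dict per set: keys are tags, oldest entry first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, address):
        """Return True on a hit; on a miss, load the line and return False."""
        line_number = address // self.line_bytes
        index = line_number % self.num_sets
        tag = line_number // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)       # now the most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)    # evict the least recently used
        lines[tag] = True
        return False


def hit_ratio(cache, addresses):
    hits = sum(cache.access(address) for address in addresses)
    return hits / len(addresses)


X = 0
Y = X + 512 * 1024           # same cache line as X when direct mapped
pattern = [X, Y] * 500       # X, Y, X, Y, ... 1,000 accesses in all

direct = hit_ratio(SetAssociativeCache(16384, ways=1), pattern)
four_way = hit_ratio(SetAssociativeCache(16384, ways=4), pattern)
print(f"direct mapped: {direct:.1%}   4-way set associative: {four_way:.1%}")
```

In the direct mapped case every access evicts the line the next access needs. In the 4-way case only the first access to X and the first access to Y miss.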
Here's a summary table of the different cache mapping techniques and their relative performance:
Cache Type                     Hit Ratio                           Search Speed
Direct Mapped                  Good                                Best
Fully Associative              Best                                Moderate
N-Way Set Associative, N>1     Very Good, Better as N Increases    Good, Worse as N Increases
In the "real world", the direct mapped and set associative caches are by far the most common. Direct mapping is used more for level 2 caches on motherboards, while the higher-performance set-associative cache is found more commonly on the smaller primary caches contained within processors.

Tag Storage
Since each cache line (or set) in the data store is shared by a large number of memory addresses that map to it, we need to keep track of which one is using each cache line at a given time. This is what the tag RAM is used for.
Let's take a look at the same example again: a system with 64 MB of main memory, a 512 KB cache, and 32-byte cache lines. There are 16,384 cache lines, and therefore 4,096 different memory locations that share each line. However, recall that each line contains 32 bytes; that means 32 different bytes can be placed in each line without interfering with each other. So really, there are 128 (4,096 divided by 32) different 32-byte lines of memory that must share a cache spot.
Okay, now to address 64 MB of memory you need 26 address lines (because 2^26 is 64 M), which are numbered from A0 to A25. 512 KB only requires 19 lines, A0 to A18. The difference between these is 7 lines; not surprisingly, since 128 is 2^7. These 7 address lines tell you which of the 128 different memory lines that can use a given cache line is actually using it at the moment. That's what the tag RAM is for. There will be as many entries in the tag RAM as there are in the data store, so we will have 16,384 tag RAM entries, although of course these entries are only a few bits wide, not 32 bytes wide like the data store.
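Here is the same bit counting as a short calculation. It is a sketch that assumes a direct mapped cache and power-of-two sizes; the function name tag_ram_size is made up, and the total it prints leaves out any extra bits stored alongside each tag.

```python
def tag_ram_size(memory_bytes, cache_bytes, line_bytes=32):
    """Return (tag bits per entry, number of tag RAM entries)."""
    address_bits = (memory_bytes - 1).bit_length()   # 26 for 64 MB
    cache_bits = (cache_bytes - 1).bit_length()      # 19 for 512 KB
    entries = cache_bytes // line_bytes              # one tag per cache line
    return address_bits - cache_bits, entries


bits, entries = tag_ram_size(64 * 1024 * 1024, 512 * 1024)
print(f"{bits} tag bits distinguish {2 ** bits} memory lines per cache line")
print(f"{entries:,} entries x {bits} bits = {entries * bits // 8:,} bytes of tag RAM")
```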
Notice that the tag RAM is used early in the process of determining whether or not we have a cache hit. This means that no matter how fast the cache data store is, the tag RAM must be slightly faster.

How the Memory Address Is Used
The memory address provided by the processor represents which byte of information the processor is looking for at a given time. This is looked at in three sections by the cache controller as it does its work of checking for hits. This example is the same as before (64 MB memory, 512 KB cache, direct mapping to keep things simple) so we again have 26 address bits, A0 through A25:
·         A0 to A4: The lowest-order 5 bits represent the 32 different bytes within a cache line (2^5 = 32). Recall that the cache we are looking at has 32 byte lines, all of which are moved around together. Therefore, the address bits A0 to A4 are ignored by the cache controller; the processor will use them later to determine which to use of the 32 bytes it receives from the cache.
·         A5 to A18: These 14 bits represent the cache line that this address maps to. 2^14 is 16,384, which is the total number of cache lines in our example, as you recall. This cache line address is used for looking up both the tag address in the tag RAM, and later the actual data in the data store if there is a hit.
·         A19 to A25: These 7 bits represent the tag address, which tells the system which of the possible memory locations that share the cache line (indicated by address lines A5 to A18) is currently using it.
If the numbers in the example change, so do these ranges. If instead we have 32 MB of memory, 128 KB of cache, and 16 byte cache lines, then A0 to A3 are ignored, A4 to A16 represent the cache line address, and A17 to A24 are the tag address.
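Both sets of ranges can be derived, and an address pulled apart, with some shifting and masking. This is a sketch assuming a direct mapped cache and power-of-two sizes; the function names and the sample address are my own.

```python
def field_widths(memory_bytes, cache_bytes, line_bytes):
    """Return (offset bits, line-address bits, tag bits) for a direct mapped cache."""
    offset_bits = (line_bytes - 1).bit_length()
    index_bits = (cache_bytes // line_bytes - 1).bit_length()
    tag_bits = (memory_bytes - 1).bit_length() - offset_bits - index_bits
    return offset_bits, index_bits, tag_bits


def split_address(address, offset_bits, index_bits):
    """Break a memory address into (tag, cache line address, byte offset)."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


MB, KB = 1024 * 1024, 1024
print(field_widths(64 * MB, 512 * KB, 32))   # A0-A4, A5-A18, A19-A25
print(field_widths(32 * MB, 128 * KB, 16))   # A0-A3, A4-A16, A17-A24

# A made-up address: tag 5, cache line 13,714, byte 3 within the line.
address = (5 << 19) | (13714 << 5) | 3
print(split_address(address, offset_bits=5, index_bits=14))
```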

Cache Write Policy and the Dirty Bit
In addition to caching reads from memory, the system is capable of caching writes to memory. The handling of the address bits and the cache lines, etc. is pretty similar to how this is done when the cache is read. However, there are two different ways that the cache can handle writes, and this is referred to as the "write policy" of the cache.
·         Write-Back Cache: Also called "copy back" cache, this policy is "full" write caching of the system memory. When a write is made to system memory at a location that is currently cached, the new data is only written to the cache, not actually written to the system memory. Later, if another memory location needs to use the cache line where this data is stored, it is saved ("written back") to the system memory and then the line can be used by the new address.
·         Write-Through Cache: With this method, every time the processor writes to a cached memory location, both the cache and the underlying memory location are updated. This is really sort of like "half caching" of writes; the data just written is in the cache in case it is needed to be read by the processor soon, but the write itself isn't actually cached because we still have to initiate a memory write operation each time.
Many caches that are capable of write-back operation can also be set to operate as write-through (not all however), but not generally the other way around.
Comparing the two policies, in general terms write-back provides better performance, but at a slight risk to memory integrity. Write-back caching saves the system from performing many unnecessary write cycles to the system RAM, which can lead to noticeably faster execution. However, when write-back caching is used, writes to cached memory locations are only placed in cache, and the RAM itself isn't actually updated until the cache line is booted out to make room for another address to use it.
As a result, at any given time, there can be a mismatch between many of the lines in the cache and the memory addresses that they correspond to. When this happens, the data in the memory is said to be "stale", since it doesn't have the fresh information yet that was only written to the cache. Memory used with a write-through cache can never be "stale" because the system memory is written at the same time that the cache is.
Normally, stale memory isn't a problem, because the cache controller keeps track of which locations in the cache have been changed and therefore which memory locations may be stale. This is done by using an extra single bit of memory, one per cache line, called the "dirty bit". Whenever a write is cached, this bit is set (made a 1) to tell the cache controller "when you decide to re-use this cache line for a different address, you need to write the current contents back to memory". This dirty bit is normally implemented by adding one extra bit to the tag RAM, instead of using a separate memory chip (to save cost).
However, the use of a write-back cache does entail the small possibility of data corruption if something were to happen before the "dirty" cache lines could be saved to memory. There aren't too many cases where this could happen, because both the memory and the cache are volatile (cleared when the machine is powered off).
On the other hand, consider a disk cache, where system memory is used to cache writes to the disk. Here, the memory is volatile but the disk is not. If a write-back cache is used here, you could have stale data on your disk compared to what is in memory. Then, if the power goes out, you lose everything that hadn't yet been written back to the disk, leading to possible corruption. For this reason, most disk caches allow programs to over-rule the write-back policy to ensure consistency between the cache (in memory) and disk. Disk utilities, for example, don't like write-back caching very much!
It is also possible with many caches to tell the controller "please write out to system memory all dirty cache lines, right now". This is done when it is necessary to make sure that the cache is in sync with the memory, and there is no stale data. This is sometimes called "flushing" the cache, and is especially common with disk caches, for the reason outlined in the previous paragraph.
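To make the difference between the two policies concrete, here is a minimal sketch in Python of a direct-mapped cache that does nothing except count how many writes actually reach system RAM. The class name, its methods, and the choice to allocate a cache line on a write miss are all assumptions made for this illustration; a real cache controller does this in hardware and may treat write misses differently.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only counts writes reaching system RAM."""

    def __init__(self, num_lines, line_size, write_back):
        self.num_lines = num_lines
        self.line_size = line_size
        self.write_back = write_back
        self.tags = [None] * num_lines     # which memory block each line holds
        self.dirty = [False] * num_lines   # one dirty bit per cache line
        self.memory_writes = 0

    def _locate(self, address):
        block = address // self.line_size
        return block % self.num_lines, block // self.num_lines  # (index, tag)

    def _write_back_if_dirty(self, index):
        # A dirty line must be saved to memory before it is reused or flushed.
        if self.dirty[index]:
            self.memory_writes += 1
            self.dirty[index] = False

    def write(self, address):
        index, tag = self._locate(address)
        if self.tags[index] != tag:
            # Miss: this sketch assumes the line is allocated on a write.
            self._write_back_if_dirty(index)
            self.tags[index] = tag
        if self.write_back:
            self.dirty[index] = True       # memory is now stale for this line
        else:
            self.memory_writes += 1        # write-through: RAM updated every time

    def flush(self):
        # "Write out all dirty cache lines, right now."
        for index in range(self.num_lines):
            self._write_back_if_dirty(index)


wt_cache = DirectMappedCache(16384, 32, write_back=False)
wb_cache = DirectMappedCache(16384, 32, write_back=True)
for _ in range(1000):
    wt_cache.write(0x1000)
    wb_cache.write(0x1000)

print(wt_cache.memory_writes)  # 1000
print(wb_cache.memory_writes)  # 0 (memory is stale until the line is written back)
wb_cache.flush()
print(wb_cache.memory_writes)  # 1
```

A thousand writes to the same location cost a thousand memory write cycles with write-through, but only one with write-back, and that one happens only when the line is evicted or the cache is flushed.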

Summary: The Cache Read/Write Process
Having looked at all the parts and design factors that make up a cache, this section describes the actual process that is followed when the processor reads from or writes to the system memory. This example is the same as in the other sections on this page: 64 MB memory, 512 KB cache, 32 byte cache lines. I will assume a direct mapped cache, since that is the simplest to explain (and is in fact most common for level 2 cache):
The processor begins a read/write from/to the system memory.
Simultaneously, the cache controller begins to check if the information requested is in the cache, and the memory controller begins the process of either reading or writing from the system RAM. This is done so that we don't lose any time at all in the event of a cache miss; if we have a cache hit, the system will cancel the partially-completed request from RAM, if appropriate. If we are doing a write on a write-through cache, the write to memory always proceeds.
The cache controller checks for a hit by looking at the address sent by the processor. The lowest five bits (A0 to A4) are ignored, because these differentiate between the 32 different bytes in the cache line. We aren't concerned with that because the cache will always return the whole 32 bytes and let the processor decide which one it wants. The next 14 lines (A5 to A18) represent the line in the cache that we need to check (notice that 2^14 is 16,384).
The cache controller reads the tag RAM at the address indicated by the 14 address lines A5 to A18. So if those 14 bits say address 13,714, the controller will examine the contents of tag RAM entry #13,714. It compares the 7 bits that it reads from the tag RAM at this location to the 7 address bits A19 to A25 that it gets from the processor. If they are identical, then the controller knows that the entry in the cache at that line address is the one the processor wanted; we have a hit. If the tag RAM doesn't match, then we have a miss.
If we do have a hit, then for a read, the cache controller reads the 32-byte contents of the cache data store at the same line address indicated by bits A5 to A18 (13,714), and sends them to the processor. The read that was started to the system RAM is canceled. The process is complete. For a write, the cache controller writes 32 bytes to the data store at that same cache line location referenced by bits A5 to A18. Then, if we are using a write-through cache the write to memory proceeds; if we are using a write-back cache, the write to memory is canceled, and the dirty bit for this cache line is set to 1 to indicate that the cache was updated but the memory was not.
If we have a miss and we were doing a read, the read of system RAM that we started earlier carries on, with 32 bytes being read from memory at the location specified by bits A5 to A25. These bytes are fed to the processor, which uses the lowest five bits (A0 to A4) to decide which of the 32 bytes it wanted. While this is happening the cache also must perform the work of storing these bytes that were just read from memory into the cache so they will be there for the next time this location is wanted. If we are using a write-through cache, the 32 bytes are just placed into the data store at the address indicated by bits A5 to A18. The contents of bits A19 to A25 are saved in the tag RAM at the same 14-bit address, A5 to A18. The entry is now ready for any future request by the processor. If we are using a write-back cache, then before overwriting the old contents of the cache line, we must check the line's dirty bit. If it is set (1) then we must first write back the contents of the cache line to memory, and then clear the dirty bit. If it is clear (0) then the memory isn't stale and we continue without the write cycle.
If we have a cache miss and we were doing a write, interestingly, the cache doesn't do much at all, because most caches don't update the cache line on a write miss. They just leave the entry that was there alone, and write to memory, bypassing the cache entirely. There are some caches that put all writes into the appropriate cache line whenever a write is done. They make the general assumption that anything the processor has just written, it is likely to read back again at some point in the near future. Therefore, they treat every write as a hit, by definition. This means there is no check for a hit on a write; in essence, the cache line that is used by the address just written is always replaced by the data that was just put out by the processor. It also means that on a write miss the cache controller must update the cache, including checking the dirty bit on the entry that was there before the write, exactly the same as what happens for a read miss.
As complex as it already is :^) this example would of course be even more complex if we used a set associative or fully associative cache. Then we would have a search to do when checking for a hit, and we would also have the matter of deciding which cache line to update on a cache miss.
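The address handling in the steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic for the example system (64 MB memory, 512 KB direct mapped cache, 32-byte lines); the function names and the use of a plain list to stand in for the tag RAM are assumptions, not how a hardware controller is built.

```python
OFFSET_BITS = 5    # A0 to A4: byte within the 32-byte cache line
INDEX_BITS = 14    # A5 to A18: one of 16,384 cache lines
TAG_BITS = 7       # A19 to A25: value stored in the tag RAM


def split_address(address):
    """Split a 26-bit address into (tag, line index, byte offset)."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = (address >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset


def is_hit(tag_ram, address):
    """A hit means the tag stored for this line matches the address's tag bits."""
    tag, index, _ = split_address(address)
    return tag_ram[index] == tag


tag_ram = [None] * (1 << INDEX_BITS)   # one entry per cache line, all empty
tag_ram[13714] = 5                     # line 13,714 currently holds tag 5

address = (5 << 19) | (13714 << 5) | 3
print(split_address(address))          # (5, 13714, 3)
print(is_hit(tag_ram, address))        # True
print(is_hit(tag_ram, (6 << 19) | (13714 << 5)))  # False: same line, different tag
```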
Cache Characteristics
This section discusses the different features of the level 2 cache. These are the characteristics you will normally need to understand when making a motherboard selection, or upgrading the cache in your existing system. Some of the descriptions in this section are explained in much more detail in Function and Operation of the System Cache <func.htm>. The focus of this page is on the higher-level performance aspects of the various cache features.

Cache Speed
There is no single number that dictates completely the "speed" of the system cache. Instead, we must consider the raw speed of the components used, as well as how the circuitry chooses to use them. These considerations are identical to how they are when looking at the system RAM itself; saying "my RAM is 60 ns" tells only a small part of the story <../../ram/timing_Ratings.htm>.
The "raw" speed of the cache is the speed of the RAM chips used to make it. Caches are normally made from static RAM chips (SRAM) <../../ram/types_SRAM.htm>, unlike main system memory which is made from dynamic RAM (DRAM) <../../ram/types_DRAM.htm>. The short version of the difference between the two is that static RAM is faster but also more expensive. SRAMs normally have an access speed of 7 to 20 ns; DRAMs on the other hand are usually 50 to 70 ns.
The speed of the SRAM chips gives the upper bound on performance. It is up to the motherboard and chipset designer to make full use of the speed. Let's consider a Pentium motherboard with a memory bus speed running at 66 MHz. This means 66.66 million cycles per second; if we take the reciprocal of this it gives the cycle time, which is 15 nanoseconds (1 divided by 66.66 million). In order for the motherboard to be able to read from the cache in one cycle at this speed, the SRAM must be faster than 15 ns (there is some overhead time as well, so exactly 15 ns won't work). If the SRAM is faster than required, there will be no additional benefit; if it is slower, timing problems will occur, which usually manifest themselves as memory errors and system lockups.
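The cycle time arithmetic is easy to check for any bus speed. The short Python sketch below is only an illustration; the function names are made up for it, and the 2 ns overhead figure is a purely hypothetical placeholder, since the real margin depends on the motherboard and chipset.

```python
def cycle_time_ns(bus_mhz):
    """Length of one bus clock cycle in nanoseconds."""
    return 1000.0 / bus_mhz


def sram_fast_enough(sram_ns, bus_mhz, overhead_ns):
    """True if the SRAM access plus overhead fits within one bus cycle."""
    return sram_ns + overhead_ns <= cycle_time_ns(bus_mhz)


print(round(cycle_time_ns(66.66), 1))      # 15.0
print(round(cycle_time_ns(33.33), 1))      # 30.0
print(sram_fast_enough(15, 66.66, 2.0))    # False: no room left for overhead
print(sram_fast_enough(12, 66.66, 2.0))    # True
```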
The tag RAM <func_Tag.htm> used as part of the cache must normally be faster than the actual cache data store <func_Store.htm>. This is because the tag RAM must be read first to check for a cache hit. We want to be able to check the tag and still have enough time to read the cache within a single clock cycle, if we have a hit. So for example, you may find that your system's main cache chips are 15 ns, while the tag may be 12 ns.
The more complicated the cache mapping technique, the more important the difference in speed between the tag and the data store. Simple techniques like direct mapping don't generally require much difference at all. Your system may use the same speed for all the memory in this case; for example, if the system needs 15 ns for the tag and 16 ns for the data store, the motherboard may just specify 15 ns for everything since this is simpler. In any event, if your motherboard doesn't already come with the level 2 cache on it, you should buy for it whatever the motherboard manual or your dealer specifies.
The true speed of any cache, in terms of how quickly it really transfers information to and from the processor so that you get faster speed in your applications, is dependent on the cache controller and other chipset circuits. The capabilities of the chipset determine what kind of transfer technologies your cache can use. This in turn determines your cache's optimal system timing, the number of clock cycles required to move data in and out of the cache. This is discussed in detail in this section <timing.htm>.
The performance of the cache obviously also is greatly dependent on the speed that the cache subsystem is running at. In a typical Pentium machine this is the speed of the memory bus, 66 MHz. However a Pentium Pro processor has an integrated level 2 cache <struct_Integrated.htm>, which runs at full processor speed, normally 180 or 200 MHz. Obviously, this will yield superior performance! The Intel Pentium II uses instead a daughterboard cache <struct_Daughterboard.htm> with level 2 caches running at half the processor speed, which with a 233 or 266 MHz chip will still mean much better performance than running the cache at 66 MHz.

Cache Size
The size of the cache normally refers to the size of the data store, where the cached memory contents are actually held. A typical PC level 2 cache is either 256 KB or 512 KB, but can be as small as 64 KB on older machines, or as large as 1 MB or even 2 MB. Within processors, level 1 cache usually ranges in size from 8 KB to 64 KB.
The more cache the system has, the more likely it is to register a hit on a memory access, because fewer memory locations are forced to share the same cache line. Let's use an example to illustrate (the same one we used when we discussed cache operation in detail <func.htm>). We have a system with 64 MB of memory and 512 KB of direct-mapped cache, arranged into 32-byte cache lines. This means that we have 16,384 cache lines (512 KB divided by 32). Each line is shared by 4,096 memory addresses (64 MB divided by 16,384). Now if we increase the amount of cache to 1 MB, we will have 32,768 cache lines, and each will only be shared by 2,048 addresses. Conversely, if we leave the cache at 512 KB but increase the system memory to 256 MB, each of the 16,384 cache lines will be shared by 16,384 addresses.
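These figures can be reproduced with a couple of divisions. The following Python sketch just repeats the arithmetic of the example; the function names are invented for the illustration.

```python
KB = 1024
MB = 1024 * KB


def cache_lines(cache_bytes, line_bytes=32):
    """Number of cache lines in the data store."""
    return cache_bytes // line_bytes


def addresses_per_line(memory_bytes, cache_bytes, line_bytes=32):
    """How many memory addresses compete for each line of a direct-mapped cache."""
    return memory_bytes // cache_lines(cache_bytes, line_bytes)


print(cache_lines(512 * KB))                    # 16384
print(addresses_per_line(64 * MB, 512 * KB))    # 4096
print(cache_lines(1 * MB))                      # 32768
print(addresses_per_line(64 * MB, 1 * MB))      # 2048
print(addresses_per_line(256 * MB, 512 * KB))   # 16384
```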
There are many areas in the computer world where Pareto's Law applies, and cache size is definitely one of them. If you have a 256 KB cache on a system using 32 MB, increasing the cache by 100% to 512 KB will probably result in an increase in the hit ratio of less than 10%. Doubling it again will likely result in an increase of less than 5%. In the real world, this differential is not noticeable to most people. However, if you greatly increase the amount of system memory you use, you will probably want to up your cache total as well to prevent a degradation in performance. Just make sure you watch closely the system RAM cacheability issue.

System RAM Cacheability
This is one of the most misunderstood aspects of the caching equation. The amount of RAM that the system can cache is very important if you are going to be using a lot of system memory. Almost all modern fifth generation systems can cache 64 MB of system memory. However, many systems, even newer ones, cannot cache more than 64 MB of memory. Intel's popular 430FX ("Triton I"), 430VX (one of the "Triton II"s, also called "Triton III") and 430TX chipsets, do not cache more than 64 MB of main memory. There are millions and millions of these PCs on the market.
If you put more memory in a system than can be cached, the result is a performance decrease. The speed differential between the cache and memory is significant; that's why we use it. :^) When some of that memory is not cached, the system must go to memory for every access to that uncached memory, which is much slower. In addition, when using a multitasking operating system (pretty much anything other than DOS these days) you can't really control what ends up in cached memory and what ends up in non-cached memory, unless you really know what you are doing.
The keys to how much memory your system can cache are first, the design of the chipset, and second, the width of the tag RAM. The more memory you have, the more address lines you need to specify an address. This means that you have more address bits to store in the tag RAM to use in order to check for a cache hit. Of course if the chipset isn't designed to cache more than 64 MB, an extra wide tag RAM won't help anyway.
Let's take our standard example again; 64 MB of memory, 512 KB cache, 32-byte cache lines. As we described in detail in this section <func_Address.htm>, 64 MB means 26 address lines (A0 to A25); A0 to A4 specify the byte in the cache line, A5 to A18 specify the cache line, and A19 to A25 go into the tag RAM to specify which memory address is currently using the cache line. That's 7 bits; let's say our tag RAM is 8 bits wide, and we are reserving one bit for the "dirty bit", to allow write-back operation of the cache <func_Write.htm>. So we're fine, we have enough tag memory in the cache. Now, suppose we add another 32 MB of memory. To address 96 MB you need another address line, A26, to be held in the tag RAM. Hmm, we have a problem, because now we need 9 bits in our tag RAM and it only has 8.
The only mainstream Pentium chipset to support caching over 64 MB is the 430HX "Triton II" chipset by Intel. In actual fact, caching over 64 MB on this chipset is considered "optional"; the motherboard manufacturer has to make sure to use an 11-bit tag RAM instead of the default 8-bit. The extra 3 bits increase cacheability from 64 MB to 512 MB (2^3=8, and 64*8=512).
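The relationship between tag RAM width and cacheable memory can be worked out directly. Here is a small Python sketch of that arithmetic. It assumes a direct mapped cache whose size is a power of two and, as in the example above, one bit of the tag RAM reserved for the dirty bit; the function names are made up for the illustration, and it says nothing about whether a particular chipset will actually support the result.

```python
KB = 1024
MB = 1024 * KB


def tag_bits_needed(memory_bytes, cache_bytes):
    """Address bits left over for the tag after the line index and byte offset."""
    address_bits = (memory_bytes - 1).bit_length()
    index_and_offset_bits = (cache_bytes - 1).bit_length()
    return address_bits - index_and_offset_bits


def max_cacheable_bytes(cache_bytes, tag_ram_width, dirty_bit=True):
    """Largest memory size the tag RAM can distinguish (chipset permitting)."""
    usable_bits = tag_ram_width - (1 if dirty_bit else 0)
    return cache_bytes << usable_bits


print(tag_bits_needed(64 * MB, 512 * KB))        # 7 (A19 to A25)
print(tag_bits_needed(96 * MB, 512 * KB))        # 8 (A26 is now needed too)
print(max_cacheable_bytes(512 * KB, 8) // MB)    # 64
print(max_cacheable_bytes(512 * KB, 11) // MB)   # 512
```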
Many people confuse the issue of system RAM size and system RAM cacheability. The common thought is that adding more cache will let you cache more RAM, but you can see that really it is the tag RAM and chipset that controls this. Further complicating the matter is that some companies put extra tag RAM on their COASt <struct_COASt.htm> modules. So a user will insert a 256 KB COASt module, and think that increasing his cache let him cache more system memory, when really it was the extra tag RAM that did it.
Pentium Pro PCs use an integrated level 2 cache that contains the tag RAM within it, so none of this is really a concern for these machines. The Pentium Pro will cache up to 4 GB of main memory, basically anything you can throw at it. The Pentium II uses an SEC daughtercard. It has the same general architecture as the Pentium Pro, but due to a design limitation will "only" cache up to 512 MB. This isn't nearly as much of an issue as a 64 MB barrier, but considering that the PII is used in many high-end applications, this might be a concern for some people.
One question that people ask a lot is: "How much will the system slow down if I have more RAM in it than can be cached?" There is no easy answer to this question, because it depends both on the system and what you are doing with it. Somewhere between 5% and 25% is most likely, but you should bear something else in mind: adding real physical memory to the system is one way to avoid the extreme slowdown to the system that occurs when it runs out of real memory and must use virtual memory <../../ram/size_Virtual.htm>. If you are doing heavy multitasking and notice that the system is thrashing, you will always be better off to have more memory, even uncached, instead of having the system swap a great deal to disk. Of course having all the memory cached is still preferred.

Integrated vs. Separate Data and Instruction Caches
Most (all?) level 2 caches work on both data and processor instructions (code, programs). They don't differentiate between the two because they view both as just memory addresses. However, many processors use a split design for their level 1 cache. For example, the Intel "Classic" Pentium (P54C) processor uses an 8 KB cache for data, and a separate 8 KB cache for program instructions. This is more efficient due to the way the processor is designed, and doesn't really affect performance very much compared to a single 16 KB cache, though it might lead to a very slightly lower hit ratio. Each of these caches can have different characteristics. For example they can use different mapping techniques (as they do on the Pentium Pro).

Mapping Technique
The cache mapping technique is another factor that determines how effective the cache is, that is, what its hit ratio and speed will be. This is discussed in detail in this section <func_Mapping.htm>, but briefly, the three types are:
·         Direct Mapped Cache: Each memory location is mapped to a single cache line that it shares with many others; only one of the many addresses that share this line can use it at a given time. This is the simplest technique both in concept and in implementation. Using this cache means the circuitry to check for hits is fast and easy to design, but the hit ratio is relatively poor compared to the other designs because of its inflexibility. Motherboard-based system caches are typically direct mapped.
·         Fully Associative Cache: Any memory location can be cached in any cache line. This is the most complex technique and requires sophisticated search algorithms when checking for a hit. It can lead to the whole cache being slowed down because of this, but it offers the best theoretical hit ratio since there are so many options for caching any memory address.
·         N-Way Set Associative Cache: "N" is typically 2, 4, 8 etc. A compromise between the two previous designs, the cache is broken into sets of "N" lines each; each memory address maps to one set and can be cached in any of the "N" lines in that set. This improves hit ratios over the direct mapped cache, but without incurring a severe search penalty (since "N" is kept small). The 2-way or 4-way set associative cache is common in processor level 1 caches.
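The effect of the mapping technique on hit ratio can be seen with a small simulation. The Python sketch below is a simplification for illustration only: it treats direct mapped as 1-way and fully associative as a single set holding every line, and it assumes a least-recently-used replacement rule, which the text above does not specify. The function name and the access pattern are made up for the example.

```python
from collections import OrderedDict


def hit_ratio(addresses, num_lines, ways, line_size=32):
    """Fraction of accesses that hit in an N-way set associative cache (LRU)."""
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for address in addresses:
        block = address // line_size
        lines = sets[block % num_sets]
        if block in lines:
            hits += 1
            lines.move_to_end(block)        # mark as most recently used
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)   # evict the least recently used line
            lines[block] = True
    return hits / len(addresses)


# Two addresses exactly 512 KB apart fight over the same line of a
# 512 KB direct mapped cache.
pattern = [0, 512 * 1024] * 50

print(hit_ratio(pattern, 16384, ways=1))       # 0.0  (direct mapped)
print(hit_ratio(pattern, 16384, ways=2))       # 0.98 (2-way set associative)
print(hit_ratio(pattern, 16384, ways=16384))   # 0.98 (fully associative)
```

This access pattern is a deliberate worst case for direct mapping; on real workloads the gap between the designs is much smaller.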

Write Policy
The cache's write policy determines how it handles writes to memory locations that are currently being held in cache. Described in more detail here <func_Write.htm>, the two policy types are:
·         Write-Back Cache: When the system writes to a memory location that is currently held in cache, it only writes the new information to the appropriate cache line. When the cache line is eventually needed for some other memory address, the changed data is "written back" to system memory. This type of cache provides better performance than a write-through cache, because it saves on (time-consuming) write cycles to memory.
·         Write-Through Cache: When the system writes to a memory location that is currently held in cache, it writes the new information both to the appropriate cache line and the memory location itself at the same time. This type of caching provides worse performance than write-back, but is simpler to implement and has the advantage of internal consistency, because the cache is never out of sync with the memory the way it is with a write-back cache.
Both write-back and write-through caches are used extensively, with write-back designs more prevalent in newer and more modern machines.

Transactional or Non-Blocking Cache
Most caches can only handle one outstanding request at a time. If a request is made to the cache and there is a miss, the cache must wait for the memory to supply the value that was needed, and until then it is "blocked". A non-blocking cache has the ability to work on other requests while waiting for memory to supply any misses.
The Intel Pentium Pro <../../cpu/fam/g6_PPro.htm> and Pentium II <../../cpu/fam/g6_PII.htm> processors use this technology for their level 2 caches, which can manage up to four simultaneous requests. This is done by using a transaction-based architecture, and a dedicated "backside" <../../cpu/arch/ext_Backside.htm> bus for the cache that is independent of the main memory bus. Intel calls this "dual independent bus" (DIB) architecture.

Cache Transfer Technologies and Timing
One of the most important factors directly influencing the performance of the level 2 cache is the technology used to transfer information to and from the processor. There are three main types of cache technology currently in use in motherboards; the capabilities of the chipset (in particular, the cache controller) dictate which your system will use.
"Timing" refers to the number of clock cycles required to perform the data transfers to and from the cache or processor, and this is a function of the technology used (among other things). Timing is a complex matter involving various characteristics of the processor, cache, memory, chipset, etc. In general, however, the fewer clock cycles it takes to transfer data, the faster the system. System timing is described in detail here <../../ram/timing.htm>, in the memory chapter.

Cache Bursting
In a typical level 2 cache each cache line contains 32 bytes, and transfers to and from the cache occur 32 bytes (256 bits) at a time. The normal transfer paths (for a fifth- or sixth-generation machine) are only 64 bits wide, which means four transfers are done in sequence. Because the transfers are from consecutive memory locations there is no need to specify a different address after the first one; this makes the second, third and fourth accesses extremely fast.
This high-performance access is called "bursting" or using the cache in "burst mode". All modern level 2 caches use this type of access. The timing, in clock cycles, to perform this quadruple read is normally stated as "x-y-y-y". For example, with "3-1-1-1" timing the first read takes 3 clock cycles and the next three take 1 each, for a total of 6. Obviously, the lower these numbers, the better.
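Adding up an "x-y-y-y" rating is simple enough to do in your head, but for completeness here is a tiny Python sketch; the function name is just an assumption for the example.

```python
def burst_cycles(timing):
    """Total clock cycles for a four-transfer burst such as "3-1-1-1"."""
    return sum(int(part) for part in timing.split("-"))


print(burst_cycles("3-1-1-1"))  # 6
print(burst_cycles("2-1-1-1"))  # 5
print(burst_cycles("3-2-2-2"))  # 9
```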
Note: This is almost identical to the way burst transfers are done to and from memory <../../ram/timing_Burst.htm> in modern systems, except faster.

Asynchronous Cache
The oldest and slowest type of cache timing is asynchronous cache. Asynchronous means that transfers are not tied to the system clock. A request is sent to the cache, and the cache responds, and this happens independently of what the system clock (on the memory bus) is doing. This is similar to how most system memory works; your typical FPM or EDO memory is also asynchronous (and relatively slow, for this reason.)
Because asynchronous cache is not tied to the system clock, it can have problems dealing with faster clock speeds. At slow speeds like 33 MHz it is capable of 2-1-1-1 timing (which is very good) but at speeds like 60 or 66 MHz as used in modern Pentium class PCs it drops down to 3-2-2-2 (which is pretty bad.) For this reason, asynchronous cache is commonly found on 486 class motherboards but is not generally used on Pentium or later class machines.

Synchronous Burst Cache
Unlike asynchronous cache, which operates independently of the system clock, synchronous cache is tied to the memory bus clock. Each tick of the system clock, a transfer can be done to or from the cache (if it is ready). This means that it is capable of handling faster system speeds without slowing down the way asynchronous cache does. However, the faster the system runs, the faster the SRAM chips have to be, in order to keep up. Otherwise timing problems (crashes, lockups) occur.
Even this type of cache slows down at very high speeds. It is capable of 2-1-1-1 operation up to 66 MHz, but then it slows down to 3-2-2-2 at higher speeds (which are starting to become more popular and will become even more so in the future). Synchronous burst cache never quite caught on; pipelined burst cache was developed at around the same time and seemed to take the market away from sync burst before the latter could really get going.

Pipelined Burst (PLB) Cache
Pipelining is a technology commonly used in processors <../../cpu/arch/int/exec_Pipelining.htm> to increase performance; in the pipelined burst (PLB) cache it is used in a similar way. PLB cache adds special circuitry that allows the four data transfers that occur in a "burst" to be done partially at the same time. In essence, the second transfer begins before the first transfer is done, just the way you can start pouring a second gallon of fluid down a pipeline before the first gallon has finished exiting the other side.
Because of the complexity of the circuitry, a bit more time is required initially to set up the "pipeline". For this reason, pipelined burst cache is slightly slower than synchronous burst cache for the initial read, requiring 3 clock cycles instead of 2 for sync burst. However, this parallelism allows PLB cache to burst at a single clock cycle for the remaining 3 transfers even up to very high clock speeds; this means 3-1-1-1 speed up to even 100 MHz bus speeds. PLB cache is now the standard for almost all quality Pentium class motherboards.

Comparison of Transfer Technology Performance
The table below shows a summary of the theoretical maximum system performance of the various cache technologies at different system bus speeds. It is theoretical because it is only possible with a chipset that supports it, fast enough cache memory and other factors. Note how, interestingly, synchronous burst is the best at the 60 and 66 MHz bus speeds common on so many Pentium machines today. Despite this it is not nearly as common as pipelined burst cache. Fortunately, PLB cache is only slightly slower, and holds more potential for use at the higher system speeds that should take the market by storm in 1998:
Bus Speed (MHz)      33        50        60        66        75        83        100
Asynchronous         2-1-1-1   3-2-2-2   3-2-2-2   3-2-2-2   3-2-2-2   3-2-2-2   3-2-2-2
Synchronous Burst    2-1-1-1   2-1-1-1   2-1-1-1   2-1-1-1   3-2-2-2   3-2-2-2   3-2-2-2
Pipelined Burst      3-1-1-1   3-1-1-1   3-1-1-1   3-1-1-1   3-1-1-1   3-1-1-1   3-1-1-1
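To see what these timings mean in practice, the table can be turned into a rough peak transfer rate. The Python sketch below is only an idealized illustration: it assumes 32-byte bursts issued back to back with no other delays, which a real system will not achieve, and the names used are made up for the example. The timing values are copied from the table.

```python
LINE_BYTES = 32
BUS_SPEEDS = [33, 50, 60, 66, 75, 83, 100]

# One timing per bus speed, in the same order as BUS_SPEEDS.
TIMINGS = {
    "Asynchronous": ["2-1-1-1"] + ["3-2-2-2"] * 6,
    "Synchronous Burst": ["2-1-1-1"] * 4 + ["3-2-2-2"] * 3,
    "Pipelined Burst": ["3-1-1-1"] * 7,
}


def cycles(timing):
    """Total clock cycles for one four-transfer burst."""
    return sum(int(part) for part in timing.split("-"))


def burst_rate(bus_mhz, timing):
    """Idealized peak rate in millions of bytes per second."""
    return LINE_BYTES * bus_mhz / cycles(timing)


def fastest(bus_mhz):
    """Technology needing the fewest cycles per burst at this bus speed."""
    column = BUS_SPEEDS.index(bus_mhz)
    return min(TIMINGS, key=lambda name: cycles(TIMINGS[name][column]))


print(fastest(66))                           # Synchronous Burst
print(fastest(100))                          # Pipelined Burst
print(round(burst_rate(66, "2-1-1-1"), 1))   # 422.4
print(round(burst_rate(66, "3-1-1-1"), 1))   # 352.0
```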

The PC Guide (http://www.PCGuide.com)
Cache Structure and Packaging
System cache can come in many different physical forms. This section describes the different types of packaging that cache is normally found in. Which type your system uses is a function of your processor, chipset and motherboard.

Integrated Level 2 Cache
The Intel Pentium Pro processor comes with an integrated level 2 cache. The "chip" that you plug into the motherboard is really two chips. One is the processor itself (including the level 1 cache) and the other is the level 2 cache. These processors are available with 256 KB, 512 KB and 1 MB of level 2 cache. This is a very performance-enhancing design, because it allows the level 2 cache to run at the processor's internal speed (usually 180 or 200 MHz) instead of just the system bus speed (60 or 66 MHz). It also gives you one less thing to worry about in setting up a new system, because all of the support circuitry, tag RAM etc., is inside the chip.
One drawback of this design is that it is not possible to increase the level 2 cache without replacing the processor. These processors are also very expensive due to the difficulty of manufacturing the large chip required for the level 2 cache. Regular cache is made of many small chips, whereas this one is made from one large chip. In addition, defects in the level 2 cache often are not discoverable until after the processor and cache are put into their shared package; this means the processor has to be discarded as well if a defect is found in the cache chip. This is the main reason that Intel moved away from putting integrated cache on its Pentium II processor. No other CPUs currently use this design and it is unlikely that any more ever will.
The integrated level 2 cache of the Pentium Pro is also faster than the older cache used with fifth generation systems due to performance enhancements. The main one is that the cache is transactional, or non-blocking <char_Transactional.htm>.

Daughterboard Cache
Starting with the Pentium II processor (a.k.a. "Klamath") <../../cpu/fam/g6_PII.htm> Intel has introduced a new form of packaging, called SEC (Single Edge Contact) <../../cpu/char/pack_SEC.htm>. The integrated cache of the Pentium Pro processors ran at processor speed and offered very high performance, but was very expensive to manufacture. The motherboard cache of the regular Pentium was easy and cheap to produce but offered lower performance. SEC is a compromise where the processor and cache are mounted together on a small "daughterboard" that plugs into the motherboard. This greatly reduces manufacturing costs, and also means that a bad cache chip doesn't result in the processor being wasted.
This type of cache runs at a faster speed than it would if it were on the motherboard, but slower than an integrated cache; this is why it is a compromise between the other two designs. On the Pentium II the level 2 cache runs at half the processor speed. So a 266 MHz Pentium II will have a 133 MHz level 2 cache. Not as good as the 200 MHz Pentium Pro's integrated cache, but a lot faster than running it at 66 MHz. The Pentium II's cache is also non-blocking <char_Transactional.htm>, like the Pentium Pro's.
Note: Even though the Pentium II has an architecture very similar to that of the Pentium Pro, due to a design limitation it will only cache the first 512 MB of system memory. The Pentium Pro will cache up to 4 GB of system memory.

Motherboard Cache
The most common cache design places the chips directly on the motherboard. On some older designs the cache is several SRAM chips in sockets (which means it can be replaced, but also means it is more prone to certain types of failures). On most newer motherboards it is in the form of 1 to 4 chips soldered directly to the board. If the cache is socketed, you can in some cases add extra SRAM chips to increase the size of the data store. The exact chips you need to add depend on the motherboard; your manual is a necessity here.
Some motherboards support the use of both soldered cache and a COASt module. To use both you may need to change a jumper setting on the motherboard.
Warning: There are some motherboards that actually have fake level 2 cache on them. These are most common on 486 motherboards with two or so flat cache chips soldered directly to the motherboard. In some cases, these chips are actually just empty plastic packages! In many cases the BIOS is even hacked so that it will report external (level 2) cache even when it doesn't exist. You can test for this by disabling the external cache <../bios/set/adv_External.htm>. If you disable it and see no performance difference in a good benchmark program, the cache may be fake.

COASt Modules
Some motherboards use a cache packaging format called COASt, which stands for "Cache On A Stick". This is a silly name for what is in effect a small circuit board similar to a single inline memory module (SIMM) <../../ram/pack_SIMM.htm> that contains cache SRAM chips on it. It is inserted into a special socket on the motherboard often called a CELP ("card edge low profile"). Some motherboards only use this socket for cache, some have only motherboard cache, and some have both. Usually jumpers are used in this last case to tell the board what is being used, although some boards will autodetect when a COASt module is added. See this procedure <../../../proc/physinst/coast.htm> for instructions on adding a COASt module to the motherboard.
The CELP socket could have evolved into a standard of sorts for COASt modules, much the way SIMMs and DIMMs are (mostly) standardized in the memory area. However, this has not happened. Despite standard-sounding names like "COASt V1.2" and whatnot, you cannot rely on just any old COASt working in your motherboard. While many manufacturers share COASt module types, many others use proprietary designs. It's important to contact your motherboard vendor or manufacturer to ensure you obtain the correct type for your PC.
Note: The COASt module often contains not just more data store for holding cached entries, but also more tag RAM to allow for more system memory to be cached. See here for more details <char_Cacheability.htm>.
System Resources
This section takes a detailed look at the PC's system resources. In some ways, everything in a PC is a resource--system RAM, processor speed, hard disk space, etc. However, there are in particular several special resources in the system that are shared by the various devices that use it. These are not physical "parts" of the system for the most part, though of course they have hardware that implements them. Rather they are logical parts of the system that control how it works, and are referred to as the PC's system resources.
System resources are important because they must be shared by the various devices in your PC. This includes not only the motherboard and other main components, but also expansion devices, plug-in cards and peripherals. The resources are primarily used for communication and information transfer between these devices. For historical reasons, the amount of some of these resources is very limited, and as you add more peripherals to your system it can be difficult to find enough resources to satisfy all the requirements. This can lead to resource conflicts, which are one of the most common problems with configuring new PCs--and often one of the most difficult to diagnose and correct.
This section looks at each of the types of system resources found in your PC, along with the main hardware devices that control them or access to them. For each one, listings and tables are provided to show how the resources are usually allocated in a typical PC, as well as what resources are sometimes used by various peripherals. Note that I consider a (SoundBlaster or compatible) sound card as part of a basic PC today; they are in most machines now--and are notorious resource hogs as well. In addition, the important matter of resource conflicts is discussed, along with conflict resolution. Finally, Plug and Play is examined, the relatively new system designed to help make resource allocation easier and reduce conflicts automatically.
Note: The term "system resources" is also sometimes used to refer to special memory areas in various Windows operating systems. This is a different concept altogether, that just happens to use the same name.

Interrupt Function and Operation
This section takes a look at the interrupt lines and the interrupt controller, describing how they work. This includes an explanation of the different types of interrupts and a summary of the different IRQ numbers used in the PC.

Why Interrupts Are Used to Process Information
The processor is a highly-tuned machine that is designed to (basically) do one thing at a time. However, we use our computers in a way that requires the processor to at least appear to do many things at once. If you've ever used a multitasking operating system like Windows 95, you've done this; you may have been editing a document while downloading information on your modem and listening to a CD simultaneously. The processor is able to do this by sharing its time among the various programs it is running and the different devices that need its attention. It only appears that the processor is doing many things at once because of the blindingly high speed at which it is able to switch between tasks.
Most of the different parts of the PC need to send information to and from the processor, and they expect to be able to get the processor's attention when they need to do this. The processor has to balance the information transfers it gets from various parts of the machine and make sure they are handled in an organized fashion. There are two basic ways that the processor could do this:
·         Polling: The processor could take turns going to each device and asking if it has anything it needs the processor to do. This is called polling the devices. In some situations in the computer world this technique is used; however, it is not used by the processor in a PC for a couple of basic reasons. One reason is that it is wasteful; going around to all the devices constantly asking if they need the attention of the CPU wastes cycles that the processor could be using to do something useful. This is particularly true because in most cases the answer will be "no". Another reason is that different devices need the processor's attention at differing rates; the mouse needs attention far less frequently than say, the hard disk (when it is actively transferring data).
·         Interrupting: The other way that the processor can handle information transfers is to let the devices request them when they need its attention. This is the basis for the use of interrupts. When a device has data to transfer, it generates an interrupt that says "I need your attention now, please". The processor then stops what it is doing and deals with the device that requested its attention. It actually can handle many such requests at a time, using a priority level for each to decide which to handle first.
It may seem like an inefficient way to run a computer, having it be interrupted all the time. I'm sure it must remind you of a day at the office, where the phone kept ringing every 5 minutes and you couldn't get anything done. However, without the ringer on the phone, the alternative would be to keep picking up the phone every 30 seconds to see if someone was trying to call you, which even the most ardent telephone-hater would have to admit is much worse. :^)
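The difference between the two approaches can be sketched in a few lines of Python. This is only a toy simulation and not how a real PC is programmed: the Device class, the tick-based loop and the event times are all made up for illustration, chosen just to show how many fruitless checks polling makes compared with reacting to requests.

```python
class Device:
    """A hypothetical device that needs service at certain ticks."""

    def __init__(self, name, busy_ticks):
        self.name = name
        self.busy_ticks = set(busy_ticks)

    def needs_service(self, tick):
        return tick in self.busy_ticks


def run_polling(devices, ticks):
    """Ask every device on every tick; return (checks made, requests served)."""
    checks = serviced = 0
    for tick in range(ticks):
        for dev in devices:
            checks += 1
            if dev.needs_service(tick):
                serviced += 1
    return checks, serviced


def run_interrupts(devices, ticks):
    """Devices signal the CPU themselves, so it only acts on real requests."""
    requests = sorted(
        (tick, dev.name)
        for dev in devices
        for tick in dev.busy_ticks
        if tick < ticks
    )
    return len(requests), requests


# Made-up activity over 100 ticks.
devices = [
    Device("keyboard", [3, 40]),
    Device("mouse", [10]),
    Device("disk", [5, 6, 7]),
]
poll_checks, poll_serviced = run_polling(devices, 100)
irq_count, irq_log = run_interrupts(devices, 100)
print(poll_checks, poll_serviced)  # 300 6
print(irq_count)                   # 6
```

In this made-up run, polling asks 300 times to find 6 real requests, while the interrupt-style version is only engaged 6 times.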
It's also interesting to put into perspective just how fast the modern processor is compared to many of the devices that transfer information to it. Let's imagine a very fast typist; say, 120 words per minute. At an average of 5 letters per word, this is 600 characters per minute on the keyboard. You might be fascinated to realize that if you type at this rate, a 200 MHz computer will process 20,000,000 instructions between each keystroke you make! You can see why having the processor spend a lot of time asking the keyboard if it needs anything would be wasteful, especially since at any time you might stop for a minute or two to review your writing, or do something else. Even while handling a full-bandwidth transfer from a 28,800 bits/sec modem, which of course moves data much faster than your fingers, the processor has over 60,000 instruction cycles between bytes it needs to process.
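The arithmetic behind those figures is easy to reproduce. The sketch below rests on two simplifying assumptions that are not stated in the text: the processor completes one instruction per clock cycle, and the modem link carries 10 bits per byte (8 data bits plus a start and a stop bit).

```python
CPU_HZ = 200_000_000  # 200 MHz, treated as one instruction per cycle (assumption)

# Typist: 120 words per minute at 5 letters per word.
chars_per_second = 120 * 5 / 60  # 600 characters per minute = 10 per second
cycles_per_keystroke = CPU_HZ / chars_per_second

# Modem: 28,800 bits per second, assuming 10 bits on the wire per byte.
modem_bytes_per_second = 28_800 / 10
cycles_per_modem_byte = CPU_HZ / modem_bytes_per_second

print(f"{cycles_per_keystroke:,.0f} cycles per keystroke")
print(f"{cycles_per_modem_byte:,.0f} cycles per modem byte")
```

With these assumptions the keystroke figure comes out at exactly 20,000,000, and the modem figure lands between 60,000 and 70,000 cycles per byte.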
In addition to the well-known hardware interrupts that we discuss in this section, there are also software interrupts <../../bios/func_Services.htm>. These are used by various software programs in response to different events that occur as the operating system and applications run. In essence, these represent the processor interrupting itself! This is part of how the processor is able to do many things at once. The other thing that software interrupts do is allow one program to access another one (usually an application or DOS accessing the BIOS) without having to know where it resides in memory.

Interrupt Controllers
Device interrupts are fed to the processor using a special piece of hardware called an interrupt controller. The standard for this device is the Intel 8259 interrupt controller, and has been since early PCs. As with most of these dedicated controllers, in modern motherboards the 8259 is, in most cases, incorporated into a larger chip as part of the chipset <../../chip/index.htm>.
The interrupt controller has 8 input lines that take requests from one of 8 different devices. The controller then passes the request on to the processor, telling it which device issued the request (which interrupt number triggered the request, from 0 to 7). The original PC and XT had one of these controllers, and hence supported interrupts 0 to 7 only.
Starting with the IBM AT, a second interrupt controller was added to the system to expand it; this was part of the expansion of the ISA system bus from 8 to 16 bits. In order to ensure compatibility (isn't that a recurring theme?) the designers of the AT didn't want to change the single interrupt line going to the processor. So what they did instead was to cascade the two interrupt controllers together.
The first interrupt controller still has 8 inputs and a single output going to the processor. The second one has the same design, but it takes 8 new inputs (doubling the number of interrupts) and its output feeds into input line 2 of the first controller. If any of the inputs on the second controller become active, the output from that controller triggers interrupt #2 on the first controller, which then signals the processor.
So what happens to IRQ #2? That line is now being used to cascade the second controller, so the AT's designers changed the wiring on the motherboard to send any devices that used IRQ2 over to IRQ9 instead. What this means is that any older devices that used IRQ2 now use IRQ9, and if you set any device to use IRQ2 on an AT or later system, it is really using IRQ9.
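The cascade and the IRQ2-to-IRQ9 rerouting can be modeled with two small functions. This is a sketch of the wiring as described above, not of any real controller's programming interface; the function names and the ("first", n) / ("second", n) tuples are invented for illustration.

```python
CASCADE_INPUT = 2  # the second controller's output feeds input 2 of the first


def effective_irq(selected_irq):
    """Return the IRQ the system actually sees for a card's IRQ setting.

    On an AT or later system, a device set to IRQ2 is rerouted to IRQ9.
    """
    if not 0 <= selected_irq <= 15:
        raise ValueError("IRQ must be between 0 and 15")
    return 9 if selected_irq == 2 else selected_irq


def controller_path(selected_irq):
    """List the (controller, input line) hops a request passes through."""
    irq = effective_irq(selected_irq)
    if irq < 8:
        return [("first", irq)]
    # IRQs 8-15 enter the second controller, whose output then
    # triggers input 2 of the first controller.
    return [("second", irq - 8), ("first", CASCADE_INPUT)]


print(effective_irq(2))     # 9
print(controller_path(2))   # [('second', 1), ('first', 2)]
print(controller_path(5))   # [('first', 5)]
```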
Devices designed to use IRQ2 as a primary setting are rare in today's systems, since IRQ2 has been out of use for over 10 years. In most cases IRQ2 is just considered "unusable", while IRQ9 is a regular, usable interrupt line. However, some modems for example still offer the use of IRQ2 as a way to get around the fact that COM3 and COM4 share interrupts with COM1 and COM2 by default. You may need to do this if you have a lot of devices contending for the low-numbered IRQs (which is very common).
Note: If you select IRQ2 on a device such as a modem, IRQ9 will really be used instead. Any software that uses the device needs to be told that it is using IRQ9, not IRQ2. Also, if you do this, you cannot use the "real" IRQ9 for any other device. You should never attempt to use IRQ2 if you are already using IRQ9 on your PC, and vice-versa.

IRQ Lines and the System Bus
The devices that use interrupts trigger them by signaling over lines provided on the ISA system bus. Most of the interrupts are provided to the system bus for use by devices; however, some of them are only used internally by the system, and therefore they are not given wires on the system bus. These are interrupts 0, 1, 2, 8 and 13, and are never available to expansion cards (remember, IRQ2 is now wired to IRQ9 on the motherboard).
As explained in this section on the ISA bus <../../buses/types/older_ISA.htm>, the original bus was only 8 bits wide and had a single connector for expansion cards. The bus was expanded to 16 bits and a second connector slot added next to the first one; you can see this if you look at your motherboard, since all modern PCs use 16-bit slots.
The addition of this extra connector coincided with the addition of the second interrupt controller, and the lines for these extra IRQs were placed on this second slot. This means that in order to access any of these IRQs--10, 11, 12, 14 and 15--the card must have both connectors. While almost no motherboards today have 8-bit-only bus slots, there are still many expansion cards that only use one ISA connector. The most common example is an internal modem. These cards can only use IRQs 3, 4, 5, 6 and 7 (and 6 is almost always not available since it is used by the floppy disk controller). They can also use IRQ 9 indirectly if they have the ability to use IRQ2, since 9 is wired to where 2 used to be.
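As a rough illustration of these rules, the function below returns the IRQs a card could be set to, depending on how many ISA connectors it has. The function name and its parameters are hypothetical, and it only encodes what is stated in this section, including the indirect route to IRQ9 for single-connector cards that offer an IRQ2 setting.

```python
def usable_irqs(connectors, offers_irq2=False):
    """Return the IRQs available to an ISA card.

    connectors:  1 for a card using only the first ISA connector,
                 2 for a card using both connectors.
    offers_irq2: True if the card has an "IRQ2" setting, which is
                 really IRQ9 on an AT or later system.
    """
    if connectors not in (1, 2):
        raise ValueError("connectors must be 1 or 2")
    irqs = {3, 4, 5, 6, 7}  # lines on the first connector
    if connectors == 2:
        irqs |= {9, 10, 11, 12, 14, 15}
    elif offers_irq2:
        irqs.add(9)
    return sorted(irqs)


print(usable_irqs(1))                    # [3, 4, 5, 6, 7]
print(usable_irqs(1, offers_irq2=True))  # [3, 4, 5, 6, 7, 9]
print(usable_irqs(2))                    # [3, 4, 5, 6, 7, 9, 10, 11, 12, 14, 15]
```

Keep in mind that IRQ6 appears in the list only because the line exists; in practice it is almost always taken by the floppy disk controller.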
Note: All of this applies to ISA and VESA local bus slots only. PCI slots handle interrupts differently, using their own internal interrupt system <../../buses/types/pci_Interrupts.htm>. If a PCI card needs to use a regular IRQ line the BIOS/chipset will normally "map" the PCI interrupt to a regular system interrupt. This is normally done using IRQ9 to IRQ12.

Interrupt Priority
The PC processes device interrupts according to their priority level. This is a function of which interrupt line they use to enter the interrupt controller. For this reason, the priority levels are directly tied to the interrupt number:
·         On an old PC/XT, the priority of the interrupts is 0, 1, 2, 3, 4, 5, 6, 7.
·         On a modern machine, it's slightly more complicated (what else is new). Recall that the second set of eight interrupts is piped through the IRQ2 channel on the first interrupt controller. This means that the first controller views any of these interrupts as being at the priority level of its "IRQ2". The result of this is that the priorities become 0, 1, (8, 9, 10, 11, 12, 13, 14, 15), 3, 4, 5, 6, 7. IRQs 8 to 15 take the place of IRQ2.
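The ordering described above can be derived mechanically by splicing IRQs 8 to 15 into the slot that IRQ2 occupies. The helper names below are made up for this sketch; the resulting order is simply the one given in the text.

```python
def priority_order(cascaded=True):
    """Return IRQ numbers from highest to lowest priority."""
    first_controller = list(range(8))
    if not cascaded:
        return first_controller  # PC/XT: 0 through 7
    order = []
    for irq in first_controller:
        if irq == 2:
            order.extend(range(8, 16))  # IRQs 8-15 take the place of IRQ2
        else:
            order.append(irq)
    return order


def priority_level(irq):
    """Return the 1-based priority of an IRQ on a cascaded system.

    Raises ValueError for IRQ2, which has no priority of its own.
    """
    return priority_order().index(irq) + 1


print(priority_order())
# [0, 1, 8, 9, 10, 11, 12, 13, 14, 15, 3, 4, 5, 6, 7]
print(priority_level(3))   # 11
print(priority_level(15))  # 10
```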
In any event, the priority level of the IRQs doesn't make much of a difference in the performance of the machine, so it isn't something you're going to want to worry about too much. If you are a real performance freak, higher-priority IRQs may improve the performance of the devices that use them slightly. If you could actually notice this in any way other than examining the system under the microscope of a benchmark suite, I'd be pretty surprised...

Non-Maskable Interrupts (NMI)
All of the regular interrupts that we normally use and refer to by number are called maskable interrupts. The processor is able to mask, or temporarily ignore, any interrupt if it needs to, in order to finish something else that it is doing. In addition, however, the PC has a non-maskable interrupt (NMI) that can be used for serious conditions that demand the processor's immediate attention. The NMI cannot be ignored by the system unless it is shut off specifically.
When an NMI signal is received, the processor immediately drops whatever it was doing and attends to it. As you can imagine, this could cause havoc if used improperly. In fact, the NMI signal is normally used only for critical problem situations, such as serious hardware errors. The most common use of NMI is to signal a parity error <../../../ram/err_Errors.htm> from the memory subsystem. This error must be dealt with immediately to prevent possible data corruption.
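One way to picture the difference between maskable and non-maskable interrupts is a simple bit mask, where each bit says whether a numbered interrupt is being ignored for the moment. The class below is a hypothetical model for illustration only; it does not describe the actual registers or signals in a PC.

```python
class InterruptLines:
    """Toy model: numbered interrupts can be masked, the NMI cannot."""

    def __init__(self):
        self.mask = 0            # bit n set means IRQ n is temporarily ignored
        self.nmi_enabled = True  # the NMI is on unless shut off specifically

    def mask_irq(self, irq):
        self.mask |= 1 << irq

    def unmask_irq(self, irq):
        self.mask &= ~(1 << irq)

    def delivered(self, irq):
        """A maskable interrupt gets through only if its mask bit is clear."""
        return ((self.mask >> irq) & 1) == 0

    def nmi_delivered(self):
        """The mask has no effect on the NMI."""
        return self.nmi_enabled


lines = InterruptLines()
lines.mask_irq(5)
print(lines.delivered(5))     # False
print(lines.delivered(4))     # True
print(lines.nmi_delivered())  # True
```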

Interrupts, Multiple Devices and Conflicts
In general, interrupts are single-device resources. Because of the way the system bus is designed, it is not feasible for more than one device to use an interrupt at one time, because this can confuse the processor and cause it to respond to the wrong device at the wrong time. If you attempt to use two devices with the same IRQ, an IRQ conflict will result. This is one of the types of resource conflicts <../confl.htm>.
It is possible to share an IRQ among more than one device, but only under limited conditions. In essence, if you have two devices that you seldom use, and that you never use simultaneously, you may be able to have them share an IRQ. However, this is not the preferred method since it is much more prone to problems than just giving each device its own interrupt line.
One of the most common problems regarding shared IRQs is the use of the third and fourth serial (COM) ports, COM3 and COM4. By default, COM3 uses the same interrupt as COM1 (IRQ4), and COM4 uses the same interrupt as COM2 (IRQ3). If you have a mouse on COM1 and set up your modem as COM3--a very common setup--guess what happens the first time you try to go online? :^) You can share COM ports on the same interrupt, but you have to be very careful not to use both devices at once; in general this arrangement is not preferred. See here for ideas on dealing with COM port difficulties <../../../../ts/x/comp/io.htm>.
Many modems will let you change the IRQ they use to IRQ5 or IRQ2, for example, to avoid this problem. Other common areas where interrupt conflicts occur are IRQ5, IRQ7 and IRQ12. The conflict resolution area of the Troubleshooting Expert <../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm> can sometimes help with these situations.
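Checking a planned configuration for this kind of clash is straightforward to sketch. The function and the sample setup below are illustrative assumptions, not a real diagnostic tool; the only rules encoded are that two devices on the same IRQ conflict and that a device set to IRQ2 is really using IRQ9.

```python
from collections import defaultdict


def find_irq_conflicts(assignments):
    """Return {irq: [devices]} for every IRQ claimed by more than one device.

    assignments maps a device name to the IRQ it is set to.
    """
    by_irq = defaultdict(list)
    for device, irq in assignments.items():
        # A device set to IRQ2 really uses IRQ9 on an AT or later system.
        by_irq[9 if irq == 2 else irq].append(device)
    return {irq: sorted(devs) for irq, devs in by_irq.items() if len(devs) > 1}


# The common setup from the text: mouse on COM1, modem set up as COM3.
setup = {
    "mouse on COM1": 4,
    "COM2": 3,
    "modem on COM3": 4,
    "sound card": 5,
    "LPT1": 7,
}
conflicts = find_irq_conflicts(setup)
print(conflicts)  # {4: ['modem on COM3', 'mouse on COM1']}
```

Moving the modem to a free line such as IRQ5 (if the sound card is elsewhere) or IRQ2 would make the returned dictionary empty.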

Summary of IRQs and Their Typical Uses
The table below provides summary information about the 16 IRQ levels in a typical PC. You may find this table useful when considering how to configure your system, or for resolving IRQ conflicts. For an explanation of the categories, along with more detailed descriptions, see here <num.htm>. To see IRQ usage organized by device instead of IRQ number, see this device resource summary <../config_Summary.htm>:
IRQ  Bus Line?       Priority  Typical Default Use                                 Other Common Uses
0    no              1         System timer                                        None
1    no              2         Keyboard controller                                 None
2    no (rerouted)   n/a       None; cascade for IRQs 8-15. Replaced by IRQ 9      Modems, very old (EGA) video cards, COM3 (third serial port), COM4 (fourth serial port)
3    8/16-bit        11        COM2 (second serial port)                           COM4 (fourth serial port), modems, sound cards, network cards, tape accelerator cards
4    8/16-bit        12        COM1 (first serial port)                            COM3 (third serial port), modems, sound cards, network cards, tape accelerator cards
5    8/16-bit        13        Sound card                                          LPT2 (second parallel port), LPT3 (third parallel port), COM3 (third serial port), COM4 (fourth serial port), modems, network cards, tape accelerator cards, hard disk controller on old PC/XT
6    8/16-bit        14        Floppy disk controller                              Tape accelerator cards
7    8/16-bit        15        LPT1 (first parallel port)                          LPT2 (second parallel port), COM3 (third serial port), COM4 (fourth serial port), modems, sound cards, network cards, tape accelerator cards
8    no              3         Real-time clock                                     None
9    16-bit only     4         None                                                Network cards, sound cards, SCSI host adapters, PCI devices, rerouted IRQ2 devices
10   16-bit only     5         None                                                Network cards, sound cards, SCSI host adapters, secondary IDE channel, quaternary IDE channel, PCI devices
11   16-bit only     6         None                                                Network cards, sound cards, SCSI host adapters, VGA video cards, tertiary IDE channel, quaternary IDE channel, PCI devices
12   16-bit only     7         PS/2 mouse                                          Network cards, sound cards, SCSI host adapters, VGA video cards, tertiary IDE channel, PCI devices
13   no              8         Floating Point Unit (FPU / NPU / Math Coprocessor)  None
14   16-bit only     9         Primary IDE channel                                 SCSI host adapters
15   16-bit only     10        Secondary IDE channel                               Network cards, SCSI host adapters

The PC Guide (http://www.PCGuide.com)
IRQ Details By Number
This section lists each of the 16 interrupt lines and provides a full description of what they are, how they are normally used, and any special information that is relevant to them. The general format for each section is as follows:
·         IRQ Number: The number of the IRQ from 0 to 15.
·         16-Bit Priority: The priority level of the interrupt <func_Priority.htm>. 1 is the highest and 15 is the lowest.
·         Bus Line: Indicates whether or not this IRQ is available to expansion devices on the system bus <func_Bus.htm>. This will say "8/16 bit" for an interrupt line available to all expansion devices, "16 bit only" for a line available only to 16-bit cards, or "No" for an interrupt used only by system devices.
·         Typical Default Use: Description of the device or function that normally uses this IRQ in a regular modern PC.
·         Other Common Uses: This is a list of other devices that commonly either use this IRQ or offer the use of this IRQ as one of their options. This list isn't exhaustive because there are a lot of oddball cards out there that may use unusual IRQs.
·         Description: A description of the interrupt and how it is used, along with any relevant or interesting points about it or its history.
·         Conflicts: A discussion of the likelihood of conflicts with this IRQ and what are the likely causes.
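If you wanted to keep these listings in a program, for example to drive a configuration checker, the structured fields map naturally onto a small record type. The IrqEntry class and its field names below are my own invention for this sketch; the two sample entries use the values given for IRQ0 and IRQ3 in this section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IrqEntry:
    """One IRQ listing, using the structured fields described above."""
    number: int                    # 0 to 15
    priority: Optional[int]        # 1 is highest; None where the text says "n/a"
    bus_line: str                  # "8/16-bit", "16-bit only" or "No"
    typical_default_use: str
    other_common_uses: List[str] = field(default_factory=list)


irq0 = IrqEntry(0, 1, "No", "System timer")
irq3 = IrqEntry(
    3,
    11,
    "8/16-bit",
    "COM2 (second serial port)",
    ["COM4 (fourth serial port)", "modems", "sound cards",
     "network cards", "tape accelerator cards"],
)

# Example query: which of these entries are available to expansion cards?
on_bus = [e.number for e in (irq0, irq3) if e.bus_line != "No"]
print(on_bus)  # [3]
```

The free-text Description and Conflicts fields are left out here, since they are prose rather than data.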

IRQ0
IRQ Number: 0
16-Bit Priority: 1
Bus Line: No
Typical Default Use: System timer.
Other Common Uses: None; for system use only.
Description: This is the reserved interrupt for the internal system timer. It is used exclusively for internal operations and is never available to peripherals or user devices.
Conflicts: This is a dedicated interrupt line; there should never be any conflicts. If software indicates a conflict on this IRQ, there is a good possibility of a hardware problem somewhere on your system board.

IRQ1
IRQ Number: 1
16-Bit Priority: 2
Bus Line: No
Typical Default Use: Keyboard / keyboard controller.
Other Common Uses: None; for system use only.
Description: This is the reserved interrupt for the keyboard controller. It is used exclusively for keyboard input. Even on systems without a keyboard, IRQ1 is not available for use by other devices. Note that the keyboard controller also controls the PS/2 style mouse if the system has one, but the mouse uses a separate line, IRQ12.
Conflicts: This is a dedicated interrupt line; there should never be any conflicts. If software indicates a conflict on this IRQ, there is a good possibility of a hardware problem somewhere on your system board; this can be a motherboard or chipset (keyboard controller) problem.

IRQ2
IRQ Number: 2
16-Bit Priority: n/a
Bus Line: No
Typical Default Use: Cascade for IRQs 8 to 15.
Other Common Uses: Not generally used. Can be used by modems, very old (EGA) video cards, or as an alternative IRQ for COM3 (third serial port) or COM4 (fourth serial port). Rerouted to IRQ9 and appears to software as IRQ9.
Description: This is the interrupt number that is used to cascade the second interrupt controller to the first <func_Controller.htm>, allowing the use of extra IRQs 8 to 15. This use as a linkage between the two interrupt controllers means that IRQ2 is no longer available for normal use. For compatibility with older cards that used IRQ2 on the original PC or XT machines (which had only one controller and a normal IRQ2 line), the motherboard of modern PCs reroutes IRQ2 to IRQ9. Hence IRQ2 can still be used but appears to the system as IRQ9. The most common cards that do this are old EGA video cards, and newer cards making IRQ2 available with the knowledge that it will be routed to IRQ9.
Conflicts: This interrupt is normally not used on most systems, mostly because the whole IRQ2/IRQ9 thing confuses a lot of people so they tend to avoid it. Conflicts on this line generally come from trying to use a device on IRQ2 and another on IRQ9 at the same time. Some modems and serial port cards allow IRQ2 to be used as an alternative for the two standard lines used for modems and serial ports (IRQ3 and IRQ4) in order to avoid conflicts in those two heavily-contested areas. This is generally a good configuration decision since unused IRQs from 3 to 7 are harder to find than unused IRQs from 10 to 15. If you want to use IRQ2, move any device using IRQ9 to another line like 10 or 11.

IRQ3
IRQ Number: 3
16-Bit Priority: 11
Bus Line: 8/16-bit
Typical Default Use: COM2 (second serial port).
Other Common Uses: COM4 (fourth serial port), modems, sound cards, network cards, tape accelerator cards.
Description: This interrupt is normally used by the second serial port, COM2. It is also the default interrupt for the fourth serial port, COM4, and a popular option for modems, sound cards and other devices. Modems often come pre-configured to use COM2 on IRQ3.
Conflicts: Conflicts on IRQ3 are relatively common. The two biggest problem areas are first, modems that attempt to use COM2/IRQ3 and clash with the built-in COM2 port; and second, systems that attempt to use both COM2 and COM4 simultaneously on this same interrupt line. In addition, some devices, particularly network interface cards, come with IRQ3 as the default. In most cases the problem can be avoided by changing the conflicting device to a different interrupt (IRQ2 and IRQ5 usually being the best choices). If the built-in COM2 is not being used, it can be disabled in the BIOS setup <../../bios/set/periph_Serial.htm>, which will allow a modem to stay at COM2/IRQ3 without causing any problems. More general solutions to these issues can be found in the conflict resolution area of the Troubleshooting Expert <../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm>.

IRQ4
IRQ Number: 4
16-Bit Priority: 12
Bus Line: 8/16-bit
Typical Default Use: COM1 (first serial port).
Other Common Uses: COM3 (third serial port), modems, sound cards, network cards, tape accelerator cards.
Description: This interrupt is normally used by the first serial port, COM1. On PCs that do not use a PS/2-style mouse, this port (and thus this interrupt) is almost always used by the serial mouse. IRQ4 is also the default interrupt for the third serial port, COM3, and a popular option for modems, sound cards and other devices. Modems sometimes come pre-configured to use COM3 on IRQ4.
Conflicts: Conflicts on IRQ4 are relatively common, although not as common as on IRQ3. On systems that do not use a serial mouse, problems are less common, because COM1 isn't automatically busy whenever the mouse is in use. The two biggest problem areas are modems that attempt to use COM3/IRQ4 and clash with COM1, and systems that attempt to use both COM1 and COM3 simultaneously on this same interrupt line. In most cases the problem can be avoided by changing the conflicting device to a different interrupt (IRQ2 and IRQ5 usually being the best choices). If a PS/2 mouse is being used, you can disable the built-in COM1 port in the BIOS setup, which will allow a modem to stay at COM3/IRQ4 without causing any problems. However, this is not really recommended. More general solutions to these issues can be found in the conflict resolution area of the Troubleshooting Expert <../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm>.

IRQ5
IRQ Number: 5
16-Bit Priority: 13
Bus Line: 8/16-bit
Typical Default Use: Sound card (but varies widely).
Other Common Uses: LPT2 (second parallel port), COM3 (third serial port), COM4 (fourth serial port), modems, network cards, tape accelerator cards, hard disk controller on old PC/XT.
Description: This is probably the single "busiest" IRQ in the whole system. On the original PC/XT system this IRQ was used to control the (massive 10 MB) hard disk drive. When the AT was introduced, hard disk control was moved to IRQ14 to free up IRQ5 for 8-bit devices. As a result, IRQ5 is in most systems the only free interrupt below IRQ9 and is therefore the first choice for use by devices that would otherwise conflict with IRQ3, IRQ4, IRQ6 or IRQ7. IRQ5 is the default interrupt for the second parallel port in systems that use two printers for example. It is also the first choice that most sound cards make when looking for an IRQ setting. IRQ5 is also a popular choice as an alternate line for systems that need to use a third COM port, or a modem in addition to two COM ports.
Conflicts: Conflicts on IRQ5 are very common because of the large variety of devices that have it as an option. Since virtually every PC today uses a sound card, and they all like to grab IRQ5, it is almost always taken before you even start looking at more esoteric peripherals. If a second parallel port (LPT2) is being used to allow access to two printers or a printer and a parallel-port drive, then IRQ5 will usually be taken right away. If for some very strange reason you have three parallel ports, watch for a conflict here or with IRQ7, since 5 and 7 are the only two normally used as defaults for parallel ports. Sound cards that default to IRQ5 are generally best left there, to avoid problems with poorly written older software that just assumed the sound card would always be left at IRQ5. To whatever extent possible, move devices that can use higher-valued IRQs away from IRQ5. For example, you can't move COM3 to IRQ11, but you usually can move a network card to it. See the conflict resolution area of the Troubleshooting Expert <../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm> for more ideas.

IRQ6
IRQ Number: 6
16-Bit Priority: 14
Bus Line: 8/16-bit
Typical Default Use: Floppy disk controller.
Other Common Uses: Tape accelerator cards.
Description: This interrupt is reserved for use by the floppy disk controller. Technically, it is available for use by other devices, and some devices will allow you to select IRQ6. Most however do not, realizing that virtually every PC uses at least one floppy disk drive. The most common devices that will let you use IRQ6 are probably tape drive accelerator cards. This is probably because these cards are used for tape drives that run off the floppy interface, and many of them can be set to drive floppy disks themselves.
Conflicts: Conflicts on IRQ6 are uncommon and are usually the result of an incorrectly configured peripheral card, since IRQ6 is pretty standardized in its use for the floppy disks. If you use a tape accelerator card along with an integrated floppy disk controller on your motherboard, watch out for the accelerator trying to take over IRQ6; some even do this by default.

IRQ7
IRQ Number: 7
16-Bit Priority: 15
Bus Line: 8/16-bit
Typical Default Use: LPT1 (first parallel port).
Other Common Uses: COM3 (third serial port), COM4 (fourth serial port), modems, sound cards, network cards, tape accelerator cards.
Description: This IRQ is used on most systems to drive the first parallel port, normally for the use of a printer. These days of course many other devices use parallel ports, including external drives. If you are not using a printer or other device then IRQ7 can be used in a similar way to IRQ5: as an alternate for any of the devices that would normally be fighting over IRQ3 or IRQ4.
Conflicts: Conflicts on IRQ7 are relatively unusual. If you are using two parallel ports, make sure the second one is set up to use IRQ5 or another available IRQ. Some add-in parallel boards try to make LPT2 also use IRQ7, which generally won't work. Otherwise, not assigning IRQ7 to an expansion card while LPT1 is using it will eliminate conflicts in most cases.

IRQ8
IRQ Number: 8
16-Bit Priority: 3
Bus Line: No
Typical Default Use: Real-time clock.
Other Common Uses: None; for system use only.
Description: This is the reserved interrupt for the real-time clock timer. This timer is used by software programs to manage events that must be calibrated to real-world time; this is done by setting "alarms", which trigger this interrupt at a specified time. For example, if you are using an electronic datebook and have it set to pop up screen messages or beep the PC when it is time for a meeting, the software will set a timer to count down to the appropriate time. When the timer finishes its countdown, an interrupt will be generated on IRQ8.
Conflicts: This is a dedicated interrupt line; there should never be any conflicts. If software indicates a conflict on this IRQ, there is a good possibility of a hardware problem somewhere on your system board.

IRQ9
IRQ Number: 9
16-Bit Priority: 4
Bus Line: 16-bit only
Typical Default Use: None.
Other Common Uses: Network cards, sound cards, SCSI host adapters, PCI devices, rerouted IRQ2 devices.
Description: This is usually an open IRQ on most systems, and is a popular choice for use by peripherals, especially network cards. On most PCs it can be used freely since it has no default setting.
Conflicts: There are a couple of things to watch out for when using this IRQ. First, if you are trying to use IRQ2, you cannot use IRQ9 as well, since devices that try to use IRQ2 really end up using IRQ9 instead. Also, some systems that use PCI cards that require the use of a system IRQ line will grab IRQ9; this can be changed in some cases using the BIOS setup parameters that assign IRQs to PCI devices <../../bios/set/pci.htm>.
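The IRQ2/IRQ9 relationship described above can be sketched in a few lines of Python. This is only an illustration: the function names effective_irq and irq_clash are made up for this example and do not belong to any real configuration tool.

```python
def effective_irq(requested: int) -> int:
    """Return the interrupt line a device really ends up on.

    A device configured for IRQ2 is rerouted to IRQ9; every other
    setting maps to itself.
    """
    if not 0 <= requested <= 15:
        raise ValueError(f"IRQ must be 0-15, got {requested}")
    return 9 if requested == 2 else requested


def irq_clash(irq_a: int, irq_b: int) -> bool:
    """True if two configured IRQ settings land on the same line."""
    return effective_irq(irq_a) == effective_irq(irq_b)


print(irq_clash(2, 9))    # a card set to "IRQ2" and a card set to IRQ9
print(irq_clash(9, 10))   # two cards on different lines
```

The point of the sketch is simply that a conflict check has to compare the rerouted line numbers, not the jumper settings printed on the cards.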

IRQ10
IRQ Number: 10
16-Bit Priority: 5
Bus Line: 16-bit only
Typical Default Use: None.
Other Common Uses: Network cards, sound cards, SCSI host adapters, secondary IDE channel, quaternary IDE channel, PCI devices.
Description: This is usually open and one of the easiest IRQs to use since it is generally not contested by many devices. While the secondary IDE controller can sometimes be set to use IRQ10, it almost always uses IRQ15 instead.
Conflicts: Conflicts on IRQ10 are unusual; the only thing to watch out for is a PCI card that needs an interrupt line being assigned IRQ10 by the BIOS; this can be changed in some cases using the BIOS setup parameters that assign IRQs to PCI devices <../../bios/set/pci.htm>.

IRQ11
IRQ Number: 11
16-Bit Priority: 6
Bus Line: 16-bit only
Typical Default Use: None.
Other Common Uses: Network cards, sound cards, SCSI host adapters, VGA video cards, tertiary IDE channel, quaternary IDE channel, PCI devices.
Description: This line is usually open and relatively easy to use since it is generally not contested by many devices. If you are using three IDE channels (the third typically being on a sound card), IRQ11 is typically the one that the tertiary controller will try to use. Also, some PCI video cards will try to use IRQ11.
Conflicts: Watch out for PCI cards, especially video cards, that grab IRQ11. This can be changed in some cases using the BIOS setup parameters that assign IRQs to PCI devices <../../bios/set/pci.htm>.

IRQ12
IRQ Number: 12
16-Bit Priority: 7
Bus Line: 16-bit only
Typical Default Use: PS/2 mouse.
Other Common Uses: Network cards, sound cards, SCSI host adapters, VGA video cards, tertiary IDE channel, PCI devices.
Description: On machines that use a PS/2 mouse, this is the IRQ reserved for its use. Using a PS/2 mouse frees up the COM1 serial port and the interrupt it uses (IRQ4) for other devices. Normally this is a good trade since free IRQs with numbers below 8 are harder to find than ones above 8. If a PS/2 mouse is not used, IRQ12 is a good choice for use by other devices such as network cards.
Conflicts: There are some potential problems here. Watch out for PCI cards that can sometimes be assigned this line by the system BIOS. This can be changed in some cases using the BIOS setup parameters that assign IRQs to PCI devices <../../bios/set/pci.htm>. If you are using a PS/2 mouse you need to make sure no other devices use IRQ12.

IRQ13
IRQ Number: 13
16-Bit Priority: 8
Bus Line: No
Typical Default Use: Floating point unit (FPU / NPU / Math coprocessor).
Other Common Uses: None; for system use only.
Description: This is the reserved interrupt for the integrated floating point unit (on 80486 or later machines) or the math coprocessor (on 80386 or earlier machines that use one). It is used exclusively for internal signaling and is never available for use by peripherals.
Conflicts: This is a dedicated interrupt line; there should never be any conflicts. If software indicates a conflict on this IRQ, there is a good possibility of a hardware problem somewhere on your system board, or possibly with your processor or math coprocessor.

IRQ14
IRQ Number: 14
16-Bit Priority: 9
Bus Line: 16-bit only
Typical Default Use: Primary IDE channel.
Other Common Uses: SCSI host adapters.
Description: On most PCs, this IRQ is reserved for use by the primary IDE controller, which provides access to the first two IDE/ATA devices (usually hard disk drives and/or CD-ROM drives). On machines that do not use IDE devices at all, this IRQ can be used for another purpose (such as a SCSI host adapter to provide SCSI drives). In order to do this, you will normally have to disable the IDE channel using either the appropriate BIOS setting <../../bios/set/periph_IDE.htm> (for integrated IDE support on newer boards) or jumpers on the controller board (for older machines that use an IDE controller card).
Conflicts: Problems with IRQ14 are rare, since the universality of its use for IDE means most peripheral vendors avoid offering it as an option. If you are using SCSI and not IDE, and want to use IRQ14, make sure any integrated IDE controllers are disabled first.

IRQ15
IRQ Number: 15
16-Bit Priority: 10
Bus Line: 16-bit only
Typical Default Use: Secondary IDE channel.
Other Common Uses: Network cards, SCSI host adapters.
Description: On most newer PCs, this IRQ is reserved for use by the secondary IDE controller, which provides access to the third and fourth IDE/ATA devices (usually hard disk drives and/or CD-ROM drives). If you are not using IDE, or are using only two devices and want to put them on the primary channel to free up this IRQ, that can be done easily as long as you remember to disable the secondary IDE channel using either the appropriate BIOS setting <../../bios/set/periph_IDE.htm> (for integrated IDE support on newer boards) or jumpers on the controller board (for older machines that use an IDE controller card).
Conflicts: Problems with IRQ15 typically result from assigning a peripheral to use it while forgetting to disable the integrated secondary IDE controller. Most Pentium or later (PCI-based) motherboards have two integrated IDE controllers. Some people incorrectly assume that there will be no conflict if nothing is attached to the secondary channel, but this is not always the case.
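The "16-Bit Priority" figures quoted in the entries above follow from the way the second interrupt controller is cascaded into the first through IRQ2: IRQ8 to IRQ15 are serviced in the slot that IRQ2 would otherwise occupy. A minimal sketch, assuming priorities are simply numbered in that service order and that IRQ2 itself gets no number because it is the cascade line (the names irq_priority_order and PRIORITY are hypothetical):

```python
def irq_priority_order() -> list[int]:
    """Service order of the usable IRQs, highest priority first."""
    order = []
    for irq in range(8):                  # lines of the first controller
        if irq == 2:
            order.extend(range(8, 16))    # cascade: IRQ8-15 slot in here
        else:
            order.append(irq)
    return order


# Map each IRQ to its 1-based priority number.
PRIORITY = {irq: rank for rank, irq in enumerate(irq_priority_order(), start=1)}

for irq in (6, 7, 8, 13, 15):
    print(f"IRQ{irq}: priority {PRIORITY[irq]}")
```

Under these assumptions the computed numbers agree with the values listed in the individual IRQ entries, for example priority 3 for IRQ8 and priority 15 for IRQ7.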


Direct Memory Access (DMA) Channels
Direct memory access (DMA) channels are system pathways used by many devices to transfer information directly to and from memory. DMA channels are not nearly as "famous" as IRQs as system resources go. This is mostly for a good reason: there are fewer of them and they are used by many fewer devices, and hence they usually cause fewer problems with system setup. However, conflicts on DMA channels can cause very strange system problems and can be very difficult to diagnose. DMAs are used most commonly today by floppy disk drives, tape drives and sound cards.

DMA Channel Function and Operation
This section takes a look at DMA channels and how they work. This includes an explanation of the different types of DMA channels, the DMA controller, and a summary of the different DMA channels used in the PC.

Why DMA Channels Were Invented for Data Transfer
As you know, the processor is the "brain" of the machine, and in many ways it can also be likened to the conductor of an orchestra. In early machines the processor really did almost everything. In addition to running programs it was also responsible for transferring data to and from peripherals. Unfortunately, having the processor perform these transfers is very inefficient, because it then is unable to do anything else.
The invention of DMA enabled the devices to cut out the "middle man", allowing the processor to do other work and the peripherals to transfer data themselves, leading to increased performance. Special channels were created, along with circuitry to control them, that allowed the transfer of information without the processor controlling every aspect of the transfer. This circuitry is normally part of the system chipset on the motherboard.
Note that DMA channels are only on the ISA bus (and EISA and VLB, since they are derivatives of it). PCI devices do not use standard DMA channels at all.

Third-Party and First-Party DMA (Bus Mastering)
Standard DMA is sometimes called "third party" DMA. This refers to the fact that the system DMA controller is actually doing the transfer (the first two parties are the sender and receiver of the transfer). There is also a type of DMA called "first party" DMA. In this situation, the peripheral doing the transfer actually takes control of the system bus to perform the transfer. This is also called bus mastering.
Bus mastering provides much better performance than regular DMA because modern devices have much smarter and faster DMA circuitry built into them than exists in the old standard ISA DMA controller. Newer DMA modes are now available, such as Ultra DMA <../../../hdd/if/ide/std_Ultra.htm> (sometimes called mode 3 DMA or DMA-33), which provide very high transfer rates.

Limitations of Standard DMA
While the use of DMA provided a significant improvement over processor-controlled data transfers, it too eventually reached a point where its performance became a limiting factor. DMA on the ISA bus has been stuck at the same performance level for over 10 years. For old 10 MB XT hard disks, DMA was a top performer. For a modern 8 GB hard disk, transferring multiple megabytes per second, DMA is insufficient.
On newer machines, disks are controlled using either programmed I/O (PIO) or first-party DMA (bus mastering) on the PCI bus <../../buses/types/pci_IDEBM.htm>, and not using the standard ISA DMA that is used for devices like sound cards. Hard disk transfer modes are discussed in detail here <../../../hdd/if/ide/modes.htm>. This type of DMA does not rely on the slow ISA DMA controllers, and allows these high-performance devices the bandwidth they need. In fact, many of the devices that used to use DMA on the ISA bus use bus mastering over the PCI bus for faster performance. This includes newer high-end SCSI cards, and even network and video cards.

DMA Controllers
Standard DMA transfers are managed by the DMA controller, built into the system chipset <../../chip/index.htm> on modern PCs. The original PC and XT had one of these controllers and supported 4 DMA channels, 0 to 3.
Starting with the IBM AT, a second DMA controller was added. Much in the way that the second interrupt controller was cascaded with the first <../irq/func_Controller.htm>, the first DMA controller is cascaded to the second. The difference is that with IRQs, the second controller is cascaded to the first, but with DMAs the first is cascaded to the second. As a result, there are 8 DMAs, from 0 to 7, but DMA 4 is not usable. There is no rerouting as with IRQ2 and IRQ9 here, because all of the original DMAs (0 to 3) are still usable directly.
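The two-controller layout can be written down as a small data structure. This is a sketch for illustration only; the class DmaChannel and the function at_dma_channels are invented names, and the only facts encoded are the ones stated above (channels 0 to 3 on the first controller, 4 to 7 on the second, channel 4 consumed by the cascade).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DmaChannel:
    number: int
    controller: int    # 1 = original PC/XT controller, 2 = controller added with the AT
    is_cascade: bool   # True for the channel that links the two controllers


def at_dma_channels() -> list[DmaChannel]:
    """Build the eight channels of an AT-class machine."""
    return [
        DmaChannel(number=n, controller=1 if n < 4 else 2, is_cascade=(n == 4))
        for n in range(8)
    ]


for ch in at_dma_channels():
    note = "cascade, not usable" if ch.is_cascade else "regular channel"
    print(f"DMA{ch.number}: controller {ch.controller}, {note}")
```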

DMA Channels and the System Bus
All of the DMA channels except channel 4 are accessible to devices on the ISA system bus. Channel 4 is used to cascade the two DMA controllers together. PCI devices do not use standard system DMA channels.
As was the case with IRQs, the second DMA controller was added when the ISA bus was expanded to 16 bits with the creation of the AT. The lines to access these extra DMA channels were placed on the second part of the AT slot that is used by 16-bit cards. This means that only 16-bit cards can access DMA channels 5, 6 or 7. Unfortunately, many devices even today are still only 8-bit cards. You can tell by looking at them and seeing that they only use the first part of the two-part ISA bus connector on the motherboard.
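As a rough illustration of the bus-width restriction, the helper below returns the channels a card could be set to. It is a sketch under stated assumptions: the name channels_for_card is hypothetical, and channels 0 and 4 are left out because the summary table marks them as reserved for memory refresh and the cascade.

```python
LOW_CHANNELS = (1, 2, 3)    # reachable from the 8-bit part of the ISA slot
HIGH_CHANNELS = (5, 6, 7)   # lines exist only on the 16-bit extension


def channels_for_card(bus_bits: int) -> tuple[int, ...]:
    """DMA channels an ISA card of the given width can be set to."""
    if bus_bits == 8:
        return LOW_CHANNELS
    if bus_bits == 16:
        return LOW_CHANNELS + HIGH_CHANNELS
    raise ValueError("ISA cards are either 8-bit or 16-bit")


print(channels_for_card(8))
print(channels_for_card(16))
```

An 8-bit card therefore competes for only three channels, one of which is normally held by the floppy disk controller.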

DMA Request (DRQ) and DMA Acknowledgment (DACK)
Each DMA channel consists of two signals: the DMA request signal (DRQ) and the DMA acknowledgment signal (DACK). Some peripheral cards have separate jumpers for these instead of a single DMA channel jumper. If this is the case, make sure that the DRQ and DACK are set to the same number; otherwise the device won't work. (I wonder what goes through the minds of some peripheral card designers. :^) )

DMA, Multiple Devices and Conflicts
Like interrupts, DMA channels are single-device resources. If two devices try to use the same DMA channel at the same time, information will get mixed up between the two devices trying to use it, and any number of problems can be the result. DMA channel conflicts can be very difficult to diagnose. See here for more details on resource conflicts <../confl.htm>.
It is possible to share a DMA channel among more than one device, but only under limited conditions. In essence, if you have two devices that you seldom use, and that you never use simultaneously, you may be able to have them share a channel. However, this is not the preferred method since it is much more prone to problems than just giving each device its own resource.
One problem area with DMA channels is that most devices want to use DMA channels with numbers 0 to 3 (on the first DMA controller). DMA channels 5 to 7 are relatively unused because they require 16-bit cards. Considering that DMA channel 0 is never available, and DMA 2 is used for the floppy disk controller, that doesn't leave many options. On one of my systems I wanted to set up an ECP parallel port, a tape accelerator and a voice modem in addition to my sound card. I ran out of DMA channels between 1 and 3 very quickly. I still had DMA channels 6 and 7 open but could not use them because all the devices I wanted to use were either on 8-bit cards or wouldn't support the higher numbers for software reasons.
Speaking of the ECP parallel port, this is another new area of concern regarding DMA resource conflicts. Many people don't realize that this high-speed parallel port option requires the use of a DMA channel. (Your BIOS setup program will usually have a setting to select the DMA channel <../../bios/set/periph_ParallelECP.htm>, right under where you enable ECP <../../bios/set/periph_ParallelMode.htm>. This should be a good hint but still a lot of people don't notice this. :^) ) The usual default for this port is DMA 3, which is also used by many other types of devices. The conflict resolution area of the Troubleshooting Expert <../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm> can sometimes help with these situations.
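Checking a planned configuration for the kinds of DMA conflicts described above is easy to do by hand, but it can also be sketched in code. The example below is illustrative only: the device names and channel assignments are made up, and find_dma_conflicts is a hypothetical helper, not part of any real utility.

```python
from collections import defaultdict

# Channels that peripherals should never be assigned to.
RESERVED = {0: "memory refresh", 4: "controller cascade"}


def find_dma_conflicts(assignments: dict[str, int]) -> list[str]:
    """Report reserved channels in use and channels claimed by several devices."""
    by_channel = defaultdict(list)
    for device, channel in assignments.items():
        by_channel[channel].append(device)

    problems = []
    for channel in sorted(by_channel):
        devices = ", ".join(by_channel[channel])
        if channel in RESERVED:
            problems.append(f"DMA{channel} is reserved for {RESERVED[channel]}: {devices}")
        if len(by_channel[channel]) > 1:
            problems.append(f"DMA{channel} is shared by {devices}")
    return problems


# Hypothetical setup: the ECP port and the tape accelerator both default to DMA3.
config = {
    "floppy controller": 2,
    "sound card (low)": 1,
    "sound card (high)": 5,
    "ECP parallel port": 3,
    "tape accelerator": 3,
}
for line in find_dma_conflicts(config):
    print(line)
```

With this sample configuration the only problem reported is the shared use of DMA3.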

Summary of DMA Channels and Their Typical Uses
The table below provides summary information about the 8 DMA channel numbers in a typical PC. You may find this table useful when considering how to configure your system, or for resolving DMA conflicts. For an explanation of the categories, along with more detailed descriptions, see here <num.htm>. To see DMA channel usage organized by device instead of DMA number, see this device resource summary <../config_Summary.htm>.
DMA   Bus Line?     Typical Default Use          Other Common Uses
0     no            Memory refresh               None
1     8/16-bit      Sound card (low DMA)         SCSI host adapters, ECP parallel ports, tape accelerator cards, network cards, voice modems
2     8/16-bit      Floppy disk controller       Tape accelerator cards
3     8/16-bit      None                         ECP parallel ports, SCSI host adapters, tape accelerator cards, sound card (low DMA), network cards, voice modems, hard disk controller on old PC/XT
4     no            None; cascade for DMAs 0-3   None
5     16-bit only   Sound card (high DMA)        SCSI host adapters, network cards
6     16-bit only   None                         Sound cards (high DMA), network cards
7     16-bit only   None                         Sound cards (high DMA), network cards

DMA Channel Details By Number
This section lists each of the 8 DMA channels and provides a full description of what they are, how they are normally used, and any special information that is relevant to them. The general format for each section is as follows:
·         Channel Number: The number of the DMA channel from 0 to 7.
·         Bus Line: Indicates whether or not this DMA channel is available to expansion devices on the system bus. This will say "8/16-bit" for a channel accessible by all expansion devices, "16-bit only" for a channel available only to 16-bit cards, or "No" for a channel reserved for use only by system devices.
·         Typical Default Use: Description of the device or function that normally uses this DMA channel in a regular modern PC.
·         Other Common Uses: This is a list of other devices that commonly either use this channel or offer the use of this channel as one of their options. This list isn't exhaustive because there are a lot of oddball cards out there that may use unusual DMAs.
·         Description: A description of the channel and how it is used, along with any relevant or interesting points about it or its history.
·         Conflicts: A discussion of the likelihood of conflicts with this DMA channel and their likely causes.

DMA0
Channel Number: 0
Bus Line: No
Typical Default Use: Memory (DRAM) Refresh.
Other Common Uses: None; for system use only.
Description: This DMA channel is reserved for use by the internal DRAM refresh circuitry. Dynamic RAM <../../../ram/types_DRAM.htm> (used for system memory on almost all PCs) must be refreshed frequently to make sure that it does not lose its contents. DMA channel 0 is used for this purpose and is not available for use by peripherals.
Conflicts: Most devices stay far away from DMA0, recognizing its use by the system. Beware however, as some devices actually offer DMA0 as an option. For example, some sound cards do. Do not use DMA0 for peripherals. If you have no devices set to use DMA0 but a conflict becomes apparent anyway, it could be a problem with your motherboard.

DMA1
Channel Number: 1
Bus Line: 8/16-bit
Typical Default Use: Low DMA channel for sound card.
Other Common Uses: SCSI host adapters, ECP parallel ports, tape accelerator cards, network cards, voice modems.
Description: This DMA channel is normally taken by the sound card in your PC for its "low" DMA channel. Most sound cards today actually use two DMA channels; one must be chosen from DMAs 1, 2 or 3, while the other can be any free DMA channel (and so is selected from the less-used 5, 6 or 7). DMA1 is also a popular choice for many other peripherals, largely for historical reasons (on the original XT, DMA3 was used for the hard disk so DMA1 was all that was left open for everything else to share).
Conflicts: DMA1 is one of the two most contested channels in the system (the other being DMA3, which is often worse). It is important to watch for conflicts between multiple devices here, particularly if you are using a sound card. It is preferable in general to leave the sound card on DMA1 and move any other devices out of its way, for compatibility with older (poorly written) software that assumes the sound card is on DMA1. Also watch out for ECP parallel port conflicts here. More general solutions to resource conflicts can be found in the conflict resolution area of the Troubleshooting Expert <../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm>.

DMA2
Channel Number: 2
Bus Line: 8/16-bit
Typical Default Use: Floppy disk controller.
Other Common Uses: Tape accelerator cards.
Description: This DMA channel is used on virtually every PC for the floppy disk controller. As such, it is usually not offered as an option for use by most peripherals. Some do offer it as an option however. In particular, tape accelerator cards often offer the use of DMA2 as an option. This is probably because these cards are used for tape drives that run off the floppy interface, and many of them can be set to drive floppy disks themselves.
Conflicts: DMA2 is not often a source of conflicts, as long as you remember not to put any other devices on it if you have a floppy disk controller in your system (which almost everyone does). Beware tape accelerator cards that default to DMA2 for their channel assignment.

DMA3
Channel Number: 3
Bus Line: 8/16-bit
Typical Default Use: None.
Other Common Uses: ECP parallel ports, SCSI host adapters, tape accelerator cards, sound card (low DMA), network cards, voice modems.
Description: This DMA channel is normally the only one free on the first controller (DMAs 0 to 3) when you are using a sound card. As a result, it is probably the "busiest" channel in the PC, with many different devices vying for its services. One of the most common uses of this channel is by ECP parallel ports, which require a DMA channel unlike other parallel port modes. On very old XT systems, DMA channel 3 is used by the hard disk drive.
Conflicts: DMA3 is probably the worst channel in the system for conflicts, because so many devices try to use it. It is important to watch for conflicts between multiple devices here, particularly if you are using a sound card or ECP parallel port. More general solutions to resource conflicts can be found in the conflict resolution area of the Troubleshooting Expert <../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm>.

DMA4
Channel Number: 4
Bus Line: No
Typical Default Use: Cascade for DMA channels 0 to 3.
Other Common Uses: None; for system use only.
Description: This DMA channel is reserved for cascading the two DMA controllers on systems with a 16-bit ISA bus. It is not available for use by peripherals.
Conflicts: There should not be any conflicts on this channel; any problems with it indicate a possible system hardware failure.

DMA5
Channel Number: 5
Bus Line: 16-bit only
Typical Default Use: High DMA channel for sound card.
Other Common Uses: SCSI host adapters, network cards.
Description: This DMA channel is normally taken by the sound card in your PC for its "high" DMA channel. Most sound cards today actually use two DMA channels; one must be chosen from DMAs 1, 2 or 3 (the "low" channel), while the other is selected from a high-numbered channel like this one. Some network cards also use this channel, though others don't use DMA at all.
Conflicts: Few conflicts arise with this channel because there are relatively few devices that can use DMA channels 5, 6 or 7.

DMA6
Channel Number: 6
Bus Line: 16-bit only
Typical Default Use: None.
Other Common Uses: Sound cards (high DMA), network cards.
Description: This DMA channel is normally open and available for use by peripherals. It is one of the least used channels in the system and is an alternative location for the "high" sound card DMA channel or other devices.
Conflicts: Few conflicts arise with this channel because there are relatively few devices that can use DMA channels 5, 6 or 7.

DMA7
Channel Number: 7
Bus Line: 16-bit only
Typical Default Use: None.
Other Common Uses: Sound cards (high DMA), network cards.
Description: This DMA channel is normally open and available for use by peripherals. It is one of the least used channels in the system and is an alternative location for the "high" sound card DMA channel or other devices.
Conflicts: Few conflicts arise with this channel because there are relatively few devices that can use DMA channels 5, 6 or 7.

The PC Guide (http://www.PCGuide.com)

Input / Output (I/O) Addresses
Input/output addresses (usually called I/O addresses for short) are resources used by virtually every device in the computer. Conceptually, they are very simple; they represent locations in memory that are designated for use by various devices to exchange information between themselves and the rest of the PC.
Note: I/O addresses are referred to in hexadecimal notation. See here for an explanation of what this means <../../../intro/works/comput_Math.htm>, if you are not familiar with it.

Memory-Mapped I/O
You can think of I/O addresses like a bunch of small two-way "mailboxes" in the system's memory. Take for example a communications (COM) port that has a modem connected to it. When information is received by the modem, it needs to get this information into the PC. Where does it put the data it pulls off the phone line?
One answer to this problem is to give each device its own small area of memory to work with. This is called memory-mapped I/O. When the modem gets a byte of data it sends it over the COM port, and it shows up in the COM port's designated I/O address space. When the CPU is ready to process the data, it knows where to look to find it. When it later wants to send information over the modem, it uses this address again (or another one near it). This is a very simple way of dealing with the problem of information exchange between devices.

I/O Address Space Width
Unlike IRQs and DMA channels, which are of uniform size and normally assigned one per device--sound cards use more than one because they are really many devices wrapped into one package--I/O addresses vary in size. The reason is simple: some devices (e.g., network cards) have much more information to move around than others (e.g., keyboards).
The size of the I/O address is also in some cases dictated by the design of the card and (as usual) compatibility reasons with older devices. Most devices use an I/O address space of 4, 8 or 16 bytes; some use as few as 1 byte and others as many as 32 or more. The wide variance in the size of the I/O addresses can make it difficult to determine and resolve resource conflicts, because often I/O addresses are referred to only by the first byte of the I/O address.
For example, people may say to "put your network card at 360h", which may seem not to conflict with your LPT1 parallel port at address 378h. In fact many network cards take up 32 bytes for I/O; this means they use up 360-37Fh, which totally overlaps with the parallel port (378-37Fh). The I/O address summary map helps you to see which I/O addresses are most used, and to visualize and avoid potential conflicts.
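The arithmetic behind this example can be checked with a short sketch. The helper names io_range and overlap are hypothetical; the base addresses and sizes are the ones quoted in the paragraph above.

```python
def io_range(base: int, size: int) -> range:
    """Addresses occupied by a device that starts at base and spans size bytes."""
    return range(base, base + size)


def overlap(a: range, b: range) -> range:
    """Addresses claimed by both devices (an empty range if there are none)."""
    return range(max(a.start, b.start), min(a.stop, b.stop))


network_card = io_range(0x360, 32)   # 360-37Fh
lpt1 = io_range(0x378, 8)            # 378-37Fh

shared = overlap(network_card, lpt1)
if shared:
    print(f"Conflict at {shared.start:X}-{shared.stop - 1:X}h")
else:
    print("No conflict")
```

Comparing only the starting addresses (360h and 378h) would miss the problem; the comparison has to be made on the full ranges.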

I/O Addresses, Multiple Devices and Conflicts
I/O addresses, like other system resources, are normally used only by single devices. Having multiple devices try to use the same address would cause information to get mixed up and overwritten, sort of like having two people share a mailbox (where none of the envelopes had anything printed on them. :^) )
There are some unusual exceptions to this however, mostly for historical reasons. They are discussed in the next section where individual addresses are reviewed. One of the problems with I/O addresses and conflicts is simply keeping track of them all. They can be quite confusing to keep straight, particularly since different devices use different sized address spaces.
I/O addresses suffer from the same problem that IRQs and DMA channels do: many conflicts occur not because there aren't enough I/O addresses to go around, but because they aren't allocated or spaced out in an organized way. Too many devices attempt to use the same addresses, or have too few different configuration options to allow them all to find a place to use without getting in each other's way. This is largely due to historical reasons.
One additional note about parallel ports: the I/O addresses used for the different parallel ports (LPT1, LPT2, LPT3) are not universal. Originally IBM defined different defaults for monochrome-based PCs and for color PCs. Of course, all new systems have been color for many years, but even some new systems still default LPT1 to 3BCh. Here is how the two different labeling schemes typically work (see the section on logical devices <logic.htm> for more details):
Port     "Monochrome" Systems     "Color" Systems
LPT1     3BC-3BFh                 378-37Fh
LPT2     378-37Fh                 278-27Fh
LPT3     278-27Fh                 --
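Because the same base address can carry a different port name under each scheme, it can help to see the table as a lookup. The following is a minimal sketch; LPT_BASES and lpt_name are hypothetical names, and the addresses are taken directly from the table above.

```python
from typing import Optional

# Base I/O address of each parallel port, in LPT1, LPT2, LPT3 order.
LPT_BASES = {
    "monochrome": [0x3BC, 0x378, 0x278],
    "color": [0x378, 0x278],
}


def lpt_name(base: int, scheme: str) -> Optional[str]:
    """Name of the port at the given base address, or None if the scheme has no port there."""
    bases = LPT_BASES[scheme]
    if base not in bases:
        return None
    return f"LPT{bases.index(base) + 1}"


print(lpt_name(0x378, "monochrome"))
print(lpt_name(0x378, "color"))
```

The two calls return different names for the same address, which is exactly why a port described only as "LPT1" may not be where you expect it.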

I/O Address Details By Number
Here I describe some of the more interesting I/O addresses in use in the typical PC. Of particular interest are those where conflicts are likely to occur, due to a large number of devices using the address or offering it as an option. A complete list of I/O addresses is provided in the summary in the next section:
·         060h and 064h: These two addresses are used by the keyboard controller, which operates both the keyboard and the PS/2 style mouse (on systems that use one).
·         130-14Fh and 140-15Fh: These addresses are sometimes offered as options for SCSI host adapters. Note that these options partially overlap (from 140-14Fh).
·         220-22Fh: This is the default address for many sound cards. It is also an option for some SCSI host adapters (first 16 bytes).
·         240-24Fh: This is an optional address for sound cards and network cards (first 16 bytes for NE2000 cards).
·         260-26Fh and 270-27Fh: This is an optional address for sound cards and network cards. NE2000-compatible network cards take 32 bytes; if set to use this I/O address, they will conflict with several system devices as well as the I/O address for either LPT2 or LPT3 in the 270-27Fh area.
·         280-28Fh: This is an optional address for sound cards and network cards (first 16 bytes for NE2000 cards).
·         300-30Fh: This is the default for many network cards (NE2000 cards extend to 31Fh). 300-301h is also an option for the MIDI port on many sound cards.
·         320-32Fh and 330-33Fh: This is a busy area in the I/O address map. First, 330-331h is the default for the MIDI port on many sound cards. 320-33Fh is an option for some NE2000-compatible network cards, which will conflict with the MIDI port at this setting. Some SCSI host adapters also offer 330-34Fh as an option. Finally, the old PC/XT hard disk controller uses 320-323h.
·         340-34Fh: Optional areas for several device types overlap here, including two options for SCSI host adapters (330-34Fh and 340-35Fh) as well as network cards.
·         360-36Fh and 370-37Fh: This is another "high traffic" area. 378-37Fh is used on most systems for the first parallel port, and 376-377h is used for the secondary IDE controller's slave drive. These can conflict with an NE2000-compatible network card placed at location 360h. Tape accelerator cards often default to 370h, which will also conflict with a network card placed at 360h.
·         3B0-3BBh and 3C0-3DFh: These are used by VGA video adapters. They take all of the areas originally assigned for monochrome cards (3B0-3BBh), CGA adapters (3D0-3DFh) and EGA adapters (3C0-3CFh).
·         3E8-3EFh: There is a potential conflict here in locations 3EE-3EFh if you are using a third serial port (COM3) and a tertiary IDE controller.
·         3F0-3F7h: There is actually a "standard" resource conflict here: the floppy disk controller and the slave drive on the primary IDE controller "share" locations 3F6-3F7h. These devices are actually both present in many systems. Fortunately, this conflict (which exists for historical reasons) is fairly well known and compensated for, so it will not result in problems in a typical system. Note that some tape accelerator cards also offer the use of 3F0h as an option, which will conflict with the floppy disk controller.
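The overlaps called out in the list above can be checked mechanically. The following is a minimal sketch, not a complete resource checker: the device list is a small hypothetical sample built from ranges mentioned in this section, and the function names `overlap` and `find_conflicts` are illustrative only.

```python
from itertools import combinations

# Hypothetical sample of devices; each range is (first, last) inclusive,
# using addresses quoted in the list above.
OPTIONS = {
    "SCSI host adapter (130-14Fh)": (0x130, 0x14F),
    "SCSI host adapter (140-15Fh)": (0x140, 0x15F),
    "NE2000 network card (320-33Fh)": (0x320, 0x33F),
    "Sound card MIDI port (330-331h)": (0x330, 0x331),
    "COM2 (2F8-2FFh)": (0x2F8, 0x2FF),
}

def overlap(a, b):
    """Return the shared (first, last) part of two inclusive ranges, or None."""
    first = max(a[0], b[0])
    last = min(a[1], b[1])
    return (first, last) if first <= last else None

def find_conflicts(devices):
    """List every pair of devices whose I/O ranges share at least one address."""
    conflicts = []
    for (name_a, range_a), (name_b, range_b) in combinations(devices.items(), 2):
        shared = overlap(range_a, range_b)
        if shared:
            conflicts.append((name_a, name_b, shared))
    return conflicts

for name_a, name_b, (first, last) in find_conflicts(OPTIONS):
    print(f"{name_a} <-> {name_b}: {first:03X}-{last:03X}h")
```

With this sample, the two SCSI options are reported as sharing 140-14Fh, matching the note in the list, and the NE2000 card at 320h is reported as colliding with the MIDI port at 330-331h.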

I/O Address Summary Map
The table below shows the I/O addresses from 000 to 3FFh, along with the devices that typically use them. This table is slightly different than the ones that show default and optional use of IRQs and DMA channels. There are many different addresses of different sizes, so in order to keep the table a manageable size, it was made somewhat two-dimensional. Each row is 16 bytes and is divided into four columns; the first is for bytes 0 to 3, the second 4 to 7, the third 8 to B and the fourth C to F. So to find address 3BCh, you would look in the fourth column of row "3B0-3BFh".
Items in the table in bold print represent standard devices in a typical PC configuration. Items in regular print represent optional devices or optional locations for addresses of standard devices. Blank spaces are areas that are open. Multiple lines are used to show multiple items that go in the same address space. Where you see two or more items overlapping in the same address space, there is the potential for a resource conflict.
To see I/O address usage organized by device rather than by address, see this device resource summary <config_Summary.htm>.
Addr.        First Quad (xx0h to xx3h)                        Second Quad (xx4h to xx7h)                     Third Quad (xx8h to xxBh)                Fourth Quad (xxCh to xxFh)                   
000-00Fh   DMA controller, channels 0 to 3                                                                                  
010-01Fh   (System use)                                                                                                      
020-02Fh   Interrupt controller #1 (020-021h)            (System use)                                                  
030-03Fh   (System use)                                                                                                      
040-04Fh   System timers          (System use)                                                                        
050-05Fh   (System use)                                                                                                      
060-06Fh   Keyboard & PS/2 mouse (060h), Speaker (061h)                       Keyboard & PS/2 mouse (064h)                             
070-07Fh   RTC/CMOS, NMI (070-071h)                   (System use)                                                  
080-08Fh   DMA page register 0-2 (081-083h)            DMA page register 3 (087h)                     DMA page registers 4-6 (089-08Bh)                DMA page register 7 (08Fh)                    
090-09Fh   (System use)                                                                                                      
0A0-0AFh  Interrupt controller #2 (0A0-0A1h)           (System use)
0B0-0BFh  (System use)                                                                                                      
0C0-0CFh  DMA controller, channels 4-7 (0C0-0DFh, bytes 1-16)                                                                 
0D0-0DFh  DMA controller, channels 4-7 (0C0-0DFh, bytes 17-32)                                                               
0E0-0EFh  (System use)
0F0-0FFh  Floating point unit (FPU/NPU/Math coprocessor)                                                                      
100-10Fh   (System use)                                                                                                      
110-11Fh   (System use)                                                                                                      
120-12Fh   (System use)                                                                                                      
130-13Fh   SCSI host adapter, (130-14Fh, bytes 1 to 16)                                                                               
140-14Fh   SCSI host adapter, (130-14Fh, bytes 17 to 32)                                                                             
                SCSI host adapter, (140-15Fh, bytes 1 to 16)                                                                               
150-15Fh   SCSI host adapter, (140-15Fh, bytes 17 to 32)                                                                             
160-16Fh                                                               Quaternary IDE controller, master drive              
170-17Fh   Secondary IDE controller, master drive                                                                       
180-18Fh                                                                                                                          
190-19Fh                                                                                                                          
1A0-1AFh                                                                                                                         
1B0-1BFh                                                                                                                         
1C0-1CFh                                                                                                                         
1D0-1DFh                                                                                                                         
1E0-1EFh                                                              Tertiary IDE controller, master drive                  
1F0-1FFh  Primary IDE controller, master drive                                                                          
200-20Fh   Joystick port                                                                        (System use, 20C-20Dh)    
210-21Fh                                                                                                                          
220-22Fh   Sound card                                                                                                       
                SCSI host adapter, (220-23Fh, bytes 1 to 16)                                                                               
230-23Fh   SCSI host adapter, (220-23Fh, bytes 17 to 32)                                                                             
240-24Fh   Sound card                                                                                                        
                Non-NE2000 network card                                                                                           
                NE2000 network card (240-25Fh, bytes 1 to 16)                                                                          
250-25Fh   NE2000 network card (240-25Fh, bytes 17 to 32)                                                                         
260-26Fh   Sound card                                                                                                        
                Non-NE2000 network card                                                                                           
                NE2000 network card (260-27Fh, bytes 1 to 16)                                                                          
270-27Fh   (System use)             Plug and Play system devices                   LPT2 (second parallel port) (color systems)               
                                                                            LPT3 (third parallel port) (monochrome systems)             
                NE2000 network card (260-27Fh, bytes 17 to 32)                                                                         
280-28Fh   Sound card                                                                                                        
                Non-NE2000 network card                                                                                           
                NE2000 network card (280-29Fh, bytes 1 to 16)                                                                          
290-29Fh   NE2000 network card (280-29Fh, bytes 17 to 32)                                                                         
2A0-2AFh  Non-NE2000 network card
                NE2000 network card (2A0-2BFh, bytes 1 to 16)                                                                         
2B0-2BFh  NE2000 network card (2A0-2BFh, bytes 17 to 32)                                                                        
2C0-2CFh                                                                                                                         
2D0-2DFh                                                                                                                         
2E0-2EFh                                                              COM4 (fourth serial port)
2F0-2FFh                                                              COM2 (second serial port)                              
300-30Fh   Sound card (MIDI port) (300-301h)                                                                               
                Non-NE2000 network card                                                                                           
                NE2000 network card (300-31Fh, bytes 1 to 16)                                                                          
310-31Fh   NE2000 network card (300-31Fh, bytes 17 to 32)                                                                         
320-32Fh   Non-NE2000 network card                                                                                           
                NE2000 network card (320-33Fh, bytes 1 to 16)                                                                          
                Hard disk controller on old PC/XT                                                                                 
330-33Fh   Sound card (MIDI port) (330-331h)                                                                              
                NE2000 network card (320-33Fh, bytes 17 to 32)                                                                         
                SCSI host adapter, (330-34Fh, bytes 1 to 16)                                                                               
340-34Fh   SCSI host adapter, (330-34Fh, bytes 17 to 32)                                                                             
                SCSI host adapter, (340-35Fh, bytes 1 to 16)                                                                               
                Non-NE2000 network card                                                                                           
                NE2000 network card (340-35Fh, bytes 1 to 16)                                                                          
350-35Fh   SCSI host adapter, (340-35Fh, bytes 17 to 32)                                                                             
                NE2000 network card (340-35Fh, bytes 17 to 32)                                                                         
360-36Fh   Tape accelerator card (360h)                                                                                 Quaternary IDE controller (slave drive) (36E-36Fh)                     
                Non-NE2000 network card                                                                                           
                NE2000 network card (360-37Fh, bytes 1 to 16)                                                                          
370-37Fh   Tape accelerator card (370h)                     Secondary IDE controller (slave drive) (376-377h)         LPT1 (first parallel port) (color systems)                                              
                                                                            LPT2 (second parallel port) (monochrome systems)                      
                NE2000 network card (360-37Fh, bytes 17 to 32)                                                                         
380-38Fh                                                               Sound card (FM synthesizer)                           
390-39Fh                                                                                                                          
3A0-3AFh                                                                                                                         
3B0-3BFh  VGA/Monochrome Video                                                                                    LPT1 (first parallel port) (monochrome systems)             
3C0-3CFh  VGA/EGA Video                                                                                               
3D0-3DFh  VGA/CGA Video                                                                                               
3E0-3EFh  Tape accelerator card (3E0h)                                                   COM3 (third serial port)                 
                                                                                                          Tertiary IDE controller (slave drive) (3EE-3EFh)         
3F0-3FFh  Floppy disk controller                                                           COM1 (first serial port)               
                Tape accelerator card (3F0h)                     Primary IDE controller (slave drive) (3F6-3F7h)                                   
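
The row-and-quad layout of this table amounts to simple arithmetic on the address. As a minimal sketch (the function name `locate` and the quad labels are illustrative, not part of any standard tool), the lookup can be written as:

```python
QUADS = ("first", "second", "third", "fourth")

def locate(address):
    """Return (row label, quad name) for an I/O address in the 000-3FFh map."""
    if not 0x000 <= address <= 0x3FF:
        raise ValueError("address is outside the 000-3FFh range covered by the table")
    row_start = address & ~0xF    # clear the low four bits: start of the 16-byte row
    quad = (address & 0xF) // 4   # 0-3 -> 0, 4-7 -> 1, 8-B -> 2, C-F -> 3
    return f"{row_start:03X}-{row_start | 0xF:03X}h", QUADS[quad]

print(locate(0x3BC))
```

For 3BCh this gives the row "3B0-3BFh" and the fourth quad, which is where LPT1 on "monochrome" systems appears in the table.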

The PC Guide (http://www.PCGuide.com)

Logical Devices
Some devices have both a physical address and a logical name. The two most commonly-encountered device types that work this way are serial ports (called COM1 to COM4) and parallel ports (LPT1 to LPT3). Disk drives are actually labeled this way too (A:, C:, etc.), even though most people don't think of them the same way. The purpose of this logical labeling is to make it easier to refer to devices without having to know their specific addresses. It's much simpler for software to refer to a COM port by name than by address.

Logical Name Assignment
Logical device names are assigned by the system BIOS during the power-on self test, when the system is booted up. The BIOS searches for devices by I/O address in a predefined order, and assigns them a logical name dynamically, in numerical order. The following are the normal default assignments for COM ports, in order:
Port         I/O Address      Default IRQ    
COM1      3F8-3FFh         4                    
COM2      2F8-2FFh         3                    
COM3      3E8-3EFh         4                    
COM4      2E8-2EFh         3                    
For parallel ports it is slightly more complicated. Originally IBM defined different defaults for monochrome-based PCs and for color PCs. Of course, all new systems have been color for many years, but even some new systems still put LPT1 at 3BCh. Here is how the two different labeling schemes typically work:
Port     "Monochrome" Systems     "Color" Systems     Default IRQ
LPT1     3BC-3BFh                 378-37Fh            7
LPT2     378-37Fh                 278-27Fh            5
LPT3     278-27Fh                 --                  5
Most new systems have LPT1 at 378-37Fh. Note that the sequences are really the same, in a way; on a "monochrome" system if you don't put a device at 3BC-3BFh but instead put it at 378-37Fh, the BIOS will make that LPT1 since it didn't find an LPT1 at 3BCh.
Tip: If you want to run three parallel ports (for some reason) you should put LPT1 at 3BCh. By default most new systems put LPT1 at 378h and will not support three parallel ports.
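The dynamic naming described above can be modelled in a few lines. This is only a sketch of the behavior as described in this section, not real BIOS code: the search orders are taken from the two tables above, and `assign_names` and `show` are hypothetical helper names.

```python
# Base addresses in the search order given by the tables above.
COM_SEARCH_ORDER = (0x3F8, 0x2F8, 0x3E8, 0x2E8)
LPT_SEARCH_ORDER = (0x3BC, 0x378, 0x278)

def assign_names(prefix, search_order, present):
    """Name the ports found at the `present` base addresses, in search order."""
    found = [base for base in search_order if base in present]
    return {f"{prefix}{n}": base for n, base in enumerate(found, start=1)}

def show(names):
    """Format the base addresses in the hex notation used in this section."""
    return {name: f"{base:03X}h" for name, base in names.items()}

# A single port at 378h becomes LPT1, because nothing was found at 3BCh.
print(show(assign_names("LPT", LPT_SEARCH_ORDER, {0x378})))
# Three ports are only possible when one of them sits at 3BCh.
print(show(assign_names("LPT", LPT_SEARCH_ORDER, {0x3BC, 0x378, 0x278})))
print(show(assign_names("COM", COM_SEARCH_ORDER, {0x3F8, 0x2F8})))
```

Because names are handed out in order of discovery, the "monochrome" and "color" columns fall out of the same search: the "color" column is simply what happens when no port is present at 3BCh.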

Problems With Logical Device Names
Most of the problems that arise with logical device names occur when devices are added to or removed from the system. The most common problem is software that refuses to work because the logical name assigned to a physical device has changed as a result.
Most software refers to a device by its name such as "LPT1". However, the names are assigned dynamically by the BIOS at boot time, when it searches your system to see what hardware it has. If you originally had "LPT1" at 378-37Fh and you add a new parallel port and give it the address 3BC-3BFh, then the new one will now be LPT1 and your old port will become LPT2. This is because, as mentioned before, the ports are labeled dynamically based on a predefined search order, and 3BC is looked at first. If this happens, all of your software that used to print to LPT1 will now print to LPT2, and you will either have to switch the devices' connections to the PC, or change the software.
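The renaming scenario above can be sketched as a before-and-after comparison. This is an illustrative model under the search order described in this section; `lpt_names` and `renamed_ports` are hypothetical names, and a real BIOS does not expose such a function.

```python
LPT_SEARCH_ORDER = (0x3BC, 0x378, 0x278)  # base addresses, in the order described above

def lpt_names(present):
    """Map each detected base address to the LPT name it would receive."""
    found = [base for base in LPT_SEARCH_ORDER if base in present]
    return {base: f"LPT{n}" for n, base in enumerate(found, start=1)}

def renamed_ports(before, after):
    """Return {base: (old name, new name)} for ports whose name changes."""
    old, new = lpt_names(before), lpt_names(after)
    return {base: (old[base], new[base])
            for base in old if base in new and old[base] != new[base]}

# Start with one port at 378h, then add a second port at 3BCh.
changes = renamed_ports(before={0x378}, after={0x378, 0x3BC})
for base, (old_name, new_name) in changes.items():
    print(f"port at {base:03X}h: {old_name} -> {new_name}")
```

In this model, adding the new port at 278h instead of 3BCh produces no renames, since 278h comes after 378h in the search order.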

Memory Addresses and Device BIOSes
While not really considered a standard system resource like the others mentioned in this section, a brief discussion of memory addresses is warranted here. Some devices, in addition to using interrupt lines, DMA channels and/or I/O addresses, require some space in the upper memory area <../../ram/logic_UMA.htm> for their own use. As with other resources, problems and conflicts can result if you attempt to overlap two such devices, or try to use the memory for programs when an adapter needs it.
The devices that use a memory area generally use it for their own BIOS, which contains code to control the device and is invoked by direct calls or by calls from the system BIOS. These device BIOSes are "mapped" into the upper memory area in particular places; the system BIOS looks for them there and executes them if found. This is part of the system boot process <../bios/boot_Sequence.htm>.
There are three standard BIOSes present in most systems and located pretty much at the same place:
·         System BIOS: The main system BIOS is located in a 64KB block of memory from F0000h to FFFFFh.
·         VGA Video BIOS: This is the BIOS that controls your video card. It is normally in the 32KB block from C0000h to C7FFFh.
·         IDE Hard Disk BIOS: The BIOS that controls your IDE hard disk, if you have IDE in your system (which most do), is located from C8000h to CBFFFh.
The most common add-in device to use a dedicated memory address space for its own BIOS is a SCSI host adapter. This may default to C8000-CBFFFh, which will conflict with an IDE drive that is also in the system, but can be configured to use a different address space instead, such as D0000-D7FFFh. In addition, network cards that have the ability to boot the computer over the network typically also use a memory area for the boot BIOS.
Warning: Many systems use a memory manager (like EMM386) to allow the unused system RAM in the upper memory area to be used by programs, to save conventional memory (the standard 640KB normally available to programs). If your system does this and you add a device that needs some of the upper memory area for its BIOS, you may have to add a parameter to the memory manager to tell it not to use the space that the device needs. See here for more details <../../ram/logic_UMB.htm>.
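As an illustration of the parameter mentioned in the warning above, EMM386 accepts an exclusion switch of the form X=mmmm-nnnn, where the values are real-mode segment addresses (the linear address divided by 16). The sketch below only builds that switch text for a given address range; the helper name `exclusion_switch` is hypothetical, and the exact options supported should be checked against the documentation for the memory manager actually in use.

```python
def exclusion_switch(first, last):
    """Build an EMM386-style X= switch for a linear address range in upper memory."""
    if not 0xA0000 <= first <= last <= 0xFFFFF:
        raise ValueError("range must lie within the upper memory area (A0000-FFFFFh)")
    # A real-mode segment value is the linear address divided by 16.
    return f"X={first >> 4:04X}-{last >> 4:04X}"

# A SCSI host adapter BIOS moved to D0000-D7FFFh, as in the example above.
print(exclusion_switch(0xD0000, 0xD7FFF))
```

For the D0000-D7FFFh range this produces X=D000-D7FF, which would be appended to the memory manager's line in CONFIG.SYS.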
