The system cache is responsible for a great deal of the system performance improvement of
today's PCs. The cache is a buffer of sorts between the very fast processor and
the relatively slow memory that serves it. (The memory is not really slow; it's just that the processor is
much faster.) The presence of the cache allows the processor to do its work
while waiting for memory far less often than it otherwise would.
There are in fact
several different "layers" of cache in a modern PC, each acting as a
buffer for recently-used information to improve performance, but when "the
cache" is mentioned without qualifiers, it normally refers to the "secondary"
or "level 2" cache that is placed between the processor and system
RAM. The various levels of cache are all covered in the discussion of the
theory and operation behind cache, since many of the principles are the same.
However, most of the focus of this section is on the level 2 system cache.
Role of Cache in the PC
In early PCs, the
various components had one thing in common: they were all really slow :^). The processor was running at 8
MHz or less, and taking many clock cycles to get anything done. It wasn't very
often that the processor would be held up waiting for the system memory,
because even though the memory was slow, the processor wasn't a speed demon
either. In fact, on some machines the memory was faster than the processor.
In the 15 or so
years since the invention of the PC, every component has increased in speed a
great deal. However, some have increased far faster than others. Memory, and
memory subsystems, are now much faster than they were, by a factor of 10 or
more. However, a current top of the line processor has performance over 1,000
times that of the original IBM PC!
This disparity in
speed growth has left us with processors that run much faster than everything
else in the computer. This means that one of the key goals in modern system
design is to ensure that to whatever extent possible, the processor is not
slowed down by the storage devices it works with. Slowdowns mean wasted
processor cycles, where the CPU can't do anything because it is sitting and
waiting for information it needs. We want it so that when the processor needs
something from memory, it gets it as soon as possible.
The best way to
keep the processor from having to wait is to make everything that it uses as
fast as it is. Wouldn't it be best just to have memory, system buses, hard
disks and CD-ROM drives that just went as fast as the processor? Of course it
would, but there's this little problem called "technology" that gets
in the way. :^)
Actually, it's
technology and cost; a modern 2 GB hard disk costs less than $200 and has a
latency (access time) of about 10 milliseconds. You could implement a 2 GB hard
disk in such a way that it would access information many times faster; but it
would cost thousands, if not tens of thousands of dollars. Similarly, the
highest speed SRAM <../../ram/types_SRAM.htm>
available is much closer to the speed of the processor than the DRAM <../../ram/types_DRAM.htm> we use for
system memory, but it is cost prohibitive in most cases to put 32 or 64 MB of
it in a PC.
There is a good
compromise, however. Instead of trying to make the whole 64 MB out of
this faster, expensive memory, you make a smaller piece, say 256 KB. Then you
find a smart algorithm (process) that allows you to use this 256 KB in such a
way that you get almost as much benefit from it as you would if the whole 64 MB
was made from the faster memory. How do you do this? The short answer is by
using this small cache of 256 KB to hold the information most recently used by
the processor. Computer science shows that in general, a processor is much more
likely to need again information it has recently used, compared to a random
piece of information in memory. This is the principle behind caching.
"Layers" of Cache
There are in fact
many layers of cache in a modern PC. This does not even include looking at
caches included on some peripherals, such as hard disks. Each layer is closer
to the processor and faster than the layer below it. Each layer also caches the
layers below it, due to its increased speed relative to the lower levels:
Level                  Devices Cached
Level 1 Cache          Level 2 Cache, System RAM, Hard Disk / CD-ROM
Level 2 Cache          System RAM, Hard Disk / CD-ROM
System RAM             Hard Disk / CD-ROM
Hard Disk / CD-ROM     --
What happens in general terms is this. The
processor requests a piece of information. The first place it looks is in the
level 1 cache, since it is the fastest. If it finds it there (called a hit on the cache), great; it uses it
with no performance delay. If not, it's a miss
and the level 2 cache is searched. If it finds it there (level 2
"hit"), it is able to carry on with relatively little delay.
Otherwise, it must issue a request to read it from the system RAM. The system
RAM may in turn either have the information available or have to get it from
the still slower hard disk or CD-ROM. The mechanics of how the processor
(really the chipset controlling the cache and memory) "looks" for the
information in these various places is discussed
here <func.htm>.
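The search order just described can be sketched in a few lines of Python. This is only an illustration: the cycle counts in LEVELS are made-up round numbers (an assumption, not measurements), and the names LEVELS, lookup and contents are hypothetical.

```python
# Hypothetical cost of searching each level, in processor clock cycles.
# These figures are illustrative assumptions, not measured values.
LEVELS = [
    ("L1 cache", 1),
    ("L2 cache", 10),
    ("system RAM", 100),
    ("hard disk", 2_000_000),
]

def lookup(address, contents):
    """Return (level_name, total_cycles) for the first level holding address.

    contents maps a level name to the set of addresses it currently holds.
    The cost of every level searched before the hit is added to the total.
    """
    cycles = 0
    for name, cost in LEVELS:
        cycles += cost
        if address in contents.get(name, set()):
            return name, cycles
    raise KeyError(address)

# Each lower level holds everything the levels above it hold, plus more.
contents = {
    "L1 cache": {1},
    "L2 cache": {1, 2},
    "system RAM": {1, 2, 3},
    "hard disk": {1, 2, 3, 4},
}

print(lookup(1, contents))  # level 1 hit
print(lookup(2, contents))  # level 1 miss, level 2 hit
print(lookup(4, contents))  # has to go all the way to the disk
```

Even with invented numbers, the pattern is the point: each miss adds the cost of the next, slower layer.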
It is important to realize just how slow some of
these devices are compared to the processor. Even the fastest hard disks have
an access time measuring around 10 milliseconds. If it has to wait 10
milliseconds, a 200 MHz processor will waste 2 million clock cycles! And
CD-ROMs are generally at least 10 times slower. This is why using caches to
avoid accesses to these slow devices is so crucial.
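The arithmetic behind the 2 million figure is easy to check. The short Python sketch below uses a hypothetical helper, wasted_cycles, and takes the CD-ROM access time as 100 milliseconds (ten times the hard disk figure quoted above).

```python
def wasted_cycles(clock_mhz, wait_ms):
    """Clock cycles that pass while the processor waits.

    1 MHz is one million cycles per second, and 1 ms is one thousandth of a
    second, so each MHz contributes 1,000 cycles per millisecond of waiting.
    """
    return clock_mhz * 1_000 * wait_ms

print(wasted_cycles(200, 10))    # hard disk at 10 ms: 2000000
print(wasted_cycles(200, 100))   # CD-ROM at 100 ms:   20000000
```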
Caching actually goes even beyond the level of the
hardware. For example, your web browser uses caching itself; in fact, it uses two
levels of caching! Since loading a web page over the Internet is very slow for
most people, the browser will hold recently-accessed pages to save it having to
re-access them. It checks first in its memory cache and then in its disk cache
to see if it already has a copy of the page you want. Only if it does not find
the page will it actually go to the Internet to retrieve it.
Level 1 (Primary) Cache
Level 1 or primary cache is the fastest memory on
the PC. It is, in fact, built directly into the processor itself. This cache is
very small, generally from 8 KB to 64 KB, but it is extremely fast; it runs at
the same speed as the processor. If the processor requests information and can
find it in the level 1 cache, that is the best case, because the information is
there immediately and the system does not have to wait. The level 1 cache is discussed in more detail here
<../../cpu/arch/int/comp_Cache.htm>, in the section on
processors.
Note: Level 1 cache is also sometimes called "internal" cache since it
resides within the processor.
Level 2 (Secondary) Cache
The level 2 cache is a secondary cache to the
level 1 cache, and is larger and slightly slower. It is used to catch recent
accesses that are not caught by the level 1 cache, and is usually 64 KB to 2 MB
in size. Level 2 cache is usually found either on the motherboard or a
daughterboard that inserts into the motherboard. Pentium Pro processors
actually have the level 2 cache in the same package as the processor itself
(though it isn't in the same circuit where the processor and level 1 cache are)
which means it runs much faster than level 2 cache that is separate and resides
on the motherboard. Pentium II processors are in the middle; their cache runs
at half the speed of the CPU.
Note: Level 2 cache is also sometimes called "external" cache since it
resides outside the processor. (Even on Pentium Pros... it is on a separate
chip in the same package as the processor.)
Disk Cache
A disk cache is a portion of system memory used to
cache reads and writes to the hard disk. In some ways this is the most
important type of cache on the PC, because the greatest differential in speed
between the layers mentioned here is between the system RAM and the hard disk.
While the system RAM is slightly slower than the level 1 or level 2 cache, the
hard disk is much slower than the
system RAM.
Unlike the level 1 and level 2 cache memory, which
are entirely devoted to caching, system RAM is used partially for caching but
of course for other purposes as well. Disk caches are usually implemented using
software (like DOS's SmartDrive). They are
discussed in more detail in the section on hard disk performance <../../hdd/perf/ext_Caching.htm>.
Peripheral Cache
Much like the hard disk, other devices can be
cached using the system RAM as well. CD-ROMs are the most common device cached
other than hard disks, particularly due to their very
slow initial access time <../../cd/perf_Access.htm>, measured
in the tens to hundreds of milliseconds (which is an eternity to a computer).
In fact, in some cases CD-ROM drives are cached to the hard disk, since the
hard disk, despite its slow speed, is still much faster than a CD-ROM drive is.
Function and Operation of the System Cache
This section discusses the principles behind the
design of cache memory, and explains how the secondary (level 2) cache works in
detail. This will give you a much better understanding of how the cache works
and what the issues are in its design--at least I hope it will, because that
was my primary goal in writing this. I was frustrated as I put the site
together with my inability to find anything on the 'net that really explained
how the cache worked.
This section is focused on the secondary cache,
but in fact, the function of the primary (level 1) cache built into modern
processors is in many ways identical: in terms of how associativity works, how
the cache is organized, how the system checks for hits, etc. However, many of
the implementation details are different.
Note: This is an advanced section with some potentially confusing concepts. I
make use of examples in order to hopefully make sure the explanations make
sense. You will find this section most helpful if you read all the subsections
it contains in order. You may also find reading the
section explaining system memory operation and timing
<../../ram/timing.htm> instructive. This page also makes
extensive reference to memory addresses and locations, and binary numbers. If
you are not familiar with binary mathematics, you may want to read this introductory page on the subject
<../../../intro/works/comput.htm>.
Why Caching Works
Cache is in some ways a really amazing technology.
A 512 KB level 2 cache, caching 64 MB of system memory, can supply the
information that the processor requests 90-95% of the time. Think about the
ratios here: the level 2 cache is less than 1% of the size of the memory it is
caching, but it is able to register a "hit" on over 90% of requests.
That's pretty efficient, and is the reason why caching is so important.
This happens because of a computer science principle called locality of
reference. It states basically that even within very large programs with
several megabytes of instructions, only small portions of this code generally
get used at once. Programs tend to spend large periods of time working in one
small area of the code, often performing the same work many times over and over
with slightly different data, and then move to another area. This occurs because
of "loops", which are what programs use to do work many times in
rapid succession.
Just as one example (there are many), let's
suppose you start up your word processor and open your favorite document. The
word processor program at some point must read the file and then print on the
screen the text it finds. This is done (in very simplified terms) using code
similar to this:
· Open document file.
· Open screen window.
· For each character in the document:
    · Read the character.
    · Store the character into working memory.
    · Write the character to the window if the character is part of the first page.
· Close the document file.
The loop is of course the three instructions that
are done "for each character in the document". These instructions
will be repeated many thousands of times, and there are hundreds or thousands
of loops like these in the software you use. Every time you hit "page
down" on your keyboard, the word processor must clear the screen, figure
out which characters to display next, and then run a similar loop to copy them
from memory to the screen. Several loops are used when you tell it to save the
file to the hard disk.
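For readers who prefer real code to an outline, here is a rough Python rendering of the document-loading loop. It is a sketch only: the function name load_document, the first_page_chars parameter, and the use of in-memory lists to stand in for working memory and the screen window are all assumptions made for illustration.

```python
import io

def load_document(stream, first_page_chars):
    """Read a document one character at a time, as in the outline above."""
    working_memory = []   # stands in for the program's working memory
    window = []           # stands in for the screen window
    while True:
        ch = stream.read(1)               # read the character
        if ch == "":                      # empty string means end of file
            break
        working_memory.append(ch)         # store the character
        if len(working_memory) <= first_page_chars:
            window.append(ch)             # write it if it is on the first page
    return "".join(working_memory), "".join(window)

text, shown = load_document(io.StringIO("hello cache"), first_page_chars=5)
print(text)    # hello cache
print(shown)   # hello
```

The three statements inside the while loop are the ones that run over and over; the code around them runs only once.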
This example shows how caching improves
performance when dealing with program code, but what about your data? Not
surprisingly, access to data (your work files, etc.) is similarly repetitive.
When you are using your word processor, how many times do you scroll up and
down looking at the same text over and over, as you edit it? The system cache
holds much of this information so that it can be loaded more quickly the
second, third, and subsequent times that it is needed.
How Caching Works
In the example in the previous
section a loop was used to read characters from a file, store them
in working memory, and then write them to the screen. The first time each of
these instructions (read, store, write) is executed, it must be loaded from
relatively slow system memory (assuming it is in memory, otherwise it must be
read from the hard disk which is much, much slower even than the memory).
The cache is programmed (in hardware) to hold
recently-accessed memory locations in case they are needed again. So each of
these instructions will be saved in the cache after being loaded from memory
the first time. The next time the processor wants to use the same instruction,
it will check the cache first, see that the instruction it needs is there, and
load it from cache instead of going to the slower system RAM. The number of
instructions that can be buffered this way is a function of the size and design
of the cache.
Let's suppose that our loop is going to process
1,000 characters and the cache is able to hold all three instructions in the
loop (which sounds obvious, but isn't always, due to cache mapping techniques).
This means that 999 of the 1,000 times these instructions are executed, they
will be loaded from the cache, or 99.9% of the time. This is why caching is
able to satisfy such a large percentage of requests for memory even though it
has a capacity that is often less than 1% the size of the system RAM.
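The 99.9% figure can be reproduced with a tiny simulation. The Python sketch below assumes, as the text does, that the cache is large enough to keep all three loop instructions at once; loop_hit_ratio is a hypothetical name and the instructions are represented by simple labels.

```python
def loop_hit_ratio(iterations, loop_instructions):
    """Fraction of instruction fetches served from the cache."""
    cache = set()          # assumed big enough to hold the whole loop
    hits = 0
    accesses = 0
    for _ in range(iterations):
        for instruction in loop_instructions:
            accesses += 1
            if instruction in cache:
                hits += 1               # served from cache
            else:
                cache.add(instruction)  # first use: load from memory, then keep it
    return hits / accesses

ratio = loop_hit_ratio(1000, ["read", "store", "write"])
print(ratio)   # 0.999
```

Only the very first pass through the loop misses; the other 999 passes are all hits.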
Parts of the Level 2 Cache
The level 2 cache consists of two main
components. These are not usually physically located in the same chips, but
represent logically how the cache works. These parts of the cache are:
· The Data Store: This is where the cached information is actually
kept. When reference is made to "storing something in the cache" or
"retrieving something from the cache", this is where the actual data
goes to or comes from. When someone says that the cache is 256 KB or 512 KB,
they are referring to the size of the data store. The larger the store, the
more information that can be cached and the more likelihood of the cache being
able to satisfy a request, all else being equal.
· The Tag RAM: This is a small area of memory used by the cache
to keep track of where in memory the entries in the data store belong. The size
of the tag RAM--and not the size of
the data store--controls how much of main memory can be cached.
In addition to these memory areas there is of course
the cache controller circuitry. Most of the work of controlling the level 2
cache on a modern PC is performed by the system
chipset <../chip/index.htm>.
Structure of the Data Store
Many people think of the cache as being organized
as a large sequence of bytes (8 bits each). In fact, on a modern
fifth-generation or later PC, the level 2 cache is organized as a set of long cache lines, each containing 32 bytes
(256 bits). This means that each time the cache is written to or read from, a
transfer of 32 bytes takes place; there is no way to read or write just 1 byte.
This is done mainly for performance reasons. At the very least, you can't have
less than 64 bits per line of cache, because the data
bus on a Pentium or later PC is 64 bits wide
<../../cpu/arch/ext_DataSize.htm>. The data store is 256 bits
wide because memory is accessed in four-read bursts, and 4 times 64 is 256.
Let's take the case of a 512 KB cache (data
store). If we wanted to mentally envision how this memory is structured,
instead of seeing a single long column with 524,288 (512 K) individual rows, we
should instead see 32 columns and 16,384 (16 K) rows. Each access to the data
store is a line (row), and the cache has 16,384 different addresses.
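The row and column counts follow directly from the numbers in the text, as this short Python check shows. The constant and function names are hypothetical; the values (64-bit bus, four-read bursts, 512 KB data store) are the ones used in the example.

```python
KB = 1024

BUS_WIDTH_BITS = 64     # Pentium-class data bus width, from the text
READS_PER_BURST = 4     # memory is accessed in four-read bursts

line_bits = BUS_WIDTH_BITS * READS_PER_BURST   # bits moved per cache access
line_bytes = line_bits // 8                    # the same amount in bytes

def cache_lines(cache_bytes, bytes_per_line):
    """Number of rows (lines) in a data store of the given size."""
    return cache_bytes // bytes_per_line

rows = cache_lines(512 * KB, line_bytes)
print(line_bits, line_bytes, rows)   # 256 32 16384
```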
Cache Mapping and Associativity
A very important factor in determining the
effectiveness of the level 2 cache relates to how the cache is mapped to the
system memory. What this means in brief is that there are many different ways
to allocate the storage in our cache to the memory addresses it serves. Let's
take as an example a system with 512 KB of L2 cache and 64 MB of main memory.
The burning question is: how do we decide how to divvy up the 16,384 lines
in our cache amongst the "huge" 64 MB of memory?
There are three different ways that this mapping
can generally be done. The choice of mapping technique is so critical to the
design that the cache is often named after this choice:
· Direct Mapped Cache: The simplest way to allocate
the cache to the system memory is to determine how many cache lines there are
(16,384 in our example) and just chop the system memory into the same number of
chunks. Then each chunk gets the use of one cache line. This is called direct mapping. So if we have 64 MB of
main memory addresses, each cache line would be shared by 4,096 memory
addresses (64 M divided by 16 K).
· Fully Associative Cache: Instead of hard-allocating
cache lines to particular memory locations, it is possible to design the cache
so that any line can store the contents of any memory location. This is called fully associative mapping.
· N-Way Set Associative Cache: "N" here is a
number, typically 2, 4, 8 etc. This is a compromise between the direct mapped
and fully associative designs. In this case the cache is broken into sets where
each set contains "N" cache lines, let's say 4. Then, each memory
address is assigned a set, and can be cached in any one of those 4 locations
within the set that it is assigned to. In other words, within each set the cache is associative, and thus the name.
This design means that there are "N" possible places that a given memory location may be in the cache. The tradeoff is that there are "N" times as many memory locations competing for the same "N" lines in the set. Let's suppose in our example that we are using a 4-way set associative cache. So instead of a single block of 16,384 lines, we have 4,096 sets with 4 lines in each. Each of these sets is shared by 16,384 memory addresses (64 M divided by 4 K) instead of 4,096 addresses as in the case of the direct mapped cache. So there is more to share (4 lines instead of 1) but more addresses sharing it (16,384 instead of 4,096).
Conceptually, the direct mapped and fully
associative caches are just "special cases" of the N-way set
associative cache. You can set "N" to 1 to make a "1-way"
set associative cache. If you do this, then there is only one line per set,
which is the same as a direct mapped cache because each memory address is back
to pointing to only one possible cache location. On the other hand, suppose you
make "N" really large; say, you set "N" to be equal to the
number of lines in the cache (16,384 in our example). If you do this, then you
only have one set, containing all of the cache lines, and every memory location
points to that huge set. This means that any memory address can be in any line,
and you are back to a fully associative cache.
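The relationship between the three designs can be seen by treating "N" as a parameter. The Python sketch below uses a hypothetical function, set_layout, with the example's figures (512 KB cache, 32-byte lines, 64 MB of memory) to compute how many sets there are and how many memory addresses share each one.

```python
KB, MB = 1024, 1024 * 1024

def set_layout(cache_bytes, line_bytes, ways, memory_bytes):
    """Return (number_of_sets, memory_addresses_sharing_each_set)."""
    lines = cache_bytes // line_bytes
    sets = lines // ways                      # N lines are grouped into each set
    addresses_per_set = memory_bytes // sets
    return sets, addresses_per_set

# N = 1 behaves as a direct mapped cache.
print(set_layout(512 * KB, 32, 1, 64 * MB))       # (16384, 4096)
# N = 4 is the 4-way set associative example.
print(set_layout(512 * KB, 32, 4, 64 * MB))       # (4096, 16384)
# N = all 16,384 lines behaves as a fully associative cache.
print(set_layout(512 * KB, 32, 16384, 64 * MB))   # (1, 67108864)
```

In every case the ratio of addresses to lines is the same; what changes is how much freedom an address has in choosing its line.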
Comparison of Cache Mapping Techniques
There is a critical tradeoff in cache performance
that has led to the creation of the various cache mapping techniques described
in the previous section. In order for the cache to have good performance you
want to maximize both of the following:
· Hit Ratio: You want to increase as much as possible the
likelihood of the cache containing the memory addresses that the processor
wants. Otherwise, you lose much of the benefit of caching because there will be
too many misses.
· Search Speed: You want to be able to determine as quickly as
possible if you have scored a hit in the cache. Otherwise, you lose a small
amount of time on every access, hit or
miss, while you search the cache.
Now let's look at the three cache types and see
how they fare:
· Direct Mapped Cache: The direct mapped cache is the
simplest form of cache and the easiest to check for a hit. Since there is only
one possible place that any memory location can be cached, there is nothing to
search; the line either contains the memory information we are looking for, or
it doesn't.
Unfortunately, the direct mapped cache also has the worst performance, because again there is only one place that any address can be stored. Let's look again at our 512 KB level 2 cache and 64 MB of system memory. As you recall this cache has 16,384 lines (assuming 32-byte cache lines) and so each one is shared by 4,096 memory addresses. In the absolute worst case, imagine that the processor needs 2 different addresses (call them X and Y) that both map to the same cache line, in alternating sequence (X, Y, X, Y). This could happen in a small loop if you were unlucky. The processor will load X from memory and store it in cache. Then it will look in the cache for Y, but Y uses the same cache line as X, so it won't be there. So Y is loaded from memory, and stored in the cache for future use. But then the processor requests X, and looks in the cache only to find Y. This conflict repeats over and over. The net result is that the hit ratio here is 0%. This is a worst case scenario, but in general the performance is worst for this type of mapping.
· Fully Associative Cache: The fully associative cache
has the best hit ratio because any line in the cache can hold any address that
needs to be cached. This means the problem seen in the direct mapped cache
disappears, because there is no dedicated single line that an address must use.
However (you knew it was coming), this cache suffers from problems involving searching the cache. If a given address can be stored in any of 16,384 lines, how do you know where it is? Even with specialized hardware to do the searching, a performance penalty is incurred. And this penalty occurs for all accesses to memory, whether a cache hit occurs or not, because it is part of searching the cache to determine a hit. In addition, more logic must be added to determine which of the various lines to use when a new entry must be added (usually some form of a "least recently used" algorithm is employed to decide which cache line to use next). All this overhead adds cost, complexity and execution time.
· N-Way Set Associative Cache: The set associative cache is a
good compromise between the direct mapped and fully associative caches. Let's
consider the 4-way set associative cache. Here, each address can be cached in
any of 4 places. This means that in the example described in the direct mapped
cache description above, where we accessed alternately two addresses that map
to the same cache line, they would now map to the same cache set instead. This set has 4 lines in it,
so one could hold X and another could hold Y. This raises the hit ratio from 0%
to near 100%! Again an extreme example, of course. As for searching, since the
set only has 4 lines to examine this is not very complicated to deal with,
although it does have to do this small search, and it also requires additional
circuitry to decide which cache line to use when saving a fresh read from
memory. Again, some form of LRU (least recently used) algorithm is typically
used.
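The X, Y, X, Y worst case can be tried out with a small simulation. The Python sketch below is an illustrative model, not a description of any real cache controller: the class name SetAssociativeCache is hypothetical, LRU replacement is assumed, and the two addresses are chosen (as an assumption) to be exactly 512 KB apart so that they collide in the direct mapped case.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Toy N-way set associative cache with LRU replacement (ways=1 is direct mapped)."""

    def __init__(self, num_lines, ways, line_bytes=32):
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = num_lines // ways
        # One ordered dict per set; the oldest entry is the least recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        line_number = address // self.line_bytes
        index = line_number % self.num_sets    # which set the address maps to
        tag = line_number // self.num_sets     # which memory line is using it
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)           # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)        # evict the least recently used line
        entries[tag] = True
        return False

def hit_ratio(cache, addresses):
    hits = sum(cache.access(a) for a in addresses)
    return hits / len(addresses)

X = 0
Y = 512 * 1024             # 512 KB away from X, so it lands on the same line
pattern = [X, Y] * 500     # X, Y, X, Y, ... for 1,000 accesses

direct = hit_ratio(SetAssociativeCache(16384, ways=1), pattern)
four_way = hit_ratio(SetAssociativeCache(16384, ways=4), pattern)
print(direct)     # 0.0
print(four_way)   # 0.998
```

With one line per set, X and Y keep evicting each other. With four lines per set, only the first access to each address misses.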
Here's a summary table of the different cache
mapping techniques and their relative performance:
Cache Type                    Hit Ratio                           Search Speed
Direct Mapped                 Good                                Best
Fully Associative             Best                                Moderate
N-Way Set Associative, N>1    Very Good, Better as N Increases    Good, Worse as N Increases
In the "real world", the direct mapped
and set associative caches are by far the most common. Direct mapping is used
more for level 2 caches on motherboards, while the higher-performance
set-associative cache is found more commonly on the smaller primary caches
contained within processors.
Tag Storage
Since each cache line (or set) in the data store
is shared by a large number of memory addresses that map to it, we need to keep
track of which one is using each cache line at a given time. This is what the
tag RAM is used for.
Let's take a look at the same example again: a
system with 64 MB of main memory, a 512 KB cache, and 32-byte cache lines.
There are 16,384 cache lines, and therefore 4,096 different memory locations
that share each line. However, recall that each line contains 32 bytes; that
means 32 different bytes can be placed in each line without interfering with
each other. So really, there are 128 (4,096 divided by 32) different 32-byte
lines of memory that must share a cache spot.
Okay, now to address 64 MB of memory you need 26
address lines (because 2^26 is 64 M) which are numbered from A0 to A25. 512 KB
only requires 19 lines, A0 to A18. The difference between these is 7 lines; not
surprisingly, since 128 is 2^7. These 7 address lines are what tell you which
of the 128 different memory lines that can use a given cache line is
actually using it at the moment. That's what the tag RAM is for. There will be
as many entries in the tag RAM as there are in the data store, so we will have
16,384 tag RAM lines, although of course these entries are only a few bits
wide, not 32 bytes wide like the data store.
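The tag width and the number of tag entries can be derived mechanically from the three sizes in the example. The Python sketch below uses hypothetical helper names and assumes all sizes are powers of two; the total it prints counts only the tag bits themselves, nothing else that might be stored alongside them.

```python
KB, MB = 1024, 1024 * 1024

def address_bits(size_bytes):
    """Address lines needed to reach every byte (size must be a power of two)."""
    return (size_bytes - 1).bit_length()

def tag_ram_layout(memory_bytes, cache_bytes, line_bytes):
    """Return (bits per tag, number of tag entries, total tag bits)."""
    tag_bits = address_bits(memory_bytes) - address_bits(cache_bytes)
    entries = cache_bytes // line_bytes    # one tag per cache line
    return tag_bits, entries, tag_bits * entries

print(address_bits(64 * MB), address_bits(512 * KB))   # 26 19
print(tag_ram_layout(64 * MB, 512 * KB, 32))           # (7, 16384, 114688)
```

So in this example the tags need 114,688 bits in total, which is 14 KB, to keep track of a 512 KB data store.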
Notice that the tag RAM is used early in the
process of determining whether or not we have a cache hit. This means that no
matter how fast the cache data store is, the tag RAM must be slightly faster.
How the Memory Address Is Used
The memory address provided by the processor
represents which byte of information the processor is looking for at a given
time. This is looked at in three sections by the cache controller as it does
its work of checking for hits. This example is the same as before (64 MB
memory, 512 KB cache, direct mapping to keep things simple) so we again have 26
address bits, A0 through A25:
· A0 to A4: The lowest-order 5 bits represent the 32
different bytes within the data store (2^5 = 32). Recall that the cache we are
looking at has 32 byte lines, all of which are moved around together.
Therefore, the address bits A0 to A4 are ignored by the cache controller; the
processor will use them later to determine which to use of the 32 bytes it
receives from the cache.
· A5 to A18: These 14 bits represent the cache line that this
address maps to. 2^14 is 16,384, which is the total number of cache lines in
our example, as you recall. This cache line address is used for looking up both
the tag address in the tag RAM, and later the actual data in the data store if
there is a hit.
· A19 to A25: These 7 bits represent the tag address, which
tells the system which of the possible memory locations that share the cache
line (indicated by address lines A5 to A18) is currently using it.
If the numbers in the example change, so do these
ranges. If instead we have 32 MB of memory, 128 KB of cache, and 16 byte cache
lines, then A0 to A3 are ignored, A4 to A16 represent the cache line address,
and A17 to A24 are the tag address.
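Both sets of ranges can be computed rather than memorized. The Python sketch below assumes a direct mapped cache and power-of-two sizes; field_ranges and split_address are hypothetical names, and the sample address 0x2ABCDEF is an arbitrary value chosen for illustration.

```python
KB, MB = 1024, 1024 * 1024

def field_ranges(memory_bytes, cache_bytes, line_bytes):
    """Return the (lowest, highest) address bit of each field."""
    total_bits = (memory_bytes - 1).bit_length()
    offset_bits = (line_bytes - 1).bit_length()
    index_bits = (cache_bytes // line_bytes - 1).bit_length()
    return {
        "ignored": (0, offset_bits - 1),
        "cache line": (offset_bits, offset_bits + index_bits - 1),
        "tag": (offset_bits + index_bits, total_bits - 1),
    }

def split_address(address, cache_bytes, line_bytes):
    """Split an address into (tag, cache line, byte within the line)."""
    offset_bits = (line_bytes - 1).bit_length()
    index_bits = (cache_bytes // line_bytes - 1).bit_length()
    offset = address & (line_bytes - 1)
    line = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, line, offset

print(field_ranges(64 * MB, 512 * KB, 32))
# {'ignored': (0, 4), 'cache line': (5, 18), 'tag': (19, 25)}
print(field_ranges(32 * MB, 128 * KB, 16))
# {'ignored': (0, 3), 'cache line': (4, 16), 'tag': (17, 24)}
print(split_address(0x2ABCDEF, 512 * KB, 32))   # (85, 7791, 15)
```

Putting the three pieces back together, (tag << 19) | (line << 5) | offset, gives the original address again for the first configuration.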
Cache Write Policy and the Dirty Bit
In addition to caching reads from memory, the
system is capable of caching writes to
memory. The handling of the address bits and the cache lines, etc. is pretty
similar to how this is done when the cache is read. However, there are two
different ways that the cache can handle writes, and this is referred to as the
"write policy" of the cache.
· Write-Back Cache: Also called "copy
back" cache, this policy is "full" write caching of the system
memory. When a write is made to system memory at a location that is currently
cached, the new data is only written to the cache, not actually written to the
system memory. Later, if another memory location needs to use the cache line
where this data is stored, it is saved ("written back") to the system
memory and then the line can be used by the new address.
· Write-Through Cache: With this method, every time
the processor writes to a cached memory location, both the cache and the
underlying memory location are updated. This is really sort of like "half
caching" of writes; the data just written is in the cache in case it is
needed to be read by the processor soon, but the write itself isn't actually
cached because we still have to initiate a memory write operation each time.
Many caches that are capable of write-back
operation can also be set to operate as write-through (not all however), but
not generally the other way around.
Comparing the two policies, in general terms
write-back provides better performance, but at a slight risk to memory
integrity. Write-back caching saves the system from performing many unnecessary write cycles to the
system RAM, which can lead to noticeably faster execution. However, when
write-back caching is used, writes to cached memory locations are only placed
in cache, and the RAM itself isn't actually updated until the cache line is
booted out to make room for another address to use it.
As a result, at any given time, there can be a
mismatch between many of the lines in the cache and the memory addresses that
they correspond to. When this happens, the data in the memory is said to be
"stale", since it doesn't have the fresh information yet that was
only written to the cache. Memory used with a write-through cache can never be
"stale" because the system memory is written at the same time that
the cache is.
Normally, stale memory isn't a problem, because
the cache controller keeps track of which locations in the cache have been
changed and therefore which memory locations may be stale. This is done by
using an extra single bit of memory, one per cache line, called the "dirty
bit". Whenever a write is cached, this bit is set (made a 1) to tell the
cache controller "when you decide to re-use this cache line for a
different address, you need to write the current contents back to memory".
This dirty bit is normally implemented by adding one extra bit to the tag RAM,
instead of using a separate memory chip (to save cost).
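To see why write-back saves write cycles, here is a rough Python sketch that counts how many writes actually reach system RAM under each policy. It is a simplification, not a model of any real controller: it assumes every written location is already cached, and that each dirty line is written back exactly once (when it is eventually evicted or flushed). The function name `count_memory_writes` is invented for the example.

```python
def count_memory_writes(written_lines, policy):
    """Count writes to system RAM for a sequence of cached writes.

    written_lines: the cache line number touched by each write.
    policy: "write-through" or "write-back".
    """
    if policy == "write-through":
        # Every write goes to both the cache and memory.
        return len(written_lines)
    if policy == "write-back":
        dirty = set()
        for line in written_lines:
            dirty.add(line)  # set this line's dirty bit
        # Each dirty line costs one write when it is written back.
        return len(dirty)
    raise ValueError(f"unknown policy: {policy}")

writes = [7, 7, 7, 12, 7, 12]
print(count_memory_writes(writes, "write-through"))
print(count_memory_writes(writes, "write-back"))
```

With six writes spread over two cache lines, write-through performs six memory writes while write-back eventually performs two.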
However, the use of a write-back cache does entail
the small possibility of data corruption if something were to happen before the
"dirty" cache lines could be saved to memory. There aren't too many
cases where this could happen, because both the memory and the cache are
volatile (cleared when the machine is powered off).
On the other hand, consider a disk cache, where system memory is used to cache writes to the
disk. Here, the memory is volatile but the disk is not. If a write-back cache
is used here, you could have stale data on your disk compared to what is in
memory. Then, if the power goes out, you lose everything that hadn't yet been
written back to the disk, leading to possible corruption. For this reason, most
disk caches allow programs to override the write-back policy to ensure
consistency between the cache (in memory) and disk. Disk utilities, for
example, don't like write-back caching very much!
It is also possible with many caches to tell the
controller "please write out to system memory all dirty cache lines, right now". This is done when it is
necessary to make sure that the cache is in sync with the memory, and there is
no stale data. This is sometimes called "flushing" the cache, and is
especially common with disk caches, for the reason outlined in the previous
paragraph.
Summary: The Cache Read/Write Process
Having looked at all the parts and design factors
that make up a cache, this section describes the actual process that is
followed when the processor reads from or writes to the system memory. This
example is the same as in the other sections on this page: 64 MB memory, 512 KB
cache, 32 byte cache lines. I will assume a direct mapped cache, since that is
the simplest to explain (and is in fact most common for level 2 cache):
The processor begins a read/write from/to the system memory.
Simultaneously, the cache controller begins to check if the information
requested is in the cache, and the memory controller begins the process of
either reading or writing from the system RAM. This is done so that we don't
lose any time at all in the event of a cache miss; if we have a cache hit, the
system will cancel the partially-completed request from RAM, if appropriate. If
we are doing a write on a write-through cache, the write to memory always
proceeds.
The cache controller checks for a hit by looking at the address sent by the
processor. The lowest five bits (A0 to A4) are ignored, because these
differentiate between the 32 different bytes in the cache line. We aren't
concerned with that because the cache will always return the whole 32 bytes and
let the processor decide which one it wants. The next 14 lines (A5 to A18)
represent the line in the cache that we need to check (notice that 2^14 is
16,384).
The cache controller reads the tag RAM at the address indicated by the 14
address lines A5 to A18. So if those 14 bits say address 13,714, the controller
will examine the contents of tag RAM entry #13,714. It compares the 7 bits that
it reads from the tag RAM at this location to the 7 address bits A19 to A25
that it gets from the processor. If they are identical, then the controller
knows that the entry in the cache at that line address is the one the processor
wanted; we have a hit. If the tag RAM doesn't match, then we have a miss.
If we do have a hit, then for a read, the cache controller reads the
32-byte contents of the cache data store at the same line address indicated by
bits A5 to A18 (13,714), and sends them to the processor. The read that was
started to the system RAM is canceled. The process is complete. For a write,
the cache controller writes 32 bytes to the data store at that same cache line
location referenced by bits A5 to A18. Then, if we are using a write-through
cache the write to memory proceeds; if we are using a write-back cache, the
write to memory is canceled, and the dirty bit for this cache line is set to 1
to indicate that the cache was updated but the memory was not.
If we have a miss and we were doing a read, the read of system RAM that we
started earlier carries on, with 32 bytes being read from memory at the
location specified by bits A5 to A25. These bytes are fed to the processor,
which uses the lowest five bits (A0 to A4) to decide which of the 32 bytes it
wanted. While this is happening the cache also must perform the work of storing
these bytes that were just read from memory into the cache so they will be
there for the next time this location is wanted. If we are using a
write-through cache, the 32 bytes are just placed into the data store at the
address indicated by bits A5 to A18. The contents of bits A19 to A25 are saved
in the tag RAM at the same 14-bit address, A5 to A18. The entry is now ready
for any future request by the processor. If we are using a write-back cache,
then before overwriting the old contents of the cache line, we must check the
line's dirty bit. If it is set (1) then we must first write back the contents
of the cache line to memory, and then clear the dirty bit. If it is clear (0)
then the memory isn't stale and we continue without the write cycle.
If we have a cache miss and we were doing a write, interestingly, the cache
doesn't do much at all, because most caches don't update the cache line on a
write miss. They just leave the entry that was there alone, and write to
memory, bypassing the cache entirely. There are some caches that put all writes
into the appropriate cache line whenever a write is done. They make the general
assumption that anything the processor has just written, it is likely to read
back again at some point in the near future. Therefore, they treat every write as a hit, by definition.
This means there is no check for a hit on a write; in essence, the cache line
that is used by the address just written is always replaced by the data that
was just put out by the processor. It also means that on a write miss the cache
controller must update the cache, including checking the dirty bit on the entry
that was there before the write, exactly the same as what happens for a read
miss.
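The steps above can be sketched in Python for the 64 MB / 512 KB / 32-byte example. This is a simplified model, not a description of any particular controller: it tracks only the tag RAM, the dirty bits and counts of memory accesses, it does not model the RAM request that is started in parallel and later canceled, and on a write miss it follows the "most caches" behavior of bypassing the cache. The class and attribute names are invented for the illustration.

```python
NUM_LINES = 16384  # 512 KB cache / 32-byte lines

class DirectMappedCache:
    def __init__(self, write_back=True):
        self.write_back = write_back
        self.tags = [None] * NUM_LINES    # tag RAM: A19..A25 of the owner
        self.dirty = [False] * NUM_LINES  # one dirty bit per line
        self.memory_reads = 0
        self.memory_writes = 0

    @staticmethod
    def split(address):
        """Return (line index from A5..A18, tag from A19..A25)."""
        index = (address >> 5) & 0x3FFF
        tag = address >> 19
        return index, tag

    def read(self, address):
        index, tag = self.split(address)
        if self.tags[index] == tag:
            return "hit"                  # data comes from the cache
        self.memory_reads += 1            # miss: the RAM read carries on
        if self.write_back and self.dirty[index]:
            self.memory_writes += 1       # write back the old contents first
            self.dirty[index] = False
        self.tags[index] = tag            # line now belongs to this address
        return "miss"

    def write(self, address):
        index, tag = self.split(address)
        if self.tags[index] == tag:
            if self.write_back:
                self.dirty[index] = True  # memory is now stale
            else:
                self.memory_writes += 1   # write-through: memory updated too
            return "hit"
        self.memory_writes += 1           # write miss: bypass the cache
        return "miss"

cache = DirectMappedCache(write_back=True)
a = 13714 << 5       # line 13,714 with tag 0
b = a | (1 << 19)    # same line, tag 1
print(cache.read(a), cache.read(a), cache.write(a), cache.read(b))
print(cache.memory_writes)
```

Reading `b` evicts the dirty line that held `a`, so that is the moment the single write-back to memory happens.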
As complex as it already is :^) this example would
of course be even more complex if we used a set associative or fully
associative cache. Then we would have a search to do when checking for a hit,
and we would also have the matter of deciding which cache line to update on a
cache miss.
Cache Characteristics
This section discusses the different features of
the level 2 cache. These are the characteristics you will normally need to
understand when making a motherboard selection, or upgrading the cache in your
existing system. Some of the descriptions in this section are explained in much
more detail in Function and Operation of the System
Cache <func.htm>. The focus of this page is on the
higher-level performance aspects of the various cache features.
Cache Speed
There is no single number that dictates completely
the "speed" of the system cache. Instead, we must consider the raw
speed of the components used, as well as how the circuitry chooses to use them.
These considerations are identical to how they are when looking at the system
RAM itself; saying "my RAM is 60 ns" tells
only a small part of the story <../../ram/timing_Ratings.htm>.
The "raw" speed of the cache is the
speed of the RAM chips used to make it. Caches are normally made from static RAM chips (SRAM) <../../ram/types_SRAM.htm>,
unlike main system memory which is made from dynamic
RAM (DRAM) <../../ram/types_DRAM.htm>. The short version of
the difference between the two is that static RAM is faster but also more
expensive. SRAM access times are normally 7 to 20 ns; DRAMs, on the other hand,
are usually 50 to 70 ns.
The speed of the SRAM chips gives the upper bound
on performance. It is up to the motherboard and chipset designer to make full
use of the speed. Let's consider a Pentium motherboard with a memory bus speed
running at 66 MHz. This means 66.66 million cycles per second; if we take the
reciprocal of this it gives the cycle time, which is 15 nanoseconds (1 divided
by 66.66 million). In order for the motherboard to be able to read from the cache
in one cycle at this speed, the SRAM must be faster than 15 ns in speed (there
is some overhead time as well so exactly 15 ns won't work). If the SRAM is
faster than this, there will be no additional benefit; if it is slower, timing
problems will occur, which usually manifest themselves as memory errors and
system lockups.
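The cycle-time arithmetic is easy to check. This short Python sketch is illustrative only; the `overhead_ns` parameter stands in for the unspecified "overhead time" mentioned above, and the 1 ns used in the example is an arbitrary assumption, not a figure from any motherboard specification.

```python
def cycle_time_ns(bus_mhz):
    """Cycle time in nanoseconds for a bus clock given in MHz."""
    return 1000.0 / bus_mhz

def sram_fast_enough(sram_ns, bus_mhz, overhead_ns=0.0):
    """True if the SRAM plus overhead fits inside one bus cycle."""
    return sram_ns + overhead_ns < cycle_time_ns(bus_mhz)

print(round(cycle_time_ns(66.66), 1))                 # cycle time at 66 MHz
print(sram_fast_enough(12, 66.66, overhead_ns=1.0))   # 12 ns SRAM
print(sram_fast_enough(15, 66.66, overhead_ns=1.0))   # 15 ns SRAM
```

With any overhead at all, a 15 ns part no longer fits in the 15 ns cycle, which is the point made in the paragraph above.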
The tag RAM
<func_Tag.htm> used as part of the cache must normally be
faster than the actual cache data store
<func_Store.htm>. This is because the tag RAM must be read
first to check for a cache hit. We want to be able to check the tag and still
have enough time to read the cache within a single clock cycle, if we have a
hit. So for example, you may find that your system's main cache chips are 15
ns, while the tag may be 12 ns.
The more complicated the cache mapping technique,
the more important the difference in speed between the tag and the data store.
Simple techniques like direct mapping don't generally require much difference
at all. Your system may use the same speed for all the memory in this case; for
example, if the system needs 15 ns for the tag and 16 ns for the data store,
the motherboard may just specify 15 ns for everything since this is simpler. In
any event, if your motherboard doesn't already come with the level 2 cache on
it, you should buy for it whatever the motherboard manual or your dealer
specifies.
The true speed of any cache, in terms of how
quickly it really transfers information to and from the processor so that you
get faster speed in your applications, is dependent on the cache controller and
other chipset circuits. The capabilities of the chipset determine what kind of
transfer technologies your cache can use. This in turn determines your cache's
optimal system timing, the number of clock cycles required to move data in and
out of the cache. This is discussed in detail in
this section <timing.htm>.
The performance of the cache also depends
greatly on the speed at which the cache subsystem runs. In a
typical Pentium machine this is the speed of the memory bus, 66 MHz. However a
Pentium Pro processor has an integrated level 2
cache <struct_Integrated.htm>, which runs at full processor
speed, normally 180 or 200 MHz. Obviously, this will yield superior
performance! The Intel Pentium II uses instead a daughterboard
cache <struct_Daughterboard.htm> with level 2 caches running
at half the processor speed, which with a 233 or 266 MHz chip will still mean
much better performance than running the cache at 66 MHz.
Cache Size
The size of the cache normally refers to
the size of the data store, where the cached memory contents are actually held. A
typical PC level 2 cache is either 256 KB or 512 KB, but can be as small as 64
KB on older machines, or as large as 1 MB or even 2 MB. Within processors, level
1 cache usually ranges in size from 8 KB to 64 KB.
The more cache the system has, the more likely it
is to register a hit on a memory access, because fewer memory locations are
forced to share the same cache line. Let's use an example to illustrate (the same
one we used when we discussed cache operation in
detail <func.htm>). We have a system with 64 MB of memory and
512 KB of direct-mapped cache, arranged into 32-byte cache lines. This means
that we have 16,384 cache lines (512 K divided by 32). Each line is shared by
4,096 memory addresses (64 MB divided by 16,384). Now if we increase the amount
of cache to 1 MB, we will have 32,768 cache lines, and each will only be shared
by 2,048 addresses. Conversely, if we leave the cache at 512 KB but increase
the system memory to 256 MB, each of the 16,384 cache lines will be shared by
16,384 addresses.
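The sharing figures in this example follow from two divisions, sketched here in Python (the function name `sharing` is just for illustration). Note that the counts are byte addresses; since a line holds 32 bytes at once, 4,096 addresses per line works out to 128 competing 32-byte blocks, which is why a 7-bit tag is enough in the standard example.

```python
KB = 1024
MB = 1024 * KB

def sharing(memory_bytes, cache_bytes, line_bytes=32):
    """Return (number of cache lines, byte addresses sharing each line)
    for a direct mapped cache."""
    lines = cache_bytes // line_bytes
    addresses_per_line = memory_bytes // lines
    return lines, addresses_per_line

print(sharing(64 * MB, 512 * KB))    # the standard example
print(sharing(64 * MB, 1 * MB))      # double the cache
print(sharing(256 * MB, 512 * KB))   # quadruple the memory instead
```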
There are many areas in the computer world where
Pareto's Law applies, and cache size is definitely one of them. If you have a
256 KB cache on a system using 32 MB, increasing the cache by 100% to 512 KB
will probably result in an increase in the hit ratio of less than 10%. Doubling
it again will likely result in an increase of less than 5%. In the real world,
this differential is not noticeable to most people. However, if you greatly
increase the amount of system memory you use, you will probably want to up your
cache total as well to prevent a degradation in performance. Just make sure you
watch closely the system RAM cacheability issue.
System RAM Cacheability
This is one of the most misunderstood aspects of
the caching equation. The amount of RAM that the system can cache is very
important if you are going to be using a lot of system memory. Almost all
modern fifth generation systems can cache 64 MB of system memory. However, many systems, even newer ones, cannot
cache more than 64 MB of memory. Intel's popular 430FX ("Triton I"),
430VX (one of the "Triton II"s, also called "Triton III")
and 430TX chipsets, do not cache more than 64 MB of main memory. There are
millions and millions of these PCs on the market.
If you put more memory in a system than can be
cached, the result is a performance decrease. The speed differential between
the cache and memory is significant; that's why we use it. :^) When some of
that memory is not cached, the system must go to memory for every access to
that uncached memory, which is much slower. In addition, when using a
multitasking operating system (pretty much anything other than DOS these days)
you can't really control what ends up in cached memory and what ends up in
non-cached memory, unless you really know what you are doing.
The keys to how much memory your system can cache
are first, the design of the chipset, and second, the width of the tag RAM. The
more memory you have, the more address lines you need to specify an address.
This means that you have more address bits to store in the tag RAM to use in
order to check for a cache hit. Of course if the chipset isn't designed to
cache more than 64 MB, an extra wide tag RAM won't help anyway.
Let's take our standard example again; 64 MB of
memory, 512 KB cache, 32-byte cache lines. As we
described in detail in this section <func_Address.htm>, 64 MB
means 26 address lines (A0 to A25); A0 to A4 specify the byte in the cache
line, A5 to A18 specify the cache line, and A19 to A25 go into the tag RAM to
specify which memory address is currently using the cache line. That's 7 bits;
let's say our tag RAM is 8 bits wide, and we are reserving one bit for the "dirty bit", to allow write-back operation of the
cache <func_Write.htm>. So we're fine, we have enough tag
memory in the cache. Now, suppose we add another 32 MB of memory. To address 96
MB you need another address line, A26, to be held in the tag RAM. Hmm, we have
a problem, because now we need 9 bits in our tag RAM and it only has 8.
The only mainstream Pentium chipset to support
caching over 64 MB is the 430HX "Triton II" chipset by Intel. In
actual fact, caching over 64 MB on this chipset is considered
"optional"; the motherboard manufacturer has to make sure to use an
11-bit tag RAM instead of the default 8-bit. The extra 3 bits increase
cacheability from 64 MB to 512 MB (2^3=8, and 64*8=512).
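The relationship between tag RAM width and cacheable memory can be sketched in Python. This assumes the standard example used here (a 512 KB direct mapped cache, with one bit of the tag RAM reserved as the dirty bit) and a chipset that supports the result; the function names are hypothetical.

```python
KB = 1024
MB = 1024 * KB

def max_cacheable_bytes(cache_bytes, tag_ram_bits, dirty_bit=True):
    """Memory that can be cached with a tag RAM of the given width."""
    address_bits_in_tag = tag_ram_bits - (1 if dirty_bit else 0)
    return cache_bytes * 2 ** address_bits_in_tag

def tag_ram_bits_needed(memory_bytes, cache_bytes, dirty_bit=True):
    """Tag RAM width needed to cache all of memory_bytes.

    cache_bytes is assumed to be a power of two.
    """
    address_lines = (memory_bytes - 1).bit_length()
    line_and_offset_bits = cache_bytes.bit_length() - 1
    return address_lines - line_and_offset_bits + (1 if dirty_bit else 0)

print(max_cacheable_bytes(512 * KB, 8) // MB)     # 8-bit tag RAM
print(max_cacheable_bytes(512 * KB, 11) // MB)    # 11-bit tag RAM
print(tag_ram_bits_needed(96 * MB, 512 * KB))     # after adding 32 MB
```

The three results are 64 MB, 512 MB and 9 bits, matching the figures worked out in the text.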
Many people confuse the issue of system RAM size
and system RAM cacheability. The common thought is that adding more cache will
let you cache more RAM, but you can see that really it is the tag RAM and
chipset that controls this. Further complicating the matter is that some
companies put extra tag RAM on their COASt
<struct_COASt.htm> modules. So a user will insert a 256 KB
COASt module, and think that increasing his cache let him cache more system
memory, when really it was the extra tag RAM that did it.
Pentium Pro PCs use an integrated level 2 cache
that contains the tag RAM within it, so none of this is really a concern for
these machines. The Pentium Pro will cache up to 4 GB of main memory, basically
anything you can throw at it. The Pentium II uses an SEC daughtercard. It has
the same general architecture as the Pentium Pro, but due to a design
limitation will "only" cache up to 512 MB. This isn't nearly as much
of an issue as a 64 MB barrier, but considering that the PII is used in many
high-end applications, this might be a concern for some people.
One question that people ask a lot is: "How
much will the system slow down if I have more RAM in it than can be
cached?" There is no easy answer to this question, because it depends both
on the system and what you are doing with it. Somewhere between 5% and 25% is
most likely, but you should bear something else in mind: adding real physical
memory to the system is one way to avoid the extreme slowdown to the system
that occurs when it runs out of real memory and must use virtual memory <../../ram/size_Virtual.htm>.
If you are doing heavy multitasking and notice that the system is thrashing,
you will always be better off to have more memory, even uncached, instead of
having the system swap a great deal to disk. Of course having all the memory
cached is still preferred.
Integrated vs. Separate Data and Instruction Caches
Most (all?) level 2 caches work on both data and
processor instructions (code, programs). They don't differentiate between the
two because they view both as just memory addresses. However, many processors
use a split design for their level 1 cache. For example, the Intel
"Classic" Pentium (P54C) processor uses an 8 KB cache for data, and a
separate 8 KB cache for program instructions. This is more efficient due to the
way the processor is designed, and doesn't really affect performance very much
compared to a single 16 KB cache, though it might lead to a very slightly lower
hit ratio. Each of these caches can have different characteristics. For example
they can use different mapping techniques (as they do on the Pentium Pro).
Mapping Technique
The cache mapping technique is another factor that
determines how effective the cache is, that is, what its hit ratio and speed
will be. This is discussed in detail in this
section <func_Mapping.htm>, but briefly, the three types are:
·
Direct Mapped Cache: Each memory location is mapped
to a single cache line that it shares with many others; only one of the many
addresses that share this line can use it at a given time. This is the simplest
technique both in concept and in implementation. Using this cache means the
circuitry to check for hits is fast and easy to design, but the hit ratio is
relatively poor compared to the other designs because of its inflexibility.
Motherboard-based system caches are typically direct mapped.
·
Fully Associative Cache: Any memory location can be
cached in any cache line. This is the most complex technique and requires
sophisticated search algorithms when checking for a hit. It can lead to the
whole cache being slowed down because of this, but it offers the best theoretical
hit ratio since there are so many options for caching any memory address.
·
N-Way Set Associative Cache: "N" is typically 2,
4, 8 etc. A compromise between the two previous designs, the cache is broken
into sets of "N" lines each, and any memory address can be cached in
any of those "N" lines. This improves hit ratios over the direct
mapped cache, but without incurring a severe search penalty (since
"N" is kept small). The 2-way or 4-way set associative cache is
common in processor level 1 caches.
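One way to see how the three techniques relate is to list which cache lines are allowed to hold a given address. The Python sketch below treats direct mapped as "1-way" and fully associative as "N equals the total number of lines". The way lines are numbered within sets is an arbitrary choice made for the illustration, and the tiny 8-line cache is only there to keep the output readable.

```python
def candidate_lines(address, total_lines, ways, line_bytes=32):
    """Return the cache line numbers that may hold the given address."""
    num_sets = total_lines // ways
    block = address // line_bytes   # which 32-byte block of memory
    set_index = block % num_sets    # the set this block maps to
    first = set_index * ways        # first line belonging to that set
    return list(range(first, first + ways))

address = 0x40                             # third 32-byte block of memory
print(candidate_lines(address, 8, 1))      # direct mapped: one choice
print(candidate_lines(address, 8, 2))      # 2-way set associative
print(candidate_lines(address, 8, 8))      # fully associative: any line
```

The more candidate lines there are, the more places the controller must check for a hit, which is the search penalty described above.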
Write Policy
The cache's write policy determines how it handles
writes to memory locations that are currently being held in cache. Described in more detail here <func_Write.htm>,
the two policy types are:
·
Write-Back Cache: When the system writes to a
memory location that is currently held in cache, it only writes the new
information to the appropriate cache line. When the cache line is eventually
needed for some other memory address, the changed data is "written back"
to system memory. This type of cache provides better performance than a
write-through cache, because it saves on (time-consuming) write cycles to
memory.
·
Write-Through Cache: When the system writes to a
memory location that is currently held in cache, it writes the new information
both to the appropriate cache line and the memory location itself at the same
time. This type of caching provides worse performance than write-back, but is
simpler to implement and has the advantage of internal consistency, because the
cache is never out of sync with the memory the way it is with a write-back
cache.
Both write-back and write-through caches are used
extensively, with write-back designs more prevalent in newer and more modern
machines.
Transactional or Non-Blocking Cache
Most caches can only handle one outstanding
request at a time. If a request is made to the cache and there is a miss, the
cache must wait for the memory to supply the value that was needed, and until
then it is "blocked". A non-blocking
cache has the ability to work on other requests while waiting for memory to
supply any misses.
The Intel Pentium Pro
<../../cpu/fam/g6_PPro.htm> and Pentium
II <../../cpu/fam/g6_PII.htm> processors use this technology
for their level 2 caches, which can manage up to four simultaneous requests.
This is done by using a transaction-based architecture, and a dedicated "backside" <../../cpu/arch/ext_Backside.htm>
bus for the cache that is independent of the main memory bus. Intel calls this
"dual independent bus" (DIB) architecture.
Cache Transfer Technologies and Timing
One of the most important factors directly
influencing the performance of the level 2 cache is the technology used to
transfer information to and from the processor. There are three main types of
cache technology currently in use in motherboards; the capabilities of the
chipset (in particular, the cache controller) dictate which your system will
use.
"Timing" refers to the number of clock
cycles required to perform the data transfers to and from the cache or
processor, and this is a function of the technology used (among other things).
Timing is a complex matter involving various characteristics of the processor,
cache, memory, chipset, etc. In general, however, the fewer clock cycles it
takes to transfer data, the faster the system. System
timing is described in detail here <../../ram/timing.htm>, in
the memory chapter.
Cache Bursting
In a typical level 2 cache each cache line
contains 32 bytes, and transfers to and from the cache occur 32 bytes (256
bits) at a time. The normal transfer paths (for a fifth- or sixth-generation
machine) are only 64 bits wide, which means four transfers are done in
sequence. Because the transfers are from consecutive memory locations there is
no need to specify a different address after the first one; this makes the
second, third and fourth accesses extremely fast.
This high-performance access is called
"bursting" or using the cache in "burst mode". All modern
level 2 caches use this type of access. The timing, in clock cycles, to perform
this quadruple read is normally stated as "x-y-y-y". For example,
with "3-1-1-1" timing the first read takes 3 clock cycles and the next
three take 1 each, for a total of 6. Obviously, the lower these numbers, the
better.
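Adding up an "x-y-y-y" rating is trivial, but a short Python sketch makes the convention concrete (the function name `burst_cycles` is just for this example):

```python
def burst_cycles(timing):
    """Total clock cycles for a burst rated as, e.g., "3-1-1-1"."""
    return sum(int(part) for part in timing.split("-"))

for rating in ("2-1-1-1", "3-1-1-1", "3-2-2-2"):
    print(rating, burst_cycles(rating))
```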
Note: This is almost identical to the way burst
transfers are done to and from memory <../../ram/timing_Burst.htm>
in modern systems, except faster.
Asynchronous Cache
The oldest and slowest type of cache timing is
asynchronous cache. Asynchronous means that transfers are not tied to the
system clock. A request is sent to the cache, and the cache responds, and this
happens independently of what the system clock (on the memory bus) is doing.
This is similar to how most system memory works; your typical FPM or EDO memory
is also asynchronous (and relatively slow, for this reason).
Because asynchronous cache is not tied to the
system clock, it can have problems dealing with faster clock speeds. At slow
speeds like 33 MHz it is capable of 2-1-1-1 timing (which is very good) but at
speeds like 60 or 66 MHz as used in modern Pentium class PCs it drops down to
3-2-2-2 (which is pretty bad). For this reason, asynchronous cache is commonly
found on 486 class motherboards but is not generally used on Pentium or later
class machines.
Synchronous Burst Cache
Unlike asynchronous cache, which operates
independently of the system clock, synchronous cache is tied to the memory bus
clock. Each tick of the system clock, a transfer can be done to or from the
cache (if it is ready). This means that it is capable of handling faster system
speeds without slowing down the way asynchronous cache does. However, the
faster the system runs, the faster the SRAM chips have to be, in order to keep
up. Otherwise timing problems (crashes, lockups) occur.
Even this type of cache slows down at very high
speeds. It is capable of 2-1-1-1 operation up to 66 MHz, but then it slows down
to 3-2-2-2 at higher speeds (which are starting to become more popular and will
become even more so in the future). Synchronous burst cache never quite caught
on; pipelined burst cache was developed at around the same time and seemed to
take the market away from sync burst before the latter could really get going.
Pipelined Burst (PLB) Cache
Pipelining is a technology commonly
used in processors <../../cpu/arch/int/exec_Pipelining.htm> to increase performance; in the pipelined burst (PLB) cache it is used in
a similar way. PLB cache adds special circuitry that allows the four data
transfers that occur in a "burst" to be done partially at the same
time. In essence, the second transfer begins before the first transfer is done,
just the way you can start pouring a second gallon of fluid down a pipeline
before the first gallon has finished exiting the other side.
Because of the complexity of the circuitry, a bit
more time is required initially to set up the "pipeline". For this
reason, pipelined burst cache is slightly slower than synchronous burst cache
for the initial read, requiring 3 clock cycles instead of 2 for sync burst.
However, this parallelism allows PLB cache to burst at a single clock cycle for
the remaining 3 transfers even up to very high clock speeds; this means 3-1-1-1
speed up to even 100 MHz bus speeds. PLB cache is now the standard for almost
all quality Pentium class motherboards.
Comparison of Transfer Technology Performance
The table below shows a summary of the theoretical maximum system performance
of the various cache technologies at different system bus speeds. It is
theoretical because it is only possible with a chipset that supports it, fast
enough cache memory and other factors. Note how, interestingly, synchronous burst
is the best at the 60 and 66 MHz bus speeds common on so many Pentium machines
today. Despite this it is not nearly as common as pipelined burst cache.
Fortunately, PLB cache is only slightly slower, and holds more potential for
use at the higher system speeds that should take the market by storm in 1998:
Bus Speed (MHz)    33       50       60       66       75       83       100
Asynchronous       2-1-1-1  3-2-2-2  3-2-2-2  3-2-2-2  3-2-2-2  3-2-2-2  3-2-2-2
Synchronous Burst  2-1-1-1  2-1-1-1  2-1-1-1  2-1-1-1  3-2-2-2  3-2-2-2  3-2-2-2
Pipelined Burst    3-1-1-1  3-1-1-1  3-1-1-1  3-1-1-1  3-1-1-1  3-1-1-1  3-1-1-1
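To turn the table into actual times, multiply the total cycles for a burst by the cycle time at each bus speed. The Python sketch below copies the timings from the table; like the table itself it gives theoretical best cases only, and the helper names are invented for the example.

```python
BUS_SPEEDS_MHZ = [33, 50, 60, 66, 75, 83, 100]

# Timings copied from the table, one entry per bus speed.
TIMINGS = {
    "Asynchronous": ["2-1-1-1"] + ["3-2-2-2"] * 6,
    "Synchronous Burst": ["2-1-1-1"] * 4 + ["3-2-2-2"] * 3,
    "Pipelined Burst": ["3-1-1-1"] * 7,
}

def burst_time_ns(timing, bus_mhz):
    """Nanoseconds to complete one four-transfer burst."""
    cycles = sum(int(part) for part in timing.split("-"))
    return cycles * 1000.0 / bus_mhz

def fastest(bus_mhz):
    """Name of the technology with the shortest burst at this bus speed."""
    column = BUS_SPEEDS_MHZ.index(bus_mhz)
    return min(
        TIMINGS,
        key=lambda tech: burst_time_ns(TIMINGS[tech][column], bus_mhz),
    )

for mhz in (66, 100):
    print(mhz, "MHz:", fastest(mhz))
```

At 66 MHz synchronous burst comes out ahead (5 cycles against 6 for pipelined burst), while at 100 MHz pipelined burst wins (6 cycles against 9), which is the trade-off described above.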
The PC Guide (http://www.PCGuide.com)
Cache Structure and Packaging
System cache can come in many different physical
forms. This section describes the different types of packaging that cache is
normally found in. Which type your system uses is a function of your processor,
chipset and motherboard.
Integrated Level 2 Cache
The Intel Pentium Pro processor comes with an
integrated level 2 cache. The "chip" that you plug into the
motherboard is really two chips. One is the processor itself (including the
level 1 cache) and the other is the level 2 cache. These processors are
available with 256 KB, 512 KB and 1 MB of level 2 cache. This is a very
performance-enhancing design, because it allows the level 2 cache to run at the
processor's internal speed (usually 180 or 200 MHz) instead of just the system
bus speed (60 or 66 MHz). It also gives you one less thing to worry about in
setting up a new system, because all of the support circuitry, tag RAM etc., is
inside the chip.
One drawback of this design is that it is not
possible to increase the level 2 cache without replacing the processor. These
processors are also very expensive due to the difficulty of manufacturing the
large chip required for the level 2 cache. Regular cache is made of many small
chips, whereas this one is made from one large chip. In addition, defects in
the level 2 cache often are not discoverable until after the processor and
cache are put into their shared package; this means the processor has to be
discarded as well if a defect is found in the cache chip. This is the main
reason that Intel moved away from putting integrated cache on its Pentium II
processor. No other CPUs currently use this design and it is unlikely that any
more ever will.
The integrated level 2 cache of the Pentium Pro is
also faster than the older cache used with fifth generation systems due to
performance enhancements. The main one is that the cache is transactional, or non-blocking
<char_Transactional.htm>.
Daughterboard Cache
Starting with the Pentium
II processor (a.k.a. "Klamath") <../../cpu/fam/g6_PII.htm>
Intel has introduced a new form of packaging, called SEC (Single Edge Contact) <../../cpu/char/pack_SEC.htm>.
The integrated cache of the Pentium Pro processors ran at processor speed and
offered very high performance, but was very expensive to manufacture. The
motherboard cache of the regular Pentium was easy and cheap to produce but
offered lower performance. SEC is a compromise where the processor and cache
are mounted together on a small "daughterboard" that plugs into the
motherboard. This greatly reduces manufacturing costs, and also means that a
bad cache chip doesn't result in the processor being wasted.
This type of cache runs at a faster speed than it
would if it were on the motherboard, but slower than an integrated cache; this
is why it is a compromise between the other two designs. On the Pentium II the
level 2 cache runs at half the processor speed. So a 266 MHz Pentium II will
have a 133 MHz level 2 cache. Not as good as the 200 MHz Pentium Pro's
integrated cache, but a lot faster than running it at 66 MHz. The Pentium II's
cache is also non-blocking
<char_Transactional.htm>, like the Pentium Pro's.
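As a rough illustration of the three cache speeds compared above, here is a small Python sketch. The function name and the design labels are made up for this example; only the speed ratios come from the text.

```python
def l2_cache_mhz(design, cpu_mhz, bus_mhz):
    """Return the level 2 cache clock in MHz for a given cache design."""
    if design == "integrated":      # Pentium Pro: cache runs at full processor speed
        return cpu_mhz
    if design == "daughterboard":   # Pentium II: cache runs at half processor speed
        return cpu_mhz / 2
    if design == "motherboard":     # regular Pentium: cache runs at system bus speed
        return bus_mhz
    raise ValueError(f"unknown cache design: {design}")

print(l2_cache_mhz("integrated", 200, 66))     # 200
print(l2_cache_mhz("daughterboard", 266, 66))  # 133.0
print(l2_cache_mhz("motherboard", 200, 66))    # 66
```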
Note: Even though the Pentium II has an architecture very similar to that of the
Pentium Pro, due to a design limitation it will only cache the first 512 MB of
system memory. The Pentium Pro will cache up to 4 GB of system memory.
Motherboard Cache
The most common cache design places the chips
directly on the motherboard. On some older designs the cache is several SRAM
chips in sockets (which means it can be replaced, but also means it is more
prone to certain types of failures). On most newer motherboards it is in the
form of 1 to 4 chips soldered directly to the board. If the cache is socketed,
you can in some cases add extra SRAM chips to increase the size of the data
store. The exact chips you need to add depend on the motherboard; your manual
is a necessity here.
Some motherboards support the use of both soldered
cache and a COASt module. To use
both you may need to change a jumper setting on the motherboard.
Warning: There are some motherboards that actually have fake level 2 cache on them. These are most common on 486
motherboards with two or so flat cache chips soldered directly to the
motherboard. In some cases, these chips are actually just empty plastic
packages! In many cases the BIOS is even hacked so that it will report external
(level 2) cache even when it doesn't exist. You can test for this by disabling the external cache
<../bios/set/adv_External.htm>. If you disable it and see no
performance difference in a good benchmark program, the cache may be fake.
COASt Modules
Some motherboards use a cache packaging format
called COASt, which stands for
"Cache On A Stick". This is a silly name for what is in effect a
small circuit board similar to a single inline
memory module (SIMM) <../../ram/pack_SIMM.htm> that contains
cache SRAM chips. It is inserted into a special socket on the motherboard,
often called a CELP ("card edge low profile") socket. Some motherboards only
use this socket for cache, some have only motherboard cache, and some have both.
Usually jumpers are used in this last case to tell the board what is being
used, although some boards will autodetect when a COASt module is added. See this procedure <../../../proc/physinst/coast.htm>
for instructions on adding a COASt module to the motherboard.
The CELP socket could have evolved into a standard
of sorts for COASt modules, much the way SIMMs and DIMMs are (mostly)
standardized in the memory area. However, this has not happened. Despite
standard-sounding names like "COASt V1.2" and whatnot, you cannot
rely on just any old COASt working in your motherboard. While many
manufacturers share COASt module types, many others use proprietary designs.
It's important to contact your motherboard vendor or manufacturer to ensure you
obtain the correct type for your PC.
Note: The COASt module often contains not just more data store for holding
cached entries, but also more tag RAM to allow for more system memory to be
cached. See here for more details
<char_Cacheability.htm>.
System Resources
This section takes a detailed look at the PC's
system resources. In some ways, everything in a PC is a resource--system RAM,
processor speed, hard disk space, etc. However, there are in particular several
special resources in the system that are shared by the various devices that use
it. These are not physical "parts" of the system for the most part,
though they of course have hardware that implements them. Rather, they are logical parts of the system that control
how it works, and are referred to as the PC's system resources.
System resources are important because they must
be shared by the various devices in your PC. This includes not only the
motherboard and other main components, but also expansion devices, plug-in
cards and peripherals. The resources are primarily used for communication and
information transfer between these devices. For historical reasons, the amount
of some of these resources is very limited, and as you add more peripherals to
your system it can be difficult to find enough resources to satisfy all the
requirements. This can lead to resource
conflicts, which are one of the most common problems with configuring new
PCs--and often one of the most difficult to diagnose and correct.
This section looks at each of the types of system
resources found in your PC, along with the main hardware devices that control
them or access to them. For each one, listings and tables are provided to show
how the resources are usually allocated in a typical PC, as well as what
resources are sometimes used by various peripherals. Note that I consider a
(SoundBlaster or compatible) sound card as part of a basic PC today; they are
in most machines now--and are notorious resource hogs as well. In addition, the
important matter of resource conflicts is discussed, along with conflict
resolution. Finally, Plug and Play is examined, the relatively new system
designed to help make resource allocation easier and reduce conflicts
automatically.
Note: The term "system resources" is also sometimes used to refer to
special memory areas in various Windows operating systems. This is a different
concept altogether, that just happens to use the same name.
Interrupt Function and Operation
This section takes a look at the interrupt lines
and the interrupt controller, describing how they work. This includes an
explanation of the different types of interrupts and a summary of the different
IRQ numbers used in the PC.
Why Interrupts Are Used to Process Information
The processor is a highly-tuned machine that is
designed to (basically) do one thing at a time. However, we use our computers
in a way that requires the processor to at least appear to do many things at once. If you've ever used a
multitasking operating system like Windows 95, you've done this; you may have
been editing a document while downloading information on your modem and
listening to a CD simultaneously. The processor is able to do this by sharing
its time among the various programs it is running and the different devices
that need its attention. It only appears that the processor is doing many
things at once because of the blindingly high speed that it is able to switch
between tasks.
Most of the different parts of the PC need to send
information to and from the processor, and they expect to be able to get the
processor's attention when they need to do this. The processor has to balance
the information transfers it gets from various parts of the machine and make sure
they are handled in an organized fashion. There are two basic ways that the
processor could do this:
· Polling: The processor could take turns going to each
device and asking if it has anything it needs the processor to do. This is called polling the devices. This technique is used in some situations
in the computer world, but it is not used by the
processor in a PC for a couple of basic reasons. One reason is that it is
wasteful; going around to all the devices constantly asking if they need the
attention of the CPU wastes cycles that the processor could be using to do something
useful. This is particularly true because in most cases the answer will be
"no". Another reason is that different devices need the processor's
attention at differing rates; the mouse needs attention far less frequently
than, say, the hard disk (when it is actively transferring data).
· Interrupting: The other way that the processor can handle
information transfers is to let the devices request them when they need its
attention. This is the basis for the use of interrupts. When a device has data
to transfer, it generates an interrupt that says "I need your attention
now, please". The processor then stops what it is doing and deals with the
device that requested its attention. It actually can handle many such requests
at a time, using a priority level for each to decide which to handle first.
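To get a feel for the difference between the two approaches, here is a toy Python sketch. It is a deliberately simplified model, not how any real PC hardware works: time is divided into "ticks", polling is assumed to check every device once per tick, and the function and parameter names are invented for the example.

```python
def simulate(event_ticks, total_ticks, num_devices):
    """Count how often the processor deals with devices under each scheme."""
    events = set(event_ticks)
    # Polling: on every tick the processor asks every device if it needs service.
    polling_checks = total_ticks * num_devices
    # Interrupts: the processor is only diverted when a device raises a request.
    interrupt_services = sum(1 for t in range(total_ticks) if t in events)
    return polling_checks, interrupt_services

polls, services = simulate(event_ticks=[100, 5000, 9000],
                           total_ticks=10_000, num_devices=4)
print(polls, services)  # 40000 3
```

In this model almost all of the polling checks get the answer "no", which is exactly the waste described above.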
It may seem like an inefficient way to run a
computer, having it be interrupted all the time. I'm sure it must remind you of
a day at the office, where the phone kept ringing every 5 minutes and you
couldn't get anything done. However, without the ringer on the phone, the
alternative would be to keep picking up the phone every 30 seconds to see if
someone was trying to call you, which even the most ardent telephone-hater
would have to admit is much worse. :^)
It's also interesting to put into perspective just
how fast the modern processor is compared to many of the devices that transfer
information to it. Let's imagine a very fast typist; say, 120 words per minute.
At an average of 5 letters per word, this is 600 characters per minute on the
keyboard. You might be fascinated to realize that if you type at this rate, a
200 MHz computer will process 20,000,000 instructions between each keystroke
you make! You can see why having the processor spend a lot of time asking the
keyboard if it needs anything would be wasteful, especially since at any time
you might stop for a minute or two to review your writing, or do something
else. Even while handling a full-bandwidth transfer from a 28,800 bits/sec modem,
which of course moves data much faster than your fingers, the processor has
over 60,000 instruction cycles between bytes it needs to process.
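The arithmetic behind these figures can be checked with a few lines of Python. Two assumptions are mine and not stated above: that the processor completes roughly one instruction per clock cycle, and that the modem puts 10 bits on the wire for each byte (8 data bits plus a start and a stop bit).

```python
CPU_HZ = 200_000_000  # 200 MHz, assuming roughly one instruction per clock cycle

# Fast typist: 120 words per minute at 5 letters per word
chars_per_second = 120 * 5 / 60            # 10.0 keystrokes per second
cycles_per_keystroke = CPU_HZ / chars_per_second

# 28,800 bits/sec modem, assuming 10 bits on the wire per byte
bytes_per_second = 28_800 / 10             # 2880.0 bytes per second
cycles_per_byte = CPU_HZ / bytes_per_second

print(int(cycles_per_keystroke))  # 20000000
print(int(cycles_per_byte))       # 69444
```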
In addition to the well-known hardware interrupts
that we discuss in this section, there are also software
interrupts <../../bios/func_Services.htm>. These are used by
various software programs in response to different events that occur as the
operating system and applications run. In essence, these represent the
processor interrupting itself! This is part of how the processor is able to do
many things at once. The other thing that software interrupts do is allow one
program to access another one (usually an application or DOS accessing the
BIOS) without having to know where it resides in memory.
Interrupt Controllers
Device interrupts are fed to the processor using a
special piece of hardware called an interrupt
controller. The standard for this device is the Intel 8259 interrupt
controller, and has been since early PCs. As with most of these dedicated
controllers, in modern motherboards the 8259 is, in most cases, incorporated
into a larger chip as part of the chipset
<../../chip/index.htm>.
The interrupt controller has 8 input lines that
take requests from up to 8 different devices. The controller then passes the
request on to the processor, telling it which device issued the request (which
interrupt number triggered the request, from 0 to 7). The original PC and XT
had one of these controllers, and hence supported interrupts 0 to 7 only.
Starting with the IBM AT, a second interrupt
controller was added to the system to expand it; this was part of the expansion
of the ISA system bus from 8 to 16 bits. In order to ensure compatibility
(isn't that a recurring theme?) the designers of the AT didn't want to change
the single interrupt line going to the processor. So what they did instead was
to cascade the two interrupt
controllers together.
The first interrupt controller still has 8 inputs
and a single output going to the processor. The second one has the same design,
but it takes 8 new inputs (doubling the number of interrupts) and its output
feeds into input line 2 of the first controller. If any of the inputs on the
second controller becomes active, the output from that controller triggers
interrupt #2 on the first controller, which then signals the processor.
So what happens to IRQ #2? That line is now being
used to cascade the second controller, so the AT's designers changed the wiring
on the motherboard to send any devices that used IRQ2 over to IRQ9 instead.
What this means is that any older devices that used IRQ2 now use IRQ9, and if
you set any device to use IRQ2 on an AT or later system, it is really using
IRQ9.
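The cascade arrangement and the IRQ2-to-IRQ9 rerouting can be summarized in a short Python sketch. This is only a model of the numbering described above; the function names are invented for the example and nothing here talks to real hardware.

```python
CASCADE_INPUT = 2  # input on the first controller fed by the second controller's output

def irq_number(controller, line):
    """Map (controller 1 or 2, input line 0-7) to the system IRQ number."""
    if controller not in (1, 2) or not 0 <= line <= 7:
        raise ValueError("controller must be 1 or 2 and line must be 0-7")
    if controller == 1 and line == CASCADE_INPUT:
        raise ValueError("input 2 of the first controller is taken by the cascade")
    return line if controller == 1 else 8 + line

def effective_irq(card_setting):
    """IRQ the system actually sees for a card's setting on an AT or later."""
    return 9 if card_setting == 2 else card_setting

print(irq_number(1, 5))   # 5
print(irq_number(2, 1))   # 9
print(effective_irq(2))   # 9
print(effective_irq(5))   # 5
```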
Devices designed to use IRQ2 as a primary setting
are rare in today's systems, since IRQ2 has been out of use for over 10 years.
In most cases IRQ2 is just considered "unusable", while IRQ9 is a
regular, usable interrupt line. However, some modems, for example, still offer
the use of IRQ2 as a way to get around the fact that COM3 and COM4 share
interrupts with COM1 and COM2 by default. You may need to do this if you have a
lot of devices contending for the low-numbered IRQs (which is very common).
Note: If you select IRQ2 on a device such as a modem, IRQ9 will really be used
instead. Any software that uses the device needs to be told that it is using
IRQ9, not IRQ2. Also, if you do this, you cannot use the "real" IRQ9
for any other device. You should never attempt to use IRQ2 if you are already
using IRQ9 on your PC, and vice-versa.
IRQ Lines and the System Bus
The devices that use interrupts trigger them by
signaling over lines provided on the ISA system bus. Most of the interrupts are
provided to the system bus for use by devices; however, some of them are only
used internally by the system, and therefore they are not given wires on the
system bus. These are interrupts 0, 1, 2, 8 and 13, and are never available to
expansion cards (remember, IRQ2 is now wired to
IRQ9 on the motherboard).
As explained in this section on the ISA
bus <../../buses/types/older_ISA.htm>, the
original bus was only 8 bits wide and had a single connector for expansion
cards. The bus was expanded to 16 bits and a second connector slot added next
to the first one; you can see this if you look at your motherboard, since all
modern PCs use 16-bit slots.
The addition of this extra connector coincided
with the addition of the second interrupt controller, and the lines for these
extra IRQs were placed on this second slot. This means that in order to access
any of these IRQs--10, 11, 12, 14 and 15--the card must have both connectors.
While almost no motherboards today have 8-bit-only bus slots, there are still
many expansion cards that only use one ISA connector. The most common example
is an internal modem. These cards can only use IRQs 3, 4, 5, 6 and 7 (and 6 is
almost always not available since it is used by the floppy disk controller).
They can also use IRQ 9 indirectly if they have the ability to use IRQ2, since
9 is wired to where 2 used to be.
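As a quick reference, the rules above can be written down as a small Python lookup. The set and function names are invented for this sketch; the IRQ numbers themselves are the ones given in the text.

```python
SYSTEM_ONLY = {0, 1, 2, 8, 13}             # used internally, no wires on the ISA bus
EIGHT_BIT_CONNECTOR = {3, 4, 5, 6, 7, 9}   # IRQ9 sits where IRQ2 used to be
SIXTEEN_BIT_CONNECTOR = {10, 11, 12, 14, 15}

def usable_irqs(card_has_16bit_connector):
    """Return the IRQ lines an ISA card can physically reach, in order."""
    lines = set(EIGHT_BIT_CONNECTOR)
    if card_has_16bit_connector:
        lines |= SIXTEEN_BIT_CONNECTOR
    return sorted(lines)

print(usable_irqs(False))  # [3, 4, 5, 6, 7, 9]
print(usable_irqs(True))   # [3, 4, 5, 6, 7, 9, 10, 11, 12, 14, 15]
```

Keep in mind that being reachable is not the same as being free: IRQ6, for example, is almost always taken by the floppy disk controller.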
Note: All of this applies to ISA and VESA local bus slots only. PCI slots handle
interrupts differently, using their own internal
interrupt system <../../buses/types/pci_Interrupts.htm>. If a
PCI card needs to use a regular IRQ line the BIOS/chipset will normally
"map" the PCI interrupt to a regular system interrupt. This is
normally done using IRQ9 to IRQ12.
Interrupt Priority
The PC processes device interrupts according to
their priority level. This is a function of which interrupt line they use to
enter the interrupt controller. For this reason, the priority levels are
directly tied to the interrupt number:
· On an old PC/XT, the priority
of the interrupts is 0, 1, 2, 3, 4, 5, 6, 7.
· On a modern machine, it's
slightly more complicated (what else is new). Recall that the second set of
eight interrupts is piped through the IRQ2 channel on the first interrupt
controller. This means that the first controller views any of these interrupts
as being at the priority level of its "IRQ2". The result of this is
that the priorities become 0, 1, (8, 9, 10, 11, 12, 13, 14, 15), 3, 4, 5, 6, 7.
IRQs 8 to 15 take the place of IRQ2.
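The priority ordering for both cases can be derived mechanically, as this short Python sketch shows. The function name is invented for the example; the logic simply slots IRQs 8 to 15 into the position of IRQ2.

```python
def priority_order(cascaded=True):
    """Return IRQ numbers from highest to lowest priority."""
    first = list(range(8))           # IRQ0-7 on the first controller
    if not cascaded:
        return first                 # PC/XT: a single controller
    second = list(range(8, 16))      # IRQ8-15 on the second controller
    # The second controller's interrupts take the place of IRQ2.
    return first[:2] + second + first[3:]

order = priority_order()
print(order)  # [0, 1, 8, 9, 10, 11, 12, 13, 14, 15, 3, 4, 5, 6, 7]

# Priority level of each IRQ, with 1 the highest and 15 the lowest
priority_level = {irq: rank for rank, irq in enumerate(order, start=1)}
print(priority_level[3], priority_level[7])  # 11 15
```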
In any event, the priority level of the IRQs
doesn't make much of a difference in the performance of the machine, so it
isn't something you're going to want to worry about too much. If you are a real
performance freak, higher-priority IRQs may improve the performance of the
devices that use them slightly. If you could actually notice this in any way
other than examining the system under the microscope of a benchmark suite, I'd
be pretty surprised...
Non-Maskable Interrupts (NMI)
All of the regular interrupts that we normally use
and refer to by number are called maskable
interrupts. The processor is able to mask, or temporarily ignore, any interrupt
if it needs to, in order to finish something else that it is doing. In
addition, however, the PC has a non-maskable interrupt (NMI) that can be used
for serious conditions that demand the processor's immediate attention. The NMI
cannot be ignored by the system unless it is shut off specifically.
When an NMI signal is received, the processor
immediately drops whatever it was doing and attends to it. As you can imagine,
this could cause havoc if used improperly. In fact, the NMI signal is normally
used only for critical problem situations, such as serious hardware errors. The
most common use of NMI is to signal a parity error
<../../../ram/err_Errors.htm> from the memory subsystem. This
error must be dealt with immediately to prevent possible data corruption.
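The difference between maskable and non-maskable interrupts can be pictured with a toy Python model. It is a sketch only: the function and its parameters are invented, requests are listed in numeric order rather than by true priority, and the option to shut off the NMI specifically is left out.

```python
def interrupts_to_service(pending, masked, nmi=False):
    """Return the requests the processor will act on right now."""
    serviced = ["NMI"] if nmi else []   # the NMI gets through regardless of the mask
    # Maskable interrupts are only serviced if they are not currently masked.
    serviced += [irq for irq in sorted(pending) if irq not in masked]
    return serviced

print(interrupts_to_service({1, 4, 6}, masked={4, 6}))            # [1]
print(interrupts_to_service({1, 4, 6}, masked={4, 6}, nmi=True))  # ['NMI', 1]
```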
Interrupts, Multiple Devices and Conflicts
In general, interrupts are single-device
resources. Because of the way the system bus is designed, it is not feasible
for more than one device to use an interrupt at one time, because this can
confuse the processor and cause it to respond to the wrong device at the wrong
time. If you attempt to use two devices with the same IRQ, an IRQ conflict will result. This is one of
the types of resource conflicts
<../confl.htm>.
It is possible to share an IRQ among more than one
device, but only under limited conditions. In essence, if you have two devices
that you seldom use, and that you never use simultaneously, you may be able to
have them share an IRQ. However, this is not the preferred method since it is
much more prone to problems than just giving each device its own interrupt
line.
One of the most common problems regarding shared
IRQs is the use of the third and fourth serial (COM) ports, COM3 and COM4. By
default, COM3 uses the same interrupt as COM1 (IRQ4), and COM4 uses the same
interrupt as COM2 (IRQ3). If you have a mouse on COM1 and set up your modem as
COM3--a very common setup--guess what happens the first time you try to go
online? :^) You can share COM ports
on the same interrupt, but you have to be very careful not to use both devices
at once; in general this arrangement is not preferred. See here for ideas on dealing with COM port difficulties
<../../../../ts/x/comp/io.htm>.
Many modems will let you change the IRQ they use
to IRQ5 or IRQ2, for example, to avoid this problem. Other common areas where
interrupt conflicts occur are IRQ5, IRQ7 and IRQ12. The conflict resolution area of the Troubleshooting Expert
<../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm> can
sometimes help with these situations.
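Spotting a potential conflict on paper is just a matter of looking for an IRQ that appears twice in your list of devices. Here is a minimal Python sketch of that check. The function name and the device labels are hypothetical; the default COM port IRQs are the ones given above.

```python
from collections import defaultdict

DEFAULT_COM_IRQ = {"COM1": 4, "COM2": 3, "COM3": 4, "COM4": 3}

def find_irq_conflicts(assignments):
    """Return {irq: [devices]} for every IRQ claimed by more than one device."""
    by_irq = defaultdict(list)
    for device, irq in assignments.items():
        by_irq[irq].append(device)
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}

setup = {
    "mouse on COM1": DEFAULT_COM_IRQ["COM1"],
    "modem on COM3": DEFAULT_COM_IRQ["COM3"],
    "sound card": 5,
}
print(find_irq_conflicts(setup))  # {4: ['mouse on COM1', 'modem on COM3']}
```

Moving the modem to IRQ2 (which is really IRQ9) or another free line makes the conflict list come back empty.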
Summary of IRQs and Their Typical Uses
The table below provides summary information about
the 16 IRQ levels in a typical PC. You may find this table useful when
considering how to configure your system, or for resolving IRQ conflicts. For an explanation of the categories, along with more
detailed descriptions, see here <num.htm>. To see IRQ usage
organized by device instead of IRQ number, see this device
resource summary <../config_Summary.htm>:
IRQ | Bus Line? | Priority | Typical Default Use | Other Common Uses
0 | no | 1 | System timer | None
1 | no | 2 | Keyboard controller | None
2 | no (rerouted) | n/a | None; cascade for IRQs 8-15. Replaced by IRQ 9 | Modems, very old (EGA) video cards, COM3 (third serial port), COM4 (fourth serial port)
3 | 8/16-bit | 11 | COM2 (second serial port) | COM4 (fourth serial port), modems, sound cards, network cards, tape accelerator cards
4 | 8/16-bit | 12 | COM1 (first serial port) | COM3 (third serial port), modems, sound cards, network cards, tape accelerator cards
5 | 8/16-bit | 13 | Sound card | LPT2 (second parallel port), LPT3 (third parallel port), COM3 (third serial port), COM4 (fourth serial port), modems, network cards, tape accelerator cards, hard disk controller on old PC/XT
6 | 8/16-bit | 14 | Floppy disk controller | Tape accelerator cards
7 | 8/16-bit | 15 | LPT1 (first parallel port) | LPT2 (second parallel port), COM3 (third serial port), COM4 (fourth serial port), modems, sound cards, network cards, tape accelerator cards
8 | no | 3 | Real-time clock | None
9 | 16-bit only | 4 | (none) | Network cards, sound cards, SCSI host adapters, PCI devices, rerouted IRQ2 devices
10 | 16-bit only | 5 | (none) | Network cards, sound cards, SCSI host adapters, secondary IDE channel, quaternary IDE channel, PCI devices
11 | 16-bit only | 6 | (none) | Network cards, sound cards, SCSI host adapters, VGA video cards, tertiary IDE channel, quaternary IDE channel, PCI devices
12 | 16-bit only | 7 | PS/2 mouse | Network cards, sound cards, SCSI host adapters, VGA video cards, tertiary IDE channel, PCI devices
13 | no | 8 | Floating Point Unit (FPU / NPU / Math Coprocessor) | None
14 | 16-bit only | 9 | Primary IDE channel | SCSI host adapters
15 | 16-bit only | 10 | Secondary IDE channel | Network cards, SCSI host adapters
The PC Guide (http://www.PCGuide.com)
IRQ Details By Number
This section lists each of the 16 interrupt lines
and provides a full description of what they are, how they are normally used,
and any special information that is relevant to them. The general format for
each section is as follows:
· IRQ Number: The number of the IRQ from 0 to 15.
· 16-Bit Priority: The priority level of the interrupt <func_Priority.htm>. 1
is the highest and 15 is the lowest.
· Bus Line: Indicates whether or not this IRQ is available to expansion devices on the system bus
<func_Bus.htm>. This will say "8/16 bit" for an
interrupt line available to all expansion devices, "16 bit only" for
a line available only to 16-bit cards, or "No" for an interrupt used
only by system devices.
· Typical Default Use: Description of the device or
function that normally uses this IRQ in a regular modern PC.
· Other Common Uses: This is a list of other
devices that commonly either use this IRQ or offer the use of this IRQ as one
of their options. This list isn't exhaustive because there are a lot of oddball cards out there that may
use unusual IRQs.
· Description: A description of the interrupt and how it is
used, along with any relevant or interesting points about it or its history.
· Conflicts: A discussion of how likely conflicts are on
this IRQ and what usually causes them.
IRQ0
IRQ Number: 0
16-Bit Priority: 1
Bus Line: No
Typical Default Use: System timer.
Other Common Uses: None; for system use only.
Description: This is the reserved interrupt for the internal system timer. It is used
exclusively for internal operations and is never available to peripherals or
user devices.
Conflicts: This is a dedicated interrupt line; there should never be any conflicts.
If software indicates a conflict on this IRQ, there is a good possibility of a
hardware problem somewhere on your system board.
IRQ1
IRQ Number: 1
16-Bit Priority: 2
Bus Line: No
Typical Default Use: Keyboard / keyboard controller.
Other Common Uses: None; for system use only.
Description: This is the reserved interrupt for the keyboard controller. It is used
exclusively for keyboard input. Even on systems without a keyboard, IRQ1 is not
available for use by other devices. Note that the keyboard controller also
controls the PS/2 style mouse if the system has one, but the mouse uses a
separate line, IRQ12.
Conflicts: This is a dedicated interrupt line; there should never be any conflicts.
If software indicates a conflict on this IRQ, there is a good possibility of a
hardware problem somewhere on your system board; this can be a motherboard or
chipset (keyboard controller) problem.
IRQ2
IRQ Number: 2
16-Bit Priority: n/a
Bus Line: No
Typical Default Use: Cascade for IRQs 8 to 15.
Other Common Uses: Not generally used. Can be used by modems, very
old (EGA) video cards, as an alternative IRQ for COM3 (third serial port) or
COM4 (fourth serial port). Rerouted to IRQ9 and appears to software as IRQ9.
Description: This is the interrupt number that is used to cascade
the second interrupt controller to the first <func_Controller.htm>,
allowing the use of extra IRQs 8 to 15. This use as a linkage between the two
interrupt controllers means that IRQ2 is no longer available for normal use.
For compatibility with older cards that used IRQ2 on the original PC or XT
machines (which had only one controller and a normal IRQ2 line), the
motherboard of modern PCs reroutes IRQ2 to IRQ9. Hence IRQ2 can still be used
but appears to the system as IRQ9. The most common cards that do this are old
EGA video cards, and newer cards making IRQ2 available with the knowledge that
it will be routed to IRQ9.
Conflicts: This interrupt is normally not used on most systems, mostly because the
whole IRQ2/IRQ9 thing confuses a lot of people so they tend to avoid it.
Conflicts on this line generally come from trying to use a device on IRQ2 and
another on IRQ9 at the same time. Some modems and serial port cards allow IRQ2
to be used as an alternative for the two standard lines used for modems and
serial ports (IRQ3 and IRQ4) in order to avoid conflicts in those two
heavily-contested areas. This is generally a good configuration decision since
unused IRQs from 3 to 7 are harder to find than unused IRQs from 10 to 15. If
you want to use IRQ2, move any device using IRQ9 to another line like 10 or 11.
IRQ3
IRQ Number: 3
16-Bit Priority: 11
Bus Line: 8/16-bit
Typical Default Use: COM2 (second serial port).
Other Common Uses: COM4 (fourth serial port), modems, sound cards,
network cards, tape accelerator cards.
Description: This interrupt is normally used by the second serial port, COM2. It is
also the default interrupt for the fourth serial port, COM4, and a popular
option for modems, sound cards and other devices. Modems often come
pre-configured to use COM2 on IRQ3.
Conflicts: Conflicts on IRQ3 are relatively common. The two biggest problem areas are
first, modems that attempt to use COM2/IRQ3 and clash with the built-in COM2
port; and second, systems that attempt to use both COM2 and COM4 simultaneously
on this same interrupt line. In addition, some devices, particularly network
interface cards, come with IRQ3 as the default. In most cases the problem can
be avoided by changing the conflicting device to a different interrupt (IRQ2
and IRQ5 usually being the best choices). If the built-in COM2 is not being
used, it can be disabled in the BIOS setup
<../../bios/set/periph_Serial.htm>, which will allow a modem
to stay at COM2/IRQ3 without causing any problems. More general solutions to
these issues can be found in the conflict
resolution area of the Troubleshooting Expert
<../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm>.
IRQ4
IRQ Number: 4
16-Bit Priority: 12
Bus Line: 8/16-bit
Typical Default Use: COM1 (first serial port).
Other Common Uses: COM3 (third serial port), modems, sound cards,
network cards, tape accelerator cards.
Description: This interrupt is normally used by the first serial port, COM1. On PCs
that do not use a PS/2-style mouse, this port (and thus this interrupt) is
almost always used by the serial mouse. IRQ4 is also the default interrupt for
the third serial port, COM3, and a popular option for modems, sound cards and
other devices. Modems sometimes come pre-configured to use COM3 on IRQ4.
Conflicts: Conflicts on IRQ4 are relatively common, although not as common as on
IRQ3. On systems that do not use a serial mouse, problems are less common,
because COM1 isn't automatically busy whenever the mouse is in use. The two
biggest problem areas are modems that attempt to use COM3/IRQ4 and clash with
COM1, and systems that attempt to use both COM1 and COM3 simultaneously on this
same interrupt line. In most cases the problem can be avoided by changing the
conflicting device to a different interrupt (IRQ2 and IRQ5 usually being the
best choices). If a PS/2 mouse is being used, you can disable the built-in COM1 port in the BIOS setup, which will
allow a modem to stay at COM3/IRQ4 without causing any problems. However, this
is not really recommended. More general solutions to these issues can be found
in the conflict resolution area of the
Troubleshooting Expert <../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm>.
IRQ5
IRQ Number: 5
16-Bit Priority: 13
Bus Line: 8/16-bit
Typical Default Use: Sound card (but varies widely).
Other Common Uses: LPT2 (second parallel port), COM3 (third serial
port), COM4 (fourth serial port), modems, network cards, tape accelerator
cards, hard disk controller on old PC/XT.
Description: This is probably the single "busiest" IRQ in the whole system.
On the original PC/XT system this IRQ was used to control the (massive 10 MB)
hard disk drive. When the AT was introduced, hard disk control was moved to
IRQ14 to free up IRQ5 for 8-bit devices. As a result, IRQ5 is in most systems
the only free interrupt below IRQ9
and is therefore the first choice for use by devices that would otherwise
conflict with IRQ3, IRQ4, IRQ6 or IRQ7. IRQ5 is the default interrupt for the
second parallel port in systems that use two printers for example. It is also
the first choice that most sound cards make when looking for an IRQ setting.
IRQ5 is also a popular choice as an alternate line for systems that need to use
a third COM port, or a modem in addition to two COM ports.
Conflicts: Conflicts on IRQ5 are very common because of the large variety of devices
that have it as an option. Since virtually every PC today uses a sound card,
and they all like to grab IRQ5, it is almost always taken before you even start
looking at more esoteric peripherals. If a second parallel port (LPT2) is being
used to allow access to two printers or a printer and a parallel-port drive,
then IRQ5 will usually be taken right away. If for some very strange reason you
have three parallel ports, watch for
a conflict here or with IRQ7, since 5 and 7 are the only two normally used as
defaults for parallel ports. Sound cards that default to IRQ5 are generally
best left there, to avoid problems with poorly written older software that just
assumed the sound card would always be left at IRQ5. To whatever extent
possible, move devices that can use higher-valued IRQs away from IRQ5. For
example, you can't move COM3 to IRQ11, but you usually can move a network card
to it. See the conflict resolution area of the
Troubleshooting Expert
<../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm> for
more ideas.
IRQ6
IRQ Number: 6
16-Bit Priority: 14
Bus Line: 8/16-bit
Typical Default Use: Floppy disk controller.
Other Common Uses: Tape accelerator cards.
Description: This interrupt is reserved for use by the floppy disk controller.
Technically, it is available for use by other devices, and some devices will
allow you to select IRQ6. Most however do not, realizing that virtually every
PC uses at least one floppy disk drive. The most common devices that will let
you use IRQ6 are probably tape drive accelerator cards. This is probably
because these cards are used for tape drives that run off the floppy interface,
and many of them can be set to drive floppy disks themselves.
Conflicts: Conflicts on IRQ6 are uncommon and are usually the result of an
incorrectly configured peripheral card, since IRQ6 is pretty standardized in
its use for the floppy disks. If you use a tape accelerator card along with an
integrated floppy disk controller on your motherboard, watch out for the
accelerator trying to take over IRQ6; some even do this by default.
IRQ7
IRQ Number: 7
16-Bit Priority: 15
Bus Line: 8/16-bit
Typical Default Use: LPT1 (first parallel port).
Other Common Uses: COM3 (third serial port), COM4 (fourth serial
port), modems, sound cards, network cards, tape accelerator cards.
Description: This IRQ is used on most systems to drive the first parallel port,
normally for the use of a printer. These days of course many other devices use
parallel ports, including external drives. If you are not using a printer or
other device then IRQ7 can be used in a similar way to IRQ5: as an alternate
for any of the devices that would normally be fighting over IRQ3 or IRQ4.
Conflicts: Conflicts on IRQ7 are relatively unusual. One thing to watch out for if
you are using two parallel ports is to make sure the second one is set up to
use IRQ5 or another available IRQ. Some add-in parallel boards try to make LPT2
also use IRQ7, which generally won't work. Otherwise, keeping expansion cards
off IRQ7 while LPT1 is using it will eliminate conflicts in most cases.
IRQ8
IRQ Number: 8
16-Bit Priority: 3
Bus Line: No
Typical Default Use: Real-time clock.
Other Common Uses: None; for system use only.
Description: This is the reserved interrupt for the real-time clock timer. This timer
is used by software programs to manage events that must be calibrated to
real-world time; this is done by setting "alarms", which trigger this
interrupt at a specified time. For example, if you are using an electronic
datebook and have it set to pop up screen messages or beep the PC when it is
time for a meeting, the software will set a timer to count down to the
appropriate time. When the timer finishes its countdown, an interrupt will be
generated on IRQ8.
Conflicts: This is a dedicated interrupt line; there should never be any conflicts.
If software indicates a conflict on this IRQ, there is a good possibility of a
hardware problem somewhere on your system board.
IRQ9
IRQ Number: 9
16-Bit Priority: 4
Bus Line: 16-bit only
Typical Default Use: None.
Other Common Uses: Network cards, sound cards, SCSI host adapters,
PCI devices, rerouted IRQ2 devices.
Description: This IRQ is open on most systems and is a popular choice for
use by peripherals, especially network cards. On most PCs it can be used freely
since it has no default setting.
Conflicts: There are a couple of things to watch out for when using this IRQ. First,
if you are trying to use IRQ2, you cannot use IRQ9 as well, since devices that
try to use IRQ2 really end up using IRQ9 instead. Also, some systems that use
PCI cards that require the use of a system IRQ line will grab IRQ9; this can be
changed in some cases using the BIOS setup
parameters that assign IRQs to PCI devices <../../bios/set/pci.htm>.
IRQ10
IRQ Number: 10
16-Bit Priority: 5
Bus Line: 16-bit only
Typical Default Use: None.
Other Common Uses: Network cards, sound cards, SCSI host adapters,
secondary IDE channel, quaternary IDE channel, PCI devices.
Description: This is usually open and one of the easiest IRQs to use since it is
generally not contested by many devices. While the secondary IDE controller can sometimes be set to use IRQ10, it
almost always uses IRQ15 instead.
Conflicts: Conflicts on IRQ10 are unusual; the only thing to watch out for is a PCI
card that needs an interrupt line being assigned IRQ10 by the BIOS; this can be
changed in some cases using the BIOS setup
parameters that assign IRQs to PCI devices <../../bios/set/pci.htm>.
IRQ11
IRQ Number: 11
16-Bit Priority: 6
Bus Line: 16-bit only
Typical Default Use: None.
Other Common Uses: Network cards, sound cards, SCSI host adapters,
VGA video cards, tertiary IDE channel, quaternary IDE channel, PCI devices.
Description: This line is usually open and relatively easy to use since it is generally
not contested by many devices. If you are using three IDE channels (the third
typically being on a sound card), IRQ11 is typically the one that the tertiary
controller will try to use. Also, some PCI video cards will try to use IRQ11.
Conflicts: Watch out for PCI cards, especially video cards, that grab IRQ11. This can
be changed in some cases using the BIOS setup
parameters that assign IRQs to PCI devices <../../bios/set/pci.htm>.
IRQ12
IRQ Number: 12
16-Bit Priority: 7
Bus Line: 16-bit only
Typical Default Use: PS/2 mouse.
Other Common Uses: Network cards, sound cards, SCSI host adapters,
VGA video cards, tertiary IDE channel, PCI devices.
Description: On machines that use a PS/2 mouse, this is the IRQ reserved for its use.
Using a PS/2 mouse frees up the COM1 serial port and the interrupt it uses
(IRQ4) for other devices. Normally this is a good trade since free IRQs with
numbers below 8 are harder to find than ones above 8. If a PS/2 mouse is not
used, IRQ12 is a good choice for use by other devices such as network cards.
Conflicts: There are some potential problems here. Watch out for PCI cards that can
sometimes be assigned this line by the system BIOS. This can be changed in some
cases using the BIOS setup parameters that assign
IRQs to PCI devices <../../bios/set/pci.htm>. If you are using
a PS/2 mouse you need to make sure no other devices use IRQ12.
IRQ13
IRQ Number: 13
16-Bit Priority: 8
Bus Line: No
Typical Default Use: Floating point unit (FPU / NPU / Math coprocessor).
Other Common Uses: None; for system use only.
Description: This is the reserved interrupt for the integrated floating point unit (on
80486 or later machines) or the math coprocessor (on 80386 or earlier machines
that use one). It is used exclusively for internal signaling and is never
available for use by peripherals.
Conflicts: This is a dedicated interrupt line; there should never be any conflicts.
If software indicates a conflict on this IRQ, there is a good possibility of a
hardware problem somewhere on your system board, or possibly with your
processor or math coprocessor.
IRQ14
IRQ Number: 14
16-Bit Priority: 9
Bus Line: 16-bit only
Typical Default Use: Primary IDE channel.
Other Common Uses: SCSI host adapters.
Description: On most PCs, this IRQ is reserved for use by the primary IDE controller,
which provides access to the first two IDE/ATA devices (usually hard disk
drives and/or CD-ROM drives). On machines that do not use IDE devices at all,
this IRQ can be used for another purpose (such as a SCSI host adapter to
provide SCSI drives). In order to do this, you will normally have to disable
the IDE channel using either the appropriate BIOS
setting <../../bios/set/periph_IDE.htm> (for integrated IDE
support on newer boards) or jumpers on the controller board (for older machines
that use an IDE controller card).
Conflicts: Problems with IRQ14 are rare, since the universality of its use for IDE
means most peripheral vendors avoid offering it as an option. If you are using
SCSI and not IDE, and want to use IRQ14, make sure any integrated IDE
controllers are disabled first.
IRQ15
IRQ Number: 15
16-Bit Priority: 10
Bus Line: 16-bit only
Typical Default Use: Secondary IDE channel.
Other Common Uses: Network cards, SCSI host adapters.
Description: On most newer PCs, this IRQ is reserved for use by the secondary IDE
controller, which provides access to the third and fourth IDE/ATA devices
(usually hard disk drives and/or CD-ROM drives). If you are not using IDE, or
are using only two devices and want to put them on the primary channel to free
up this IRQ, that can be done easily as long as you remember to disable the
secondary IDE channel using either the appropriate
BIOS setting <../../bios/set/periph_IDE.htm> (for integrated
IDE support on newer boards) or jumpers on the controller board (for older
machines that use an IDE controller card).
Conflicts: Problems with IRQ15 typically result from assigning a peripheral to use it
while forgetting to disable the integrated secondary IDE controller. Most
Pentium or later (PCI-based) motherboards have two integrated IDE controllers.
Some people incorrectly assume that there will be no conflict if nothing is
attached to the secondary channel, but this is not always the case.
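The "16-Bit Priority" numbers quoted for each IRQ above follow from the way the second interrupt controller is cascaded through IRQ2. Here is a minimal Python sketch of that ordering, assuming the usual AT arrangement (IRQ0 is highest, and IRQ8 to IRQ15 slot in where IRQ2 would otherwise be). The names PRIORITY_ORDER and irq_priority are just for illustration:

```python
# Priority order on an AT-class PC: IRQ0 and IRQ1 come first, then IRQ8-15
# (which arrive through the cascade on IRQ2), then IRQ3-7.
PRIORITY_ORDER = [0, 1] + list(range(8, 16)) + list(range(3, 8))

def irq_priority(irq):
    """Return the 1-based priority of an IRQ line, or None for the cascade (IRQ2)."""
    if irq == 2:
        return None  # IRQ2 only carries the second controller's interrupts
    return PRIORITY_ORDER.index(irq) + 1

for irq in (6, 7, 8, 15):
    print(f"IRQ{irq}: priority {irq_priority(irq)}")
```

This reproduces the values listed in the entries above, such as 14 for IRQ6 and 3 for IRQ8.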
Direct Memory Access (DMA) Channels
Direct memory access (DMA) channels are system
pathways used by many devices to transfer information directly to and from
memory. DMA channels are not nearly as "famous" as IRQs as system
resources go. This is mostly for a good reason: there are fewer of them and
they are used by many fewer devices, and hence they usually cause fewer problems with system setup. However, conflicts
on DMA channels can cause very strange system problems and can be very
difficult to diagnose. DMAs are used most commonly today by floppy disk drives,
tape drives and sound cards.
DMA Channel Function and Operation
This section takes a look at DMA channels and how
they work. This includes an explanation of the different types of DMA channels,
the DMA controller, and a summary of the different DMA channels used in the PC.
Why DMA Channels Were Invented for Data Transfer
As you know, the processor is the
"brain" of the machine, and in many ways it can also be likened to
the conductor of an orchestra. In early machines the processor really did
almost everything. In addition to running programs it was also responsible for
transferring data to and from peripherals. Unfortunately, having the processor
perform these transfers is very inefficient, because it then is unable to do
anything else.
The invention of DMA enabled the devices to cut
out the "middle man", allowing the processor to do other work and the
peripherals to transfer data themselves, leading to increased performance.
Special channels were created, along with circuitry to control them, that
allowed the transfer of information without the processor controlling every
aspect of the transfer. This circuitry is normally part of the system chipset
on the motherboard.
Note that DMA channels are only on the ISA bus
(and EISA and VLB, since they are derivatives of it). PCI devices do not use
standard DMA channels at all.
Third-Party and First-Party DMA (Bus Mastering)
Standard DMA is sometimes called "third
party" DMA. This refers to the fact that the system DMA controller is
actually doing the transfer (the first two parties are the sender and receiver
of the transfer). There is also a type of DMA called "first party"
DMA. In this situation, the peripheral doing the transfer actually takes
control of the system bus to perform the transfer. This is also called bus mastering.
Bus mastering provides much better performance
than regular DMA because modern devices have much smarter and faster DMA
circuitry built into them than exists in the old standard ISA DMA controller.
Newer DMA modes are now available, such as Ultra
DMA <../../../hdd/if/ide/std_Ultra.htm> (mode 3 or DMA-33)
that provide for very high transfer rates.
Limitations of Standard DMA
While the use of DMA provided a significant
improvement over processor-controlled data transfers, it too eventually reached
a point where its performance became a limiting factor. DMA on the ISA bus has
been stuck at the same performance level for over 10 years. For old 10 MB XT
hard disks, DMA was a top performer. For a modern 8 GB hard disk, transferring
multiple megabytes per second, DMA is insufficient.
On newer machines, disks are controlled using
either programmed I/O (PIO) or first-party DMA (bus
mastering) on the PCI bus <../../buses/types/pci_IDEBM.htm>,
and not using the standard ISA DMA that is used for devices like sound cards. Hard disk transfer modes are discussed in detail here
<../../../hdd/if/ide/modes.htm>. This type of DMA does not
rely on the slow ISA DMA controllers, and allows these high-performance devices
the bandwidth they need. In fact, many of the devices that used to use DMA on
the ISA bus use bus mastering over the PCI bus for faster performance. This
includes newer high-end SCSI cards, and even network and video cards.
DMA Controllers
Standard DMA transfers are managed by the DMA
controller, built into the system chipset
<../../chip/index.htm> on modern PCs. The original PC and XT
had one of these controllers and supported 4 DMA channels, 0 to 3.
Starting with the IBM AT, a second DMA controller
was added. Much in the way that the second
interrupt controller was cascaded with the first
<../irq/func_Controller.htm>, the first DMA controller is
cascaded to the second. The difference is that with IRQs, the second controller
is cascaded to the first, but with DMAs the first is cascaded to the second. As
a result, there are 8 DMAs, from 0 to 7, but DMA 4 is not usable. There is no
rerouting as with IRQ2 and IRQ9 here, because all of the original DMAs (0 to 3)
are still usable directly.
DMA Channels and the System Bus
All of the DMA channels except channel 4 are
accessible to devices on the ISA system bus. Channel 4 is used to cascade the two DMA controllers together. PCI
devices do not use standard system DMA channels.
As was the case with IRQs, the second DMA
controller was added when the ISA bus was expanded to 16 bits with the creation
of the AT. The lines to access these extra DMA channels were placed on the
second part of the AT slot that is used by 16-bit cards. This means that only
16-bit cards can access DMA channels 5, 6 or 7. Unfortunately, many devices
even today are still only 8-bit cards. You can tell by looking at them and
seeing that they only use the first part of the two-part ISA bus connector on
the motherboard.
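To make the 8-bit versus 16-bit restriction concrete, here is a small Python sketch that lists the channels a card can actually be set to. It is only an illustration: the function name is made up, and it leaves out channel 0 (reserved for memory refresh, as covered later in this section) and channel 4 (the cascade).

```python
def reachable_dma_channels(card_bits):
    """Return the DMA channels a card of the given ISA width (8 or 16) can be set to."""
    if card_bits not in (8, 16):
        raise ValueError("ISA cards are either 8-bit or 16-bit")
    channels = [1, 2, 3]        # lines on the first part of the ISA connector
    if card_bits == 16:
        channels += [5, 6, 7]   # lines on the second part, used only by 16-bit cards
    return channels

print(reachable_dma_channels(8))   # [1, 2, 3]
print(reachable_dma_channels(16))  # [1, 2, 3, 5, 6, 7]
```

In practice channel 2 is normally taken by the floppy disk controller, so an 8-bit card really has only channels 1 and 3 to choose from.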
DMA Request (DRQ) and DMA Acknowledgment (DACK)
Each DMA channel consists of two signals: the DMA request signal (DRQ) and the DMA
acknowledgment signal (DACK).
Some peripheral cards have separate jumpers for these instead of a single DMA
channel jumper. If this is the case, make sure that the DRQ and DACK are set to
the same number; otherwise the device won't work. (I wonder what goes through
the minds of some peripheral card designers. :^) )
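If you keep notes on your jumper settings, the DRQ/DACK rule is easy to check mechanically. The following Python sketch is purely illustrative (the function name and the message wording are my own); it flags a mismatched pair as well as a setting outside the 0 to 7 range:

```python
def check_dma_jumpers(drq, dack):
    """Return a list of problems found with a card's DRQ and DACK jumper settings."""
    problems = []
    for name, value in (("DRQ", drq), ("DACK", dack)):
        if not 0 <= value <= 7:
            problems.append(f"{name}{value} is not a valid DMA channel (0 to 7)")
    if drq != dack:
        problems.append(f"DRQ{drq} and DACK{dack} do not match")
    return problems

print(check_dma_jumpers(1, 1))  # []
print(check_dma_jumpers(1, 3))  # ['DRQ1 and DACK3 do not match']
```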
DMA, Multiple Devices and Conflicts
Like interrupts, DMA channels are single-device
resources. If two devices try to use the same DMA channel at the same time,
information will get mixed up between the two devices trying to use it, and any
number of problems can be the result. DMA channel conflicts can be very
difficult to diagnose. See here for more details on
resource conflicts <../confl.htm>.
It is possible to share a DMA channel among more
than one device, but only under limited conditions. In essence, if you have two
devices that you seldom use, and that you never use simultaneously, you may be
able to have them share a channel. However, this is not the preferred method
since it is much more prone to problems than just giving each device its own
resource.
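One simple way to spot this kind of conflict is to write down which channel each device is set to and look for any channel claimed more than once. Here is a rough Python sketch of the idea; the device names and channel numbers in the example are invented for illustration:

```python
from collections import defaultdict

def find_dma_conflicts(assignments):
    """Map each DMA channel claimed by more than one device to the devices claiming it."""
    by_channel = defaultdict(list)
    for device, channel in assignments.items():
        by_channel[channel].append(device)
    return {ch: sorted(devs) for ch, devs in by_channel.items() if len(devs) > 1}

# Hypothetical configuration: two devices have both been left on DMA 3.
example = {
    "sound card (low DMA)": 1,
    "floppy disk controller": 2,
    "ECP parallel port": 3,
    "tape accelerator": 3,
}
print(find_dma_conflicts(example))  # {3: ['ECP parallel port', 'tape accelerator']}
```

A channel that shows up here is not necessarily fatal if the two devices are never used at the same time, but it is the first place to look when strange problems appear.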
One problem area with DMA channels is that most
devices want to use DMA channels with numbers 0 to 3 (on the first DMA
controller). DMA channels 5 to 7 are relatively unused because they require
16-bit cards. Considering that DMA channel 0 is never available, and DMA 2 is
used for the floppy disk controller, that doesn't leave many options. On one of
my systems I wanted to set up an ECP parallel port, a tape accelerator and a
voice modem in addition to my sound card. I ran out of DMA channels between 1
and 3 very quickly. I still had DMA channels 6 and 7 open but could not use
them because all the devices I wanted to use were either on 8-bit cards or
wouldn't support the higher numbers for software reasons.
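The squeeze described above can be sketched as a small search problem: each device offers a few channel options, and no channel may be used twice. The Python below is a minimal backtracking sketch. The option sets are assumptions made up for the example (real cards vary), and assign_channels is a hypothetical helper, not part of any setup utility.

```python
def assign_channels(options, taken=frozenset()):
    """Give each device one channel from its allowed set with no sharing.

    options maps device name -> set of channels the device supports.
    Returns a dict of device -> channel, or None if no assignment exists.
    """
    devices = list(options)

    def solve(i, used, result):
        if i == len(devices):
            return dict(result)
        dev = devices[i]
        for ch in sorted(options[dev]):
            if ch not in used:
                result[dev] = ch
                found = solve(i + 1, used | {ch}, result)
                if found is not None:
                    return found
                del result[dev]  # undo and try the next channel
        return None

    return solve(0, set(taken), {})

# Channels already spoken for: 0 (refresh), 1 (sound card), 2 (floppy), 4 (cascade).
in_use = {0, 1, 2, 4}
# Assumed options: three 8-bit devices that only support DMA 1 or 3.
wish_list = {
    "ECP parallel port": {1, 3},
    "tape accelerator": {1, 3},
    "voice modem": {1, 3},
}
print(assign_channels(wish_list, in_use))                        # None
print(assign_channels({"ECP parallel port": {1, 3}}, in_use))    # {'ECP parallel port': 3}
```

With only DMA 3 left on the first controller, just one of the three devices can be placed, even though channels 6 and 7 are sitting idle.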
Speaking of the ECP parallel port, this is another
new area of concern regarding DMA resource conflicts. Many people don't realize
that this high-speed parallel port option requires the use of a DMA channel.
(Your BIOS setup program will usually have a setting
to select the DMA channel <../../bios/set/periph_ParallelECP.htm>,
right under where you enable ECP
<../../bios/set/periph_ParallelMode.htm>. This should be a
good hint but still a lot of people don't notice this. :^) ) The usual default
for this port is DMA 3, which is also used by many other types of devices. The conflict resolution area of the Troubleshooting Expert
<../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm> can
sometimes help with these situations.
Summary of DMA Channels and Their Typical Uses
The table below provides summary information about
the 8 DMA channel numbers in a typical PC. You may find this table useful when
considering how to configure your system, or for resolving DMA conflicts. For an explanation of the categories, along with more
detailed descriptions, see here <num.htm>. To see DMA channel
usage organized by device instead of DMA number, see this device resource summary <../config_Summary.htm>.
DMA   Bus Line?     Typical Default Use          Other Common Uses
0     No            Memory refresh               None
1     8/16-bit      Sound card (low DMA)         SCSI host adapters, ECP parallel ports, tape accelerator cards, network cards, voice modems
2     8/16-bit      Floppy disk controller       Tape accelerator cards
3     8/16-bit      None                         ECP parallel ports, SCSI host adapters, tape accelerator cards, sound card (low DMA), network cards, voice modems, hard disk controller on old PC/XT
4     No            None; cascade for DMAs 0-3   None
5     16-bit only   Sound card (high DMA)        SCSI host adapters, network cards
6     16-bit only   None                         Sound cards (high DMA), network cards
7     16-bit only   None                         Sound cards (high DMA), network cards
DMA Channel Details By Number
This section lists each of the 8 DMA channels and
provides a full description of what they are, how they are normally used, and
any special information that is relevant to them. The general format for each
section is as follows:
· Channel Number: The number of the DMA channel, from 0 to 7.
· Bus Line: Indicates whether or not this DMA channel is
available to expansion devices on the system bus. This will say "8/16-bit"
for a channel accessible by all expansion devices, "16-bit only"
for a channel available only to 16-bit cards, or "No" for a channel
reserved for use only by system devices.
· Typical Default Use: Description of the device or
function that normally uses this DMA channel in a regular modern PC.
· Other Common Uses: This is a list of other
devices that commonly either use this channel or offer the use of this channel
as one of their options. This list isn't exhaustive because there are a lot of oddball cards out there that may
use unusual DMAs.
· Description: A description of the channel and how it is used,
along with any relevant or interesting points about it or its history.
· Conflicts: A discussion of the likelihood of conflicts with
this DMA channel and their likely causes.
DMA0
Channel Number: 0
Bus Line: No
Typical Default Use: Memory (DRAM) Refresh.
Other Common Uses: None; for system use only.
Description: This DMA channel is reserved for use by the internal DRAM refresh
circuitry. Dynamic RAM
<../../../ram/types_DRAM.htm> (used for system memory on
almost all PCs) must be refreshed frequently to make sure that it does not lose
its contents. DMA channel 0 is used for this purpose and is not available for
use by peripherals.
Conflicts: Most devices stay far away from DMA0, recognizing its use by the system.
Beware however, as some devices actually offer DMA0 as an option. For example,
some sound cards do. Do not use DMA0
for peripherals. If you have no devices set to use DMA0 but a conflict becomes
apparent anyway, it could be a problem with your motherboard.
DMA1
Channel Number: 1
Bus Line: 8/16-bit
Typical Default Use: Low DMA channel for sound card.
Other Common Uses: SCSI host adapters, ECP parallel ports, tape
accelerator cards, network cards, voice modems.
Description: This DMA channel is normally taken by the sound card in your PC for its
"low" DMA channel. Most sound cards today actually use two DMA
channels; one must be chosen from DMAs 1, 2 or 3, while the other can be any
free DMA channel (and so is selected from the less-used 5, 6 or 7). DMA1 is
also a popular choice for many other peripherals, largely for historical
reasons (on the original XT, DMA3 was used for the hard disk so DMA1 was all
that was left open for everything else to share).
Conflicts: DMA1 is one of the two most contested channels in the system (the other
being DMA3, which is often worse). It is important to watch for conflicts
between multiple devices here, particularly if you are using a sound card. It
is preferable in general to leave the sound card on DMA1 and move any other
devices out of its way, for compatibility with older (poorly written) software
that assumes the sound card is on DMA1. Also watch out for ECP parallel port
conflicts here. More general solutions to resource conflicts can be found in
the conflict resolution area of the Troubleshooting
Expert <../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm>.
DMA2
Channel Number: 2
Bus Line: 8/16-bit
Typical Default Use: Floppy disk controller.
Other Common Uses: Tape accelerator cards.
Description: This DMA channel is used on virtually every PC for the floppy disk
controller. As such, it is usually not offered as an option for use by most
peripherals. Some do offer it as an option however. In particular, tape
accelerator cards often offer the use of DMA2 as an option. This is probably
because these cards are used for tape drives that run off the floppy interface,
and many of them can be set to drive floppy disks themselves.
Conflicts: DMA2 is not often a source of conflicts, as long as you remember not to
put any other devices on it if you have a floppy disk controller in your system
(which almost everyone does). Beware tape accelerator cards that default to
DMA2 for their channel assignment.
DMA3
Channel Number: 3
Bus Line: 8/16-bit
Typical Default Use: None.
Other Common Uses: ECP parallel ports, SCSI host adapters, tape
accelerator cards, sound card (low DMA), network cards, voice modems.
Description: This DMA channel is normally the only one free on the first controller
(DMAs 0 to 3) when you are using a sound card. As a result, it is probably the
"busiest" channel in the PC, with many different devices vying for
its services. One of the most common uses of this channel is by ECP parallel
ports, which require a DMA channel unlike other parallel port modes. On very
old XT systems, DMA channel 3 is used by the hard disk drive.
Conflicts: DMA3 is probably the worst channel in the system for conflicts, because so
many devices try to use it. It is important to watch for conflicts between
multiple devices here, particularly if you are using a sound card or ECP
parallel port. More general solutions to resource conflicts can be found in the
conflict resolution area of the Troubleshooting
Expert <../../../../ts/x/comp/mbsys/sys_ResourceConflict.htm>.
DMA4
Channel Number: 4
Bus Line: No
Typical Default Use: Cascade for DMA channels 0 to 3.
Other Common Uses: None; for system use only.
Description: This DMA channel is reserved for cascading the two DMA controllers on
systems with a 16-bit ISA bus. It is not available for use by peripherals.
Conflicts: There should not be any conflicts on this channel; any problems with it
indicate a possible system hardware failure.
DMA5
Channel Number: 5
Bus Line: 16-bit only
Typical Default Use: High DMA channel for sound card.
Other Common Uses: SCSI host adapters, network cards.
Description: This DMA channel is normally taken by the sound card in your PC for its
"high" DMA channel. Most sound cards today actually use two DMA
channels; one must be chosen from DMAs 1, 2 or 3 (the "low" channel),
while the other is selected from a high-numbered channel like this one. Some
network cards also use this channel, though others don't use DMA at all.
Conflicts: Few conflicts arise with this channel because there are relatively few
devices that can use DMA channels 5, 6 or 7.
DMA6
Channel Number: 6
Bus Line: 16-bit only
Typical Default Use: None.
Other Common Uses: Sound cards (high DMA), network cards.
Description: This DMA channel is normally open and available for use by peripherals. It
is one of the least used channels in the system and is an alternative location
for the "high" sound card DMA channel or other devices.
Conflicts: Few conflicts arise with this channel because there are relatively few
devices that can use DMA channels 5, 6 or 7.
DMA7
Channel Number: 7
Bus Line: 16-bit only
Typical Default Use: None.
Other Common Uses: Sound cards (high DMA), network cards.
Description: This DMA channel is normally open and available for use by peripherals. It
is one of the least used channels in the system and is an alternative location
for the "high" sound card DMA channel or other devices.
Conflicts: Few conflicts arise with this channel because there are relatively few
devices that can use DMA channels 5, 6 or 7.
The PC Guide (http://www.PCGuide.com)
Input / Output (I/O) Addresses
Input/output addresses (usually called I/O addresses for short) are resources
used by virtually every device in the computer. Conceptually, they are very
simple; they represent locations in memory that are designated for use by
various devices to exchange information between themselves and the rest of the
PC.
Note: I/O addresses are referred to in hexadecimal notation. See here for an explanation of what this means
<../../../intro/works/comput_Math.htm>, if you are not
familiar with it.
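If hexadecimal is new to you, a couple of lines of Python show how the "h" notation used throughout this section maps to ordinary numbers. This is only a convenience sketch; the two helper names are my own:

```python
def parse_io_address(text):
    """Convert a string such as '378h' into an integer."""
    return int(text.rstrip("hH"), 16)

def format_io_address(value):
    """Convert an integer back into the three-digit 'h' notation used here."""
    return f"{value:03X}h"

base = parse_io_address("378h")
print(base)                         # 888
print(format_io_address(base + 7))  # 37Fh
```

So the range 378-37Fh covers eight consecutive addresses, starting at decimal 888.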
Memory-Mapped I/O
You can think of I/O addresses like a bunch of
small two-way "mailboxes" in the system's memory. Take for example a
communications (COM) port that has a modem connected to it. When information is
received by the modem, it needs to get this information into the PC. Where does
it put the data it pulls off the phone line?
One answer to this problem is to give each device
its own small area of memory to work with. This is called memory-mapped I/O. When the modem gets a byte of data it sends it
over the COM port, and it shows up in the COM port's designated I/O address
space. When the CPU is ready to process the data, it knows where to look to
find it. When it later wants to send
information over the modem, it uses this address again (or another one near
it). This is a very simple way of dealing with the problem of information
exchange between devices.
I/O Address Space Width
Unlike IRQs and DMA channels, which are of uniform
size and normally assigned one per device--sound cards use more than one
because they are really many devices wrapped into one package--I/O addresses
vary in size. The reason is simple: some devices (e.g., network cards) have
much more information to move around than others (e.g., keyboards).
The size of the I/O address is also in some cases
dictated by the design of the card and (as usual) compatibility reasons with
older devices. Most devices use an I/O address space of 4, 8 or 16 bytes; some
use as few as 1 byte and others as many as 32 or more. The wide variance in the
size of the I/O addresses can make it difficult to determine and resolve
resource conflicts, because often I/O addresses are referred to only by the first byte of the I/O address.
For example, people may say to "put your
network card at 360h", which may seem not to conflict with your LPT1
parallel port at address 378h. In fact many network cards take up 32 bytes for
I/O; this means they use up 360-37Fh, which totally overlaps with the parallel
port (378-37Fh). The I/O address summary map
helps you to see which I/O addresses are most used, and to visualize and avoid
potential conflicts.
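The network card example above is easy to verify with a little arithmetic: take the base address, add the size, and see whether the two ranges intersect. Here is a minimal Python sketch of that check. The function names are invented for illustration, and the sizes are the ones quoted in the text (32 bytes for the network card, 8 for the parallel port).

```python
def io_range(base, size):
    """Return the (first, last) address covered by a device of the given size."""
    return (base, base + size - 1)

def overlaps(a, b):
    """True if two (first, last) address ranges share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]

network_card = io_range(0x360, 32)  # 360-37Fh
lpt1 = io_range(0x378, 8)           # 378-37Fh

print(f"{network_card[0]:03X}-{network_card[1]:03X}h")  # 360-37Fh
print(overlaps(network_card, lpt1))                     # True
print(overlaps(io_range(0x300, 32), lpt1))              # False
```

Moving the card to 300h (300-31Fh) clears the parallel port, which is one reason that address is such a common default for network cards.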
I/O Addresses, Multiple Devices and Conflicts
I/O addresses, like other system resources, are
normally used only by single devices. Having multiple devices try to use the
same address would cause information to get mixed up and overwritten, sort of
like having two people share a mailbox (where none of the envelopes had
anything printed on them. :^) )
There are some unusual exceptions to this however,
mostly for historical reasons. They are discussed in the next section where
individual addresses are reviewed. One of the problems with I/O addresses and
conflicts is simply keeping track of them all. They can be quite confusing to
keep straight, particularly since different devices use different sized address
spaces.
I/O addresses suffer from the same problem that
IRQs and DMA channels do: many conflicts occur not because there aren't enough
I/O addresses to go around, but because they aren't allocated or spaced out in
an organized way. Too many devices attempt to use the same addresses, or have
too few different configuration options to allow them all to find a place to use
without getting in each others' way. This is largely due to historical reasons.
One additional note about parallel ports. The I/O
addresses used for the different parallel ports (LPT1, LPT2, LPT3) are not
universal. Originally IBM defined different defaults for monochrome-based PCs
and for color PCs. Of course, all new systems have been color for many years,
but even some new systems still default LPT1 to 3BCh. Here is how the two
different labeling schemes typically work. See the
section on logical devices <logic.htm> for more details:
Port    "Monochrome" Systems    "Color" Systems
LPT1    3BC-3BFh                378-37Fh
LPT2    378-37Fh                278-27Fh
LPT3    278-27Fh                --
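The practical consequence of the two schemes is that the same address can belong to a different port name depending on the system. The Python sketch below simply encodes the table above and looks an address up in both schemes. The dictionary and function names are my own, for illustration only.

```python
# Address ranges from the table above, as (first, last) pairs.
LPT_SCHEMES = {
    "monochrome": {"LPT1": (0x3BC, 0x3BF), "LPT2": (0x378, 0x37F), "LPT3": (0x278, 0x27F)},
    "color": {"LPT1": (0x378, 0x37F), "LPT2": (0x278, 0x27F)},
}

def ports_at(base):
    """Return {scheme: port name} for every scheme with a port starting at base."""
    found = {}
    for scheme, ports in LPT_SCHEMES.items():
        for name, (start, _end) in ports.items():
            if start == base:
                found[scheme] = name
    return found

print(ports_at(0x378))  # {'monochrome': 'LPT2', 'color': 'LPT1'}
print(ports_at(0x3BC))  # {'monochrome': 'LPT1'}
```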
I/O Address Details By Number
Here I describe some of the more interesting I/O
addresses in use in the typical PC. Of particular interest are those where
conflicts are likely to occur, due to a large number of devices using the
address or offering it as an option. A complete list of I/O addresses is
provided in the summary in the next section:
· 060h and 064h: These two addresses are used by the keyboard
controller, which operates both the keyboard and the PS/2 style mouse (on
systems that use one).
· 130-14Fh and 140-15Fh: These addresses are sometimes
offered as options for SCSI host adapters. Note that these options partially
overlap (from 140-14Fh).
· 220-22Fh: This is the default address for many sound cards.
It is also an option for some SCSI host adapters (first 16 bytes).
· 240-24Fh: This is an optional address for sound cards and
network cards (first 16 bytes for NE2000 cards).
· 260-26Fh and 270-27Fh: These are optional addresses
for sound cards and network cards. NE2000-compatible network cards take 32
bytes; if set to use 260h, they will conflict with several system
devices as well as the I/O address for either LPT2 or LPT3 in the 270-27Fh
area.
· 280-28Fh: This is an optional address for sound cards and
network cards (first 16 bytes for NE2000 cards).
· 300-30Fh: This is the default for many network cards
(NE2000 cards extend to 31Fh). 300-301h is also an option for the MIDI port on
many sound cards.
· 320-32Fh and 330-33Fh: This is a busy area in the I/O
address map. First, 330-331h is the default for the MIDI port on many sound
cards. 320-33Fh is an option for some NE2000-compatible network cards and will conflict
with the MIDI port at this setting. Some SCSI host adapters also offer 330-34Fh
as an option. Finally, the old PC/XT hard disk controller also uses 320-323h.
· 340-34Fh: Optional areas for several device types overlap
here, including two options for SCSI host adapters (330-34Fh and 340-35Fh) as
well as network cards.
· 360-36Fh and 370-37Fh: This is another "high
traffic" area. 378-37Fh is used on most systems for the first parallel
port, and 376-377h is used for the secondary IDE controller's slave drive.
These can conflict with an NE2000-compatible network card placed at location
360h. Tape accelerator cards often default to 370h, which will also conflict
with a network card placed at 360h.
· 3B0-3BBh and 3C0-3DFh: These are used by VGA video
adapters. They take all of the areas originally assigned for monochrome cards
(3B0-3BBh), CGA adapters (3D0-3DFh) and EGA adapters (3C0-3CFh).
· 3E8-3EFh: There is a potential conflict here in locations
3EE-3EFh if you are using a third serial port (COM3) and a tertiary IDE
controller.
· 3F0-3F7h: There is actually a "standard" resource
conflict here: the floppy disk controller and the slave drive on the primary
IDE controller "share" locations 3F6-3F7h. These devices are actually
both present in many systems. Fortunately, this conflict (which exists for
historical reasons) is fairly well known and compensated for, so it will not
result in problems in a typical system. Note that some tape accelerator cards
also offer the use of 3F0h as an option, which will conflict with the floppy
disk controller.
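Checking whether two of these options collide is simple interval arithmetic. The sketch below is only an illustration: the helper names `overlap` and `fmt` are my own, and the ranges are the ones quoted in the list above (with the NE2000 card at 360h taken as the 32 bytes from 360h to 37Fh).

```python
def overlap(a, b):
    """Return the inclusive (start, end) range shared by a and b, or None."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


def fmt(r):
    """Format an inclusive range the way the text does, e.g. 140-14Fh."""
    return f"{r[0]:03X}-{r[1]:03X}h"


# Ranges quoted in the list above (all inclusive).
scsi_a = (0x130, 0x14F)          # first SCSI host adapter option
scsi_b = (0x140, 0x15F)          # second SCSI host adapter option
ne2000_at_360 = (0x360, 0x37F)   # NE2000 card: 32 bytes starting at 360h
lpt1_color = (0x378, 0x37F)      # LPT1 on most systems
sound_220 = (0x220, 0x22F)       # default for many sound cards

print(fmt(overlap(scsi_a, scsi_b)))             # 140-14Fh
print(fmt(overlap(ne2000_at_360, lpt1_color)))  # 378-37Fh
print(overlap(sound_220, scsi_a))               # None
```

Any pair that returns a range rather than `None` is a potential resource conflict if both devices are configured that way at once.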
I/O Address Summary Map
The table below shows the I/O addresses from 000h
to 3FFh, along with the devices that typically use them. This table is slightly
different from the ones that show default and optional use of IRQs and DMA
channels. There are many different addresses of different sizes, so in order to
keep the table a manageable size, it was made somewhat two-dimensional. Each
row is 16 bytes and is divided into four columns; the first is for bytes 0 to
3, the second 4 to 7, the third 8 to B and the fourth C to F. So to find
address 3BCh, you would look in the fourth column of row "3B0-3BFh".
Items in the table in bold print represent standard devices in a typical PC
configuration. Items in regular print represent optional devices or optional
locations for addresses of standard devices. Blank spaces are areas that are
open. Multiple lines are used to show multiple items that go in the same
address space. Where you see two or more items overlapping in the same address
space, there is the potential for a resource conflict.
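The row-and-column lookup described above is just a matter of splitting the address into its 16-byte row and its 4-byte quad. Here is a minimal sketch of that arithmetic; the function name `locate` is my own invention, not something from the table.

```python
def locate(addr):
    """Return (row label, column number 1-4) for an I/O address in 000h-3FFh."""
    if not 0x000 <= addr <= 0x3FF:
        raise ValueError("the table only covers 000h to 3FFh")
    row_start = addr & ~0xF          # round down to a 16-byte boundary
    column = (addr & 0xF) // 4 + 1   # four bytes per column
    return f"{row_start:03X}-{row_start + 0xF:03X}h", column


print(locate(0x3BC))  # ('3B0-3BFh', 4)
print(locate(0x064))  # ('060-06Fh', 2)
```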
To see I/O address usage organized by device
instead of address, see this device resource
summary <config_Summary.htm> instead:
Addr.     First Quad (xx0h to xx3h); Second Quad (xx4h to xx7h); Third Quad (xx8h to xxBh); Fourth Quad (xxCh to xxFh)
000-00Fh  DMA controller, channels 0 to 3
010-01Fh  (System use)
020-02Fh  Interrupt controller #1 (020-021h); (System use)
030-03Fh  (System use)
040-04Fh  System timers; (System use)
050-05Fh  (System use)
060-06Fh  Keyboard & PS/2 mouse (060h), Speaker (061h); Keyboard & PS/2 mouse (064h)
070-07Fh  RTC/CMOS, NMI (070-071h); (System use)
080-08Fh  DMA page registers 0-2 (081-083h); DMA page register 3 (087h); DMA page registers 4-6 (089-08Bh); DMA page register 7 (08Fh)
090-09Fh  (System use)
0A0-0AFh  Interrupt controller #2 (0A0-0A1h); (System use)
0B0-0BFh  (System use)
0C0-0CFh  DMA controller, channels 4-7 (0C0-0DFh, bytes 1-16)
0D0-0DFh  DMA controller, channels 4-7 (0C0-0DFh, bytes 17-32)
0E0-0EFh  (System use)
0F0-0FFh  Floating point unit (FPU/NPU/Math coprocessor)
100-10Fh  (System use)
110-11Fh  (System use)
120-12Fh  (System use)
130-13Fh  SCSI host adapter (130-14Fh, bytes 1 to 16)
140-14Fh  SCSI host adapter (130-14Fh, bytes 17 to 32)
          SCSI host adapter (140-15Fh, bytes 1 to 16)
150-15Fh  SCSI host adapter (140-15Fh, bytes 17 to 32)
160-16Fh  Quaternary IDE controller, master drive
170-17Fh  Secondary IDE controller, master drive
180-18Fh
190-19Fh
1A0-1AFh
1B0-1BFh
1C0-1CFh
1D0-1DFh
1E0-1EFh  Tertiary IDE controller, master drive
1F0-1FFh  Primary IDE controller, master drive
200-20Fh  Joystick port (System use, 20C-20Dh)
210-21Fh
220-22Fh  Sound card
          SCSI host adapter (220-23Fh, bytes 1 to 16)
230-23Fh  SCSI host adapter (220-23Fh, bytes 17 to 32)
240-24Fh  Sound card
          Non-NE2000 network card
          NE2000 network card (240-25Fh, bytes 1 to 16)
250-25Fh  NE2000 network card (240-25Fh, bytes 17 to 32)
260-26Fh  Sound card
          Non-NE2000 network card
          NE2000 network card (260-27Fh, bytes 1 to 16)
270-27Fh  (System use); Plug and Play system devices; LPT2 (second parallel port) (color systems)
          LPT3 (third parallel port) (monochrome systems)
          NE2000 network card (260-27Fh, bytes 17 to 32)
280-28Fh  Sound card
          Non-NE2000 network card
          NE2000 network card (280-29Fh, bytes 1 to 16)
290-29Fh  NE2000 network card (280-29Fh, bytes 17 to 32)
2A0-2AFh  Non-NE2000 network card
          NE2000 network card (2A0-2BFh, bytes 1 to 16)
2B0-2BFh  NE2000 network card (2A0-2BFh, bytes 17 to 32)
2C0-2CFh
2D0-2DFh
2E0-2EFh  COM4 (fourth serial port)
2F0-2FFh  COM2 (second serial port)
300-30Fh  Sound card (MIDI port) (300-301h)
          Non-NE2000 network card
          NE2000 network card (300-31Fh, bytes 1 to 16)
310-31Fh  NE2000 network card (300-31Fh, bytes 17 to 32)
320-32Fh  Non-NE2000 network card
          NE2000 network card (320-33Fh, bytes 1 to 16)
          Hard disk controller on old PC/XT
330-33Fh  Sound card (MIDI port) (330-331h)
          NE2000 network card (320-33Fh, bytes 17 to 32)
          SCSI host adapter (330-34Fh, bytes 1 to 16)
340-34Fh  SCSI host adapter (330-34Fh, bytes 17 to 32)
          SCSI host adapter (340-35Fh, bytes 1 to 16)
          Non-NE2000 network card
          NE2000 network card (340-35Fh, bytes 1 to 16)
350-35Fh  SCSI host adapter (340-35Fh, bytes 17 to 32)
          NE2000 network card (340-35Fh, bytes 17 to 32)
360-36Fh  Tape accelerator card (360h); Quaternary IDE controller (slave drive) (36E-36Fh)
          Non-NE2000 network card
          NE2000 network card (360-37Fh, bytes 1 to 16)
370-37Fh  Tape accelerator card (370h); Secondary IDE controller (slave drive) (376-377h); LPT1 (first parallel port) (color systems)
          LPT2 (second parallel port) (monochrome systems)
          NE2000 network card (360-37Fh, bytes 17 to 32)
380-38Fh  Sound card (FM synthesizer)
390-39Fh
3A0-3AFh
3B0-3BFh  VGA/Monochrome Video; LPT1 (first parallel port) (monochrome systems)
3C0-3CFh  VGA/EGA Video
3D0-3DFh  VGA/CGA Video
3E0-3EFh  Tape accelerator card (3E0h); COM3 (third serial port)
          Tertiary IDE controller (slave drive) (3EE-3EFh)
3F0-3FFh  Floppy disk controller; COM1 (first serial port)
          Tape accelerator card (3F0h); Primary IDE controller (slave drive) (3F6-3F7h)
The PC Guide (http://www.PCGuide.com)
Logical Devices
Some devices have both a physical address and also
a logical name. The two most commonly-encountered device types that work this
way are serial ports (called COM1 to COM4) and parallel ports (LPT1 to LPT3).
Actually, disk drives are labeled this way too (A:, C:, etc.), even though most
people don't think of them the same way. The purpose of this logical labeling
is to make it easier to refer to devices without having to know their specific
addresses. It's much simpler for software to be able to refer to a COM port by
name than by an address.
Logical Name Assignment
Logical device names are assigned by the system
BIOS during the power-on self test, when the system is booted up. The BIOS
searches for devices by I/O address in a predefined order, and assigns them a
logical name dynamically, in numerical order. The following are the normal
default assignments for COM ports, in order:
Port   I/O Address   Default IRQ
COM1   3F8-3FFh      4
COM2   2F8-2FFh      3
COM3   3E8-3EFh      4
COM4   2E8-2EFh      3
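One thing the table makes easy to miss is that there are four ports but only two default IRQs. As a small illustration, the defaults above can be held in a lookup structure and grouped by IRQ. The names `COM_DEFAULTS` and `ports_by_irq` are hypothetical, chosen just for this sketch.

```python
from collections import defaultdict

# Default assignments from the table above:
# name -> (first address, last address, default IRQ)
COM_DEFAULTS = {
    "COM1": (0x3F8, 0x3FF, 4),
    "COM2": (0x2F8, 0x2FF, 3),
    "COM3": (0x3E8, 0x3EF, 4),
    "COM4": (0x2E8, 0x2EF, 3),
}


def ports_by_irq(defaults):
    """Group port names by the IRQ they use by default."""
    groups = defaultdict(list)
    for name, (_start, _end, irq) in defaults.items():
        groups[irq].append(name)
    return dict(groups)


print(ports_by_irq(COM_DEFAULTS))  # {4: ['COM1', 'COM3'], 3: ['COM2', 'COM4']}
```

By default, then, COM1 and COM3 share IRQ 4, and COM2 and COM4 share IRQ 3.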
For parallel ports it is slightly more
complicated. Originally IBM defined different defaults for monochrome-based PCs
and for color PCs. Of course, all new systems have been color for many years,
but even some new systems still put LPT1 at 3BCh. Here is how the two different
labeling schemes typically work:
Port   "Monochrome" Systems   "Color" Systems   Default IRQ
LPT1   3BC-3BFh               378-37Fh          7
LPT2   378-37Fh               278-27Fh          5
LPT3   278-27Fh               --                5
Most new systems have LPT1 at 378-37Fh. Note that
the sequences are really the same, in a way; on a "monochrome" system
if you don't put a device at 3BC-3BFh but instead put it at 378-37Fh, the BIOS
will make that LPT1 since it didn't find an LPT1 at 3BCh.
Tip: If you want to run three parallel ports (for some reason) you should put
LPT1 at 3BCh. By default most new systems put LPT1 at 378h and will not support
three parallel ports.
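The dynamic naming can be modeled in a few lines. This is only a sketch of the behavior described in this section, not actual BIOS code: it assumes the search order 3BCh, 378h, 278h implied by the "monochrome" column, and the names `LPT_SEARCH_ORDER`, `assign_lpt_names` and `show` are made up for the example.

```python
# Assumed search order, taken from the "monochrome" column of the table:
# 3BCh is checked first, then 378h, then 278h.
LPT_SEARCH_ORDER = (0x3BC, 0x378, 0x278)


def assign_lpt_names(installed):
    """Map LPT names to the base addresses actually found, in search order."""
    found = [addr for addr in LPT_SEARCH_ORDER if addr in installed]
    return {f"LPT{i}": addr for i, addr in enumerate(found, start=1)}


def show(names):
    """Render addresses in the xxxh style used in the text."""
    return {name: f"{addr:03X}h" for name, addr in names.items()}


print(show(assign_lpt_names({0x378})))         # {'LPT1': '378h'}
print(show(assign_lpt_names({0x3BC, 0x378})))  # {'LPT1': '3BCh', 'LPT2': '378h'}
print(show(assign_lpt_names({0x3BC, 0x378, 0x278})))
# {'LPT1': '3BCh', 'LPT2': '378h', 'LPT3': '278h'}
```

The second call shows why the names are not stable: a port at 378h is LPT1 when it is alone, but becomes LPT2 as soon as something is present at 3BCh.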
Problems With Logical Device Names
Most of the problems that arise with the use of
logical device names occur when devices are added to or removed from the system.
The most common problem is software that refuses to work because the
logical device name assigned to a physical device has changed as a result.
Most software refers to a device by its name such
as "LPT1". However, the names are assigned dynamically by the BIOS at
boot time, when it searches your system to see what hardware it has. If you
originally had "LPT1" at 378-37Fh and you add a new parallel port and
give it the address 3BC-3BFh, then the new one will now be LPT1 and your old
port will become LPT2. This is because, as
mentioned before, the ports are labeled dynamically based on a predefined
search order, and 3BCh is checked first. If this happens, all of your software
that prints to LPT1 will now send its output to the new port instead of the
old one, and you will either have to
switch the devices' connections to the PC, or change the software.
Memory Addresses and Device BIOSes
While not really considered a standard system
resource like the others mentioned in this section, a brief discussion of
memory addresses is warranted here. Some devices, in addition to using
interrupt lines, DMA channels and/or I/O addresses, require some space in the upper memory area <../../ram/logic_UMA.htm>
for their own use. As with other resources, problems and conflicts can result
if you attempt to overlap two such devices, or try to use the memory for
programs when an adapter needs it.
The devices that use a memory area generally use
it for their own BIOS, which contains code to control the device and is invoked
by direct calls or calls from the internal system BIOS. These BIOSes are
"mapped" into the upper memory area in particular places and the BIOS
looks for them there and executes them if found. This is part of the system boot process <../bios/boot_Sequence.htm>.
There are three standard BIOSes present in most
systems, located at much the same place in each:
· System BIOS: The main system BIOS is located in a 64KB block
of memory from F0000h to FFFFFh.
· VGA Video BIOS: This is the BIOS that controls your video card.
It is normally in the 32KB block from C0000h to C7FFFh.
· IDE Hard Disk BIOS: The BIOS that controls your
IDE hard disk, if you have IDE in your system (which most do), is located in
the 16KB block from C8000h to CBFFFh.
The most common add-in device to use a dedicated
memory address space for its own BIOS is a SCSI host adapter. This may default
to C8000-CBFFFh, which will conflict with an IDE drive that is also in the
system, but can be configured to use a different address space instead, such as
D0000-D7FFFh. In addition, network cards that have the ability to boot the
computer over the network typically also use a memory area for the boot BIOS.
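The block sizes and the SCSI conflict mentioned above can be checked with a little arithmetic. The following is a rough sketch using only the address ranges given in this section; `BIOS_AREAS`, `size_kb` and `conflicts` are illustrative names of my own.

```python
KB = 1024

# Standard BIOS areas listed above (inclusive physical addresses).
BIOS_AREAS = {
    "System BIOS": (0xF0000, 0xFFFFF),
    "VGA video BIOS": (0xC0000, 0xC7FFF),
    "IDE hard disk BIOS": (0xC8000, 0xCBFFF),
}


def size_kb(area):
    """Size of an inclusive address range in kilobytes."""
    start, end = area
    return (end - start + 1) // KB


def conflicts(candidate):
    """Names of the standard BIOS areas that a candidate range overlaps."""
    return [
        name
        for name, (start, end) in BIOS_AREAS.items()
        if candidate[0] <= end and start <= candidate[1]
    ]


for name, area in BIOS_AREAS.items():
    print(f"{name}: {size_kb(area)}KB")

print(conflicts((0xC8000, 0xCBFFF)))  # ['IDE hard disk BIOS']
print(conflicts((0xD0000, 0xD7FFF)))  # []
```

The first check corresponds to a SCSI host adapter left at C8000-CBFFFh; the second shows that moving it to D0000-D7FFFh avoids all three standard areas.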
Warning: Many systems use a memory manager (like EMM386) to allow the unused system
RAM in the upper memory area to be used by programs, to save conventional
memory (the standard 640KB normally available to programs). If your system does
this and you add a device that needs some of the upper memory area for its
BIOS, you may have to add a parameter to the memory manager to tell it not to
try to use the space that the device needs. See
here for more details <../../ram/logic_UMB.htm>.
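As a rough illustration of what such a parameter looks like: EMM386 takes an exclusion switch of the form X=mmmm-nnnn, with the range written as segment addresses (the physical address with its last hex digit dropped). The helper below, whose name `exclude_parameter` is my own, just does that conversion. Treat the exact syntax as something to confirm in your memory manager's documentation.

```python
def exclude_parameter(start, end):
    """Build an exclusion like X=D000-D7FF from physical addresses D0000h-D7FFFh."""
    if start % 16 != 0 or (end + 1) % 16 != 0:
        raise ValueError("range must start and end on 16-byte paragraph boundaries")
    # A segment address is the physical address shifted right by one hex digit.
    return f"X={start >> 4:04X}-{end >> 4:04X}"


print(exclude_parameter(0xD0000, 0xD7FFF))  # X=D000-D7FF
```

For a device BIOS at D0000-D7FFFh, the resulting text would be appended to the EMM386 line in CONFIG.SYS.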