The dictionary defines a cache as a place for storing things. A computer’s cache memory is exactly that, a place for storing information that must be accessed quickly. Cache memory can be a special section of memory integrated into the Central Processing Unit (CPU) or it can be a reserved part of the main memory.
In either case, it is used by the CPU to optimize its ability to perform certain calculations. In order to understand the concept of caching, we need to know a little bit about why the need for cache memory arose.
A computer has two main facilities for data storage and retrieval: memory (called primary storage), and disk drives (hard and floppy) and tapes (together called secondary storage). The distinguishing characteristics of primary storage are that it is fast and volatile, meaning that when the power is turned off, the data in primary storage is lost. Secondary storage is slower but non-volatile; data written onto a disk drive remains there barring some mechanical or electrical failure.
When a computer starts up, its memory is empty, so any programs and data that are required must be read from the disk drive. Often, not all of the required information will fit in memory at the same time, so some of it is left on the disk to be read in when needed. Once memory is full, reading in more information from the disk means that something must be removed from memory to make room.
If what is removed is needed again later, it will have to be read from the disk again. This process is called paging, or swapping, and is a common feature of computers ranging from desktops to mainframes.
Swapping works quite well as long as it doesn’t have to occur very often. Since reading from a disk is very slow compared to reading from memory, a program that must frequently stop and wait for something to be read from the disk can suffer a severe performance penalty. This is the impact that cache memory was designed to mitigate.
Let’s assume that Program A, executing in memory, uses another program (call it Program B) to perform some function. When Program B is needed, Program A asks the operating system to find it. The operating system locates Program B, reads it from disk into memory, and lets it perform its function. When Program B finishes, it is removed from memory.
Since it’s not doing anything, there’s no need to let it use memory that’s needed for other programs. However, if Program A needs Program B over and over again, you can see that Program A will be slowed down by having to wait every time while Program B is read from the disk. Cache memory can prevent this slowdown.
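The slowdown described above can be sketched in a few lines of Python. Everything here is illustrative (the names `read_from_disk` and `run_program_b` are invented for this example, not part of any real operating system); the point is simply that without a cache, every call pays for a disk read:

```python
def read_from_disk(name):
    # Stand-in for a slow disk read.
    return f"<contents of {name}>"

disk_reads = 0

def run_program_b():
    global disk_reads
    disk_reads += 1                  # every call reads from disk again
    program_b = read_from_disk("Program B")
    return program_b                 # Program B runs, then is discarded

for _ in range(100):                 # Program A calls Program B repeatedly
    run_program_b()

print(disk_reads)                    # one slow read per call
```

One hundred calls cost one hundred disk reads, even though the same Program B is fetched every time.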
In a caching system, when Program A calls for Program B, a special part of the operating system called the cache memory manager reads Program B from disk into the cache memory. Everything that is in the cache is marked with the last time it was used. This mark is called a timestamp.
If the cache is full when Program B is requested, the cache manager removes the oldest cache entry and replaces it with Program B. When Program B finishes, it is not automatically removed from the cache. It stays there until its space is needed for another program or until it is used again. Each time it is used, its timestamp is updated. So as long as Program A keeps calling Program B, Program B stays in memory and no disk read is necessary.
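The cache manager described above can be sketched as a small Python class. This is a simplified model, not a real operating system interface: the class and method names are invented, and a logical counter stands in for real timestamps (the evict-the-oldest policy is the same either way):

```python
class CacheManager:
    """Minimal sketch of a timestamp-based cache (illustrative names)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # name -> (contents, timestamp)
        self.clock = 0      # logical clock standing in for real timestamps

    def read_from_disk(self, name):
        # Stand-in for a slow disk read.
        return f"<contents of {name}>"

    def load(self, name):
        self.clock += 1
        if name in self.entries:
            # Cache hit: refresh the timestamp and skip the disk entirely.
            contents, _ = self.entries[name]
            self.entries[name] = (contents, self.clock)
            return contents
        # Cache miss: if the cache is full, evict the oldest entry.
        if len(self.entries) >= self.capacity:
            oldest = min(self.entries, key=lambda n: self.entries[n][1])
            del self.entries[oldest]
        contents = self.read_from_disk(name)
        self.entries[name] = (contents, self.clock)
        return contents

mgr = CacheManager(capacity=2)
mgr.load("Program B")        # miss: read from disk
mgr.load("Program C")        # miss: read from disk
mgr.load("Program B")        # hit: no disk read, timestamp refreshed
mgr.load("Program D")        # miss: cache full, Program C is oldest, evicted
```

Because the hit on Program B refreshed its timestamp, it is Program C, not Program B, that is evicted when Program D arrives.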
Of course, like all solutions, this one is not perfect. If the cache fills up, the slowdown reappears, since either Program B cannot be put in the cache or it is removed immediately after it finishes. Also, if the cache holds data and Program A modifies that data, the changes must be written to the disk so that they will not be lost if a failure occurs.
If a disk write is required after every use of Program B then a slowdown could occur. For the most part, however, caching is an elegant and effective solution to the problem of excessive paging.
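One common way to reduce those per-use disk writes is a write-back policy: modified entries are marked dirty and flushed to disk only when they are evicted, trading some safety (a failure before eviction loses the changes) for far fewer writes. A rough sketch, with hypothetical names:

```python
class WriteBackCache:
    """Rough sketch of write-back caching; all names are hypothetical."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}    # name -> {"data": ..., "dirty": ..., "stamp": ...}
        self.disk_writes = 0
        self.clock = 0       # logical clock standing in for real timestamps

    def write(self, name, data):
        # Modify the cached copy and mark it dirty; no disk write yet.
        self.clock += 1
        if name not in self.entries and len(self.entries) >= self.capacity:
            self._evict_oldest()
        self.entries[name] = {"data": data, "dirty": True, "stamp": self.clock}

    def _evict_oldest(self):
        oldest = min(self.entries, key=lambda n: self.entries[n]["stamp"])
        if self.entries[oldest]["dirty"]:
            self.disk_writes += 1    # dirty data is flushed only at eviction
        del self.entries[oldest]

cache = WriteBackCache(capacity=1)
for i in range(100):
    cache.write("Program B data", i)   # 100 modifications, 0 disk writes
cache.write("other data", 0)           # eviction flushes the dirty entry once
```

A hundred modifications to the same cached entry cost a single disk write instead of a hundred, which is why write-back is the usual answer when the write-through slowdown becomes noticeable.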