Have you ever been confused about how CPU, RAM, and storage differ from one another? Which one is more important for data science work? I made a fun illustration to help you quickly understand their differences.
CPU (Central Processing Unit): This is the brainpower of your computer, the part that performs operations and tasks. In the illustration, it’s the brown cat working hard on sorting the balls by color. The smarter the cat (a better, newer processor) and the more cats you have working at the same time (more CPUs working in parallel), the faster your computer can complete its tasks.
RAM (Random Access Memory): RAM temporarily holds the data and programs your computer is actively working with. In the illustration, RAM is the yellow desk where the cat can temporarily place the beakers filled with balls (e.g. data, programs, applications). The larger the desk, the more beakers the cat can put on it at once. Conversely, even if you have 100 intelligent cats working in parallel, a teeny tiny desk will make the sorting take a very long time, because there simply isn’t enough desk space for all 100 cats to work simultaneously.
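To put numbers on the "100 cats, tiny desk" idea, here is a minimal sketch. The function name `usable_workers` and the `gb_per_worker` figure are my own illustrative assumptions, not measurements: in practice, how much memory each parallel worker needs depends entirely on your data and code.

```python
def usable_workers(cpu_count: int, ram_gb: float, gb_per_worker: float) -> int:
    """Workers that can run at once, capped by both CPU count and RAM.

    gb_per_worker is a hypothetical estimate of the memory each worker needs.
    """
    if gb_per_worker <= 0:
        raise ValueError("gb_per_worker must be positive")
    # How many workers' beakers fit on the desk at the same time
    fit_in_ram = int(ram_gb // gb_per_worker)
    return max(0, min(cpu_count, fit_in_ram))


# 100 cats, tiny desk: RAM is the limit
print(usable_workers(cpu_count=100, ram_gb=4, gb_per_worker=2))  # 2
# 8 cats, bigger desk: the number of cats is the limit
print(usable_workers(cpu_count=8, ram_gb=32, gb_per_worker=2))   # 8
```

In the first case, 98 of the 100 cats have nowhere to put their beakers, so they add nothing.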
Storage: This is the drive (a hard disk or SSD) where all your data, files, and programs are saved permanently. As you can see in the drawing, it takes a bit of time for the turtle to ship the balls (data) back and forth for the cat (CPU) to work on, because storage is much slower to access than RAM.
So… which components are more important for your machine learning projects? Which laptop or virtual instance specs should you choose? The key is to balance these components. For example, there is no point investing in 12 CPUs for parallel processing if you only have 4GB of RAM. Based on my experience building models on laptops and virtual instances, a good rule of thumb is to have at least 16GB of RAM and 8 CPUs. If your code keeps crashing while running (often a sign that it has run out of memory), increasing the RAM is more helpful than increasing the number of CPUs.
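If you want to see how your current machine compares with that rule of thumb, here is a rough sketch using only Python's standard library. The helper names (`total_ram_gb`, `check_specs`) and the 10% tolerance are my own assumptions. The RAM lookup relies on `os.sysconf`, which should work on Linux and typically macOS but is not available on Windows, in which case the function returns `None`.

```python
import os

MIN_CPUS = 8         # rule of thumb from the text
MIN_RAM_GB = 16      # rule of thumb from the text
RAM_TOLERANCE = 0.9  # assumption: the OS reports a bit less than the nominal size


def total_ram_gb():
    """Return total physical RAM in GiB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows, where os.sysconf does not exist
    if page_size <= 0 or page_count <= 0:
        return None
    return page_size * page_count / 1024**3


def check_specs(cpus, ram_gb):
    """List where a machine falls short of the rule of thumb (empty list = fine)."""
    shortfalls = []
    if cpus is None or cpus < MIN_CPUS:
        shortfalls.append(f"CPUs: {cpus} (suggested: {MIN_CPUS}+)")
    if ram_gb is None:
        shortfalls.append("RAM: could not be determined on this platform")
    elif ram_gb < MIN_RAM_GB * RAM_TOLERANCE:
        shortfalls.append(f"RAM: {ram_gb:.1f} GB (suggested: {MIN_RAM_GB}+ GB)")
    return shortfalls


# os.cpu_count() counts logical CPUs, which may be more than physical cores
problems = check_specs(os.cpu_count(), total_ram_gb())
print("Looks balanced" if not problems else "\n".join(problems))
```

Treat the thresholds as a starting point rather than a hard requirement: a small dataset will run happily on far less.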