How Many Cores Do You Need For Deep Learning?

The old hardware guide, which you all know and love, has changed a great deal. Almost this entire guide has been updated; I decided it was time to rewrite it (almost) from scratch around one question: how many cores do you need for deep learning?

This time, I tried to be a little more detailed and inclusive. I’ll continue to update this, but I also want to ensure that my readers will be able to comprehend the subject even if I cease doing so in the future. So, you’ve decided to buy a machine specifically for building machine learning models.

Or maybe you work for a company where the buzzwords from this guide are frequently used, and you want to understand them a little better. I chose to write this guide because this isn’t a particularly straightforward subject. There are many ways to approach these concepts, and this article will focus on one of them.

How Many Cores Do You Need For Deep Learning?

Deep learning does not demand outstandingly powerful CPU cores. Once you set up TensorFlow to use the GPU, the heavy numerical work moves off the CPU; the cores mainly handle data loading and preprocessing. So if your budget is tight, you can choose a CPU with 4 cores, but I’d recommend a 6-core i7 paired with an Nvidia GPU for long-term use.


How Many CPU Cores Do I Need? What Are PCI-E Lanes?

Now that we have finished discussing graphics cards, the next component of the training machine to discuss is the Central Processing Unit (CPU).

Typically, a GPU uses 16 PCI-Express lanes; PCI-Express is the primary link between the CPU and the GPU. Fortunately, most off-the-shelf Intel CPUs provide at least that many.

Problems typically arise when connecting two cards at full width, because doing so calls for 32 lanes, which most low-cost consumer CPUs lack. By the time we reach 4 cards, we’re reliant on pricey Xeon-series (server-targeted) CPUs.
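As a rough illustration (my own sketch, not from any vendor spec), the lane budget for a multi-GPU build can be estimated like this, assuming 16 lanes per GPU for full bandwidth or 8 lanes each as a common compromise:

```python
def pcie_lanes_needed(num_gpus: int, lanes_per_gpu: int = 16) -> int:
    """Total CPU PCIe lanes needed to feed each GPU at the given width."""
    return num_gpus * lanes_per_gpu

# Two GPUs at full x16 already exceed a typical 20-lane consumer CPU.
print(pcie_lanes_needed(2))      # 32
# Four GPUs at x8 each fit within the 48 lanes of an X-series part.
print(pcie_lanes_needed(4, 8))   # 32
```

The same arithmetic shows why an 8-GPU server needs 64 lanes at x8 apiece.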


The primary error people make is focusing excessively on a CPU’s PCIe lanes. PCIe lanes shouldn’t be of much concern to you. Instead, check that the number of GPUs you want is supported by the CPU and motherboard you are using. The second most frequent error is buying a CPU that is too powerful.

CPU And PCI-Express

PCIe lanes are a popular topic! The problem is that they hardly impact the effectiveness of deep learning. If you have one GPU, PCIe lanes are only needed to move data swiftly from your CPU RAM to your GPU RAM.

Transferring an ImageNet mini-batch of 32 images (32×225×225×3) in 32-bit floats takes 1.1 milliseconds over 16 lanes, 2.3 milliseconds over 8 lanes, and 4.5 milliseconds over 4 lanes. These are theoretical values; in practice, PCIe is frequently twice as slow. Nevertheless, this is still quite quick!
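To see where those numbers come from, here is a back-of-the-envelope calculation (a sketch assuming roughly 985 MB/s of usable bandwidth per PCIe 3.0 lane; the exact figure varies by platform):

```python
def transfer_ms(batch_shape, lanes, bytes_per_elem=4, lane_gbps=0.985):
    """Theoretical time (ms) to move one batch over PCIe 3.0."""
    n = 1
    for dim in batch_shape:
        n *= dim
    size_bytes = n * bytes_per_elem      # 32x225x225x3 floats ~= 19.4 MB
    bandwidth = lanes * lane_gbps * 1e9  # bytes per second
    return size_bytes / bandwidth * 1e3

batch = (32, 225, 225, 3)
for lanes in (16, 8, 4):
    print(f"{lanes:2d} lanes: {transfer_ms(batch, lanes):.1f} ms")
```

The results land close to the 1.1 / 2.3 / 4.5 ms figures quoted above.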

Latency can be disregarded, since PCIe latency is in the microsecond range at most. Combining these, we arrive at the following timing for a ResNet-152 and a 32-image ImageNet mini-batch:

  • Forward and backward pass: 216 milliseconds (ms)
  • 16 PCIe lanes CPU->GPU transfer: about 2 ms (1.1 ms theoretical)
  • 8 PCIe lanes CPU->GPU transfer: about 5 ms (2.3 ms theoretical)
  • 4 PCIe lanes CPU->GPU transfer: about 9 ms (4.5 ms theoretical)

Therefore, increasing the number of PCIe lanes from 4 to 16 will result in a roughly 3.2% performance boost. However, when using PyTorch’s data loader with pinned memory, transfers overlap with computation and the effective gain drops to roughly 0%. So if you only use one GPU, don’t waste your money on PCIe lanes!
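The 3.2% figure follows directly from those timings. A quick sanity check, using the approximate real-world transfer times above:

```python
compute_ms = 216.0                       # ResNet-152 forward + backward pass
transfer = {16: 2.0, 8: 5.0, 4: 9.0}     # approx. CPU->GPU transfer per batch (ms)

def step_time(lanes):
    """Total per-batch time when transfer and compute do NOT overlap."""
    return compute_ms + transfer[lanes]

speedup = step_time(4) / step_time(16) - 1
print(f"16 vs 4 lanes: {speedup:.1%} faster")  # about 3.2%
```

Note that this models the worst case with no overlap; with asynchronous transfers (e.g. pinned memory), the difference vanishes entirely.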

Do make sure to choose a CPU and motherboard combination whose PCIe lanes support the number of GPUs you want. If you plan to own 2 GPUs someday, buy a CPU and a motherboard that support 2 GPUs; just don’t focus exclusively on PCIe lanes when making that purchase.

Choose A CPU

Intel Core-Series CPUs

At the time of the original revision of this guide (generation 8), Intel offered certain chips with 28, 40, and 44 PCI-E lanes at very attractive prices. Since then, further generations have entered the market (generation 12, Alder Lake, was just announced), and those parts have been swapped out for the pricier enthusiast-oriented “X-series” parts.

Due to their speed and number of PCI-E lanes, those components have become the kings of deep learning hardware. Most generation 11 i7/i9 chips have 20 lanes and are ideal if you intend to build a computer with just one GPU. The generation 10 chips all have 16 lanes, which still makes them a fantastic deal.

The X-designated parts have 48 lanes, which is plenty for 4 GPUs with 8 lanes apiece and even leaves enough lanes for two NVMe-based SSDs if you need extra cowbell (or lanes). However, they are currently only available for generation 10 (Cascade Lake-X).

The 10900X, the least expensive component, has 48 lanes and 10 cores. Parts with 12, 16, and 18 cores follow. The X-series for Intel’s current generation 11 (Rocket Lake) of CPUs does not appear to be coming anytime soon.

Intel’s Ice Lake CPUs also offer in-depth hardware support for running deep models on the CPU itself. If you are interested in that sort of thing, I advise reading more about it to determine whether it would be helpful.

Before continuing, make sure you understand how CPU sockets work; from this point, I assume the reader does. X-series parts use a more complicated socket configuration and therefore require more expensive motherboards. In practice this shouldn’t matter much, since we are already talking about very expensive computers.

Xeon Series CPUs

The Xeon CPU series is Intel’s line of processors designed for servers and data centers (unlike Core, which targets desktops). They are more expensive for many reasons, most of which are outside the scope of this guide.

Among other things, they support multi-CPU systems (you can combine a couple of CPUs and pool their respective lanes), so they can supply the full 16 PCI-E lanes per GPU that you would desire and require.

It is also crucial that those CPUs are far more durable and can handle significantly larger computational loads over extended periods. This is comparable to how NVIDIA’s data-center family of GPUs outperforms consumer-level GeForce GPUs.

It’s also crucial to keep in mind that Xeon CPUs will raise the cost of the entire system for several reasons:

  • Xeons require ECC (error-correcting code) memory, which is more expensive.
  • Motherboards and enclosures for Xeons often cost more (but are more resilient).
  • Xeon parts are substantially more expensive than their Core i7/i9 equivalents.

Despite the above, a Xeon-based workstation is generally what you want if you care about multi-CPU systems constantly running under high loads. Xeon-based builds used to be more common, but Intel has since added the abovementioned i9-X CPUs, which cover most consumer/enthusiast demands with plenty of PCI-E lanes.

Before continuing, I should state that I lack experience building Xeon-based computers and cannot suggest a particular CPU offering the best value for money. Additionally, there are numerous Xeon CPU options available, far more than at the consumer level.

One factor that drastically raises their prices is the number of CPU sockets supported in multi-CPU configurations. They also have substantially bigger caches, which is very helpful for many workloads but, again, outside the scope of this guide.

For an 8-GPU workstation to function, the Xeon CPUs must provide 8×8 PCI-E lanes (64 total). That is, unless you choose AMD.


The hardware discussed here is similar to that of a popular high-end gaming system. Except for the extravagant GPU sets, most components (motherboard, RAM, PSU, etc.) would fit perfectly into a high-end gaming setup. If you’re having difficulty finding anything mentioned here on Google, try changing “deep learning” to “gaming,” and you should have no trouble.

Frequently Asked Questions

What hardware requirements exist for deep learning?

Look for between 8 GB and 16 GB of RAM, ideally 16 GB. Try to buy an SSD of 256 to 512 GB to install the operating system and hold some important projects, plus a 1 TB to 2 TB HDD for storing deep learning datasets.
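As a quick sanity check before downloading large datasets, you can verify free disk space with Python’s standard library (a sketch; the path and the 100 GB threshold are my own placeholder choices):

```python
import shutil

def free_gb(path="."):
    """Free disk space at `path`, in gigabytes."""
    usage = shutil.disk_usage(path)
    return usage.free / 1e9

# e.g. warn if there is less than 100 GB free for datasets
if free_gb(".") < 100:
    print("Low disk space: consider a larger drive for datasets.")
```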

Do I need a lot of RAM for deep learning?

As a general guideline, you should have at least as much RAM for deep learning as your GPU has memory, plus an additional 25% for headroom. This straightforward formula will help you stay on top of your RAM requirements and save you a ton of time.
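The rule of thumb above reduces to a one-liner (my own sketch of the stated guideline, not an official sizing formula):

```python
def recommended_ram_gb(gpu_memory_gb: float) -> float:
    """System RAM >= total GPU memory, plus 25% headroom."""
    return gpu_memory_gb * 1.25

# e.g. a GPU with 24 GB of memory suggests at least 30 GB of system RAM,
# so a 32 GB kit is a comfortable fit.
print(recommended_ram_gb(24))  # 30.0
```

For multi-GPU builds, sum the memory of all GPUs before applying the headroom.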

What number of GPUs are required for deep learning?

While the number of GPUs in a deep learning workstation varies with your needs, it is advisable to connect as many as your budget and motherboard allow. For a serious deep learning workstation, starting with at least four GPUs is your best bet.

Are four cores sufficient for machine learning?

A 4-core CPU should be adequate for beginners on a low budget; it can train models, just slowly. GPUs were originally created to provide a better visual experience, but their parallelism and dedicated onboard memory make them far better suited to machine learning.