In the previous article we explained how core counting should be done for comparison between Intel and AMD processors to be possible and got some performance numbers from a simplistic (and non-realistic) "floating point" benchmark (more on this later). We performed tests on two different server machines we have available in production:
Intel Xeon E5620 2.40GHz
AMD Opteron 6328 3.2GHz
In terms of operations per second per GHz, we can summarize the previously obtained results as follows:
Intel single thread: 24/2.4 = 10
Intel two threads: 28/2.4 = 11.67
AMD single thread: 24/3.2 = 7.5
AMD two threads: 40/3.2 = 12.5
Normalizing the results to Intel's single thread score, we get
Intel single thread: 1
Intel two threads: 1.17
AMD single thread: 0.75
AMD two threads: 1.25
This normalization serves the purpose of evaluating both single thread performance and SMT scalability. We see that Intel's single thread performance is significantly better than AMD's. We also see that Intel's Hyperthreading enhances performance by 17%, whereas AMD's core subdivision allows for 67% more throughput compared to the single thread situation.
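The arithmetic above is easy to get wrong by hand, so here is a minimal Python sketch that reproduces it. Only the raw scores (24, 28, 24, 40) and the nominal clocks come from the text. The helper names `per_ghz` and `normalize` are hypothetical and are not part of the original test scripts.

```python
# Sketch: per-GHz rates and normalization for the first (bc-based) test.
# Raw scores and nominal clocks are the ones quoted in the text.

RAW = {
    # label: (raw score, nominal clock in GHz)
    "Intel single thread": (24, 2.4),
    "Intel two threads": (28, 2.4),
    "AMD single thread": (24, 3.2),
    "AMD two threads": (40, 3.2),
}


def per_ghz(raw):
    """Divide each raw score by the nominal clock frequency."""
    return {label: score / ghz for label, (score, ghz) in raw.items()}


def normalize(rates, reference):
    """Express every rate relative to the rate of `reference`."""
    ref = rates[reference]
    return {label: rate / ref for label, rate in rates.items()}


rates = per_ghz(RAW)
rel = normalize(rates, "Intel single thread")

for label, value in rel.items():
    print(f"{label}: {value:.2f}")

# SMT gain: two-thread rate over single-thread rate, minus one.
intel_gain = rates["Intel two threads"] / rates["Intel single thread"] - 1
amd_gain = rates["AMD two threads"] / rates["AMD single thread"] - 1
print(f"Intel SMT gain: {intel_gain:.0%}, AMD SMT gain: {amd_gain:.0%}")
```

Calling `normalize(rates, "Intel two threads")` instead gives the efficiencies relative to Intel's two thread score.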
If we assume that we want most of our machines to be busy running a large number of threads, we might prefer to use Intel's two thread performance as the reference. In that case, we obtain
Intel single thread: 0.86
Intel two threads: 1
AMD single thread: 0.64
AMD two threads: 1.07
These scores can be regarded as relative efficiencies of each processor. From here we can derive an expression for the optimal per GHz throughput of a fully busy machine, i.e., a machine that is executing at least 2 threads per core:
Per GHz perf level = N * n * S
where
N - number of processors
n - number of cores per processor
S - processor score running two threads
If we need to calculate the total performance of a system, we can just multiply the previous formula by the processor clock frequency F (in GHz):
Perf level = N * n * S * F
Note: the previous formula assumes that we are dividing AMD's advertised number of cores by two, as explained in the previous article.
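The two formulas translate directly into code. The sketch below uses hypothetical Python helpers. It follows the note above and assumes that AMD's advertised core count is halved, one "Intel-way" core per two-core module. The efficiencies S are the ones from the list relative to Intel's two thread score.

```python
# Hypothetical helpers for the perf level formulas.
# Assumption (from the note): AMD's advertised core count is halved,
# i.e. one "Intel-way" core per two-core AMD module.

def effective_cores(advertised_cores, vendor):
    """Cores per processor, counted Intel's way."""
    if vendor == "AMD":
        return advertised_cores // 2
    return advertised_cores


def per_ghz_perf_level(N, n, S):
    """N processors * n cores per processor * two-thread score S."""
    return N * n * S


def perf_level(N, n, S, F):
    """Per-GHz perf level multiplied by the clock frequency F in GHz."""
    return per_ghz_perf_level(N, n, S) * F


# Two-thread efficiencies relative to Intel two threads.
S_INTEL, S_AMD = 1.0, 1.07

# Example: dual-socket machines, 8 advertised AMD cores per chip at 2.8 GHz
# versus 6 Intel cores per chip at 2.1 GHz.
amd = perf_level(2, effective_cores(8, "AMD"), S_AMD, 2.8)
intel = perf_level(2, effective_cores(6, "Intel"), S_INTEL, 2.1)
print(f"AMD perf level: {amd:.2f}, Intel perf level: {intel:.2f}")
```

The example configurations match the two machines quoted further down. Dividing each perf level by the machine price gives the throughput per EUR used there.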
We have seen so far that, for the tested workflow, the difference in performance between Intel and AMD is not significant IF the system is running at least two threads per core. But what about the financials?
We recently had the opportunity to compare quotes for two different server machines. All other things being equal, we had
AMD dual Opteron 6320 x 4 core 2.8 GHz machine = 2708 EUR
Intel dual E5-2620V2 x 6 core 2.1 GHz machine = 2890 EUR
Assuming the efficiencies are the same as the ones found on the test machines (more on this below), we would find:
Throughput per EUR (TPE = N * n * S * F / price)
TPE_AMD = 2 * 4 * 2.8 * 1.07 / 2708 = 0.008851
TPE_INTEL = 2 * 6 * 2.1 * 1 / 2890 = 0.008720
Thus, we get nearly a match:
TPE_AMD = 1.015 * TPE_INTEL
Further testing
But it turns out that our simplistic floating-point benchmark was actually doing only integer calculations, due to the use of the bc command line calculator that works internally with integers. This was pointed out by Henry Wong, from stuffedcow.net, during a discussion of his own test results. By replacing bc with a simple loop of mathematical libc operations, compiled with gcc, we were able to test pure (still very simplistic) floating-point performance on the same machines tested before.
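For readers who want to repeat this kind of measurement, here is a rough sketch in Python. It is not the mathtest C loop from the references, and its exact operations are not reproduced here. The loop body, the iteration count, the assumed 2.4 GHz clock and the helper names are all assumptions. CPython's `math` functions call into the platform C math library, but interpreter overhead dominates, so absolute numbers are not comparable with a compiled loop. Getting the two thread figures would require running two copies concurrently, for example as separate processes, which is not shown.

```python
import math
import time


def fp_loop(iterations):
    """Run a fixed number of floating point math calls and return the
    accumulated result so a caller can check it."""
    acc = 0.0
    x = 0.5
    for _ in range(iterations):
        acc += math.sqrt(x) * math.sin(x) / (1.0 + x)
        x += 1e-7
    return acc


def iterations_per_second_per_ghz(iterations, clock_ghz):
    """Time fp_loop and normalize its throughput by the nominal clock."""
    start = time.perf_counter()
    fp_loop(iterations)
    elapsed = time.perf_counter() - start
    return iterations / elapsed / clock_ghz


# Assumption: nominal 2.4 GHz clock (the Xeon E5620 above); turbo and
# frequency scaling can make the effective clock differ.
rate = iterations_per_second_per_ghz(200_000, 2.4)
print(f"{rate:,.0f} loop iterations per second per GHz")
```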
In terms of operations per second per GHz, we found
Intel single thread: 3.33/2.4 = 1.39
Intel two threads: 5.59/2.4 = 2.33
AMD single thread: 2.86/3.2 = 0.89
AMD two threads: 4.41/3.2 = 1.38
By performing the same normalizations as before, relative to Intel's single thread score, we got
Intel single thread: 1
Intel two threads: 1.68
AMD single thread: 0.64
AMD two threads: 0.99
and, relative to Intel's two thread score,
Intel single thread: 0.60
Intel two threads: 1
AMD single thread: 0.38
AMD two threads: 0.59
The results seem very disappointing for the AMD machine: the single thread performance gap is even larger than in the previous test, and this time the extra scalability that compensated for the weak single thread performance is not there. Intel shows a Hyperthreading bonus of 68%, whereas for AMD we see only about 55%.
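As a cross-check of the lists above, the following Python sketch recomputes the per-GHz rates and both normalizations from the raw floating point scores. Only the numbers come from the text. The dictionary layout and the `normalize` helper are illustrative.

```python
# Sketch: floating point test, per-GHz rates and both normalizations.

RAW_FP = {
    # label: (raw score, nominal clock in GHz)
    "Intel single thread": (3.33, 2.4),
    "Intel two threads": (5.59, 2.4),
    "AMD single thread": (2.86, 3.2),
    "AMD two threads": (4.41, 3.2),
}

rates = {label: score / ghz for label, (score, ghz) in RAW_FP.items()}


def normalize(rates, reference):
    """Express every rate relative to the rate of `reference`."""
    ref = rates[reference]
    return {label: rate / ref for label, rate in rates.items()}


vs_single = normalize(rates, "Intel single thread")
vs_two = normalize(rates, "Intel two threads")

for label in rates:
    print(f"{label}: {rates[label]:.2f} per GHz, "
          f"{vs_single[label]:.2f} vs Intel 1T, {vs_two[label]:.2f} vs Intel 2T")

# SMT gain computed from the unrounded rates.
intel_gain = rates["Intel two threads"] / rates["Intel single thread"] - 1
amd_gain = rates["AMD two threads"] / rates["AMD single thread"] - 1
print(f"SMT gain: Intel {intel_gain:.0%}, AMD {amd_gain:.0%}")
```

This sketch works from the unrounded rates. As a result, AMD's two thread gain comes out at about 54%, rather than the 55% obtained from the rounded normalized scores.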
The testing we performed was meant to allow some intuition to be gained into the subject. But we know that floating point performance is a subtle topic and that we must be careful about drawing conclusions from basic testing. Therefore, we decided to have a look at an industry standard benchmark.
Reference results from spec.org
Looking at real benchmark output at spec.org, we found yet different results. For the throughput of floating point operations, we have the following base scores:
Dell Inc. - PowerEdge M710 (Intel Xeon E5620, 2.40 GHz)
two thread score (16 threads on 8 cores): 164 / 8 = 20.50 per core
Advanced Micro Devices - Supermicro A+ Server 1022G-NTF, AMD Opteron 6328
two thread score (16 threads on 8 cores, counted Intel's way): 289 / 8 = 36.13 per core
Please note that the spec.org score table uses AMD's core counting.
In terms of operations per second per GHz, the values above translate to:
Intel two threads: 20.5/2.4 = 8.54
AMD two threads: 36.13/3.2 = 11.29
Normalizing, we get:
Intel two threads: 1
AMD two threads: 1.32
In terms of floating point throughput per EUR, we would find the following by combining the performance numbers from our directly tested processors with the quotes for the new machines:
TPE_AMD = 2 * 4 * 2.8 * 1.32 / 2708 = 0.0109
TPE_INTEL = 2 * 6 * 2.1 * 1 / 2890 = 0.0087
TPE_AMD = 1.25 * TPE_INTEL
Fortunately, we can get exact numbers from spec.org for the processors we got quotes for:
Dell Inc. - PowerEdge R720 (Intel Xeon E5-2620 v2, 2.10 GHz)
two thread score (24 threads on 12 cores): 375 / 12 = 31.25 per core
Advanced Micro Devices - Supermicro A+ Server 4022G-6F, AMD Opteron 6320
two thread score (16 threads on 8 cores, counted Intel's way): 268 / 8 = 33.50 per core
In terms of operations per second per GHz:
Intel two threads: 31.25/2.1 = 14.88
AMD two threads: 33.5/2.8 = 11.96
Normalizing, we get:
Intel two threads: 1
AMD two threads: 0.80
That would mean
TPE_AMD = 2 * 4 * 2.8 * 0.8 / 2708 = 0.0066
TPE_INTEL = 2 * 6 * 2.1 * 1 / 2890 = 0.0087
TPE_AMD = 0.76 * TPE_INTEL
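The following minimal Python sketch ties the spec.org numbers to the TPE formula. It assumes that every SPEC system listed above is a two-socket machine, which is what the divisions by 8 and 12 imply. It also applies the halving of AMD's advertised core counts. The function names are hypothetical. The sketch recomputes the relative efficiencies and the TPE ratios for the three efficiency values used so far. It also shows that when SPEC results exist for the exact quoted processors, clocks and core counts cancel out of the ratio.

```python
# Sketch: SPEC rate base scores -> per-core, per-GHz scores -> relative
# efficiencies -> throughput per EUR (TPE).
# Assumptions: every SPEC system listed above has two sockets, and AMD's
# advertised cores are halved to count them Intel's way.

def per_core_per_ghz(score, sockets, advertised_cores, clock_ghz, vendor):
    """Machine score divided by Intel-way cores and by clock (GHz)."""
    cores = advertised_cores // 2 if vendor == "AMD" else advertised_cores
    return score / (sockets * cores) / clock_ghz


def tpe(sockets, cores, efficiency, clock_ghz, price_eur):
    """Throughput per EUR: N * n * S * F / price."""
    return sockets * cores * efficiency * clock_ghz / price_eur


# Older processors: Xeon E5620 (4 cores) vs Opteron 6328 (8 advertised).
old_intel = per_core_per_ghz(164, 2, 4, 2.4, "Intel")
old_amd = per_core_per_ghz(289, 2, 8, 3.2, "AMD")

# Quoted processors: Xeon E5-2620 v2 (6 cores) vs Opteron 6320 (8 advertised).
new_intel = per_core_per_ghz(375, 2, 6, 2.1, "Intel")
new_amd = per_core_per_ghz(268, 2, 8, 2.8, "AMD")

# TPE of the quoted Intel machine, with its efficiency fixed at 1.
intel_tpe = tpe(2, 6, 1.0, 2.1, 2890)
scenarios = [
    ("first (integer) test", 1.07),
    ("SPEC, older CPUs", old_amd / old_intel),
    ("SPEC, quoted CPUs", new_amd / new_intel),
]
for name, eff in scenarios:
    amd_tpe = tpe(2, 4, eff, 2.8, 2708)
    print(f"{name}: AMD efficiency {eff:.2f}, "
          f"TPE_AMD = {amd_tpe / intel_tpe:.3f} * TPE_INTEL")

# With SPEC scores for the exact quoted systems, clocks and core counts
# cancel out: the TPE ratio is just score-per-EUR over score-per-EUR.
direct_ratio = (268 / 2708) / (375 / 2890)
print(f"Direct ratio: {direct_ratio:.3f}")
```

The first scenario corresponds to the "nearly a match" comparison earlier in the article. The last two correspond to the 1.25 and 0.76 ratios above.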
This means that Intel has dramatically increased its throughput per GHz, at least for the SPEC benchmark, which uses 2 threads per core. The per-core, per-GHz score went from 8.54 (E5620) to 14.88 (E5-2620 v2). Therefore, efficiency factors from older processors can hardly be used for TPE comparisons.
Note: Unfortunately, we couldn't find a way to compare single thread scores with multi thread scores for these CPUs, because spec.org runs different tests for speed (single thread) and throughput (multiple threads). In the floating point speed test, AMD delivers just slightly less per GHz (96%) than Intel for these specific CPUs. A comparison of the two different test types is available here.
Conclusion
Since multicore processors are standard nowadays and multiprocessor machines are becoming more and more affordable, it is more important to compare total throughput per EUR than maximum single thread performance.
Virtualization is here to stay, and therefore the parallel throughput of current processors is of paramount importance: if the system doesn't perform well enough, one can buy another one or a larger one. The exception is when one runs a small number of non-parallel workloads, where peak single thread throughput is the defining variable.
However, we have seen that both single thread processor performance and two-thread scalability are highly dependent on the workload. The difference between integer and floating point calculations became evident from a pair of very simple CPU tests.
The most important conclusion is that, in the face of the inherent complexity of the subject and the artificial complexity introduced by certain marketing teams (the core confusion...), it is very hard to base purchase decisions on third party benchmarks.
For mission critical computing situations we should certainly test our specific workload on different processors and calculate the specific TPE (throughput per EUR) for the candidate systems.
References
Integer calculation script
Floating point calculation script and aux C loop (mathtest)