2012 i5 Ivy Bridge Tests and Benchmark Update

      This will be the last update before I start using a larger data set for the test. This time we have my friend's Ivy Bridge i5. The system has Turbo Boost locked off and is overclocked to 4.1GHz. With this data set size, I was unable to see a statistically significant improvement over the Sandy Bridge at 3.4-3.8GHz.

2012 Via Nano Tests and Benchmark Update

      In June of 2012, my previous file server died, or rather, the ethernet adapter died, and I had to get a new one. The board I chose was a Via M910 with a Nano QuadCore as the CPU. Yes, "QuadCore" is the model name. Just try Googling that...

      Before putting Linux on it, I did a little benchmarking in Windows 7. First up was video playback. Amazingly, I saw mpc-hc happily using DXVA. For over ten years, Via has claimed that their chipsets support hardware playback of whatever formats were then popular. It has *never* worked before. Hell has, in fact, frozen over.

      To test the CPU, I turned off DXVA. The most demanding 1080p H.264 clip I could come up with would just hit 60% usage. For reference, the dual-core Sandy Bridge i5 2537M ULV would hit 100% and drop frames. DXVA on the i5 does not work either.

      Moving on to the Gaussian benchmark, the QuadCore posted x86-sse2-x64 / x86OMP-sse2OMP-x64OMP results of 3.42-2.51-1.79 / 0.88-0.79-0.59 seconds. I repeated the test because I didn't believe them. Normally when you perform a major core improvement you, oh I don't know, tell someone? First off, in 32-bit code, the new Nano is about 30% faster than the old Nano. Never mind that the old Nano is clocked at 1.6GHz and the new one is clocked at 1.2GHz. I have heard rumors that the new one can raise its clock speed to 1.5GHz, but that doesn't cover the difference.

      Next, it seems that the Nano is not a very good 32-bit CPU. The 64-bit code finishes in just over half the time of the plain 32-bit code (52%). All of a sudden, we have a CPU that is almost as fast as the i5 ULV, while being clocked lower. It's in a completely different league from the old Atom/Nano/E-series competition.

      Ok, multi-core. This is the part where I had to rub my eyes. It was beating dual-core Core 2s and i5s and basically tied the Phenom II. Now, its scaling is not linear. The 4-core runs were right around 3x the speed of the single-core runs, so there is some form of clock speed adjustment going on. However, this CPU does appear to be a reasonable desktop CPU. Not just "good enough", but actually quite fast. I'm amazed.
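
      The ratios quoted above can be checked directly from the posted times. The sketch below only divides the six numbers listed for the QuadCore; the function and variable names are my own, and "efficiency" here simply means speedup divided by the four cores.

```python
# Times in seconds as posted above for the Nano QuadCore.
single = {"x86": 3.42, "sse2": 2.51, "x64": 1.79}   # single-threaded builds
multi = {"x86": 0.88, "sse2": 0.79, "x64": 0.59}    # OpenMP builds
CORES = 4


def speedup(t_single, t_multi):
    """How many times faster the OpenMP build ran than the single-threaded one."""
    return t_single / t_multi


def efficiency(t_single, t_multi, cores=CORES):
    """Fraction of ideal (linear) scaling actually achieved."""
    return speedup(t_single, t_multi) / cores


# Share of the 32-bit run time that the 64-bit build needs (single thread).
x64_fraction = single["x64"] / single["x86"]

for build in single:
    s = speedup(single[build], multi[build])
    e = efficiency(single[build], multi[build])
    print(f"{build}: {s:.2f}x speedup, {e:.0%} of ideal {CORES}-core scaling")

print(f"x64 single-thread time is {x64_fraction:.0%} of the x86 time")
```

      With these numbers the SSE2 and x64 builds land close to 3x, while the plain x86 build scales noticeably better, so the "about 3x" figure fits the optimized builds best.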

2011 Gaussian Blur Benchmark

      In 2011 a co-worker and I decided that a bi-variate Gaussian blur that we had written and optimized would make a good benchmark for raw computational power. I wanted as wide a range of systems as I could get rather than testing every little pesky revision. I think you will see what I mean. The compiler was Visual Studio 2010. The software was compiled with each combination of the SSE2, OpenMP, and x64 flags. The times are taken directly before and after the call to the Gaussian function, and the run is repeated 30 times to minimize errors from system activity.
      We discovered that since no CPU has been made with amd64 support but without SSE2, the compiler ignores the SSE2 flag when building 64-bit code. Also, we found that the same CPU running XP-32, XP-64, Vista-32, Vista-64, and Win7-64 would give the same answer within measurement error. Just because a system does not have 64-bit numbers listed does not mean that the CPU does not support 64-bit code; it may only have had a 32-bit OS installed. Even today many machines are still sold with 32-bit Windows installed. Hard to believe, but true. Older machines were difficult. Software built in VS2010 does not run on Win2k, and most machines from the early days of XP have long since been taken out of service.
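
      The timing procedure described above can be sketched roughly as follows. This is not our benchmark code: the real thing was compiled with Visual Studio 2010, and the blur here is a simple separable, isotropic stand-in with made-up names (gaussian_blur, benchmark), image size, sigma, and radius. I also assume each of the 30 runs is timed on its own; the only parts taken from the description are reading the clock directly before and after the call and repeating 30 times.

```python
import math
import time


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights covering [-radius, radius]."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping at the left/right edges."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma=2.0, radius=4):
    """Stand-in blur: a horizontal pass followed by a vertical pass."""
    kernel = gaussian_kernel(sigma, radius)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def benchmark(func, arg, runs=30):
    """Time each call on its own; the clock is read right around the call."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        func(arg)
        times.append(time.perf_counter() - start)
    return times


# Small synthetic test image so the sketch finishes quickly.
image = [[float((x * y) % 17) for x in range(32)] for y in range(32)]
times = benchmark(gaussian_blur, image, runs=30)
print(f"best {min(times):.4f}s, mean {sum(times) / len(times):.4f}s over {len(times)} runs")
```

      Reporting both the best and the mean run makes it easier to spot a run that was disturbed by background activity.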

The Players
      Via C3 (Cyrix) - 0.8GHz
      Via C3 (EPIA) - 1.0GHz
      Pentium 4 512k cache - 3.4GHz
      Athlon XP 2200+ - 1.8GHz
      Dual Opteron 246 - 2.0GHz
      Athlon64 FX-55 - 2.66GHz
      Athlon X2 4800+ - 2.41GHz
      Pentium D - 2.8GHz
      Celeron M - 0.9GHz
      Pentium M ULV - 1.2GHz
      Via Nano - 1.6GHz
      Atom D510 - 1.66GHz
      E-350 - 1.6GHz
      Core 2 Duo X6800
      Core 2 Quad Q9550
      Phenom II x4 940 - 3.0GHz
      Core 2 Duo T7500 - 2.2GHz
      Core 2 Duo T9800 - 2.93GHz
      Xeon x5472 dual-quad - 2.99GHz
      Via QuadCore - 1.2GHz/???GHz
      Core i5 2537M - 2.3GHz/1.40GHz
      Core i5 450M - 2.66GHz/2.40GHz
      Core i5 dual 650 - 3.46GHz/3.19GHz
      Core i7 quad 940 - 3.2GHz/2.93GHz
      Core i7 quad 2600K - 3.8GHz/3.4GHz
      Core i5 quad 3570K Overclocked - 4.1GHz - turbo locked

      The sorting is somewhat chronological, except that I tended to clump similar CPUs together. For instance, all the modern low-power CPUs (Celeron M, Pentium M, Nano, Atom, and E-350) are together in the center.
      Update (02-17-2012): Added the Sandy Bridge i5 2537M (Samsung 9 Series) as well as 64-bit results for the Opteron 246.
Single Thread Performance
      You can immediately tell that this benchmark does not tell the complete story. For instance, the chart shows that the Pentium 4 should be faster than the FX-55; however, those two computers are in the same lab. The P4 would often take 20-30 seconds just to open PowerPoint while people would argue over getting to use the FX-55. Other interesting notes: The three K8 core chips (Opteron, FX, and Athlon X2) seem to scale perfectly with clock speed. The Celeron and Pentium M chips get a *very* large benefit from SSE2. My lovely 180-watt Dual Opteron was beaten by...an 18-watt E-350. Finally, the Pentium M at a mere 1.2GHz is competing well against the Nano, Atom, and E-350, all clocked at 1.6GHz.
      Of particular note: nearly all Core 2 chips posted faster times than the original Core i7, even at lower clock speeds. It was also beaten by a Core i5. I am now highly suspicious of web benchmarking sites. This was just too repeatable.
      Lastly, what do you think of the C3s? Cute, huh? I wish I could have tested a Pentium II or III for a fair comparison.
Multi Thread Performance
      The multi-CPU chart shows what you really get for your money. I kept the single-CPU chips in the chart because I wanted to know how much the OpenMP overhead hurt performance when there was no second core to use. On average, the loss was only around 4%. That's really not a big deal. This means that programmers really won't have to worry about maintaining separate single-threaded and multi-threaded code-sets.
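      For anyone who wants to work out the same overhead figure on their own machine, it is just the relative slowdown of the OpenMP build on a chip with a single core. The timings below are made up purely to show the arithmetic; they are not measurements from the chart, and the names are my own.

```python
def openmp_overhead(t_plain, t_openmp):
    """Fractional slowdown of the OpenMP build relative to the plain build."""
    return (t_openmp - t_plain) / t_plain


# HYPOTHETICAL (plain, OpenMP) times in seconds for two single-core chips.
hypothetical_times = {
    "chip A": (10.0, 10.4),
    "chip B": (5.0, 5.2),
}

overheads = [openmp_overhead(plain, omp) for plain, omp in hypothetical_times.values()]
average = sum(overheads) / len(overheads)
print(f"average OpenMP overhead: {average:.1%}")
```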
      In more interesting news, going by this chart you would think that the Pentium D was the god of all CPUs when it came out. Do you know anyone in the field who agreed with that? The Atom is able to use two cores plus hyper-threading to pull ahead of the Nano, but is still beaten by the E-350. Speaking of the E-350, it is slower than desktop chips, but notice that it is only half as fast as the laptop Core i5. That is very respectable for such low power. For the laptop i5, dual core is barely faster than single core. Here is Turbo Boost working against us. Using 4 cores plus hyper-threading, the original i7 is finally able to pull ahead of most Core 2s, but is still beaten by the eight-core Core 2 Xeon. The best time was posted by the brand new Sandy Bridge i7 2600K.
      Quite a fun benchmark. For my part, I am going to be giving an E-350 laptop a real close look.
