AMD Ryzen Downcore Control

AMD Ryzen 7 processor

AMD Ryzen 7 processors come with a nice feature: downcore control. This feature lets you enable or disable individual cores. Ryzen 7 and Ryzen 5 chips use the same die, which is made up of two CCX (CPU Complex) units, each CCX having 4 cores. Disabling cores on a Ryzen 7 therefore makes it possible to emulate a Ryzen 5 CPU.
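After changing the BIOS setting, you can sanity-check the topology the operating system actually sees. Here is a minimal C++ sketch (it assumes Linux and its sysfs paths; this is just an illustration, not an AMD tool):

```cpp
// topology_check.cpp - print the logical CPU count the OS reports.
// Build: g++ -O2 topology_check.cpp -o topology_check
#include <fstream>
#include <iostream>
#include <string>
#include <thread>

int main() {
    // Logical CPUs visible to the scheduler (cores x SMT threads),
    // e.g. 16 for a full Ryzen 7, 8 after a FOUR (x + y) configuration.
    std::cout << "Logical CPUs: " << std::thread::hardware_concurrency() << '\n';

    // On Linux, sysfs also exposes the range of CPUs currently present.
    std::ifstream present("/sys/devices/system/cpu/present");
    std::string range;
    if (present && std::getline(present, range))
        std::cout << "Present CPU range: " << range << '\n';
    return 0;
}
```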

 

AMD Ryzen CCX

The downcore control is an option available in the BIOS of X370 motherboards (maybe on other chipsets like the B350 too, I don’t know).

Here is the downcore control in the MSI X370 Gaming Pro Carbon BIOS:

 

MSI X370 Gaming Pro Carbon – Downcore control in the BIOS

 

A Ryzen 7 CPU has two CCX units with all cores enabled (8 cores, or 8C/16T). This is the EIGHT (4 + 4) or Auto configuration:

CCX0: [1] [2] [3] [4]
CCX1: [1] [2] [3] [4]

 

AMD Ryzen 7 Downcore control – Auto

A Ryzen 5 1600 or 1600X has two CCX units with 6 cores (6C/12T) enabled:

SIX (3 + 3)

CCX0: [1] [2] [3] [x]
CCX1: [1] [2] [3] [x]

 

AMD Ryzen 7 Downcore control – SIX (3 + 3)

A Ryzen 5 1400 or 1500X has two CCX units and 4 cores (4C/8T) enabled. This can be emulated on a Ryzen 7 with two configurations: FOUR (2 + 2) or FOUR (4 + 0).

FOUR (2 + 2)

CCX0: [1] [2] [x] [x]
CCX1: [1] [2] [x] [x]

FOUR (4 + 0)
This configuration is interesting because it uses only one CCX and avoids CCX interconnect issues, mainly slowness: two cores on different CCX units communicate at around 30 GB/s over the Infinity Fabric, whose speed depends on the memory controller clock, while two cores on the same CCX communicate at about 175 GB/s (a code sketch for measuring this appears after the diagrams below).

CCX0: [1] [2] [3] [4]
CCX1: [x] [x] [x] [x]

 

AMD Ryzen 7 Downcore control – FOUR (2 + 2)
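As mentioned for the FOUR (4 + 0) case above, the CCX interconnect is much slower than intra-CCX communication. One way to see this from software is to pin two threads to chosen cores and time a flag ping-pong between them; a cross-CCX pair should show a noticeably slower handoff than a same-CCX pair. A minimal Linux sketch follows (the assumption that cores 0-3 sit on one CCX and 4-7 on the other is illustrative; check your actual layout with `lscpu -e` first):

```cpp
// ccx_pingpong.cpp - rough core-to-core handoff latency between two pinned threads.
// Build: g++ -O2 -pthread ccx_pingpong.cpp -o ccx_pingpong
// Usage: ./ccx_pingpong [coreA] [coreB]
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <pthread.h>
#include <sched.h>
#include <thread>

static std::atomic<int> flag{0};
constexpr int kIters = 1'000'000;

// Restrict the calling thread to a single core.
static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// Each thread waits for its token, then hands the token to its peer.
static void pingpong(int core, int me, int other) {
    pin_to_core(core);
    for (int i = 0; i < kIters; ++i) {
        while (flag.load(std::memory_order_acquire) != me) { /* spin */ }
        flag.store(other, std::memory_order_release);
    }
}

int main(int argc, char** argv) {
    int a = argc > 1 ? std::atoi(argv[1]) : 0;  // default: core 0
    int b = argc > 2 ? std::atoi(argv[2]) : 4;  // default: core 4 (other CCX, assumed)

    auto t0 = std::chrono::steady_clock::now();
    std::thread t1(pingpong, a, 0, 1);
    std::thread t2(pingpong, b, 1, 0);
    t1.join();
    t2.join();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - t0).count();

    std::cout << "cores " << a << " <-> " << b << ": "
              << ns / double(2 * kIters) << " ns per handoff\n";
    return 0;
}
```

Comparing, say, cores 0 and 1 against cores 0 and 4 should make the Infinity Fabric hop visible in the numbers.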

The downcore control also lets you emulate 3C/6T and 2C/4T CPUs (Ryzen 3?) with the following configurations:

THREE (3 + 0)

CCX0: [1] [2] [3] [x]
CCX1: [x] [x] [x] [x]

 

AMD Ryzen 7 Downcore control – THREE (3 + 0)

TWO (1 + 1)

CCX0: [1] [x] [x] [x]
CCX1: [1] [x] [x] [x]

TWO (2 + 0)

CCX0: [1] [2] [x] [x]
CCX1: [x] [x] [x] [x]

 

AMD Ryzen 7 Downcore control – TWO (1 + 1)

 

Multi-GPU DirectX 12 shootouts show AMD with performance lead over Nvidia

One of the most exciting parts of Microsoft’s DirectX 12 API is the ability to pair graphics cards of varying generations, performance, or even manufacturers together in a single PC, to pool their resources and thus make games and applications run better. Unfortunately, testing “Explicit Multi-Adapter” (EMA) support under real-world conditions (i.e. not synthetic benchmarks) has so far proven difficult. There has only been one game designed to take advantage of DX12’s numerous low-level improvements (including asynchronous compute, which allows GPUs to execute multiple command queues simultaneously), and the early builds of that game didn’t feature support for multiple GPUs.

Multi-GPU DirectX 12 shootouts show AMD with performance lead over Nvidia

 

As you might have guessed from the headline of this story, it does now. The latest beta version of Stardock’s real-time strategy game Ashes of the Singularity includes full support for EMA, meaning that for the first time we can observe what performance boost (if any) we get by doing the previously unthinkable and sticking an AMD and Nvidia card into the same PC. That’s not to mention seeing how EMA stacks up against SLI or Crossfire (which have to be turned off in order to use DX12’s multi-GPU features), and whether AMD can repeat the ridiculous performance gains seen in the older Ashes benchmark.

Benchmarks conducted by a variety of sites, including Anandtech, Techspot, PC World, and Maximum PC all point to the same thing: EMA works, scaling can reach as high as 70 percent when adding a second GPU, and yes, AMD and Nvidia cards play nicely together.

That EMA works at all is something of an achievement for developer Stardock. Not only is it the first developer to implement the technology in an actual game, but doing so is hard going. Unlike older APIs such as DX11 and OpenGL, or multi-GPU support under the proprietary systems developed by Nvidia (SLI) and AMD (Crossfire), DX12’s EMA demands a tenacious developer: work that was previously handled by the driver has to be done manually. That’s a double-edged sword: if the developer knows what they’re doing, DX12 could provide a big performance uplift; but if they don’t, performance could actually decrease.

That said, developers do have a few options for implementing multiple GPUs under DX12. Implicit Multi-Adapter (IMA) is the easiest, and is essentially a DX12 version of Crossfire or SLI, with the driver doing most of the work to distribute tasks between GPUs (a feature not part of the Ashes benchmark). Then there’s EMA, which has two modes: linked and unlinked. Linked mode requires closely matched GPUs, while unlinked mode, which is what Ashes uses, allows any mix of GPUs. The whole point of this, and why it works at all under DX12, is to make use of Split Frame Rendering (SFR). This breaks each frame of a game into several tiles, which are then rendered in parallel by the GPUs. This is different from the Alternate Frame Rendering (AFR) used in DX11, where each GPU renders an entire frame, duplicating data across each GPU.

In theory, with EMA and SFR, performance should go way up. Plus, users should benefit from pooling graphics memory (i.e. using two 4GB GPUs would actually result in 8GB of usable graphics memory). The one bad thing about the Ashes benchmark? It currently only supports AFR.
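To make the AFR/SFR distinction concrete, here is a toy scheduling sketch (no real graphics API involved; the GPU labels and tile count are made up purely for illustration):

```cpp
// sfr_vs_afr.cpp - toy illustration of the two multi-GPU work-splitting schemes.
// No real rendering happens; "render" just prints the assignment.
#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::vector<std::string> gpus = {"GPU0 (GeForce)", "GPU1 (Radeon)"};

    // AFR: each GPU renders whole frames, alternating. Every GPU needs a
    // full copy of frame resources, so VRAM is duplicated, not pooled.
    for (int frame = 0; frame < 4; ++frame)
        std::cout << "AFR: frame " << frame << " -> "
                  << gpus[frame % gpus.size()] << '\n';

    // SFR: one frame is cut into tiles rendered in parallel, so both GPUs
    // contribute to the same frame and resources can be split between them.
    const int tiles = 4;
    for (int tile = 0; tile < tiles; ++tile)
        std::cout << "SFR: frame 0, tile " << tile << " -> "
                  << gpus[tile % gpus.size()] << '\n';
    return 0;
}
```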

Hitman could become the first big DirectX 12 game

IO Interactive and AMD team up for a big performance boost on Radeon graphics cards.
DirectX 12 may soon appear in a big-budget game with next month’s launch of Hitman.

Hitman could become the first big DirectX 12 game – Hitman 2016

AMD says it’s collaborating with Hitman developer IO Interactive to enable the next-generation graphics tech. It sounds like this will be the first game to take advantage of DirectX 12’s Asynchronous Shaders feature, which spreads different tasks (such as lighting, physics, and memory) across the GPU’s individual computing units, letting them all work at the same time. This should allow for big gains in image quality without a performance hit.

Indeed, Hitman might be the first DirectX 12 game on the market from a major publisher. The stealth action thriller is set to launch on March 11, long before other confirmed DirectX 12 titles such as Deus Ex: Mankind Divided and Fable Legends. It’s possible that Gears of War: Ultimate Edition could sneak in sooner with an early 2016 launch, but so far Microsoft hasn’t given a specific release date.

Aside from those new releases, DirectX 12 support is also in the works for some existing games, such as Just Cause 3 and The Elder Scrolls Online. Some smaller games, such as Descent: Underground, added experimental DirectX 12 support last year.

To take advantage of DirectX 12, players will need to be running Windows 10 (Microsoft has no plans to bring the tech to older versions), and AMD cards will need to use the company’s Graphics Core Next architecture, which covers nearly every card released since 2012.

A Defining Moment for Heterogeneous Computing

The streets of downtown Austin, just cleared of music festival attendees and auto racing fans, are now filled with enthusiasts of a different sort. This year the city is host to SC15, the largest event for supercomputing systems and software, and AMD is on site to meet with customers and technology partners.

 

A Defining Moment for Heterogeneous Computing

The hardware is here, of course, including industry-leading AMD FirePro™ graphics and the upcoming AMD Opteron™ A1100 64-bit ARM® processor. However, the big story for AMD at the show this year is the “Boltzmann Initiative”, delivering new software tools to take advantage of the processing power of our products, including those on the future roadmap, like the new “Zen” x86 CPU core coming next year.  Ludwig Boltzmann was a theoretical physicist and mathematician who developed critical formulas for predicting the behavior of different forms of matter. Today, these calculations are central to work done by the scientific and engineering communities we are targeting with these tools.

First though, just a quick review of what ties this story together: Heterogeneous Computing. The Heterogeneous System Architecture (HSA) Foundation was created in 2012, with AMD as a founding member, to make it dramatically easier to program heterogeneous computing systems. Heterogeneous computing takes advantage of CPUs, GPUs, and other accelerators such as DSPs and other programmable and fixed-function devices to help increase performance and efficiency with the goal of reduced energy use. The GPU in particular is a critical component since general purpose computing on a GPU (GPGPU) makes large performance gains achievable for certain applications through parallel execution. However, while effectively harnessing the GPU for computing has become easier, AMD is taking a huge leap forward today with the announcement of the Boltzmann Initiative and its three key new tools for developers.

The first innovation is our new heterogeneous compute compiler (HCC) for C++ programming. Over the last several years, it’s been possible to program for GPU compute through the use of OpenCL™, an open industry standard language, or the proprietary CUDA language. Both provide a general-purpose model for data parallelism as well as low-level access to hardware. And while both significantly improve ease of use and functionality compared to previous methods, they still require unique programming skills. This is a problem because the potential for leveraging the GPU is so great and so diverse. Applications ranging from 3D medical imaging to facial recognition, from climate analysis to human genome mapping can all benefit, to name a few.

Ultimately, for heterogeneous computing to become a mainstream reality, these technologies will need to become accessible to a majority of the programmers in the world through more familiar languages such as C++. By creating a logical model where heterogeneous processors fully share system resources such as memory, HSA promises a standard programming model that allows developers to write code that can run seamlessly on whatever processor block is best able to execute it. The idea of matching the right workload to the right processor is compelling and being embraced by many hardware and software companies. The new AMD C++ compiler makes that idea a whole lot easier to execute.
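To make that goal concrete: the kind of code HSA aims to make viable is ordinary C++ in which the same algorithm can be dispatched to whichever processor suits it best. Below is a generic sketch using C++17 parallel algorithms; note this is standard C++ chosen for illustration, not HCC’s actual syntax (which predates C++17):

```cpp
// saxpy_par.cpp - data-parallel SAXPY written as plain C++.
// The algorithm is written once; the execution policy, not the code,
// decides how (and, with a heterogeneous toolchain, where) the work runs.
// Build (CPU fallback): g++ -O2 -std=c++17 saxpy_par.cpp -ltbb
#include <algorithm>
#include <execution>
#include <iostream>
#include <vector>

int main() {
    const float a = 2.0f;
    std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 3.0f);

    // y = a * x + y, expressed once as a parallel transform.
    std::transform(std::execution::par, x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });

    std::cout << "y[0] = " << y[0] << " (expected 5)\n";
    return 0;
}
```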

Second is our new Linux® driver. While the Windows® operating system is fantastic and supports billions of consumer client devices and commercial servers, Linux is highly popular in technical and scientific communities where collaboration on application development is the traditional model to maximize performance. By making an all new Linux driver available, AMD is helping expand the developer base for heterogeneous computing even further. Important benefits for the programmer of this new, headless Linux driver include low latency compute dispatch, peer-to-peer GPU support, Remote Direct Memory Access (RDMA) from InfiniBand™ interconnects directly to GPU memory, and Large Single Memory Allocation support. Combined with the new C++ compiler, the Linux driver is a powerful addition to the Boltzmann Initiative.

Finally, applications already developed in CUDA can now be ported to C++. This is achieved using the new Heterogeneous-computing Interface for Programmers (HIP) tool, which ports CUDA runtime APIs into C++ code. AMD testing shows that in many cases 90 percent or more of CUDA code can be automatically converted by HIP. The remainder requires manual programming, but this should take a matter of days, not months as before. Once ported, the application could run on a variety of underlying hardware, and enhancements could be made directly in C++. The overall effect would be greater platform flexibility and reduced development time and cost.
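For a sense of what a ported program looks like, here is a minimal vector add written against the HIP runtime; a hipified CUDA program ends up in essentially this shape, with cuda* runtime calls renamed to hip*. This is an illustrative sketch using today’s hipLaunchKernelGGL launch macro, not code from AMD’s announcement:

```cpp
// vecadd_hip.cpp - minimal HIP vector add (error checking omitted for brevity).
// Build: hipcc vecadd_hip.cpp -o vecadd_hip
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// The kernel body is identical to its CUDA counterpart.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    float *da, *db, *dc;
    hipMalloc((void**)&da, bytes);   // was: cudaMalloc
    hipMalloc((void**)&db, bytes);
    hipMalloc((void**)&dc, bytes);
    hipMemcpy(da, ha.data(), bytes, hipMemcpyHostToDevice);  // was: cudaMemcpy
    hipMemcpy(db, hb.data(), bytes, hipMemcpyHostToDevice);

    const int block = 256;
    const int grid = (n + block - 1) / block;
    hipLaunchKernelGGL(vecAdd, dim3(grid), dim3(block), 0, 0, da, db, dc, n);

    hipMemcpy(hc.data(), dc, bytes, hipMemcpyDeviceToHost);
    std::printf("c[0] = %.1f (expected 3.0)\n", hc[0]);

    hipFree(da); hipFree(db); hipFree(dc);   // was: cudaFree
    return 0;
}
```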

The availability of the new C++ compiler, Linux driver and HIP tool means that heterogeneous computing will be available to many more software developers, substantially increasing the pool of programmers. That’s a tremendous amount of brain power that can now create applications that more readily take advantage of the underlying hardware. It also means many more applications can take advantage of parallelism, when applicable, enabling better performance and greater energy efficiency. I encourage you to stop by booth #727 at the Austin Convention Center this week to learn more!

Fable Legends: AMD and Nvidia go head-to-head in latest DirectX 12 benchmark

As DirectX 12 and Windows 10 roll out across the PC ecosystem, the number of titles that support Microsoft’s new API is steadily growing. Last month, we previewed Ashes of the Singularity and its DirectX 12 performance; today we’re examining Microsoft’s Fable Legends. This upcoming title is expected to debut on both Windows PCs and the Xbox One and is built with Unreal Engine 4.

Like Ashes, Fable Legends is still very much a work-in-progress. Unlike Ashes of the Singularity, which can currently be bought and played, Microsoft chose to distribute a standalone benchmark for its first DirectX 12 title. The test has little in the way of configurable options and performs a series of flybys through complex environments. Each flyby highlights a different aspect of the game, including its day/night cycle, foliage and building rendering, and one impressively ugly troll. If Ashes of the Singularity gave us a peek at how DX12 would handle several dozen units and intense particle effects, Fable Legends looks more like a conventional first-person RPG or FPS.


There are other facets to Fable Legends that make this a particularly interesting match-up, even if it’s still very early in the DX12 development cycle. Unlike Ashes of the Singularity, which is distributed through Oxide, this is a test distributed directly by Microsoft. It uses Unreal Engine 4, and Nvidia and Epic, Unreal’s developer, have a long history of close collaboration. Last year, Nvidia announced GameWorks support for UE4, and UE3 was an early supporter of PhysX on both Ageia PPUs and, later, Nvidia GeForce cards.

Test setup

We tested the GTX 980 Ti and Radeon Fury X in Windows 10, using the latest build of the operating system. Our testbed was an Asus X99-Deluxe motherboard with a Core i7-5960X and 16GB of DDR4-2667 memory. We tested the Fury X with an AMD-provided beta driver and the GTX 980 Ti with Nvidia’s latest WHQL-approved driver, 355.98. Nvidia hasn’t released a beta Windows 10 driver since last April, and the company didn’t contact us to offer a specific driver for the Fable Legends debut.


The benchmark itself was provided by Microsoft and can run in a limited number of modes. Microsoft provided three presets — a 720p “Low” setting, a 1080p “Ultra” and a 4K “Ultra” benchmark. There are no user-configurable options besides enabling or disabling V-Sync (we tested with V-Sync disabled) and the ability to specify low settings or ultra settings. There is no DX11 version of the benchmark. We ran all three variants on both the Fury X and GTX 980 Ti.

Test Results (Original and Amended):

Once other sites began posting their own test results, it became obvious that our own 980 Ti and Fury X benchmarks were both running more slowly than they should have. It’s normal to see some variation between review sites, but gaps of 15-20% in a benchmark with no configurable options? That meant a different problem. Initial retests confirmed the figures shown below, even after wiping and reinstalling drivers.

[Chart: initial Fable Legends benchmark results]

The next thing to check was power management — and this is where we found our smoking gun. We tested Windows 10 in its “Balanced” power configuration, which is our standard method of testing all hardware. While we sometimes increase to “High Performance” in corner cases or to measure its impact on power consumption, Windows can generally be counted on to handle power settings, and there’s normally no performance penalty for using this mode.

Imagine our surprise, then, to see the following when we fired up the Fable benchmark:

 

[Screenshot: the benchmark running while Windows shows the power plan and a downclocked CPU]

The benchmark is actively running in the screenshot above, with power conservation mode and clock speed visible at the same time. And while CPU clock speed isn’t the determining factor in most titles, clocking down to 1.17GHz is guaranteed to have an impact on overall frame rates. Switching to “High Performance” pegged the CPU clock between 3.2 and 3.3GHz — exactly where we’d expect it to be. It’s not clear what caused this problem — it’s either a BIOS issue with the Asus X99-Deluxe or an odd driver bug in Windows 10, but we’ve retested both GPUs in High Performance mode.

These new results are significantly different from our previous tests. 4K performance is unchanged, and the two GPUs still tie, but 1080p performance improves by roughly 8% on the GTX 980 Ti and 6% on the Fury X. Aftermarket GTX 980 Ti results show higher-clocked variants of that card outperforming the R9 Fury X, and those are perfectly valid data points: if you want to pay the relatively modest price premium for a high-end card with more clock headroom, you can expect a commensurate payoff in this test. Meanwhile, the R9 Fury X no longer wins 720p as it did before. Both cards are faster here, but the GTX gained much more from the clock speed fix, leaping up 27%, compared to just 2% for AMD. While this conforms to our general test trends in DX11, in which AMD performs more capably at higher resolutions, it’s still unusual to see only one GPU respond so strongly to the correction of such ludicrously low clock speeds.

These new runs, like the initial ones, were performed multiple times. We ran the benchmark four times on each card, at each quality preset, but threw out the first run in each case. We also threw out runs that landed unusually far from the average.
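For the curious, that filtering amounts to something like the sketch below; the 10 percent outlier cutoff is our own illustrative choice, since “unusually far from the average” was a judgment call:

```cpp
// bench_average.cpp - average benchmark runs after dropping the warm-up run
// and any run more than ~10% away from the mean of the rest.
#include <cmath>
#include <iostream>
#include <numeric>
#include <vector>

double filtered_mean(std::vector<double> runs, double cutoff = 0.10) {
    runs.erase(runs.begin());  // drop the first (warm-up) run
    double mean = std::accumulate(runs.begin(), runs.end(), 0.0) / runs.size();

    std::vector<double> kept;
    for (double r : runs)
        if (std::fabs(r - mean) / mean <= cutoff)  // keep runs near the mean
            kept.push_back(r);

    if (kept.empty()) return mean;  // fall back if everything was "an outlier"
    return std::accumulate(kept.begin(), kept.end(), 0.0) / kept.size();
}

int main() {
    // Hypothetical FPS results from four runs of one preset.
    std::vector<double> fps = {58.9, 63.4, 63.7, 63.3};
    std::cout << "Filtered average: " << filtered_mean(fps) << " FPS\n";
    return 0;
}
```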

Why include AMD results?

In our initial coverage for this article, we included a set of AMD-provided test results. This was mostly done for practical reasons: I don’t actually have an R9 390X, 390, or R9 380, and therefore couldn’t compare performance in the midrange graphics stack. Our decision to include this information “shocked” Nvidia’s PR team, which pointed out that no other reviewer had found the R9 390 pulling ahead of the GTX 980.

Implications of impropriety deserve to be taken seriously, as do charges that test results have misrepresented performance. So what’s the situation here? While we may have shown you chart data before, AMD’s reviewer guide contains the raw data values themselves. According to AMD, the GTX 980 scored 65.36 FPS in the 1080p Ultra benchmark using Nvidia’s 355.98 driver (the same driver we tested). Our own results actually point to the GTX 980 being slightly slower: when we put the card through its paces for this section of our coverage, it landed at 63.51 FPS. Still, that’s just a 3% difference.


It’s absolutely true that Tech Report’s excellent coverage shows the GTX 980 pulling ahead of the R9 390 (TR was the only website to test an R9 390 in the first place). But that doesn’t mean AMD’s data is non-representative. Tech Report notes that it used a Gigabyte GTX 980 with a base clock of 1228MHz and a boost clock of 1329MHz. That’s 9% faster than the clocks on my own reference GTX 980 (1127MHz and 1216MHz respectively).

Multiply our 63.51 FPS by 1.09x and you end up with 69 FPS, exactly what Tech Report reported for the GTX 980. And if you have an Nvidia GTX 980 clocked at this speed, yes, you will outperform a stock-clocked R9 390. That, however, doesn’t mean that AMD lied in its test results. A quick trip to Newegg reveals that GTX 980s ship at a variety of clocks, from a low of 1126MHz to a high of 1304MHz. That, in turn, means that the highest-end GTX 980 is as much as 15% faster than the stock model. Buyers who shop on price are much more likely to end up with cards at the base frequency: the cheapest EVGA GTX 980 is $459, compared to $484 for the 1266MHz version.


There’s no evidence that AMD lied about or misconstrued the GTX 980’s performance. Neither did Tech Report. Frankly, we prefer testing retail hardware when such equipment is available, but since GPU vendors tend to charge a premium for higher-clocked GPUs, it’s difficult to select any single card and declare it representative.

Amended Conclusion:

Nvidia’s overall performance in Fable Legends remains excellent, though whether Team Red or Green wins is going to depend on which specific card you’ve chosen to purchase. The additional headroom left in many of Nvidia’s current designs is a feature, not a bug, and while it makes it harder to declare any single card representative of GTX 980 Ti or 980 performance, we suspect most enthusiasts appreciate the extra headroom.

The power issues that forced a near-total rewrite of this story, however, also point to the immaturity of the DirectX 12 ecosystem. Whether you favor AMD or Nvidia, it’s early days for both benchmarks and GPUs, and we wouldn’t recommend making drastic purchasing decisions around expected future DirectX 12 capability. There are still unanswered questions and unclear situations surrounding certain DirectX 12 features, like asynchronous compute on Nvidia cards, but the overall performance story from Team Red vs. Team Green is positive. The fact that a stock R9 390, at $329, outperforms a stock GTX 980 with an MSRP of $460 is a very nice feather in AMD’s cap.

 



About me

My name is Sayed Ahmadreza Razian, and I hold a master’s degree in Artificial Intelligence.
Click here for my CV / resume page

Related topics such as image processing, machine vision, virtual reality, machine learning, data mining, and monitoring systems are my research interests, and I intend to pursue a PhD in one of these fields.


My Scientific expertise
  • Image processing
  • Machine vision
  • Machine learning
  • Pattern recognition
  • Data mining - Big Data
  • CUDA Programming
  • Game and Virtual reality

Download Nokte for free

Coming soon…

Greatest hits

Gravitation is not responsible for people falling in love.

Albert Einstein

The fear of death is the most unjustified of all fears, for there’s no risk of accident for someone who’s dead.

Albert Einstein

Imagination is more important than knowledge.

Albert Einstein

Waiting hurts. Forgetting hurts. But not knowing which decision to take can sometimes be the most painful.

Paulo Coelho

Anyone who has never made a mistake has never tried anything new.

Albert Einstein

It’s the possibility of having a dream come true that makes life interesting.

Paulo Coelho

You are what you believe yourself to be.

Paulo Coelho

One day you will wake up and there won’t be any more time to do the things you’ve always wanted. Do it now.

Paulo Coelho

