Personal Profile

GPU

TechPowerUp GPU-Z v0.8.7 Released

TechPowerUp released the latest version of GPU-Z, the popular graphics subsystem information, monitoring, and diagnostic utility. Version 0.8.7 adds support for new GPUs, fixes a variety of bugs, and improves the interface. Support is added for the AMD Radeon R9 380X, R7 350, and the “Mullins” APU; the NVIDIA GTX 980M 8GB, GTX 965M, GTX 750 (GM206), GT 710 (GK208), Quadro K1200, M5000, M2000M, M1000M, K2200M, GRID K160Q, and Tesla K80; and Intel “Skylake” Gen9 Graphics 510, P530, and 540.

Bug fixes include improved Radeon Software version-number detection, a correct DirectX feature-level readout on Intel “Skylake” IGPs, a fix for the erroneous 1.55V voltage reading on AMD “Fiji” GPUs in ULPS mode, a fix for a BSOD on the Intel “Cloverview” Atom Z2760, corrected SKU naming for AMD “Beema” chips, improved detection of CUDA devices on bus IDs greater than 9, and a better explanation for OpenCL detection errors. The Armenian language pack has also been updated.

DOWNLOAD: TechPowerUp GPU-Z 0.8.7 | GPU-Z 0.8.7 ASUS ROG Themed

The change-log follows.

  • Radeon Software Crimson Edition Driver version is now properly detected
  • Fixed 1.55V GPU voltage reading on AMD “Fiji” GPUs when in ULPS
  • Fixed “Skylake” iGPU DirectX 12 feature level to correctly show 12_1
  • Fixed bluescreen on Intel Cloverview (Atom Z2760)
  • Fixed CUDA detection for devices on bus number bigger than 9
  • Fixed AMD Beema naming
  • Improved explanation for OpenCL detection errors on AMD GPUs
  • Some HD 2000 and HD 3000 cards are now correctly recognized as ATI
  • Revision ID is now always displayed as two digits
  • Fixed shader model still being displayed on older cards
  • Fixed millisecond precision in log file timestamps
  • Updated Armenian language texts
  • Miscellaneous stability fixes
  • Added support for NVIDIA GTX 980M 8GB, GTX 965M, GTX 750 (GM206), GT 710 (GK208), Quadro K1200, M5000, M2000M, M1000M, K2200M, GRID K160Q, Tesla K80
  • Added support for AMD R9 380X, R7 350, Mullins
  • Added support for Intel Skylake Graphics 510, P530, 540

IBM Watson Chief Technology Officer Rob High to Speak at GPU Technology Conference

Highlighting the key role GPUs will play in creating systems that understand data in human-like ways, Rob High, IBM Fellow, VP and chief technology officer for Watson, will deliver a keynote at our GPU Technology Conference, in Silicon Valley, on April 6.

Five years ago, Watson grabbed $1 million on Jeopardy!, competing against a pair of the TV quiz show’s top past winners. Today, IBM’s Watson cognitive computing platform helps doctors, lawyers, marketers and others glean key insights by analyzing large volumes of data.

High will join a lineup of speakers at this year’s GTC that includes NVIDIA CEO Jen-Hsun Huang and Toyota Research Institute CEO Gill Pratt, who will all highlight how machines are learning to solve new kinds of problems.

Fueling an AI Boom

Watson is among the first of a new generation of cognitive systems with far-reaching applications. It uses artificial intelligence technologies like image classification, video analytics, speech recognition and natural language processing to solve once intractable problems in healthcare, finance, education and law.

GPUs are at the center of this artificial intelligence revolution (see “Accelerating AI with GPUs: A New Computing Model”). And they’re part of Watson, too.

IBM announced late last year that its Watson cognitive computing platform has added NVIDIA Tesla K80 GPU accelerators. As part of the platform, GPUs enhance Watson’s natural language processing capabilities and other key applications. (Both IBM and NVIDIA are members of the OpenPOWER Foundation. The open-licensed POWER architecture is the CPU that powers Watson.)

GPUs are designed to race through a large number of tasks at once, something called parallel computing. That makes them ideal for many of the esoteric mathematical tasks that underpin cognitive computing, such as sparse and dense matrix math, graph analytics and Fourier transforms.

NVIDIA GPUs have proven their ability to accelerate applications on everything from PCs to supercomputers using all these techniques. Bringing the parallel computing capabilities of GPUs to these compute-intensive tasks allows more complex models to be used, and used quickly enough to power systems that can respond to human input.

Rob High, IBM Fellow, VP and chief technology officer for Watson, will speak at our GPU Technology Conference

Understanding Language

The capabilities GPUs bring to Watson are key to understanding the vast amounts of data people create every day — a problem that High and his team at IBM set out to solve with Watson.

With structured data representing only 20 percent of the world’s total, traditional computers struggle to process the remaining 80 percent, which is unstructured. As a result, many organizations cannot extract the insights from unstructured text, video and audio that could give them a competitive advantage.

Cognitive systems, like Watson, set out to change that by focusing on understanding language as the starting point for human cognition. IBM’s engineers designed Watson to deal with the probabilistic nature of human systems.

Dive in at Our GPU Technology Conference

Our annual GPU Technology Conference is one of the best places to learn more about Watson and other leading-edge technologies, such as self-driving cars, artificial intelligence, deep learning and virtual reality.

Get Ready for the HTC Vive with NVIDIA GPUs

HTC today announced the consumer version of its Vive virtual reality headset. HTC Vive brings “room-scale” VR that enables new ways for gamers and professionals to walk around and interact with their virtual environments.

It will also bring a host of exciting new content that will be available from the Steam platform — such as Tilt Brush (which lets you paint on a virtual canvas), Everest VR (which lets you grapple your way up the world’s tallest peak) and Job Simulator (a fun take on modern work life).

HTC published its recommended specs to power its new headset. It also highlighted several GeForce GTX-powered PCs optimized for Vive from Alienware, MSI and HP.

To get the most out of your experience, you’ll need a PC with a GeForce GTX 970 or higher, a GeForce GTX 980-based notebook or a workstation with a Quadro M5000 or higher.

All these VR-ready GPUs support GameWorks VR and DesignWorks VR technologies, which help reduce latency and improve performance for VR games and apps. In fact, HTC Vive takes advantage of NVIDIA Direct Mode, a GameWorks VR feature that provides plug-and-play compatibility between NVIDIA GPUs and the headset.

If you’re looking for a VR-ready PC, or just need to update your graphics card, head over to GeForce.com and check out a wide range of GeForce GTX VR Ready graphics cards, PCs and notebooks.

Stay tuned for Feb. 29 when HTC Vive goes up for pre-order.

NVIDIA’s GPU-Horsepower in Autos Unleashed at GTC 2016

Hands off the steering wheel. Feet off the pedals. Sit back. Relax. We’ll take you on an eye-opening ride at next month’s GPU Technology Conference.

GTC 2016, set for April 4-7 in Silicon Valley, features an amazing collection of over 500 sessions and tutorials.

For the car-inclined, we’ll have an Automotive track with more than two dozen sessions on autonomous driving, driver assistance systems and next-generation human machine interfaces.

Our partners from around the globe — including Audi, Ford, Mercedes-Benz, Volvo, Eyeris and Elektrobit — will tell stories of how the automotive industry is being transformed, and how GPUs and deep learning are leading the way.

Mercedes-Benz’s Concept IAA (Intelligent Aerodynamic Automobile), with its touch-based operating philosophy.

We’ll also have some of the coolest supercars powered by NVIDIA at the San Jose Convention Center. You’ll see hands-on demonstrations of what happens inside DRIVE PX 2, the brain of the self-driving car. Try out the latest VR headsets and be transported to other worlds. And Gill Pratt, one of the world’s leading figures in AI and CEO of the Toyota Research Institute, will deliver a keynote address on April 7.

Here’s a snapshot of some of the sessions that should get your adrenaline pumping:

Audi VR experience advanced setup

  • Audi AG’s Marcus Kuehne and Thomas Zuchtriegel, in a talk called Audi VR Experience – A Look into the Future of Digital Retail, will share challenges and learnings from Audi’s work creating a VR-based showroom.
  • Richard Membarth of DFKI and Christoph Lauer of Audi AG, in a talk on Safety-Critical Functions with High Reliability, will describe new possibilities for crash prediction in embedded systems, enabled by recent advances in embedded GPUs.
  • The Foundry’s Vilya Harvey will share how Mercedes-Benz worked with his firm on next-gen digital user experiences for drivers, in his talk Hollywood Under the Hood: The Mercedes Concept IAA.
  • Elektrobit’s Karsten Hoffmeister, speaking on Software Architectures for Autonomous Driving Vehicles, will discuss modern vehicle functions like advanced driver assistance systems and the rapidly growing demand for high performance computing power. He’ll also address the need for critical, highly reliable safety functions in the next generation of vehicle infrastructure platforms.
  • Modar Alaoui, CEO of Eyeris, will introduce attendees to vision software that reads facial micro-expressions in real time for use in driver monitoring systems. He’ll also include a live demo in his session, Driver Face Analytics & Emotion Recognition Using Deep Learning.

New Features in CUDA 7.5

Today I’m happy to announce that the CUDA Toolkit 7.5 Release Candidate is now available. The CUDA Toolkit 7.5 adds support for FP16 storage for up to 2x larger data sets and reduced memory bandwidth, cuSPARSE GEMVI routines, instruction-level profiling and more. Read on for full details.

16-bit Floating Point (FP16) Data

CUDA 7.5 expands support for 16-bit floating point (FP16) data storage and arithmetic, adding new half and half2 datatypes and intrinsic functions for operating on them. 16-bit “half-precision” floating point types are useful in applications that can process larger datasets or gain performance by choosing to store and operate on lower-precision data. Some large neural network models, for example, may be constrained by available GPU memory; and some signal processing kernels (such as FFTs) are bound by memory bandwidth.

Many applications can benefit by storing data in half precision, and processing it in 32-bit (single) precision. At GTC 2015 in March, NVIDIA CEO Jen-Hsun Huang announced that future Pascal architecture GPUs will include full support for such “mixed precision” computation, with FP16 (half) computation at higher throughput than FP32 (single) or FP64 (double).

With CUDA 7.5, applications can benefit by storing up to 2x larger models in GPU memory. Applications that are bottlenecked by memory bandwidth may get up to 2x speedup. And applications on Tegra X1 GPUs bottlenecked by FP32 computation may benefit from 2x faster computation on half2 data.

CUDA 7.5 provides 3 main FP16 features:

  1. A new header, cuda_fp16.h, defines the half and half2 datatypes and the __half2float() and __float2half() functions for converting to and from FP32 types (see the short sketch after this list).
  2. A new cublasSgemmEx() routine performs mixed-precision matrix-matrix multiplications using FP16 data (among other formats) as inputs, while still executing all computation in full 32-bit precision. This allows multiplication of 2x larger matrices on the GPU.
  3. For current users of Drive PX with Tegra X1 GPUs (and on future GPUs such as Pascal), cuda_fp16.h also defines intrinsics for 16-bit computation and comparison. cuBLAS also includes a cublasHgemm() (half-precision matrix-matrix multiply) routine for these GPUs.
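
As a minimal sketch (not from the original post), the following kernel stores data as FP16 to halve the memory footprint and bandwidth, converting to FP32 for the arithmetic; the kernel name and the scaling operation are illustrative only.

#include <cuda_fp16.h>

// Scale an array stored in FP16: load as half, compute in FP32, store back as half.
__global__ void scale_half(half *data, float alpha, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float x = __half2float(data[i]);    // FP16 -> FP32
    data[i] = __float2half(alpha * x);  // FP32 -> FP16
  }
}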

NVIDIA GPUs implement the IEEE 754 floating point standard (2008), which defines half-precision numbers as follows (see Figure 1).

  • Sign: 1 bit
  • Exponent width: 5 bits
  • Significand precision: 11 bits (10 explicitly stored)

The range of half-precision numbers is approximately 5.96 \times 10^{-8} \ldots 6.55 \times 10^4. half2 structures store two half values in the space of a single 32-bit word, as the bottom of Figure 1 shows.

Figure 1: 16-bit half-precision data formats. Top: single `half` value. Bottom: `half2` vector representation.

New cuSPARSE Routines Accelerate Natural Language Processing

The cuSPARSE library now supports the cusparse{S,D,C,Z}gemvi() routine, which multiplies a dense matrix by a sparse vector, using the following equation.

\mathbf{y} = \alpha op(\mathbf{A}) \mathbf{x} + \beta \mathbf{y},

where \mathbf{A} is a dense matrix, \mathbf{x} is a sparse input vector, \mathbf{y} is a dense output vector, and op() is either a no-op, transpose, or conjugate transpose. For example:

\left[ \begin{array}{c} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \mathbf{y}_3 \end{array} \right] = \alpha \left[ \begin{array}{ccccc} \mathbf{A}_{11} & \mathbf{A}_{12} & \mathbf{A}_{13} & \mathbf{A}_{14} & \mathbf{A}_{15} \\ \mathbf{A}_{21} & \mathbf{A}_{22} & \mathbf{A}_{23} & \mathbf{A}_{24} & \mathbf{A}_{25} \\ \mathbf{A}_{31} & \mathbf{A}_{32} & \mathbf{A}_{33} & \mathbf{A}_{34} & \mathbf{A}_{35} \end{array} \right] \left[ \begin{array}{c} - \\ 2 \\ - \\ - \\ 1 \end{array} \right] + \beta \left[ \begin{array}{c} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \mathbf{y}_3 \end{array} \right]

This type of computation is useful in machine learning and natural language processing applications. Suppose I’m processing English language documents, so I start with a dictionary, which assigns a unique index to every word in the English language. If the dictionary has N entries, then any document can be represented with a Bag of Words (BoW): an N-dimensional vector in which each entry is the number of occurrences of the corresponding dictionary word in the document.

In natural language processing and machine translation, it’s useful to compute a vector representation of words, where the vectors have O(300) dimensions (rather than a raw BoW representation which may have hundreds of thousands of dimensions, due to the size of the language dictionary). A good example of this approach is the word2vec algorithm, which maps natural language words into a semantically meaningful vector space. In word2vec, similar words map to similar locations in the vector space, which aids reasoning about word relationships, pattern recognition, and model generation.

Mapping a sentence or document represented as a BoW into the lower-dimensional word vector space requires multiplying a dense matrix by a sparse vector, where each row of the matrix is the vector corresponding to a dictionary word, and the sparse vector is the BoW vector for the sentence or document.
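
To make the operation concrete, here is a minimal CPU reference sketch of what gemvi computes (this is not the cuSPARSE API itself; the function name and parameter layout are illustrative). The dense matrix A is stored column-major with leading dimension lda, as cuSPARSE expects, and the sparse vector x is given as nnz (value, index) pairs.

// y = alpha * A * x + beta * y, with A dense (m x n, column-major) and x sparse,
// given by nnz values xVal at column indices xInd.
void gemvi_reference(int m, int n, float alpha, const float *A, int lda,
                     int nnz, const float *xVal, const int *xInd,
                     float beta, float *y)
{
  for (int i = 0; i < m; ++i)
    y[i] *= beta;
  for (int k = 0; k < nnz; ++k) {   // visit only the nonzero entries of x
    int j = xInd[k];                // column selected by this sparse entry
    for (int i = 0; i < m; ++i)
      y[i] += alpha * A[i + j * lda] * xVal[k];
  }
}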

The new cusparse{S,D,C,Z}gemvi() routine in CUDA 7.5 makes it easier for developers of these complex applications to achieve high performance with GPUs. cuSPARSE routines are tuned for top performance on NVIDIA GPUs, so users don’t need to be experts in GPU performance.

To learn more about related techniques in machine translation, check out the recent post Introduction to Neural Machine Translation.

Pinpoint Performance Bottlenecks with Instruction-Level Profiling

One of the biggest challenges in optimizing code is determining where in the application to put optimization effort for the greatest impact. NVIDIA has been improving profiling tools with every release of CUDA, adding more focused introspection and smarter guided analysis. CUDA 7.5 further improves the power of the NVIDIA Visual Profiler (and Nsight Eclipse Edition) by enabling true instruction-level profiling on Maxwell GM200 and later GPUs. This lets you quickly identify the specific lines of source code causing performance bottlenecks in GPU code, making it easier to apply advanced performance optimizations.

Before CUDA 7.5, the NVIDIA Visual Profiler supported kernel-level profiling: for each kernel, the profiler could tell you the amount of time spent, the relative importance as a fraction of total run time, and key statistics and limiters. For example, Figure 2 shows a kernel-level analysis indicating that the kernel in question is possibly limited by instruction latencies.

Figure 2: Before CUDA 7.5, the NVIDIA Visual Profiler supported only kernel-level profiling, showing performance and key statistics and limiters for each kernel invocation.

CUDA 6 added support for more detailed profiling, correlating lines of code with the number of instructions executed by those lines, as Figure 3 shows. But the lines with the highest instruction counts do not necessarily take the longest. In the example, these lines from a reduction are not taking as long as the true hotspot, which has longer stalls due to memory dependencies.

Figure 3: CUDA 6 added support for detailed profiling, showing the correspondence between source lines and assembly code, and the number of instructions executed for each source line.

Per-kernel statistics and instruction counts are very useful information, but getting to the root of performance problems in complex kernels could still be difficult. When profiling, you want to know exactly which lines are taking the most execution time. With CUDA 7.5, the profiler uses program counter sampling to find and show specific “hot spot” lines of code where the kernel is spending most of its time, as Figure 4 shows.

Figure 4: New in CUDA 7.5, instruction-level profiling pinpoints specific lines of code that are hotspots.

Not only does the profiler show hotspot lines, but it shows potential reasons for the hotspot, based on the state of warps executing the lines. In this case, the hotspot is due to synchronization and memory latency, and the assembly view shows that the kernel is stalling on local memory loads (LDL) and __syncthreads(). Knowing this, the kernel developer can optimize the kernel to keep data in registers. Figure 5 shows the results after code tuning, where the kernel time has improved by about 2.5x.

Figure 5: By using instruction-level profiling, the developer was able to optimize the kernel performance, achieving a 2.5X kernel speedup.

Experimental Feature: GPU Lambdas

CUDA 7 introduced support for C++11, the latest version of the C++ language standard. Lambda expressions are one of the most important new features in C++11. Lambda expressions provide concise syntax for defining anonymous functions (and closures) that can be defined in line with their use, can be passed as arguments, and can capture variables.

C++11 lambdas are handy when you have a simple computation that you want to use as an operator in a generic algorithm, like the thrust::count_if() algorithm that I used in a past blog post. The following code from that post uses Thrust to count the frequency of ‘x’, ‘y’, ‘z’, and ‘w’ characters in a text. But before CUDA 7.5, this could only be done with host-side lambdas, meaning this code couldn’t execute on the GPU.

#include <initializer_list>
#include <thrust/count.h>            // thrust::count_if
#include <thrust/execution_policy.h> // thrust::host

void xyzw_frequency_thrust_host(int *count, char *text, int n)
{
  using namespace thrust;

  // Count characters matching any of 'x', 'y', 'z', 'w', running on the CPU (host policy).
  *count = count_if(host, text, text+n, [](char c) {
    for (const auto x : { 'x','y','z','w' })
      if (c == x) return true;
    return false;
  });
}

CUDA 7.5 introduces an experimental feature: GPU lambdas. GPU lambdas are anonymous device function objects that you can define in host code by annotating them with a __device__ specifier. Here is the xyzw_frequency function modified to run on the GPU; a brief usage sketch follows the listing. The code indicates the GPU lambda with the __device__ specifier before the parameter list.

#include <initializer_list>
#include <thrust/count.h>            // thrust::count_if
#include <thrust/execution_policy.h> // thrust::device

void xyzw_frequency_thrust_device(int *count, char *text, int n)
{
  using namespace thrust;

  // The __device__ lambda runs on the GPU; text must point to device-accessible memory.
  *count = count_if(device, text, text+n, [] __device__ (char c) {
    for (const auto x : { 'x','y','z','w' })
      if (c == x) return true;
    return false;
  });
}
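
As a usage sketch (an assumption, not from the original post), the text buffer must be accessible from the GPU, for example Unified Memory allocated with cudaMallocManaged; the sample string is illustrative only.

#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

int main()
{
  const char msg[] = "xylophones and zebras wander by the bay window";
  int n = (int)sizeof(msg) - 1;

  char *text;
  cudaMallocManaged(&text, n);  // visible to both host and device
  memcpy(text, msg, n);

  int count = 0;
  xyzw_frequency_thrust_device(&count, text, n);  // count_if returns once the result is ready

  printf("found %d x/y/z/w characters\n", count);
  cudaFree(text);
  return 0;
}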

Parallel For Programming

GPU lambdas enable a “parallel-for” style of programming that lets you write parallel computations in-line with the code that invokes them—just like you would with a for loop. The following SAXPY example shows how for_each() lets you write parallel code for a GPU in a style very similar to a simple for loop. Using Thrust in this way ensures you get great performance on the GPU, as well as performance portability to CPUs: the same code can be compiled and run for multi-threaded execution on CPUs using Thrust’s OpenMP or TBB backends.

#include <thrust/execution_policy.h>
#include <thrust/for_each.h>
#include <thrust/iterator/counting_iterator.h>

void saxpy(float *x, float *y, float a, int N) {
    using namespace thrust;
    // Iterate over indices 0..N-1 on the device; x and y must point to device-accessible memory.
    auto r = counting_iterator<int>(0);
    for_each(device, r, r+N, [=] __device__ (int i) {
        y[i] = a * x[i] + y[i];
    });
}

GPU lambdas are an experimental feature in CUDA 7.5. To use them, you need to enable the feature by passing the flag --expt-extended-lambda to nvcc on the compiler command line. As an experimental feature, GPU lambda functionality is subject to change in future releases, and there are some limitations to how they can be used. See the CUDA C++ Programming Guide for full details. I’ll write more about GPU lambdas in a future blog post.
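
For example, assuming the device version of xyzw_frequency above lives in a file named xyzw_frequency.cu (the file name is illustrative), it could be compiled with:

nvcc --expt-extended-lambda -o xyzw_frequency xyzw_frequency.cu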

Windows Remote Desktop

With CUDA 7.5, you can now run Windows CUDA applications remotely via Windows Remote Desktop. This means that even without a CUDA-capable GPU in your Windows laptop, you can still run GPU-accelerated applications remotely on a Windows server or desktop PC. CUDA applications can also now be run as services on Windows.

These Windows capabilities are supported on all NVIDIA GPU products.

LOP3

A new LOP3 instruction is added to PTX assembly, supporting a range of 3-operand logic operations, such as A & B & C, A & B & ~C, A & B | C, etc. This functionality, supported on Compute Capability 5.0 and higher GPUs, can save instructions when performing complex logic operations on multiple inputs. See section 8.7.7.6 of the PTX ISA specification included with the CUDA Toolkit version 7.5.
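
As a minimal sketch (an assumption, not an example from the PTX documentation), the instruction can be reached from CUDA C++ through inline PTX; the 8-bit immediate is a lookup table obtained by evaluating the desired function on the constants 0xF0, 0xCC and 0xAA, which gives 0xEA for (a & b) | c. The wrapper name below is hypothetical.

// Compute (a & b) | c in a single LOP3 instruction (compute capability 5.0+).
// immLut = (0xF0 & 0xCC) | 0xAA = 0xEA selects the logic function.
__device__ unsigned int lop3_and_or(unsigned int a, unsigned int b, unsigned int c)
{
  unsigned int d;
  asm("lop3.b32 %0, %1, %2, %3, 0xEA;" : "=r"(d) : "r"(a), "r"(b), "r"(c));
  return d;
}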

More improvements

  • 64-bit API for cuFFT
  • n-dimensional Euclidean norm floating-point math functions
  • Bayer CFA to RGB conversion functions in NPP
  • Faster double-precision square-roots (sqrt)
  • Programming examples for the cuSOLVER library
  • Nsight Eclipse Edition supports the POWER platform

Platform Support

The CUDA 7.5 release notes include a full list of supported platforms; here are some notable changes.

  • Added: Ubuntu 15.04, Windows 10, and (upcoming) OS X 10.11
  • Added: host compiler support for Clang 3.5 and 3.6 on Linux.
  • Removed: Ubuntu 12.04 LTS on (32-bit) x86, cuda-gdb native debugging on Mac OS X
  • Deprecated: legacy (environment variable-based) command-line profiler. Use the more capable nvprof command-line profiler instead.

Download the CUDA 7.5 Release Candidate Today!

CUDA Toolkit 7.5 is now available for download. If you are not already a member of the free NVIDIA developer program, signing up is easy.

To learn more about the features in CUDA 7.5, register for the webinar “CUDA Toolkit 7.5 Features Overview” and put it on your calendar for September 22.

Alibaba’s AliCloud Partners with NVIDIA for Artificial Intelligence

Alibaba Group’s cloud computing business, AliCloud, signed a new partnership with NVIDIA to collaborate on AliCloud HPC, the first GPU-accelerated cloud platform for high performance computing (HPC) in China.

AliCloud will work with NVIDIA to broadly promote its cloud-based GPU offerings to its customers — primarily fast-growing startups – for AI and HPC work.

“Innovative companies in deep learning are one of our most important user communities,” said Zhang Wensong, chief scientist of AliCloud. “Together with NVIDIA, AliCloud will use its strength in public cloud computing and experiences accumulated in HPC to offer emerging companies in deep learning greater support in the future.”

Shanker Trivedi, NVIDIA’s Global VP, and Zhang Wensong, chief scientist of AliCloud, at the Shanghai Summit ceremony.

The two companies will also create a joint research lab, providing AliCloud users with services and support to help them take advantage of GPU-accelerated computing to create deep learning and other HPC applications.

Diagnosing Cancer with Deep Learning and GPUs

Using GPU-accelerated deep learning, researchers at The Chinese University of Hong Kong pushed the boundaries of cancer image analysis in a way that could one day save physicians and patients precious time.

The team used a TITAN X GPU to win the 2015 Gland Segmentation Challenge held at the Medical Image Computing and Computer Assisted Intervention (MICCAI) conference, the world’s leading conference on medical imaging.

Traditionally, pathologists diagnose cancer by looking for abnormalities in tumor tissue and cells under a microscope, but it’s a time-consuming process that is open to error.

An overview of the team’s proposed framework

The research team trained their deep convolutional neural network on a set of images of known abnormalities. They then used the trained network to segment individual glands from tissue, making it easier to distinguish individual cells and to determine their size, shape and location relative to other cells. From these measurements, pathologists can determine the likelihood of malignancy.

“Training with GPUs was 100 times faster than with CPUs,” said Hao Chen, a third-year Ph.D. student and member of the team that developed the solution. “That speed is going to become even more important as we advance our work.”

Accelerating Microsoft Cortana and Skype Translator

Microsoft Cortana

Alexey Kamenev, a software engineer at Microsoft Research, talks about the company’s open-source Computational Network Toolkit (CNTK) for deep learning, which describes neural networks as a series of computational steps via a directed graph. Kamenev also shares how Microsoft is using GPUs, the CUDA Toolkit and GPU-accelerated libraries for the variety of Microsoft products that benefit from deep learning, such as speech recognition for Skype Translator and Cortana.

“Basically, right now the whole world of deep learning is using GPUs and Microsoft is not an exception,” said Alexey.

Watch Alexey’s presentation about CNTK in the NVIDIA GPU Technology Theater at SC15: Watch Now

Deep Learning Helps Robot Learn to Walk the Way Humans Do

University of California, Berkeley researchers are using deep learning and NVIDIA GPUs to create a new generation of robots that adapt to changing environments and new situations without a human reprogramming them.

Their robot “Darwin” learned how to keep its balance on an uneven surface, and GPUs were essential for learning at this level of complexity.

“If we did the training on CPU, it would have required a week. With a GPU, it ended up taking three hours,” said Igor Mordatch, who is now using GPUs hosted in the Amazon Web Services cloud.

Without being taught, the deep learning robot rises from the floor to a standing position.

This type of humanoid robot could one day tackle dangerous tasks such as handling rescue efforts or cleaning up disaster areas.

NVIDIA GPUs Power First Self-Driving Shuttle

The six-passenger WEpod shuttle became the world’s first vehicle without a steering wheel to be given license plates. Without any special lanes, magnets or rails, the shuttle successfully navigates between two towns in the Netherlands.

Created by a team of researchers from Delft University of Technology, WEpod uses NVIDIA GPUs to tackle the massive computing challenge of training sophisticated deep learning models for its dynamic system, which is then able to deal with real-world mixed-traffic situations quickly, reliably and safely.

The day is fast approaching when you will be able to request an autonomous shuttle with a mobile app.



About me

My name is Sayed Ahmadreza Razian, and I hold a master’s degree in Artificial Intelligence.
Click here for my CV/Resume page

Related topics such as image processing, machine vision, virtual reality, machine learning, data mining, and monitoring systems are my research interests, and I intend to pursue a PhD in one of these fields.

Click here to view my profile and resume page.

My Scientific expertise
  • Image processing
  • Machine vision
  • Machine learning
  • Pattern recognition
  • Data mining - Big Data
  • CUDA Programming
  • Game and Virtual reality

Download Nokte for free


Coming Soon....

Greatest hits

Anyone who has never made a mistake has never tried anything new.

Albert Einstein

It’s the possibility of having a dream come true that makes life interesting.

Paulo Coelho

You are what you believe yourself to be.

Paulo Coelho

One day you will wake up and there won’t be any more time to do the things you’ve always wanted. Do it now.

Paulo Coelho

Waiting hurts. Forgetting hurts. But not knowing which decision to take can sometimes be the most painful.

Paulo Coelho

Imagination is more important than knowledge.

Albert Einstein

The fear of death is the most unjustified of all fears, for there’s no risk of accident for someone who’s dead.

Albert Einstein

Gravitation is not responsible for people falling in love.

Albert Einstein

