Deep Learning for Computer Vision with MATLAB and cuDNN

Deep learning is becoming ubiquitous. With recent advancements in deep learning algorithms and GPU technology, we are able to solve problems once considered impossible in fields such as computer vision, natural language processing, and robotics.

Deep learning uses deep neural networks which have been around for a few decades; what’s changed in recent years is the availability of large labeled datasets and powerful GPUs. Neural networks are inherently parallel algorithms and GPUs with thousands of cores can take advantage of this parallelism to dramatically reduce computation time needed for training deep learning networks. In this post, I will discuss how you can use MATLAB to develop an object recognition system using deep convolutional neural networks and GPUs.

Pet detection and recognition system.

Why Deep Learning for Computer Vision?

Machine learning techniques use data (images, signals, text) to train a machine (or model) to perform a task such as image classification, object detection, or language translation. Classical machine learning techniques are still being used to solve challenging image classification problems. However, they don’t work well when applied directly to images, because they ignore the structure and compositional nature of images. Until recently, state-of-the-art techniques made use of feature extraction algorithms that extract interesting parts of an image as compact low-dimensional feature vectors. These were then used along with traditional machine learning algorithms.

Enter deep learning. Deep convolutional neural networks (CNNs), a specific type of deep learning algorithm, address the gaps in traditional machine learning techniques, changing the way we solve these problems. CNNs not only perform classification, but they can also learn to extract features directly from raw images, eliminating the need for manual feature extraction. For computer vision applications you often need more than just image classification; you need state-of-the-art computer vision techniques for object detection, a bit of domain expertise, and the know-how to set up and use GPUs efficiently. Through the rest of this post, I will use an object recognition example to illustrate how easy it is to use MATLAB for deep learning, even if you don’t have extensive knowledge of computer vision or GPU programming.

Example: Object Detection and Recognition

The goal in this example is to detect a pet in a video and correctly label the pet as a cat or a dog. To run this example, you will need MATLAB®, Parallel Computing Toolbox™, Computer Vision System Toolbox™ and Statistics and Machine Learning Toolbox™. If you don’t have these tools, you can request a trial. For this problem I used an NVIDIA Tesla K40 GPU; you can run it on any MATLAB-compatible, CUDA-enabled NVIDIA GPU.

Our approach involves two steps:

  1. Object Detection: “Where is the pet in the video?”
  2. Object Recognition: “Now that I know where it is, is it a cat or a dog?”

Figure 1 shows what the final result looks like.

Using a Pretrained CNN Classifier

The first step is to train a classifier that can classify images of cats and dogs. I could either:

  1. Collect a massive amount of cropped, resized and labeled images of cats and dogs in a reasonable amount of time (good luck!), or
  2. Use a model that has already been trained on a variety of common objects and adapt it for my problem.
Figure 2: Pretrained ImageNet model classifying the image of the dog as 'beagle'.

For this example, I’m going with option (2), which is common in practice. I start with a pretrained CNN classifier that has been trained on the ImageNet dataset.

I will be using MatConvNet, a CNN package for MATLAB that uses the NVIDIA cuDNN library for accelerated training and prediction. [To learn more about cuDNN, see this Parallel Forall post.] Download and install instructions for MatConvNet are available on its home page. Once I’ve installed MatConvNet on my computer, I can use the following MATLAB code to download the pretrained CNN classifier and make predictions with it. Note: I also use the cnnPredict() helper function, which I’ve made available on GitHub.

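%% Download and predict using a pretrained ImageNet model

% Setup MatConvNet: run its vl_setupnn script from your install location
% (the path was elided in the source)

% Download ImageNet model from MatConvNet pretrained networks repository
% (the download URL was elided in the source and is left empty here)
urlwrite('', 'imagenet-vgg-f.mat');
cnnModel.net = load('imagenet-vgg-f.mat'); % assumed layout: network stored in the net field

% Load and display an example image
img = imread('dog_example.png');
imshow(img);

% Predict label using ImageNet trained vgg-f CNN model
label = cnnPredict(cnnModel,img);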

The pretrained CNN classifier works great out of the box at object classification. The CNN model is able to tell me that there is a beagle in the example image (Figure 2). While this is certainly a great starting point, our problem is a little different. I want to be able to (1) put a box around where the pet is (object detection) and then (2) label it accurately as a dog or a cat (classification). Let’s start by building a dog vs cat classifier from the pretrained CNN model.

Training a Dog vs. Cat Classifier

The objective is a simple classification task: given an image, I’d like a classifier that can accurately tell me whether it shows a dog or a cat. I can build one easily with this pretrained classifier and a few dog and cat images.

To get a small collection of labeled images for this project, I went around my office asking colleagues to send me pictures of their pets. I segregated the images and put them into separate ‘cat’ and ‘dog’ folders under a parent called ‘pet_images’. The advantage of using this folder structure is that the imageSet function can automatically manage image locations and labels. I loaded them all into MATLAB using the following code.

%% Load images from folder
% Use imageSet to load images stored in pet_images folder
imset = imageSet('pet_images','recursive');

% Preallocate arrays with fixed size for prediction
imageSize = [224 224 3]; % input size of imagenet-vgg-f (original expression lost in the source)
trainingImages = zeros([imageSize sum([imset(:).Count])],'single');

% Load and resize images for prediction
counter = 0; % offset so the second folder does not overwrite the first
for ii = 1:numel(imset)
  for jj = 1:imset(ii).Count
      trainingImages(:,:,:,jj+counter) = imresize(single(read(imset(ii),jj)),imageSize(1:2));
  end
  counter = counter + imset(ii).Count;
end

% Get the image labels
trainingLabels = getImageLabels(imset);
summary(trainingLabels) % Display class label distribution

Feature Extraction using a CNN

What I’d like to do next is use this new dataset along with the pretrained ImageNet model to extract features. As I mentioned earlier, CNNs can learn to extract generic features from images. These features can be used to train a new classifier to solve a different problem, such as classifying cats and dogs in our case.

CNN algorithms are compute-intensive and can be slow to run. Since they are inherently parallel algorithms, I can use GPUs to speed up the computation. Here is the code that performs the feature extraction using the pretrained model, and a comparison of multithreaded CPU (Intel Core i7-3770 CPU) and GPU (NVIDIA Tesla K40 GPU) implementations.

%% Extract features using pretrained CNN

% Depending on how much memory you have on your GPU you may use a larger
% batch size. I have 400 images, so I choose 200 as my batch size
% (the field name below is an assumption; the original was garbled)
cnnModel.info.opts.batchSize = 200;

% Make prediction on a CPU
[~, cnnFeatures, timeCPU] = cnnPredict(cnnModel,trainingImages,'UseGPU',false);
% Make prediction on a GPU
[~, cnnFeatures, timeGPU] = cnnPredict(cnnModel,trainingImages,'UseGPU',true);

% Compare the performance increase
bar([sum(timeCPU),sum(timeGPU)],0.5)
set(gca,'XTickLabel',{'CPU','GPU'})
title(sprintf('Approximate speedup: %.0fx',sum(timeCPU)/sum(timeGPU)))
ylabel('Time (sec)'), grid on, grid minor

Figure 3: Comparison of execution times for feature extraction using a CPU (left) and NVIDIA Tesla K40 GPU (right).
Figure 4: The CPU and GPU time required to extract features from 1128 images.

As you can see, the performance boost from using a GPU is significant: about 15x for this feature extraction problem.
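
The batch size mentioned in the code comments above controls how many images are pushed through the network at once. As a rough illustration of that idea only, the sketch below walks a 4-D image array in fixed-size batches. The placeholder images and the meanColorFeatures stand-in are assumptions made up for this example; they are not the pretrained CNN or the cnnPredict helper.

```matlab
% Hypothetical sketch: process a 4-D image array in fixed-size batches.
% meanColorFeatures is a stand-in, NOT the pretrained CNN from this post.
images = rand(8, 8, 3, 10, 'single');   % 10 tiny placeholder "images"
batchSize = 4;                          % tune to fit the available memory
numImages = size(images, 4);

% Stand-in feature extractor: one mean value per color channel (3-by-n)
meanColorFeatures = @(batch) squeeze(mean(mean(batch, 1), 2));

features = zeros(3, numImages, 'single');
for first = 1:batchSize:numImages
    last = min(first + batchSize - 1, numImages);   % final batch may be short
    features(:, first:last) = meanColorFeatures(images(:, :, :, first:last));
end
```

A larger batch keeps the processor busier per call but needs more memory, which is why the right value depends on the GPU you have.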

The function cnnPredict is a wrapper around MatConvNet’s vl_simplenn predict function. The highlighted line of code in Figure 5 is the only modification you need to make to run the prediction on a GPU. Functions like gpuArray in the Parallel Computing Toolbox make it easy to prototype your algorithms using a CPU and quickly switch to GPUs with minimal code changes.

Figure 5: The `gpuArray` and `gather` functions allow you to transfer data from the MATLAB workspace to the GPU and back.
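
As a minimal sketch of the round trip that gpuArray and gather provide (this is not the code inside cnnPredict), the example below moves a matrix to the GPU, multiplies it there, and brings the result back. The matrix size is arbitrary, and the CPU fallback is an assumption added so the snippet still runs on a machine without a supported GPU.

```matlab
% Minimal sketch of the CPU-to-GPU round trip; falls back to the CPU when
% no supported GPU (or no Parallel Computing Toolbox) is available.
A = rand(512, 'single');
try
    useGPU = gpuDeviceCount > 0;
catch
    useGPU = false;             % toolbox not installed
end

if useGPU
    G = gpuArray(A);            % copy the data to GPU memory
    B = gather(G * G');         % multiply on the GPU, then copy the result back
else
    B = A * A';                 % identical computation on the CPU
end
```

Only the transfer calls differ between the two branches; the arithmetic itself is written the same way for both devices.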

Train a Classifier Using CNN Features

With the features I extracted in the previous step, I’m now ready to train a “shallow” classifier. To train and compare multiple models interactively, I can use the Classification Learner app in the Statistics and Machine Learning Toolbox. Note: for an introduction to machine learning and classification workflows in MATLAB, check out this Machine Learning Made Easy webinar.

Next, I will directly train an SVM classifier using the extracted features by calling the fitcsvm function with cnnFeatures as the input (predictors) and trainingLabels as the output (response values). I will also cross-validate the classifier to test its validation accuracy. The validation accuracy is a reasonable, nearly unbiased estimate of how the classifier would perform in practice on unseen data.

%% Train a classifier using extracted features

% Here I train a linear support vector machine (SVM) classifier.
svmmdl = fitcsvm(cnnFeatures,trainingLabels);

% Perform crossvalidation and check accuracy
cvmdl = crossval(svmmdl,'KFold',10);
fprintf('kFold CV accuracy: %2.2f\n',1-cvmdl.kfoldLoss)

svmmdl is my classifier that I can now use to classify an image as a cat or a dog.

Object Detection

Most images and video frames have a lot going on in them. In addition to a dog, there may be a tree or a raccoon chasing the dog. Even a great image classifier, like the one I built in the previous step, will only work well if I can locate the object of interest in an image (dog or cat), crop the object and then feed it to the classifier. The step of locating the object is called object detection.

For object detection, I will use a technique called optical flow, which estimates the motion of pixels in a video from frame to frame. Figure 6 shows a single frame of video with the motion vectors overlaid.

Figure 6: A single frame of video with motion vectors overlaid (left) and magnitude of the motion vectors (right).

The next step in the detection process is to separate out pixels that are moving, and then use the Image Region Analyzer app to analyze the connected components in the binary image to filter out the noisy pixels caused by the camera motion. The output of the app is a MATLAB function (I’m going to call it findPet) that can locate where the pet is in the field of view.
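
The app-generated findPet function is not listed in this post, so the following is only a hypothetical sketch of the kind of steps described above: threshold the motion magnitude, drop small connected components, and return bounding boxes. The threshold, the minimum area and the synthetic magnitude image are illustrative assumptions. In a real pipeline the magnitude would come from the Magnitude property of the object returned by estimateFlow(opticFlow, frameGray).

```matlab
% Hypothetical stand-in for the app-generated findPet logic; the threshold
% and minimum area are illustrative assumptions, not the author's values.
flowMag = zeros(60, 80);        % synthetic optical flow magnitude image
flowMag(20:39, 30:59) = 5;      % a large moving region (the "pet")
flowMag(5, 5) = 5;              % an isolated noisy pixel

motionThreshold = 1;            % assumed: faster pixels count as moving
minArea = 50;                   % assumed: smaller blobs are treated as noise

motionMask = flowMag > motionThreshold;           % binary image of moving pixels
motionMask = bwareaopen(motionMask, minArea);     % drop small connected components
stats = regionprops(motionMask, 'BoundingBox');   % one entry per remaining blob
bboxes = vertcat(stats.BoundingBox);              % rows are [x y width height]
if isempty(bboxes)
    bboxes = zeros(0, 4);       % keep a consistent M-by-4 shape when nothing moves
end
```

With this synthetic input the isolated pixel is filtered out and a single box around the large region remains. Real footage with camera motion would likely need tuned thresholds or additional shape filters.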

Tying the Workflow Together

I now have all the pieces I need to build a pet detection and recognition system.

To quickly recap, I can:

  • Detect the location of the pet in new images;
  • Crop the pet from the image and extract features using a pretrained CNN;
  • Classify the features using an SVM classifier.

Pet Detection and Recognition

Tying all these pieces together, the following code shows my complete MATLAB pet detection and recognition system.

%% Tying the workflow together
vr = VideoReader(fullfile('PetVideos','')); % video file name elided in the source
vw = VideoWriter('test.avi','Motion JPEG AVI');
opticFlow = opticalFlowFarneback;
open(vw);
frameNumber = 0;

while hasFrame(vr)
    % Count frames
    frameNumber = frameNumber + 1;

    % Step 1. Read Frame
    videoFrame = readFrame(vr);

    % Step 2. Detect ROI
    vFrame = imresize(videoFrame,0.25); % Downsample frame for detection
    frameGray = rgb2gray(vFrame); % Convert to gray for detection
    bboxes = findPet(frameGray,opticFlow); % Find bounding boxes (assumed to be in full-frame coordinates)
    if ~isempty(bboxes)
        img = zeros([imageSize size(bboxes,1)]);
        for ii = 1:size(bboxes,1)
            img(:,:,:,ii) = imresize(imcrop(videoFrame,bboxes(ii,:)),imageSize(1:2));
        end

        % Step 3. Recognize object
        % (a) Extract features using a CNN
        [~, scores] = cnnPredict(cnnModel,img,'UseGPU',true,'display',false);

        % (b) Predict using the trained SVM Classifier
        label = predict(svmmdl,scores);

        % Step 4. Annotate object
        videoFrame = insertObjectAnnotation(videoFrame,'Rectangle',bboxes,cellstr(label),'FontSize',40);
    end

    % Step 5. Write video to file
    writeVideo(vw,videoFrame);

    fprintf('Frames processed: %d of %d\n',frameNumber,ceil(vr.FrameRate*vr.Duration));
end
close(vw);


Solutions to real-world computer vision problems often require tradeoffs depending on your application: performance, accuracy, and simplicity of the solution. Advances in techniques such as deep learning have significantly raised the bar in terms of the accuracy of tasks like visual recognition, but the performance costs were too significant for mainstream adoption. GPU technology has closed this gap by accelerating training and prediction speeds by orders of magnitude.

MATLAB makes computer vision with deep learning much more accessible. The combination of an easy-to-use application and programming environment, a complete library of standard computer vision and machine learning algorithms, and tightly integrated support for CUDA-enabled GPUs makes MATLAB an ideal platform for designing and prototyping computer vision solutions.

5 Startups Playing Big, and Betting on the Future, with Deep Learning

Real Life Analytics: Accurate, Automatic Ads

To power targeted in-store ads, the U.K.’s Real Life Analytics offers retailers a webcam and a dongle to attach to a digital display. Seems simple. But the deep learning software running inside that dongle does astonishing things.

Approach the display’s webcam, and a deep learning neural network figures out your age and gender. In milliseconds, it flips on an ad targeting your demographic. Meanwhile, the deep learning network — designed with DIGITS deep learning training software using the cuDNN-accelerated Caffe framework — analyzes your real-time engagement. Running on our Tegra chip, of course.


ZZ Photo’s “DeepPet” algorithm is up to five times more accurate than traditional object recognition.

ZZ Photo: Putting Pets on the Pedestal

ZZ Photo, a startup based in Ukraine, can help you sort out the thousands of images you’ve stashed in your PCs. Using CUDA-enabled GPUs to speed up computations in their neural networks, ZZ Photo can detect images on PCs. It then sorts and arranges the photos, tagging them by face, scene or pet.

That’s right. ZZ Photo’s “DeepPet” algorithm can tell the difference between your labradoodle and chiweeni. It’s up to five times more accurate than traditional object recognition algorithms in identifying cats and dogs.

MicroBlink: Math-Solving App Heads to No. 1

MicroBlink’s PhotoMath app reads and solves mathematical problems in real time.

With students recently returning to school, MicroBlink’s PhotoMath app headed to the top of the class as the No. 1 free iPhone download in the U.S. in early September. The app reads and solves mathematical problems in real time. Just take a picture of the problem with your smartphone or tablet.

MicroBlink, founded in Zagreb, Croatia, uses NVIDIA GPUs to train PhotoMath’s deep learning algorithms. The app can now handle fractions, inequalities, quadratic equations and more. It makes math simple by showing users how to solve math problems step by step. And parents rave about how the tool checks their kids’ homework.

HyperVerge: Innovative Image Identification

Forget scrolling past a series of selfies to find a photo of your driver’s license. HyperVerge, a startup out of India, has developed Silver. The mobile image recognition app uses GPUs for data processing and training their application engines.

The app sorts photos on mobile devices. It categorizes photos as faces, screenshots, and memes. It even identifies documents — a category that includes handwritten notes, ID scans and checks. HyperVerge has also developed tools to delete poor quality and duplicate photos.

ViSenze: Search Without Keywords

ViSenze’s image recognition technology powers visual search with uncanny accuracy.

If a picture is worth a thousand words, why are we doing so much typing into search engines? ViSenze, a Singapore-based startup, lets you search e-commerce platforms visually. Drop an image into its deep learning-powered platform and it quickly pulls up scores of similar images, without relying on keywords or manual image tagging.

In fact, its image recognition technology automatically does the tagging by attributes such as shape, color and pattern. So, for example, if you’ve found a dress but want to see similar sleeveless versions from your favorite e-tailers, or if you like a handbag but want to see variations in leather or with a tapered shape, ViSenze zeroes in with amazing accuracy and speed.

Bringing Massive Computing Power to the Masses

These are just a few of the startups using our GPUs to embrace the deep learning revolution. It’s no surprise. GPU acceleration is ideal for the demands of deep learning algorithms. These algorithms power applications in fields ranging from medical imaging analysis to self-driving cars.

Training computers on these algorithms requires they teach themselves. To do that, they process enormous amounts of data. Our DIGITS deep learning software and cuDNN programming library speed things along. For off-the-shelf capability, there’s the DIGITS DevBox. Combining four NVIDIA GeForce GTX TITAN X GPUs, DIGITS software and deep learning tools, it’s the world’s fastest deskside deep learning machine.

With tools like these, a startup can be as equipped to tackle deep learning problems as tech leaders with huge server rooms.

There’s no better place for GPU-using startups to highlight their groundbreaking work than the annual Emerging Companies Summit, where we’ll award $100,000 to the most promising venture. The daylong event, part of our annual GPU Technology Conference, will take place on April 6, 2016.
