Deep Learning for Computer Vision with MATLAB and cuDNN

Deep learning is becoming ubiquitous. With recent advancements in deep learning algorithms and GPU technology, we are able to solve problems once considered impossible in fields such as computer vision, natural language processing, and robotics.

Deep learning uses deep neural networks, which have been around for a few decades; what’s changed in recent years is the availability of large labeled datasets and powerful GPUs. Neural networks are inherently parallel algorithms, and GPUs with thousands of cores can take advantage of this parallelism to dramatically reduce the computation time needed to train deep learning networks. In this post, I will discuss how you can use MATLAB to develop an object recognition system using deep convolutional neural networks and GPUs.

Pet detection and recognition system.

Why Deep Learning for Computer Vision?

Machine learning techniques use data (images, signals, text) to train a machine (or model) to perform a task such as image classification, object detection, or language translation. Classical machine learning techniques are still being used to solve challenging image classification problems. However, they don’t work well when applied directly to images, because they ignore the structure and compositional nature of images. Until recently, state-of-the-art techniques made use of feature extraction algorithms that extract interesting parts of an image as compact low-dimensional feature vectors. These were then used along with traditional machine learning algorithms.

Enter deep learning. Deep convolutional neural networks (CNNs), a specific type of deep learning algorithm, address the gaps in traditional machine learning techniques, changing the way we solve these problems. CNNs not only perform classification, but they can also learn to extract features directly from raw images, eliminating the need for manual feature extraction. For computer vision applications you often need more than just image classification; you need state-of-the-art computer vision techniques for object detection, a bit of domain expertise, and the know-how to set up and use GPUs efficiently. Through the rest of this post, I will use an object recognition example to illustrate how easy it is to use MATLAB for deep learning, even if you don’t have extensive knowledge of computer vision or GPU programming.

Example: Object Detection and Recognition

The goal in this example is to detect a pet in a video and correctly label the pet as a cat or a dog. To run this example, you will need MATLAB®, Parallel Computing Toolbox™, Computer Vision System Toolbox™ and Statistics and Machine Learning Toolbox™. If you don’t have these tools, request a trial at www.mathworks.com/trial. For this problem I used an NVIDIA Tesla K40 GPU; you can run it on any MATLAB-compatible, CUDA-enabled NVIDIA GPU.

Our approach involves two steps:

  1. Object Detection: “Where is the pet in the video?”
  2. Object Recognition: “Now that I know where it is, is it a cat or a dog?”

Figure 1 shows what the final result looks like.

Using a Pretrained CNN Classifier

The first step is to train a classifier that can classify images of cats and dogs. I could either:

  1. Collect a massive amount of cropped, resized and labeled images of cats and dogs in a reasonable amount of time (good luck!), or
  2. Use a model that has already been trained on a variety of common objects and adapt it for my problem.
Figure 2: Pretrained ImageNet model classifying the image of the dog as 'beagle'.

For this example, I’m going to go with option (2), which is common in practice. To do that, I’ll start with a CNN classifier that has been pretrained on the ImageNet dataset.

I will be using MatConvNet, a CNN package for MATLAB that uses the NVIDIA cuDNN library for accelerated training and prediction. [To learn more about cuDNN, see this Parallel Forall post.] Download and install instructions for MatConvNet are available on its home page. Once I’ve installed MatConvNet on my computer, I can use the following MATLAB code to download the pretrained CNN classifier and make predictions with it. Note: I also use the cnnPredict() helper function, which I’ve made available on GitHub.

%% Download and predict using a pretrained ImageNet model

% Setup MatConvNet
run(fullfile('matconvnet-1.0-beta15','matlab','vl_setupnn.m'));

% Download ImageNet model from MatConvNet pretrained networks repository
urlwrite('http://www.vlfeat.org/matconvnet/models/imagenet-vgg-f.mat', 'imagenet-vgg-f.mat');
cnnModel.net = load('imagenet-vgg-f.mat');

% Load and display an example image
imshow('dog_example.png');
img = imread('dog_example.png');

% Predict label using ImageNet trained vgg-f CNN model
label = cnnPredict(cnnModel,img);
title(label,'FontSize',20)

The pretrained CNN classifier works great out of the box at object classification. The CNN model is able to tell me that there is a beagle in the example image (Figure 2). While this is certainly a great starting point, our problem is a little different. I want to be able to (1) put a box around where the pet is (object detection) and then (2) label it accurately as a dog or a cat (classification). Let’s start by building a dog vs cat classifier from the pretrained CNN model.
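The internals of the cnnPredict helper aren’t shown in this post. Conceptually, turning a network’s final-layer class scores into a label such as ‘beagle’ comes down to a softmax followed by picking the highest-scoring class. The minimal Python sketch below assumes the raw scores are already available. The class names and score values are made up for illustration.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_labels(scores, class_names, k=1):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(scores)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical scores for a tiny 4-class vocabulary (ImageNet has 1,000 classes)
names = ["tabby", "beagle", "golden retriever", "toaster"]
scores = [1.2, 4.5, 3.1, -0.7]
print(top_k_labels(scores, names, k=2))
```

Returning the top two or three labels, rather than only the best one, is a quick way to see how confident the pretrained model is on a given image.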

Training a Dog vs. Cat Classifier

The objective is simple: given an image, I’d like to train a classifier that can accurately tell me whether it shows a dog or a cat. I can do that easily with this pretrained classifier and a few dog and cat images.

To get a small collection of labeled images for this project, I went around my office asking colleagues to send me pictures of their pets. I sorted the images into separate ‘cat’ and ‘dog’ folders under a parent folder called ‘pet_images’. The advantage of this folder structure is that the imageSet function can automatically manage image locations and labels. I loaded them all into MATLAB using the following code.

%% Load images from folder
% Use imageSet to load images stored in pet_images folder
imset = imageSet('pet_images','recursive');

% Preallocate arrays with fixed size for prediction
imageSize = cnnModel.net.normalization.imageSize;
trainingImages = zeros([imageSize sum([imset(:).Count])],'single');

% Load and resize images for prediction. The running index idx keeps
% images from later folders from overwriting those of earlier folders.
idx = 0;
for ii = 1:numel(imset)
  for jj = 1:imset(ii).Count
      idx = idx + 1;
      trainingImages(:,:,:,idx) = imresize(single(read(imset(ii),jj)),imageSize(1:2));
  end
end

% Get the image labels (getImageLabels is a helper function, not a MATLAB built-in)
trainingLabels = getImageLabels(imset);
summary(trainingLabels) % Display class label distribution
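With the 'recursive' option, imageSet takes each image’s label from the name of the subfolder it sits in. The minimal Python sketch below mimics that folder-name convention with pathlib. The extension list, the one-level-deep scan, and the function names are my own assumptions, not imageSet’s actual behavior. Because it builds one flat list of (path, label) pairs, each image gets a single global position regardless of which folder it came from.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of image extensions

def load_labeled_paths(root):
    """Return (path, label) pairs, labeling each image by its subfolder name,
    e.g. pet_images/cat/001.jpg -> 'cat'. Only one level of subfolders is scanned."""
    items = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for path in sorted(class_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
                items.append((path, class_dir.name))
    return items

def class_counts(items):
    """Count images per label, similar in spirit to summary(trainingLabels)."""
    counts = {}
    for _, label in items:
        counts[label] = counts.get(label, 0) + 1
    return counts

# Demo on a throwaway folder tree: root/cat has 2 images, root/dog has 1 (plus a non-image)
with tempfile.TemporaryDirectory() as root:
    for rel in ["cat/a.jpg", "cat/b.png", "dog/c.JPG", "dog/notes.txt"]:
        path = Path(root, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    items = load_labeled_paths(root)
    labels = [label for _, label in items]
    counts = class_counts(items)
```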

Feature Extraction using a CNN

What I’d like to do next is use this new dataset along with the pretrained ImageNet model to extract features. As I mentioned earlier, CNNs can learn to extract generic features from images. These features can be used to train a new classifier to solve a different problem, like classifying cats and dogs in our problem.
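Feature extraction with a pretrained network usually means running each image through the network and keeping the activations of a layer before the final classification layer, rather than the class scores themselves. This post doesn’t say which layer cnnPredict returns. The toy Python sketch below assumes the layer right before the classifier head, and its weights are invented purely for illustration.

```python
def dense(weights, bias):
    """Build a fully connected layer computing y = W x + b."""
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return layer

def relu(x):
    """Elementwise max(0, x)."""
    return [max(0.0, v) for v in x]

def forward_with_activations(layers, x):
    """Run x through every layer and record each layer's output."""
    activations = []
    for layer in layers:
        x = layer(x)
        activations.append(x)
    return activations

def extract_features(layers, x):
    """Return the activations that feed the final (classification) layer."""
    return forward_with_activations(layers, x)[-2]

# Hypothetical tiny "pretrained" network with invented weights:
# two hidden layers followed by a 2-class classification head.
layers = [
    dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]), relu,
    dense([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -0.5]), relu,
    dense([[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]], [0.0, 0.0]),  # classifier head
]

features = extract_features(layers, [2.0, 1.0])  # 3-dimensional feature vector
```

The resulting vectors, one per image, are what a new classifier is then trained on. The original head is simply ignored.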

CNN algorithms are compute-intensive and can be slow to run. Since they are inherently parallel algorithms, I can use GPUs to speed up the computation. Here is the code that performs the feature extraction using the pretrained model, and a comparison of multithreaded CPU (Intel Core i7-3770 CPU) and GPU (NVIDIA Tesla K40 GPU) implementations.

%% Extract features using pretrained CNN

% Depending on how much memory you have on your GPU you may use a larger
% batch size. I have 400 images, so I choose 200 as my batch size
cnnModel.info.opts.batchSize = 200;

% Make prediction on a CPU
[~, cnnFeatures, timeCPU] = cnnPredict(cnnModel,trainingImages,'UseGPU',false);
% Make prediction on a GPU
[~, cnnFeatures, timeGPU] = cnnPredict(cnnModel,trainingImages,'UseGPU',true);

% Compare the performance increase
bar([sum(timeCPU),sum(timeGPU)],0.5)
title(sprintf('Approximate speedup: %2.0fx',sum(timeCPU)/sum(timeGPU)))
set(gca,'XTickLabel',{'CPU','GPU'},'FontSize',18)
ylabel('Time (sec)'), grid on, grid minor

Figure 3: Comparison of execution times for feature extraction using a CPU (left) and NVIDIA Tesla K40 GPU (right).

Figure 4: The CPU and GPU time required to extract features from 1128 images.

As you can see, the performance boost from using a GPU is significant: about 15x for this feature extraction problem.
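The speedup in the code above is the ratio of total CPU time to total GPU time, summed over batches. The Python sketch below isolates that bookkeeping. It splits N items into fixed-size batches, times a function on each batch, and divides the totals. It assumes timeCPU and timeGPU hold per-batch times, which is how the sum() calls in the MATLAB code read. The work function is only a placeholder.

```python
import time

def batches(n_items, batch_size):
    """Yield (start, stop) index pairs that cover n_items in fixed-size chunks."""
    for start in range(0, n_items, batch_size):
        yield start, min(start + batch_size, n_items)

def run_batched(fn, items, batch_size):
    """Apply fn to each batch and record the wall-clock time per batch."""
    results, times = [], []
    for start, stop in batches(len(items), batch_size):
        t0 = time.perf_counter()
        results.extend(fn(items[start:stop]))
        times.append(time.perf_counter() - t0)
    return results, times

def speedup(baseline_times, accelerated_times):
    """Total baseline time divided by total accelerated time."""
    return sum(baseline_times) / sum(accelerated_times)

# Placeholder "feature extractor" so the sketch runs on its own
results, times = run_batched(lambda batch: [2 * x for x in batch], list(range(5)), 2)
```

With 400 images and a batch size of 200, batches(400, 200) yields two batches. A larger batch size means fewer, bigger transfers to the GPU, bounded by GPU memory.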

The function cnnPredict is a wrapper around MatConvNet’s vl_simplenn function, which runs the network forward to make predictions. The highlighted line of code in Figure 5 is the only modification you need to make to run the prediction on a GPU. Functions like gpuArray in the Parallel Computing Toolbox make it easy to prototype your algorithms using a CPU and quickly switch to GPUs with minimal code changes.

Figure 5: The `gpuArray` and `gather` functions allow you to transfer data from the MATLAB workspace to the GPU and back.

Train a Classifier Using CNN Features

With the features I extracted in the previous step, I’m now ready to train a “shallow” classifier. To train and compare multiple models interactively, I can use the Classification Learner app in the Statistics and Machine Learning Toolbox. Note: for an introduction to machine learning and classification workflows in MATLAB, check out this Machine Learning Made Easy webinar.

Next, I will directly train an SVM classifier on the extracted features by calling the fitcsvm function with cnnFeatures as the predictors (input) and trainingLabels as the response values (output). I will also cross-validate the classifier to estimate its accuracy. The cross-validation accuracy gives an estimate of how well the classifier would perform in practice on unseen data.

%% Train a classifier using extracted features

% Here I train a linear support vector machine (SVM) classifier.
svmmdl = fitcsvm(cnnFeatures,trainingLabels);

% Perform crossvalidation and check accuracy
cvmdl = crossval(svmmdl,'KFold',10);
fprintf('k-fold CV accuracy: %2.2f\n',1-kfoldLoss(cvmdl))

svmmdl is my classifier that I can now use to classify an image as a cat or a dog.
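In the code above, fitcsvm trains a linear SVM. crossval with 'KFold',10 then repeats training on nine tenths of the data while testing on the held-out tenth. To show only the k-fold mechanics without extra dependencies, the Python sketch below swaps in a much simpler nearest-centroid classifier in place of the SVM. The pooled held-out accuracy it returns plays the same role as 1 - kfoldLoss, though it is not MATLAB’s exact computation. The toy feature vectors are invented.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Nearest-centroid 'training': the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_centroid(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def kfold_accuracy(X, y, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, pool accuracy over all folds."""
    correct = 0
    for test_idx in kfold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict_centroid(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)

# Toy, well-separated "CNN features" for two classes
X = [[0.1 * i, 0.0] for i in range(10)] + [[5.0 + 0.1 * i, 5.0] for i in range(10)]
y = ["cat"] * 10 + ["dog"] * 10
print(f"k-fold CV accuracy: {kfold_accuracy(X, y, k=10):.2f}")
```

Each observation is scored exactly once, by a model that never saw it during training. That is what makes the cross-validated figure more trustworthy than training accuracy.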

Object Detection

Most images and video frames have a lot going on in them. In addition to a dog, there may be a tree or a raccoon chasing the dog. Even a great image classifier, like the one I built in the previous step, will only work well if I can locate the object of interest (dog or cat) in an image, crop it, and then feed it to the classifier. The step of locating the object is called object detection.

For object detection, I will use a technique called optical flow, which estimates the motion of pixels in a video from frame to frame. Figure 6 shows a single frame of video with the motion vectors overlaid.

Figure 6: A single frame of video with motion vectors overlaid (left) and magnitude of the motion vectors (right).

The next step in the detection process is to separate out pixels that are moving, and then use the Image Region Analyzer app to analyze the connected components in the binary image to filter out the noisy pixels caused by the camera motion. The output of the app is a MATLAB function (I’m going to call it findPet) that can locate where the pet is in the field of view.
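The findPet function is generated by the Image Region Analyzer app, and its code isn’t shown here. The Python sketch below illustrates the general idea under my own assumptions. It thresholds the optical flow magnitude to get a binary motion mask, groups moving pixels into 4-connected regions, and drops regions smaller than a minimum area as noise. The flow components vx and vy are assumed to be given; computing them (for example with the Farneback method) is not shown. Boxes use 0-based [x, y, w, h] coordinates rather than MATLAB’s 1-based ones.

```python
import math
from collections import deque

def motion_mask(vx, vy, threshold):
    """Binary mask of pixels whose flow magnitude exceeds threshold."""
    return [[math.hypot(u, v) > threshold for u, v in zip(row_x, row_y)]
            for row_x, row_y in zip(vx, vy)]

def bounding_boxes(mask, min_area):
    """Find 4-connected foreground regions and return [x, y, w, h] boxes
    for regions with at least min_area pixels (smaller blobs count as noise)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])  # breadth-first flood fill of one region
            seen[r][c] = True
            rows, cols = [], []
            while queue:
                y, x = queue.popleft()
                rows.append(y)
                cols.append(x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(rows) >= min_area:
                x0, y0 = min(cols), min(rows)
                boxes.append([x0, y0, max(cols) - x0 + 1, max(rows) - y0 + 1])
    return boxes

# Toy 6x6 flow field: a moving 2x3 patch plus one isolated noisy pixel
vx = [[0.0] * 6 for _ in range(6)]
vy = [[0.0] * 6 for _ in range(6)]
for r in (1, 2):
    for c in (2, 3, 4):
        vx[r][c] = 3.0
vx[5][0] = 3.0
mask = motion_mask(vx, vy, threshold=1.0)
print(bounding_boxes(mask, min_area=2))
```

The area filter is what removes the isolated pixel. In a real video, the threshold and minimum area would need tuning to the camera motion.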

Tying the Workflow Together

I now have all the pieces I need to build a pet detection and recognition system.

To quickly recap, I can:

  • Detect the location of the pet in new images;
  • Crop the pet from the image and extract features using a pretrained CNN;
  • Classify the features using an SVM classifier.

Pet Detection and Recognition

Tying all these pieces together, the following code shows my complete MATLAB pet detection and recognition system.

%% Tying the workflow together
vr = VideoReader(fullfile('PetVideos','videoExample.mov'));
vw = VideoWriter('test.avi','Motion JPEG AVI');
opticFlow = opticalFlowFarneback;
open(vw);
frameNumber = 0; % Initialize the frame counter before the loop

while hasFrame(vr)
    % Count frames
    frameNumber = frameNumber + 1;

    % Step 1. Read frame
    videoFrame = readFrame(vr);

    % Step 2. Detect ROI
    vFrame = imresize(videoFrame,0.25);     % Downscale frame for faster detection
    frameGray = rgb2gray(vFrame);           % Convert to gray for detection
    bboxes = findPet(frameGray,opticFlow);  % Find bounding boxes
    % Note: bboxes are used to crop and annotate the full-resolution
    % videoFrame below, so they must be in full-frame coordinates.
    if ~isempty(bboxes)
        img = zeros([imageSize size(bboxes,1)],'single');
        for ii = 1:size(bboxes,1)
            img(:,:,:,ii) = imresize(single(imcrop(videoFrame,bboxes(ii,:))),imageSize(1:2));
        end

        % Step 3. Recognize object
        % (a) Extract features using a CNN
        [~, features] = cnnPredict(cnnModel,img,'UseGPU',true,'display',false);

        % (b) Predict using the trained SVM classifier
        label = predict(svmmdl,features);

        % Step 4. Annotate object
        videoFrame = insertObjectAnnotation(videoFrame,'Rectangle',bboxes, ...
            cellstr(label),'FontSize',40);
    end

    % Step 5. Write video to file
    writeVideo(vw,videoFrame);

    fprintf('Frames processed: %d of %d\n',frameNumber,ceil(vr.FrameRate*vr.Duration));
end
close(vw);
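The loop above detects pets on a frame downscaled by 0.25 but crops and annotates the full-resolution videoFrame. The post doesn’t say whether findPet already returns boxes in full-resolution coordinates. If it returns them in the downscaled frame, they must first be scaled by 1/0.25 = 4. Below is a minimal Python sketch of that mapping with clipping to the frame edges. scale_box is a hypothetical helper, and it works in 0-based pixel coordinates for simplicity.

```python
def scale_box(box, factor, frame_w, frame_h):
    """Scale an [x, y, w, h] box by factor, then clip it to the frame bounds."""
    x, y, w, h = (round(v * factor) for v in box)
    x0, y0 = max(0, x), max(0, y)                      # clip top-left corner
    x1, y1 = min(frame_w, x + w), min(frame_h, y + h)  # clip bottom-right corner
    return [x0, y0, max(0, x1 - x0), max(0, y1 - y0)]

detect_scale = 0.25             # detection runs on a quarter-size frame
box_small = [10, 20, 30, 40]    # hypothetical box found in the downscaled frame
box_full = scale_box(box_small, 1 / detect_scale, 1920, 1080)
```

Clipping matters because a box that touches the edge of the small frame can overshoot the large frame by a few pixels after rounding.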

Conclusion

Solutions to real-world computer vision problems often require tradeoffs depending on your application: performance, accuracy, and simplicity of the solution. Advances in techniques such as deep learning have significantly raised the bar in terms of the accuracy of tasks like visual recognition, but the performance costs were too significant for mainstream adoption. GPU technology has closed this gap by accelerating training and prediction speeds by orders of magnitude.

MATLAB makes computer vision with deep learning much more accessible. The combination of an easy-to-use application and programming environment, a complete library of standard computer vision and machine learning algorithms, and tightly integrated support for CUDA-enabled GPUs makes MATLAB an ideal platform for designing and prototyping computer vision solutions.

AI invasion will allow workers to empathise

Jobs for the bots: robots will take on mundane work, enabling humans to focus on interpersonal tasks.

There’s a clue to the future of work in the relief you feel when your phone call to a big corporation is answered by, of all things, a human.

It makes sense. People are replete with empathy and compassion, like to solve problems and enjoy communicating through stories. And these profoundly human traits are the areas where artificial intelligence (AI) trails humans. Because they are our strengths, they point to the future of the office and to our workplace relationships with robots and AI.

In the future, people will spend more time dealing with other people rather than investing their energy in spreadsheets, machinery and computer screens. Rote decision making, repetitive tasks and data management will be owned by our silicon-chip workmates.

You can already glimpse this labour allocation in action – there are accounting apps that extract the information from photographs of receipts and automatically compile end-of-month reports. Meanwhile, the accelerating capability of AI to understand spoken human language will cause immense disruption. “Will we ultimately be able to replace most telephone operators? Yes,” says Paul Murphy, chief executive of voice technology company Clarify.io. “In fact I’d say speech recognition and understanding has the potential to eliminate any job where the role of the human is that of intermediary.”

Meanwhile, we will be employed to tell stories, empathise, see the big picture, solve complex problems and adapt fast to changing situations.

Rather than displacing humans, AI will augment human strengths. This will lead to the invention of new roles, which fall into three categories.

Thinking differently

AI and robots excel at following pre-set rules. People will thrive when they learn to harness machines for data insights, which they can use for problem-solving and innovation. An architect, for example, will be able to work much faster than today because of the range of technologies available, such as augmented reality visualisation and virtual reality headsets. But providing a solution that fits within the constraints of space, planning restrictions, budget and aesthetic style would be nigh-on impossible to automate.

Thinking bigger

Computers can’t see the context, connections and patterns that humans can, despite crunching vast amounts of data at speed. For example, an automated ad-buying program might be brilliant at buying online advertising space for the right audience at the right price, but it might fail to realise that the day after an air accident would be the wrong day to advertise certain products or certain taglines. The future will involve people who oversee machine decision-making.

Social interaction

The analytical powers of robots enable them to suggest decisions in healthcare, financial investment and other areas based on huge quantities of data. IBM’s Watson computer, for example, can monitor a vast array of data inputs to identify possible medical problems and propose courses of treatment. But the communication of advice and the contextualised understanding of the best course of action for a specific patient is best handled by humans. As with medicine, so with finance: the role of the specialist human will be to mediate between the wonders of automation and the needs and desires of the patient or customer.

Artificial intelligence to amplify digital transformation: Vishal Sikka

The digital transformation can best be achieved by adopting automation and artificial intelligence (AI), and the growing symbiosis between Infosys and Oracle is going to help achieve this goal faster than ever, said Infosys CEO Vishal Sikka.

Addressing a gathering of top innovators at Oracle’s OpenWorld 2015 conference on October 27, Sikka emphasised how AI can be a great amplifier, simplifying and enabling existing landscapes as well as building intelligent systems that help us solve our most complex emerging problems.

“The world is looking at providing services in a better way. I observe three major shifts – focus on experience among consumers, emergence of AI and the ultimate cloud phenomenon,” he added.

Sikka also announced that Infosys Finacle’s core banking solution, running on Oracle’s new and secure SuperCluster M7 system (built on the SPARC M7 microprocessor), has set a new record for the number of banking transactions processed.

“The solution supported more than two billion bank accounts with near linear scalability. The results showcase Finacle’s capabilities to manage extraordinarily large transaction volumes to help banks cater to their growing business demands at reduced costs,” he said.

The tests were conducted across a mix of delivery channel transactions that could originate from branches, ATMs, online and mobile channels.

According to Ganesh Ramamurthy, senior vice president (product development) at Oracle, the SPARC M7 microprocessor and the SuperCluster M7 and SPARC T7 and M7 systems offer breakthrough technology for memory intrusion protection and encryption.

“Infosys’ latest Finacle results on SuperCluster M7 demonstrate the superior performance, efficiency and security capabilities of SPARC M7 with Oracle Database 12c and WebLogic Server 12c for critical banking functions,” he explained.

According to Sikka, Infosys’s future strategy will not be complete without help from Oracle and its diverse portfolio.

“We together are creating a sort of symbiosis. Infosys is emerging as a great change agent and we are collaborating with Oracle in innovations in Java,” he said.

He also spoke about AiKiDo, a new offering that comprises three enhanced service offerings in knowledge-based IT (KBIT), platforms and design thinking.

Infosys has deployed a number of systems that replicate human decision-making in areas such as financial service regulation and ticketing of IT issues, enabling productivity improvements of up to 40 percent and saving customers millions of dollars annually.

In addition to this, Infosys is working with global clients to use artificial intelligence to address business challenges.

Infosys is utilising artificial intelligence techniques to solve complex engineering problems in design, testing, and certification of complex engineering products.

“I am optimistic that artificial intelligence techniques will help us solve next-generation problems, and that humans will play the most important part in this process,” Dr Sikka added.

Infosys has delivered nearly 30 projects for clients using artificial intelligence. Many of these first projects have been in manufacturing and financial services.

Infosys is currently developing solutions based on artificial intelligence to solve complex problems in the engineering space.

It’s happening: ‘Pepper’ robot gains emotional intelligence

Last week we weighed in on the rise of robotica, aka sexbots, noting that improvements in emotion and speech recognition would likely spur development in this field. Now a new offering from Softbank promises to be just such a game changer, equipping robots with the technology necessary to interact with humans in social settings. The robot is called Pepper, and it is being launched at an exorbitant cost by its makers, Softbank and Aldebaran.

Pepper is being billed as the first “emotionally intelligent” robot. While it can’t wash your floors or take out the trash, it may just defuse your next domestic row with a witty remark or well-timed turn of phrase. It accomplishes such feats through novel emotion recognition techniques. Emotion recognition may seem like a strange, and perhaps unnecessary, skill for a robot. However, it will be a crucial one if machines are ever to make the leap from factory worker to domestic caregiver.

Even in humans, emotion recognition can be devilishly difficult to achieve. Those afflicted with autism represent a portion of humanity that has been referred to as “emotion-blind” due to the difficulty they have in reading expressions. In many ways, robots have hitherto occupied similar territory. While Softbank hasn’t revealed the exact proprietary algorithms used to achieve emotion recognition, the smart money is on some form of deep neural network.

To date, most attempts at emotion recognition have employed a branch of artificial intelligence called machine learning, in which training data, most often labeled, is fed into an algorithm that uses statistical techniques to “recognize” characteristics that set the examples apart. It’s likely that Pepper uses a variation on this, employing algorithms trained on thousands of labeled photographs or videos to learn what combination of pixels represents a smiling face versus a startled or angry one.

Pepper is also connected to the cloud, feeding data from its sensors to server clusters, where the lion’s share of processing will take place. This should allow its emotion recognition algorithms to improve over time, as repeated use provides fresh training examples. A similar method enabled Google’s speech recognition system to overtake so many others in the field. Every time someone uses the system and corrects a misapprehended word, they provide a new training example for the AI to improve its performance. In the case of a massive search system like Google’s, training examples add up very quickly.

This may explain why Softbank is willing to go ahead with the launch of Pepper despite the financials indicating it will be a loss-making venture. If, rather than optimizing profit, Softbank is using Pepper as a means of perfecting emotion recognition, then this may be part of a larger play to gain superior intellectual property. If that’s the case, then it probably won’t be long before we see other tech giants wading into the arena, offering new and competitive variations on Pepper.

While it may seem strange to think of our emotions as being a lucrative commodity, commanding millions of tech dollars and vied for by sleek-looking robots, such a reality could well be in store.

Microsoft Bing Predicts and the future of gambling

Like an 800-pound gorilla flailing wildly in a Victorian tea house, artificial intelligence has been disrupting one industry after another of late. Now the latest to feel the burn are the gambling consortiums in Las Vegas. Microsoft’s AI engine, Bing Predicts, made headlines recently by beating the Las Vegas odds in predicting winners for week one of the NFL season. Its previous successes are even more breathtaking: it correctly predicted the outcomes of all 15 games in the 2014 Brazil World Cup knockout round and almost all the results of the 2015 Academy Awards, including the winners of best picture, best director, best actor, and best actress. Which is all to say Microsoft’s AI is turning out to be an incredibly good gambler, and the ramifications will go well beyond the world of sports betting.

Let’s take a look at how Bing Predicts was able to outwit the best sporting minds in Las Vegas and, in the process, explore how AI is poised to upend the world of professional gambling. The basic principle driving Microsoft’s success at gambling rests on the “wisdom of the crowd.” When predicting NFL winners, the algorithm takes into account such diverse variables as a team’s previous margins of victory, player statistics (rushing and passing yards, for example), stadium surfaces and weather conditions. But the secret sauce that seems to give it an edge over other experts is its ability to quantify aggregate sentiment on the social web.

Walter Sun and the Bing Predicts team at Microsoft

By tapping into social media and digesting the opinions of thousands, if not millions, of Twitter and Facebook users, the AI can pick up intangibles that defy even the most hardcore of human statisticians. For instance, the model might detect a rumor among Twitter users that the Patriots’ starting quarterback just had a fight with his wife in the wee hours before Sunday’s game and hence is less likely to be at the top of his form. While such rumors may prove to be unfounded, they have a core of truth often enough to give the model a statistical advantage. In precise terms, Walter Sun, who heads up the Bing Predicts team, found that analyzing this so-called “wisdom of the crowd” increases the accuracy of their predictions by 5%.

While 5% may seem like a small amount, when it comes to beating the Las Vegas odds, consistently outperforming the experts by 5% adds up to a fortune in gambling earnings and a troubling turn of events for Vegas bookies. This raises a real question: can professional sports gambling survive in a world where a Silicon Valley corporation holds the highest card in the deck? But if Vegas bookies think they have a lot to worry about, they are just one among many. A whole slew of industries are essentially gambling houses, and any algorithm that could beat their models would pose a major threat to their very existence.
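To make the 5% figure concrete, here is a small worked calculation. Its assumptions are mine, not Microsoft’s: a standard point-spread bet priced at -110 (risk $110 to win $100), and a reading of the 5% as moving a bettor’s win rate from 50% to 55%. At -110 the break-even win rate is 110/210, about 52.4%.

```python
def expected_profit_per_bet(win_prob, stake=110.0, payout=100.0):
    """Average profit of one bet that risks `stake` to win `payout`."""
    return win_prob * payout - (1 - win_prob) * stake

def break_even_win_rate(stake=110.0, payout=100.0):
    """Win probability at which the expected profit is exactly zero."""
    return stake / (stake + payout)

for p in (0.50, break_even_win_rate(), 0.55):
    print(f"win rate {p:.3f}: expected profit per bet ${expected_profit_per_bet(p):+.2f}")
```

Under these assumptions, a 50% bettor loses $5.00 per bet on average, while a 55% bettor earns about $5.50 per bet. That edge compounds over thousands of bets.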

Notable among these are the fields of insurance and commodities trading. If Microsoft or another one of the Silicon Valley behemoths that are developing cutting edge AI can leverage their advantage in the prediction business to outgun the industry leaders in some of these fields, they wouldn’t have to wait long to achieve supremacy in the market. Brace yourself: We may be headed towards a world dominated by a handful of tech corporations vying with each other to develop the best AI prediction algorithm.

In-depth introduction to machine learning in 15 hours of expert videos

In January 2014, Stanford University professors Trevor Hastie and Rob Tibshirani (authors of the legendary Elements of Statistical Learning textbook) taught an online course based on their newest textbook, An Introduction to Statistical Learning with Applications in R (ISLR). I found it to be an excellent course in statistical learning (also known as “machine learning”), largely due to the high quality of both the textbook and the video lectures. And as an R user, it was extremely helpful that they included R code to demonstrate most of the techniques described in the book.

If you are new to machine learning (and even if you are not an R user), I highly recommend reading ISLR from cover to cover to gain both a theoretical and practical understanding of many important methods for regression and classification. It is available as a free PDF download from the authors’ website.

If you decide to attempt the exercises at the end of each chapter, there is a GitHub repository of solutions provided by students you can use to check your work.

As a supplement to the textbook, you may also want to watch the excellent course lecture videos (linked below), in which Dr. Hastie and Dr. Tibshirani discuss much of the material. In case you want to browse the lecture content, I’ve also linked to the PDF slides used in the videos.

Chapter 1: Introduction (slides, playlist)

Chapter 2: Statistical Learning (slides, playlist)

Chapter 3: Linear Regression (slides, playlist)

Chapter 4: Classification (slides, playlist)

Chapter 5: Resampling Methods (slides, playlist)

Chapter 6: Linear Model Selection and Regularization (slides, playlist)

Chapter 7: Moving Beyond Linearity (slides, playlist)

Chapter 8: Tree-Based Methods (slides, playlist)

Chapter 9: Support Vector Machines (slides, playlist)

Chapter 10: Unsupervised Learning (slides, playlist)

To purchase this collection, please send your request by email.

Playing Games Might Help AI Advance

A new company wants to build artificial intelligence through game play.

The “artificial intelligence” found in most computer games isn’t very intelligent at all. Characters in the games tend to be controlled by algorithms that produce patterns of behaviors designed to seem natural and realistic, but the characters are actually rigid, with no capacity to learn or adapt.

One company hopes to come up with something a lot smarter by providing a platform that lets software learn how to behave within a game, whether in response to basic stimuli or to more complex situations. The hope is that this kind of learning will eventually allow complex behavior to emerge in game characters—and make for better AI in a range of applications.

Keen Software, based in the Czech Republic and the U.K., makes several “sandbox” games in which players can construct complex virtual structures and machines using realistic materials and physics. This July, the company spun out a business called GoodAI that aims to develop sophisticated AI using machine learning. Marek Rosa, Keen’s CEO, invested $10 million of his own money in the new company.

GoodAI has released open-source software called Brain Simulator that can be used to train a series of artificial neural networks in how to respond to stimuli from a game environment. Through trial and error, these networks can learn how to play a simple game. And several networks can be chained together to create more complex behavior, making it possible for software to learn how to achieve an objective that may require numerous steps.

The company’s researchers have shown that Brain Simulator can be used to train software to play some simple two-dimensional games. These include Breakout, in which a player bounces a ball off a wall of bricks (which disappear once hit), and a maze game that requires completing a series of different tasks.

The virtual character in the maze game “will start to do some random actions, and will be observing how he is changing the environment, or how it’s changing him,” Rosa says. “While he’s changing the environment, he’s learning all these associations and these patterns.”

Learning associations and patterns happens to be a key goal for AI in general, which is why Rosa hopes to eventually develop forms of artificial intelligence with broad utility beyond games. That’s reminiscent of the approach taken by an AI startup called DeepMind that Google bought last year (see “Google’s AI Masters Space Invaders”). DeepMind is using customized machine-learning approaches to teach software to play various simple games.

AI researchers have long used game play as a way to test artificial-intelligence software, says Roman Yampolskiy, an assistant professor at the University of Louisville. “From checkers to chess to poker and go, some of the greatest accomplishments in AI research have been demonstrated around the game board,” he says. What’s interesting about the approach GoodAI and DeepMind are taking, he says, is that their computers are not given any prior understanding of a game’s rules.

However, it’s still not clear whether the strategy will be useful beyond games. Yampolskiy, who has looked at GoodAI’s software, says that while it is a worthwhile contribution to the field, it may be very hard to use as the basis for a more general-purpose AI.

Automating big-data analysis

Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which “features” of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.
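A minimal sketch of the derived features this paragraph describes, using made-up promotion records: the span between two dates and the average weekly profit across that span, rather than the raw dates or the total profit. Field names and values are hypothetical.

```python
from datetime import date

# Hypothetical records: each promotion has start/end dates and a total profit.
promotions = [
    {"name": "spring", "start": date(2015, 3, 1), "end": date(2015, 3, 29), "total_profit": 8000.0},
    {"name": "summer", "start": date(2015, 6, 7), "end": date(2015, 6, 21), "total_profit": 5000.0},
]


def derived_features(promo):
    """Turn raw fields into features that may carry more predictive signal."""
    span_days = (promo["end"] - promo["start"]).days  # span between the dates
    if span_days <= 0:
        raise ValueError("a promotion must end after it starts")
    weeks = span_days / 7
    return {
        "name": promo["name"],
        "span_days": span_days,
        "avg_weekly_profit": promo["total_profit"] / weeks,  # average across the span
    }


for promo in promotions:
    print(derived_features(promo))
```

In this toy data the shorter summer promotion earns less in total (5,000 versus 8,000) but more per week (2,500 versus 2,000), a signal that neither raw column shows on its own.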

MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers’ “Data Science Machine” finished ahead of 615.

In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.

“We view the Data Science Machine as a natural complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the Data Science Machine. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”

Between the lines

Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.

Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk for dropping out of online courses.

“What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering,” Veeramachaneni says. “The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas.”

In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT’s online-learning platform MITx doesn’t record either of those statistics, but it does collect data from which they can be inferred.
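To make "inferred" concrete, the sketch below computes both indicators from hypothetical raw event logs: how many hours before the deadline a student first touched a problem set, and each student's total time on the site relative to the class average. The log format is an assumption for illustration; it is not the MITx schema.

```python
from datetime import datetime
from statistics import mean

# Hypothetical raw logs: (student, event_time, seconds_on_page, problem_set)
events = [
    ("ana", datetime(2015, 10, 1, 9, 0), 1200, "ps1"),
    ("ana", datetime(2015, 10, 2, 9, 0), 600, "ps1"),
    ("ben", datetime(2015, 10, 4, 22, 0), 300, "ps1"),
]
deadlines = {"ps1": datetime(2015, 10, 5, 0, 0)}


def start_lead_hours(events, deadlines):
    """Hours between each student's first event on a problem set and its deadline."""
    first_seen = {}
    for student, when, _, pset in events:
        key = (student, pset)
        if key not in first_seen or when < first_seen[key]:
            first_seen[key] = when
    return {
        key: (deadlines[key[1]] - when).total_seconds() / 3600
        for key, when in first_seen.items()
    }


def relative_time_on_site(events):
    """Each student's total seconds on the site divided by the class average."""
    totals = {}
    for student, _, seconds, _ in events:
        totals[student] = totals.get(student, 0) + seconds
    class_avg = mean(totals.values())
    return {student: total / class_avg for student, total in totals.items()}


print(start_lead_hours(events, deadlines))
print(relative_time_on_site(events))
```

Neither statistic is stored directly; both are assembled from timestamps and durations that the platform does record.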

Featured composition

Kanter and Veeramachaneni use a couple of tricks to manufacture candidate features for data analyses. One is to exploit structural relationships inherent in database design. Databases typically store different types of data in different tables, indicating the relationships between them using numerical identifiers. The Data Science Machine tracks these relationships, using them as a cue to feature construction.

For instance, one table might list retail items and their costs; another might list items included in individual customers’ purchases. The Data Science Machine would begin by importing costs from the first table into the second. Then, taking its cue from the association of several different items in the second table with the same purchase number, it would execute a suite of operations to generate candidate features: total cost per order, average cost per order, minimum cost per order, and so on. As numerical identifiers proliferate across tables, the Data Science Machine layers operations on top of each other, finding minima of averages, averages of sums, and so on.
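The retail example can be sketched in a few lines of plain Python: join on an identifier, aggregate per group, then stack aggregates as more identifiers come into play. This illustrates the general idea only; it is not the Data Science Machine's code, and the tables and the set of aggregation functions are hypothetical.

```python
from statistics import mean

# Hypothetical tables linked by numerical identifiers.
items = {101: 2.50, 102: 10.00, 103: 4.00}  # item_id -> cost
purchases = [  # (order_id, customer_id, item_id)
    (1, "c1", 101), (1, "c1", 102),
    (2, "c1", 103),
    (3, "c2", 102), (3, "c2", 102), (3, "c2", 103),
]

AGGREGATES = {"sum": sum, "mean": mean, "min": min, "max": max}


def group(pairs):
    """Collect values under each key, like a SQL GROUP BY."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups


# Step 1: import costs from the item table into the purchase table.
costs_by_order = group((order, items[item]) for order, _, item in purchases)

# Step 2: one candidate feature per aggregate, per order.
order_features = {
    order: {name: fn(costs) for name, fn in AGGREGATES.items()}
    for order, costs in costs_by_order.items()
}

# Step 3: follow the next identifier (order -> customer) and stack aggregates,
# producing features such as "mean of order totals" or "min of order means".
customer_of = {order: customer for order, customer, _ in purchases}
customer_features = {}
for inner in ("sum", "mean"):
    per_customer = group((customer_of[o], f[inner]) for o, f in order_features.items())
    for customer, values in per_customer.items():
        for outer, fn in AGGREGATES.items():
            customer_features.setdefault(customer, {})[f"{outer}_of_order_{inner}"] = fn(values)

print(order_features[1])
print(customer_features["c1"]["mean_of_order_sum"])
```

Each additional table hop multiplies the number of candidates (here 4 order-level aggregates become 8 customer-level ones), which is why a pruning step is needed later.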

It also looks for so-called categorical data, which appear to be restricted to a limited range of values, such as days of the week or brand names. It then generates further feature candidates by dividing up existing features across categories.
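A minimal sketch of the categorical step. It assumes a simple heuristic (a non-numeric column with at most a few distinct values counts as categorical; the threshold of 3 is hypothetical, not the paper's rule) and then splits an existing numeric feature by category.

```python
from statistics import mean

# Hypothetical order-level rows with an existing numeric feature ("total").
rows = [
    {"weekday": "Mon", "brand": "A", "total": 12.5},
    {"weekday": "Mon", "brand": "B", "total": 4.0},
    {"weekday": "Sat", "brand": "A", "total": 24.0},
    {"weekday": "Sat", "brand": "A", "total": 20.0},
]


def categorical_columns(rows, max_distinct=3):
    """Non-numeric columns whose values fall in a small set are treated as categorical."""
    columns = rows[0].keys()
    return [
        col for col in columns
        if not all(isinstance(r[col], (int, float)) for r in rows)
        and len({r[col] for r in rows}) <= max_distinct
    ]


def split_by_category(rows, feature, category):
    """New candidate features: the mean of `feature` within each category value."""
    groups = {}
    for r in rows:
        groups.setdefault(r[category], []).append(r[feature])
    return {f"mean_{feature}_when_{category}={value}": mean(vals) for value, vals in groups.items()}


for col in categorical_columns(rows):
    print(split_by_category(rows, "total", col))
```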

Once it’s produced an array of candidates, it reduces their number by identifying those whose values seem to be correlated. Then it starts testing its reduced set of features on sample data, recombining them in different ways to optimize the accuracy of the predictions they yield.
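A minimal sketch of the pruning step, assuming Pearson correlation and a hypothetical cut-off of 0.95: whenever two candidate features are almost perfectly correlated, only the first one seen is kept. The paper's exact criterion may differ, and the later recombination and accuracy-testing step is not shown.

```python
from math import sqrt

# Hypothetical candidate features, one list of values per feature (same rows).
candidates = {
    "total_cost": [12.5, 4.0, 24.0, 20.0],
    "total_cost_with_tax": [13.5, 4.32, 25.92, 21.6],  # total_cost * 1.08
    "n_items": [2, 1, 3, 1],
}


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def prune_correlated(features, threshold=0.95):
    """Greedily drop any feature strongly correlated with one already kept."""
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return list(kept)


print(prune_correlated(candidates))
```

The taxed total is a rescaled copy of the plain total, so it adds no information and is dropped; the item count is only moderately correlated with cost and survives.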

“The Data Science Machine is one of those unbelievable projects where applying cutting-edge research to solve practical problems opens an entirely new way of looking at the problem,” says Margo Seltzer, a professor of computer science at Harvard University who was not involved in the work. “I think what they’ve done is going to become the standard quickly — very quickly.”

Object recognition for free

System designed to label visual scenes according to type turns out to detect particular objects, too.

Object recognition — determining what objects are where in a digital image — is a central research topic in computer vision.

But a person looking at an image will spontaneously make a higher-level judgment about the scene as a whole: It’s a kitchen, or a campsite, or a conference room. Among computer science researchers, the problem known as “scene recognition” has received relatively little attention.

Last December, at the Annual Conference on Neural Information Processing Systems, MIT researchers announced the compilation of the world’s largest database of images labeled according to scene type, with 7 million entries. By exploiting a machine-learning technique known as “deep learning” — which is a revival of the classic artificial-intelligence technique of neural networks — they used it to train the most successful scene-classifier yet, which was between 25 and 33 percent more accurate than its best predecessor.

At the International Conference on Learning Representations this weekend, the researchers will present a new paper demonstrating that, en route to learning how to recognize scenes, their system also learned how to recognize objects. The work implies that at the very least, scene-recognition and object-recognition systems could work in concert. But it also holds out the possibility that they could prove to be mutually reinforcing.

“Deep learning works very well, but it’s very hard to understand why it works — what is the internal representation that the network is building,” says Antonio Torralba, an associate professor of computer science and engineering at MIT and a senior author on the new paper. “It could be that the representations for scenes are parts of scenes that don’t make any sense, like corners or pieces of objects. But it could be that it’s objects: To know that something is a bedroom, you need to see the bed; to know that something is a conference room, you need to see a table and chairs. That’s what we found, that the network is really finding these objects.”

Torralba is joined on the new paper by first author Bolei Zhou, a graduate student in electrical engineering and computer science; Aude Oliva, a principal research scientist, and Agata Lapedriza, a visiting scientist, both at MIT’s Computer Science and Artificial Intelligence Laboratory; and Aditya Khosla, another graduate student in Torralba’s group.

Under the hood

Like all machine-learning systems, neural networks try to identify features of training data that correlate with annotations performed by human beings — transcriptions of voice recordings, for instance, or scene or object labels associated with images. But unlike the machine-learning systems that produced, say, the voice-recognition software common in today’s cellphones, neural nets make no prior assumptions about what those features will look like.

That sounds like a recipe for disaster, as the system could end up churning away on irrelevant features in a vain hunt for correlations. But instead of deriving a sense of direction from human guidance, neural networks derive it from their structure. They’re organized into layers: Banks of processing units in each layer (loosely modeled on neurons in the brain) perform computations on the data they’re fed, using internal settings that start out random. They then feed their results to the next layer, and so on, until the outputs of the final layer are measured against the data annotations. As the network receives more data, it readjusts its internal settings to try to produce more accurate predictions.
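The loop described above (layers of units, final outputs compared with annotations, internal settings readjusted) can be sketched as a tiny two-layer network trained by backpropagation in plain Python. The layer sizes, learning rate, epoch count, and toy task (the logical OR of two inputs) are hypothetical choices for illustration; the MIT system was a far larger convolutional network trained on millions of images.

```python
import math
import random

rng = random.Random(0)
HIDDEN = 3

# Internal settings ("weights") start out random; training readjusts them.
w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(HIDDEN)]  # input -> hidden
b1 = [0.0] * HIDDEN
w2 = [rng.uniform(-1, 1) for _ in range(HIDDEN)]                     # hidden -> output
b2 = 0.0

# Toy annotated data: inputs paired with human-provided labels (logical OR).
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def forward(x):
    """Each layer transforms its input and feeds the result to the next layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output


def loss():
    """Mean squared difference between the final outputs and the annotations."""
    return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)


def train_step(lr=1.0):
    """One pass of backpropagation: nudge every weight to reduce the error."""
    global b2
    for x, y in data:
        hidden, out = forward(x)
        d_out = (out - y) * out * (1 - out)  # error signal at the output unit
        for j, h in enumerate(hidden):
            d_hidden = d_out * w2[j] * h * (1 - h)  # error passed back to hidden unit j
            w2[j] -= lr * d_out * h
            for i, xi in enumerate(x):
                w1[j][i] -= lr * d_hidden * xi
            b1[j] -= lr * d_hidden
        b2 -= lr * d_out


before = loss()
for _ in range(5000):
    train_step()
after = loss()
print(f"loss before: {before:.3f}, after: {after:.3f}")
print([round(forward(x)[1]) for x, _ in data])  # rounded predictions after training
```

No rule for OR is written anywhere; the network only sees examples and labels, and gradient descent adjusts the random starting weights until the outputs match.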

After the MIT researchers’ network had processed millions of input images, readjusting its internal settings all the while, it was about 50 percent accurate at labeling scenes. Human beings, by comparison, are only 80 percent accurate, since they can disagree about high-level scene labels. But the researchers didn’t know how their network was doing what it was doing.

The units in a neural network, however, respond differentially to different inputs. If a unit is tuned to a particular visual feature, it won’t respond at all if the feature is entirely absent from a particular input. If the feature is clearly present, it will respond forcefully.

The MIT researchers identified the 60 images that produced the strongest response in each unit of their network; then, to avoid biasing the results, they sent the collections of images to paid workers on Amazon’s Mechanical Turk crowdsourcing site and asked them to identify commonalities among the images.
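Selecting the most strongly activating images for each unit amounts to a top-k query over a table of unit responses. A minimal sketch, with a hypothetical random activation matrix standing in for real network responses and k reduced from 60 to 3 for readability:

```python
import heapq
import random

# Hypothetical activations: activations[unit][image] = response strength.
rng = random.Random(42)
N_UNITS, N_IMAGES, K = 4, 20, 3  # the MIT study used the top 60 images per unit
activations = [[rng.random() for _ in range(N_IMAGES)] for _ in range(N_UNITS)]


def top_images_per_unit(activations, k):
    """For every unit, indices of the k images that excite it most, strongest first."""
    return [
        heapq.nlargest(k, range(len(row)), key=row.__getitem__)
        for row in activations
    ]


for unit, images in enumerate(top_images_per_unit(activations, K)):
    # In the study, each such image set went to human workers for labeling.
    print(f"unit {unit}: images {images}")
```

`heapq.nlargest` avoids sorting every image's score, which matters when a unit is scored against a large image collection.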

Beyond category

“The first layer, more than half of the units are tuned to simple elements — lines, or simple colors,” Torralba says. “As you move up in the network, you start finding more and more objects. And there are other things, like regions or surfaces, that could be things like grass or clothes. So they’re still highly semantic, and you also see an increase.”

According to the assessments by the Mechanical Turk workers, about half of the units at the top of the network are tuned to particular objects. “The other half, either they detect objects but don’t do it very well, or we just don’t know what they are doing,” Torralba says. “They may be detecting pieces that we don’t know how to name. Or it may be that the network hasn’t fully converged, fully learned.”

In ongoing work, the researchers are starting from scratch and retraining their network on the same data sets, to see if it consistently converges on the same objects, or whether it can randomly evolve in different directions that still produce good predictions. They’re also exploring whether object detection and scene detection can feed back into each other, to improve the performance of both. “But we want to do that in a way that doesn’t force the network to do something that it doesn’t want to do,” Torralba says.

“Our visual world is much richer than the number of words that we have to describe it,” says Alexei Efros, an associate professor of computer science at the University of California at Berkeley. “One of the problems with object recognition and object detection — in my view, at least — is that you only recognize the things that you have words for. But there are a lot of things that are very much visual, but maybe there aren’t easy describable words for them. Here, the most exciting thing for me would be that, by training on things that we do have labels for — kitchens, bathrooms, shops, whatever — we can still get at some of these visual elements and visual concepts that we wouldn’t even be able to train for, because we can’t name them.”

“More globally,” he adds, “it suggests that even if you have some very limited labels and very limited tasks, if you train a model that is a powerful model on them, it could also be doing less limited things. This kind of emergent behavior is really neat.”

 



About me

My name is Sayed Ahmadreza Razian, and I hold a master’s degree in Artificial Intelligence.

My research interests include related topics such as image processing, machine vision, virtual reality, machine learning, data mining, and monitoring systems, and I intend to pursue a PhD in one of these fields.


My scientific expertise
  • Image processing
  • Machine vision
  • Machine learning
  • Pattern recognition
  • Data mining - Big Data
  • CUDA Programming
  • Game and Virtual reality


Greatest hits

You are what you believe yourself to be.

Paulo Coelho

Waiting hurts. Forgetting hurts. But not knowing which decision to take can sometimes be the most painful.

Paulo Coelho

The fear of death is the most unjustified of all fears, for there’s no risk of accident for someone who’s dead.

Albert Einstein

It’s the possibility of having a dream come true that makes life interesting.

Paulo Coelho

Imagination is more important than knowledge.

Albert Einstein

Anyone who has never made a mistake has never tried anything new.

Albert Einstein

One day you will wake up and there won’t be any more time to do the things you’ve always wanted. Do it now.

Paulo Coelho

Gravitation is not responsible for people falling in love.

Albert Einstein

