Colorization of grayscale images is a simple task for the human imagination.
Researchers from the Toyota Technological Institute at Chicago and University of Chicago developed a fully automatic image colorization system using deep learning and GPUs. Their paper mentions previous approaches required some level of user input.
Using a TITAN X GPU, they trained their deep neural network to predict hue and chroma distributions for each pixel given its hypercolumn descriptor. The predicted distributions then determine color assignment at test time.
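As a rough illustration of how a per-pixel distribution might be reduced to a single color, the sketch below takes one histogram over hue bins and one over chroma bins and collapses each to a value. The bin layout, the function names, and the choice of a circular mean for hue and an expected value for chroma are assumptions made for this example, not details taken from the paper.

```python
import cmath
import math

def bin_centers(n_bins, lo, hi):
    """Centers of n_bins equal-width bins covering [lo, hi)."""
    width = (hi - lo) / n_bins
    return [lo + (i + 0.5) * width for i in range(n_bins)]

def expected_chroma(probs, max_chroma=1.0):
    """Expected value of a chroma histogram (chroma is an ordinary scalar)."""
    centers = bin_centers(len(probs), 0.0, max_chroma)
    return sum(p * c for p, c in zip(probs, centers))

def circular_mean_hue(probs):
    """Probability-weighted circular mean of a hue histogram, in degrees."""
    centers = bin_centers(len(probs), 0.0, 360.0)
    # Hue wraps around at 360 degrees, so average unit vectors, not raw angles.
    vec = sum(p * cmath.exp(1j * math.radians(c)) for p, c in zip(probs, centers))
    return math.degrees(cmath.phase(vec)) % 360.0

# Hypothetical predictions for one pixel (4 bins each).
hue_probs = [0.5, 0.0, 0.0, 0.5]      # mass split between 45 and 315 degrees
chroma_probs = [0.1, 0.2, 0.3, 0.4]

hue = circular_mean_hue(hue_probs)
chroma = expected_chroma(chroma_probs)
```

The circular mean matters because a plain average of 45 and 315 degrees would land at 180, the opposite side of the color wheel, whereas the two hues actually straddle 0.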
With multiple applications that can benefit from automatic colorization (such as historical photographs and videos, and artist assistance), the research strives to make colorization cost-effective and less time-consuming.
On 5 April, Facebook introduced a new feature that automatically generates text descriptions of pictures using advanced object recognition technology.
Until now, people using screen readers would only hear the name of the person who shared the photo, followed by the term “photo” when they came upon an image in News Feed. Now they will get a richer description of what’s in a photo. For instance, someone could now hear, “Image may contain three people, smiling, outdoors.”
The Facebook researchers noted that it took nearly ten months to roll the feature out publicly, as they had to train their deep learning models to recognize more than just the people in the images. For instance, people mostly care about who is in the photo and what they are doing, but sometimes the background of the photo is what makes it interesting or significant.
While that may be intuitive to humans, it is quite challenging to teach a machine to provide as much useful information as possible while acknowledging the social context.
Their neural network models have millions of parameters, but the team carefully selected a set of about 100 concepts based on prominence in photos as well as the accuracy of the visual recognition system. They also favored concepts with very specific meanings, such as smiling, jewelry, cars, and boats. Currently, they require their object detection algorithm to reach a minimum precision of 0.8 on these concepts.
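To make the 0.8 precision requirement concrete, here is a minimal sketch of choosing, for a single concept, the lowest confidence threshold at which precision on labeled validation data reaches a target. The function names and the toy scores are hypothetical; Facebook's actual procedure is not described here.

```python
def precision_at(threshold, scored):
    """Precision of predictions with score >= threshold.

    scored: list of (confidence, is_correct) pairs for one concept.
    Returns None when nothing is predicted at this threshold.
    """
    kept = [ok for score, ok in scored if score >= threshold]
    if not kept:
        return None
    return sum(kept) / len(kept)

def lowest_threshold(scored, target=0.8):
    """Smallest observed score whose precision meets the target, else None."""
    for threshold in sorted({score for score, _ in scored}):
        p = precision_at(threshold, scored)
        if p is not None and p >= target:
            return threshold
    return None

# Hypothetical validation results for one concept.
validation = [(0.95, True), (0.90, True), (0.80, True), (0.70, False),
              (0.60, True), (0.40, False), (0.30, False)]

threshold = lowest_threshold(validation)
```

Picking the lowest qualifying threshold keeps as many descriptions as possible while still honoring the precision floor; a concept that never reaches the target would simply not be announced.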
Last night Google’s AI AlphaGo won the first game in a five-game series against the world’s best Go player, in Seoul, South Korea. The success comes just five months after a slightly less experienced version of the same program became the first machine to defeat any Go professional by winning five games against the European champion.
This victory was far more impressive though because it came at the expense of Lee Sedol, 33, who has dominated the ancient Chinese game for a decade. The European champion, Fan Hui, is ranked only 663rd in the world.
And the machine, by all accounts, played a noticeably stronger game than it did back in October, evidence that it has learned much since then. Describing their research in the journal Nature, AlphaGo’s programmers insist that it now studies mostly on its own, tuning its deep neural networks by playing millions of games against itself.
The object of Go is to surround and capture territory on a 19-by-19 board; players take turns placing a lozenge-shaped white or black piece, called a stone, on the intersections of the lines. Unlike in chess, the player of the black stones moves first.
The neural networks judge the position, and do so well enough to play a good game. But AlphaGo rises one level further by yoking its networks to a system that generates a “tree” of analysis that represents the many branching possibilities that the game might follow. Because so many moves are possible, the branches quickly become an impenetrable thicket, one reason why Go programmers haven’t had the same success as chess programmers when using this “brute force” method alone. Chess has a far lower branching factor than Go.
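A back-of-the-envelope calculation shows how quickly the tree grows. The sketch below uses commonly cited approximate figures: about 35 legal moves per position over roughly 80 moves for chess, and about 250 over roughly 150 for Go. These numbers do not come from the article above and should be read as rough estimates.

```python
import math

def log10_tree_size(branching, depth):
    """log10 of branching**depth, the number of move sequences of that length."""
    return depth * math.log10(branching)

# Approximate branching factor and game length (assumed, see lead-in).
chess = log10_tree_size(35, 80)
go = log10_tree_size(250, 150)

print(f"chess: about 10^{chess:.0f} sequences")
print(f"go:    about 10^{go:.0f} sequences")
```

Working in logarithms avoids building astronomically large integers and makes the gap easy to read: the exponents differ by more than 200.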
It seems that AlphaGo’s self-improving capability largely explains its quick rise to world mastery. By contrast, chess programs’ brute-force methods required endless fine-tuning by engineers working together with chess masters. That partly explains why programs took nine years to progress from the first defeat of a grandmaster in a single game, back in 1988, to defeating then World Champion Garry Kasparov, in a six-game match, in 1997.
Even that crowning achievement, garnered to worldwide acclaim by IBM’s Deep Blue machine, came only on the second attempt. The previous year Deep Blue had managed to win only one game in the match, the first. Kasparov then exploited weaknesses he’d spotted in the computer’s game to win three and draw two of the remaining five games.
Sedol appears to face longer odds of staging a comeback. Unlike Deep Blue, AlphaGo can play numerous games against itself during the 24 hours until Game Two (to be streamed live tonight at 11 pm EST, 4 am GMT). The machine can study ceaselessly, unclouded by worry, ambition, fear, or hope.
Sedol, the king of the Go world, must spend much of his time sleeping—if he can. Uneasy lies the head that wears a crown.
From self-driving cars to environment-sensing robots, deep learning is tackling some of the world’s toughest technological challenges. But it’s not just for gadgets and gizmos; it’s also aiming to fix your grammar.
In honor of National Grammar Day – it’s today – take a look at these sentences, which are guaranteed to rattle even your ninth grade English teacher: Its a scandal! Seven people was arrested at they’re National Grammar Day party, after they set a stack of mispelled word’s on fire.
Can you spot the errors? Don’t worry, you don’t have to. GPU-accelerated deep learning and an automated grammar checker called Grammarly can find the flubs in a split-second.
Grammarly, which is consistently one of the top-ranked grammar checkers, is available as a Chrome or Safari extension, and can be used for Outlook, Word and social media. Like many of the automated editors, it comes in a free and premium version.
Deep Learning Gets Smarter with More Data
Although deep learning is one of many machine learning techniques Grammarly uses to detect and correct errors, it’s a powerful one. Traditional machine learning requires a human expert (or experts) to define all of the factors the computer should evaluate in the data — how to use a comma, for example. This is usually a slow and challenging process.
With GPU-accelerated deep learning, non-experts can feed raw data into the computer, and the neural network automatically discovers which patterns are important. In the case of grammar, it could be the myriad patterns that are important to writing correctly.
“By virtue of having read through and corrected millions of documents and made billions of suggestions, we’ve been able to really refine error-correction algorithms,” said Nikolas Baron, online marketing manager at San Francisco-based Grammarly.
Most of the tools use natural language processing and some form of machine learning to analyze and understand text. At least one other company, Austin-based startup Deep Grammar, uses deep learning to fix your grammar.
“The more phrases you feed it, the more it learns,” said Jonathan Mugan, co-founder and creator of Deep Grammar.
Don’t Throw Away Your Style Book
Grammarly isn’t perfect. Neither were any of the other free tools I tested.
The company let me try a premium version, which caught all six errors above and even recommended avoiding the passive voice in “were arrested.” Its free online version missed just one mistake, which was the best performance of any of the grammar fixers. But that was only after three other sentences stumped both the free and premium versions.
Few of the free online proofreaders would score even a C in a high school English class. The only way to catch every mistake in the botched sentences above was to combine the results of all 10 tested tools.
Neural networks learn to recognize objects in images and perform other artificial intelligence tasks with a very low error rate. (Just last week, a neural network built by Google’s DeepMind lab in London beat a master of the complex game of Go, one of the grand challenges of AI.) But they’re typically too complex to run on a smartphone, where, you have to admit, they’d be pretty useful. Perhaps no more. At the IEEE International Solid State Circuits Conference in San Francisco on Tuesday, MIT engineers presented a chip designed to run sophisticated image-processing neural network software on a smartphone’s power budget.
The great performance of neural networks doesn’t come free. In image processing, for example, neural networks like AlexNet work so well because they put an image through a huge number of filters, first finding image edges, then identifying objects, then figuring out what’s happening in a scene. All that requires moving data around a computer again and again, which takes a lot of energy, says Vivienne Sze, an electrical engineering professor at MIT. Sze collaborated with MIT computer science professor Joel Emer, who is also a senior research scientist at GPU-maker Nvidia.
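The first stage mentioned above, finding edges with a filter, can be illustrated with a tiny hand-written example. This is a sketch in plain Python of one fixed 3x3 vertical-edge (Sobel) filter slid over a toy image. A network like AlexNet learns its filters from data and applies many of them per layer, so the kernel and image here are purely illustrative.

```python
def apply_filter(image, kernel):
    """Slide a square kernel over a 2D image (no padding, stride 1).

    This is the cross-correlation used by convolutional layers.
    """
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(k)
                for j in range(k)
            )
    return out

# Sobel kernel that responds to left-to-right brightness changes.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Toy 4x5 image: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9] for _ in range(4)]

response = apply_filter(image, SOBEL_X)
```

The response is zero over the flat dark region and large where the brightness jumps. Note that each pixel is read by up to nine overlapping windows, which hints at why repeated data movement dominates the energy cost.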
“On our chip we bring the data as close as possible to the processing units, and move the data as little as possible,” says Sze. When run on an ordinary GPU, neural networks fetch the same image data multiple times. The MIT chip has 168 processing engines, each with its own dedicated memory nearby. Nearby units can talk to each other directly, and this proximity saves power. There’s also a larger, primary storage bank farther off, of course. “We try to go there as little as possible,” says Emer. Furthering the limits on moving data, the hardware compresses the data it does send and uses statistics about the data to do fewer calculations on it than a GPU would.
All that means that when running a powerful neural network program the MIT chip, called Eyeriss, uses one-tenth the energy (0.3 watts) of a typical mobile GPU (5 to 10 W). “This is the first custom chip capable of demonstrating a full, state-of-the-art neural network,” says Sze. Eyeriss can run AlexNet, a highly accurate and computationally demanding neural network. Previous such chips could only run specific algorithms, says the MIT group, which chose to test AlexNet because it is so demanding and is confident the chip can run other networks of arbitrary size.
Besides a use in smartphones, this kind of chip could help self-driving cars navigate and play a role in other portable electronics. At ISSCC, Hoi-Jun Yoo’s group at the Korea Advanced Institute of Science and Technology showed a pair of augmented reality glasses that use a neural network to train a gesture- and speech-based user interface to a particular user’s gestures, hand size, and dialect.
Yoo says the MIT chip may be able to run neural networks at low power once they’re trained, but he notes that the even more computationally intensive learning process for AlexNet can’t be done on it. The MIT chip could in theory run any kind of trained neural network, whether it analyzes images, sounds, medical data, or whatever else. Yoo says it’s also important to design chips that may be more specific to a particular category of task, such as following hand gestures, and are better at learning those tasks on the fly. He says this could make for a better user experience in wearable electronics, for example. These systems need to be able to learn on the fly because the world is unpredictable and each user is different. Your computer should start to fit you like your favorite pair of jeans.
System learns to play text-based computer game using only linguistic information.
MIT researchers have designed a computer system that learns how to play a text-based computer game with no prior assumptions about how language works. Although the system can’t complete the game as a whole, its ability to complete sections of it suggests that, in some sense, it discovers the meanings of words during its training.
In 2011, professor of computer science and engineering Regina Barzilay and her students reported a system that learned to play a computer game called “Civilization” by analyzing the game manual. But in the new work, on which Barzilay is again a co-author, the machine-learning system has no direct access to the underlying “state” of the game program — the data the program is tracking and how it’s being modified.
“When you play these games, every interaction is through text,” says Karthik Narasimhan, an MIT graduate student in computer science and engineering and one of the new paper’s two first authors. “For instance, you get the state of the game through text, and whatever you enter is also a command. It’s not like a console with buttons. So you really need to understand the text to play these games, and you also have more variability in the types of actions you can take.”
Narasimhan is joined on the paper by Barzilay, who’s his thesis advisor, and by fellow first author Tejas Kulkarni, a graduate student in the group of Josh Tenenbaum, a professor in the Department of Brain and Cognitive Sciences. They presented the paper last week at the Empirical Methods in Natural Language Processing conference.
The researchers were particularly concerned with designing a system that could make inferences about syntax, which has been a perennial problem in the field of natural-language processing. Take negation, for example: In a text-based fantasy game, there’s a world of difference between being told “you’re hurt” and “you’re not hurt.” But a system that just relied on collections of keywords as a guide to action would miss that distinction.
So the researchers designed their own text-based computer game that, though very simple, tended to describe states of affairs using troublesome syntactical constructions such as negation and conjunction. They also tested their system against a demonstration game built by the developers of Evennia, a game-creation toolkit. “A human could probably complete it in about 15 minutes,” Kulkarni says.
To evaluate their system, the researchers compared its performance to that of two others, which use variants of a technique standard in the field of natural-language processing. The basic technique is called the “bag of words,” in which a machine-learning algorithm bases its outputs on the co-occurrence of words. The variation, called the “bag of bigrams,” looks for the co-occurrence of two-word units.
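The difference between the two baselines is easy to see on a pair of sentences that use the same words in a different order. The sketch below is only an illustration: the sentences are invented, and splitting on whitespace is a simplification of real tokenization.

```python
from collections import Counter

def bag_of_words(text):
    """Count each word, ignoring order."""
    return Counter(text.lower().split())

def bag_of_bigrams(text):
    """Count each pair of adjacent words."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

# Hypothetical game messages with opposite meanings.
a = "you are not hurt but the troll is"
b = "you are hurt but the troll is not"

same_words = bag_of_words(a) == bag_of_words(b)
same_bigrams = bag_of_bigrams(a) == bag_of_bigrams(b)
```

Here `same_words` is true, so a bag-of-words model cannot tell the two messages apart, while `same_bigrams` is false because only the first sentence contains the pair ("not", "hurt"). Bigrams capture local order but still miss negation that sits further from the word it modifies.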
On the Evennia game, the MIT researchers’ system outperformed systems based on both bags of words and bags of bigrams. But on the homebrewed game, with its syntactical ambiguities, the difference in performance was even more dramatic. “What we created is adversarial, to actually test language understanding,” Narasimhan says.
The MIT researchers used an approach to machine learning called deep learning, a revival of the concept of neural networks, which was a staple of early artificial-intelligence research. Typically, a machine-learning system will begin with some assumptions about the data it’s examining, to prevent wasted time on fruitless hypotheses. A natural-language-processing system could, for example, assume that some of the words it encounters will be negation words — though it has no idea which words those are.
Neural networks make no such assumptions. Instead, they derive a sense of direction from their organization into layers. Data are fed into an array of processing nodes in the bottom layer of the network, each of which modifies the data in a different way before passing it to the next layer, which modifies it before passing it to the next layer, and so on. The output of the final layer is measured against some performance criterion, and then the process repeats, to see whether different modifications improve performance.
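The layer-by-layer flow described above can be sketched in a few lines. This is a toy forward pass with made-up weights and a squared-error criterion chosen for the example; the researchers' actual architecture and criteria are not reproduced here, and the weight-adjustment step that makes up training is omitted.

```python
import math

def layer(inputs, weights, biases):
    """One fully connected layer followed by a tanh nonlinearity."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the data through each layer in turn, bottom to top."""
    for weights, biases in layers:
        inputs = layer(inputs, weights, biases)
    return inputs

# Hypothetical network: 3 inputs -> 2 hidden nodes -> 1 output.
network = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]

output = forward([1.0, 0.0, -1.0], network)
target = 0.5
error = (output[0] - target) ** 2  # stand-in for the performance criterion
```

Training would repeat this pass many times, nudging the weights after each one so that `error` shrinks.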
In their experiments, the researchers used two performance criteria. One was completion of a task — in the Evennia game, crossing a bridge without falling off, for instance. The other was maximization of a score that factored in several player attributes tracked by the game, such as “health points” and “magic points.”
On both measures, the deep-learning system outperformed bags of words and bags of bigrams. Successfully completing the Evennia game, however, requires the player to remember a verbal description of an engraving encountered in one room and then, after navigating several intervening challenges, match it up with a different description of the same engraving in a different room. “We don’t know how to do that at all,” Kulkarni says.
“I think this paper is quite nice and that the general area of mapping natural language to actions is an interesting and important area,” says Percy Liang, an assistant professor of computer science and statistics at Stanford University who was not involved in the work. “It would be interesting to see how far you can scale up these approaches to more complex domains.”
Future versions of an algorithm from the Computer Science and Artificial Intelligence Lab could help with teaching, marketing, and memory improvement.
Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have created an algorithm that can predict how memorable or forgettable an image is almost as accurately as humans — and they plan to turn it into an app that subtly tweaks photos to make them more memorable.
For each photo, the “MemNet” algorithm — which you can try out online by uploading your own photos — also creates a heat map that identifies exactly which parts of the image are most memorable.
“Understanding memorability can help us make systems to capture the most important information, or, conversely, to store information that humans will most likely forget,” says CSAIL graduate student Aditya Khosla, who was lead author on a related paper. “It’s like having an instant focus group that tells you how likely it is that someone will remember a visual message.”
Team members picture a variety of potential applications, from improving the content of ads and social media posts, to developing more effective teaching resources, to creating your own personal “health-assistant” device to help you remember things.
As part of the project, the team has also published the world’s largest image-memorability dataset, LaMem. With 60,000 images, each annotated with detailed metadata about qualities such as popularity and emotional impact, LaMem is the team’s effort to spur further research on what they say has often been an under-studied topic in computer vision.
The paper was co-written by CSAIL graduate student Akhil Raju, Professor Antonio Torralba, and principal research scientist Aude Oliva, who serves as senior investigator of the work. Khosla will present the paper in Chile this week at the International Conference on Computer Vision.
How it works
The team previously developed a similar algorithm for facial memorability. What’s notable about the new one, besides the fact that it can now perform at near-human levels, is that it uses techniques from “deep learning,” a field of artificial intelligence that uses systems called “neural networks” to teach computers to sift through massive amounts of data to find patterns all on their own.
Such techniques are what drive Apple’s Siri, Google’s auto-complete, and Facebook’s photo-tagging, and what have spurred these tech giants to spend hundreds of millions of dollars on deep-learning startups.
“While deep-learning has propelled much progress in object recognition and scene understanding, predicting human memory has often been viewed as a higher-level cognitive process that computer scientists will never be able to tackle,” Oliva says. “Well, we can, and we did!”
Neural networks work to correlate data without any human guidance on what the underlying causes or correlations might be. They are organized in layers of processing units that each perform random computations on the data in succession. As the network receives more data, it readjusts to produce more accurate predictions.
The team fed its algorithm tens of thousands of images from several different datasets, including LaMem and the scene-oriented SUN and Places (all of which were developed at CSAIL). The images had each received a “memorability score” based on the ability of human subjects to remember them in online experiments.
The team then pitted its algorithm against human subjects by having the model predict how memorable a group of people would find a new, never-before-seen image. It performed 30 percent better than existing algorithms and was within a few percentage points of the average human performance.
For each image, the algorithm produces a heat map showing which parts of the image are most memorable. By emphasizing different regions, the researchers can potentially increase the image’s memorability.
“CSAIL researchers have done such manipulations with faces, but I’m impressed that they have been able to extend it to generic images,” says Alexei Efros, an associate professor of computer science at the University of California at Berkeley. “While you can somewhat easily change the appearance of a face by, say, making it more ‘smiley,’ it is significantly harder to generalize about all image types.”
The research also unexpectedly shed light on the nature of human memory. Khosla says he had wondered whether human subjects would remember everything if they were shown only the most memorable images.
“You might expect that people will acclimate and forget as many things as they did before, but our research suggests otherwise,” he says. “This means that we could potentially improve people’s memory if we present them with memorable images.”
The team next plans to try to update the system to be able to predict the memory of a specific person, as well as to better tailor it for individual “expert industries” such as retail clothing and logo design.
“This sort of research gives us a better understanding of the visual information that people pay attention to,” Efros says. “For marketers, movie-makers and other content creators, being able to model your mental state as you look at something is an exciting new direction to explore.”
The work is supported by grants from the National Science Foundation, as well as the McGovern Institute Neurotechnology Program, the MIT Big Data Initiative at CSAIL, research awards from Google and Xerox, and a hardware donation from Nvidia.
AliCloud will work with NVIDIA to broadly promote its cloud-based GPU offerings to its customers, primarily fast-growing startups, for AI and HPC work.
“Innovative companies in deep learning are one of our most important user communities,” said Zhang Wensong, chief scientist of AliCloud. “Together with NVIDIA, AliCloud will use its strength in public cloud computing and experiences accumulated in HPC to offer emerging companies in deep learning greater support in the future.”
The two companies will also create a joint research lab, providing AliCloud users with services and support to help them take advantage of GPU-accelerated computing to create deep learning and other HPC applications.
The NVIDIA Deep Learning SDK brings high-performance GPU acceleration to widely used deep learning frameworks such as Caffe, TensorFlow, Theano, and Torch. The suite of tools and libraries helps data scientists design and deploy deep learning applications.
Columbia University researchers have created a robotic system that detects wrinkles and then irons the piece of cloth autonomously.
Their paper notes that ironing was the last missing step in their “pipeline,” in which a robot picks up a wrinkled shirt, lays it on a table, and finally folds it with its robotic arms.
A GeForce GTX 770 GPU was used for their “wrinkle analysis algorithm” which analyzes the cloth’s surface using two surface scan techniques: a curvature scan that uses a Kinect depth sensor to estimate the height deviation of the cloth surface, and a discontinuity scan that uses a Kinect RGB camera to detect wrinkles.
Their solution was a success.