Facebook used billions of public Instagram photos to train an artificial intelligence (AI) algorithm to categorise images for itself. The photos carried 17,000 different hashtags that had been added by Instagram users.
Training computers to do things that humans normally do, such as identifying what is in a photo, typically involves feeding them a great deal of data. But this data has to be labelled by humans, which takes time and costs money. Facebook has developed a technique that relies on the hashtags users have already attached to their photos, so it does not need employees to sit down and categorise each image.
“The biggest limiting factor to making progress in computer vision — as in many fields of AI — is that we rely almost entirely on hand-labelled, human-curated, data sets,” said Mike Schroepfer, Facebook’s chief technology officer, at Facebook’s F8 developer conference in San Jose, California, on Wednesday. “This means if a person hasn’t spent the time to label something specific in an image, even the most advanced computer vision systems won’t be able to detect it at runtime because it hasn’t seen it in the training set.”
Schroepfer added: “We built some breakthrough technology that takes publicly available hashtagged images at an unprecedented scale. We have trained on 3.5 billion training images using a public set of images without any human curated images in that data set.”
The Instagram dataset is 10 times bigger than a giant cache of photos that Google used to train image algorithms, according to Wired.
Srinivas Narayanan, an engineering director within Facebook’s applied machine learning group, added: “We have now created the world’s best computer vision system. It achieves the highest score ever of 84.5% accuracy on ImageNet — a dataset widely used for benchmarking.”
Schroepfer said Facebook is already using the computer vision system across its platform to spot “bad content” that needs to be removed. That likely includes nudity and terrorism-related content.