THE world, like the World Wide Web before it, is about to be hyperlinked. Soon, you may be able to find information about almost any physical object with the click of a smartphone.
This vision, once the stuff of science fiction, took a significant step forward this month when Google unveiled a smartphone application called Goggles. It allows users to search the Web, not by typing or by speaking keywords, but by snapping an image with a cellphone and feeding it into Google’s search engine.
How tall is that mountain on the horizon? Snap and get the answer. Who is the artist behind this painting? Snap and find out. What about that stadium in front of you? Snap and see a schedule of future games there.
Goggles, in essence, promises to bridge the gap between the physical world and the Web.
Computer scientists have been trying to equip machines with virtual eyes for decades, and with varying degrees of success. The field, known as computer vision, has resulted in a smattering of applications and successes in the lab. But recognizing images at what techies call “scale,” meaning thousands or even millions of images, is hugely difficult, partly because it requires enormous computing power. It turns out that Google, with its collection of massive data centers, has just that.
“The technology exists and was developed by other people,” said Gary Bradski, a computer vision expert and a consulting professor of computer science at Stanford. “The breakthrough is doing this at scale. There are not many entities that could do that.”
Goggles is not the first application to try to create a link between the physical and virtual worlds via cellphones. A variety of so-called augmented-reality applications like World Surfer and Wikitude allow you to point your cellphone or its camera and find information about landmarks, restaurants and shops in front of you. Yet those applications typically rely on location data, matching information from maps with a cellphone’s GPS and compass data. Another class of applications reads bar codes to link objects or businesses with online information about them.
Goggles also uses location information to help identify objects, but its ability to recognize millions of images opens up new possibilities. “This is a big step forward in terms of making it work in all these different kinds of situations,” said Jason Hong, a professor at the Human Computer Interaction Institute at Carnegie Mellon University.
When you snap a picture with Goggles, Google spends a few seconds analyzing the image, then sends it up to its vast “cloud” of computers and tries to match it against an index of more than a billion images. Google’s data centers distribute the image-matching problem among hundreds or even thousands of computers to return an answer quickly.
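As a rough illustration of how a matching job can be split across many machines, here is a minimal sketch. It is not Google’s actual system, whose details the article does not describe. It assumes each image has already been reduced to a short numeric feature vector, uses cosine similarity as the match score, and lets threads stand in for separate computers. The names `INDEX`, `search_shard` and `distributed_search`, along with the sample vectors, are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_shard(shard, query):
    """Return (score, label) for the best match inside one shard."""
    return max((cosine(vec, query), label) for label, vec in shard)


def distributed_search(index, query, n_shards=3):
    """Split a non-empty index into shards, search each in parallel, merge."""
    # Round-robin split; drop shards that end up empty.
    shards = [s for s in (index[i::n_shards] for i in range(n_shards)) if s]
    # Threads are a stand-in for separate machines (an assumption).
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, query), shards))
    # The overall best match is the best of the per-shard winners.
    return max(partials)


# Hypothetical toy index: (label, feature vector) pairs.
INDEX = [
    ("Bay Bridge", (0.9, 0.1, 0.0)),
    ("Bridalveil Fall", (0.1, 0.9, 0.2)),
    ("Times Square", (0.2, 0.1, 0.9)),
    ("BlackBerry handset", (0.5, 0.5, 0.5)),
]

score, label = distributed_search(INDEX, (0.15, 0.85, 0.25))
print(label)
```

Each shard reports only its own best candidate, so the final merge compares a handful of results rather than the whole index. That is why adding machines can shorten the wait for an answer.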
Google says Goggles works best with certain categories of objects, including CDs, movie posters, products, wine labels, artwork, buildings and landmarks. It can read business cards and book covers. It doesn’t do so well with trees, cars or objects whose shape can change, like a towel. And it has trouble recognizing objects in less than ideal lighting conditions.
“Today, Google Goggles is limited because it recognizes certain objects in certain categories,” said Vic Gundotra, a vice president at Google in charge of its mobile phone applications. “But our goal is for Goggles to recognize every image. This is really the beginning.”
For now, Goggles is part of the “labs” section of Google’s Web site, which indicates that the product remains experimental. So it is not surprising that it has quirks and flaws.
Goggles had trouble recognizing the San Francisco-Oakland Bay Bridge, for example, when the image was shot with several trees in the way of its suspension span. But it did recognize it when the picture was snapped with fewer obstacles in the way. Given a picture of a Yahoo billboard shot in San Francisco, Goggles returned search results showing Times Square, presumably because of the huge Yahoo billboard there.
But the service can also delight and amaze. It had no trouble recognizing an Ansel Adams photograph of Bridalveil Fall in Yosemite, returning search results for both the image and a book that used that image on its cover. It also correctly identified a BlackBerry handset, a Panasonic cordless phone and a Holmes air purifier. It stumbled with an Apple mouse, perhaps because there was a bright reflection on its plastic surface.
It’s not hard to imagine a slew of commercial applications for this technology. You could compare prices of a product online, learn how to operate that old water heater whose manual you have lost or find out about the environmental record of a certain brand of tuna. But Goggles and similar products could also tell the history of a building, help travelers get around in a foreign country or even help blind people navigate their surroundings.
It is also easy to think of scarier possibilities down the line. Google’s goal of recognizing every image, of course, includes identifying people. Computer scientists say that it is much harder to identify faces than objects, but with the technology and computing power improving rapidly, better facial recognition may not be far off.
Mr. Gundotra says that Google already has some facial-recognition capabilities, but that it has decided to turn them off in Goggles until privacy issues can be resolved. “We want to move with great discretion and thoughtfulness,” he said.
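Google Search Upgrade: A Photo Is All You Need
Google has introduced a new smartphone service called Goggles. Without speaking or typing, users simply photograph an object with their phone and can find information about it through Google’s search engine.
With Goggles, users can snap a mountain to learn its height, snap a painting to learn who the artist is, or snap a stadium to see a schedule of future games.
Computer scientists have spent decades trying to give machines virtual sight, a field known as computer vision. They have had varying degrees of success in the lab and have produced a number of applications. But recognizing images at large scale is extremely difficult and requires enormous computing power, which Google, with its many data centers, happens to have.
Bradski, a computer vision expert and a consulting professor of computer science at Stanford University, said, “This technology was developed by other people. Google’s breakthrough is in the scale of recognition, and not many companies could achieve that.”
Goggles is not the first service to try to link the physical and virtual worlds through cellphones.
Many augmented-reality services, such as World Surfer and Wikitude, let users point a phone or its camera to find information about landmarks, restaurants and shops. Most of them, however, match the phone’s GPS and compass data against map information, or use an object’s bar code to look up online data.
When a user snaps a picture, Google spends a few seconds analyzing the image, then sends it to its cloud computing centers to look for a match among more than a billion images. With the work divided among hundreds or even thousands of computers in Google’s data centers, users get an answer quickly.
Google says Goggles is better at recognizing certain categories of objects, including CDs, movie posters, products, wine labels, artwork, buildings and landmarks, and it can also read business cards and book covers. It is worse at recognizing trees, cars and objects whose shape can change, such as towels. Objects in poor lighting are also hard to recognize.
For now, Goggles remains in the “labs” section of Google’s Web site, meaning it is still experimental, so flaws and defects are not surprising.
It is not hard to imagine a wide range of services built on this technology. Users could compare prices online or find operating instructions for an appliance whose manual has been lost. Goggles could also provide the history of a building, help travelers get around in an unfamiliar country and help visually impaired people sense their surroundings.
But this kind of technology also has a worrying side. Google’s aim of recognizing every image naturally includes people.
Computer scientists say faces are harder to recognize than objects, but given how quickly recognition technology and computing power are advancing, better facial recognition may not be far off.
Gundotra, the Google vice president in charge of mobile software development, said that Google already has some facial-recognition technology but will not enable it in Goggles until privacy issues are resolved. “We want to be very careful and act only after thinking it through,” he said.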
Original article:
http://www.nytimes.com/2009/12/20/business/20ping.html
2009/12/21, Economic Daily News (經濟日報), translated and compiled by Wu Po-hsien (吳柏賢)