Data Sensed and Crunched Quickly, Showing New Trends
蒐集分析在地資料 指點新趨勢
By Quentin Hardy
SAN FRANCISCO – David Soloff is recruiting an army of “hyperdata” collectors.
大衛.索洛夫正在招募一支「超數據(hyperdata)」的蒐集大軍。
The company he co-founded, Premise, created a smartphone application that is now used by 700 people in 25 developing countries. These people, mostly college students and homemakers, photograph food and everyday items in public markets.
他參與創辦的Premise公司設計了一款智慧型手機的應用軟體,目前在25個開發中國家供700人使用。這批人主要是大學生和家庭主婦,在市場上拍攝食物和日用品的照片。
By analyzing the photos of prices and the placement of products like tomatoes and shampoo, then matching that to other data, Premise is building a real-time inflation index to sell to companies and traders who are hungry for insightful data.
藉由分析這些產品,像是番茄和洗髮精的價格和陳列地點的照片,再與其他資料比對,Premise正建立一種即時的通貨膨脹指數,賣給對這些深入資料需求若渴的公司和交易員。
“Within five years, I’d like to have 3,000 or 4,000 people doing this,” said Mr. Soloff, who is also Premise’s chief executive. “It’s a useful global inflation monitor, a way of looking at food security, or a way a manufacturer can judge what kind of shelf space he is getting.”
Premise執行長索洛夫說:「我希望在五年內能有三、四千人來做這件事。它可有效監測全球通貨膨脹,檢視糧食供給安全的狀況,或是讓製造商判斷他可占到什麼樣的貨架空間。」
Collecting data from all sorts of odd places and analyzing it quickly has become one of the hottest areas of the technology industry. The idea is simple: With all that processing power and a little creativity, researchers should be able to find novel patterns and relationships among different kinds of information.
從各種奇特的地方蒐集資料,並快速分析,已成為科技產業最炙手可熱的領域之一。構想本身很簡單:藉由強大的處理能力和少許的創造力,研究人員應能在不同種類資訊之間找出新的模式和關聯性。
For the last few years, insiders have been calling this sort of analysis Big Data. Now Big Data is evolving, becoming more “hyper” and including all sorts of sources. Start-ups like Premise and ClearStory Data, as well as larger companies like General Electric, are getting into the act.
過去幾年業界稱這類分析為「巨量(或海量)資料(或數據)」。現在「巨量資料」正在演進,成為更加「超大量」,包含各種資料來源。像Premise 和ClearStory Data這樣的新創公司,以及奇異這樣的大公司,也加入行動。
Standard statistics might project next summer’s ice cream sales. The aim of people working on newer Big Data systems is to collect seemingly unconnected information like today’s heat and cloud cover, and a hometown team’s victory over the weekend, compare that with past weather and sports outcomes, and figure out how much mint chip ice cream mothers would buy today.
標準的統計數字也許能預測下個夏天冰淇淋的銷量。更新的「巨量資料」系統從業人員的目標,則為蒐集看似不相關的資料,如今天的氣溫和雲層,以及地主球隊周末的勝利,與過去的天氣和運動比賽結果比較,計算出媽媽們今天會買多少薄荷巧克力脆片冰淇淋。
There are early signs it could work. Premise claims to have spotted broad national inflation in India months ahead of the government by looking at onion prices in a couple of markets.
初步跡象顯示此計可行。Premise宣稱藉由檢視印度幾個市場的洋蔥價格,比印度政府早數月得知廣泛的全國通膨率。
The photographers working for Premise receive up to 10 cents a picture. Premise also gathers time and location information from the phones, plus a few notes on things like whether the market was crowded.
為Premise拍照的攝影師,每張最高可獲10美分酬勞。Premise也透過手機照片蒐集到時間和地點的資料,以及市場是否擁擠之類的註記。
Price data from the photos gets blended with prices from 30,000 websites. Premise then builds national inflation indexes and price maps for markets in places like Shanghai and Rio de Janeiro.
Premise把從照片得到的價格資料與三萬個網站上的價格資料綜合起來,再建立全國的通膨指數,以及上海、里約熱內盧等等地方的市場價格地圖。
Premise’s subscribers include Wall Street hedge funds and Procter & Gamble. Subscriptions to the service range from $1,500 to more than $15,000 a month, though there is also a version that offers free data to schools and nonprofit groups.
Premise的訂戶包括華爾街的避險基金和寶鹼公司。月費從1500美元到1萬5000美元以上,不過也有一種版本提供學校和非營利組織免費資料。
The new Big Data connections are also benefiting from the increasing amount of public information available. According to research from the McKinsey Global Institute, 40 national governments now offer data on matters like population and land use.
可取得的公共資料量愈來愈大,對新的「巨量資料」連結也有益。根據麥肯錫全球研究所的研究,現在有40個國家政府提供人口和土地使用之類資料。
That government data can be matched with readings from sensors on smartphones, jet engines, even bicycle stations, which upload data into the supercomputers of cloud computing systems.
政府的資料可與智慧型手機、噴射引擎,甚至自行車站上的感應器比對,這些感應器將資料上傳到雲端運算系統的超級電腦。
Until a few years ago, data was expensive to get and hard to load into computers. As sensor prices have dropped, however, and things like Wi-Fi have enabled connectivity, that has changed.
直到幾年前,資料的取得一直很昂貴,也難以下載到電腦裡。隨著感應器價格下滑,以及Wi-Fi之類科技使得連結變為可能,情況已經改觀。
In the world of computer hardware, in-memory computing – an advance that allows data to be crunched without being stored in a different location – has increased computing speeds, allowing for real-time data crunching.
在電腦的硬體世界裡,記憶體內的運算-讓大量資料不需另存他處即可予以處理的先進技術-已使電腦運算速度加快,得以即時處理大量資料。
General Electric, for example, which has over 200 sensors in a single jet engine, has worked with Accenture to build a business analyzing aircraft performance the moment the jet lands.
例如,奇異公司在單一一個噴射引擎上就有200多個感應器,與埃森哲諮詢公司合作做起一門生意,在噴射機降落的那一刻分析飛機的性能。
Traditional data analysis was built on looking at information, like payroll stubs, that could be loaded into a spreadsheet. With the explosion of the Web, however, companies like Google, Facebook and Yahoo were faced with unprecedented volumes of “unstructured” data, like how people cruised the Web or comments they made. New hardware and software cut the time it takes to analyze this information.
傳統的資料分析建立在檢視可下載到試算表的資訊,如薪資單。而在網路高度發達後,Google、臉書和雅虎之類公司面臨空前大量的「非結構化」資料,像是人們如何在網路上巡弋,或是他們的評論。新的硬體和軟體能縮減分析這種資料所需的時間。
ClearStory Data, a start-up based in Palo Alto, California, has introduced a product that can look at data of the moment from different sources. Data on movie ticket sales, for example, might be mixed with information on weather, even Twitter messages, and presented as a shifting bar chart or a map, depending on what the customer is trying to figure out.
總公司設在加州帕羅奧圖的新創公司ClearStory Data已推出一種產品,可從不同的來源檢視即時的資料。例如,電影票銷售的資料,也許能與有關天氣和推特訊息的資訊混合,視顧客的需求,用長條圖或地圖來呈現。
Sharmila Shahani-Mulligan, ClearStory’s co-founder and chief executive, said the tricks were developing a way to quickly and accurately find data sources and figuring out how to present data in a way that was useful.
ClearStory的共同創辦人兼執行長夏米拉.夏哈尼─穆里根說,訣竅在於發展一種能快速準確找到資料來源的方法,並想出如何以有用的方式呈現資料。
“That way,” she said, “a coffee shop can tell if customers will drink Red Bull or hot chocolate.”
她說:「這麼一來,咖啡店就能知道客人要喝紅牛還是熱巧克力。」
Original article:
http://www.nytimes.com/2013/11/11/technology/gathering-more-data-faster-to-produce-more-up-to-date-information.html
2013-12-03, United Daily News (聯合報), page G9. Translated by 田思怡. For the original, see The New York Times weekly edition (紐時週報), page 7, top.