My latest illustration, entitled ‘Search Engine Champions’, features female warriors wearing armor emblazoned with the logos of the top search engines: Google, Bing, Baidu, Microsoft, Yandex and Yahoo.
Ever since Google burst onto the scene with its pristine white homepage and barebones results page, search has remained essentially the same. There have been improvements: instant search, location- and history-based results, images, news clippings, video. But every query still begins with a user typing a set of keywords into a text field and hoping the engine serves up the appropriate answer. It is little surprise, then, that fully one quarter of search results fail, writes Stefan Weitz in his book, Search: How the Data Explosion Makes Us Smarter.
Weitz is no doubt a legitimate observer and proven expert on search engines. Since 2009, he has served as a senior director of search at Bing, Microsoft’s Google competitor. In that role, his job has involved figuring out what is coming in search and what new models exist, and travelling the world to meet people and bring ideas about where search is going back to Redmond. His book is a distillation of his five years of thinking about search.
Too much information
That the internet has overloaded our senses is a common refrain. We now have more information than we can make sense of.
But web users have faced the problem of too much information since the early days of the mainstream internet. “As the growth of the Internet continues at an unprecedented rate—recent industry figures estimate that 1.5 million pages are added to the Internet each day—the average search returns an overwhelming number of results for users to sort through,” Google noted in a press release announcing its official launch in September 1999.
Google solved that problem with a fantastic search engine. But search as envisioned in the 1990s is no longer fit for purpose. The challenge of providing users with the appropriate answers to queries has grown much harder. There are two main reasons for this.
First, there is simply much more internet to sift through. In 1996, the web consisted of some 100,000 sites, totalling an estimated 441 million pages. Today, search engines routinely index more than 10 trillion pages. And that’s just web pages. Previously unrecorded activities like exercising, adjusting the thermostat, commuting, turning lights on and off, or watching television can all generate data now. And data creates data: Give any half-decent data scientist two bits of information and she should be able to come up with a new insight.
Second, these data are not all available in one public, indexable space like the web. Social networks keep their data closed off to all but themselves. Internet-connected devices laden with sensors have no means of talking to each other. Even a smartphone very likely has no idea what its owner was doing on his desktop computer.
A “hinge” between man and machine
What does a 21st-century search system look like? Weitz sees it as something proactive rather than reactive. “Formerly, a stimulus was a query you entered into a search box,” writes Weitz. “In systems like Google Now and Microsoft’s Cortana, the stimulus is no longer a keyword but rather a change in state.”
The selling point of products like Google Now and Cortana is that they can provide answers before their users have had a chance to ask the question. About to finish work and head home? Google Now tells you your usual metro line is running with severe delays, so maybe you should try another route. About to pop out for lunch? Cortana helpfully suggests carrying an umbrella as rain is forecast. Such utopian ideas of an optimised modern existence aided by technology lie at the heart of much tech research today.
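The shift from a typed query to “a change in state” can be illustrated with a short sketch. The Python below is a hypothetical, heavily simplified model of the two scenarios above: the `Context` fields and the two rules are invented for illustration and say nothing about how Google Now or Cortana actually work. What it shows is that nothing is typed; suggestions are produced only when the observed state changes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical snapshot of what a device knows about its owner."""
    at_work: bool
    hour: int
    usual_line_delayed: bool
    rain_forecast: bool


# A rule looks at the previous and current state and may return a suggestion.
Rule = Callable[[Context, Context], Optional[str]]


def leaving_work(prev: Context, cur: Context) -> Optional[str]:
    # Triggered by the transition "at work" -> "not at work".
    if prev.at_work and not cur.at_work and cur.usual_line_delayed:
        return "Your usual line has severe delays; try another route."
    return None


def lunch_rain(prev: Context, cur: Context) -> Optional[str]:
    # Triggered when the clock crosses noon and rain is forecast.
    if prev.hour < 12 <= cur.hour and cur.rain_forecast:
        return "Rain is forecast; take an umbrella."
    return None


def on_state_change(prev: Context, cur: Context, rules: List[Rule]) -> List[str]:
    """Run every rule against a state transition and collect suggestions."""
    if prev == cur:
        return []  # no change in state, so no stimulus
    suggestions = []
    for rule in rules:
        message = rule(prev, cur)
        if message is not None:
            suggestions.append(message)
    return suggestions


before = Context(at_work=True, hour=17, usual_line_delayed=True, rain_forecast=False)
after = Context(at_work=False, hour=17, usual_line_delayed=True, rain_forecast=False)
print(on_state_change(before, after, [leaving_work, lunch_rain]))
# prints ['Your usual line has severe delays; try another route.']
```

Keeping each rule a small function of (previous state, current state) makes the trigger explicit: the same `Context` seen twice produces nothing, however relevant it might be.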
But it’s not just mundane chores. What if you stopped reading an article halfway through on your computer but were able to pick up where you left off on your tablet? What if your machine knew you were interested in stories about, for example, Ukraine, and that you already had plenty of background on the conflict there? Could it strip out all the things you already know from the latest piece you’re reading? Or add in context that it knows you don’t have?
The evolution of search
Weitz sees a more evolved version of this as the future of search. Just as Google became a convenient bridge between existing input technology (text entry using keyboards) and the web, the future of search will be a “hinge” that connects the physical and virtual worlds. To get there, technology firms are racing to build analogues to human senses: machine vision, machine hearing, reasoning. It is search, but not as we know it.
Weitz writes: “We must think of search as the omniscient watcher in the sky, aware of everything that is happening on the ground below. For this to happen, search itself needs to be deconstructed into its component tasks: indexing and understanding the world and everything in it; reading senses, so search systems can see and hear (and eventually smell and touch) and interact with us in more natural ways; and communicating with us humans in contextually appropriate ways, whether that’s in text, in speech, or simply by talking to other machines on our behalf to make things happen in the real world.”
Such ideas sound far-fetched, even alarming. But the manner in which the largest tech companies gobble up every last bit of data they can get their hands on suggests that Weitz is far from the only one thinking such thoughts.
For search to become “omniscient,” it requires data. Lots and lots of data. That’s why Google, in 2012, combined all of its privacy policies into one—so that it could also combine disparate data troves into one giant pot. But even mighty Google is restricted to data coming from its own websites and services it provides to third parties. It cannot access data on Facebook—Google doesn’t know what you “liked”—or Amazon or within iOS apps. For search to become truly useful in the way that Weitz and his peers in the industry foresee, all these data sources need to cooperate. That’s not going to happen; Google’s data is its competitive advantage. The same applies to its competitors.
“The islands of information are a huge challenge,” Weitz tells Quartz. “If you think about the forces that hold a lot of this back, that notion of data sharing across stores, no one’s figured it out.” Companies are not going to share customer profiles. Nor do users want them to. Tech industry commentators complain about “silos” and “closed ecosystems” created by the reluctance of, for example, Apple to share information with Google. But these silos are what protect users from having their data widely shared.
Here is my work-in-progress sketch before coloring. The art was inspired by the online game League of Angels.
Control of networks
It’s true that with neural nets, you lose some control. But you don’t lose all of it, says Chris Nicholson, the founder of the deep learning startup Skymind. Neural networks are really just math—linear algebra—and engineers can certainly trace how the numbers behave inside these multi-layered creations. The trouble is that it’s hard to understand why a neural net classifies a photo or spoken word or snippet of natural language in a certain way.
“People understand the linear algebra behind deep learning. But the models it produces are less human-readable. They’re machine-readable,” Nicholson says. “They can retrieve very accurate results, but we can’t always explain, on an individual basis, what led them to those accurate results.”
What this means is that, in order to tweak the behavior of these neural nets, you must adjust the math through intuition, trial, and error. You must retrain them on new data, with still more trial and error. That’s doable, but complicated.
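Nicholson’s point that a neural network is “just math” can be made concrete. The pure-Python sketch below runs a forward pass through a tiny, generic two-layer network, computing relu(W1·x + b1) and then W2·h + b2. The weights are made up, and this is not any model Google uses. Every intermediate number can be traced by hand, yet nothing in the weights states, in human terms, why the input earns the score it gets, which is the readability gap Nicholson describes.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product: each output is a weighted sum of the inputs."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise vector addition (used for the bias terms)."""
    return [ai + bi for ai, bi in zip(a, b)]


def relu(v: Vector) -> Vector:
    """Element-wise non-linearity applied between the two layers."""
    return [max(0.0, vi) for vi in v]


def forward(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    hidden = relu(add(matvec(w1, x), b1))  # layer 1: relu(W1 x + b1)
    return add(matvec(w2, hidden), b2)     # layer 2: W2 h + b2


# Made-up weights: 3 inputs -> 2 hidden units -> 1 output score.
W1 = [[0.5, -1.0, 0.25],
      [1.0, 0.5, -0.5]]
B1 = [0.0, 0.1]
W2 = [[2.0, -1.0]]
B2 = [0.5]

score = forward([1.0, 2.0, 4.0], W1, B1, W2, B2)
print(score)  # the first hidden unit is zeroed out by relu for this input
```

Retraining, in this picture, means changing the numbers in `W1`, `B1`, `W2` and `B2`; the arithmetic stays simple, but the meaning of any single weight does not become any clearer.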
In any event, deep learning has arrived on Google Search. And the company may have used other forms of machine learning in recent years, as well. Though these technologies sacrifice some control, Google believes the benefits outweigh that sacrifice.
The race for the roads
Google recently unveiled a new series of prototype vehicles built from the ground up to be self-driving cars, having previously modified Lexus sport utility vehicles and Toyota hybrid cars for testing purposes.
Its new car aims to completely replace human control with artificial intelligence, reducing controls to a destination selector and a start/stop button. A version with a human driver will be tested on public roads in the near future.
Baidu is taking a more traditional route to the self-driving car. Its head of deep learning, Kai Yu, said last year that the technology it was developing was designed to assist drivers rather than replace them.
The Chinese firm has its own data-mapping service, which is a prerequisite to any automotive robotics project, and invested $10m in the Finnish mapping startup IndoorAtlas in September last year.
It has also undertaken extensive artificial intelligence research, including machine learning and the computer vision technologies needed for cars and other robotics, rivalling that of Google.
But Baidu has one major advantage over its American rivals. Many of the driver-assisted vehicles on the road today, including the Tesla Model S, are technically capable of driving themselves.
While coloring, I made sure to use the corporate colors of each search engine’s logo: Google, Bing, Baidu, Microsoft, Yandex and Yahoo.
Where Yahoo fell off the grid in the search engine race
Google’s sharp contrast with Yahoo on infrastructure offers powerful lessons about building a sustainable business, especially in the rapidly transforming technology landscape.
At the beginning of the new millennium, Google and Yahoo started down very different paths to attain the enormous scale that the growing size and demands of the Internet economy (search, email, maps, etc.) required. For Yahoo, the solution came in the form of NetApp filers, which allowed the company to add server space at a dizzying rate. Almost every service that Yahoo offered ultimately ran on NetApp’s purpose-built storage appliances, which were quick to set up and easy to use, giving Yahoo a fast track to meet market demand (and soon made the company NetApp’s largest customer).
But in nearby Mountain View, Google began work on engineering its own software-defined infrastructure, ultimately known as the Google File System, which would function as a platform that could serve a diverse range of use cases for all the services Google would offer as part of its future ecosystem. Instead of using the latest storage appliances as a foundation, the Google File System used commodity servers to support a flexible and resilient architecture that could solve scalability and resiliency issues once and for all, simplifying and accelerating the future rollout of a wide range of web-scale applications, from maps to cloud storage.
It took four years of ongoing development, and enormous amounts of engineering resources, before the Google File System reached the point where the company used it for mission-critical operations. Meanwhile, Yahoo had been able to add NetApp filers almost immediately to keep up with growing demands for its services. In the race to dominate the Internet landscape, it appeared Yahoo had pulled far ahead.
However, Yahoo’s rapid go-to-market approach also began to show some cracks. As demand continued to expand and diversify, downsides to the appliance-based infrastructure emerged in the form of redundant engineering work, increasingly complex and inefficient environments and finally, mounting vendor costs. When Yahoo added a new service, it needed to re-engineer the NetApp platform for that specific use case.
As a result, identical challenges for separate services, such as Yahoo Search and Yahoo Mail, had to be solved multiple times on different infrastructures. The fragmented infrastructure also exposed greater resource inefficiencies, as each use case required separate server space and compute power that couldn’t be shared across the platform. On top of that, the cost to run NetApp appliances grew as fast as Yahoo did, taking a significant bite out of the company’s revenue.
On the other hand, Google built its file system in anticipation of these challenges, so that adding new use cases or fixing underlying architecture challenges could be done efficiently. After the purchase of YouTube, for example, Google could simply say, “throw away your back-end and we’ll put you on our platform.” Engineers could make upgrades to the underlying architecture once, and the solution would apply across all of Google’s services.
Finally, the flexible platform allowed resources and compute power to be shared across different use cases, so that when servers weren’t busy on search they could be used to process email. It didn’t hurt that all this was built on commodity hardware, whose costs fell in line with Moore’s Law.
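The resource-sharing advantage described above is essentially the difference between provisioning for the sum of each service’s peak and provisioning for the peak of the combined load. The toy Python calculation below illustrates that difference; the time-slot loads and per-server capacity are invented numbers, not Google or Yahoo figures, and real capacity planning involves far more than this.

```python
import math
from typing import Dict, List


def servers_needed(load: float, capacity_per_server: float) -> int:
    """Smallest number of servers that can absorb a given load."""
    return math.ceil(load / capacity_per_server)


def siloed(loads: Dict[str, List[float]], cap: float) -> int:
    # Each service runs on its own hardware, sized for its own peak.
    return sum(servers_needed(max(series), cap) for series in loads.values())


def pooled(loads: Dict[str, List[float]], cap: float) -> int:
    # One shared platform, sized for the peak of the combined load.
    combined = [sum(slot) for slot in zip(*loads.values())]
    return servers_needed(max(combined), cap)


# Hypothetical load (requests/sec) in four time slots; the services peak at different times.
LOADS = {
    "search": [900.0, 400.0, 300.0, 800.0],
    "mail":   [200.0, 700.0, 800.0, 300.0],
}
CAP = 100.0  # hypothetical requests/sec that one commodity server can handle

print("siloed:", siloed(LOADS, CAP), "servers")  # prints siloed: 17 servers
print("pooled:", pooled(LOADS, CAP), "servers")  # prints pooled: 11 servers
```

Because search and email peak at different times in this made-up example, a single shared pool covers both with fewer machines than two separate silos, which is the effect the paragraph describes.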
As the cost and complexity of Yahoo’s underlying infrastructure mounted, the company simply could not afford to match Google’s pace in developing and deploying major new applications.
Russia as a superpower launching a new search engine order?
Yandex NV, Russia’s biggest technology company, has figured out how to avoid nationalisation or a foreign ownership ban. Big Tech in the U.S. should pay attention: The governance scheme Yandex appears to have worked out in consultation with the Russian government could be a good solution for companies that are de facto public utilities under private control.
Yandex, set up in 2000 to monetize a search engine developed in the 1990s by the team of co-founder Arkady Volozh, is as close as it gets in Russia to a Silicon Valley-style internet giant. For a long time, it mainly aped Google’s services for the Russian market, but it has grown into a conglomerate that developed or bought up other businesses, from marketplaces to delivery projects. It’s not just Russia’s Google but Russia’s Amazon and Russia’s Uber, too (it first outcompeted Uber’s Russian operation, then swallowed it up). In fact, when Russian President Vladimir Putin signed a “sovereign internet” law earlier this year, officially meant to keep web services functioning inside Russia should the U.S. cut the country off from the worldwide computer network, many said Yandex would be that “sovereign internet.”
Yandex’s size and its ability to match the tech giants have made the company strategic for the Russian government. As early as 2009, Volozh had to protect Yandex from nationalization or from being taken over by one of Putin’s billionaire friends by issuing a “golden share,” which could block the sale of more than 25% of the company’s stock, to state-controlled Sberbank.
This may read like a distinctively Russian story, in which a group of business founders is trying to avoid a state takeover and the Kremlin prefers not to establish formal control over the national tech champion while keeping a close eye on it. The universities involved in the scheme provide a convenient smokescreen both for the government and for investors.
But what Yandex has done isn’t only relevant within the context of Putin’s Russia. It could be seen as a template for Big Tech, even though Yandex’s market capitalization, at $13.2 billion, is only a fraction of Alphabet Inc.’s ($910.6 billion) or Facebook Inc.’s ($562.9 billion).
Obviously, the tech firms are opposed to such heavy-handed regulation, but what they do on their own only brings them closer to a confrontation with governments, both in the U.S. and in Europe. Facebook’s refusal to police misleading political advertising and Google’s data-sharing practices scream for some kind of state interference.
Like Yandex, the companies could act preemptively to set up governance structures that would veto business ideas viewed as damaging to society’s interests. Vesting veto powers in councils made up of representatives of top universities and nongovernmental organizations could accomplish that purpose. If such a structure can win approval even from an authoritarian regime such as the Russian one (with the caveat that academic institutions in Russia aren’t as independent as those in the West), it could probably satisfy most Big Tech critics in democracies, too.
The alternative, as in Yandex’s case, could be far more restrictive.
In my final completed illustration, you can see the full-color detailing compared with the sketch above. Who is your favorite female ‘Search Engine’ warrior?
How Baidu will steal the show despite onlookers’ low expectations
It’s a common trope to attribute the failure of foreign tech companies in China to some form of government intervention. Observers often readily assume that the ruling Chinese Communist Party will throw hurdles at overseas firms in order to ensure local counterparts triumph. In the case of Google and Baidu, however, that’s only partially true. While Google indeed exited the market due to political grievances, it wasn’t those grievances that dinged its popularity among Chinese consumers.
As authors Sherman So and J. Christopher Westland explain in Red Wired, Baidu successfully won over users in the 2000s largely due to its strategic execution, though the government’s influence still loomed over the rivalry. Broadly, the Beijing-based company’s success can be attributed to the following factors.
Many of China’s successful early web companies reached consumers by inking deals with internet cafe operators. Paying a fee would ensure that a firm’s program appeared on a PC’s desktop, or as the homepage in the default browser. Baidu, keen to increase visibility, paid cafe chains to place its search engine prominently on machines.
Google did this as well, but not as aggressively. As Steven Levy writes in his book In the Plex, internet companies would often pay franchise operators to switch out a rival company’s software with their own. Google refused to engage in this practice and play dirty—which ceded an edge to Baidu on reaching China’s first-time internet users.
The nascency of China’s advertising industry in the 2000s also gave Baidu a leg up over Google. When Google launched AdWords in 2000, it primarily intended it to be a self-service way to advertise brands online—long-tail small businesses could place bids for ads to appear at the top of results for certain keywords, and pay via credit cards.
In China, however, credit card penetration was relatively low, and the internet was a much newer concept to the small businesses that make the ideal AdWords customers. In order to convince such firms to buy ads on Baidu (its system closely mimicked Google’s AdWords), the company employed legions of workers, both in-house and through agents, to make cold calls across China. “Agents taught potential customers the keyword bidding process step-by-step, and when that was too difficult they simply did the work for them. They also collected payment on Baidu’s behalf,” writes So.
While Google also worked with third-party agents to boost ad sales, it had no in-house team of its own. In Baidu’s 2008 annual report (pdf, p. 86), the company reported that 3,855 of its 6,387 employees worked in sales and marketing. By comparison, So notes, Google in China employed only about 800 people overall.
For luring consumers, Baidu also employed tactics that Google typically shunned. The company spent sizable amounts promoting its brand through traditional offline advertising, most notably in 2008 when it paid to become the main sponsor of state broadcaster CCTV’s annual New Year’s gala (imagine Google spending money to sponsor NBC’s New Year’s Eve ball drop). It even aired an advertisement (video) depicting a Chinese wordsmith outwitting a foreigner dressed in a Western suit and top hat, in an obvious jab at its foreign rival.
By the end of 2015, Baidu’s AI algorithms will have surpassed humans in Chinese speech recognition, a feat many observers see as putting the company well ahead of Microsoft.
Combined, the large-cap trio of Baidu, Alibaba and Tencent is estimated to represent a market capitalisation of US$400 billion. This should be a cue for international business leaders to realise that Chinese players are at the forefront of the digital ecosystem. In the race for the future of digital commerce and innovation, whether through search engines or not, China is undoubtedly the one to race against.
Rethinking the way the web works
Perhaps it is possible to have data privacy as well as what Weitz calls “a more capable web.” Imagine a future in which, to use Weitz’s examples, you implicitly “like” something when your pupils dilate and your smart glasses notice, or in which your phone’s microphones are always on, helping companies record noise levels to figure out whether a venue is empty or full, and in which we, as tech veterans, could be OK with that.
This have-your-cake-and-eat-it-too future will not happen without a thorough rethinking of how the web works. Business models need to change. The way users own and delegate data-use rights needs to change. New models and frameworks will need to be created to make all of this work, and that’s without even going into the technological difficulties of making such a future possible.
One idea is to create “attention banks,” where watching ads earns users services in return. (A small-scale version of this exists: Companies like Jana offer mobile phone users in the poor world free airtime in exchange for viewing ads.) Another idea could be to adopt the Hollywood rights model for data use: People can license a company to use their data for a particular purpose for a limited period of time. When the rights expire, the company must delete that data as well as any secondary data created from it.
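The Hollywood-style licensing idea is concrete enough to sketch. The Python below is a minimal, hypothetical model: `License`, `Store`, the record names, purposes and dates are all invented for illustration, and it shows only the bookkeeping rule the paragraph describes (when a license expires, the data and everything derived from it are deleted), not any real company’s system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Set


@dataclass(frozen=True)
class License:
    """Hypothetical grant: one user, one purpose, until an expiry date."""
    user: str
    purpose: str
    expires: datetime


@dataclass
class Store:
    data: Dict[str, str] = field(default_factory=dict)           # record id -> payload
    licenses: Dict[str, License] = field(default_factory=dict)   # primary record -> license
    children: Dict[str, Set[str]] = field(default_factory=dict)  # record -> records derived from it

    def collect(self, rid: str, payload: str, lic: License) -> None:
        """Store primary data obtained under a license."""
        self.data[rid] = payload
        self.licenses[rid] = lic

    def derive(self, rid: str, payload: str, parents: List[str]) -> None:
        """Store secondary data and remember which records it came from."""
        self.data[rid] = payload
        for parent in parents:
            self.children.setdefault(parent, set()).add(rid)

    def _delete_tree(self, rid: str, deleted: Set[str]) -> None:
        # Skip records already removed (e.g. reachable from two expired sources).
        if rid in deleted or rid not in self.data:
            return
        deleted.add(rid)
        del self.data[rid]
        self.licenses.pop(rid, None)
        for child in self.children.pop(rid, set()):
            self._delete_tree(child, deleted)

    def expire(self, now: datetime) -> Set[str]:
        """Delete every record whose license has lapsed, plus all data derived from it."""
        deleted: Set[str] = set()
        for rid, lic in list(self.licenses.items()):
            if lic.expires <= now:
                self._delete_tree(rid, deleted)
        return deleted


t0 = datetime(2024, 1, 1)
store = Store()
store.collect("steps", "daily step counts", License("alice", "fitness ads", t0 + timedelta(days=30)))
store.collect("thermostat", "night-time settings", License("alice", "energy tips", t0 + timedelta(days=365)))
store.derive("profile", "active, likes warm rooms", parents=["steps", "thermostat"])
store.derive("ad_segment", "runner", parents=["profile"])

deleted = store.expire(t0 + timedelta(days=31))
print(sorted(deleted))     # prints ['ad_segment', 'profile', 'steps']
print(sorted(store.data))  # prints ['thermostat']
```

Tracking derivations as a graph is what makes the rule enforceable. Here a derived record is deleted as soon as any one of its sources expires, which is one (conservative) reading of “any secondary data created from it”.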
The future of search engines faces a myriad of potential setbacks. Legislation arguably represents the key battleground for determining the future of search engines. Professor Helen Margetts of the Oxford Internet Institute believes that the real question is whether existing laws and regulations on issues such as fraud, copyright, libel, data protection and freedom of expression can be effectively enforced online. In some areas new bilateral agreements are emerging which may amount to international agreement, such as consensus around measures tackling child abuse images online.
Companies of a certain size are hampered in their growth as they inevitably mature and stand still. The investor base changes, as does the workforce. Top talent leaves to chase the next big thing or even to hit the beach. To stay truly cutting-edge, Google and Microsoft need the best engineers to keep flying through the doors and their shareholders to stay patient. Everyone else should stay tuned to see what the next huge global phenomenon in search will be.