Hadoop Summit: Yahoo Gathers the Stuffed Elephant Crowd

3/28/2008

Yahoo hosted the first-ever Apache Hadoop Summit this week in Santa Clara, CA. The day-long event presented a program of speakers from the Hadoop developer and user communities, including representatives from Yahoo, IBM, Microsoft, Facebook, Google, and the University of California, Berkeley, among others.

The event drew around 500 attendees, though organizers were unsure of the exact number. Caught off guard by the turnout, they had to change venues to accommodate a standing-room-only crowd.

"We organized the summit because we've been investing a lot in Hadoop ourselves, and we knew there was a large community of Hadoop users out there that mostly haven't met each other," said Yahoo Technical Evangelist Jeremy Zawodny. "I guess it was larger than we thought."

The Hadoop Framework is an open source, Java-based distributed computing platform designed to allow implementations of MapReduce to run on large clusters of commodity hardware. Google's MapReduce is a programming model for processing and generating large data sets. It supports parallel computations over large data sets on unreliable computer clusters.
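For readers unfamiliar with the model, the following is a minimal sketch of the MapReduce idea, written in plain Python rather than against Hadoop's own Java API. It counts words across a handful of documents in a single process; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, not part of Hadoop or Google's implementation.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence (hypothetical single-process driver)."""
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["the elephant", "The crowd the"]))
```

On a real cluster, the map and reduce calls would be spread across many machines and the framework would handle grouping, scheduling, and retrying failed tasks; this sketch only shows the shape of the computation.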

Yahoo hired Hadoop's creator, Doug Cutting, early last year to work full-time on the framework. Cutting created the Lucene open source information retrieval library and, with Mike Cafarella, the Nutch open source search engine based on it. Both projects are now managed through the Apache Software Foundation.

"The momentum around Hadoop is growing every day," Cutting said. "It's really exciting to watch."

Cutting called Yahoo's resource commitment to the Hadoop framework "considerable," but offered no details. Yahoo has made a very public commitment to Hadoop. In February, it launched what company representatives claimed was the world's largest Hadoop production application. Called the Yahoo Webmap, the application runs on a 10,000-plus-core Linux cluster and produces data used in every Yahoo Web search query, according to company literature.

The initial intended use of Hadoop within Yahoo was to support Web search, Cutting said, by building the Web search index and maintaining that massive collection of data. But although it is making the Yahoo search engine more easily scalable and reliable, he said, the majority of in-house users are actually employing Hadoop for data exploration.

"It turns out that there are all these other people within the company who want to be able to access and analyze these massive data sets -- access logs, event logs, Web and geographic data -- and use them to improve the Web search software itself," Cutting said. "So they're using Hadoop for analysis to improve the software, as opposed to actually implementing the Web search. That's where we're seeing the big payoff."


