Startups Drive Crowdsourcing - WSJ, 2/28/2012

“The risk is that someone getting paid for just a few hours of work isn’t going to do a great job,” he says. “You need to build in quality control.”

ENTER: HUMANOID. Scroll down to read our case study - Fraud and Accuracy on mTurk.

Accuracy and Fraud on mTurk (Case Study)

Background

Humanoid uses proprietary quality assurance and cost optimization technology to provide scalable, high-quality crowdsourced labor and outsourcing services to businesses. The company first built SpeakerText, a successful video transcription service built atop Amazon Mechanical Turk, then repurposed the core workforce management technology to do tasks beyond transcription. The result is Humanoid. 

*****************************************************************************************

Overview

Humanoid was recently contracted to perform 3,000 distinct data entry tasks for Fooducate, a mobile app company. As a result, Humanoid hired 751 unique workers from Amazon’s Mechanical Turk labor marketplace.

Through algorithmic analysis, Humanoid’s software identified 61 workers (8% of the workforce) engaged in fraud and banned them as a result. Once fraudsters were weeded out, mTurk workers generated 78% “per data field” accuracy.

Through applying various quality assurance and improvement techniques, Humanoid ultimately delivered an accuracy of 98% to Fooducate.

*****************************************************************************************

Main Narrative

Fooducate, a mobile food nutrition app company, approached the Humanoid team with a problem: They had a backlog of 500,000 user-generated food label photos. Fooducate needed these labels transcribed and entered into their food database. The Fooducate team had tried alternative solutions and found them wanting.

We accepted the job.

After an initial phone call, Fooducate sent over a sample dataset. We designed a task template for them and ran it through our QA system. This did not work. The way we had structured and formatted the data did not match Fooducate’s formatting needs. Fail.

In November 2011, we gave the Fooducate team access to Humanoid’s newly launched drag ‘n drop task template creator. Using our tools, the Fooducate team designed their own version of a task template. The results were mostly good, but the task instructions turned out to be incomplete, resulting in improper labeling of some ingredients (e.g. capitalization of proper nouns that they wanted to be input as lowercase).

[Side Note: Improper or unclear directions, not failures of technology, are the most common issue our customers have.]

By January 2012, Fooducate had worked out all the kinks in its task instructions and put through a batch of 3,000 food labels. Each label included 3 photos from different angles. 

*****************************************************************************************

Data

Humanoid turned Fooducate’s 3,000 tasks into 15,169 mTurk HITs. 751 unique mTurk workers accepted the work. We priced tasks at $0.03 per HIT for label transcription and $0.01 per HIT for transcript verification and review.

  • 3,000 tasks
  • 15,169 mTurk HITs
  • $0.035 per label transcription HIT
  • $0.015 per transcript review HIT

For some perspective, this task pricing is deemed “very low” and, as the kind folks at Amazon have warned us, tends to attract lots of scammers. 

The results? Humanoid automatically banned 61 workers––8% of the 751 workers––for deliberate fraud. This, mind you, is on top of mTurk’s own non-trivial efforts to filter out scammers. Fraud is a huge problem on Mechanical Turk and, based on conversations we’ve had with the team there, one they’re making heroic efforts to combat.

  • 751 unique mTurk workers
  • 8% engaged in fraud (61 workers)

With the fraudsters removed, a first pass of Mechanical Turk workers produced a “per field” accuracy level of 78%. This means that 22 out of every 100 ingredients entered into the Fooducate database would have been wrong.

Using Humanoid’s combined system of worker reputation and result verification on top of Mechanical Turk, we improved Fooducate’s per field accuracy to 98%. The results were returned to them within 36 hours of submission.

  • 78% mTurk accuracy after fraud filtering 
  • 98% final Humanoid accuracy

Importantly, the initial mTurk results would have been much, much worse had fraudsters not been weeded out, probably in the 50% accuracy range.
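
To make the figures above concrete, here is a small worked calculation. It uses only the numbers quoted in this case study. The variable names are ours, and it is an arithmetic sketch rather than part of Humanoid's system.

```python
# Figures quoted in the case study above.
workers_total = 751
workers_banned = 61
first_pass_accuracy = 0.78  # per-field accuracy after fraud filtering
final_accuracy = 0.98       # per-field accuracy delivered to the customer

# Share of the workforce banned for fraud.
fraud_share = workers_banned / workers_total

# Expected number of wrong fields per 100 fields entered.
first_pass_errors = round((1 - first_pass_accuracy) * 100)
final_errors = round((1 - final_accuracy) * 100)

print(f"Banned share of workforce: {fraud_share:.1%}")
print(f"Wrong fields per 100, first pass: {first_pass_errors}")
print(f"Wrong fields per 100, final: {final_errors}")
print(f"Error reduction factor: {first_pass_errors / final_errors:.0f}x")
```

Read this way, going from 78% to 98% accuracy means going from 22 wrong fields per 100 to 2.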

*****************************************************************************************

Comparative Accuracy: In-House Temp Workers

How good are Humanoid’s results? Back in November 2011, we hired a team of temp workers from 3 local staffing agencies to work out of our San Francisco office on Fooducate and related tasks.  

On average, the temps worked 8 hours a day, earning $12-15/hour (we paid the temp agency $21-23/hour). These workers had been pre-screened and tested by the temp agency for computer, language and typing skills.

  • 15 temp workers
  • $22/hour paid to temp agency
  • $14/hour paid to workers
  • 86% average temp worker accuracy
  • 20% dismissed for performance
  • 22% lower productivity in the afternoon

Ultimately, we dismissed 20% of the temps (3 of 15): one for gross incompetence (didn’t know how to use the internet) and two for poor performance. The remaining workers produced an average per data field accuracy of 86%.

Interestingly, the temps completed 22% fewer tasks in the afternoon (on average) than in the morning, clearly demonstrating the effects of fatigue and/or boredom on productivity. 

Conclusion

With algorithmic supervision and quality assurance in place, Mechanical Turk can actually produce results superior to a pre-screened, human-supervised workforce. However, without an intelligent system of algorithmic supervision, Mechanical Turk customers can expect low accuracy, fraud-filled results.

Buyer beware.

Interested in giving Humanoid a spin? Get started now.


“Big Firms Try Crowdsourcing,” 1/17/12, WSJ

Revenues of business-focused crowdsourcing firms grew 74% between 2010 and 2011, and 53% a year earlier, according to preliminary data from 14 large crowdsourcing firms with total revenues of about $50 million, by Crowdsourcing.org, a research firm which tracks the industry. Companies that have assigned work to the crowd say it is generally cheaper and faster than hiring temps or traditional outsourcing firms.

Douglas Rushkoff, http://snip.it/s/12jt

As we move into an increasingly digital reality, we must learn not just how to use programs but how to make them. In the emerging, highly programmed landscape ahead, you will either create the software or you will be the software. It’s really that simple: Program, or be programmed.

Our CTO, Matt Swanson

Matthew Swanson had a passion for both artificial intelligence and entrepreneurship. Now just two years out of school, the Carnegie Mellon alum’s startup, SpeakerText, has spawned Humanoid, a revolutionary new venture with backing from Google Ventures.

“I’m fascinated with modeling the human brain,” said Swanson, who earned his master’s at CMU’s Robotics Institute (RI). “I naturally chose Carnegie Mellon because it has the top researchers and facilities.”

“Entrepreneurship is in my blood. I knew I was going to start a tech company and CMU was the bridge in getting me there.”

from “Cloud Work”

“Return of the Human Computers,” The Economist

“Over the past few years, human computing has been reborn. The new generation of human computers carry out different tasks, but they mirror their predecessors in many other ways. They are being drafted in to perform tasks that computers cannot. They are employed in large numbers and are organised into streamlined workflows. And, as was the case in the age before electronic computers, their output is combined to generate results that could not easily be produced in any other way.”

Want a Human Computer of your own? Humanoid is at your service. 


from “Return of the Human Computers” - The Economist, 3 December 2011

Mentorship time w/ Humanoid co-founder Tyler Kieft and Netflix co-founder Marc Randolph.


“We launched SpeakerText, and it took us about a year and a half to get actual quality results from Mechanical Turk,” Mireles told VentureBeat. The problem, Mireles said, is that it was extremely difficult to ensure the quality of an anonymous, distributed workforce. For every dollar the team spent on labor on Mechanical Turk, it had to spend two dollars on quality assurance and cleanup. Everyone is trying to game the system, because there’s no accountability.

“Mechanical Turk is a marketplace with no sheriff,” Mireles said.

-from Humanoid puts Human Brainpower to Work in the Cloud

Software can now be truly intelligent. Think about that.

Humanoid is alive!

SAN FRANCISCO  –– Humanoid launches the first human brainpower API that actually works. The Internet service, built by a team of Carnegie Mellon robotics researchers, offers computer programmers a reliable way to put human intelligence into software applications.

 
Humanoid offers a drag-and-drop interface that allows engineers to create tasks and send instructions to humans. Proprietary software then breaks the tasks into small pieces and routes them to workers across the globe. Meanwhile, Humanoid’s artificially intelligent workforce manager ensures accuracy and prevents fraud without human oversight. Developers send and receive the resulting data through a simple programming interface, known as an API.

Humanoid rents out its robot-supervised army of 20,000+ workers for $4.99/hour.

In the past, if an engineer wanted to integrate human labor into an app––to transcribe handwriting in a photo, for example––the developer would not only have to hire human workers, but also hire human managers to ensure quality. It was time consuming, expensive, and hard to do. As a result, few developers tried, and the ones that did found scaling impossible.

“Through software, we’re replicating what good bosses have done since the beginning of time,” says Matt Mireles, Humanoid’s co-founder and CEO. “We look at each person’s track record and we ask the question: ‘How well do I know this worker? Should I trust him, or do I need to verify the work?’ Newbies get more scrutiny; people with a strong track record get less––bad people get blocked.”

Artificial Intelligence Goes Mainstream
Humanoid is part of a new trend: Companies in California are bringing artificial intelligence out of the research labs and into the mainstream. Where Humanoid has invented the automated workforce manager, Apple recently launched Siri, an automated personal assistant for the iPhone.

Humanoid uses statistics to predict the accuracy of completed tasks by analyzing worker history and a host of variables like time of day, location and indicators of worker fatigue. If Humanoid’s algorithms suspect that the result is not up to the company’s exacting standards, the task is routed to a more trusted worker for review and, if applicable, fixing.
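
The routing idea described above can be illustrated with a minimal sketch. This is not Humanoid's proprietary algorithm. It assumes a hypothetical `WorkerRecord` that tracks only verified results, a simple smoothed accuracy estimate, and made-up thresholds. The other variables mentioned above (time of day, location, fatigue) are left out.

```python
from dataclasses import dataclass


@dataclass
class WorkerRecord:
    """Hypothetical track record: how many verified results were correct."""
    correct: int = 0
    checked: int = 0

    def estimated_accuracy(self, prior_correct: float = 1.0,
                           prior_total: float = 2.0) -> float:
        # Smoothed estimate: a worker with no history starts at 0.5
        # instead of an undefined 0/0.
        return (self.correct + prior_correct) / (self.checked + prior_total)


def route(record: WorkerRecord, review_below: float = 0.9,
          block_below: float = 0.6, min_history: int = 20) -> str:
    """Decide what to do with a worker's result (thresholds are assumptions)."""
    accuracy = record.estimated_accuracy()
    # Only block once there is enough history to judge the worker.
    if record.checked >= min_history and accuracy < block_below:
        return "block"
    # Unknown or middling workers get their results verified.
    if accuracy < review_below:
        return "review"
    return "accept"


newbie = WorkerRecord()
veteran = WorkerRecord(correct=196, checked=200)
careless = WorkerRecord(correct=100, checked=200)

print(route(newbie))    # no history yet, so the work is verified
print(route(veteran))   # strong track record, so less scrutiny
print(route(careless))  # long, poor track record
```

The smoothing prior is one way to give newcomers more scrutiny without blocking them outright. Any real system would need to tune these numbers against verified data.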

Founding Story
Humanoid had its roots in SpeakerText, an application to transcribe YouTube videos using a combination of speech recognition and Internet workers. Ensuring accuracy turned out to be a huge problem for the company and led to cost overruns and countless headaches.

“We were spending twice as much money hiring people off the street to fix mistakes as we were paying out to our Internet workers,” says Mireles. “We tried Mechanical Turk. We tried oDesk. We even looked at CrowdFlower. Nothing could give us the quality we needed, so we built the solution ourselves.”

In the end, it took the company eighteen months to build an autonomous workforce manager. “We are at the frontier of machine learning,” explains Humanoid’s Chief Technology Officer Matt Swanson, a graduate of the Robotics Institute at Carnegie Mellon University. “The technology problem is very difficult. Web search was in the same position in 1996. Until now, no one had solved this.”

Once Humanoid’s founders realized the general applicability of their solution, they sensed opportunity and immediately transitioned the company away from simply transcribing videos to opening up their human brainpower API to the masses.

“We saw all these companies at the stage we were at a year ago, seduced by this awesome idea of ‘labor in the cloud,’ but stymied by the disappointing reality of existing technologies,” explains Mireles. “We said to ourselves: ‘They shouldn’t have to go through everything that we did. We’ve found a better way.’ And so we decided to pivot.”

Not an Easy Problem
Humanoid is not the first company to try this idea. In 2006, Amazon launched Mechanical Turk, an experimental marketplace for micro-work that it dubbed “artificial artificial intelligence.” Though the service created much excitement amongst the techie crowd, it has remained a fringe offering due to endemic fraud and quality issues.

In 2009, CrowdFlower launched with a promise to create order from the chaos of Mechanical Turk by giving the same tasks to several workers and checking for agreement. However, the company has since targeted a niche market of enterprise customers and now focuses almost exclusively on high-end consulting.

Similarly, oDesk, an outsourcing marketplace started in 2003, started to grow like a weed after the financial crisis. However, like mTurk, oDesk leaves quality management to the customer, causing widespread frustration. In academia, for example, many grad students use oDesk to enter data into spreadsheets. The standard practice is to hire three oDesk workers to do a task, and only accept the result if they all produce the same result.

“It’s a massive headache,” says Stanford PhD candidate Jennifer Doleac. “oDesk sounds great on paper, but to get good results you need to invest a ton of time and energy. I wish there was a better way.”
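
The three-worker practice described above is simple to express in code. The sketch below is an illustration only. The function names are ours, and the normalization step (ignoring case and extra whitespace) is an assumption about what counts as "the same result".

```python
from typing import Optional, Sequence


def normalize(value: str) -> str:
    # Collapse whitespace and case so trivial differences
    # do not count as disagreement.
    return " ".join(value.split()).lower()


def accept_if_unanimous(answers: Sequence[str]) -> Optional[str]:
    """Return the agreed answer if every worker's answer matches, else None."""
    if not answers:
        return None
    distinct = {normalize(answer) for answer in answers}
    return normalize(answers[0]) if len(distinct) == 1 else None


agreed = accept_if_unanimous(["Whole Milk", "whole  milk", "WHOLE MILK"])
rejected = accept_if_unanimous(["12 g", "12 g", "15 g"])

print(agreed)    # all three workers match after normalization
print(rejected)  # one worker disagrees, so the task must be redone
```

The cost of this approach is visible in the structure: every task is paid for three times, and a single disagreement sends it back for rework.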

Humanoid is initially available on an invite-only basis. Sign up now at: http://getHumanoid.com

For more information, send an email to: info at get humanoid.com