Background
Humanoid uses proprietary quality assurance and cost optimization technology to provide scalable, high-quality crowdsourced labor and outsourcing services to businesses. The company first built SpeakerText, a successful video transcription service built atop Amazon Mechanical Turk, then repurposed the core workforce management technology to do tasks beyond transcription. The result is Humanoid.
*****************************************************************************************
Overview
Humanoid was recently contracted to perform 3,000 distinct data entry tasks for Fooducate, a mobile app company. As a result, Humanoid hired 751 unique workers from Amazon’s Mechanical Turk labor marketplace.
Through algorithmic analysis, Humanoid’s software identified 61 workers (8% of the workforce) engaged in fraud and banned them as a result. Once fraudsters were weeded out, mTurk workers generated 78% “per data field” accuracy.
Through applying various quality assurance and improvement techniques, Humanoid ultimately delivered an accuracy of 98% to Fooducate.
*****************************************************************************************
Main Narrative
Fooducate, a mobile food nutrition app company, approached the Humanoid team with a problem: They had a backlog of 500,000 user-generated food label photos. Fooducate needed these labels transcribed and entered into their food database. The Fooducate team had tried alternative solutions and found them wanting.
We accepted the job.
After an initial phone call, Fooducate sent over a sample dataset. We designed a task template for them and ran it through our QA system. This did not work. The way we had structured and formatted the data did not match Fooducate’s formatting needs. Fail.
In November 2011, we gave the Fooducate team access to Humanoid’s newly launched drag ‘n drop task template creator. Using our tools, the Fooducate team designed their own version of a task template. The results were mostly good, but the task instructions turned out to be incomplete, resulting in improper labeling of some ingredients (e.g. capitalization of proper nouns that they wanted to be input as lowercase).
[Side Note: Improper or unclear directions, not failures of technology, are the most common issue our customers have.]
By January 2012, Fooducate had worked out all the kinks in its task instructions and put through a batch of 3,000 food labels. Each label included 3 photos from different angles.
*****************************************************************************************
Data
Humanoid turned Fooducate’s 3,000 tasks into 15,169 mTurk HITs. 751 unique mTurk workers accepted the work. We priced tasks at $0.03 per HIT for label transcription and $0.01 per HIT for transcript verification and review.
- 3,000 tasks
- 15,169 mTurk HITs
- $0.035 per label transcription HIT
- $0.015 per transcript review HIT
For some perspective, this task pricing is deemed “very low” and, as the kind folks at Amazon have warned us, tends to attract lots of scammers.
The results? Humanoid automatically banned 61 workers (8% of the 751 workers) for deliberate fraud. This, mind you, is on top of mTurk’s own non-trivial efforts to filter out scammers. Fraud is a huge problem on Mechanical Turk and, based on conversations we’ve had with the team there, one they’re making heroic efforts to combat.
- 751 unique mTurk workers
- 8% engaged in fraud (61 workers)
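For readers who want to check the arithmetic, here is a quick sketch that derives a few ratios from the figures quoted above. The variable names are ours, invented for illustration; only the four input numbers come from the job itself.

```python
# Figures quoted in this post.
TASKS = 3_000      # food labels submitted by Fooducate
HITS = 15_169      # mTurk HITs generated from those tasks
WORKERS = 751      # unique mTurk workers who accepted work
BANNED = 61        # workers banned for deliberate fraud


def ratio(numerator: int, denominator: int) -> float:
    """Plain division with a guard against an empty denominator."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


hits_per_task = ratio(HITS, TASKS)      # HITs needed per label, on average
hits_per_worker = ratio(HITS, WORKERS)  # average load, banned workers included
fraud_share = ratio(BANNED, WORKERS)    # fraction of the workforce banned

print(f"HITs per task:   {hits_per_task:.2f}")    # 5.06
print(f"HITs per worker: {hits_per_worker:.1f}")  # 20.2
print(f"Fraud share:     {fraud_share:.1%}")      # 8.1%
```

In other words, each label passed through roughly five HITs on average, and the 8% fraud figure is 61 / 751 rounded down from about 8.1%.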
With the fraudsters removed, a first pass of Mechanical Turk workers produced a “per field” accuracy level of 78%. This means that 22 out of every 100 ingredients entered into the Fooducate database would have been wrong.
Using Humanoid’s combined system of worker reputation and result verification on top of Mechanical Turk, we improved Fooducate’s per field accuracy to 98%. The results were returned to them within 36 hours of submission.
- 78% mTurk accuracy after fraud filtering
- 98% final Humanoid accuracy
Importantly, the initial mTurk results would have been much, much worse had fraudsters not been weeded out, probably in the 50% accuracy range.
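To give a feel for what "worker reputation and result verification" can mean in practice, here is a minimal sketch of one generic approach: collect several answers per field, weight each by the worker's reputation, and keep the heaviest answer. This is not Humanoid's actual system, which is proprietary and not described in this post. Every name, weight and default below is a hypothetical chosen for illustration.

```python
from collections import defaultdict


def weighted_vote(answers: dict[str, str], reputation: dict[str, float]) -> str:
    """Return the answer backed by the highest total worker reputation.

    answers:    worker id -> value that worker entered for one field.
    reputation: worker id -> weight in [0, 1]; unknown workers get 0.5
                (an arbitrary default assumed for this sketch).
    """
    if not answers:
        raise ValueError("need at least one answer")
    totals: dict[str, float] = defaultdict(float)
    for worker, value in answers.items():
        # Strip stray whitespace so "sugar " and "sugar" count as one answer.
        totals[value.strip()] += reputation.get(worker, 0.5)
    # Sorting first makes ties resolve alphabetically, so results are stable.
    return max(sorted(totals), key=lambda value: totals[value])


def per_field_accuracy(entered: list[str], truth: list[str]) -> float:
    """Fraction of fields whose entered value exactly matches the truth."""
    if len(entered) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(1 for e, t in zip(entered, truth) if e == t)
    return correct / len(truth)


# Three hypothetical workers transcribe the same ingredient field.
answers = {"w1": "sugar", "w2": "suger", "w3": "sugar "}
reputation = {"w1": 0.9, "w2": 0.4, "w3": 0.7}
print(weighted_vote(answers, reputation))  # sugar

# One wrong field out of four gives 75% per-field accuracy.
entered = ["sugar", "salt", "wheat flour", "watr"]
truth = ["sugar", "salt", "wheat flour", "water"]
print(per_field_accuracy(entered, truth))  # 0.75
```

The point of the sketch is only that redundancy plus reputation lets a system outvote an individual worker's mistakes, which is how a 78% first pass can end up far higher after review.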
*****************************************************************************************
Comparative Accuracy: In-House Temp Workers
How good are Humanoid’s results? Back in November 2011, we hired a team of temp workers from 3 local staffing agencies to work out of our San Francisco office on Fooducate and related tasks.
On average, the temps worked 8 hours a day, earning $12-15/hour (we paid the temp agency $21-23/hour). These workers had been pre-screened and tested by the temp agency for computer, language and typing skills.
- 15 temp workers
- $22/hour paid to temp agency
- $14/hour paid to workers
- 86% average temp worker accuracy
- 20% dismissed for performance
- 22% lower productivity in the afternoon
Ultimately, we dismissed 20% of the temps (3 of 15): one for gross incompetence (didn’t know how to use the internet) and two for poor performance. The remaining workers produced an average per data field accuracy of 86%.
Interestingly, the temps completed 22% fewer tasks in the afternoon (on average) than in the morning, which points to the effects of fatigue and/or boredom on productivity.
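Accuracy percentages can hide how big the gaps really are, so here is a short sketch that restates the three accuracy figures from this post as errors per 100 fields, along with the agency markup implied by the $22 and $14 hourly averages. The labels and helper name are ours, and the temps worked on "Fooducate and related tasks", so treat the comparison as indicative rather than strictly like-for-like.

```python
# Per-field accuracy figures quoted in this post.
ACCURACY = {
    "mTurk after fraud filtering": 0.78,
    "in-house temps": 0.86,
    "Humanoid final": 0.98,
}

AGENCY_RATE = 22  # average $/hour paid to the temp agency
WORKER_RATE = 14  # average $/hour the temp worker received


def errors_per_100(accuracy: float) -> int:
    """Convert an accuracy fraction into wrong fields per 100 entered."""
    return round((1 - accuracy) * 100)


errors = {name: errors_per_100(acc) for name, acc in ACCURACY.items()}
for name, count in errors.items():
    print(f"{name}: {count} errors per 100 fields")

# How many times more errors each alternative makes than the final result.
temps_vs_humanoid = errors["in-house temps"] / errors["Humanoid final"]
mturk_vs_humanoid = errors["mTurk after fraud filtering"] / errors["Humanoid final"]
print(f"Temps vs. Humanoid:     {temps_vs_humanoid:.0f}x the errors")  # 7x
print(f"Raw mTurk vs. Humanoid: {mturk_vs_humanoid:.0f}x the errors")  # 11x

# Share of the worker's wage that the agency adds on top.
agency_markup = (AGENCY_RATE - WORKER_RATE) / WORKER_RATE
print(f"Agency markup over worker pay: {agency_markup:.0%}")  # 57%
```

Seen this way, the step from 86% to 98% is not a 12-point bump but a sevenfold drop in errors.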
*****************************************************************************************
Conclusion
With algorithmic supervision and quality assurance in place, Mechanical Turk can actually produce results superior to a pre-screened, human-supervised workforce. However, without an intelligent system of algorithmic supervision, Mechanical Turk customers can expect low accuracy, fraud-filled results.
Buyer beware.
Interested in giving Humanoid a spin? Get started now.