The secret behind the Chinese teams winning the Microsoft COCO Challenge

Teams from China have won the Microsoft COCO Challenge for two consecutive years. This article analyzes the real secrets of their success.

Introduction to Microsoft COCO

Microsoft Common Objects in Context (COCO) is a competition built on the Microsoft COCO dataset, released in 2014, and it is one of the most popular and authoritative competitions in computer vision. Since the ImageNet competition ended, the COCO competition has become the most authoritative and important benchmark in object recognition and detection. It is the only competition that brings together technology companies such as Google, Microsoft and Facebook, other innovative companies, and top universities from many countries.

The tasks of the COCO 2018 Challenge are the Object Detection Task, the Panoptic Segmentation Task, the Person Keypoints Detection Task and the DensePose Task.

1. COCO Object Detection Task

The COCO Object Detection Task aims to advance the state of the art in object detection. As object detection technology has matured in recent years, COCO is no longer limited to bounding box detection. While the bounding box leaderboard remains open, bounding box detection is no longer the main challenge; instead, the competition encourages researchers to focus on instance segmentation, which is more challenging and more visually informative.
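To illustrate why instance segmentation is the more informative task, here is a minimal sketch in plain Python. It is not the official COCO evaluation code; the helper names (`box_iou`, `mask_iou`, `bounding_box`) and the toy masks are assumptions made for this example. It compares a ground-truth object and a prediction using intersection over union (IoU), first on their bounding boxes and then on their pixel masks.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def mask_iou(a, b):
    """IoU of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def bounding_box(mask):
    """Tightest (x1, y1, x2, y2) box around a pixel mask (hypothetical helper)."""
    rows = [r for r, _ in mask]
    cols = [c for _, c in mask]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


# Toy data: an L-shaped object, and a prediction shaped like the opposite corner.
truth = {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)}
pred = {(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)}

print(box_iou(bounding_box(truth), bounding_box(pred)))  # 1.0
print(mask_iou(truth, pred))                             # 0.25
```

The two shapes share the same bounding box, so a box-based score calls the prediction perfect, while the mask-based score shows that most of the pixels are wrong.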

2. COCO Panoptic Segmentation Task

The goal of the COCO Panoptic Segmentation Task is to advance the state of the art in scene segmentation. Panoptic segmentation has to handle both thing classes (countable objects such as people and cars) and stuff classes (amorphous regions such as sky and grass), so it unifies two tasks that are usually treated separately: semantic segmentation and instance segmentation. “Panoptic” means “including everything that is visible in one view”, which here refers to a unified, global view of segmentation.
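The idea of unifying the two tasks can be sketched as follows. This is a simplified illustration, not the official COCO panoptic format or API: the function name `merge_panoptic`, the nested-list image representation and the class names are assumptions. Every pixel receives a class label, and pixels of countable "thing" classes also receive an instance id, while amorphous "stuff" classes do not.

```python
def merge_panoptic(semantic, instance, thing_classes):
    """Combine a semantic map and an instance map into one panoptic map.

    semantic: 2D list of class labels, one per pixel.
    instance: 2D list of instance ids, one per pixel.
    thing_classes: set of labels that are countable objects.

    Returns a 2D list of (class_label, instance_id) pairs. Pixels of
    non-thing ("stuff") classes always get instance id 0.
    """
    panoptic = []
    for sem_row, inst_row in zip(semantic, instance):
        row = []
        for label, inst in zip(sem_row, inst_row):
            row.append((label, inst if label in thing_classes else 0))
        panoptic.append(row)
    return panoptic


# Toy 2x2 image: sky on top, two different people below.
semantic = [["sky", "sky"],
            ["person", "person"]]
instance = [[0, 0],
            [1, 2]]

for row in merge_panoptic(semantic, instance, {"person"}):
    print(row)
```

Semantic segmentation alone could not tell the two people apart, and instance segmentation alone would say nothing about the sky; the merged map answers both questions for every pixel.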

3. COCO Person Keypoints Detection Task

The COCO Person Keypoints Detection Task requires localizing “person keypoints” under challenging, uncontrolled conditions. The task involves simultaneously detecting people and localizing their keypoints, without the person’s location being given in advance.
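As a rough illustration of what a keypoint annotation looks like, COCO stores the keypoints of one person as a flat list of (x, y, v) triples, where v is a visibility flag: 0 for not labeled, 1 for labeled but not visible, and 2 for labeled and visible. The sketch below uses plain Python rather than the COCO API; the helper names and the sample values are made up for this example.

```python
def parse_keypoints(flat):
    """Split a flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoint list length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def labeled_count(flat):
    """Number of keypoints that were labeled (visibility flag above 0)."""
    return sum(1 for _, _, v in parse_keypoints(flat) if v > 0)


# Hypothetical annotation with three keypoints: visible, unlabeled, occluded.
sample = [10, 20, 2, 0, 0, 0, 30, 40, 1]

print(parse_keypoints(sample))
print(labeled_count(sample))
```

A detector for this task has to produce such a list for every person it finds, which is why the task combines detection and localization.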

4. COCO DensePose Task

The COCO DensePose Task requires localizing dense person keypoints under challenging, uncontrolled conditions. The task involves simultaneously detecting people, localizing their dense keypoints, and mapping all human pixels to the 3D surface of the human body.

The incredible challenge scores of the Chinese teams

Not long ago, the results of the 2018 COCO competition were released on the official ECCV website: Chinese teams won all six tasks!

  • Megvii Technology Limited (Megvii) won the following four tasks: Object Detection Task, Panoptic Segmentation Task, Person Keypoints Detection Task and Mapillary Panoptic Task.
  • The School of Automation at Beijing University of Posts and Telecommunications (BUPT-PRIV) won the DensePose Task.
  • DiDi Map Vision (Didi) won the Mapillary Detection Task.

Why are the Chinese teams so 'powerful'?

Such a result in the Microsoft COCO Challenge is amazing, and it makes one curious. Is it because the Chinese are getting smarter? I don’t think so.

Access to data is not subject to privacy restrictions

China has weak privacy protection and is the region with the most serious privacy leaks in the world, because Chinese users have no right to control their own private data. Prior to 2015, there were no practical privacy regulations in China, so any company could obtain and use personal data without the person’s consent. It was not until 2015 that new clauses were added to the relevant law, stating that some private data is off limits to companies, at least without the permission of the government.

Users usually agree to authorize and transfer their personal information to a company only because they are forced to. Essentially, they have no way to prevent it: if you refuse to let a company collect your data, it will refuse to provide you with any service, and you may not even be able to run its app. It sounds incomprehensible, but it is a fact in China.

At a time when Facebook was being exposed for the illegal use of 50 million users’ data, Robin Li, founder of Baidu, a technology company based in China, gave the following speech at a high-level forum:

“Chinese people are more open, or less sensitive, about the privacy issue. If they can trade privacy for convenience or efficiency, they are willing to do so in many cases.”

Therefore, it is very easy for Chinese technology companies to obtain training samples for machine learning. All the text, pictures, audio and video generated by hundreds of millions of users every day is taken for free, like goods in an open warehouse, which would be impossible in many countries.

A large supply of cheap, manually annotated data

We all know that China is the factory for many products, where a large supply of cheap labor guarantees cheap goods for consumers around the world. What you may not know is that many “factories” there process data, providing large-scale data processing services to Chinese artificial intelligence companies.

These data factories employ hundreds to thousands of people. Their daily job is to manually label images (including ones taken from odd angles or at low resolution) crawled from social networks around the world, for example: this is a “ladder”, these are “eyes”, this is a “sofa”, and so on. Others use a mobile phone to record utterances specified by the customer, such as short phrases or long sentences, and the customer requires that the recorded content be kept strictly confidential. To these workers, the job is as simple as chatting in a social app. Each of them processes thousands of pictures or records hundreds of utterances a day, and they are quite satisfied to earn about 4,000 RMB per month doing so.

They don’t know what these labeled images and recordings are used for. In fact, they are handed over to technology companies such as Baidu, Alibaba and Megvii to train their artificial intelligence models. Hundreds of manually marked points on a photo tell the computer where the inner and outer corners of the eye are, so that it can quickly compute the position of the eye. Once the recordings are split and labeled, they let a smart speaker understand what “shutdown” and “call my husband ten minutes later” mean. In the future, Chinese autonomous vehicles will be able to stop at an intersection because workers have marked red lights, zebra crossings and moving pedestrians, frame by frame.

China has hundreds of data labeling factories. They are the capillary ends of the artificial intelligence industry, and their low status and thin profits make it hard for them to survive. The irony is that they underpin the “great” achievements of Chinese artificial intelligence companies.

2 Responses

  1. Kaniel Chan says:

    No, you are wrong, we Chinese are always the smartest and most diligent.

  2. Douglas says:

    It surprises me a little that Chinese companies can access data without permission. But in the USA, Amazon does the same, paying people cents to mark and label thousands of images.