REPRESENTATIVE ACHIEVEMENTS | Multimedia Learning Group

The team has made extraordinary contributions to the compression and analysis of visual big data, neuromorphic vision, and brain-inspired computation, with over 200 peer-reviewed publications in top-tier journals and conferences, more than 80 patents granted and 20 patents pending, two competitive best paper awards, and six national/ministerial awards. Dozens of core techniques have been adopted into international and national standards and into commercial products. Four representative achievements are briefly presented below:

1 Neuromorphic Vision Chips and Devices

To address the challenges of imaging high-speed motion, the team invented a retina-like integral visual imaging and reconstruction technology, inspired by findings from a supercomputer simulation of the ten-million-scale neural networks of a monkey retinal fovea. The team also developed a high-efficiency compression framework for visual spike streams and designed a full-time fovea-like visual sensing chip with a sampling frequency of 40,000 Hz, as well as a spike camera that uses the same photosensitive devices as traditional cameras. That is, using only consumer-level CMOS sensors and integrated circuits, a camera built on the fovea-like visual sensing chip, called Vidar, is 1,000x faster than a conventional camera.

The Vidar camera can reconstruct the image at any given moment with considerable flexibility in dynamic range. Currently, the team is developing a large field-of-view neuromorphic vision measurement device that can effectively and efficiently detect high-speed moving objects such as tennis balls, bullets, and missiles.
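As a rough illustration of how an integrating, spike-based sensor could allow an image to be reconstructed at any given moment, the following Python sketch simulates a single pixel. It assumes a simple integrate-and-fire model with a fixed threshold and estimates brightness from the interval between neighboring spikes. The function names, the threshold, and the model itself are illustrative assumptions and are not taken from the Vidar chip's actual design; only the 40,000 Hz sampling frequency comes from the text.

```python
SAMPLE_RATE_HZ = 40_000              # sampling frequency quoted in the text
STEP_SECONDS = 1.0 / SAMPLE_RATE_HZ  # duration of one sampling step


def simulate_spikes(intensity, threshold, num_steps):
    """Integrate a constant intensity; emit 1 whenever the accumulator
    reaches the threshold, otherwise 0 (assumed integrate-and-fire model)."""
    accumulator = 0.0
    spikes = []
    for _ in range(num_steps):
        accumulator += intensity
        if accumulator >= threshold:
            spikes.append(1)
            accumulator -= threshold  # keep the leftover charge
        else:
            spikes.append(0)
    return spikes


def reconstruct_at(spikes, step, threshold):
    """Estimate intensity per step at `step` from the gap between the
    nearest spike at or before it and the next spike after it.
    Returns None when the step is not enclosed by two spikes."""
    previous = next((i for i in range(step, -1, -1) if spikes[i]), None)
    following = next((i for i in range(step + 1, len(spikes)) if spikes[i]), None)
    if previous is None or following is None:
        return None
    return threshold / (following - previous)
```

For example, a pixel with constant intensity 0.25 and threshold 1.0 fires on every fourth sample, so the estimate at any step enclosed by two spikes is 0.25. Because the estimate depends only on spike timing, a brighter pixel simply fires more often, which is one plausible reading of the flexibility in dynamic range mentioned above.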

2 Visual Big Data Technologies

The team has made a seminal contribution to developing innovative video analytic algorithms and systems for visual big data applications. At a very early stage, they pioneered an approach to visual saliency computation from a machine learning perspective and designed a fine-grained object recognition scheme, both inspired by the neurobiological mechanisms of the human visual system. This work helped make learning-based visual saliency computing and fine-grained object recognition mainstream research topics in the field. Building on these video analytic algorithms, the team invented a new scalable visual computing framework for visual big data, called digital retina, to replace the older framework that took shape some fifteen years ago with the rise of cloud computing. The team also developed a digital retina server and a city brain system to enable the framework. The framework has proven effective in practice at addressing the challenge of aggregating video streams from hundreds of thousands of geographically distributed cameras into the cloud for big data analysis.

Some of their algorithms and systems have been transferred to industry and applied to urban video surveillance and transportation systems in large and medium-sized cities such as Shenzhen, Qingdao and Guiyang. For instance, their system helped the police solve a case in which a Volvo with fake license plates violated traffic regulations in Qingdao 433 times in one year. Moreover, their system can provide more precise and more timely sensing of the city's traffic status, so that the city brain can adopt a corresponding optimization strategy to reduce congestion and improve traffic flow.

3 Scene-based Video Coding Technology and Standards

In conventional research, video coding was often treated as a signal processing problem with ideal mathematical settings, which inevitably ignored many realistic characteristics of scene videos, e.g., the redundant background in surveillance and conference video. To incorporate such scene characteristics, the team embedded a low-complexity, high-efficiency background modeling module into the video coding loop and established a novel standard-compatible scene-based video coding framework. On surveillance videos, this framework achieves approximately twice the coding efficiency while remarkably reducing encoding complexity, across the standard reference software, including H.264/AVC and H.265/HEVC. More importantly, foreground objects can be extracted at the same time and represented as regions of interest (ROIs), which can be used directly by intelligent analysis tasks such as object detection and tracking. In other words, video coding becomes more analysis-friendly, with enhanced support for visual content interaction.
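The idea of placing background modeling inside the coding loop can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the team's codec: it substitutes a plain running-average background model (a textbook technique), and all function names and parameter values are hypothetical. It shows how a single background model can serve two purposes at once: predicting the frame so that only a residual remains to be coded, and producing a foreground mask whose bounding box can act as an ROI.

```python
def update_background(background, frame, alpha=0.05):
    """Running-average background model (assumed, not the team's method):
    blend each new frame into the background with weight alpha."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def residual(background, frame):
    """Difference between the frame and its background prediction.
    Static regions give zeros, which are cheap to encode."""
    return [
        [f - b for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(residual_frame, threshold=30):
    """Mark pixels whose residual magnitude exceeds the threshold."""
    return [[abs(v) > threshold for v in row] for row in residual_frame]


def bounding_roi(mask):
    """Bounding box (top, left, bottom, right) of all foreground pixels,
    or None when the mask is empty."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))
```

In this toy setting, a frame that differs from the background only where an object appears yields a residual that is zero everywhere else, and the same residual gives the ROI without a separate analysis pass.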

Over the last several years, the scene-based video coding technology has become the core of the IEEE 1857 standard and China’s AVS standard. Hikvision, the leading supplier of video surveillance equipment worldwide, and Hisense, the No. 1 vendor for urban traffic and public transport in China, have embedded this standardized technology into their smart camera products and intelligent transportation systems, earning them billions in sales revenue in a rapidly growing market.

4 Pengcheng Cloudbrain and Its AI Open-Source Platform

Over the past two years, as Chief Architect, Dr. Tian has been leading the development of the Pengcheng Cloudbrain, one of the leading AI supercomputers in China for academic research. Cloudbrain-I consists of 1000+ NVIDIA V100 GPUs and self-developed resource management and scheduling software (called Octopus). Cloudbrain-II contains 2048 Kunpeng CPUs and 4096 Ascend NPUs developed by Huawei, delivering a total of 2 PFLOPS at FP64 and 1 EFLOPS at FP16. In Nov 2020, it ranked first in both the full-system and 10-node configurations of the IO500, a comprehensive benchmark suite that enables comparison of high-performance storage and IO systems. In the same month, it also won the AIPerf benchmark, an end-to-end benchmark suite that uses automated machine learning (AutoML) to represent real AI scenarios for supercomputers.

The Cloudbrain is currently used in a range of challenging applications, e.g., training large-scale pre-trained NLP models such as GPT-3 and its extended versions, or implementing city-level traffic situation awareness from thousands of traffic cameras. It has also been built as open-source infrastructure to serve the needs of AI researchers and technologists across China, and will eventually be opened to the worldwide community.