PKU-DAVIS-SOD Dataset

INTRODUCTION

The PKU-DAVIS-SOD dataset is a large-scale multimodal neuromorphic object detection dataset that includes challenging scenarios such as low-light conditions and high-speed motion blur. It is constructed by the National Engineering Research Center for Visual Technology, Peking University.

Collection Setup. This dataset is recorded using the DAVIS346, a representative event camera. As shown in Fig. 1(a), we mount a DAVIS346 camera on the front windshield of a driving car. To capture high-speed objects while providing a comprehensive perspective of them, we additionally provide some sequences in which the camera is placed at the side of the road, recording objects from the flank. The DAVIS346 camera shown in Fig. 1(b) simultaneously outputs high-temporal-resolution asynchronous events and conventional RGB frames at a resolution of 346 × 260.

Fig. 1 Setup of data collection: (a) recording platform; (b) DAVIS346 camera.

Data Recording and Annotation. Our PKU-DAVIS-SOD dataset covers 3 traffic scenarios, chosen with velocity distribution, lighting conditions, category diversity, and object scale in mind (see Fig. 2). We use the DAVIS346 to record 220 sequences comprising RGB frames and DVS events. For each sequence, we collect approximately 1 min of raw data with RGB frames at 25 FPS. To provide manual bounding boxes in challenging scenarios (e.g., high-speed and low-light), grayscale images are reconstructed from the asynchronous events using E2VID at 25 FPS whenever the RGB frames are of low quality. After temporal calibration, we first select three common and important object classes in daily traffic (i.e., car, pedestrian, and two-wheeler). Then, all bounding boxes are annotated on the RGB frames or the synchronized reconstructed images by a well-trained professional team.
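For readers who want to pair the asynchronous events with the 25 FPS labels, the following is a minimal sketch of one plausible way to slice an event stream into 40 ms windows ending at each labeled timestamp; the field names (t, x, y, p) and the loading convention are our assumptions for illustration, not the released file format.

```python
import numpy as np

def slice_events_by_label_time(events, label_times_us, window_us=40_000):
    """Collect the events in the 40 ms window ending at each labeled timestamp.

    events         : structured array with fields 't' (us), 'x', 'y', 'p', sorted by 't'
                     (hypothetical layout, not the released file format)
    label_times_us : 1-D array of annotation timestamps in microseconds (25 Hz labels)
    window_us      : window length; 40 ms matches the 25 FPS annotation rate
    """
    t = events['t']
    windows = []
    for t_label in label_times_us:
        # indices of the first/last event inside [t_label - window_us, t_label)
        lo, hi = np.searchsorted(t, [t_label - window_us, t_label])
        windows.append(events[lo:hi])
    return windows
```

Each window can then be rendered as an event image or passed to an E2VID-style reconstruction before annotation.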

Fig. 2 Representative examples: (a) category diversity; (b) light change; (c) object scale; (d) velocity distribution.

Data Statistics. Manual annotations in the recordings are provided at a frequency of 25 Hz. As a result, this dataset has 276k labeled timestamps and 1080.1k labels in total. Afterward, we split them into 671.3k for training, 194.7k for validation, and 214.1k for testing. The precise numbers can be found in Table 1.

Table 1 The details of the PKU-DAVIS-SOD dataset.

LICENCE

  1. DVS events, APS frames, and the corresponding annotation results can only be used for ACADEMIC PURPOSES. NO COMMERCIAL USE is allowed.
  2. Copyright © National Engineering Research Center for Visual Technology and Institute of Digital Media, Peking University (PKU-IDM). All rights reserved.

DOWNLOAD

You can download directly from here.

Address: Room 2604, Science Building No.2, Peking University, No.5 Yiheyuan Road, Haidian District, Beijing, P.R.China.

Fax: +86-10-62755965.

PKU-Vidar-DVS Dataset

INTRODUCTION

The PKU-Vidar-DVS dataset is a large-scale multimodal neuromorphic object detection dataset with temporally continuous labels. It is constructed by the National Engineering Research Center for Visual Technology, Peking University.

Collection Steps and Calibration. This dataset is recorded using our hybrid camera system, which combines a Vidar camera (resolution 400 × 250) and a DAVIS346. As shown in Fig. 1, the incoming light is split equally between the Vidar and the DAVIS346 via a beam splitter (Thorlabs CCM1-BS013). On this basis, we design spatio-temporal calibration procedures to synchronize the two cameras within their shared field of view.
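As a rough illustration of the temporal half of such a calibration, the sketch below maps DAVIS timestamps into the Vidar clock using a constant offset estimated from a shared trigger; the constant-offset assumption and the variable names are ours and simplify the actual procedure.

```python
import numpy as np

def to_vidar_clock(davis_ts_us, trigger_vidar_us, trigger_davis_us):
    """Map DAVIS timestamps into the Vidar clock (simplified sketch).

    Assumes both sensors observed a shared trigger and that their clocks differ
    only by a constant offset (no drift); the real calibration also handles the
    spatial registration between the two views.
    """
    offset = trigger_vidar_us - trigger_davis_us
    return np.asarray(davis_ts_us) + offset
```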

Fig.1 The hybrid camera system.

Data Recording and Annotation. Our PKU-Vidar-DVS dataset covers 9 challenging indoor and outdoor scenarios (see Fig. 2), chosen with velocity distribution, illumination change, category diversity, and object scale in mind. We use the hybrid camera system to record 490 sequences comprising Vidar spikes and DVS events. For each sequence, we collect approximately 5 seconds of raw data. To provide bounding boxes for the asynchronous visual streams, frames are reconstructed from the Vidar spikes at 50 FPS. After spatio-temporal calibration, all labels are provided by a well-trained professional annotation team.

Fig.2 Representative examples.

Data Statistics. Manual annotations in the recordings are provided at a frequency of 50 Hz. As a result, this dataset has 103.3k labeled timestamps and 229.5k labels in total, which we split into three subsets for training, validation, and testing. Notably, this is the first neuromorphic multimodal object detection dataset involving high-speed and low-light scenarios. More details can be found in Table 1.

| Type | Sequence number | Classes | Timestamps | Labels |
| --- | --- | --- | --- | --- |
| Training set | 263 | 9 | 55.0k | 133.2k |
| Validation set | 111 | 9 | 23.7k | 47.3k |
| Testing set | 116 | 9 | 24.6k | 48.9k |
| All | 490 | 9 | 103.3k | 229.5k |

Table 1 The details of the PKU-Vidar-DVS dataset.

LICENCE

  1. Vidar spikes, DVS events, and the corresponding annotation results can only be used for ACADEMIC PURPOSES. NO COMMERCIAL USE is allowed.
  2. Copyright © National Engineering Research Center for Visual Technology and Institute of Digital Media, Peking University (PKU-IDM). All rights reserved.

All publications using the PKU-Vidar-DVS dataset should cite the paper below:

Jianing Li, Xiao Wang, Lin Zhu, Jia Li, Tiejun Huang, Yonghong Tian. Retinomorphic object detection in asynchronous visual streams. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2022.

DOWNLOAD

You can download directly from here.

Email: pkuml at pku.edu.cn

Address: Room 2604, Science Building No.2, Peking University, No.5 Yiheyuan Road, Haidian District, Beijing, P.R.China.

PKU-Retina-Recon Dataset

The PKU-Retina-Recon dataset is constructed by the National Engineering Laboratory for Video Technology (NELVT), Peking University. The goals of creating the PKU-Retina-Recon dataset include:

  1. providing worldwide researchers in the neuromorphic vision community with a spike/event dataset for evaluating their algorithms;
  2. facilitating the development of reconstruction technologies by providing several spike sequences with different motion speeds and under different lighting conditions.

Therefore, the PKU-Retina-Recon dataset is now partly made available for academic purposes only on a case-by-case basis.

The PKU-Retina-Recon dataset is constructed by the National Engineering Laboratory for Video Technology (NELVT), Peking University, sponsored by the National Basic Research Program of China and the National Natural Science Foundation of China. NELVT at Peking University serves as the technical agent for distribution of the dataset and reserves the copyright of all the sequences in the dataset. Any researcher who requests the PKU-Retina-Recon dataset must sign this agreement and thereby agrees to observe the restrictions listed in this document. Failure to observe the restrictions will result in access being denied for requests of future versions of the PKU-Retina-Recon dataset, and in being subject to civil damages in the case of publication of sequences that have not been approved for release.

LICENSE

  •  The spike sequences for download are part of the PKU-Retina-Recon dataset.
  •  The sequences can only be used for ACADEMIC PURPOSES. NO COMMERCIAL USE is allowed.
  •  Copyright © National Engineering Laboratory for Video Technology (NELVT) and Institute of Digital Media, Peking University (PKU-IDM). All rights reserved.

All publications using PKU-Retina-Recon should cite the papers below:

Lin Zhu, Jianing Li, Xiao Wang, Tiejun Huang, Yonghong Tian; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 2400-2409

DOWNLOAD

You can download the agreement (PDF) by clicking the DOWNLOAD link. After filling it in, please send the electronic version to our email: pkuml at pku.edu.cn (Subject: PKU-Retina-Recon Agreement).

Spiking Neural Networks

Networks of Spiking Neurons: The Third Generation of Neural Network Models

A review of learning in biologically plausible spiking neural networks

Rethinking the performance comparison between SNNS and ANNS

Long Short-Term Memory Spiking Networks and Their Applications

Towards spike-based machine intelligence with neuromorphic computing

Event-driven Random Backpropagation: Enabling Neuromorphic Deep Learning Machines

S4NN: temporal backpropagation for spiking neural networks with one spike per neuron

Visualizing a joint future of neuroscience and neuromorphic engineering

Surrogate Gradient Learning in Spiking Neural Networks

Spatio-Temporal Backpropagation for Training High-performance Spiking Neural Networks

SLAYER: Spike Layer Error Reassignment in Time

Inherent Adversarial Robustness of Deep Spiking Neural Networks: Effects of Discrete Input Encoding and Non-Linear Activations

Exploring Adversarial Attack in Spiking Neural Networks with Spike-Compatible Gradient

REPRESENTATIVE ACHIEVEMENT

The team has made extraordinary contributions to the compression and analysis of visual big data, neuromorphic vision, and brain-inspired computation, with over 200 peer-reviewed top-tier journal and conference publications, more than 80 patents granted and 20 patents pending, two competitive best paper awards, and six national/ministerial awards. Dozens of core techniques have been adopted into international and national standards and commercial products. Four representative achievements are briefly presented as follows:

1 Neuromorphic Vision Chips and Devices

To address the visual imaging challenges of high-speed motion, the team invented a retina-like integral visual imaging and reconstruction technology, inspired by findings from simulating a ten-million-neuron model of a monkey retinal fovea on a supercomputer. Moreover, the team also developed a highly efficient compression framework for visual spike streams and designed a full-time fovea-like visual sensing chip with a sampling frequency of 40,000 Hz, as well as a spike camera using the same photosensitive devices as traditional cameras. That is, using only consumer-level CMOS sensors and integrated circuits, a camera built on the fovea-like visual sensing chip, called Vidar, is 1,000x faster than a conventional camera.


The Vidar camera is able to reconstruct the image at any given moment with considerable flexibility in dynamic range. Currently, the team is developing a large field-of-view neuromorphic vision measurement device that can effectively and efficiently detect high-speed moving objects such as tennis balls, bullets, and missiles.

2 Visual Big Data Technologies

The team has made a seminal contribution to developing innovative video analytics algorithms and systems for visual big data applications. They pioneered the visual saliency computation approach from a machine learning perspective and designed a fine-grained object recognition scheme at a very early stage, both inspired by the neurobiological mechanisms of the human visual system. This work promoted learning-based visual saliency computing and fine-grained object recognition into mainstream research topics in the field. Building on these video analytics algorithms, the team invented a new scalable visual computing framework, called the digital retina, to tackle the visual big data challenges and to replace the older framework that emerged fifteen years ago with the rise of cloud computing. The team also developed a digital retina server and a city brain system to enable the framework. This framework has been shown in practice to be effective in addressing the challenge of aggregating video streams from hundreds of thousands of geographically distributed cameras into the cloud for big data analysis.


Some of their algorithms and systems have been commercially transferred and applied to urban video surveillance and transportation systems in several large and medium-sized cities such as Shenzhen, Qingdao, and Guiyang. For instance, their system successfully assisted the police in tracking down a case in which a fake-plate Volvo car maliciously violated traffic regulations in Qingdao 433 times in one year. Moreover, their system provides more precise and timely sensing of the traffic status of a city, so that the city brain can adopt the corresponding optimization strategies to reduce traffic congestion and improve traffic flow.

3 Scene-based Video Coding Technology and Standards

In conventional research, video coding was often treated as a signal processing problem in an idealized mathematical setting, which inevitably ignored many realistic characteristics of scene videos, e.g., the redundant background in surveillance and conference video. To incorporate such scene characteristics, the team embedded a low-complexity, high-efficiency background modeling module into the video coding loop and established a novel standard-compatible scene-based video coding framework. This framework achieves approximately twice the coding efficiency on surveillance videos while markedly reducing the encoding complexity, for all the standard reference software, including H.264/AVC and H.265/HEVC. More importantly, foreground objects can be extracted simultaneously and represented as regions of interest (ROIs), which can be used directly by intelligent analysis tasks such as object detection and tracking. In other words, the video coding becomes more analysis-friendly, with enhanced support for visual content interaction, and more suitable for video analysis.
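The full framework is standard-specific, but the core idea of keeping a low-complexity background model alongside the encoder can be illustrated with a toy running-average model. This is our simplified sketch, not the standardized algorithm; the thresholded difference also produces the foreground masks from which ROIs can be derived.

```python
import numpy as np

class RunningAverageBackground:
    """Toy background model: per-pixel running average plus a difference threshold."""

    def __init__(self, alpha=0.05, threshold=25.0):
        self.alpha = alpha          # how quickly the background adapts to the scene
        self.threshold = threshold  # intensity difference treated as foreground
        self.background = None

    def update(self, frame):
        frame = frame.astype(np.float32)
        if self.background is None:
            self.background = frame.copy()
        # pixels that deviate strongly from the background estimate are foreground
        foreground_mask = np.abs(frame - self.background) > self.threshold
        # slowly blend the current frame into the background estimate
        self.background = (1.0 - self.alpha) * self.background + self.alpha * frame
        return foreground_mask
```

In a scene-based codec, such a background picture typically serves as a long-term prediction reference, while the foreground regions become the ROIs handed to the analysis tasks.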


In the last several years, the scene-based video coding technology became the core of the IEEE 1857 standard and China's AVS standard. Hikvision, the leading supplier of video surveillance equipment worldwide, and Hisense, the No. 1 vendor for urban traffic and public transport in China, have embedded this standardized technology into their smart camera products and intelligent transportation systems, earning billions in sales revenue in the rapidly growing market.

4 Pengcheng CloudBrain and its AI Open-Source Platform

In the past two years, as the Chief Architect, Dr. Tian has been leading the development of the Pengcheng Cloudbrain, one of the leading AI supercomputers in China for academic research. Cloudbrain-I consists of 1000+ NVIDIA V100 GPUs and self-developed resource management and scheduling software (called Octopus), while Cloudbrain-II contains 2048 Kunpeng CPUs and 4096 Ascend NPUs developed by Huawei, delivering a total of 2 PFLOPS at FP64 and 1 EFLOPS at FP16. In November 2020, it ranked first in both the full-system and 10-node configurations of IO500, a comprehensive benchmark suite that enables comparison of high-performance storage and I/O systems. Meanwhile, it also topped the AIPerf benchmark in November 2020, an end-to-end benchmark suite using automated machine learning (AutoML) that represents real AI scenarios for supercomputers.


The Cloudbrain is currently used in a range of challenging applications, e.g., training large-scale pre-trained NLP models such as GPT-3 and its extended versions, or implementing city-level traffic situation awareness from thousands of traffic cameras. It has also been built as an open-source infrastructure that serves the needs of AI researchers and technologists nationwide within China and will eventually open to the worldwide community.

PKU-AIR300 Dataset

The PKU-AIR300 Dataset is a new large-scale, challenging aircraft dataset. It contains 320,000 annotated color images from 300 different classes in total. Each category contains at least 100 images and at most 10,000 images, which leads to a long-tail distribution. According to the number of images in each category, all classes are divided into two parts: 180 known classes for training and 120 novel unknown classes for testing.
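A minimal sketch of how such a known/unknown class split could be derived from per-class image counts is given below; the exact splitting rule is not specified here, so this is only an illustrative guess, not the official split.

```python
def open_set_split(images_per_class, n_known=180):
    """Split classes into known (training) and unknown (open-set testing) classes.

    images_per_class : dict mapping class name -> number of images in that class
    The official PKU-AIR300 split is fixed by the authors; ranking classes by
    image count here is only a guess at the kind of rule described above.
    """
    ranked = sorted(images_per_class, key=images_per_class.get, reverse=True)
    known = ranked[:n_known]      # 180 known classes used for training
    unknown = ranked[n_known:]    # 120 novel classes held out for open-set testing
    return known, unknown
```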

LICENSE

  • The images and the corresponding annotation results can only be used for ACADEMIC PURPOSES. NO COMMERCIAL USE is allowed.
  • Copyright © National Engineering Laboratory for Video Technology (NELVT) and Institute of Digital Media, Peking University. All rights reserved.

All publications using the PKU-AIR300 Dataset should cite the paper below:

Guangyao Chen, Limeng Qiao, Yemin Shi, Peixi Peng, Jia Li, Tiejun Huang, Shiliang Pu and Yonghong Tian. Learning Open Set Network with Discriminative Reciprocal Points. ECCV 2020.

DOWNLOAD

  • You can download the agreement (pdf) by clicking the DOWNLOAD link.
  • Contact E-mail: pkuml at pku.edu.cn, and mail the scanned version.

PKU-Masked-Face Dataset

The PKU-Masked-Face Dataset is constructed by National Engineering Laboratory for Video Technology (NELVT), Peking University.

The dataset contains 10,301 face images of 1,018 identities. Each identity has masked and common face images with various orientations, lighting conditions and mask types. Most identities have 5 holistic face images and 5 masked face images with 5 different views: front, left, right, up and down.

LICENSE

  • The videos and the corresponding annotation results can only be used for ACADEMIC PURPOSES. NO COMMERCIAL USE is allowed.
  • Copyright © National Engineering Laboratory for Video Technology (NELVT) and Institute of Digital Media, Peking University (PKU-IDM). All rights reserved.

All publications using PKU-Masked-Face Dataset should cite the paper below:

  • Feifei Ding, Peixi Peng, Yangru Huang, Mengyue Geng and Yonghong Tian. Masked Face Recognition with Latent Part Detection. ACM Multimedia 2020.

Note: This dataset is different from the MFI and MFV sets proposed in the paper. The facial photos in MFI and MFV carry potential privacy-leakage risks, so they are not released. The PKU-Masked-Face Dataset is larger and harder than MFI and MFV.

The experimental results of our model on this dataset are shown in the following table. We use the masked face images as the query set and the holistic face images as the gallery set. ResNet50 is used as the baseline model.

| Method | rank1 | rank5 | rank10 | mAP |
| --- | --- | --- | --- | --- |
| Baseline | 65.73 | 86.27 | 90.01 | 28.25 |
| Baseline+MG | 94.00 | 97.51 | 98.17 | 36.74 |
| LPD | 95.50 | 97.82 | 98.44 | 41.41 |
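For reference, rank-k accuracy and mAP in this query/gallery setting can be computed as in the simplified sketch below; feature extraction (e.g., with the ResNet50 baseline) is assumed to happen upstream, and this is our own illustrative evaluation code, not the authors' released script.

```python
import numpy as np

def evaluate(query_feats, query_ids, gallery_feats, gallery_ids, ranks=(1, 5, 10)):
    """Compute CMC rank-k accuracy and mAP for masked queries vs. a holistic gallery."""
    query_ids, gallery_ids = np.asarray(query_ids), np.asarray(gallery_ids)
    # cosine similarity between every query and every gallery image
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T

    cmc_hits = np.zeros(len(ranks))
    aps = []
    for i in range(len(query_ids)):
        order = np.argsort(-sims[i])                    # gallery sorted by similarity
        matches = gallery_ids[order] == query_ids[i]    # relevance of each ranked item
        for j, k in enumerate(ranks):
            cmc_hits[j] += matches[:k].any()            # hit if a true match is in the top k
        rel_pos = np.where(matches)[0]                  # ranked positions of true matches
        precisions = [(n + 1) / (pos + 1) for n, pos in enumerate(rel_pos)]
        aps.append(np.mean(precisions) if precisions else 0.0)

    cmc = {k: cmc_hits[j] / len(query_ids) for j, k in enumerate(ranks)}
    return cmc, float(np.mean(aps))
```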

DOWNLOAD

You can download the agreement (PDF) from here. Please make sure that a permanent/long-term responsible person (e.g., professor, PI) fills in the agreement with a handwritten signature. After filling it in, please send the electronic version to our email: pkuml at pku.edu.cn (Subject: PKU-Masked-Face-Agreement).

Please send it from an academic or institutional email address such as xxx at xxx.edu.xx. Requests from free email addresses (Outlook, Gmail, QQ, etc.) will be kindly refused.

After confirming your information, we will send the download link and password to you via Email. You need to follow the agreement.

Usually we will reply within a week. Sometimes, however, the mail does not arrive or display correctly for unknown reasons; if this happens, please change the content or title and try sending again.

PKU-Spike-Recon Dataset

The PKU-Spike-Recon dataset is constructed by the National Engineering Laboratory for Video Technology (NELVT), Peking University. The goals of creating the PKU-Spike-Recon dataset include:

  1. providing worldwide researchers in the neuromorphic vision community with a spike dataset for evaluating their algorithms;
  2. facilitating the development of reconstruction technologies by providing several spike sequences with different motion speeds and under different lighting conditions.

Therefore, the PKU-Spike-Recon dataset is now partly made available for academic purposes only on a case-by-case basis. The dataset contains Class A (normal speed) and Class B (high speed), which are recorded by a spike camera at 40,000 FPS. We also provide a spike player for playing back spike sequences, together with the related decoding code. More details are illustrated in the following table.

| Class | Sequence | Time length (s) | Data size (KB) | Total spike number |
| --- | --- | --- | --- | --- |
| Class A (normal speed) | Office | 0.1 | 48829 | 4298138 |
| | Gallery | 0.1 | 48829 | 20706245 |
| | Lake | 0.1 | 48829 | 25188950 |
| | Flower | 0.1 | 48829 | 29821121 |
| Class B (high speed) | Car | 0.1 | 48829 | 103324641 |
| | Train | 0.1 | 48829 | 42898223 |
| | Rotation1 | 0.1 | 48829 | 24251431 |
| | Rotation2 | 0.1 | 48829 | 39766013 |
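As a hedged illustration of how such a 40,000 FPS spike stream can be decoded and turned back into an intensity image, the sketch below counts spikes per pixel over a short window. It assumes one bit per pixel per sampling step at a 400 × 250 resolution, which is consistent with the listed file sizes (0.1 s × 40,000 planes × 400 × 250 bits ≈ 48,828 KB) but may still differ from the released format and bit order; the provided decoding code and spike player are authoritative.

```python
import numpy as np

WIDTH, HEIGHT, FPS = 400, 250, 40_000   # assumed sensor resolution and sampling rate

def load_spike_planes(path, n_steps):
    """Read n_steps bit-packed spike planes of HEIGHT x WIDTH (assumed layout and bit order)."""
    bytes_per_plane = WIDTH * HEIGHT // 8
    raw = np.fromfile(path, dtype=np.uint8, count=n_steps * bytes_per_plane)
    return np.unpackbits(raw).reshape(n_steps, HEIGHT, WIDTH)

def reconstruct_by_counting(spike_planes, window=250):
    """Estimate a texture image by counting spikes over `window` steps (250 steps ~ 6.25 ms)."""
    counts = spike_planes[:window].sum(axis=0).astype(np.float32)
    return counts / max(counts.max(), 1.0)   # normalize to [0, 1] for display
```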

The PKU-Spike-Recon dataset is constructed by the National Engineering Laboratory for Video Technology (NELVT), Peking University, sponsored by the National Basic Research Program of China and the National Natural Science Foundation of China. NELVT at Peking University serves as the technical agent for distribution of the dataset and reserves the copyright of all the sequences in the dataset. Any researcher who requests the PKU-Spike-Recon dataset must sign this agreement and thereby agrees to observe the restrictions listed in this document. Failure to observe the restrictions will result in access being denied for requests of future versions of the PKU-Spike-Recon dataset, and in being subject to civil damages in the case of publication of sequences that have not been approved for release.


LICENSE

  •  The spike sequences for download are part of the PKU-Spike-Recon dataset.
  •  The sequences can only be used for ACADEMIC PURPOSES. NO COMMERCIAL USE is allowed.
  •  Copyright © National Engineering Laboratory for Video Technology (NELVT) and Institute of Digital Media, Peking University (PKU-IDM). All rights reserved.

All publications using PKU-Spike-Recon should cite the papers below:

  1. L. Zhu, S. Dong, J. Li, T. Huang, Y. Tian, Retina-like Visual Image Reconstruction via Spiking Neural Model, CVPR 2020
  2. L. Zhu, S. Dong, T. Huang, Y. Tian, A retina-inspired sampling method for visual texture reconstruction, ICME 2019

DOWNLOAD (the dataset will be available soon)

You can download the agreement (PDF) by clicking the DOWNLOAD link.

After filling it in, please send the electronic version to our email: pkuml at pku.edu.cn (Subject: PKU-Spike-Recon Agreement).


PKU-Spike-High-Speed Dataset

The PKU-Spike-High-Speed Dataset is constructed by National Engineering Laboratory for Video Technology (NELVT), Peking University.

The goals of creating this dataset include: (1) reconstructing high-speed moving targets; (2) capturing natural scenes from a high-speed moving camera. The first available part therefore contains Class A (moving target) and Class B (moving camera), with eight spike sequences in total, recorded by a retina-inspired camera, namely the fovea-like sampling model (FSM). In particular, our FSM has a high temporal resolution (40k FPS) and can reconstruct texture images at 400 × 250 pixels. We also provide two spike players (jspikeplayer.jar and SpikePlayer.exe) for playing back spike sequences. Additionally, more details are illustrated in the following table.

| Class | Sequence | Length (s) | Spike number |
| --- | --- | --- | --- |
| Class A (moving target) | car-100km/h | 0.2 | 102206031 |
| | bus | 0.42 | 211084678 |
| | ratation1-2600r/min | 2 | 396742501 |
| | ratation2-2600r/min | 2 | 407620564 |
| Class B (moving camera) | train-350km/h | 0.2 | 42898223 |
| | forest | 0.22 | 93319068 |
| | viaduct bridge | 0.22 | 136859111 |
| | railway | 0.22 | 87866720 |

Note: Class A is captured with the FSM fixed in place, while Class B is recorded with the FSM mounted on a high-speed train traveling at 350 km/h.


The PKU-Spike-High-Speed Dataset is now partly made available for academic purposes only on a case-by-case basis. More sequences are under evaluation and will be made public in the future.


LICENSE

  • The sequences can only be used for ACADEMIC PURPOSES. NO COMMERCIAL USE is allowed.
  • Copyright © National Engineering Laboratory for Video Technology (NELVT) and Institute of Digital Media, Peking University. All rights reserved.

All publications using PKU-Spike-High-Speed Dataset should cite the papers below:

Lin Zhu, Siwei Dong, Tiejun Huang, Yonghong Tian. A retina-inspired sampling method for visual texture reconstruction, IEEE International Conference on Multimedia and Expo (ICME), 2019.

DOWNLOAD

  • You can download the agreement (pdf) by clicking the DOWNLOAD link.
  • Contact E-mail: pkuml at pku.edu.cn

PKU-DDD17-CAR Dataset

INTRODUCTION

The PKU-DDD17-CAR Dataset is constructed by National Engineering Laboratory for Video Technology (NELVT), Peking University.

The dataset contains 3155 hybrid sequences in driving scenes, each consisting of images, event streams, and hand-labeled car annotations. Specifically, we first collect data from the DDD17 dataset, which provides over 400 GB and 12 hours of 346 × 260 pixel DAVIS sensor recordings of highway and city driving in daytime and night-fall conditions. We then build the hand-labeled dataset by synchronizing the frames and event streams. As shown in Figure 1, four representative scenes are motion blur, overexposure, low-light, and normal-light. In addition, more details are illustrated in Table 1.

Figure 1: Four representative driving scenes.
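Because the car labels are attached to synchronized frames, a common way to feed the events to a detector is to accumulate the events between two frame timestamps into a per-pixel count image. The sketch below shows one such rendering; the event field names and the {0, 1} polarity encoding are assumptions for illustration, not the release format.

```python
import numpy as np

def event_count_image(events, t_start_us, t_end_us, height=260, width=346):
    """Render the events inside [t_start_us, t_end_us) as a 2-channel ON/OFF count image."""
    t, x, y, p = events['t'], events['x'], events['y'], events['p']
    keep = (t >= t_start_us) & (t < t_end_us)
    x, y, p = x[keep], y[keep], p[keep]
    img = np.zeros((2, height, width), dtype=np.float32)
    # channel 0 accumulates ON (p == 1) events, channel 1 accumulates OFF (p == 0) events
    np.add.at(img[0], (y[p == 1], x[p == 1]), 1.0)
    np.add.at(img[1], (y[p == 0], x[p == 0]), 1.0)
    return img
```

Each labeled frame at time t can then be paired with the event image accumulated since the previous frame as the event-side input to a detector.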

LICENCE

All publications using the PKU-DDD17-CAR dataset should cite the papers below:

  • Jianing Li, Siwei Dong, Zhaofei Yu, Yonghong Tian, Tiejun Huang. Event-based vision Enhanced: A Joint Detection Framework in Autonomous Driving, IEEE International Conference on Multimedia and Expo (ICME), 2019.
  • Jonathan Binas, Daniel Neil, Shih-Chii Liu, Tobi Delbruck. DDD17: End-To-End DAVIS Driving Dataset. Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.


DOWNLOAD

You can download directly from here. password: pkumlgb

Email: pkuml at pku.edu.cn

Address: Room 2604, Science Building No.2, Peking University, No.5 Yiheyuan Road, Haidian District, Beijing, P.R.China.