IMDB-WIKI: 500k face images with age and gender data


Since the publicly available face image datasets are often of small to medium size, rarely exceeding tens of thousands of images, and often lack age information, we decided to collect a large dataset of celebrities. For this purpose, we took the list of the 100,000 most popular actors on the IMDb website and automatically crawled their profiles for date of birth, name, gender and all images related to that person. Additionally, we crawled all profile images from Wikipedia pages of people, together with the same meta information. We removed the images without a timestamp (the date when the photo was taken). Assuming that images with a single face are likely to show the actor, and that the timestamp and date of birth are correct, we were able to assign each such image its biological (real) age. Of course, we cannot vouch for the accuracy of the assigned age information. Besides wrong timestamps, many images are stills from movies, and movies can have extended production times. In total, we obtained 460,723 face images of 20,284 celebrities from IMDb and 62,328 face images from Wikipedia, 523,051 in total.
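
The labeling rule described above (drop images without a timestamp, keep only single-face images, then take the age as the time elapsed between date of birth and the date the photo was taken) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `ImageRecord` layout, its field names and the helpers `age_at` and `label_ages` are hypothetical, and the dataset's actual metadata format is not described here. As the quote itself warns, the resulting age is only as good as the timestamp and birth date behind it.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class ImageRecord:
    # Hypothetical record layout for illustration only (not the dataset's real file format).
    person: str
    date_of_birth: date
    photo_taken: Optional[date]  # None when the image has no timestamp
    face_count: int              # number of faces detected in the image


def age_at(dob: date, taken: date) -> int:
    """Whole years elapsed between the date of birth and the date the photo was taken."""
    years = taken.year - dob.year
    # Subtract one if the birthday had not yet come around in the year the photo was taken.
    if (taken.month, taken.day) < (dob.month, dob.day):
        years -= 1
    return years


def label_ages(records: List[ImageRecord]) -> List[Tuple[str, int]]:
    """Drop untimestamped and multi-face images, then attach an age label to the rest."""
    labeled = []
    for r in records:
        if r.photo_taken is None:
            continue  # no timestamp: the age cannot be derived, so the image is removed
        if r.face_count != 1:
            continue  # only single-face images are assumed to show the listed person
        labeled.append((r.person, age_at(r.date_of_birth, r.photo_taken)))
    return labeled


# Usage example with made-up records.
records = [
    ImageRecord("A", date(1970, 6, 15), date(2000, 6, 14), 1),  # one day before 30th birthday
    ImageRecord("B", date(1980, 1, 1), None, 1),                # no timestamp: dropped
    ImageRecord("C", date(1990, 3, 3), date(2010, 3, 3), 2),    # two faces: dropped
]
print(label_ages(records))  # [('A', 29)]
```

Comparing `(month, day)` tuples avoids off-by-one errors around birthdays, which a plain year subtraction would get wrong for photos taken earlier in the year than the birthday.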


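The dataset can be downloaded from the official website. I have also shared a copy on Baidu Netdisk: follow my WeChat official account and reply "2020091001" to get the download link.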
