YOLO-World Environment Setup & Inference Test - CFANZ Programming Community

I. Introduction

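After years of working in CV, most of what I have done is train, fine-tune, and test on fixed datasets. Then a line came to mind: I have a dream! Could we stop being tied to a fixed training set? That is the idea behind open-vocabulary object detection (OVD). Skimming AI news now and then, I noticed that the CV field is racing toward open-set object detection, and I happened to come across the open-source project YOLO-World. OK, let's get started.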

II. Model overview

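Object detection has long been a fundamental challenge in computer vision, with many applications in image understanding, robotics, and autonomous driving. With the development of deep neural networks, a large body of research has achieved significant breakthroughs in object detection. Despite their success, these methods remain limited because they only handle a fixed vocabulary, for example the 80 categories of the COCO dataset. Once the object categories have been defined and labeled, the trained detector can only detect those specific categories, which limits its capability and applicability in open scenarios.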

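Pre-trained on large-scale datasets, YOLO-World shows strong zero-shot performance, reaching 35.4 AP on LVIS while maintaining 52.0 FPS. The pre-trained YOLO-World adapts easily to downstream tasks such as open-vocabulary instance segmentation and referring object detection. In addition, the pre-trained weights and code of YOLO-World will be open-sourced to facilitate more practical applications.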
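To make the open-vocabulary idea a bit more concrete, here is a toy sketch of the general principle: each detected region gets an embedding, each class name gets a text embedding, and the region is labeled with the most similar text. This is not YOLO-World's actual code. The 3-dimensional vectors and the function names `cosine` and `classify_region` are made up for illustration; a real system would get its embeddings from trained image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def classify_region(region_vec, text_vecs):
    """Return (label, score) for the class name whose embedding is most similar."""
    scores = {label: cosine(region_vec, vec) for label, vec in text_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical embeddings, invented for this example only.
vocabulary = {
    "person": [0.9, 0.1, 0.0],
    "bus": [0.1, 0.9, 0.1],
}
region = [0.8, 0.2, 0.1]

label, score = classify_region(region, vocabulary)
print(label, round(score, 3))

# Extending the vocabulary is just adding another text embedding;
# nothing is retrained in this toy setup.
vocabulary["dog"] = [0.0, 0.2, 0.9]
dog_label, _ = classify_region([0.1, 0.1, 0.9], vocabulary)
print(dog_label)
```

The point of the sketch is the last step: with a fixed-vocabulary detector, a new category means relabeling and retraining, while here it only means adding one more entry to the vocabulary.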

III. Environment setup

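The official YOLO-World is implemented on top of mmyolo and mmdetection. To be honest, the mm series is fine for beginners, but it is really painful when you just want to try out a newly released algorithm. I heard that ultralytics now supports YOLO-World, so you can play with it directly through the ultralytics library. Usage is about as simple as it gets: a few lines are enough, with no need to install a pile of mm packages or compile unrelated ops.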

Pull the image

docker pull ultralytics/ultralytics:latest

docker run -it --rm -v /datas/work/zzq:/workspace ultralytics/ultralytics:latest bash
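Once inside the container, a quick sanity check can save some confusion later. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is made up here, and the module names `ultralytics` and `clip` are assumed to be the import names of the packages used in this post.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages used in this post.
required = ["ultralytics", "clip"]
missing = missing_packages(required)

if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If `clip` shows up as missing, that is expected at this point; it is installed later in the post.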

IV. Inference test

cd /workspace/YOLO-World

1. Generic detection

from ultralytics import YOLOWorld  
  
# Initialize a YOLO-World model  
model = YOLOWorld('yolov8s-world.pt')    

# Execute inference with the YOLOv8s-world on the specified image  
results = model.predict('bus.jpg')  
  
# Show results  
results[0].show() 
results[0].save("result.jpg")

Save the code above as test_yolo_world.py and run it:

python test_yolo_world.py

2. Pedestrian detection

from ultralytics import YOLOWorld  
  
# Initialize a YOLO-World model  
model = YOLOWorld('yolov8s-world.pt')    

# Define custom classes
model.set_classes(["person"]) 

# Execute inference with the YOLOv8s-world on the specified image  
results = model.predict('bus.jpg')  
  
# Show results  
results[0].show() 
results[0].save("result.jpg")
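If you want numbers rather than just a picture, the detections can be tallied per class. Below is a rough sketch: `count_detections` and `run_on_image` are hypothetical helper names, and the 0.25 confidence threshold is an arbitrary choice for illustration. It assumes the ultralytics result object exposes `boxes.cls`, `boxes.conf`, and `names`, and it imports ultralytics lazily so the counting helper can be tried on plain lists.

```python
from collections import Counter


def count_detections(class_ids, names, confidences, min_conf=0.25):
    """Count detections per class name, ignoring boxes below min_conf."""
    counts = Counter(
        names[int(cls_id)]
        for cls_id, conf in zip(class_ids, confidences)
        if conf >= min_conf
    )
    return dict(counts)


def run_on_image(image_path, classes):
    """Run YOLO-World with custom classes and return per-class counts."""
    # Imported here so the helper above works without ultralytics installed.
    from ultralytics import YOLOWorld

    model = YOLOWorld("yolov8s-world.pt")
    model.set_classes(classes)
    result = model.predict(image_path)[0]
    return count_detections(
        result.boxes.cls.tolist(),   # class index of each box
        result.names,                # mapping from class index to class name
        result.boxes.conf.tolist(),  # confidence of each box
    )


# Trying the counting helper on hand-written values (no model involved).
demo = count_detections([0.0, 0.0, 1.0], {0: "person", 1: "bus"}, [0.9, 0.1, 0.8])
print(demo)
```

With the model and image from this post, the call would look like `run_on_image("bus.jpg", ["person"])`.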

Install CLIP (set_classes needs it to encode the class names)

GitHub - ultralytics/CLIP: CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

pip install ftfy regex tqdm

cd CLIP-main

pip install .   # optionally add -i <your PyPI mirror URL>

python test_yolo_world.py


The model is downloaded during the run.

Final result: