AI/ML Project

No Training Animal Detection Using Vision Language Model

An animal detection system powered by Vision-Language Models. Uses GroundingDINO for open-vocabulary object detection and LLaVA for refined classification.

Project Overview

This project introduces a 'zero-shot' approach to wildlife monitoring, allowing users to detect and identify animals in images without any prior training on specific species datasets.

The pipeline integrates GroundingDINO to locate objects based on text prompts (e.g., 'animal') and then passes cropped detections to LLaVA (Large Language-and-Vision Assistant) for refined classification and descriptive analysis.
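
The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `detect` and `classify` are hypothetical callables standing in for the GroundingDINO and LLaVA wrappers, the image is a plain row-major list of pixel values rather than a real image object, and the `min_score` threshold of 0.3 is an assumed default.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels; assumed format
Image = Sequence[Sequence[int]]  # stand-in image: rows of pixel values


@dataclass
class Detection:
    box: Box
    score: float


@dataclass
class Result:
    box: Box
    score: float
    label: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the box region out of a row-major image."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def run_pipeline(
    image: Image,
    detect: Callable[[Image, str], List[Detection]],  # hypothetical detector wrapper
    classify: Callable[[List[List[int]]], str],       # hypothetical VLM wrapper
    prompt: str = "animal",
    min_score: float = 0.3,                           # assumed threshold
) -> List[Result]:
    """Stage 1 locates candidates; stage 2 names each cropped candidate."""
    results = []
    for det in detect(image, prompt):
        if det.score < min_score:
            continue  # skip low-confidence boxes before the slower stage
        label = classify(crop(image, det.box))
        results.append(Result(det.box, det.score, label))
    return results
```

Passing the two models in as callables keeps the orchestration independent of how each model is loaded, so either stage can be swapped or mocked during testing.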

Because detection targets are set by text prompt, the system can be pointed at new environments or rare species by editing the prompt alone, without the data collection, labeling, and retraining that conventional computer vision workflows require.
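
As a small illustration of what changing the prompt might look like in practice, the helper below builds a detector prompt from a list of class names. It assumes the convention described in GroundingDINO's documentation, where categories are lowercase phrases separated by periods; `build_prompt` is a hypothetical helper name, not part of any library.

```python
from typing import Iterable


def build_prompt(class_names: Iterable[str]) -> str:
    """Join class names into one lowercase, period-separated text prompt."""
    cleaned = [name.strip().lower() for name in class_names if name.strip()]
    if not cleaned:
        raise ValueError("at least one class name is required")
    return " . ".join(cleaned) + " ."
```

Targeting a different set of species then only means passing a different list, for example `build_prompt(["deer", "wild boar"])`.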

Key Features

Zero-shot animal detection using open-vocabulary GroundingDINO
Refined species classification and behavioral description via LLaVA
Automated image cropping and preprocessing for multi-stage analysis
Support for diverse environments (forest, urban, domestic) without retraining
Visual bounding box annotation
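
The cropping and annotation features both depend on turning detector output into pixel boxes. The sketch below shows one way this could be done, assuming the detector returns normalized center-format boxes (cx, cy, w, h), as the original GroundingDINO repository does; the function names and the 10% padding margin are illustrative choices, not the project's confirmed implementation.

```python
from typing import Tuple

PixelBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def to_pixel_box(
    cxcywh: Tuple[float, float, float, float], width: int, height: int
) -> PixelBox:
    """Convert a normalized (cx, cy, w, h) box to pixel corner coordinates."""
    cx, cy, w, h = cxcywh
    x0 = round((cx - w / 2) * width)
    y0 = round((cy - h / 2) * height)
    x1 = round((cx + w / 2) * width)
    y1 = round((cy + h / 2) * height)
    return (x0, y0, x1, y1)


def pad_and_clamp(
    box: PixelBox, width: int, height: int, margin: float = 0.1
) -> PixelBox:
    """Grow the box by a fraction of its size, then keep it inside the image."""
    x0, y0, x1, y1 = box
    dx = int((x1 - x0) * margin)  # horizontal padding in pixels
    dy = int((y1 - y0) * margin)  # vertical padding in pixels
    return (
        max(0, x0 - dx),
        max(0, y0 - dy),
        min(width, x1 + dx),
        min(height, y1 + dy),
    )
```

A little padding around each crop gives the classifier some surrounding context, while clamping keeps boxes near the image border valid for both cropping and drawing.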

Technologies Used

Python
PyTorch
GroundingDINO
LLaVA
Hugging Face Transformers
OpenCV
PIL


Project Details

Client

Personal Project

Timeline

2025

Role

Solo Project
