ilseong827


SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Python · 769 stars · 32 forks · Updated Sep 30, 2025
G-code · 123 stars · 23 forks · Updated Sep 29, 2025

PyTorch implementation of Wasserstein Adversarial Proximal Policy Optimization (WAPPO).

Python · 7 stars · Updated Jun 22, 2022

Code for Reinforcement Learning from Vision Language Foundation Model Feedback

C++ · 124 stars · 20 forks · Updated May 22, 2024

Official Implementation of RoboCLIP (NeurIPS 2023)

Python · 50 stars · 13 forks · Updated Aug 12, 2024
Python · 23 stars · 4 forks · Updated Aug 19, 2024
Python · 62 stars · 14 forks · Updated Jun 25, 2024

Parameter-efficient Fine Tuning for Clinical LLMs

Jupyter Notebook · 13 stars · 2 forks · Updated Apr 23, 2024

Multi-modality pre-training

Python · 506 stars · 35 forks · Updated May 8, 2024

Note: DO NOT USE IT! THIS CODE IS PROVEN TO CONTAIN DATA LEAKAGE! Archive version of "Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval (CVPR 2024 Highlight)"

Python · 18 stars · 5 forks · Updated May 1, 2025
Jupyter Notebook · 30 stars · 5 forks · Updated Dec 13, 2023

Official implementation for VIOLA

Python · 120 stars · 7 forks · Updated Jun 18, 2023

An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"

Python · 174 stars · 19 forks · Updated Apr 6, 2024
Python · 5 stars · Updated May 26, 2025

[CVPR 2024] TeachCLIP for Text-to-Video Retrieval

Python · 40 stars · Updated May 7, 2025

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

Python · 876 stars · 48 forks · Updated Aug 8, 2025

Frontier Multimodal Foundation Models for Image and Video Understanding

Jupyter Notebook · 996 stars · 70 forks · Updated Aug 14, 2025

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python · 3,983 stars · 470 forks · Updated Mar 23, 2025
Python · 4,278 stars · 407 forks · Updated Sep 14, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook · 13,485 stars · 1,027 forks · Updated Sep 28, 2025

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python · 554 stars · 56 forks · Updated Sep 2, 2025

Sentence Embeddings using Siamese SKT KoBERT

Python · 142 stars · 31 forks · Updated Jan 6, 2023

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python · 2,062 stars · 127 forks · Updated Aug 7, 2025

Democratization of RT-2 "RT-2: New model translates vision and language into action"

Python · 516 stars · 65 forks · Updated Jul 26, 2024

Mastering Diverse Domains through World Models

Python · 2,176 stars · 371 forks · Updated Sep 23, 2025

Codebase for Automated Creation of Digital Cousins for Robust Policy Learning

Python · 229 stars · 20 forks · Updated Mar 31, 2025

A generative and self-guided robotic agent that endlessly proposes and masters new skills.

Python · 1,079 stars · 101 forks · Updated May 31, 2024

Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.

Python · 1,368 stars · 233 forks · Updated Jul 31, 2024