Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Updated Nov 10, 2025 - Python
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
Graphormer is a general-purpose deep learning backbone for molecular modeling.
A Python toolkit for fine-tuning Geospatial Foundation Models (GFMs).
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
PaddleScience is an SDK and library for developing AI-driven scientific computing applications based on PaddlePaddle.
The resources of LucaOne, including: the model code, training scripts, embedding inference code, and trained checkpoints.
freephdlabor: customizing personalized multiagent systems that research your own scientific problem 24/7
A language agent gym with challenging scientific tasks
Code and datasets for paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" in WSDM-2024
[ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation Discovery and Symbolic Regression with Large Language Models
IntelliFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction.
Geom3D: Geometric Modeling on 3D Structures, NeurIPS 2023
A minimalist multi-agent framework for robust automation of scientific analysis workflows, such as gene expression analysis.
[ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
A general AI agent framework that can be adapted to various tasks and environments.
A Unified Evaluation Suite for Protein Design
Official code repo for the paper "LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset"
A Text-guided Protein Design Framework, Nat Mach Intell 2025 (https://www.nature.com/articles/s42256-025-01011-z)