nvfp4

Here are 54 public repositories matching this topic...

NVlabs / Sana

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

reinforcement-learning transformers pytorch streaming-video diffusion dit video-generation world-models sana text-to-video linear-transformer text-to-image-generation system-algorithm-deisgn nvfp4

Updated Jun 25, 2026
Python

NVlabs / LongLive

LongLive 2.0: Infra - Long Video Gen

real-time parallel infra long video-generation nvfp4

Updated Jun 13, 2026
Python

intel / auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

transformers rounding quantization omni int4 diffusers llms vllm gguf vlms sglang mxfp4 nvfp4

Updated Jun 30, 2026
Python

NVIDIA / cudnn-frontend

cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.

Updated Jun 29, 2026
Python

Avarok-Cybersecurity / atlas

Pure Rust Inference Engine

rust cuda transformers ssm mamba dgx openai-api llm-inference speculative-decoding gb10 nvfp4 dgx-spark

Updated Jun 30, 2026
Rust

AEON-7 / Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash

Fully uncensored, capability-enhanced abliteration of Qwen3.6-27B. NVFP4 + z-lab DFlash speculative decoding (n=12) on the unified ghcr.io/aeon-7/aeon-vllm-ultimate:latest container, tuned for long-context draft acceptance on DGX Spark. 6 HF variants (BF16/NVFP4/MTP/MTP-XS), docker-compose, and QuickStart.

quantization uncensored blackwell llm vllm qwen speculative-decoding abliteration qwen3 nvfp4 dgx-spark dflash

Updated Jun 28, 2026
Python

0xSero / glm-5.2-sm120

GLM-5.2-NVFP4-REAP-469B serving on SM120 (4× RTX PRO 6000 Blackwell) — one-command vLLM launch recipe, 250K context, DeepSeek Sparse Attention + MTP speculative decode

moe glm reap blackwell vllm llm-inference sm120 nvfp4 rtx-pro-6000

Updated Jun 19, 2026
Shell

BenChaliah / NVFP4-on-4090-vLLM

AdaLLM is an NVFP4-first inference runtime for Ada Lovelace (RTX 4090) with FP8 KV cache and custom decode kernels. This repo targets NVFP4 weights and keeps the entire decode path in FP8.

gpu-acceleration gpu-computing inference-engine nvidia-gpu llm nvfp4

Updated Feb 15, 2026
Python

taishan1994 / LLM-Quantization

Notes and summaries on quantizing LLMs.

quantization llm gptq quarot qwen3 nvfp4

Updated Jan 8, 2026
Python

AEON-7 / comfyui-aeon-spark

Bleeding-edge ComfyUI for NVIDIA DGX Spark (GB10/Blackwell/sm_121a). CUDA 13 + SageAttention v3 (sm_121a) + NVFP4 + 14 custom-node packs + Flux 2 Dev / LTX 2.3 22B / ACE-Step v1.5 XL Turbo pre-bundled with abliterated text-encoder paths.

docker flux blackwell comfyui sageattention ltx-video ace-step nvfp4 dgx-spark sm-121a

Updated Jun 28, 2026
Shell

AEON-7 / vllm-dflash

DFlash vLLM for DGX Spark — Plug & Play Block-Diffusion Speculative Decoding

docker inference nvidia blackwell llm vllm qwen speculative-decoding block-diffusion nvfp4 dgx-spark dflash

Updated Jun 28, 2026
Python

ChiefNakor / comfyui-blackwell-docker

A production-ready Docker setup for ComfyUI that unlocks the full potential of NVIDIA Blackwell GPUs (RTX 50 series) through 4-bit quantization with NVFP4.

docker pytorch nvidia image-generation nvidia-cuda ai-art stable-diffusion comfyui flux-ai nvidia-blackwell nvfp4

Updated Jan 28, 2026
Dockerfile

kekzl / imp

From-scratch C++/CUDA inference engine for the NVIDIA RTX 5090 (sm_120a) — the best single-GPU backend for agentic AI: tool calling, long-context loops, reasoning and concurrent sub-agents on top of the fastest single-stream decode on the 5090 (beats llama.cpp, at-or-ahead of vLLM on NVFP4). 100% written by Claude Code.

Updated Jun 30, 2026
Cuda

actypedef / ARCQuant

[ACL 2026 Main] Code for the paper "ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs"

quantization mixed-precision blackwell llm llm-inference microscaling nvfp4

Updated Jun 1, 2026
Cuda

croll83 / llama.cpp-dgx

llama.cpp fork optimized for NVIDIA DGX Spark / GB10 (Blackwell, SM 12.1) — TurboQuant weights + KV, NVFP4, DFlash MTP

blackwell llama-cpp speculative-decoding gb10 nvfp4 dflash turboquant

Updated May 26, 2026
C++

waybarrios / dgx-spark-finetune-llm

LLM fine-tuning with LoRA + NVFP4/MXFP8 on NVIDIA DGX Spark (Blackwell GB10)

deep-learning pytorch nvidia lora quantization fine-tuning blackwell llm nvfp4 dgx-spark transformer-engine mxfp8

Updated Dec 22, 2025
Python

lna-lab / blackwell-geforce-nvfp4-gemm

NVFP4 inference on Blackwell GeForce (RTX 5090/5080/5070 Ti/RTX PRO 6000) — SM120 patches for vLLM + FlashInfer + CUTLASS. 175 tok/s on Qwen3.6-35B MoE.

gpu-computing quantization cutlass gemm geforce blackwell vllm llm-inference flashinfer rtx-5090 sm120 nvfp4

Updated Apr 27, 2026
Python

Sggin1 / DGX-SPARK

DGX Spark research and tests - containers, benchmarks, and investigation notes for running models on GB10 (SM 12.1)

aarch64 blackwell kv-cache vllm nvfp4 dgx-spark mamba-ssm sm121 turboquant

Updated Jun 6, 2026
Python

lna-lab / gemma4-12b-vllm-sm120

Reproducible recipe: serve abliterated Gemma-4-12B (gemma4_unified) at 50-118 tok/s on no-NVLink Blackwell (SM120) via vLLM nightly + ModelOpt FP8/NVFP4 + MTP spec-decode.

quantization gemma blackwell fp8 vllm speculative-decoding sm120 nvfp4 abliterated gemma-4 modelopt tensor-parallel

Updated Jun 7, 2026
Python

sayakpaul / diffusers-blackwell-quants

Easy recipes to reduce the latency of Flux, QwenImage, and LTX-2 with NVFP4 and MXFP8 on Blackwell.

pytorch image-gen diffusers video-gen torchao blackwell-gpu nvfp4 mxfp8

Updated Apr 10, 2026
Python
