TensorSharp

A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides both a console application and a web-based chatbot interface for multi-turn conversations with multimodal models.

Supported Model Architectures

Architecture	Models	Multimodal
Gemma 4	gemma-4-E4B, gemma-4-31B	Image, Video, Audio
Gemma 3	gemma-3-4b, etc.	Image
Qwen 3	Qwen3-4B, etc.	Text only
Qwen 3.5	Qwen3.5-9B, etc.	Image

Compute Backends

Backend	Flag	Description
GGML Metal	`--backend ggml_metal`	GPU-accelerated via Apple Metal (macOS). Recommended for Apple Silicon.
GGML CPU	`--backend ggml_cpu`	CPU inference using the native GGML library with optimized kernels.
Pure C# CPU	`--backend cpu`	Portable CPU inference with no native dependencies.

Project Structure

TensorSharp/
├── TensorSharp/                 # Core tensor library (CPU operations, SIMD)
├── TensorSharp.GGML/            # GGML backend bindings (Metal/CPU via native library)
├── TensorSharp.GGML.Native/     # Native C++ bridge to ggml (builds libGgmlOps)
├── AdvUtils/                    # Utility library
├── InferenceEngine/             # Model loading, tokenization, and inference logic
│   ├── Models/
│   │   ├── Gemma3/
│   │   ├── Gemma4/              # Includes vision encoder, audio encoder
│   │   ├── Qwen3/
│   │   └── Qwen35/
│   ├── GgufReader.cs            # GGUF file parser
│   ├── ModelBase.cs             # Base class for all model architectures
│   ├── ChatTemplate.cs          # Chat template rendering
│   └── MediaHelper.cs           # Video frame extraction (OpenCV)
├── InferenceConsole/            # CLI application
├── InferenceWeb/                # Web chatbot application (ASP.NET Core)
└── ExternalProjects/            # Third-party dependencies (ggml)

Prerequisites

.NET 8.0 SDK or later
macOS (Metal backend): CMake 3.20+ and Xcode command-line tools for building the native GGML library
GGUF model files (e.g., from Hugging Face)

Building

Build the entire solution

dotnet build TensorSharp.slnx

Build individual applications

# Console application
dotnet build InferenceConsole/InferenceConsole.csproj

# Web application
dotnet build InferenceWeb/InferenceWeb.csproj

Build the native GGML library (macOS)

The native library is built automatically during the first dotnet build if it doesn't exist. To build it manually:

cd TensorSharp.GGML.Native
bash build-macos.sh

This compiles libGgmlOps.dylib with Metal GPU support. The build output is automatically copied to the application's output directory.

Usage

Console Application

cd InferenceConsole/bin

# Text inference
./InferenceConsole --model <model.gguf> --input input.txt --output output.txt \
    --max-tokens 200 --backend ggml_metal

# Image inference (Gemma 3/4, Qwen 3.5)
./InferenceConsole --model <model.gguf> --image photo.png --backend ggml_metal

# Video inference (Gemma 4)
./InferenceConsole --model <model.gguf> --video clip.mp4 --backend ggml_metal

# Audio inference (Gemma 4)
./InferenceConsole --model <model.gguf> --audio speech.wav --backend ggml_metal

Command-line options:

Option	Description
`--model <path>`	Path to a GGUF model file (required)
`--input <path>`	Text file containing the user prompt
`--output <path>`	Write generated text to this file
`--image <path>`	Image file for vision inference
`--video <path>`	Video file for video inference
`--audio <path>`	Audio file (WAV, MP3, OGG) for audio inference
`--mmproj <path>`	Path to the multimodal projector GGUF file
`--max-tokens <N>`	Maximum number of tokens to generate (default: 100)
`--backend <type>`	Compute backend: `cpu`, `ggml_cpu`, or `ggml_metal`
`--test`	Run built-in test suite

The multimodal projector file is auto-detected if placed alongside the model file with a recognized name (e.g., gemma-4-mmproj-F16.gguf).

Web Application

cd InferenceWeb/bin

# Set environment variables and run
MODEL_DIR=./models BACKEND=ggml_metal ./InferenceWeb

Open http://localhost:5000 in your browser. The web interface supports:

Multi-turn chat conversations
Model selection from available GGUF files in MODEL_DIR
Image, video, and audio uploads for multimodal inference
Streaming token generation via Server-Sent Events

Environment variables:

Variable	Description
`MODEL_DIR`	Directory containing GGUF model files
`BACKEND`	Compute backend (default: `ggml_metal`)
`PORT`	HTTP port (default: `5000`)

Multimodal Support

Gemma 4

Gemma 4 models support image, video, and audio inputs. Place the multimodal projector (gemma-4-mmproj-F16.gguf) in the same directory as the model file for automatic loading.

Images: PNG, JPEG
Video: MP4 (extracts up to 8 frames at 1 fps using OpenCV)
Audio: WAV (16kHz mono), MP3, OGG Vorbis

Gemma 3 / Qwen 3.5

These models support image inputs with their respective multimodal projector files.

Architecture

TensorSharp is structured as a layered system:

TensorSharp provides the core Tensor type, storage abstraction, and an extensible operation registry (Ops). CPU implementations use System.Numerics.Vectors for SIMD acceleration.
TensorSharp.GGML registers GPU-accelerated implementations of the same operations via a native C++ bridge (libGgmlOps) that links against ggml. On macOS, this provides Metal GPU compute.
InferenceEngine implements model-specific logic: GGUF parsing, tokenization (SentencePiece BPE), chat template rendering, and the forward pass for each architecture. Models are loaded via ModelBase.Create() which auto-detects the architecture from GGUF metadata.
InferenceConsole and InferenceWeb are thin application layers that handle I/O and user interaction.

Author

Zhongkai Fu

License

See LICENSE for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TensorSharp

Supported Model Architectures

Compute Backends

Project Structure

Prerequisites

Building

Build the entire solution

Build individual applications

Build the native GGML library (macOS)

Usage

Console Application

Web Application

Multimodal Support

Gemma 4

Gemma 3 / Qwen 3.5

Architecture

Author

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
AdvUtils		AdvUtils
ExternalProjects/ggml		ExternalProjects/ggml
InferenceConsole		InferenceConsole
InferenceEngine		InferenceEngine
InferenceWeb		InferenceWeb
TensorSharp.GGML.Native		TensorSharp.GGML.Native
TensorSharp.GGML		TensorSharp.GGML
TensorSharp		TensorSharp
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
TensorSharp.slnx		TensorSharp.slnx

Folders and files

Latest commit

History

Repository files navigation

TensorSharp

Supported Model Architectures

Compute Backends

Project Structure

Prerequisites

Building

Build the entire solution

Build individual applications

Build the native GGML library (macOS)

Usage

Console Application

Web Application

Multimodal Support

Gemma 4

Gemma 3 / Qwen 3.5

Architecture

Author

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages