All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning
from version 5.0.0 onward. Pre-fork releases (1.x–4.2.0) were authored by
kherud/java-llama.cpp.
CODE_OF_CONDUCT.md(Contributor Covenant 2.0).docs/RELEASE.mdcapturing the maintainer-facing release procedure (moved out of CHANGELOG).- OpenSSF Best Practices badge (project 12862) on README.
- Unified
CONTRIBUTING.mdandSECURITY.mdstructure with sibling repositories in the project family. - Reconciled Java baseline to 11+ across
pom.xml, README badge,CLAUDE.md, andCONTRIBUTING.md. - README license badge corrected from "Apache 2.0" to "MIT" (matches
LICENSEfile andpom.xml). pom.xmlSCM URL:tree/master→tree/main(default branch renamed).- Upgraded llama.cpp from b9151 to b9172.
- Reasoning-budget tests (Qwen3-0.6B).
5.0.1 - 2026-05-14
InferenceParameters.setContinueFinalMessage(boolean)for the vLLM/transformers-compatible prefill-assistant heuristic (llama.cpp b9134+).- Tests for
setContinueFinalMessage. - Comprehensive Javadoc on public APIs (PR #129).
- Maven Central badge on README (PR #130).
- Bumped project version to 5.0.1-SNAPSHOT (PR #127), then released as 5.0.1 (PR #135).
- Refactored GitHub release workflow to parallelise snapshot and release jobs (PR #128).
- Removed snapshot build documentation and badge (PR #131).
- Upgraded Windows CI to
windows-2025with Visual Studio 2026 (PR #132). - Switched Windows MSVC runtime from dynamic (
/MD) to static (/MT) to eliminate themsvcp140.dllruntime dependency (PR #133). - Upgraded llama.cpp from b9106 to b9134 (PR #134), then to b9150 (PR #136), then to b9151 (PR #139).
- Refactored CI workflow with explicit snapshot/tag check gates (PR #137).
- Removed
setCtxSizeDraft()— the underlying CLI flag was deleted upstream in llama.cpp b9106.
fix(publish):quoted gate job names to avoid YAML colon-in-scalar parse errors (PR #138).- Release routing in the publish workflow now correctly distinguishes snapshot vs. tag pushes.
5.0.0 - 2026-05-11
First release of the fork under the net.ladenthin:llama Maven coordinates. ~100 merged pull requests since baseline 49be664 (the last pre-fork upstream commit).
- First publish to Maven Central under
net.ladenthin:llama. - Pre-built native libraries for Linux (x86-64, aarch64), macOS (x86-64, arm64), and Windows (x86-64, x86).
- Java API surface:
LlamaModel,ModelParameters,InferenceParameters,LlamaIterator/LlamaIterablefor streaming, chat completion (chatComplete,generateChat,chatCompleteText), embeddings, reranking, infilling, raw JSON endpoint handlers, slot management (saveSlot,restoreSlot,eraseSlot), andgetModelMeta(). chatComplete()for OpenAI-compatible chat completions, re-implemented from scratch based on a patch by @vaiju1981 (PR #61; seedocs/history/CHAT_INTEGRATION_SUMMARY.md).mmproj, reasoning-budget, sigma, and sleep-idle parameters added toModelParameters.- JaCoCo code-coverage reporting integrated with Coveralls and Codecov (PR #124).
- CodeQL static-analysis workflow on push, PR, and a weekly schedule.
- Automated Claude Code review workflow on pull requests.
- Dependabot for Maven and GitHub Actions dependency updates.
- Automatic snapshot release workflow on
mainpush (PR #105) publishing to the Sonatype Central snapshot repository. - CUDA, Metal, and Vulkan build support via local CMake build.
- Android integration documented in README.
- All system properties (
net.ladenthin.llama.*) andLogLevelvalues documented. CLAUDE.mdmaintainer guide covering upstream upgrade procedure and the b5022→b9172 breaking-change table.
- Migrated Maven group and artifact from
de.kherud:java-llama.cpptonet.ladenthin:llama(PR #101). - Migrated Maven Central publishing from OSSRH (Legacy) to the Sonatype Central Publisher Portal.
- Deleted the hand-ported
server.hppfork (~3,780 lines) and linked the upstreamllama.cppserver source files directly intojllama. ~4,100 C++ lines removed in total; future upstream upgrades become a CMake version bump. The Java API is unchanged. Seedocs/history/REFACTORING.md. - Compiled upstream server-context / queue / task / models directly into jllama (PR #96).
- Unified CI into a single
publish.ymlworkflow with cross-compilation, testing, coverage, and release stages. - Upgraded CUDA from 12.1 to 13.2 (PR #50).
- Upgraded llama.cpp from b8913 through b9106 across multiple incremental upgrades.
setDraftMax/setDraftMinnow emit the canonical--spec-draft-n-max/--spec-draft-n-minflags (llama.cpp b9016 removed the old aliases).- Bumped CI GitHub Actions:
actions/checkoutv4 → v6,actions/upload-artifactv6 → v7,actions/download-artifactv6 → v8,codeql-actionv3 → v4.
- Javadoc warnings resolved across the public API by adding missing comments.
cache_idle_slotsslot-parameter handling aligned with the upstream rename (b8841 → b8854).
Releases 1.1.1 through 4.2.0 were authored by @kherud on the upstream repository. The full upstream release notes are at
https://github.com/kherud/java-llama.cpp/releases. The fork's baseline is upstream commit 49be664 (tagged v4.2.0, 2025-06-20).
For an architecture-level diff between the pre-fork baseline (49be664) and the first 5.0.0 candidate (24918e4), see docs/history/49be664_24918e4.md. For the server-fork-deletion refactor that culminated in 5.0.0, see docs/history/REFACTORING.md. For the chat-completion integration that landed in 5.0.0, see docs/history/CHAT_INTEGRATION_SUMMARY.md.