kreuzberg

kreuzberg is a polyglot document intelligence framework with a Rust core. It extracts text, metadata, and structured data from 88+ formats, including PDF, Office documents, images, and archives, and is available as a library (Java, Python, TypeScript, Go, Ruby, C#, PHP, Elixir), a CLI, a REST API, or an MCP server.
parsing, content-extraction, rust
tika