Beyond PDF: How We Built the Fastest Office-to-Image Converter for AI Document Processing

If you have ever tried to automate document workflows, you know the first question everyone asks: "Does it handle PDFs?" The answer is almost always yes. PDF is the lingua franca of document automation. Every tool, every API, every library speaks it fluently.

But here is the thing: offices do not run on PDFs. They run on Word documents, Excel spreadsheets, and PowerPoint decks. The contract your legal team drafted is a DOCX. The financial model your CFO built is an XLSX. The board presentation your CEO is reviewing is a PPTX. These are the files that pile up in shared drives, clog inboxes, and resist every attempt at organization. And when it comes to AI-powered automation, they are second-class citizens.

The reason is technical. Large language models and vision models cannot read Office formats directly. A DOCX file is a ZIP archive containing XML fragments, embedded images, style definitions, and relationship metadata. An XLSX file is a collection of shared string tables, sheet XML, and calculation chains. To make AI understand these documents, you need to convert them into something a model can process: either extracted text (lossy, misses layout and visual context) or rendered images (preserves the document as a human sees it).

The conversion options are not great. LibreOffice in headless mode takes 2-5 seconds per document, requires a 500MB+ Docker image, and can crash on documents with complex formatting. Cloud conversion APIs add network latency and per-page costs that make batch processing expensive. Browser-based rendering with Playwright or Puppeteer works but carries the overhead of a full browser runtime: memory-hungry, slow to start, and difficult to scale in containerized environments.

At the volumes we process at Renamed.to, these bottlenecks are disqualifying. When a user uploads 50 Word documents expecting AI-powered filenames in seconds, a 3-second-per-file conversion step means two and a half minutes of waiting before the AI even starts. That is not a product. That is a loading screen.

What We Shipped

Starting today, Renamed.to accepts DOCX, DOC, PPTX, XLSX, and XLS files alongside the PDFs, JPGs, PNGs, and TIFFs you already know. Upload any Office document through the web renamer, and you get AI-powered naming in seconds. The same confidence preview that shows you exactly what the AI understood. The same folder organization that routes files into the right directory structure. The same naming templates that enforce your team's conventions.

The file size limit is 100 MB per file. We render up to 3 pages, slides, or sheets per document and send those rendered images to our vision AI for analysis. For most business documents, 3 pages is more than enough context for accurate naming: the title page of a presentation, the header rows of a spreadsheet, the first page of a contract.

This is available on the web renamer today. Cloud connectors (Google Drive, Dropbox) continue to support PDFs and images for now, with Office format support coming in a future update.

Who This Is For

We built this for the teams who spend hours every week manually renaming and filing Office documents. Here are three workflows we see constantly.

HR Teams: Resumes and Cover Letters

Recruiting generates a firehose of Word documents. Candidates submit files named whatever they feel like, and HR assistants rename them manually to maintain any semblance of order.

Before:
Document(3).docx
JohnSmith_resume_final_v2.docx
my resume 2026.docx
Cover Letter - Copy.docx

After:
2026-02-15_John-Smith_Resume_Senior-Engineer.docx
2026-02-15_John-Smith_Cover-Letter_Senior-Engineer.docx
2026-02-14_Maria-Garcia_Resume_Product-Manager.docx
2026-02-14_Maria-Garcia_Cover-Letter_Product-Manager.docx

Finance Teams: Spreadsheet Reports

Finance teams export Excel files from accounting systems, ERP platforms, and internal dashboards. The exports arrive with system-generated names that tell you nothing about what is inside.

Before:
Q4_numbers_FINAL.xlsx
Copy of Budget 2026.xlsx
export_20260131_report.xlsx
Sheet1.xlsx

After:
2026-01-31_Finance_Budget-Report_Q4-2025.xlsx
2026-01-31_Finance_Revenue-Summary_Q4-2025.xlsx
2026-02-01_Accounting_Expense-Report_January-2026.xlsx
2026-01-15_Finance_Cash-Flow-Forecast_2026.xlsx

Marketing Teams: Presentation Decks

Marketing produces PowerPoint decks for every campaign, pitch, and internal review. Decks multiply through versioning ("v4_APPROVED"), and within weeks nobody can find the final version of anything.

Before:
pitch_deck_v4_APPROVED.pptx
client_presentation_NEW.pptx
Q1 Campaign Review FINAL FINAL.pptx
deck_for_monday.pptx

After:
2026-02-20_Acme-Corp_Campaign-Pitch_Q1-2026.pptx
2026-02-18_Widget-Inc_Product-Demo_February-2026.pptx
2026-02-15_Internal_Q1-Campaign-Review_2026.pptx
2026-02-10_Board_Strategy-Presentation_Q1-2026.pptx

In each case, the AI reads the document content, extracts the key entities (names, dates, document types, organizations), and assembles a filename that follows your naming convention. The confidence preview lets you verify every suggestion before committing. If the AI gets something wrong, you correct it once and it learns for next time.
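
As a rough illustration of that last assembly step, here is a minimal TypeScript sketch that joins extracted entities into the date_subject_type_detail pattern used in the examples above. The ExtractedEntities shape and the toSegment and buildFilename helpers are hypothetical names made up for this post, not our production API, and the sketch only handles ASCII text.

```typescript
// Hypothetical shape for the entities the AI extracts; field names are assumptions.
interface ExtractedEntities {
  date: string;    // ISO date, e.g. "2026-02-15"
  subject: string; // person, client, or department
  docType: string; // e.g. "Resume"
  detail: string;  // e.g. role or reporting period
}

// Collapse every run of non-alphanumeric characters into a single hyphen and
// trim hyphens at the edges, so each segment is safe to use in a filename.
// ASCII-only on purpose: a real implementation would need Unicode handling.
function toSegment(value: string): string {
  return value
    .trim()
    .replace(/[^A-Za-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Join the non-empty segments with underscores and append the extension.
function buildFilename(e: ExtractedEntities, extension: string): string {
  const segments = [
    e.date,
    toSegment(e.subject),
    toSegment(e.docType),
    toSegment(e.detail),
  ];
  return `${segments.filter((s) => s.length > 0).join("_")}.${extension}`;
}
```

With date "2026-02-15", subject "John Smith", docType "Resume", and detail "Senior Engineer", buildFilename returns 2026-02-15_John-Smith_Resume_Senior-Engineer.docx.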

How It Works: The Office-to-Image Pipeline

When you upload an Office document, it passes through a four-phase pipeline that converts it into rendered images suitable for vision AI analysis. The entire pipeline runs in TypeScript with native bindings, no external processes, no browser runtime, and no cloud conversion APIs.

Phase 1: Parse

Modern Office formats (DOCX, PPTX, XLSX) are ZIP archives containing XML. We unzip the archive in memory using a streaming decompressor and parse the XML with fast-xml-parser, then map the result onto a semantic model of the document: paragraphs, runs, table cells, slide shapes, sheet data. For legacy binary formats (DOC, XLS), we use specialized binary parsers that extract the same semantic model from Microsoft's older Compound File Binary (OLE2) container format.
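
To make "semantic model" a little more concrete, here is a minimal sketch of the idea for Word paragraphs. The XmlNode tree is a simplified stand-in invented for this example (it is not the object shape fast-xml-parser produces), and the Run and Paragraph types are hypothetical. The element names w:p, w:r, w:rPr, w:b, and w:t are the standard WordprocessingML names for paragraphs, runs, run properties, bold, and text.

```typescript
// Simplified, hypothetical XML tree used only for this sketch.
interface XmlNode {
  name: string;
  text?: string;
  children?: XmlNode[];
}

// Hypothetical semantic model: a paragraph is a list of styled text runs.
interface Run {
  text: string;
  bold: boolean;
}
interface Paragraph {
  runs: Run[];
}

// Direct children of a node with a given element name.
const childrenNamed = (node: XmlNode, name: string): XmlNode[] =>
  (node.children ?? []).filter((c) => c.name === name);

// Walk w:body -> w:p -> w:r -> w:t and collect text plus a bold flag.
// Simplification: any w:b inside w:rPr counts as bold (a real parser must
// also honor w:val="false" and inherited styles), and tables are skipped.
function toParagraphs(body: XmlNode): Paragraph[] {
  return childrenNamed(body, "w:p").map((p) => ({
    runs: childrenNamed(p, "w:r").map((r) => ({
      text: childrenNamed(r, "w:t")
        .map((t) => t.text ?? "")
        .join(""),
      bold: childrenNamed(r, "w:rPr").some(
        (pr) => childrenNamed(pr, "w:b").length > 0,
      ),
    })),
  }));
}
```

The layout phase can then work from paragraphs and runs without knowing anything about ZIP entries or XML namespaces.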

Phase 2: Layout

The parsed semantic model needs to be positioned on a virtual page. This is where most of the complexity lives. For Word documents, we compute line breaks, paragraph spacing, table column widths, and page boundaries. For presentations, we resolve shape positions, text anchoring, and slide dimensions. For spreadsheets, we calculate column widths from cell content, apply conditional formatting rules, and determine row heights.

A critical optimization here is our use of cumulative width arrays for text layout. Instead of calling measureText() on every candidate substring during line breaking, we measure each character in a run once and store the running (cumulative) sums. The width of any substring is then a single subtraction, and finding the break point for a line becomes a binary search over the array instead of repeated text measurement. This single optimization cut our text layout time by roughly 40%.
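
The technique can be sketched in a few lines of TypeScript. This is a simplified illustration under stated assumptions, not our layout engine: the MeasureChar callback stands in for a real text-measurement call, the function names are invented for this post, and the line breaker only knows about plain spaces.

```typescript
// Stand-in for a real measurement call; injected so the sketch is self-contained.
type MeasureChar = (ch: string) => number;

// prefix[i] is the total width of the first i characters, so prefix[0] is 0
// and the width of text.slice(a, b) is prefix[b] - prefix[a].
function buildCumulativeWidths(text: string, measure: MeasureChar): Float64Array {
  const prefix = new Float64Array(text.length + 1);
  let sum = 0;
  for (let i = 0; i < text.length; i++) {
    sum += measure(text.charAt(i));
    prefix[i + 1] = sum;
  }
  return prefix;
}

// Binary search for the largest end index such that text.slice(start, end)
// fits within maxWidth. Works because the prefix sums never decrease.
function findBreak(prefix: Float64Array, start: number, maxWidth: number): number {
  let lo = start; // the empty slice always fits
  let hi = prefix.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2); // upper middle guarantees progress
    if (prefix[mid] - prefix[start] <= maxWidth) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

// Greedy line breaking: take as much as fits, then back up to the last space.
function breakLines(text: string, maxWidth: number, measure: MeasureChar): string[] {
  const prefix = buildCumulativeWidths(text, measure);
  const lines: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = findBreak(prefix, start, maxWidth);
    if (end === start) end = start + 1; // one glyph wider than the line: force progress
    if (end < text.length) {
      const space = text.lastIndexOf(" ", end);
      if (space > start) end = space; // break at a word boundary when one exists
    }
    lines.push(text.slice(start, end));
    start = text.charAt(end) === " " ? end + 1 : end; // skip the breaking space
  }
  return lines;
}
```

Each character is measured exactly once per run, and every line costs one binary search over the prefix array rather than a fresh measurement per candidate break point.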

Phase 3: Paint

With positions calculated, we render each page to an off-screen canvas using @napi-rs/canvas, a native Node.js binding to the Skia graphics engine that exposes an HTML5 Canvas-style API. Each page, slide, or sheet is rendered independently, which means we can parallelize rendering across available CPU cores. A 24-page Word document renders 8-10 pages simultaneously on a typical server, rather than one at a time.

Phase 4: Encode

The rendered canvases are compressed to JPEG (for photographic content) or PNG (for text-heavy documents) using sharp, the high-performance image processing library built on libvips. Sharp handles encoding 7.6x faster than the native canvas.encode() method for PNG output. We cap encoding concurrency to prevent CPU thrashing when processing large batches in server mode.

Once the first 3 pages are rendered, they are sent to our vision AI, which analyzes the visual content, reads text, identifies document types, extracts entities, and generates a structured filename plus folder suggestion. The entire process, from DOCX upload to AI-generated filename, takes a few seconds.

Why We Built It From Scratch

We did not start by writing our own converter. We started by evaluating every existing option, hoping to find something that met our requirements: sub-second per document, minimal Docker image footprint, TypeScript-native for our stack, and no external service dependencies.

LibreOffice headless was the first candidate. It renders Office documents with near-perfect fidelity because it is, well, a full office suite. But it takes 1-2 seconds just to start up, 2-5 seconds per document to render, and the Docker image balloons to 500MB+. In a serverless or containerized environment, cold starts are brutal. And LibreOffice is written in C++, which means debugging rendering issues requires diving into a foreign codebase with limited documentation.

Aspose offers excellent fidelity and a proper API, but it is a commercial library targeting .NET and Java. Integrating it into a TypeScript/Bun stack would require a sidecar process, adding operational complexity. The licensing costs are also significant at scale.

Playwright / Puppeteer can render Office documents in a headless browser, although browsers do not open Office files natively, so the content first has to be turned into something the browser can display. The fidelity is excellent (browsers are good at rendering things), but the overhead is substantial: a headless browser process consuming 200-400MB of memory per instance, slow startup times, and the fragility of browser automation in production.

“We do not need pixel-perfect rendering. We need the AI to understand what the document is about. An 85-90% faithful image that renders in 25 milliseconds per page beats a 100% faithful image that takes 3 seconds. The AI does not care about subtle kerning differences or exact gradient stops. It cares about text content, document structure, and visual layout.”

— Internal architecture decision record

That insight unlocked the entire approach. By accepting 85-90% visual fidelity instead of demanding pixel-perfect reproduction, we could build a dramatically simpler and faster converter. We parse the Office XML directly with fast-xml-parser, render to native canvas via @napi-rs/canvas, and encode with sharp. No browser. No LibreOffice. No external service. The entire converter is TypeScript with native bindings, runs inside our own application runtime rather than shelling out to an external tool, and deploys in a Docker image that adds less than 50MB to our base.

Over 279 commits, we built parsers for DOCX, DOC, PPTX, XLSX, and XLS. Each format has its own quirks: Word's complex paragraph numbering system, PowerPoint's shape inheritance hierarchy, Excel's shared string table optimization. But the architecture is consistent across all formats: parse to semantic model, lay out on virtual pages, paint to canvas, encode to image.

Performance

Numbers matter more than claims, so here is what we measured on a standard 4-core server instance (the same hardware our production workers run on).

24-page Word document (typical contract): ~600ms end-to-end (parse through JPEG encode)
100-page Word document (large report): ~1.5 seconds end-to-end
50-slide PowerPoint deck: ~800ms end-to-end
Multi-sheet Excel workbook (5 sheets, 10,000 rows): ~400ms end-to-end

For context, the same 24-page Word document takes 3.2 seconds in LibreOffice headless and 4.8 seconds through a browser-based renderer. On that document, our converter is roughly 5-8x faster, and documents of this kind are the ones that matter most in business workflows.
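
For anyone who wants to check the arithmetic behind those ratios and the 25-milliseconds-per-page figure quoted earlier, here it is spelled out. The only inputs are the measurements stated in this post; the variable names are just for illustration.

```typescript
// Figures quoted in this post, in milliseconds, for the 24-page Word document.
const oursMs = 600; // our converter, parse through JPEG encode
const libreOfficeMs = 3200; // LibreOffice headless
const browserMs = 4800; // browser-based renderer
const pageCount = 24;

const perPageMs = oursMs / pageCount; // 25 ms per page
const speedupVsLibreOffice = libreOfficeMs / oursMs; // about 5.3x
const speedupVsBrowser = browserMs / oursMs; // 8x
```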

The key optimizations that got us here:

Cumulative width arrays for text layout eliminated thousands of measureText() calls per document, cutting layout time by ~40%.
Parallel page rendering distributes canvas work across CPU cores, yielding 8-10x speedup on multi-page documents compared to sequential rendering.
Sharp encoding replaces the native canvas PNG encoder with libvips, producing pixel-identical PNG output 7.6x faster.
Encode concurrency caps prevent CPU thrashing when multiple documents are being processed simultaneously. We found that capping encode parallelism at 4 concurrent operations (regardless of available cores) produced the best throughput under load.
Streaming decompression for ZIP archives means we never hold the entire uncompressed document in memory. For a 50MB PPTX with embedded images, peak memory usage stays under 200MB.
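
The encode concurrency cap from the list above is the easiest of these to illustrate. The sketch below is a generic promise limiter written for this post, not our production code; createLimiter is a hypothetical name. It hands a finished task's slot directly to the next waiting task, so the number of tasks in flight never exceeds the cap.

```typescript
// Returns a function that runs async tasks with at most maxConcurrent in flight.
function createLimiter(maxConcurrent: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= maxConcurrent) {
      // All slots are busy: wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) next(); // pass the slot on; the active count stays the same
      else active--; // nobody is waiting, so free the slot
    }
  };
}
```

A caller would create one limiter with a cap of 4 and wrap every encode call in it, for example limitEncode(() => encodePage(canvas)), where encodePage is whatever function performs the actual encoding.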

In production, the converter runs as a worker pool with subprocess isolation. Each worker processes one document at a time, with configurable concurrency at the pool level. If a document triggers an out-of-memory error or a parsing crash, only that worker restarts. The pool continues serving other requests without interruption.
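
The restart behavior can be sketched independently of how the workers are actually spawned. In the TypeScript below, ConverterWorker is a hypothetical interface standing in for a real subprocess handle, and WorkerPool is an illustrative class, not our production pool. The point it demonstrates is narrow: a failed conversion replaces only the worker that failed, and queued documents keep flowing.

```typescript
// Hypothetical handle to one worker; convert() rejects if the worker crashed.
interface ConverterWorker {
  convert(doc: string): Promise<string>;
}

class WorkerPool {
  private readonly spawn: () => ConverterWorker;
  private readonly idle: ConverterWorker[] = [];
  private readonly queue: Array<(w: ConverterWorker) => void> = [];
  restarts = 0;

  constructor(spawn: () => ConverterWorker, size: number) {
    this.spawn = spawn;
    for (let i = 0; i < size; i++) this.idle.push(spawn());
  }

  // Take an idle worker, or wait in line until one is released.
  private acquire(): Promise<ConverterWorker> {
    const worker = this.idle.pop();
    if (worker) return Promise.resolve(worker);
    return new Promise((resolve) => this.queue.push(resolve));
  }

  // Give the worker to the next queued request, or park it as idle.
  private release(worker: ConverterWorker): void {
    const next = this.queue.shift();
    if (next) next(worker);
    else this.idle.push(worker);
  }

  // Each worker handles one document at a time.
  async convert(doc: string): Promise<string> {
    const worker = await this.acquire();
    try {
      const result = await worker.convert(doc);
      this.release(worker);
      return result;
    } catch (err) {
      // Discard the failed worker and put a fresh one in its place.
      this.restarts++;
      this.release(this.spawn());
      throw err;
    }
  }
}
```

The caller still sees the error for the document that failed, but the pool size stays constant and other requests are unaffected.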

Supported Formats and Known Limitations

We believe in being transparent about what works and what does not. Here is the current state of format support.

Full Support

DOCX (Word 2007+): Text, paragraphs, tables, lists, inline images, styles, headings (section headings, not page headers), numbering.
PPTX (PowerPoint 2007+): Slides, text shapes, tables, charts (as rendered images), background fills, slide layouts.
XLSX (Excel 2007+): Cell data, formulas (calculated values), conditional formatting (basic rules), column/row sizing, merged cells, number formats.

Legacy Format Support

DOC (Word 97-2003): Text extraction and basic formatting. Tables and images are partially supported. Binary format parsing covers the most common structures but does not handle every edge case in Microsoft's legacy specification.
XLS (Excel 97-2003): Cell data and basic formatting. Conditional formatting and charts are not supported in the legacy format. If you have XLS files, we recommend converting them to XLSX for best results.

Known Limitations

Page headers and footers: Not rendered. These typically contain repeating content (page numbers, company logos) that rarely affects AI naming accuracy.
Anchored and floating images: Images positioned with text wrapping (anchored to paragraphs or pages) are not yet rendered. Inline images work correctly.
Advanced conditional formatting: Complex rules (data bars, icon sets, color scales) are partially supported. Simple rules (cell highlighting based on value) work correctly.
SmartArt and complex diagrams: These are rendered as placeholder shapes rather than fully resolved graphics.
Macros and VBA: Ignored entirely. We parse document content only, never execute code.

Here is the important context for these limitations: they rarely affect AI naming accuracy. The AI generates filenames based on document content, structure, and context. A missing page footer or an unrendered floating image usually does not change whether the AI can identify the document as "Q4 2025 Budget Report" or "John Smith Resume". We optimized for the information that matters to naming, not for print-ready reproduction.

What Comes Next

Office format support on the web renamer is step one. Here is what we are working on next:

Cloud connector support: Google Drive and Dropbox connectors will gain Office format support, so your automated workflows can process Word, Excel, and PowerPoint files without manual upload.
API access: The REST API will accept Office formats with the same interface you already use for PDFs and images. Upload a DOCX, get back a suggested filename and folder path.
Floating image rendering: Our most requested improvement for document fidelity. Anchored images with text wrapping are complex to lay out correctly, but we are making progress.
Header and footer rendering: For documents where repeating content carries meaningful information (letterheads with different branch offices, for example).

We ship these improvements incrementally. You do not need to change anything in your workflow. As we improve the converter, your existing uploads benefit automatically.

The Bigger Picture

When we started Renamed.to, the pitch was simple: upload a file, get a good filename. But as our user base grew, we kept hearing the same request. "This is great for PDFs. What about my Word documents?" "Can I upload Excel files?" "We have hundreds of PowerPoint decks that need organizing."

The answer was always "not yet" because the conversion technology was not there. The existing tools were too slow, too heavy, or too expensive for the experience we wanted to deliver. So we built our own.

279 commits. Parsers for 5 formats. A layout engine that handles paragraphs, tables, slides, and spreadsheets. A rendering pipeline that processes 24 pages in 600 milliseconds. All in TypeScript, all running inside our own application runtime, all deployable in a minimal container.

Was it the right trade-off? We think so. The alternative was telling our users to convert their files to PDF first, which is the same answer every other tool gives. We wanted to do better.

Upload a Word document. Upload an Excel spreadsheet. Upload a PowerPoint deck. Get an AI-powered filename in seconds. Your existing workflow just got wider.