# Openize.MarkItDown

Openize.MarkItDown is a Python package that converts documents into Markdown. It supports several input formats, lets you choose where the output is saved, and can optionally pass the converted Markdown to an LLM for further processing.

## Features

- Convert `.docx`, `.pdf`, `.xlsx`, and `.pptx` files to Markdown.
- Save the Markdown output locally or send it to an LLM for processing.
- Built on the **Factory and Strategy patterns** for extensibility.
- Handles both Windows and Linux file paths.
- Includes a command-line interface.
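
The Factory and Strategy structure can be pictured roughly as follows. This is a minimal sketch, not the package's actual code: the class names (`ConverterStrategy`, `DocxConverter`, `XlsxConverter`, `ConverterFactory`) are hypothetical, and the converters return placeholder text instead of calling Aspose. Each format gets its own strategy class, and a factory picks one from the file extension.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path


class ConverterStrategy(ABC):
    """Hypothetical strategy interface: one implementation per input format."""

    @abstractmethod
    def convert(self, path: Path) -> str:
        """Return the Markdown for the file at `path`."""


class DocxConverter(ConverterStrategy):
    def convert(self, path: Path) -> str:
        # Placeholder output; a real converter would call Aspose.Words here.
        return f"<!-- markdown for {path.name} (docx) -->"


class XlsxConverter(ConverterStrategy):
    def convert(self, path: Path) -> str:
        # Placeholder output; a real converter would call Aspose.Cells here.
        return f"<!-- markdown for {path.name} (xlsx) -->"


class ConverterFactory:
    """Hypothetical factory that selects a strategy from the file extension."""

    _registry: dict[str, type[ConverterStrategy]] = {
        ".docx": DocxConverter,
        ".xlsx": XlsxConverter,
    }

    @classmethod
    def register(cls, extension: str, strategy: type[ConverterStrategy]) -> None:
        # New formats are added by registering a strategy; existing ones are untouched.
        cls._registry[extension.lower()] = strategy

    @classmethod
    def create(cls, file_path: str) -> ConverterStrategy:
        # pathlib parses the path with the current OS's rules; lower-casing the
        # suffix maps "Q1.DOCX" and "q1.docx" to the same strategy.
        extension = Path(file_path).suffix.lower()
        strategy = cls._registry.get(extension)
        if strategy is None:
            raise ValueError(f"Unsupported file type: {extension or '(none)'}")
        return strategy()
```

For example, `ConverterFactory.create("reports/Q1.DOCX")` returns a `DocxConverter`. Supporting another format in this sketch means writing a new strategy class (say, a hypothetical `PptxConverter`) and calling `ConverterFactory.register(".pptx", PptxConverter)`.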

## Requirements

This package depends on the following Aspose libraries, which are commercial products:

- [Aspose.Words](https://purchase.aspose.com/buy/words/python)
- [Aspose.Cells](https://purchase.aspose.com/buy/cells/python)
- [Aspose.Slides](https://purchase.aspose.com/buy/slides/python)

You must obtain valid licenses for these libraries separately. Installing this package also installs these dependencies, but you are responsible for complying with Aspose's licensing terms.
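
Before converting anything, it can help to confirm that the Aspose packages are importable. The sketch below assumes the module names `aspose.words`, `aspose.cells`, and `aspose.slides` (the names used by Aspose's "for Python via .NET" packages); adjust them if your installation differs. It only checks that the modules can be found and does not validate any license.

```python
import importlib.util

# Assumed module names for the Aspose "for Python via .NET" packages.
ASPOSE_MODULES = ("aspose.words", "aspose.cells", "aspose.slides")


def aspose_status(modules=ASPOSE_MODULES) -> dict:
    """Map each module name to True if it can be found on the import path."""
    status = {}
    for name in modules:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec imports the parent package ("aspose") first and
            # raises if that parent package itself is missing.
            status[name] = False
    return status


if __name__ == "__main__":
    for module, found in aspose_status().items():
        print(f"{module}: {'found' if found else 'MISSING'}")
```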

## Installation

### From TestPyPI

```sh
# --extra-index-url lets pip fetch dependencies that are not published on TestPyPI from the main PyPI index
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ openize-markitdown
```

### From Source

```sh
git clone https://github.com/openize-com/Openize.MarkItDown.git
cd Openize.MarkItDown
pip install -e .
```

## Usage

### Command Line Interface

```sh
# Convert a file and save locally
markitdown document.docx

# Specify output directory
markitdown document.docx -o output_folder

# Process with an LLM (requires OPENAI_API_KEY environment variable)
markitdown document.docx --llm
```
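
To illustrate how the CLI options could map onto the Python API, here is a hypothetical `argparse` front end that accepts the same positional file, `-o`, and `--llm` options and forwards them to `DocumentProcessor`. It is a sketch, not the package's real entry point; the real CLI's default output directory is not documented here, so the sketch only passes `output_dir` when `-o` is given. The `processor_cls` parameter is an illustrative addition that lets the function run without the package installed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented `markitdown` options."""
    parser = argparse.ArgumentParser(prog="markitdown")
    parser.add_argument("input_file", help="Document to convert (.docx, .pdf, .xlsx, .pptx)")
    parser.add_argument("-o", dest="output_dir", default=None,
                        help="Directory for the generated Markdown file")
    parser.add_argument("--llm", action="store_true",
                        help="Send the Markdown to an LLM (needs OPENAI_API_KEY)")
    return parser


def main(argv=None, processor_cls=None):
    """Translate CLI arguments into DocumentProcessor calls (sketch)."""
    args = build_parser().parse_args(argv)
    if processor_cls is None:
        # Imported lazily so the parser can be exercised without the package.
        from openize.markitdown import DocumentProcessor as processor_cls
    # Only pass output_dir when -o was given, leaving the real default intact.
    kwargs = {"output_dir": args.output_dir} if args.output_dir else {}
    processor = processor_cls(**kwargs)
    return processor.process_document(args.input_file, insert_into_llm=args.llm)
```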

### Python API

```python
from openize.markitdown import DocumentProcessor

# Initialize with custom output directory
processor = DocumentProcessor(output_dir="my_markdown_files")

# Convert files and save locally
processor.process_document("document.docx")
processor.process_document("presentation.pptx")
processor.process_document("spreadsheet.xlsx")
processor.process_document("sample.pdf")

# Send to LLM for processing (requires OPENAI_API_KEY environment variable)
processor.process_document("document.docx", insert_into_llm=True)
```
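
To convert a whole folder, you could loop over the supported extensions and call `process_document` on each match. The `convert_folder` helper below is a hypothetical sketch that relies only on the `process_document(path, insert_into_llm=...)` call shown above; it takes the processor as a parameter, so it can be tried with a stand-in object.

```python
from pathlib import Path

# Extensions listed under Features.
SUPPORTED_EXTENSIONS = {".docx", ".pdf", ".xlsx", ".pptx"}


def convert_folder(processor, folder, insert_into_llm=False):
    """Call processor.process_document on every supported file in `folder`.

    Returns the list of paths that were processed, in sorted order.
    Subdirectories are not searched.
    """
    processed = []
    for path in sorted(Path(folder).iterdir()):
        # Compare suffixes case-insensitively so "Report.DOCX" is included.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            processor.process_document(str(path), insert_into_llm=insert_into_llm)
            processed.append(path)
    return processed
```

For example: `convert_folder(DocumentProcessor(output_dir="my_markdown_files"), "inbox")`, where `inbox` is any folder of documents.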

## Environment Variables

- `OPENAI_API_KEY`: Required when using the `insert_into_llm=True` option or the `--llm` flag.
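
If you drive the LLM option from your own script, you might check for the key before any conversion work starts. The `require_openai_key` helper below is hypothetical and not part of the package; it reads the same `OPENAI_API_KEY` variable and raises a clear error when it is missing or blank.

```python
import os


def require_openai_key(env=None) -> str:
    """Return OPENAI_API_KEY, or raise if it is unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; it is required for --llm / insert_into_llm=True"
        )
    return key
```

Calling `require_openai_key()` just before `processor.process_document("document.docx", insert_into_llm=True)` surfaces a missing key immediately instead of partway through processing.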

## Running Tests

```sh
# Install test dependencies
pip install pytest pytest-mock

# Run the tests
pytest
```

## Contributing

We appreciate your interest in contributing to this project! To ensure a smooth collaboration, please follow these steps when submitting a pull request:

1. **Fork & Clone** – Fork the repository and clone it to your local machine.
2. **Create a Branch** – Create a new branch for your changes.
3. **Sign the Contributor License Agreement (CLA)** – Before your first contribution can be accepted, you must sign our CLA via [CLA Assistant](https://cla-assistant.io). You will be prompted to sign it when you submit your first pull request. You can also review the CLA at [https://cla.openize.com/agreement](https://cla.openize.com/agreement).
4. **Submit a Pull Request (PR)** – Once your changes are ready, open a PR with a clear description.
5. **Review & Feedback** – Our maintainers will review your PR and provide feedback if needed.

By contributing, you agree to the terms of the CLA and confirm that your changes comply with the project's licensing policies.

## License

This package is licensed under the MIT License. However, it depends on Aspose libraries, which are proprietary and closed source.

⚠️ Users must obtain a valid license for the Aspose libraries separately. This repository does not include or distribute any proprietary components.