This program sorts large text files that may not fit entirely into memory. It reads lines from standard input, sorts them alphabetically (byte-wise, ASCII), and prints the result to standard output. It utilizes the external sort algorithm to handle files larger than the available RAM by using temporary disk storage.
This project was implemented as a university assignment focusing on system programming concepts, C++ programming, and handling memory constraints. It demonstrates a practical application of the external sort algorithm.
- Sorts lines from standard input.
- Outputs sorted lines to standard output.
- Supports ascending order (default) and descending order (using the
-r
flag). - Handles large files efficiently using external sort, designed to work within a limited memory budget (e.g., tested with a 128MiB limit).
- Assumes ASCII encoded input and performs byte-wise comparison (respecting
LC_ALL=C
locale behavior).
- C++ (using standard library features like
<vector>
,<string>
,<fstream>
,<algorithm>
,<queue>
)
To compile the program, you need a C++ compiler (like g++ or clang) that supports C++17 features. You can use the provided Makefile:
# Compile the program (creates the executable 'sort')
make
# Alternatively, compile manually (example using g++)
g++ -std=c++17 -Wall -O2 -o sort sort.cpp
## Tests
- `./`
- `./target/release`
- `./target/debug`
where the latter two directories are usually used by Rust's build system
[cargo](https://doc.rust-lang.org/cargo/index.html).
After that it runs individual tests coming from the `tests/` folder (test
scripts are prefixed with `test_`).
Each test program is a python3 script and can be run individually, i.e.:
```console
python3 ./tests/test_sort.py
./sort < input-file
./sort -r < input-file
bash -c 'ulimit -v 131072; ./sort < input-file'
(Note: 131072 corresponds to 128 MiB in kilobytes)
The program implements external sort using the following steps:
- Read & Sort Chunks: Reads lines [...]
- Write Temp Files: Sorts the current chunk [...]
- Repeat: Repeats steps 1 and 2 [...]
- Merge: Opens all temporary files. Uses a min-heap (std::priority_queue) [...]
- Cleanup: Deletes all temporary files [...]