Skip to content

C++ implementation of an external sort algorithm capable of handling large files with limited memory.

License

Notifications You must be signed in to change notification settings

ge47kuf/cpp-external-sort

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

C++ External Sort Implementation

Description

This program sorts large text files that may not fit entirely into memory. It reads lines from standard input, sorts them alphabetically (byte-wise, ASCII), and prints the result to standard output. It utilizes the external sort algorithm to handle files larger than the available RAM by using temporary disk storage.

Motivation

This project was implemented as a university assignment focusing on system programming concepts, C++ programming, and handling memory constraints. It demonstrates a practical application of the external sort algorithm.

Features

  • Sorts lines from standard input.
  • Outputs sorted lines to standard output.
  • Supports ascending order (default) and descending order (using the -r flag).
  • Handles large files efficiently using external sort, designed to work within a limited memory budget (e.g., tested with a 128MiB limit).
  • Assumes ASCII encoded input and performs byte-wise comparison (respecting LC_ALL=C locale behavior).

Libraries

  • C++ (using standard library features like <vector>, <string>, <fstream>, <algorithm>, <queue>)

Build Instructions

To compile the program, you need a C++ compiler (like g++ or clang) that supports C++17 features. You can use the provided Makefile:

# Compile the program (creates the executable 'sort')
make

# Alternatively, compile manually (example using g++)
g++ -std=c++17 -Wall -O2 -o sort sort.cpp

## Tests

- `./`
- `./target/release`
- `./target/debug`

where the latter two directories are usually used by Rust's build system
[cargo](https://doc.rust-lang.org/cargo/index.html).

After that it runs individual tests coming from the `tests/` folder (test
scripts are prefixed with `test_`).
Each test program is a python3 script and can be run individually, i.e.:

```console
python3 ./tests/test_sort.py

Usage

Basic Sorting (Ascending):

./sort < input-file

Reverse Sorting (Descending):

./sort -r < input-file

Testing with Memory Limit (Example: 128 MiB):

bash -c 'ulimit -v 131072; ./sort < input-file'

(Note: 131072 corresponds to 128 MiB in kilobytes)

How it Works (External Sort)

The program implements external sort using the following steps:

  1. Read & Sort Chunks: Reads lines [...]
  2. Write Temp Files: Sorts the current chunk [...]
  3. Repeat: Repeats steps 1 and 2 [...]
  4. Merge: Opens all temporary files. Uses a min-heap (std::priority_queue) [...]
  5. Cleanup: Deletes all temporary files [...]

About

C++ implementation of an external sort algorithm capable of handling large files with limited memory.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published