This is a command-line Ruby script that scrapes the UNECE website to extract UN/LOCODEs (United Nations Codes for Trade and Transport Locations) for countries and territories. The script uses the Kimurai framework for web scraping.
- Scrapes data from the UNECE website for UN/LOCODEs.
- Outputs results to a CSV file.
- Supports appending data to an existing CSV file.
- Can run in test mode without actual scraping or saving data.
- Includes detailed logging for debugging and additional information.
- Performs validation checks on the data while scraping tables.
- Ruby 2.6 or higher.
- The following Ruby gems:
  - kimurai (version 1.4.0)
Make sure you have Ruby installed on your system. You can check if Ruby is installed by running:
ruby --version
If Ruby is not installed, download and install Ruby.
Then, install the required dependencies by running:
gem install kimurai
Clone the repository or download the script file location_codes_scraper.rb from the repository.
git clone https://github.com/your-repo/location-codes-scraper.git
cd location-codes-scraper
Alternatively, if you're just using the script file, download it and place it in your desired directory.
Once the script is downloaded and dependencies are installed, you can run it with various options.
ruby location_codes_scraper.rb [options]
- Run the script in test mode (no data is saved, so you can verify the availability and structure of the source tables on the UNECE website):
ruby location_codes_scraper.rb --test
- Run with detailed logging:
ruby location_codes_scraper.rb --verbose
- Save the results to a specific file:
ruby location_codes_scraper.rb --path /path/to/output.csv
- Append results to an existing file:
ruby location_codes_scraper.rb --append --path /path/to/output.csv
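For readers curious how flags like these are typically handled, here is a minimal sketch using Ruby's standard `optparse` library. It is not the script's actual implementation: the `parse_options` helper and the option hash keys are hypothetical, and only the flag names and the default file name come from this README.

```ruby
require "optparse"

# Hypothetical helper (not taken from the script): turns the flags
# described in this README into a plain Hash of settings.
def parse_options(argv)
  # Default output file as stated in this README.
  options = { test: false, verbose: false, append: false, path: "location_codes.csv" }

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: ruby location_codes_scraper.rb [options]"
    opts.on("--test", "Check the source tables without saving data") { options[:test] = true }
    opts.on("--verbose", "Enable detailed logging") { options[:verbose] = true }
    opts.on("--append", "Append to an existing CSV file") { options[:append] = true }
    opts.on("--path PATH", "Output CSV file") { |path| options[:path] = path }
  end

  # Parse a copy so the caller's argument list is left untouched.
  parser.parse!(argv.dup)
  options
end
```

Calling `parse_options(ARGV)` at the top of a script would yield a hash such as `options[:append]` and `options[:path]` that the rest of the code can consult.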
To run the script regularly, you can schedule it with a cron job (on Linux/macOS) or Task Scheduler (on Windows).
- The --path argument specifies the file where the results will be saved. By default, this is location_codes.csv.
- The script checks and processes tables on the UNECE website. If the structure of the tables changes, an error message will be displayed.
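The two notes above can be illustrated with a short sketch. It assumes a structure check that compares a scraped header row against an expected list, and a save step that switches between overwrite and append mode. The `EXPECTED_HEADERS` values and both helper names are hypothetical, so the real script may check different columns or report errors differently.

```ruby
require "csv"

# Hypothetical column names; the real UNECE table headers may differ.
EXPECTED_HEADERS = ["LOCODE", "Name", "Function"].freeze

# Raises when a scraped header row does not match the expected layout.
def validate_headers!(headers)
  normalized = headers.map { |header| header.to_s.strip }
  return if normalized == EXPECTED_HEADERS

  raise ArgumentError, "Unexpected table structure: #{normalized.inspect}"
end

# Writes rows to path. File mode "a" appends to an existing file,
# while mode "w" creates the file or overwrites it.
def save_rows(rows, path:, append: false)
  CSV.open(path, append ? "a" : "w") do |csv|
    rows.each { |row| csv << row }
  end
end
```

Validating before saving means a changed table layout stops the run before any rows are written to the output file.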
This script is open-source and licensed under the MIT License.