
Implementation of msgcat and msgmerge utilities from GNU gettext #1161


Open: wants to merge 23 commits into base: master


soft-suroleb

Implementation of some features for msgcat and msgmerge to work with a compendium in PyBabel

pybabel concat

  • Allows concatenating multiple .po files into one.
  • In case of translation conflicts, the translation from the first file is taken, as if the use-first option is set to true. Flags, locations, and other information in messages are combined.
  • Implemented options:
    • output-file
    • less-than
    • more-than
    • unique
    • use-first
    • no-location
    • width
    • no-wrap
    • sort-output
    • sort-by-file
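
The conflict rule described above (first translation wins, flags and locations are combined) can be sketched in plain Python. This is an illustration only: `Msg` and `concat` are hypothetical names, not Babel's `Message` class or the PR's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    """Hypothetical stand-in for a PO entry; not Babel's Message class."""
    msgid: str
    msgstr: str = ""
    locations: list = field(default_factory=list)
    flags: set = field(default_factory=set)


def concat(catalogs):
    """Concatenate catalogs; on conflict the first translation is kept."""
    merged = {}
    for catalog in catalogs:
        for msg in catalog:
            seen = merged.get(msg.msgid)
            if seen is None:
                # Copy so the input catalogs are not mutated.
                merged[msg.msgid] = Msg(msg.msgid, msg.msgstr,
                                        list(msg.locations), set(msg.flags))
                continue
            # use-first behaviour: seen.msgstr is left untouched.
            # Locations and flags are combined across files.
            seen.locations += [loc for loc in msg.locations
                               if loc not in seen.locations]
            seen.flags |= msg.flags
    return list(merged.values())
```

With two inputs that both define `hello`, the result keeps the translation from the first input and lists the locations from both.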

pybabel merge

  • Enables merging files using a compendium for translation memory.
  • The compendium can be used in two modes:
    • Default mode: Translations from the compendium are taken if absent in the output file.
    • Compendium overwrite mode: With the compendium-overwrite option, translations in the compendium are considered primary and overwrite those in the output file. If a translation is taken from the compendium, a comment is added specifying the source.
  • Implemented options:
    • input-files
    • compendium
    • compendium-overwrite
    • no-compendium-comment
    • update
    • output-file
    • backup
    • suffix
    • no-fuzzy-matching
    • no-location
    • width
    • no-wrap
    • sort-output
    • sort-by-file
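
The two compendium modes described above can be sketched as follows. This is a simplified model that treats a catalog as a plain `msgid -> msgstr` dict; `apply_compendium` is a hypothetical name, and the PR's real implementation may differ in details such as how equal translations are handled.

```python
def apply_compendium(catalog, compendium, overwrite=False):
    """Fill `catalog` (msgid -> msgstr) from `compendium`.

    Returns the merged dict and the set of msgids whose translation
    was taken from the compendium (the ones that would get a comment
    naming the source).
    """
    result = dict(catalog)
    from_compendium = set()
    for msgid, current in catalog.items():
        suggestion = compendium.get(msgid)
        if not suggestion:
            continue  # the compendium has nothing for this message
        # Default mode: only fill missing translations.
        # Overwrite mode: the compendium is considered primary.
        if overwrite or not current:
            result[msgid] = suggestion
            from_compendium.add(msgid)
    return result, from_compendium
```

Messages that exist only in the compendium are not added; in this sketch it serves purely as translation memory for messages already in the catalog.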

…pybabel

 * Define the MessageConcatenation class to mimic the functionality of GNU gettext's msgcat

 * Define the MessageMerge class to mimic the functionality of GNU gettext's msgmerge

 * Implement placeholders for the main interface functions
 * Add validation for main msgcat options - input_files, output_file

 * Temporarily set use_first option to true to avoid handling cases with different translations for the same messages
 * Implement options unique, less-than, and more-than, and validate their dependencies with each other.
   * These options specify which messages to include in the output file.

 * Implement and validate options no-wrap and width.

 * Create a helper function _prepare that collects data on message occurrences across different catalogs.

 * Mark options that are already implemented #
 * Implement basic functionality of msgmerge

 * Use and validate the main options: input-files and output-file

 * Use and validate options: no-wrap and width

 * Use and validate options: sort-output and sort-by-file, both in msgmerge and msgcat

 * In the basic version of working with a compendium, a translation for a message is taken from the compendium only if the resulting catalog lacks a translation.
 * Create basic tests to verify the functionality of msgcat, specifically
   the concatenation of catalogs, merging of message flags, locations, etc.

 * Remove the validation of options sort-output, sort-by-file, unique, use-first,
   as they are initialized in the function initialize_options.
 * Create basic tests to verify the functionality of msgmerge, specifically the merging of messages and their integration with a compendium.

 * Remove the definition of sort-output and sort-by-file, and add an additional check for input-files.
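
The occurrence counting and the `unique`, `less-than`, and `more-than` options mentioned in the commits above can be sketched like this. It follows the GNU msgcat meaning of the options (`unique` is shorthand for `less-than=2`); the function name `select_msgids` and the validation rule are assumptions for illustration, since the PR's own checks are not shown here.

```python
from collections import Counter


def select_msgids(catalogs, more_than=0, less_than=None, unique=False):
    """Return msgids defined in more than `more_than` and fewer than
    `less_than` catalogs. Each catalog is an iterable of msgids."""
    if unique:
        # Assumed validation: unique is shorthand for less_than=2,
        # so combining it with the other two options is rejected.
        if less_than is not None or more_than != 0:
            raise ValueError("unique cannot be combined with "
                             "less-than or more-than")
        less_than = 2

    counts = Counter()
    for catalog in catalogs:
        # dict.fromkeys dedupes while keeping order, so a msgid is
        # counted at most once per catalog.
        for msgid in dict.fromkeys(catalog):
            counts[msgid] += 1

    return [msgid for msgid, n in counts.items()
            if n > more_than and (less_than is None or n < less_than)]
```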
…m handling logic

 * Implement `update` to update the source file instead of writing to the current output file

 * Implement `backup` to save a backup of the source file before making any updates

 * Implement `c_overwrite` to use a new mode of handling the compendium, where translations from the compendium overwrite messages in the output file
 * Implement a test for `msgmerge` that validates the new mode where compendium entries overwrite messages in the output PO file.

 * Include the `no_compendium_comment` option to ensure comments about translations sourced from the compendium are not included.

 * Utilize the `no-location` option to exclude location comments from the output.
 * Implemented a helper function `_get_expected` to standardize the expected PO file structure.
 * Renamed the option `c-overwrite` to `compendium-overwrite`
 * Mark the catalog as fuzzy after msgcat and msgmerge if there is at least one fuzzy message

 * Remove add-location as it's unnecessary
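
The `update` and `backup` behaviour from the commits above (save a copy of the source file, then write the result back in place) can be sketched with the standard library. The helper name `backup_then_write` is hypothetical, and the default suffix `~` is an assumption based on GNU msgmerge's usual simple-backup suffix.

```python
import shutil
from pathlib import Path


def backup_then_write(path, new_content, backup=True, suffix="~"):
    """Overwrite `path` with `new_content`, optionally keeping a backup.

    The backup is written next to the original as `<name><suffix>`.
    Returns the backup path, or None if no backup was made.
    """
    path = Path(path)
    backup_path = None
    if backup and path.exists():
        backup_path = path.with_name(path.name + suffix)
        shutil.copy2(path, backup_path)  # copies content and metadata
    path.write_text(new_content, encoding="utf-8")
    return backup_path
```
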
@soft-suroleb
Author

Hi guys! I see you marked the PR but still haven't left any comments. Will it be reviewed?

Member

akx left a comment

Thank you! Some initial comments within.


    def finalize_options(self):
        if not self.input_files or len(self.input_files) != 2:
            raise OptionError('must be two po files')
Member

Looks like the order of the files has some semantics to it? definition file, reference file..?

Author

Yes, the first one is the file with the existing, possibly obsolete translations, and the second one is the new, current .pot file.

Member

That would be useful information here in the error message too.
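
One way to act on this suggestion is to spell out the expected order in the message itself. A minimal sketch, assuming the order given in the reply above; `OptionError` here is a local stand-in and `check_merge_inputs` is a hypothetical helper, not the PR's code.

```python
class OptionError(Exception):
    """Local stand-in for the frontend's OptionError."""


def check_merge_inputs(input_files):
    """Validate that exactly two input files were given, in order."""
    if not input_files or len(input_files) != 2:
        raise OptionError(
            "exactly two input files are required: the PO file with the "
            "existing translations first, then the new POT template "
            f"(got {len(input_files or [])})"
        )
    return tuple(input_files)
```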

@@ -715,6 +715,431 @@ def test_supports_width(self):
        assert expected_content == actual_content


class ConcatanationCatalogTestCase(unittest.TestCase):
Member

The new test cases could live in new files, I think.

And if you feel like it, they could be Pytest style instead of Unittest?

Author

Is a new file really needed here? The file is test_frontend.py, all the other command tests already live there, and it seems logical for these to live there too, no?
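
For reference, the difference between the two test styles mentioned in this thread looks roughly like this. The helper `merge_flags` is hypothetical and exists only to give both tests something to check.

```python
import unittest


def merge_flags(*flag_sets):
    """Hypothetical helper under test: union of message flags."""
    return set().union(*flag_sets)


# unittest style: a TestCase subclass with assert* methods.
class MergeFlagsTestCase(unittest.TestCase):
    def test_union(self):
        self.assertEqual(merge_flags({"fuzzy"}, {"python-format"}),
                         {"fuzzy", "python-format"})


# pytest style: a plain function with a bare assert.
def test_merge_flags_union():
    assert merge_flags({"fuzzy"}, {"python-format"}) == {
        "fuzzy", "python-format"}
```

pytest collects both forms, so the styles can coexist in one module while tests are migrated.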

soft-suroleb and others added 4 commits March 23, 2025 16:21
* Update _prepare function in ConcatenateCatalog to check conflicting messages and to not parse po-files twice

* Add _conflicts field in Catalog to mark conflicts

* Update tests
* Delete unused options

* Fix multiline options comments

* Replace backup logic in MergeCatalog

* Rename to ConcatenateCatalog
@soft-suroleb
Author

Added support for conflicting messages from different PO files during concatenation. If a message has different translations in different files, the corresponding strings are marked with a comment about the conflict.

soft-suroleb requested a review from akx, March 30, 2025 13:34
 * Includes .rst file with detailed use cases and practical examples for pybabel's concat and merge utilities, outlining common scenarios, options, and best practices for managing PO files.
Member

akx left a comment

Another round of comments :)

@@ -642,6 +645,37 @@ def _format_comment(comment, prefix=''):
        for line in comment_wrapper.wrap(comment):
            yield f"#{prefix} {line.strip()}\n"

    def _format_conflict_comment(file, project, version, prefix=''):
        comment = f"#-#-#-#-# {file} ({project} {version}) #-#-#-#-#"
Member

Is this standard notation, or are we introducing a Babel-specific comment extension? If the latter, I'd rather not...

Author

soft-suroleb, Apr 27, 2025

This is the notation used in GNU gettext.
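
To make the notation concrete, here is a rough sketch of how conflicting translations could be joined under such marker lines. The marker format is copied from the diff line quoted above; the helper names and the exact layout of the combined string are assumptions, and GNU msgcat's real output may differ in spacing.

```python
def format_conflict_header(filename, project, version):
    """Marker line naming the file a conflicting translation came from."""
    return f"#-#-#-#-# {filename} ({project} {version}) #-#-#-#-#"


def conflicting_msgstr(variants):
    """Join conflicting translations into a single msgstr.

    `variants` is a list of (filename, project, version, msgstr) tuples.
    Each translation is preceded by its marker line.
    """
    parts = []
    for filename, project, version, msgstr in variants:
        parts.append(format_conflict_header(filename, project, version))
        parts.append(msgstr)
    return "\n".join(parts) + "\n"
```

A translator then resolves the entry by hand, keeping one translation and deleting the marker lines.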

* Rename file_name to filename

* Parameterize adding the fuzzy flag to a message in 'add_conflict'

* Move usage scenarios to cmdline.rst

* Rename to ConcatenateCatalog
soft-suroleb and others added 2 commits April 27, 2025 15:38
* Add frozen_time fixture to use freeze_time in every test