Skip to content

Enabling rich text comments #1294

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
rplantiko opened this issue Apr 4, 2025 · 8 comments
Open

Enabling rich text comments #1294

rplantiko opened this issue Apr 4, 2025 · 8 comments

Comments

@rplantiko
Copy link
Contributor

rplantiko commented Apr 4, 2025

This is one followup of issue #1293.

The pure text content of a cell comment is not always completely read by the reader class ZCL_EXCEL_READER_2007, if the comment contains formatted parts.

Even when respecting the current restriction of the reader that rich text formatting is not supported for comments, we should at least get the complete pure text content of the comment (i.e. the concatenation of all its, possibly differently formatted parts).

For example, consider the following comment:

Image

After reading the Excel with class ZCL_EXCEL_READER_2007, the get_text( ) method of the comment object in the comments collection of the worksheet object will yield only the first part of the comment, namely the text Info:

Image

It should give the string Info:\nFür jeden...

Since for achieving this, the parsing of RTF data in the comments part of the Excel file is necessary anyway, I decided to fully implement RTF functionality for comments for reading and writing.

To reproduce the problem:

  • Use notes.xlsx
  • Run ZDEMO_EXCEL37 to read notes.xlsx and write to a new file. Debug the method get_text of ZCL_EXCEL_COMMENT for the comment of cell A1 to see the problem.
@rplantiko
Copy link
Contributor Author

I am currently fixing this on our system. The cause is that method zcl_excel_reader_2007->load_comments does only recognize directly adjacent <t> elements in the comments section: it locates the first <t> element with method find_from_name_ns, but after having processed it, it proceeds to the next sibling element of <t> by calling get_next( ). Instead, method get_elements_by_tag_name_ns should be used to iterate over all child elements of the comment node.

@rplantiko
Copy link
Contributor Author

I am currently adding rich text functionality to the comments.
However, also the positioning and width of the comment boxes is overwritten by always the same default values. So also this positions have to be read in. They are stored in a different file, /xl/drawings/vmlDrawing1.vml which hasn't been read in before.

@rplantiko
Copy link
Contributor Author

rplantiko commented Apr 9, 2025

Hi Sandra, in your comment to PR #1297 you mentioned

There are many modifications in the PR, which make the PR difficult to approve, like Italian texts which we won't merge, improvements not related to the problem like default parameter values as constants, etc.

As for the italian texts: we don't want them either, but they are present in the main branch of abap2xlsx/abap2xlsx , unfortunately. I have closed PR #1297 and work with PR #1298 meanwhile, which doesn't contain these texts. But they should be removed sometime from the main repo, maybe with a dedicated issue.

As for the default parameters as constants: this is indeed part of the development.

Changes concerning the usage of comments

  • Class zcl_excel_comment now contains a structured private attribute gs_box for the eight anchor parameters determining the bounding box of a comment. The structure replaces eight separate attributes that were used before.
  • There is a global public constant gc_default_box containing the former default values for these parameters.
  • Class zcl_excel_comment also contains a new attribute gt_rtf containing the rtf parts of the comment. Each entry consists of a raw text and a component rtf with the style information (like e.g. bold, the font name etc.) This attribute completely replaces the former attribute text.
  • The method zcl_excel_comment->get_text retrieves the plain text which is the concatenation of all the text parts in table gt_rtf.
  • The method zcl_excel_comment->set_text was changed so that it works like before ("add a simple text note and dont't care about styles"), so that the demo report ZDEMO_EXCEL_COMMENTS works as before.
  • The same method zcl_excel_comment->set_text provides a new optional import parameter is_style - if supplied, it will be used to set the style of the supplied raw text ip_text.
  • With successive calls of zcl_excel_comment->set_text, differently styled text parts can be added to the total text of the comment.
  • As an alternative to these successive calls of set_text, an internal table containing all the parts of the text can be passed at once using a new public method zcl_excel_comment->set_text_rtf.
  • Class zcl_excel_comment has a new public static method get_default_style which yields the style data which were formerly the fixed style attached to comments (Tahoma 9pt bold).
  • Class zcl_excel_comment has a new method set_box to define the dimensions of the comment's bounding box.
  • Method get_default_style and constant gc_default_box are public so that a user can get and modify default values for the calls of set_text, set_text_rtf and set_box.

Changes needed for reading and saving comments

  • The changes in classes zcl_excel_reader_2007, zcl_excel_worksheet and zcl_excel_writer_2007 were necessary to propagate the new data from the file into the ABAP layer, and back to the file.
  • In class zcl_excel_comments, there is a new attribute gv_full_vml which is needed to populate the file xl/drawings/vmlDrawing1.vml containing the box information for the comments (but not the styles)
  • For reading this file (which hadn't been read before), the method zcl_excel_reader_2007->load_worksheet had to be extended by a new case when lc_rel_vmldrawing, calling the new method load_comment_boxes to read the box dimensions from there. If the file is not present, the default box dimensions will be used.
  • In load_comment_boxes, the box dimensions for each of the comment will be extracted from the <x:Anchor> elements.
  • The internal table containing the box dimensions for all comments gets passed to the attribute gt_box of the collection class zcl_excel_comments.
  • The document xl/drawings/vmlDrawing1.vml contains further informations beyond the box dimensions which are necessary for identically reproducing the comments. For this reason, the complete VML code of this file is passed as a string to the global attribute gv_full_vml of the collection class zcl_excel_comments.
  • As the instance of zcl_excel_comments is private to zcl_excel_worksheet, the box dimensions and the full vml need to be passed to it through an intermediate method zcl_excel_worksheet->set_boxes. In zcl_excel_comments, the data are stored in the private attributes gt_boxes and gv_full_vml. From there, they will be used later in the writer class to generate the Excel file.
  • The method zcl_excel_reader_2007->load_comments has been refactored and extended to extract the RTF information of the comments file xl/comments1.xml. It results in an internal table lt_rtf which is passed to the newly created zcl_excel_comment object, together with the reference cell lv_ref.
  • Reading the data should be independent of which of the files xl/comments1.xml and xl/drawings/vmlDrawing1.vml is read first. To achieve this, the method set_boxes is called twice:
    • One call is at the end of load_comment_boxes: by which the private attributes gt_boxes and gv_full_vml of zcl_excel_comments will be filled
    • Another call is at the end of load_comments: here, the method is called without passing any parameters. It just serves as a trigger
    • In the implementation set_boxes, the attributes gt_boxes and gv_full_vml are filled if passed. After this, the lines of gt_boxes are passed to the individual comments if present. If successfully passed to the comment, the attribute gt_boxes is reduced line by line until it is empty. The attribute gt_boxes is only an intermediate attribute in this process.
    • This way, the attributes gs_box of the individual comments will always be filled, independent of the order in which the files xl/comments1.xml and xl/drawings/vmlDrawing1.vml are read.

Minor refactorings on the way

  • The part of zcl_excel_reader_2007->load_comments which deals with an individual comment has been factored out into a new method load_single_comment.
  • In zcl_excel_writer_2007->create_xl_comments, a large list of local constants was used for the element names of the XML which has to be generated. These constants are basically useless: not only is their use restricted to this method, but they also obfuscate which elements the create_simple_element calls actually are creating. The reader wants to know this name directly at the point when the element is created, without being referred to a constant. Therefore, I eliminated the constants and used the strings directly in the calls.
  • The same holds for method create_xl_drawing_for_comments.
  • In both of these methods, new elements are created as childs of the document node first (parent = lo_document), and then, at a later point, placed in the DOM tree using a call of append_child. It is clearer to specify the correct parent element from the beginning. This way, also the calls of append_child could be eliminated.

Best regards,
Rüdiger

@sandraros
Copy link
Collaborator

sandraros commented Apr 10, 2025

As I already said, it's difficult to review a PR with many improvements unrelated to the issue. The right way is to keep PRs small, i.e. make one PR for each fix and each enhancement, one PR for each kind of refactoring (no change in feature). If you don't do it, I'll create a PR with the enhancements, and then we'll take care of the fix.

Note that I appreciate a lot your participation, I don't want to refrain you from proposing solutions, it's only to make the work easier for me and other reviewers.

@rplantiko
Copy link
Contributor Author

As I already said, it's difficult to review a PR with many improvements unrelated to the issue. The right way is to keep PRs small, i.e. make one PR for each fix and each enhancement, one PR for each kind of refactoring (no change in feature). If you don't do it, I'll create a PR with the enhancements, and then we'll take care of the fix.

I understood this, and this is why I have closed the PR #1297 . PR #1298 only contains the changes necessary for enabling RTF comments. I have described these changes in the last comment of this issue.

@rplantiko rplantiko changed the title Excel Reader does not read cell comments completely Enabling rich text comments Apr 10, 2025
@abap2xlsx abap2xlsx deleted a comment Apr 10, 2025
@darnoc312
Copy link
Contributor

I guess we should try to gain some benefits from the existing RTF functionality that we know from worksheet's method set_cell.
I suggest a solution that makes it usable even for comments.
The structure ZEXCEL_S_RTF and the table type ZEXCEL_T_RTF should be used for comments, too.
I implemented in my system some methods to be called from both applications (shared strings and comments).
So also the existing code must be changed to call the new methods.
For example:

Even three times in the reader because inlineStr is a tag that could contain richt text, too
Image

Image

Image

And then we can create a ZDEMO_EXCEL48 like this:
You see that the table lt_rtf is used twice for worksheet's set_cell and comment's set_comment (gets a new parameter it_rtf type zexcel_t_rtf)

REPORT  zdemo_excel48.

DATA:
  lo_excel     TYPE REF TO zcl_excel,
  lo_worksheet TYPE REF TO zcl_excel_worksheet,
  lo_style     TYPE REF TO zcl_excel_style,
  lo_comment   TYPE REF TO zcl_excel_comment,
  lv_value     TYPE string,
  lt_rtf       TYPE zexcel_t_rtf,
  ls_rtf       LIKE LINE OF lt_rtf,
  ls_font      LIKE ls_rtf-font.


CONSTANTS:
  gc_save_file_name TYPE string VALUE '48_MultipleStylesInOneCell.xlsx'.

INCLUDE zdemo_excel_outputopt_incl.


START-OF-SELECTION.

  CREATE OBJECT lo_excel.

  lo_worksheet = lo_excel->get_active_worksheet( ).

  lo_style = lo_excel->add_new_style( ).
  ls_font = lo_style->font->get_structure( ).  "Default settings from ZCL_EXCEL_STYLE_FONT's Constructor

  lv_value = 'normal red underline normal red-underline bold italic bigger Times-New-Roman'.

  " red
  ls_rtf-font = ls_font.
  ls_rtf-font-color-rgb = zcl_excel_style_color=>c_red.
  ls_rtf-offset = 7.
  ls_rtf-length = 3.
  INSERT ls_rtf INTO TABLE lt_rtf.

  " underline
  ls_rtf-font = ls_font.
  ls_rtf-font-underline = abap_true.
  ls_rtf-font-underline_mode = zcl_excel_style_font=>c_underline_single.
  ls_rtf-offset = 11.
  ls_rtf-length = 9.
  INSERT ls_rtf INTO TABLE lt_rtf.

  " red and underline
  ls_rtf-font = ls_font.
  ls_rtf-font-color-rgb = zcl_excel_style_color=>c_red.
  ls_rtf-font-underline = abap_true.
  ls_rtf-font-underline_mode = zcl_excel_style_font=>c_underline_single.
  ls_rtf-offset = 28.
  ls_rtf-length = 13.
  INSERT ls_rtf INTO TABLE lt_rtf.

  " bold
  ls_rtf-font = ls_font.
  ls_rtf-font-bold = abap_true.
  ls_rtf-offset = 42.
  ls_rtf-length = 4.
  INSERT ls_rtf INTO TABLE lt_rtf.

  " italic
  ls_rtf-font = ls_font.
  ls_rtf-font-italic = abap_true.
  ls_rtf-offset = 47.
  ls_rtf-length = 6.
  INSERT ls_rtf INTO TABLE lt_rtf.

  " bigger
  ls_rtf-font = ls_font.
  ls_rtf-font-size = 28.
  ls_rtf-offset = 54.
  ls_rtf-length = 6.
  INSERT ls_rtf INTO TABLE lt_rtf.

  " Times-New-Roman
  ls_rtf-font = ls_font.
  ls_rtf-font-name            = zcl_excel_style_font=>c_name_roman.
  ls_rtf-font-scheme          = zcl_excel_style_font=>c_scheme_none.
  ls_rtf-font-family          = zcl_excel_style_font=>c_family_roman.
  " Create an underline double style
  ls_rtf-font-underline       = abap_true.
  ls_rtf-font-underline_mode  = zcl_excel_style_font=>c_underline_double.
  ls_rtf-offset = 61.
  ls_rtf-length = 15.
  INSERT ls_rtf INTO TABLE lt_rtf.

  lo_worksheet->set_cell(
    ip_column = 'B'
    ip_row    = 2
    ip_style  = lo_style->get_guid( )
    ip_value  = lv_value
    it_rtf    = lt_rtf ).

  lo_comment = lo_excel->add_new_comment( ).
  lo_comment->set_text( ip_ref          = 'B2'
                        ip_text         = lv_value
                        it_rtf          = lt_rtf
                        ip_right_column = 12
                        ip_bottom_row   = 10 ).
  lo_worksheet->add_comment( lo_comment ).

*** Create output
  lcl_output=>output( lo_excel ).

with the following result:

48_MultipleStylesInOneCell_with_comment.xlsx

And of course i can read Rüdiger's example excel, but the comment gets the standard box with the shadow, that we know from ABAP2XLSX. I guess that is sufficient.

37-psthr-notes.xlsx

If you agree with this solution i would propose this soon in at least two steps:

  1. changes in writer class and comment class to achive the ZDEMO_EXCEL48 result above
  2. changes in reader class to get it.

@sandraros
Copy link
Collaborator

Current PR #1298. I agree with @darnoc312 that it's best to reuse and put in common CHECK_RTF of ZCL_EXCEL_WORKSHEET. The demo program looks good, but I would suggest to enhance ZDEMO_EXCEL_COMMENTS instead of creating a new program.

METHOD check_rtf.
DATA: lo_style TYPE REF TO zcl_excel_style,
lo_iterator TYPE REF TO zcl_excel_collection_iterator,
lv_next_rtf_offset TYPE i,
lv_tabix TYPE i,
lv_value TYPE string,
lv_val_length TYPE i,
ls_rtf LIKE LINE OF ct_rtf.
FIELD-SYMBOLS: <rtf> LIKE LINE OF ct_rtf.
IF ip_style IS NOT SUPPLIED.
ip_style = excel->get_default_style( ).
ENDIF.
lo_iterator = excel->get_styles_iterator( ).
WHILE lo_iterator->has_next( ) = abap_true.
lo_style ?= lo_iterator->get_next( ).
IF lo_style->get_guid( ) = ip_style.
EXIT.
ENDIF.
CLEAR lo_style.
ENDWHILE.
lv_next_rtf_offset = 0.
LOOP AT ct_rtf ASSIGNING <rtf>.
lv_tabix = sy-tabix.
IF lv_next_rtf_offset < <rtf>-offset.
ls_rtf-offset = lv_next_rtf_offset.
ls_rtf-length = <rtf>-offset - lv_next_rtf_offset.
ls_rtf-font = lo_style->font->get_structure( ).
INSERT ls_rtf INTO ct_rtf INDEX lv_tabix.
ELSEIF lv_next_rtf_offset > <rtf>-offset.
RAISE EXCEPTION TYPE zcx_excel
EXPORTING
error = 'Gaps or overlaps in RTF data offset/length specs'.
ENDIF.
lv_next_rtf_offset = <rtf>-offset + <rtf>-length.
ENDLOOP.
lv_value = ip_value.
lv_val_length = strlen( lv_value ).
IF lv_val_length > lv_next_rtf_offset.
ls_rtf-offset = lv_next_rtf_offset.
ls_rtf-length = lv_val_length - lv_next_rtf_offset.
ls_rtf-font = lo_style->font->get_structure( ).
INSERT ls_rtf INTO TABLE ct_rtf.
ELSEIF lv_val_length > lv_next_rtf_offset.
RAISE EXCEPTION TYPE zcx_excel
EXPORTING
error = 'RTF specs length is not equal to value length'.
ENDIF.
ENDMETHOD.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
@rplantiko @sandraros @darnoc312 and others