Skip to content

btrfs-progs: minor cleanups related to extent buffer #990

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

adam900710
Copy link
Collaborator

The first patch is a cleanup to remove the unused @fs_info parameter
from btrfs_csum_data().

The second patch uses btrfs_csum_data() directly to calculate super
block checksum and avoid use extent buffer helpers.

The third patch removes a convert dedicated helper, and replaces it with
an existing helper.

The final patch enhance btrfs_print_leaf() to be called with temporary
extent buffers in mkfs/convert, which have no eb->leaf populated.

adam900710 added 10 commits May 14, 2025 09:47
[BUG]
When a device replace failed, e.g. try to replace a device on a RO
mounted btrfs, the error message is incorrectly broken into two lines:

 [adam@btrfs-vm ~]$ sudo btrfs replace start -fB 1 /dev/test/scratch3  /mnt/btrfs/
 Performing full device TRIM /dev/mapper/test-scratch3 (10.00GiB) ...
 ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt/btrfs/": Read-only file system

 [adam@btrfs-vm ~]$

Note the newline after the "Read-only file system" error message.

[CAUSE]
Inside cmd_replace_start(), if the ioctl failed we need to handle the
error messages different depeneding on start_args.result.

If the result is not BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_RESULT we will
append extra info to the error message.

But the initial error message is using error(), which implies a newline.

This results the above incorrectly splitted error message.

[FIX]
Instead of manually appending an extra reason to the existing error
message, just do different output depending on the start_args.result in
the first place.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
It's a long known problem that direct IO can lead to data checksum
mismatches if the user space is also modifying its buffer during
writeback.

Although it's fixed in the recent v6.15 release (and backported to older
kernels), we still need a user friendly way to fix those problems.

This patch introduce the dryrun version of "btrfs rescue
fix-data-checksum", which reports the logical bytenr and corrupted mirrors.

Signed-off-by: Qu Wenruo <wqu@suse.com>
[BUG]
There is a seldomly utilized function, btrfs_find_item(), which has no
document and is not behaving correctly.

Inside backref.c, iterate_inode_refs() and btrfs_ref_to_path() both
utilize this function, to find the parent inode.

However btrfs_find_item() will never return 0 if @ioff is passed as 0
for such usage, result early failure for all kinds of inode iteration
functions.

[CAUSE]
Both functions pass 0 as the @ioff parameter initially, e.g:

 We have the following fs tree root:

  	item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
		generation 3 transid 9 size 6 nbytes 16384
		block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0
		sequence 1 flags 0x0(none)
	item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
		index 0 namelen 2 name: ..
	item 2 key (256 DIR_ITEM 2507850652) itemoff 16078 itemsize 33
		location key (257 INODE_ITEM 0) type FILE
		transid 9 data_len 0 name_len 3
		name: foo
	item 3 key (256 DIR_INDEX 2) itemoff 16045 itemsize 33
		location key (257 INODE_ITEM 0) type FILE
		transid 9 data_len 0 name_len 3
		name: foo
	item 4 key (257 INODE_ITEM 0) itemoff 15885 itemsize 160
		generation 9 transid 9 size 16384 nbytes 16384
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 4 flags 0x0(none)
	item 5 key (257 INODE_REF 256) itemoff 15872 itemsize 13
		index 2 namelen 3 name: foo
	item 6 key (257 EXTENT_DATA 0) itemoff 15819 itemsize 53
		generation 9 type 1 (regular)
		extent data disk byte 13631488 nr 16384
		extent data offset 0 nr 16384 ram 16384
		extent compression 0 (none)

  Then we call paths_from_inode() with:
  - @Inum = 257
  - ipath = {.fs_root = 5}

  Then we got the following sequence:

  iterate_irefs(257, fs_root, inode_to_path)
  |- iterate_inode_refs()
     |- inode_ref_info()
        |- btrfs_find_item(257, 0, fs_root)
	|  Returned 1, with @found_key updated to
	|  (257, INODE_REF, 256).

  This makes iterate_irefs() exit immediately, but obviously that
  btrfs_find_item() call is to find any INODE_REF, not to find the
  exact match.

[FIX]
If btrfs_find_item() found an item matching the objectid and type, then
it should return 0 other than 1.

Fix it and keep the behavior the same across btrfs-progs and the kernel.

Since we're here, also add some comments explaining the function.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Previously "btrfs rescue fix-data-checksum" only show affected logical
bytenr, which is not helpful to determine which files are affected.

Enhance the output by also outputting the affected subvolumes (in
numeric form), and the file paths inside that subvolume.

The example looks like this:

  logical=13631488 corrtuped mirrors=1 affected files:
    (subvolume 5)/foo
    (subvolume 5)/dir/bar
  logical=13635584 corrtuped mirrors=1 affected files:
    (subvolume 5)/foo
    (subvolume 5)/dir/bar

Although the end result is still not perfect, it's still much easier to
find out which files are affected.

Signed-off-by: Qu Wenruo <wqu@suse.com>
This mode will ask user for how to fix each block.

User input can match the first letter or the whole action name to
specify given action, the input is verified case insensitive.

If no user input is provided, the default action is to ignore the
corrupted block.

If the input matches no action, a warning is outputted and user must
retry until a valid input is provided.

Signed-off-by: Qu Wenruo <wqu@suse.com>
This adds a new group of action in the interactive mode to fix a csum
mismatch.

The output looks like this:

  logical=13631488 corrtuped mirrors=1 affected files:
    (subvolume 5)/foo
    (subvolume 5)/dir/bar
  <<I>>gnore/<1>:1
  Csum item for logical 13631488 updated using data from mirror 1

In the interactive mode, the update-csum-item behavior is outputted as
all available mirror numbers.

Considering all the existing (and future) action should starts with an
alphabet, it's pretty easy to distinguish mirror number from other
actions.

The update-csum-item action itself is pretty straight-forward, just read
out the data from specified mirror, then calculate a new checksum, and
update the corresponding csum item in csum tree.

Signed-off-by: Qu Wenruo <wqu@suse.com>
This option allows "btrfs rescue fix-data-checksum" to use the specified
mirror number to update checksum item for all corrupted mirrors.

If the specified number is larger than the max number of mirrors, the
real mirror number will be `num % (num_mirrors + 1)`.

Signed-off-by: Qu Wenruo <wqu@suse.com>
The parameter @fs_info is not utilized at all, and there are already
several call sites passing NULL as @fs_info.

And there is no counter-part in kernel (we use crypto_shash_* interface
instead), there is no need to keep the parameter list the same.

So just remove the unused parameter.

Signed-off-by: Qu Wenruo <wqu@suse.com>
There is a long history that we used extent_buffer helpers to modify a
superblock, but we're moving out of that behavior, and migrate to use
the on stack helpers to modify super blocks.

In the function make_btrfs() we use a dummy extent buffer just to
calculate the checksum for the super block.
We do not need such workaround, and can manually call btrfs_csum_data()
on @super, and pass @super directly to sbwrite(), completely avoid the
need of a dummy extent buffer.

Signed-off-by: Qu Wenruo <wqu@suse.com>
For mkfs and convert, we need to create a temporary fs without an
fs_info.

This makes debugging much harder that we can not use btrfs_print_leaf()
to print the temporary tree blocks.

There are only two things causing problems for btrfs_print_leaf() if
eb->fs_info is NULL:

- print_header_info()
  Which needs to grab the checksum type from eb->fs_info.

  This can be avoided by completely skipping checksum output if
  eb->fs_info is NULL.

- btrfs_leaf_free_space()
  Which have two BUG_ON()s checking eb->fs_info, and finally calling
  BTRFS_LEAF_DATA_SIZE().

  Which can be avoided by removing the two BUG_ON()s, and use
  __BTRFS_LEAF_DATA_SIZE(eb->len) to grab the same leaf data size.

  Thankfully all call sites inside mkfs and convert are setting eb->len
  to nodesize correctly.

- __btrfs_print_leaf()
  Which calls BTRFS_LEAF_DATA_SIZE(eb->fs_info).

  Can be avoided by the same method above.

With those changes, we can call btrfs_print_leaf() inside a debugger
for temporary fses created by mkfs and convert.

Signed-off-by: Qu Wenruo <wqu@suse.com>
@kdave kdave force-pushed the devel branch 6 times, most recently from ef43ce6 to 3eff852 Compare May 30, 2025 14:14
@kdave kdave added this to the v6.15 milestone May 30, 2025
@kdave
Copy link
Owner

kdave commented May 30, 2025

Merged to devel, thanks.

@kdave kdave closed this May 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants