Title: | Compare excel sheets |
---|---|
Description: | `xldiff` provides tools to compare excel sheets, broadly inspired by "diff"-type functions. Provided functions can read sheets of two excel files and produce a third file that highlights cells that have changed. In the case of numeric changes, the direction of change is highlighted. These tools do not account for structural changes in the sheets (e.g., the addition of a column), but are useful in tracking changed values in tables or parameter files. Utility functions developed to streamline formatting output files are also more broadly useful in programmatically formatting excel files using openxlsx. |
Authors: | Collin Edwards [aut, cre] |
Maintainer: | Collin Edwards <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2025-01-08 01:17:48 UTC |
Source: | https://github.com/cbedwards-dfw/xldiff |
When calling sheet_diff()
, creating a new workbook for the diff contents, and
then coloring to highlight changed cells, the original spreadsheet formatting is lost.
To facilitate interpretting the diff, it can be useful to recreate the major components of the original
formatting, especially cell borders. This function adds cell borders, and is designed for ease of use
when replicating formatting from the original excel file. Blocks of cells to give borders to can be
specified in the original excel format (e.g. "A1:D5"). For only outside borders around each block (default), use argument
every.cell = FALSE
. To add all the cell borders within each block to generate a grid appearance, set every.cell = TRUE
.
Note that non-border formatting of each cell will not be maintained, but border formatting will be overwritten. When adding
thin boundaries between inner cells and a thick outer border for a block of cells, first use add_cell_borders
to
with every.cell = TRUE
, and appropriate border arguments (usually border.thickness = "thin"
)
and then use again with every.cell = FALSE
and appropriate border arguments (usually border.thickness = "medium"
).
add_cell_borders( wb, sheet, block.ranges, sheet.start = "A1", every.cell = FALSE, border.col = "black", border.thickness = "medium" )
add_cell_borders( wb, sheet, block.ranges, sheet.start = "A1", every.cell = FALSE, border.col = "black", border.thickness = "medium" )
wb |
openxlsx workbook object |
sheet |
character corresponding to sheet name of openxlsx workbook object |
block.ranges |
One or more cell ranges specified in excel format (e.g. |
sheet.start |
Optional. If |
every.cell |
Do we want borders around each individual cell in each cell block ( |
border.col |
Color for border. See |
border.thickness |
Thickness for border. See |
openxlsx
worksheet based on changesHighlights cells that changed, coloring differently for increasing, decreasing, and non-numeric cells.
Typically used on a worksheet that contains the $sheet.diff
dataframe from the same sheet comparison as provided
in the cur.sheet
argument. In some cases it may be useful to define custom color schemes (e.g., if increasing numbers are good
and decreasing numbers are bad, you may want green and red foregrounds for those types of changes). Individual styles can be provided
with optional arguments see ?openxlsx::createStyle
for options in defining styles.
add_changed_formats( wb, cur.sheet, sheet.comp, rows.invert = NULL, cols.invert = NULL, df.invert = NULL, nofillStyle = NULL, changeStyle = NULL, posStyle = NULL, negStyle = NULL )
add_changed_formats( wb, cur.sheet, sheet.comp, rows.invert = NULL, cols.invert = NULL, df.invert = NULL, nofillStyle = NULL, changeStyle = NULL, posStyle = NULL, negStyle = NULL )
wb |
openxlsx workbook to make on which to make changes. |
cur.sheet |
sheet name in openxlsx workbook on which to make changes. |
sheet.comp |
list of comparison dataframes generated by |
rows.invert |
Optional vector of row numbers to invert color scheme for increase vs decrease. |
cols.invert |
Optional vector of column numbers to invert color scheme for increase vs decrease. |
df.invert |
Optional data frame with |
nofillStyle |
Optional openxlsx style object for cells with no changes flagged. Default has black text, white foreground. Create custom style with |
changeStyle |
As |
posStyle |
As |
negStyle |
As |
In some cases it may make sense to reverse the color scheme of numeric changes for individual rows, columns,
or cells (e.g., when scanning fishery model outputs, increasing fish escapement and decreasing fish exploitation rates logically should both show the same color.
Similarly, increasing profits and decreasing costs logically should both show the same color.). Optional arguments rows.invert
, cols.invert
, and
df.invert
allow you to specify individual regions of the dataframe to reverse the color scheme.
Translates from excel cell address to rows and columns
cell_range_translate(x, expand = TRUE, start = "A1")
cell_range_translate(x, expand = TRUE, start = "A1")
x |
Single string of individual cell or cell range (e.g. "D6" or "D6:AC8") |
expand |
If |
start |
Optional argument to account for offset when matching cells in an excel file to a dataframe when the dataframe was generated by reading the excel file starting at a location other than "A1". |
dataframe of addresses for each cell in the range, where $row
gives the row number and $col
gives the column number.
cell_range_translate("D6") cell_range_translate("A2:H3")
cell_range_translate("D6") cell_range_translate("A2:H3")
Apply style to worksheet based on one or more excel-style cell ranges
cells_stylize(wb, sheet, style, block.ranges, stack = TRUE)
cells_stylize(wb, sheet, style, block.ranges, stack = TRUE)
wb |
openxlsx workbook object |
sheet |
character corresponding to sheet name of openxlsx workbook object |
style |
|
block.ranges |
One or more cell ranges specified in excel format (e.g. |
stack |
Should style be appended to existing styles ( |
Compares a single sheet between two files, supports providing additional formatting
in the form of the optional extra_format_fun
argument. For more complex use cases (e.g.,
multiple sheet, pre-comparison formatting to compare only specific regions, etc) excel_diff
can
be used as a simple template for writing your own function.
excel_diff( file.1, file.2, results.name, sheet.name, extra_format_fun = NULL, ... )
excel_diff( file.1, file.2, results.name, sheet.name, extra_format_fun = NULL, ... )
file.1 |
Filename (including path) for first file to compare |
file.2 |
Filename (including path) for second file to compare |
results.name |
Name (including path) for file to save comparison to. Must end in ".xlsx" |
sheet.name |
character string of sheet to compare (must be present in both files) |
extra_format_fun |
Optional function to apply additional formatting, allowing users to specify additional
calls of |
... |
Additional arguments passed to |
## Not run: filename.1 = "Documents/WDFW FRAM team work/NOF material/NOF 2024/FRAM/Chin1124.xlsx" filename.2 = "Documents/WDFW FRAM team work/NOF material/NOF 2024/NOF 2/Chin2524.xlsx" excel_diff(file.1 = filename.1, file.2 = filename.2, results.name = "Documents/WDFW FRAM team work/NOF material/NOF 2024/test1.xlsx", sheet.name = "ER_ESC_Overview_New" ) ## create function to add in some additional formatting: extra_form_fun = function(wb, sheet){ ## add bold and increased size for the first two rows. openxlsx::addStyle(wb, sheet, style = openxlsx::createStyle(fontSize = 16, textDecoration = "Bold"), rows = 1:2, cols = 1:8, gridExpand = TRUE, stack = TRUE) ## add thin inner cell borders add_cell_borders(wb, sheet, block.ranges = c("B3:H34") ) ## add thick outer borders add_cell_borders(wb, sheet, block.ranges = c("A2", "B1:D2", "E1:H2", "A3:A34", "B3:D34", "E3:H34", "D36:H37"), border.thickness = "medium") } excel_diff(file.1 = filename.1, file.2 = filename.2, results.name = "Documents/WDFW FRAM team work/NOF material/NOF 2024/test2.xlsx", sheet.name = "ER_ESC_Overview_New", extra_format_fun = extra_form_fun ) ## End(Not run)
## Not run: filename.1 = "Documents/WDFW FRAM team work/NOF material/NOF 2024/FRAM/Chin1124.xlsx" filename.2 = "Documents/WDFW FRAM team work/NOF material/NOF 2024/NOF 2/Chin2524.xlsx" excel_diff(file.1 = filename.1, file.2 = filename.2, results.name = "Documents/WDFW FRAM team work/NOF material/NOF 2024/test1.xlsx", sheet.name = "ER_ESC_Overview_New" ) ## create function to add in some additional formatting: extra_form_fun = function(wb, sheet){ ## add bold and increased size for the first two rows. openxlsx::addStyle(wb, sheet, style = openxlsx::createStyle(fontSize = 16, textDecoration = "Bold"), rows = 1:2, cols = 1:8, gridExpand = TRUE, stack = TRUE) ## add thin inner cell borders add_cell_borders(wb, sheet, block.ranges = c("B3:H34") ) ## add thick outer borders add_cell_borders(wb, sheet, block.ranges = c("A2", "B1:D2", "E1:H2", "A3:A34", "B3:D34", "E3:H34", "D36:H37"), border.thickness = "medium") } excel_diff(file.1 = filename.1, file.2 = filename.2, results.name = "Documents/WDFW FRAM team work/NOF material/NOF 2024/test2.xlsx", sheet.name = "ER_ESC_Overview_New", extra_format_fun = extra_form_fun ) ## End(Not run)
When given two dataframes, uses sheet_comp to compare the two dataframes, then presents the rows that have changed: prints the row numbers to the console, and then returns the "diff" of those rows, with column names matching excel column naming conventions.
present_rows_changed( t1, t2, digits.signif = 4, trim.cols = TRUE, diff.only = TRUE )
present_rows_changed( t1, t2, digits.signif = 4, trim.cols = TRUE, diff.only = TRUE )
t1 |
First dataframe |
t2 |
Second dataframe, same dimensions as first. |
digits.signif |
When flagging changes, comparison is presented in character form. How many significant digits do we present for numerical entries? Numeric, defaults to 4. |
trim.cols |
Remove unchanged columns? Useful with wide dataframes when viewing results in the console. Defaults to TRUE. |
diff.only |
Show only the changed values? defaults to TRUE. |
A diff of the two dataframes, similar to $sheet.diff
part of the return of sheet_comp. Includes a row_number
column, and remaining columns have been labeled to match excel column naming conventions.
Primary funtion for xldiff
package. When cell values change between dataframe t1
and
dataframe t2
, the corresponding $sheetdiff
entry will show [the first value] -->
[the second value].
Note that because these changes are presenting as characters, changes in numbers with many digits can produce difficult-to-read cells.
The digits.signif
can be used to determine how many significant digits should be presented in the "arrow" cells.
sheet_comp(t1, t2, digits.signif = 4)
sheet_comp(t1, t2, digits.signif = 4)
t1 |
First dataframe |
t2 |
Second dataframe, same dimensions as first. |
digits.signif |
When flagging changes, comparison is presented in character form. How many significant digits do we present for numerical entries? Numeric, defaults to 4. |
List of comparison data frames, including logical matrices used in formatting cells to highlight changes.
$sheet.diff
: cell entries for comparison
$mat.changed
logical matrix where TRUE
corresponds to a cell that changed
$mat.diff.decrease
: logical matrix where TRUE
corresponds to a cell of numeric values that decreased
= mat.diff.increase
: as above, but for increases.
## Not run: ## using palmerpenguins data to simulate spreadsheets library(palmerpenguins) t1 = t2 = head(penguins) ## change island variable to characters for easier modification t2$island = t1$island = as.character(t1$island) ## change several entries in the second version t2$island[3] = "Scotland" t2$flipper_length_mm[1] = 18 sheet_comp(t1, t2, digits.signif = 4) ## End(Not run)
## Not run: ## using palmerpenguins data to simulate spreadsheets library(palmerpenguins) t1 = t2 = head(penguins) ## change island variable to characters for easier modification t2$island = t1$island = as.character(t1$island) ## change several entries in the second version t2$island[3] = "Scotland" t2$flipper_length_mm[1] = 18 sheet_comp(t1, t2, digits.signif = 4) ## End(Not run)