3. Summaries

Introduction

Annual variables are useful statistics calculated for the results of hydrograph separation. While separation is performed on a daily basis, the summaries calculated by grwat are annual. Examples are maximum spring flood runoff, mean annual groundwater (“baseflow”) runoff, thaw flood runoff volume (w/o groundwater) or the number of rain flood days. Users proficient in R can calculate such variables simply by summarizing the results of gr_separate() function. However, summarizing functionality is implemented in the package for a more streamlined analysis. The benefits are that the standardized variables can be supplemented by additional functions that perform their statistical testing and plotting.

First of all, we need to separate the hydrograph:

library(grwat)
data(spas) # example Spas-Zagorye data is included with grwat package
head(spas) # separate
#> # A tibble: 6 × 4
#>   Date           Q   Temp  Prec
#>   <date>     <dbl>  <dbl> <dbl>
#> 1 1956-01-01  5.18  -6.46 0.453
#> 2 1956-01-02  5.18 -11.4  0.825
#> 3 1956-01-03  5.44 -10.7  0.26 
#> 4 1956-01-04  5.44  -8.05 0.397
#> 5 1956-01-05  5.44 -11.7  0.102
#> 6 1956-01-06  5.58 -20.1  0.032
sep = gr_separate(spas, params = gr_get_params(reg = 'center'))

For more information on advanced separation, read the Advanced separation vignette.

After that we can use gr_summarize() to summarize the result:

# summarize
vars = gr_summarize(sep)
#> Warning: There were 2 warnings in `dplyr::summarise()`.
#> The first warning was:
#> ℹ In argument: `Dspstart = min(.data$Date[which(.data$Qspri > 0)])`.
#> ℹ In group 19: `Year1 = 1974`.
#> Caused by warning in `min.default()`:
#> ! no non-missing arguments to min; returning Inf
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
#> Warning: There was 1 warning in `dplyr::summarise()`.
#> ℹ In argument: `Q30w = condrollmean(.data$Q, .data$Season == 2, 30)`.
#> ℹ In group 64: `Year = 2020`.
#> Caused by warning in `min()`:
#> ! no non-missing arguments to min; returning Inf

head(vars)
#> # A tibble: 6 × 57
#>    Year Year1 Year2 Dspstart   Dspend       Tsp    Qy Qspmax Dspmax      Qygr
#>   <dbl> <dbl> <dbl> <date>     <date>     <int> <dbl>  <dbl> <date>     <dbl>
#> 1  1956  1956  1957 1956-04-08 1956-05-05    27  18.4    467 1956-04-22  8.59
#> 2  1957  1957  1958 1957-03-25 1957-05-04    40  20.2    460 1957-04-08  9.77
#> 3  1958  1958  1959 1958-04-02 1958-05-13    41  27.3    537 1958-04-21 10.2 
#> 4  1959  1959  1960 1959-03-28 1959-04-28    31  27.1    406 1959-04-16 10.9 
#> 5  1960  1960  1961 1960-03-27 1960-04-27    31  29.6    406 1960-04-15 12.3 
#> 6  1961  1961  1962 1961-03-07 1961-05-02    56  18.8    296 1961-04-10 10.8 
#> # ℹ 47 more variables: Qsmin <dbl>, Dsmin <date>, Qwmin <dbl>, Dwmin <date>,
#> #   Q30s <dbl>, D30s1 <date>, D30s2 <date>, Q30w <dbl>, D30w1 <date>,
#> #   D30w2 <date>, Q10s <dbl>, D10s1 <date>, D10s2 <date>, Q10w <dbl>,
#> #   D10w1 <date>, D10w2 <date>, Q5s <dbl>, D5s1 <date>, D5s2 <date>, Q5w <dbl>,
#> #   D5w1 <date>, D5w2 <date>, Wy <dbl>, Wygr <dbl>, Wsp <dbl>, Wspgr <dbl>,
#> #   Wsprngr <dbl>, Wrn <dbl>, Wrngr <dbl>, Wth <dbl>, Wthgr <dbl>, Wgrs <dbl>,
#> #   Ws <dbl>, Wgrw <dbl>, Ww <dbl>, Qrnmax <dbl>, Qthmax <dbl>, …

Resulting data frame contains more than 50 variables that can be used in the analysis of interannual changes. To get familiar with the variables and their names, just use the gr_help_vars() function:

gr_help_vars()
#> # A tibble: 57 × 21
#>       ID Position Width Source Name_old    Name   Units Unitsua Unitsen Readtype
#>    <dbl>    <dbl> <dbl>  <dbl> <chr>       <chr>  <chr> <chr>   <chr>   <chr>   
#>  1     1        1     7      1 year_number Number <NA>  <NA>    <NA>    integer 
#>  2     2        8    10      1 Year1       Year1  Год   Рок     Year    integer 
#>  3     3       18    10      1 Year2       Year2  Год   Рок     Year    integer 
#>  4     4       28    15      1 datestart   Dspst… Дата  Дата    Date    Date    
#>  5     5       43    15      1 datepolend  Dspend Дата  Дата    Date    Date    
#>  6    57        0     0      0 PolProd     Tsp    Дней  Дней    Days    integer 
#>  7     6       58    10      1 Qy          Qy     м^3/с м^3/с   m^3/s   double  
#>  8     7       68    10      1 Qmax        Qspmax м^3/с м^3/с   m^3/s   double  
#>  9     8       78    15      1 datemax     Dspmax Дата  Дата    Date    Date    
#> 10     9       93    10      1 Qygr        Qygr   м^3/с м^3/с   m^3/s   double  
#> # ℹ 47 more rows
#> # ℹ 11 more variables: Type <chr>, Test <dbl>, Desc <chr>, Descua <chr>,
#> #   Descen <chr>, Group <chr>, Winter <dbl>, Chart <chr>, Color <chr>,
#> #   Order <dbl>, Range <chr>

Statistical tests

Variable testing by gr_test_vars() can be used to estimate the statistical significance of the interannual changes. Pettitt test is performed to detect the change year — i.e. the year which divides the time series into the statistically most differing samples. Student (Welch) and Fisher tests are used to estimate the significance of mean and variance differences of these samples. Theil-Sen test calculates the trend slope value. Mann-Kendall test is performed to reveal the significance of the trend. gr_test_vars() returns the list of tests for each variable:

# test all variables
tests = gr_test_vars(vars)

# view Pettitt test for Qygr
tests$ptt$Qygr
#> 
#>  Pettitt's test for single change-point detection
#> 
#> data:  vl[vl_cmp]
#> U* = 865, p-value = 9.505e-08
#> alternative hypothesis: two.sided
#> sample estimates:
#> probable change point at time K 
#>                              23

# view Fisher test for Q30s
tests$ft$Q30s
#> 
#>  F test to compare two variances
#> 
#> data:  d1 and d2
#> F = 0.53023, num df = 21, denom df = 41, p-value = 0.1208
#> alternative hypothesis: true ratio of variances is not equal to 1
#> 95 percent confidence interval:
#>  0.2597352 1.1887012
#> sample estimates:
#> ratio of variances 
#>          0.5302307

# test only Qygr and Q30s using 1978 as fixed year and excluding 1988-1991 yrs
gr_test_vars(vars, Qygr, year = 1979, exclude = 1988:1991)
#> $ptt
#> $ptt$Qygr
#> 
#>  Pettitt's test for single change-point detection
#> 
#> data:  vl[vl_cmp]
#> U* = 773, p-value = 1.625e-07
#> alternative hypothesis: two.sided
#> sample estimates:
#> probable change point at time K 
#>                              23 
#> 
#> 
#> 
#> $mkt
#> $mkt$Qygr
#> 
#>  Mann-Kendall trend test
#> 
#> data:  vl[vl_cmp]
#> z = 4.6112, n = 60, p-value = 4.003e-06
#> alternative hypothesis: true S is not equal to 0
#> sample estimates:
#>            S         varS          tau 
#> 7.240000e+02 2.458333e+04 4.090395e-01 
#> 
#> 
#> 
#> $tst
#> $tst$Qygr
#> 
#>  Sen's slope
#> 
#> data:  df.theil[[2]][fltr]
#> z = 4.6112, n = 60, p-value = 4.003e-06
#> alternative hypothesis: true z is not equal to 0
#> 95 percent confidence interval:
#>  0.06134963 0.13908704
#> sample estimates:
#> Sen's slope 
#>  0.09690887 
#> 
#> 
#> 
#> $ts_fit
#> $ts_fit$Qygr
#> 
#> Call:
#> mblm::mblm(formula = eval(frml), dataframe = df.theil[fltr, ], 
#>     repeated = FALSE)
#> 
#> Coefficients:
#> (Intercept)        Year1  
#>  -157.23729      0.08532  
#> 
#> 
#> 
#> $tt
#> $tt$Qygr
#> 
#>  Welch Two Sample t-test
#> 
#> data:  d1 and d2
#> t = -8.4975, df = 48.174, p-value = 3.829e-11
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -5.551840 -3.427414
#> sample estimates:
#> mean of x mean of y 
#>  9.296712 13.786339 
#> 
#> 
#> 
#> $ft
#> $ft$Qygr
#> 
#>  F test to compare two variances
#> 
#> data:  d1 and d2
#> F = 0.79661, num df = 21, denom df = 37, p-value = 0.5885
#> alternative hypothesis: true ratio of variances is not equal to 1
#> 95 percent confidence interval:
#>  0.3827446 1.8017280
#> sample estimates:
#> ratio of variances 
#>          0.7966132 
#> 
#> 
#> 
#> $year
#> Qygr 
#> 1979 
#> 
#> $maxval
#> $maxval$Qygr
#> [1] 20.05682
#> 
#> 
#> $fixed_year
#> [1] TRUE
#> 
#> $pvalues
#>   N                                    Variable Change.Year   Trend      M1
#> 1 1 Mean annual groundwater ("baseflow") runoff        1979 0.08532 9.29671
#>         M2 MeanRatio     sd1     sd2 sdRatio Mann.Kendall Pettitt Student
#> 1 13.78634      48.3 1.88588 2.11296      12            0       0       0
#>    Fisher
#> 1 0.58851

The tests can be represented in a clear tabular form for visual analysis:

gr_kable_tests(tests)
p-values of statistical criteria
N Variable Change.Year Trend M1 M2 MeanRatio sd1 sd2 sdRatio Mann.Kendall Pettitt Student Fisher
1 Annual runoff volume 1978 2e-05 0.06173 0.07133 15.6 0.01713 0.02185 27.6 0.89399 0.41581 0.06124 0.2421
2 Spring flood runoff volume (w/o groundwater) 1986 -0.00026 0.02293 0.01518 -33.8 0.01204 0.01079 -10.4 9e-05 0.00596 0.00942 0.53749
3 Annual groundwater runoff volume 1979 3e-04 0.02919 0.04465 53 0.00583 0.00919 57.6 2e-05 0 0 0.02785
4 Spring flood runoff volume (with groundwater) 1988 -0.00025 0.03009 0.02089 -30.6 0.01439 0.01210 -15.9 0.00072 0.01499 0.00778 0.33759
5 Rain flood runoff volume (w/o groundwater) 1975 0 0.00571 0.00876 53.4 0.00685 0.00860 25.5 0.97689 0.66529 0.14543 0.30891
6 Rain flood runoff volume (with groundwater) 1977 4e-05 0.01413 0.02375 68.1 0.01045 0.01587 51.9 0.52771 0.09611 0.00572 0.05143
7 Thaw flood runoff volume (w/o groundwater) 1983 -1e-05 0.00254 0.00204 -19.7 0.00305 0.00252 -17.4 0.388 0.44105 0.4967 0.28481
8 Thaw flood runoff volume (with groundwater) 2010 0 0.00748 0.00902 20.6 0.00493 0.00592 20.1 0.90316 1 0.43481 0.37699
9 Winter groundwater runoff volume 1978 9e-05 0.00742 0.01185 59.7 0.00238 0.00347 45.8 0.00023 1e-05 0 0.07134
10 Winter low flow runoff volume 1978 7e-05 0.00953 0.01346 41.2 0.00492 0.00504 2.4 0.01955 0.00126 0.0049 0.93921
11 Summer groundwater runoff volume 1979 0.00019 0.01474 0.02493 69.1 0.00451 0.00705 56.3 2e-05 0 0 0.03089
12 Summer low flow runoff volume 1977 0.00022 0.01917 0.03258 70 0.01064 0.01473 38.4 0.0162 0.00184 0.00014 0.12665
13 Mean annual runoff 1978 0.00067 19.69591 22.67872 15.1 5.69050 6.74490 18.5 0.98613 0.47808 0.0707 0.41637
14 First date of a spring flood 1970 -0.18275 28-Mar 17-Mar -11 8.00000 15.00000 87.5 0.05349 0.08267 0.00232 0.02546
15 Mean annual groundwater (“baseflow”) runoff 1979 0.0883 9.29671 14.17896 52.5 1.88588 2.49706 32.4 1e-05 0 0 0.16895
16 Last date of a spring flood 1970 -0.25 05-May 22-Apr -13 6.00000 16.00000 166.7 0.0031 0.00569 1e-05 0.00043
17 Duration of a spring flood 1989 -0.125 38.0625 34.46875 -9.4 7.33061 12.21048 66.6 0.02915 0.03222 0.15958 0.00577
18 Maximum spring flood runoff 1970 -4 419.78571 241.83 -42.4 162.94462 146.00771 -10.4 8e-05 0.01088 0.00152 0.55597
19 Date of a maximum spring flood runoff 1970 -0.08333 14-Apr 14-Apr 0 5.00000 40.00000 700 0.24607 0.19331 0.93 0
20 Spring flood runoff volume (with groundwater and rain) 1986 -0.00028 0.03074 0.0226 -26.5 0.01391 0.01403 0.9 0.00076 0.01377 0.02367 0.97245
21 Minimum daily winter runoff 1979 0.1206 4.05591 9.42833 132.5 1.93119 2.03176 5.2 0 0 0 0.8237
22 Date of minimum daily winter runoff 1965 -0.19677 16-Feb 19-Jan -28 43.00000 35.00000 -18.6 0.41054 0.36833 0.10043 0.32859
23 Minimum daily summer runoff 1979 0.05578 4.49091 8.35571 86.1 2.19461 1.80075 -17.9 0.00205 0 0 0.27404
24 Date of minimum daily summer runoff 2000 0.33333 23-Jul 15-Aug 23 34.00000 31.00000 -8.8 0.1962 0.05942 0.01235 0.74221
25 Minimum 30-day averaged winter runoff 1979 0.1091 6.00123 11.19813 86.6 2.15918 2.88271 33.5 NA 0 0 0.15792
26 First date of minimum 30-day averaged winter runoff 1987 -0.14545 04-Jan 27-Dec 358 21.00000 25.00000 19 0.39944 0.74118 0.1862 0.44023
27 Minimum 30-day averaged summer runoff 1979 0.05577 6.23477 9.70019 55.6 1.69308 2.32512 37.3 0.00218 1e-05 0 0.12077
28 First date of minimum 30-day averaged summer runoff 2000 0.15789 13-Jul 02-Aug 20 29.00000 28.00000 -3.4 0.42377 0.09454 0.01189 0.98217
29 Minimum 10-day averaged winter runoff 1979 0.10384 5.12577 10.06676 96.4 1.60347 2.25540 40.7 0 0 0 0.09575
30 First date of minimum 10-day averaged winter runoff 1987 -0.2892 26-Jan 11-Jan -15 32.00000 36.00000 12.5 0.25361 0.406 0.0805 0.54115
31 Minimum 10-day averaged summer runoff 1981 0.05427 5.89658 8.8558 50.2 1.55969 1.87498 20.2 0.00063 0 0 0.35115
32 First date of minimum 10-day averaged summer runoff 2000 0.33333 21-Jul 11-Aug 21 27.00000 31.00000 14.8 0.11213 0.03414 0.01318 0.45786
33 Minimum 5-day averaged winter runoff 1979 0.10721 4.77955 9.83214 105.7 1.63118 2.19704 34.7 0 0 0 0.14478
34 First date of minimum 5-day averaged winter runoff 1987 -0.14609 28-Jan 16-Jan -12 36.00000 36.00000 0 0.50514 0.57006 0.16897 0.95117
35 Minimum 5-day averaged summer runoff 1981 0.05467 5.76225 8.68705 50.8 1.51068 1.83279 21.3 0.00038 0 0 0.3274
36 First date of minimum 5-day averaged summer runoff 2000 0.33333 25-Jul 15-Aug 21 27.00000 30.00000 11.1 0.10966 0.03414 0.01299 0.4868
37 Maximum thaw flood runoff 1984 -0.02084 26.94527 24.67525 -8.4 35.56889 35.16949 -1.1 0.67234 0.96366 0.80093 0.93525
38 Date of a maximum thaw flood runoff 1994 0.67308 20-Dec 20-Jan -335 36.00000 39.00000 8.3 0.00768 0.00441 0.00269 0.61063
39 Number of thaw flood events 1995 -0.13043 14.83333 11.19231 -24.5 4.63835 7.20566 55.3 0.00085 0.00509 0.02936 0.01639
40 Number of thaw flood days 1984 -0.70711 79 46.81081 -40.7 20.74015 22.76186 9.7 4e-05 0 0 0.62808
41 Maximum rain flood runoff 1977 0.10523 54.75139 72.47261 32.4 56.94260 56.22200 -1.3 0.68081 0.52857 0.25418 0.90809
42 Date of a maximum rain flood runoff 2002 0.12973 12-Jun 28-Jun 16 67.00000 69.00000 3 0.74116 0.60713 0.4232 0.84757
43 Number of rain flood events 1977 -0.09091 20.05 14.86364 -25.9 4.29780 3.92719 -8.6 0.004 0.00097 6e-05 0.60685
44 Number of rain flood days 1991 -0.13488 112.05882 99.03333 -11.6 29.43091 38.14626 29.6 0.64293 0.41088 0.13573 0.15061
45 Relative variation of runoff during winter low flow 1983 -0.00288 0.44193 0.27131 -38.6 0.31443 0.23726 -24.5 0.02144 0.00913 0.0235 0.11704
46 Duration of winter low flow 1984 -0.5 130.44444 115.05405 -11.8 18.08810 27.01743 49.4 0.00079 0.01883 0.00831 0.0361
47 Relative variation of runoff during summer-autumn low flow 1970 -0.00068 0.56005 0.6225 11.2 0.43161 0.28886 -33.1 0.74122 0.71943 0.61664 0.04352
48 Duration of a summer-autumn low flow 1999 0.5 198.61905 216.27273 8.9 20.61488 23.07582 11.9 0.00034 0.01042 0.00454 0.52416

Interannual plots

Summarized variables can be ploted and statistically tested to reveal interannual changes. The basic function for plotting is gr_plot_vars(). Just pass the names of the variables that you want to plot, and the desired plotting layout. Different background fill colors are used to differentiate seasons:

# plot one selected variable
gr_plot_vars(vars, Qygr) 
#> Warning: Removed 1 row containing non-finite outside the scale range
#> (`stat_smooth()`).


# plot one selected variable
gr_plot_vars(vars, Dspstart) 
#> Warning: Removed 1 row containing non-finite outside the scale range
#> (`stat_smooth()`).
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_point()`).


# plot two variables sequentially
gr_plot_vars(vars, D10w1, Wsprngr)
#> Warning: Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
#> Removed 1 row containing missing values or values outside the scale range
#> (`geom_point()`).

#> Warning: Removed 1 row containing non-finite outside the scale range
#> (`stat_smooth()`).


# plot four variables in matrix layout
gr_plot_vars(vars, Qspmax, Qygr, D10w1, Wsprngr,
             layout = matrix(1:4, nrow = 2, byrow = TRUE)) 
#> Warning: Removed 1 row containing non-finite outside the scale range
#> (`stat_smooth()`).
#> Warning: Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
#> Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_point()`).
#> Warning: Removed 1 row containing non-finite outside the scale range
#> (`stat_smooth()`).

Tests can be calculated on the fly during the variable plotting or passed into tests = parameters. In this case the plots are supplemented with trend line and change year:

# add tests calculated on the fly (only plotted variables are tested)
gr_plot_vars(vars, Qspmax, Qygr, D10w1, Wsprngr,
             layout = matrix(1:4, nrow = 2, byrow = TRUE),
             tests = TRUE) 
#> Warning: Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
#> Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
#> Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_point()`).
#> Warning: Removed 1 row containing non-finite outside the scale range
#> (`stat_smooth()`).


# calculate tests beforehand
tests = gr_test_vars(vars)
gr_plot_vars(vars, D10w1, Wsprngr, Nthw, Qrnmax,
             layout = matrix(1:4, nrow = 2, byrow = TRUE),
             tests = tests) 
#> Warning: Removed 1 row containing non-finite outside the scale range
#> (`stat_smooth()`).
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_point()`).
#> Warning: Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
#> Removed 1 row containing non-finite outside the scale range (`stat_smooth()`).
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_rect()`).
#> Warning: Removed 1 row containing non-finite outside the scale range
#> (`stat_smooth()`).

Period plots

Period plots are boxplots produced by gr_plot_periods() for hydrograph separation variables. The data for each variable is divided into two sample periods: before and after the change year either set by year parameter or extracted from tests (statistically estimated). Different background fill colors are used to differentiate seasons:

# plot periods with fixed change year
gr_plot_periods(vars, Qygr, year = 1979)
#> Warning: Removed 1 row containing non-finite outside the scale range
#> (`stat_boxplot()`).


# plot periods with change year from Pettitt test
gr_plot_periods(vars, Qygr, tests = TRUE)
#> Warning: Removed 1 row containing non-finite outside the scale range
#> (`stat_boxplot()`).


# calculate test beforehand
gr_plot_periods(vars, Qspmax, tests = tests)
#> Warning: Removed 1 row containing non-finite outside the scale range
#> (`stat_boxplot()`).


# use matrix layout to plot multiple variables
gr_plot_periods(vars, Qygr, Qspmax, D10w1, Wsprngr,
                layout = matrix(1:4, nrow = 2),
                tests = tests)
#> Warning: Removed 1 row containing non-finite outside the scale range (`stat_boxplot()`).
#> Removed 1 row containing non-finite outside the scale range (`stat_boxplot()`).
#> Removed 1 row containing non-finite outside the scale range (`stat_boxplot()`).
#> Removed 1 row containing non-finite outside the scale range (`stat_boxplot()`).

Minimum runoff month plots

A histogram of a minimum runoff month for two periods: before and after the change year set by year parameter. This kind of plot is produced by gr_plot_minmonth():

# plot minimum runoff month for two periods divided by Pettitt test
gr_plot_minmonth(vars, tests = gr_test_vars(vars))


# plot minimum runoff month for two periods divided by fixed year
gr_plot_minmonth(vars, year = 1979)

Test plots

Test plots produced by gr_plot_tests() are intended to facilitate the visual analysis of statistical tests. Currently only the change year density is available as a plotting variable:

# plot change year from Pettitt test
gr_plot_tests(tests, type = 'year')

As can be seen from this plot, most of the variables demonstrate the change in their statistical behavior around 1979 year.