Table Of Contents

Previous topic

8. Chi analysis, part 1: getting the channel profiles

Next topic

10. Scripting and Supercomputing for Topographic Analysis

This Page

9. Chi profile analysis, part 2: constraining m/n and transforming profiles

This is part 2 of the chi profile analysis documentation. Part 1 can be found here: Chi analysis, part 1: getting the channel profiles

This section of the documentation assumes you have created a .chan file. The format of the channel file is described here Format of the .chan file. These `channel` files contain information about flow distance, elevation and other properties of a channel network that are used to create a profile transformed into so-called `chi` (\chi) space. The transformation involves integrating drainage area along the length of the channel, so that comparisons of channel steepness can be made for chanels of different drainage area.

The analysis is similar to slope-area analysis but has some advantages; the primary advantage of the method is that it does not take the gradient of noisy topographic data, and so is less noisy than slope-area analysis.

The main disadvantage is that the method assumes channel profiles are well described by predictions of the stream power erosion law. The various advantages and disadvantages of the method are described by Perron and Royden, 2013 ESPL.

9.1. Steps involved to perform channel analysis

After preparing the data (Chi analysis, part 1: getting the channel profiles), performing the channel analysis involves 3 steps, including visualization of the data.

  1. Performing a statistical analysis to constrain the best fit m/n ratio
  2. Extracting the transformed chi-elevation profiles
  3. Visualising the analysis

9.2. Performing a statistical analysis to constrain the best fit m/n ratio

Once the profile data has been converted into a .chan file, the data can then be processed to determine the most likely m/n ratio for individual channels and also via the collinearity test (see Mudd et al (2014)).

9.2.1. Compiling the code

The code for running the statistical analysis to find the most likely m/n ratio can be compiled by caling the makefile chi_m_over_n_analysis.make.

If you are using a windows machine and have installed Cygwin you need to ensure that you have installed the make utility.

To make the file navigate the folder that contains it and run:

make -f chi_m_over_n_analysis.make

This will create the program chi_m_over_n_analysis.exe.

9.2.2. Running the code

The program ‘chi_m_over_n_analysis.exe` is run with 2 arguments to the command line.

  • The first argument is the path name of the path where the .chan file is located, along with a driver file that contains the parameters of the analysis. All data will be printed to files in this path.
  • The second argument is the name of the driver file. We typically use a .driver extension for the driver file but this is not a requirement.

For example, we call the program with:

./chi_m_over_n_analysis.exe /home/smudd/topographic_tools/test_suites/Mandakini/ chi_parameters.driver

The ./ leading chi_m_over_n_analysis.exe is only necessary on a linux system. The driver file contains a number of parameters for running the analysis. This file is used on several different programs so not all parameters are used by chi_m_over_n_analysis.exe. The parameters must be listed in the correct order and there cannot be any extra information between parameters (e.g., a string describing them). The parameters are:

  • 1st row: the prefix of the channel file, that is, the original name of the DEM. If your .chan file is called mandakini_ChanNet_21729.chan, then the first row of the driver file should be mandakini.

  • 2nd row: Not used by chi_m_over_n_analysis.exe.

  • 3rd row: Not used by chi_m_over_n_analysis.exe.

  • 4th row: This is the junction number that was specified when the .chan file was created. So if your .chan file is called mandakini_ChanNet_21729.chan, then the fourth row should be 21729.

  • 5th row: Not used by chi_m_over_n_analysis.exe.

  • 6th row: A reference drainage area for integrating to chi space (A:sub:0)

  • 7th row: The minimum number of pixels in a segment.

    See Mudd et al (2014) for guidance. Values between 10-20 are recommended. The computational time required is a highly nonlinear inverse function of this parameter. 20 might lead to a run lasting a few minutes, whereas 5 might take many hours (or even days). We recommend starting with a value of 14.

  • 8th row: The standard deviation of error ( \sigma in the Mudd et al (2014) paper) on the DEM, and also error from geomorphic noise (e.g., boulders in the channel).

    For SRTM this should be something like 10-30 m. The larger this number, the fewer segments you will get (see Mudd et al (2014)).

  • 9th row: The starting m/n value to test the likelihood of.

  • 10th row: The change in m/n value that you want to loop over.

    For example, suppose the starting m/n is 0.2 and the change in m/n is 0.05, then the next m/n value after 0.2 is 0.25.

  • 11th row: The number of m/n values you want to loop through.

  • 12th row: The `target nodes`, which is the maximum number of nodes you want to run through the partitioning algorithm at a time.

    Recommended values are 80-140. The computational time is nonlinearly related to this parameter, 80 might take several minutes whereas 140 will take many hours, and 200 will take months.

  • 13th row: The number of iteration on the Monte Carlo routine that finds the

    statistics for every node in the channel network rather than a subset of nodes.

  • 14th row: Not used chi_m_over_n_analysis.exe.

  • 15th row: The vertical drop over which slope-area analysis will be performed.

  • 16th row: The horizontal interval over which slope area analysis will be performed

  • 17th row: The maximum change in drainage area of an interval for slope-area

    analysis as a fraction of the area at the midpoint of an interval.

  • 18th row: The `target skip`, which is the average number of nodes the routine skips when it is trying to compute the best segments.

    If the DEM is 90m resolution, for example, the resolution the algorithm will work at is ~300 meters. Here is an example file:

And here is the cheat sheet (also included in driver_cheat_sheet.txt):

e_bathdem <- filename prefix
0.0001    <- minimum slope, don't change
300       <- N contrib pixels for a channel. Could reduce to, say 100 or even 50.
1332      <- junction number, this will change
0.05      <- area frac for channel pruning. 1= mainstem only, low numbers= more tribs
1000      <- A_0 for chi analysis: probably don't need to change.
20        <- minimum segment length. Should be between 5-20.
20        <- sigma: some estimate of uncertainty in elevation data. Smaller = more segments
0.15      <- starting m/n for best for m/n testing
0.025     <- increment of m/n for best for m/n testing
20        <- number of m/n values tested for m/n testing
90        <- target length of nodes to be analysed for segments. Should be between 80-150
250       <- number of iterations for Monte Carlo analysis. 250 seems okay
0.95      <- Not used anymore!
20        <- Vertical interval for sampling for S-A analysis. Should be scaled to DEM resolution
500       <- Horizontal interval for sampling for S-A analysis. Should be scaled to DEM resolution
0.2       <- An area thinning fraction for S-A analysis. 0.2 is probably about right.
2         <- The mean number of data nodes you skip for each node of segment analysis. For LiDAR this can be 10 or more. Nextmap can be 2-10. SRTM 0-2.

Once this program has run, it will print out a file with the filename prefix and an extension of .movern.

NOTE: This program is computationally expensive. Increasing the target length of nodes to be analysed and reducing the minimum segment length increases the computational time required in a highly nonlinear fashion. Increasing the skip value can reduce computational time required.

You can expect the computation to take several minutes (e.g. minimum segment length ~20, target nodes ~100, skip set so mainstem has 300-500 nodes analysed) to many hours (e.g. minimum segment length of 5, target nodes of 120-140, skip set such that thousands of nodes are analysed).

9.2.3. The ‘movern’ file

The .movern file is produced by the statistical analysis of the channel network in order to find the most likely m/n ratio. The filename contains information about parameter values; these are parsed by the visualisation algorithms. The format of the filename is the filename prefix, followed by _BFmovern_, followed by the sigma values, the skip value, the minimum segment length value the target nodes value and the junction number, all separated by the underscore symbol (_). The file then has the extension .movern. For example, the filename:

mandakini_BFmovern_20_2_20_120_51.movern

Indicates that

  • \sigma = 20
  • skip = 2
  • minimum segment length = 20
  • target nodes = 120
  • and the junction number being analysed is 51.
The format of the file is:
  • 1st row: In the first column of the first row there is a placeholder value, -99, followed by the m/n ratios tested each followed by a space.
  • 2nd row: In the first column is a placeholder value, -99, followed by the mean AICc (from n_iterations iterations) for each tested m/n ratio for the collinearity test. These are separated by spaces.
  • 3rd row: In the first column is a place holder value of -99, followed by the standard deviation of the AICc for the collinearity test. When fits are extremely poor, the likelihood approaches zero. Calculating the AICc involves taking the logarithm of the likelihood, to avoid this, the code assigns a very small number to 0 likelihoods. This results in a high, but not infinite, value of AICc. These poor fits will have a standard deviation of zero.
  • Even rows thereafter: The first column is the channel number. The following columns are the mean AICc values for that channel.
  • Odd rows thereafter: The first column is the channel number. The following columns are the standard deviations of the AICc values for that channel.

Here is an example file:

-99 0.15 0.175
-99 4008 4008
-99 0 0
0 2004 2004
0 0 0
1 2004 2004
1 0 0
2 2004 2004
2 0 0
3 1766.89 1788.39
3 608.033 583.88
4 1905.04 1973.54
4 422.523 238.852
5 2004 2004
5 0 0
6 1975.36 1882.18
6 224.595 450.995

9.2.4. Performing a sensitivity analysis on the best fit m/n ratio

For structurally or tectonically complex landscapes, it can be difficult to constrain the m/n ratio. In such cases, it is wise to perform a sensitivity analysis of the best fit m/n ratio. To facilitate this, we provide a python script, movern_sensitivity_driver_generation.py, that generates a number of driver files with the parameters minimum_segment_length, sigma, mean_skip and target_nodes that vary systematically.

To run this script you will need to change the data directory and the filename of the original driver file within the script.

You will need to modify the script before you run it. On lines 41 and 43 you need to modify the data directory and driver name of your files:

# set the directory and filename
DataDirectory =  "/home/smudd/topographic_tools/test_suites/Mandakini/"

DriverFileName = "chi_parameters.driver"

If you are running this file in Spyder from a windows machine, the path name will have slightly different formatting:

DataDirectory =  "m:\\topographic_tools\\test_suites\\Mandakini\\"

Note that if you run from the command line you will need to navigate to the folder that contains the script.

The driver files will be numbered (e.g., my_driver.1.driver, my_driver.2.driver, etc.):

smudd@burn Mandakini $ ls *.driver
chi_parameters.1.driver
chi_parameters.2.driver
chi_parameters.driver

You can run these with:

./chi_m_over_n_analysis.exe /home/smudd/topographic_tools/test_suites/Mandakini/ my_driver.1.driver

Or if you want to run them with no hangup and nice:

nohup nice ./chi_m_over_n_analysis.exe /home/smudd/topographic_tools/test_suites/Mandakini/ my_driver.1.driver &

And then just keep running them in succession until you use up all of your CPUs (luckily at Edinburgh we have quite a few)!

We have also written an additional python script called Run_drivers_for_mn.py which simply looks for all the drivers in folder and sends them to your server. Once again, you’ll need to modify this python script before you run it in order to give the script the correct data directory. In this case the relevant line is line 18:

DataDirectory =  "/home/smudd/topographic_tools/test_suites/Mandakini/"

You can run it from command line with:

python Run_drivers_for_mn.py

Again, you’ll need to be in the directory holding this file to run it (but it doesn’t have to be in the same directory as the data).

Warning: this will send all drivers to your servers to be run, so if you generated 1000 drivers, then 1000 jobs will be sent to the server. Use with caution!

9.3. Extracting the transformed chi-elevation profiles

The next stage of the analysis is to extract the chi (\chi) profiles. To compile the program to extract the chi profiles, you need to use the makefile chi_get_profiles.make. The program is compiled with:

make -f  chi_get_profiles.make

The makefile compiles a program called chi_get_profiles.exe. This is run, like chi_m_over_n_analysis.exe , with two arguments: the path name and the driver name:

./chi_get_profiles.exe /home/smudd/topographic_tools/test_suites/Mandakini/ chi_parameters.driver

The driver file has exactly the same format as the driver file for chi_m_over_n_analysis.exe. A chi profile will be produced for each m/n value outlined by these elements in the driver file:

  • 7th row: The starting (lowest) m/n value to test if it is the most likely.
  • 8th row: The change in m/n value that you want to loop over (suppose the starting m/n is 0.2 and the change in m/n is 0.05, then the next m/n value after 0.2 is 0.25).
  • 9th row: The number of m/n values you want to loop through.

Users may wish to modify these values in the driver file from the original values to explore only those values which have `plausible` values of the m/n ratio (see Mudd et al (2014) for guidance).

For each m/n ratio tested, the code produces a file with the extension .tree and the string within the filename _fullProfileMC_forced_. This filename also contains the m/n value so for example a filename might be called:

pa_basin_fullProfileMC_forced_0.3_5_2_20_100_3124.tree

The numbers in the filename are arranged in the following order: m/n ratio, \sigma value, mean skip, minimum segment length and target nodes. The final number before the extension (here, 3124) is copied from the 4th row of the driver file: it is the junction number. Users can assign different numbers to different basins to facilitate automation of data analysis.

9.3.1. The .tree file

The .tree file has as many rows as there are nodes in the channel network. There will be more nodes in the .tree file than in the .chan file because the code extends all tributaries to the outlet. Each row has 23 columns. The columns are

  • 1st column: The channel number (like in .chan file)
  • 2nd column: The receiver channel (like in .chan file)
  • 3rd column: The node on receiver channel (like in .chan file)
  • 4th column: The node index (like in .chan file)
  • 5th column: The row of the node (like in .chan file)
  • 6th column: The column of the node (like in the .chan file)
  • 7th column: The flow distance of the node
  • 8th column: The chi coordinate of the node
  • 9th column: The elevation of the node
  • 10th column: The drainage area of the node
  • 11th column: The number of data points used to calculate node statistics. Because of the skipping algorithm (see Mudd et al (2013 draft manuscript)) not all nodes are analysed each iteration.
  • 12th column: The mean M_\chi value for the node.
  • 13th column: The standard deviation of the M_\chi value for the node.
  • 14th column: The standard error of the M_\chi value for the node.
  • 15th column: The mean B_\chi value for the node.
  • 16th column: The standard deviation of the B_\chi value for the node.
  • 17th column: The standard error of the B_\chi value for the node.
  • 18th column: The mean DW value for the node.
  • 19th column: The standard deviation of the DW value for the node.
  • 20th column: The standard error of the DW value for the node.
  • 21st column: The mean fitted elevation for the node.
  • 22nd column: The standard deviation of the fitted elevation for the node.
  • 23rd column: The standard error of the fitted elevation for the node.

9.4. Visualising the analysis

We have also provided python scripts for visualising the data (that is `LSDVisualisation` for you yanks).

9.4.1. AICc_plotting.py

This script makes a plot of the AICc as a function of the m/n ratio for each channel as well as for the collinearity test. The mean and standard deviation of the AICc is plotted. In addition the m/n ratio with the minimum AICc value is highlighted, and there is a horizontal dashed line depicting the minimum AICc value plus the standard deviation of the minimum AICc value. This dashed line can help the user determine which m/n ratios are plausible (see Mudd et al (2014)). Here is an example:

_images/AICc_plotting.jpg

To run the AICc_plotting.py script you must modify the path name and filename after line 35 of the script. The file it takes is a .movern file.

9.4.2. Plotting m/n sensitivity: AICc_plotting_multiple.py

This script looks through a directory (you need to change the DataDirectory variable in the script) for any files with BFmovern_.movern in them and plots the AICc results. The code extracts the parameter values from the filename so each plotted figure has the parameter values in the title. Note that this script plots to file instead of to screen. You can change the kind of output file by changing the parameter OutputFigureFormat. See MatPlotLib documentation for options, but possibilities include jpg and pdf.

9.4.3. Plotting chi profiles and M_\chi values chi_visualisation.py

This script makes three figures. First, you must define the path name and the filename after line 39 of the script. This script takes a .tree file. The first figure is a plot of the channels in chi (\chi) space. The transformed data is in a semi-transparent solid line and the best fit segments are in dashed lines. Each tributary is plotted with a different colour. Here is an example:

_images/chi_profile.jpg

NOTE: These examples are derived from numerical model landscapes and are ```perfect```. Natural channels will be considerably noisier.

The second figure generated by this script is a figure showing the gradient in \chi-elevation space as a function of \chi. The gradient in \chi-elevation is indicative of a combination of erosion rate and erodability, so these plots allow perhaps a clearer idea of the different segments identified by the segment fitting algorithms. The colour scheme from the first figure is carried forward, so that it is easy to identify the characteristics of each tributary. Here is an example:

_images/M_chi.jpg

The third figure displays the longitudinal channel profiles (elevation as a function of flow distance), but these channel profiles are coloured by the gradient in \chi-elevation space. Here is an example:

_images/profile_Mchi_colour.jpg

9.4.4. Plotting the sensitivity of best fit m/n values to parameters: bf_movern_sensitivity.py

This script looks in the directory DataDirectory for all the files with BFmovern_.movern in the filename and then compiles the best fit m/n ratio from these files. It then produces box and whisker plots of the best fit m/n ratio. The red line is the median m/n ratio, the box shows the 25th and 75th percentile m/n ratios, the whiskers show the data range, with outliers (as determined by the Matplotlib function boxplot) as `+` symbols. The boxes are notched; these give 85% confidence intervals to the median value after bootstrapping 10,000 times. Here is an example plot:

_images/mn_sensitivity.jpg

9.4.5. Plotting the spatial distribution of M_\chi values: raster_plotter_2d_only.py

This script contains two functions. One is for plotting the tributaries superimposed on a hillshade (or any other raster) and another is for plotting the M_\chi values superimposed on a raster (we usually do this on the hillshade).

For this to work, the .chan file must be referenced to a coordinate system. That means that the row and column information in the .chan file corresponds to the xllcorner, yllcorner and node spacing data in the first few lines of the .chan file. If you have the LSDTopoToolbox this will happen automatically, but if you are writing a script to generate your own .chan files you`ll need to ensure that your channel nodes have the correct row and column data.

The DEM used for the hillshade does not need to be the same as the DEM used for the .chan file (that is, it can have different n_rows and n_columns etc, so you can, in principle, do a chi analysis on a clipped version of a DEM but then plot the results on the full DEM extent). The two functions are:

  1. coloured_chans_like_graphs: This takes two strings: the filename of the raster and the filename of the .tree file. You have to include the full path name for both of these files.

    The colouring scheme used for the channels is the same as in the elevation and M_\chi plots made by chi_visualisation.py. Here is an example:

    _images/raster_chan.jpg
  2. m_values_over_hillshade: Again, this takes two strings: the filename of the raster and the filename of the .tree file. You have to include the full path name for both of these files.

    The colouring scheme on the M_\chi values is the same as in the chi_visualisation.py plot where M_\chi is plotted on the channel profile. Here is an example:

    _images/raster_chan.jpg

To run one or the other of these function you need to scroll to the bottom of the script and comment or uncomment one of the function lines.

9.5. A Sample Chi Analysis Workflow

Link to the full-size version of the flowchart

_images/workflow.png