.. _Chi-part-2:

=========================================================================
Chi profile analysis, part 2: constraining m/n and transforming profiles
=========================================================================

This is part 2 of the chi profile analysis documentation. 
Part 1 can be found here: :ref:`chi-analysis-get-profiles`

This section of the documentation assumes you have created a `.chan` file. 
The format of the channel file is described here :ref:`format-of-chan-file`.
These ```channel``` files contain information about flow distance, elevation 
and other properties of a channel network that are used to create a profile 
transformed into so-called ```chi``` (:math:`\chi`) space. 
The transformation involves integrating drainage area along the length of the channel, 
so that comparisons of channel steepness can be made for chanels of different drainage area. 

The analysis is similar to slope-area analysis but has some advantages; 
the primary advantage of the method is that it does not take the gradient of noisy topographic data, 
and so is less noisy than slope-area analysis. 

The main disadvantage is that the method assumes channel profiles are well described by
predictions of the stream power erosion law. The various advantages and disadvantages of the method
are described by `Perron and Royden, 2013 ESPL <http://web.mit.edu/perron/www/files/PerronRoyden13.pdf>`_.

Steps involved to perform channel analysis 
==========================================

  After preparing the data (:ref:`chi-analysis-get-profiles`), 
  performing the channel analysis involves 3 steps, including visualization of the data.

  #. :ref:`Chi-Perform-statistical-analysis-chi` 
  #. :ref:`Extract-chi-profiles`
  #. :ref:`Visualize-chi-data`

.. _Chi-Perform-statistical-analysis-chi:

Performing a statistical analysis to constrain the best fit m/n ratio
======================================================================

Once the profile data has been converted into a `.chan` file, 
the data can then be processed to determine the most likely m/n ratio for individual channels 
and also via the collinearity test (see `Mudd et al (2014) <http://onlinelibrary.wiley.com/doi/10.1002/2013JF002981/abstract>`_). 

Compiling the code
---------------------

  The code for running the statistical analysis to find the most likely m/n ratio can 
  be compiled by caling the makefile `chi_m_over_n_analysis.make`. 
     
  If you are using a windows machine and have installed `Cygwin <http://www.cygwin.com/>`_  
  you need to ensure that you have installed the make utility. 

  To make the file navigate the folder that contains it and run::
 
    make -f chi_m_over_n_analysis.make
   
  This will create the program `chi_m_over_n_analysis.exe`.
 
Running the code
----------------

  The program 'chi_m_over_n_analysis.exe` is run with 2 arguments to the command line. 
  
  * The first argument is the path name of the path where the `.chan` file is located, 
    along with a driver file that contains the parameters of the analysis. 
    All data will be printed to files in this path. 
  * The second argument is the name of the driver file. We typically use a `.driver` extension 
    for the driver file but this is not a requirement. 

  For example, we call the program with::
  
    ./chi_m_over_n_analysis.exe /home/smudd/topographic_tools/test_suites/Mandakini/ chi_parameters.driver 

  The `./` leading `chi_m_over_n_analysis.exe` is only necessary on a linux system. 
  The driver file contains a number of parameters for running the analysis. 
  This file is used on several different programs so not all parameters are used by `chi_m_over_n_analysis.exe`. 
  The parameters must be listed in the correct order and there cannot be any extra information between parameters
  (e.g., a string describing them). The parameters are:

    * 1st row: the prefix of the channel file, that is, the original name of the DEM. If your `.chan` file is called `mandakini_ChanNet_21729.chan`, then the first row of the driver file should be `mandakini`. 
    * 2nd row: Not used by `chi_m_over_n_analysis.exe`. 
    * 3rd row: Not used by `chi_m_over_n_analysis.exe`. 
    * 4th row: This is the junction number that was specified when the `.chan` file was created. So if your `.chan` file is called `mandakini_ChanNet_21729.chan`, then the fourth row should be `21729`.
    * 5th row: Not used by `chi_m_over_n_analysis.exe`. 
    * 6th row: A reference drainage area for integrating to chi space (A\:sub:`0`\ )
    * 7th row: The minimum number of pixels in a segment. 
       See `Mudd et al (2014) <http://onlinelibrary.wiley.com/doi/10.1002/2013JF002981/abstract>`_ for guidance. 
       Values between 10-20 are recommended. 
       The computational time required is a highly nonlinear inverse function of this parameter. 
       20 might lead to a run lasting a few minutes, whereas 5 might take many hours (or even days).
       *We recommend starting with a value of 14*. 
    * 8th row: The standard deviation of error ( :math:`\sigma` in the `Mudd et al (2014) <http://onlinelibrary.wiley.com/doi/10.1002/2013JF002981/abstract>`_ paper) on the DEM, and also error from geomorphic noise (e.g., boulders in the channel). 
       *For SRTM this should be something like 10-30 m.* 
       The larger this number, the fewer segments you will get (see `Mudd et al (2014) <http://onlinelibrary.wiley.com/doi/10.1002/2013JF002981/abstract>`_).
    * 9th row: The starting m/n value to test the likelihood of. 
    * 10th row: The change in m/n value that you want to loop over. 
       For example, suppose the starting m/n is 0.2 and the change in m/n is 0.05, then the next m/n value after 0.2 is 0.25. 
    * 11th row: The number of m/n values you want to loop through.  
    * 12th row: The ```target nodes```, which is the maximum number of nodes you want to run through the partitioning algorithm at a time. 
       Recommended values are 80-140. The computational time is nonlinearly related to this parameter,
       80 might take several minutes whereas 140 will take many hours, and 200 will take months. 
    * 13th row: The number of iteration on the Monte Carlo routine that finds the
       statistics for every node in the channel network rather than a subset of nodes. 
    * 14th row: Not used `chi_m_over_n_analysis.exe`. 
    * 15th row: The vertical drop over which slope-area analysis will be performed. 
    * 16th row: The horizontal interval over which slope area analysis will be performed
    * 17th row: The maximum change in drainage area of an interval for slope-area 
       analysis as a fraction of the area at the midpoint of an interval. 
    * 18th row: The ```target skip```, which is the average number of nodes the routine skips when it is trying to compute the best segments. 
       If the DEM is 90m resolution, for example, the resolution the algorithm will work at is ~300 meters. Here is an example file:       
 
  And here is the cheat sheet (also included in `driver_cheat_sheet.txt`)::
  
    e_bathdem <- filename prefix
    0.0001    <- minimum slope, don't change
    300       <- N contrib pixels for a channel. Could reduce to, say 100 or even 50. 
    1332      <- junction number, this will change
    0.05      <- area frac for channel pruning. 1= mainstem only, low numbers= more tribs
    1000      <- A_0 for chi analysis: probably don't need to change. 
    20        <- minimum segment length. Should be between 5-20.
    20        <- sigma: some estimate of uncertainty in elevation data. Smaller = more segments
    0.15      <- starting m/n for best for m/n testing
    0.025     <- increment of m/n for best for m/n testing
    20        <- number of m/n values tested for m/n testing
    90        <- target length of nodes to be analysed for segments. Should be between 80-150
    250       <- number of iterations for Monte Carlo analysis. 250 seems okay
    0.95      <- Not used anymore!
    20        <- Vertical interval for sampling for S-A analysis. Should be scaled to DEM resolution
    500       <- Horizontal interval for sampling for S-A analysis. Should be scaled to DEM resolution
    0.2       <- An area thinning fraction for S-A analysis. 0.2 is probably about right.
    2         <- The mean number of data nodes you skip for each node of segment analysis. For LiDAR this can be 10 or more. Nextmap can be 2-10. SRTM 0-2.

 
  Once this program has run, it will print out a file with the filename prefix and an extension of `.movern`. 
  
  **NOTE: This program is computationally expensive. 
  Increasing the target length of nodes to be analysed and reducing the minimum segment length 
  increases the computational time required in a highly nonlinear fashion. 
  Increasing the skip value can reduce computational time required.** 

  You can expect the computation to take several minutes 
  (e.g. minimum segment length ~20, target nodes ~100, skip set so mainstem has 300-500 nodes analysed) 
  to many hours (e.g. minimum segment length of 5, target nodes of 120-140, skip set such that thousands of nodes are analysed). 


The 'movern' file
-----------------

  The .movern file is produced by the statistical analysis of the channel network in order to find the most likely m/n ratio. 
  The filename contains information about parameter values; these are parsed by the visualisation algorithms. 
  The format of the filename is the filename prefix, followed by `_BFmovern_`, followed by the sigma values, 
  the skip value, the minimum segment length value the target nodes value and the junction number, 
  all separated by the underscore symbol (`_`). The file then has the extension `.movern`. 
  For example, the filename:: 
  
    mandakini_BFmovern_20_2_20_120_51.movern
    
  Indicates that
  
   * :math:`\sigma` = 20
   * skip = 2
   * minimum segment length = 20
   * target nodes = 120
   * and the junction number being analysed is 51. 
  
  The format of the file is:
	* 1st row: In the first column of the first row there is a placeholder value, `-99`, followed by the :math:`m/n` ratios tested each followed by a space. 
	* 2nd row: In the first column is a placeholder value, `-99`, followed by the mean :math:`AICc` (from n_iterations iterations) 
	  for each tested :math:`m/n` ratio for the collinearity test. These are separated by spaces. 
	* 3rd row: In the first column is a place holder value of `-99`, 
	  followed by the standard deviation of the :math:`AICc` for the collinearity test. 
	  When fits are extremely poor, the likelihood approaches zero. Calculating the :math:`AICc` involves taking the logarithm of the likelihood, 
	  to avoid this, the code assigns a very small number to 0 likelihoods. 
	  This results in a high, but not infinite, value of :math:`AICc`. 
	  These poor fits will have a standard deviation of zero. 
	* Even rows thereafter: The first column is the channel number. 
	  The following columns are the mean :math:`AICc` values for that channel. 
	* Odd rows thereafter: The first column is the channel number. 
	  The following columns are the standard deviations of the :math:`AICc` values for that channel. 

  Here is an example file:: 
  
    -99 0.15 0.175 
    -99 4008 4008 
    -99 0 0 
    0 2004 2004 
    0 0 0 
    1 2004 2004 
    1 0 0 
    2 2004 2004 
    2 0 0 
    3 1766.89 1788.39 
    3 608.033 583.88 
    4 1905.04 1973.54 
    4 422.523 238.852 
    5 2004 2004 
    5 0 0 
    6 1975.36 1882.18 
    6 224.595 450.995   


Performing a sensitivity analysis on the best fit :math:`m/n` ratio
-------------------------------------------------------------------

  For structurally or tectonically complex landscapes, it can be difficult to constrain the :math:`m/n` ratio. 
  In such cases, it is wise to perform a sensitivity analysis of the best fit :math:`m/n` ratio. 
  To facilitate this, we provide a python script, `movern_sensitivity_driver_generation.py`, 
  that generates a number of driver files with the parameters minimum_segment_length, sigma, 
  mean_skip and target_nodes that vary systematically. 
  
  To run this script you will need to change the data directory and the filename of the original driver file within the script. 
  
  You will need to modify the script before you run it. On lines 41 and 43 you need to modify the data directory
  and driver name of your files::
  
    # set the directory and filename
    DataDirectory =  "/home/smudd/topographic_tools/test_suites/Mandakini/"
    
    DriverFileName = "chi_parameters.driver"
      
  If you are running this file in Spyder from a windows machine, the path name will have slightly different formatting::
  
    DataDirectory =  "m:\\topographic_tools\\test_suites\\Mandakini\\"
  
  Note that if you run from the command line you will need to navigate to the folder that contains the script. 
  
  
  The driver files will be numbered (e.g., `my_driver.1.driver`, `my_driver.2.driver`, etc.)::
  
    smudd@burn Mandakini $ ls *.driver
    chi_parameters.1.driver                   
    chi_parameters.2.driver                   
    chi_parameters.driver                     
  
  You can run these with:: 

    ./chi_m_over_n_analysis.exe /home/smudd/topographic_tools/test_suites/Mandakini/ my_driver.1.driver

  Or if you want to run them with no hangup and `nice`::

    nohup nice ./chi_m_over_n_analysis.exe /home/smudd/topographic_tools/test_suites/Mandakini/ my_driver.1.driver &

  And then just keep running them in succession until you use up all of your CPUs (luckily at Edinburgh we have quite a few)!
  
  We have also written an additional python script called `Run_drivers_for_mn.py` which simply looks for all the drivers in folder
  and sends them to your server. Once again, you'll need to modify this python script before you run it in order to 
  give the script the correct data directory. In this case the relevant line is line 18::
  
    DataDirectory =  "/home/smudd/topographic_tools/test_suites/Mandakini/"
    
  You can run it from command line with::
  
    python Run_drivers_for_mn.py
    
  Again, you'll need to be in the directory holding this file to run it (but it doesn't have to be in the same directory as the data).   
  
  *Warning: this will send* **all** *drivers to your servers to be run, so if you generated 1000 drivers, then 1000 jobs will be
  sent to the server. Use with caution!*


.. _Extract-chi-profiles:

Extracting the transformed chi-elevation profiles
=================================================

  The next stage of the analysis is to extract the chi (:math:`\chi`) profiles. 
  To compile the program to extract the chi profiles, you need to use the makefile `chi_get_profiles.make`. 
  The program is compiled with::
  
    make -f  chi_get_profiles.make

  The makefile compiles a program called `chi_get_profiles.exe`. 
  This is run, like `chi_m_over_n_analysis.exe` , with two arguments: the path name and the driver name:: 
  
    ./chi_get_profiles.exe /home/smudd/topographic_tools/test_suites/Mandakini/ chi_parameters.driver
  
  The driver file has exactly the same format as the driver file for `chi_m_over_n_analysis.exe`. 
  A chi profile will be produced for each :math:`m/n` value outlined by these elements in the driver file: 

    * 7th row: The starting (lowest) :math:`m/n` value to test if it is the most likely. 
    * 8th row: The change in :math:`m/n` value that you want to loop over 
      (suppose the starting :math:`m/n` is 0.2 and the change in :math:`m/n` is 0.05, 
      then the next :math:`m/n`  value after 0.2 is 0.25). 
    * 9th row: The number of :math:`m/n` values you want to loop through.  

  Users may wish to modify these values in the `driver` file from the original values to explore only 
  those values which have ```plausible``` values of the :math:`m/n` ratio (see `Mudd et al (2014) <http://onlinelibrary.wiley.com/doi/10.1002/2013JF002981/abstract>`_ for guidance). 
  
  For each :math:`m/n` ratio tested, the code produces a file with the extension `.tree` and the string within the filename `_fullProfileMC_forced_`. 
  This filename also contains the m/n value so for example a filename might be called::
  
    pa_basin_fullProfileMC_forced_0.3_5_2_20_100_3124.tree
    
  The numbers in the filename are arranged in the following order: :math:`m/n` ratio,  :math:`\sigma` value, mean skip, minimum segment length and target nodes.  
  The final number before the extension (here, 3124) is copied from the 4th row of the driver file: it is the `junction number`. 
  Users can assign different numbers to different basins to facilitate automation of data analysis.  
  
The `.tree` file  
-----------------
  
  The `.tree` file has as many rows as there are nodes in the channel network. 
  There will be more nodes in the `.tree` file than in the `.chan` file because the code extends all tributaries to the outlet. 
  Each row has 23 columns. The columns are
  
  	* 1st column: The channel number (like in .chan file)
  	* 2nd column: The receiver channel (like in .chan file)
  	* 3rd column: The node on receiver channel (like in .chan file)
  	* 4th column: The node index (like in .chan file)
  	* 5th column: The row of the node (like in .chan file)
  	* 6th column: The column of the node (like in the .chan file)
  	* 7th column: The flow distance of the node
  	* 8th column: The chi coordinate of the node
  	* 9th column: The elevation of the node
  	* 10th column: The drainage area of the node
  	* 11th column: The number of data points used to calculate node statistics. 
  	  Because of the skipping algorithm (see Mudd et al (2013 draft manuscript)) not all nodes are analysed each iteration. 
  	* 12th column: The mean :math:`M_\chi` value for the node.
  	* 13th column: The standard deviation of the :math:`M_\chi` value for the node.
  	* 14th column: The standard error of the :math:`M_\chi` value for the node.
  	* 15th column: The mean :math:`B_\chi` value for the node.
  	* 16th column: The standard deviation of the :math:`B_\chi` value for the node.
  	* 17th column: The standard error of the :math:`B_\chi` value for the node.
  	* 18th column: The mean :math:`DW` value for the node.
  	* 19th column: The standard deviation of the :math:`DW` value for the node.
  	* 20th column: The standard error of the :math:`DW` value for the node.
  	* 21st column: The mean fitted elevation for the node.
  	* 22nd column: The standard deviation of the fitted elevation for the node.
  	* 23rd column: The standard error of the fitted elevation for the node.


.. _Visualize-chi-data:

Visualising the analysis
========================

We have also provided python scripts for visualising the data (that is ```LSDVisualisation``` for you yanks). 

`AICc_plotting.py`
------------------

  This script makes a plot of the :math:`AICc` as a function of the :math:`m/n` ratio for each channel as well as for the collinearity test. 
  The mean and standard deviation of the :math:`AICc` is plotted. 
  In addition the :math:`m/n` ratio with the minimum :math:`AICc` value is highlighted, and there is a horizontal dashed line 
  depicting the minimum :math:`AICc` value plus the standard deviation of the minimum :math:`AICc` value. 
  This dashed line can help the user determine which :math:`m/n` ratios are plausible (see `Mudd et al (2014) <http://onlinelibrary.wiley.com/doi/10.1002/2013JF002981/abstract>`_). 
  Here is an example: 
 
    .. image:: ./_images/Chi_profiles/AICc_plotting.jpg
 
  To run the `AICc_plotting.py` script you must modify the path name and filename after line 35 of the script. 
  The file it takes is a `.movern` file.
  
Plotting :math:`m/n` sensitivity: `AICc_plotting_multiple.py`
--------------------------------------------------------------

  This script looks through a directory (you need to change the DataDirectory variable in the script) for any files with *BFmovern_*.movern 
  in them and plots the :math:`AICc` results. 
  The code extracts the parameter values from the filename so each plotted figure has the parameter values in the title. 
  Note that this script plots to file instead of to screen. 
  You can change the kind of output file by changing the parameter  `OutputFigureFormat`. 
  See `MatPlotLib documentation <http://matplotlib.org/>`_  for options, but possibilities include `jpg` and `pdf`.

Plotting chi profiles and :math:`M_\chi` values `chi_visualisation.py`
----------------------------------------------------------------------

  This script makes three figures. 
  First, you must define the path name and the filename after line 39 of the script. 
  This script takes a `.tree` file. The first figure is a plot of the channels in chi (:math:`\chi`) space. 
  The transformed data is in a semi-transparent solid line and the best fit segments are in dashed lines.  
  Each tributary is plotted with a different colour.  Here is an example:	
  
    .. image:: ./_images/Chi_profiles/chi_profile.jpg
    
  **NOTE: These examples are derived from numerical model landscapes and are ```perfect```. Natural channels will be considerably noisier.** 
  
  The second figure generated by this script is a figure showing the gradient in :math:`\chi`-elevation space as a function of :math:`\chi`. 
  The gradient in :math:`\chi`-elevation is indicative of a combination of erosion rate and erodability, 
  so these plots allow perhaps a clearer idea of the different segments identified by the segment fitting algorithms.  
  The colour scheme from the first figure is carried forward, so that it is easy to identify the characteristics of each tributary. Here is an example: 
  
    .. image:: ./_images/Chi_profiles/M_chi.jpg
    
  The third figure displays the longitudinal channel profiles (elevation as a function of flow distance), 
  but these channel profiles are coloured by the gradient in :math:`\chi`-elevation space. Here is an example: 
  
    .. image:: ./_images/Chi_profiles/profile_Mchi_colour.jpg
    
Plotting the sensitivity of best fit :math:`m/n` values to parameters: `bf_movern_sensitivity.py`
---------------------------------------------------------------------------------------------------

  This script looks in the directory DataDirectory for all the files with *BFmovern_*.movern in the filename and then compiles 
  the best fit :math:`m/n` ratio from these files. 
  It then produces box and whisker plots of the best fit :math:`m/n` ratio. 
  The red line is the median :math:`m/n` ratio, the box shows the 25th  and 75th percentile :math:`m/n` ratios, 
  the whiskers show the data range, with outliers (as determined by the Matplotlib function boxplot) 
  as ```+``` symbols. The boxes are notched; these give 85% confidence intervals to the median value after bootstrapping 10,000 times. 
  Here is an example plot:  
  
    .. image:: ./_images/Chi_profiles/mn_sensitivity.jpg 
    
Plotting the spatial distribution of :math:`M_\chi` values: `raster_plotter_2d_only.py`   
---------------------------------------------------------------------------------------

  This script contains two functions. 
  One is for plotting the tributaries superimposed on a hillshade (or any other raster) 
  and another is for plotting the :math:`M_\chi` values superimposed on a raster (we usually do this on the hillshade). 
  
  For this to work, the `.chan` file must be referenced to a coordinate system. 
  That means that the row and column information in the `.chan` file corresponds to the `xllcorner`, `yllcorner` and
  `node spacing` data in the first few lines of the `.chan` file. 
  If you have the LSDTopoToolbox this will happen automatically, but if you are writing a script to generate your own
  `.chan` files you`ll need to ensure that your channel nodes have the correct row and column data. 
  
  The DEM used for the hillshade does not need to be the same as the DEM used for the `.chan` file 
  (that is, it can have different n_rows and n_columns etc, so you can, in principle,
  do a chi analysis on a clipped version of a DEM but then plot the results on the full DEM extent).  
  The two functions are:
  
    #. `coloured_chans_like_graphs`: This takes two strings: the filename of the raster and the filename of the `.tree` file. 
       You have to include the full path name for both of these files. 
       
       The colouring scheme used for the channels is the same as in the `elevation` and :math:`M_\chi` plots made by `chi_visualisation.py`. 
       Here is an example: 
       
         .. image:: ./_images/Chi_profiles/raster_chan.jpg 
         
    #. `m_values_over_hillshade`: Again, this takes two strings: the filename of the raster and the filename of the `.tree` file. 
       You have to include the full path name for both of these files. 
       
       The colouring scheme on the :math:`M_\chi` values is the same as in the `chi_visualisation.py` plot 
       where :math:`M_\chi` is plotted on the channel profile. Here is an example:
       
         .. image:: ./_images/Chi_profiles/raster_chan.jpg
         
  To run one or the other of these function you need to scroll to the bottom of the script and comment or uncomment one of the function lines. 

A Sample Chi Analysis Workflow
==============================

`Link to the full-size version of the flowchart <http://www.geos.ed.ac.uk/~smudd/LSDTT_docs/html/_images/workflow.png>`_

	.. image:: ./_images/Chi_profiles/workflow.png
		:scale: 40 %