Project Management Plan 

Identification 

Document Overview 

This document describes how the project will be planned and executed, the project’s development process, and the development tools used.

Abbreviations and Glossary 

HMSC: Hierarchical Modelling of Species Communities, the name of the software.
CSC: Finnish IT Center for Science, the primary stakeholder.

References 

Document ID	Document Title	URL
[hmsc]	Ovaskainen, O. et al. 2017. How to make more out of community data? A conceptual framework and its implementation as models and software. Ecology Letters 20, 561-576.	https://doi.org/10.1111/ele.12757
[pep8]	PEP 8 – Style Guide for Python Code	https://peps.python.org/pep-0008/
[tpope]	A Note About Git Commit Messages	http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
[cct]	Cookiecutter: Better Project Templates	https://cookiecutter.readthedocs.io/en/stable/
[nbdiff]	Welcome to NBDiff’s Documentation!	https://nbdiff-docs.readthedocs.io/en/latest/index.html

Team and Responsibilities 

Two members make up the HMSC team. We share most analysis, development, and testing tasks. Our responsibilities are listed in the table below.

In addition to the project team, we have a project supervisor, Dr Otso Ovaskainen. He is one of the PIs of the project “Biodiversity Digital Twin for Advanced Modelling, Simulation and Prediction Capabilities,” funded under HORIZON-INFRA-2021-TECH-01.

Title	Name	Responsibilities
Team Member	Gleb Tikhonov	Software Development, Documentation, Testing
Team Member	Anis U Rahman	Software Development, Documentation, Testing

Work Breakdown Structure, Tasks and Planning 

Work Breakdown Structure and Task Estimation 

WBS
#	Work	Estimated effort (hrs)
1.	PROJECT MANAGEMENT PLAN	65
1.1.	Software Requirements Specification	25
1.2.	Data Gathering	30
1.2.1.	Research existing joint species distribution modeling	10
1.2.2.	Research end-to-end machine learning	10
1.2.3.	Research high-performance computing for machine learning	10
1.3.	Interview End-users	5
1.4.	Research Similar Solutions/Attempts	5
2.	DESIGN	25
2.1.	Software Architecture Design	15
2.1.1.	Design modules for data loading/preparation/segregation	5
2.1.2.	Design modules for model structure and fit	5
2.1.3.	Design modules for model evaluation	5
2.2.	Program Interface Design	10
2.2.1.	Create program configuration design	5
2.2.2.	Design program’s command line interface (CLI)	5
3.	PROTOTYPE	20
3.1.	Design prototype version of algorithm	5
3.2.	Design CLI for prototype	5
3.3.	Design tests for prototype	5
3.4.	Implement prototype version	5
4.	SOFTWARE DEVELOPMENT	220
4.1.	Development	220
4.1.1.	Implement data loader	20
4.1.2.	Implement base model	100
4.1.3.	Implement model trainer	100
5.	TESTING AND QUALITY ASSURANCE	25
5.1.	Test Plan	5
5.2.	Unit Testing	15
5.2.1.	Implement tests for loader	5
5.2.2.	Implement tests for trainer	5
5.2.3.	Implement tests for predictor	5
5.3.	Program CLI Testing	5
6.	INTEGRATION	5
6.1.	Integration Testing	5
7.	DEPLOYMENT/ROLLOUT	50
7.1.	Define Configuration and Readme Files	5
7.2.	Define Online Help	30
7.2.1.	Documentation for hmsc.readthedocs.org	20
7.3.	Installation and User Guide	5
7.3.1.	Document installation instructions	5
7.3.2.	Document user guide	5
7.4.	Maintain and Update Documentation	15
8.	PROJECT PLANNING	25
8.1.	Team Meetings	20
8.2.	Stakeholder Meetings	5
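
In the DESIGN branch of the table, each parent figure is the sum of its sub-tasks. A minimal sketch of that roll-up follows; the nested-dictionary layout and the `rollup` helper are illustrative assumptions, not part of the project code, and the hours are copied from the table.

```python
# Effort figures (hours) copied from the DESIGN branch of the WBS table.
# The nested-dict layout and rollup() are assumptions made for illustration.
design = {
    "2.1. Software Architecture Design": {
        "2.1.1. Data loading/preparation/segregation modules": 5,
        "2.1.2. Model structure and fit modules": 5,
        "2.1.3. Model evaluation modules": 5,
    },
    "2.2. Program Interface Design": {
        "2.2.1. Program configuration design": 5,
        "2.2.2. Command line interface (CLI)": 5,
    },
}


def rollup(node):
    """Return the effort of a task: its own hours, or the sum of its sub-tasks."""
    if isinstance(node, dict):
        return sum(rollup(child) for child in node.values())
    return node


design_total = rollup(design)
print(design_total)  # 25, matching the DESIGN row
```

The same check can be run on any branch to spot a parent estimate that no longer matches its sub-tasks after a revision.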

Relationships with project stakeholders 

End-User Involvement 

Because the project will be open source, we expect many end-users to give feedback through the GitHub issue tracker and the mailing list, both before and after releases.

Communication 

Meetings 

Initial meeting: We will meet with our PI and discuss project requirements and goals.
Progress meeting: We will discuss the project’s progress bi-weekly with our team members in a remote meeting. We will cover the features in progress and our progress towards the next release/prototype, and review feedback from interested project repository watchers.
Post-release meeting: We will discuss a release of the software after it is published.

Reviews 

Code Review: Code review will be done on every pull request (i.e., code change).
- At least one team member other than the author will review the code change.
- The reviewer(s) will annotate the code with their comments.
- The developer will revise their pull request to satisfy the reviewer.
- The reviewer will merge the code change into the main repository.
Design Review: New features will be discussed in the GitHub issue tracker. Feedback will be solicited from interested watchers.
Release Candidates (RCs): Before each release, a release candidate version will be provided to the supervisor and interested end-users for review, so that their feedback can be collected before the final release.

Training 

We started training before project initiation to learn the Pythonic way of coding, end-to-end machine learning frameworks, and linear algebra libraries. Before implementing a feature, we meet to review the topics covered in the training and discuss the available choices for its implementation. We intend to continue this training throughout the development phase to produce high-quality code and efficient software.

System Requirements and Project Input Data 

Configuration Management 

Software Configuration Management 

We will use Git for software configuration management. Each change to the software will be captured in a commit on the developer’s computer. These changes will then be uploaded to GitHub for review and merging into the master branch.

Each commit contains a description of the change. We will follow the recommendations in Tim Pope’s blog post on the subject [tpope] and enforce them during code review.
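
The layout rules from [tpope] (a summary line of at most 50 characters, then a blank line, then body text wrapped at 72 characters) lend themselves to an automatic check. The sketch below is one possible way to test a message against those three rules; `check_commit_message` is a hypothetical helper, not an existing tool, and it does not cover the post's advice on wording.

```python
def check_commit_message(message):
    """Return a list of layout problems in a commit message (empty if none)."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing summary line"]
    problems = []
    if len(lines[0]) > 50:
        problems.append("summary line is longer than 50 characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("summary is not followed by a blank line")
    # Body lines start at line 3 of the message.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > 72:
            problems.append(f"line {number} is longer than 72 characters")
    return problems


good = (
    "Add data loader for species matrices\n"
    "\n"
    "Read the community data from CSV and validate its shape."
)
bad = (
    "added a very long summary line that keeps going well past the recommended limit\n"
    "no blank line"
)

print(check_commit_message(good))  # []
for problem in check_commit_message(bad):
    print(problem)
```

A reviewer could run such a check by hand, or it could be wired into a Git hook; either choice is outside what this plan specifies.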

Documentation Configuration Management 

We will use Git and GitHub (https://github.com/tarmstrong/nbdiff-docs) to track our documents, and the changes made to them, as we produce and receive them.

Software Development Management 

Software Development Process 

Our development process will be based on a mixed development approach that combines the vertical and the horizontal approaches. The idea is to divide the project into features (vertical), then divide each feature into layers (horizontal) and work through them in an iterative and incremental manner. The rationale for this choice is:

to focus on important features in the project.
to continuously perform unit and integration tests, which makes the outcomes predictable.
to regularly gather feedback to adjust requirements and design.

We have split the project into three major milestones spaced 5 weeks apart. Each milestone covers an equal portion of the work and ends with a release comprising a functioning feature of the software, together with updated documents for the supervisor.

Milestone	Milestone Date
M1	2012-09-19
M2	2012-10-24
M3	2012-11-28
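
The dates in the table follow from the first milestone and the 5-week spacing. As a small sketch of that arithmetic (the variable names are illustrative only):

```python
from datetime import date, timedelta

first_milestone = date(2012, 9, 19)  # M1, taken from the milestone table
spacing = timedelta(weeks=5)         # milestones are spaced 5 weeks apart

# M1, M2, M3 are 0, 1 and 2 spacings after the first milestone.
milestones = {f"M{i + 1}": first_milestone + i * spacing for i in range(3)}

for name, day in milestones.items():
    print(name, day.isoformat())
# M1 2012-09-19
# M2 2012-10-24
# M3 2012-11-28
```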

Software Development Tools 

The following is a list of the main tools we will use while developing this project. We will add tools to this document as we discover which are effective for our process.

Git: a distributed version control system for source code.
GitHub: a hosting service for Git that provides a web-based interface to various Git features, and includes issue trackers and release hosting.
Python: the programming language we will use for our software, chosen for its compatibility with the TensorFlow framework.
- pytest: a unit testing framework for Python.
- PyFlakes: a static analysis tool that checks Python source code for errors such as unused imports and undefined names.
- Mock: a library for mocking objects in unit tests for Python.
- Black: an uncompromising Python code formatter whose output conforms to the PEP-8 standard [pep8].
TensorFlow: a free and open-source software library for machine learning and artificial intelligence.
tf.linalg: the TensorFlow module with operations for linear algebra.
SciPy: a free and open-source Python library for scientific and technical computing.
Documentation:
- Sphinx: a widely used documentation system for Python. We will use it for source code documentation and for manually written documentation (installation instructions, tutorials, etc.).
TravisCI (https://travis-ci.org/): a free, online continuous integration service that runs automated tests, checks code coverage, and checks code quality every time a patch is submitted to a project. We will use it to verify pull requests automatically and aid reviewers.
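
To illustrate how mocking fits into unit tests, the sketch below replaces an expensive trainer with a mock object. It uses `unittest.mock`, the standard-library version of the Mock library; `fit_and_report` and the trainer interface are hypothetical and only stand in for project code.

```python
from unittest import mock


def fit_and_report(trainer, data):
    """Hypothetical helper: fit a model through a trainer and report its score."""
    model = trainer.fit(data)
    return f"score={model.score:.2f}"


# Replace the (expensive) trainer with a mock that returns a canned model.
fake_trainer = mock.Mock()
fake_trainer.fit.return_value = mock.Mock(score=0.75)

report = fit_and_report(fake_trainer, data=[[1, 0], [0, 1]])

# The mock records how it was called, so the test can verify the interaction.
fake_trainer.fit.assert_called_once_with([[1, 0], [0, 1]])
print(report)  # score=0.75
```

The test exercises the reporting logic without running any model fitting, which keeps unit tests fast.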

Software Development Rules and Standards 

For our source code (both functional code and test code), we will adhere to the following standards. Where possible, we will use a tool to automatically verify that our code adheres to the standard. We will also verify this through our code reviews.

Coding standard for Python: PEP-8 [pep8] automated using Black
Static code analysis using PyFlakes: https://pypi.python.org/pypi/pyflakes

For architectural documentation, we will use the Unified Modeling Language (UML).

Project architecture 

The software domain involves a series of functional components, including data loading/preparation and modelling. The following general use cases can be identified:

Collect/load/prepare data
View/use fitted model
Fit/Evaluate a model

The resulting use case diagram for the system is shown below.

We present a two-layered view of the system: a data layer and a business layer. The layered view can be deployed on a single machine with a graphics co-processor. The data is available locally on the machine, and the program is invoked using a command-line interface. The data layer prepares the program and loads the data; the business layer then fits and evaluates a model.
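
As a rough sketch of how the two layers and the command-line entry point could fit together, the example below separates loading from fitting and evaluation. Every name in it is an assumption made for illustration, the CSV input format is invented, and the "model" is a placeholder (per-column means) rather than the HMSC model.

```python
import argparse
import csv
import io


# Data layer: load and prepare the data.
def load_data(text):
    """Parse CSV text into rows of floats (hypothetical input format)."""
    reader = csv.reader(io.StringIO(text))
    return [[float(cell) for cell in row] for row in reader if row]


# Business layer: fit and evaluate a model.
def fit_model(rows):
    """Placeholder model: the mean of each column. Stands in for the real fit."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def evaluate_model(model, rows):
    """Mean squared error of the placeholder model on the given rows."""
    errors = [(value - mean) ** 2 for row in rows for value, mean in zip(row, model)]
    return sum(errors) / len(errors)


def main(argv=None):
    """Command-line entry point: read a local file, then fit and evaluate."""
    parser = argparse.ArgumentParser(description="Two-layer sketch")
    parser.add_argument("data_file", help="path to a local CSV file")
    args = parser.parse_args(argv)
    with open(args.data_file, newline="") as handle:
        rows = load_data(handle.read())
    model = fit_model(rows)
    print("model:", model, "mse:", evaluate_model(model, rows))


# In-memory demonstration of the same flow, without touching the file system.
demo_rows = load_data("1,2\n3,4\n")
demo_model = fit_model(demo_rows)
print(demo_model, evaluate_model(demo_model, demo_rows))  # [2.0, 3.0] 1.0
```

Keeping the layers as separate functions means each can be unit tested on its own, while `main` only wires them together; a script would call `main()` to read the file named on the command line.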

The high-level logical view of the system captures the functionality provided to its end-users, and it illustrates the collaborations among the system components.

The high-level deployment view of the software shows that the Python source code can be transformed into dataflow graphs for execution on remote co-processors, such as those of a high-performance computing platform.

Mapping existing R codebase 

For reference, the existing codebase in R has two main functions: sample_mcmc() and compute_predicted_values(). Below is the call graph for sample_mcmc().

Likewise, the call graph for compute_predicted_values() (renamed as compute_pred_vals()) is illustrated below. We plan to map these functions onto the suitable components of the logical view presented earlier.

Folder structure 

The directory structure of the project looks like this:

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- Make this project pip installable with `pip install -e .`
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

Test Phases Management 

Unit tests 

We will write unit tests with pytest, a framework designed to improve testing productivity. The tests are written in Python, are easy to write, and scale from single functions to whole modules.
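
A unit test in this style might look as follows. The function under test, `parse_species_matrix`, is a hypothetical loader helper invented for this sketch, as is its whitespace-separated input format; the tests themselves are plain functions with `assert` statements, which pytest discovers by their `test_` prefix.

```python
def parse_species_matrix(text):
    """Hypothetical loader helper: parse whitespace-separated rows into lists of ints."""
    matrix = [
        [int(token) for token in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]
    if len({len(row) for row in matrix}) > 1:
        raise ValueError("all rows must have the same number of columns")
    return matrix


# pytest collects functions whose names start with `test_` and reports failed asserts.
def test_parses_rows_and_columns():
    assert parse_species_matrix("1 0 1\n0 1 1\n") == [[1, 0, 1], [0, 1, 1]]


def test_rejects_ragged_rows():
    try:
        parse_species_matrix("1 0\n1\n")
    except ValueError:
        return
    raise AssertionError("expected ValueError for ragged input")
```

With pytest installed, the second test would more idiomatically use `pytest.raises(ValueError)`; the plain `try`/`except` form is used here only to keep the sketch free of third-party imports.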

Feature Tests 

One or multiple unit tests can be used to test the functionality of a component.

Integration Tests 

Integration tests exercise multiple components of the software together, or the software end to end, to ensure that it works as a whole.

Performance Tests 

We will run performance tests to measure the efficiency of a piece of code. The code under test may range from a single method to the whole software.
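
For a single function, such a measurement can be sketched with the standard-library `timeit` module. The function `column_sums` and the input size below are placeholders chosen for illustration, not project code.

```python
import timeit


def column_sums(rows):
    """Hypothetical piece of code under test: sum each column of a matrix."""
    return [sum(column) for column in zip(*rows)]


rows = [[float(i + j) for j in range(50)] for i in range(200)]

# Repeat the measurement and keep the best run to reduce noise from other processes.
calls_per_run = 100
timings = timeit.repeat(lambda: column_sums(rows), number=calls_per_run, repeat=5)
best_per_call = min(timings) / calls_per_run

print(f"best time per call: {best_per_call * 1e6:.1f} microseconds")
```

Taking the minimum over several repeats is a common convention for micro-benchmarks; whole-program performance tests would need a different harness.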

Problem Resolution 

We will use GitHub’s issue tracker to handle all feature requests, change requests, inquiries, and questions, as well as bug reports. An issue will be opened whenever a matter is raised. GitHub lets us create custom labels to classify issues, so that we can filter the different requests, inquiries, and bugs. We can also assign each issue to the team member best qualified to handle it. Comments on an issue allow discussion and problem solving among team members, as well as status updates. Finally, once an issue is resolved it is closed, which makes it easy to track which issues remain.