Please complete the Pre-VTP Survey.
In this Image Analysis + Neuroanatomy Virtual Training Project (VTP), you will use the image analysis software ImageJ and the Python programming language to investigate the evolution of brain anatomy in the fish species Astyanax mexicanus, with the goal of understanding how changes in the size and shape of brain regions correlate with their functions.
Throughout this process, you will learn about image analysis, segmentation, data visualization, basic statistics, and clustering.
How are the computational/data science skills learned in this project useful?
In case you are thinking, "How could I use this knowledge in real life?", here are several real-world applications of the techniques you will learn:
Research Applications: you can ask interesting questions about fundamental biology that impact our understanding of... The research study that the data you'll analyze in this VTP comes from is an example of this application.
Clinical Applications:
Biotech/Industry Applications:
Tech Applications:
In a broader sense, the skills you'll learn are a subdivision of data science. We'll point out data science principles as we go along.
The data you will analyze in this VTP comes from the research study "A brain-wide analysis maps structural evolution to distinct anatomical modules," which explores the evolution of brain anatomy in the blind Mexican cavefish. The study uses advanced imaging techniques to analyze variation in the shape and volume of brain regions and to investigate the influence of genetics on brain-wide anatomical evolution.
The brain's overall topology remains highly conserved across vertebrate lineages, but individual brain regions can vary significantly in size and shape.
Novel anatomical changes in the brain are built upon ancestral anatomy, leading to the development of new regions that expand the brain's functional repertoire.
There are two main ideas about how the brain evolves:
The first hypothesis suggests that different parts of the brain tend to evolve together. This means that selection acts on mechanisms that control the growth of all regions of the brain at the same time.
The second hypothesis proposes that selection can also act on individual brain regions. According to this idea, regions of the brain that have similar functions will change together in their anatomy, even if other brain regions are not affected.
Anatomical variation in the brain is governed by both volume (size) and shape, but scientists do not yet know whether the same or distinct mechanisms control these two parameters, and the relationship between them is poorly understood.
Most studies focus on either volume or shape separately; few compare both together.
The organismal systems researchers have used to study how volume and shape influence brain evolution face challenges, such as a lack of genetic diversity or of experimental tools, which makes it difficult to investigate the fundamental principles of how the brain evolves.
The blind Mexican cavefish provides a powerful model for studying genetic variation's impact on brain-wide anatomical evolution due to its distinct surface and cave forms with high genetic diversity.
Hybrid offspring between surface and cave populations allow the exploration of genetic differences and the identification of genetic underpinnings of neuroanatomical evolution.
A brain-wide neuroanatomical atlas was generated for the cavefish, and computational tools were applied to analyze volume and shape changes in brain regions.
Associations between naturally occurring genetic variation and neuroanatomical phenotypes were studied in hybrid brains, revealing genetically-specified regulation of brain-wide anatomical evolution.
Brain regions exhibited covariation in both volume and shape, indicating shared developmental mechanisms causing dorsal contraction and ventral expansion.
Selection may be operating on simple developmental mechanisms that influence early patterning events, modulating the volume and shape of brain regions.
[Methods Overview]
Throughout this VTP, you will...
Here's a flowchart of the analysis you'll perform:
All bioinformatics tasks will be performed "in the cloud" on your own Amazon Web Services (AWS)-hosted high-performance compute instance.
What is cloud computing?
For this project, you will execute tasks using either Fiji or Python on your cloud-based high-performance compute instance through your browser.
Fiji
Fiji is an open-source software package widely used for scientific image analysis and processing. It provides a user-friendly interface and numerous tools for tasks like image filtering, segmentation, and measurement. Fiji is based on another software package called ImageJ and includes many plugins and extensions that enhance its functionality. Fiji is a recursive acronym for "Fiji Is Just ImageJ".
Here's what Fiji will look like using the URL we've provided you:
Python
Python is a popular programming language known for its simplicity and versatility. It is widely used in data science and scientific research and has extensive support for various scientific libraries and tools. Python provides a flexible and intuitive syntax, making it easier for scientists and researchers to write code for data analysis, machine learning, and image processing tasks.
You will run the Python code in a Jupyter Notebook. Jupyter Notebook is an interactive computing environment where users can create and share documents containing live code, visualizations, and explanatory text. It allows code to be executed in cells, enabling iterative development and easy testing. With support for multiple programming languages, Jupyter Notebook is widely used for data analysis, scientific research, and education. It also facilitates the creation of interactive visualizations and allows documents to be saved and shared in various formats, promoting collaboration and reproducible research.
Here is what the Python (Jupyter Notebook) dashboard looks like:
The URLs and passwords to access Fiji and Python can be found in the Getting Started section.
Observe, Replicate, and Apply: During this analysis, you'll first see each step performed on an Example Sample. Then, replicate that step with the same sample to confirm you achieve identical results. Finally, apply the procedure to your assigned sample(s), which you can find in the Getting Started section.
Focus on Understanding, not Coding Syntax: Some steps require executing code. However, our primary goal isn't to teach you programming languages but to ensure you grasp the fundamentals of inputs, parameters, outputs, and the interpretation of those outputs. For better comprehension, consider each step as a mathematical function (e.g., y = mx + b): no matter how complex a block of code appears, you are always inputting data and receiving output.
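To make this concrete, here is a tiny illustrative sketch in Python (the function and numbers are made up; it is not part of the analysis). Every step you'll run has this same shape: data in, result out.
# A step in the analysis is just a function: input goes in, output comes out
def line(x, m=2, b=1):
    return m * x + b  # y = mx + b

y = line(3)  # input: x = 3
print(y)     # output: 7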
Mathematics is a Tool, not a Barrier: Certain steps will call for the use of complex mathematical functions to manipulate data. We understand you may not have a deep understanding of the underlying mathematics; our focus is on understanding the purpose and outcome of each step at a high level. Comprehending the input, the general function of the step, the output, and how to interpret and use that output is what's crucial.
Here's a step-by-step tutorial on Image Segmentation and Quantification based on the video "FIJI for Quantification: Cell Segmentation" presented by Dr. Paul McMillan of the Biological Optical Microscopy Platform at the University of Melbourne.
Objective
To perform cell segmentation using FIJI/ImageJ: we will turn a fluorescence image into a segmented image in which each individual cell has been separately delineated.
Materials Needed
A computer with FIJI/ImageJ installed, and the fluorescence image file, located on the Desktop in a folder called Segmentation-Tutorial.
(Reminder: the URL to the cloud instance with FIJI can be found in Getting Started.)
First, watch Dr. McMillan's video, then perform the steps on your assigned Fiji instance.
Steps
In FIJI, navigate to "Open > Desktop > Segmentation-Tutorial > Cell.tif"
Duplicate the Image
You should now have two images, "Cell.tif" and the duplicate "Cell-1.tif".
Identify Each Cell on "Cell.tif"
(Notes: (1) in the video this setting is called Noise Tolerance, but in your version of Fiji it's called Prominence; (2) the exact point count you get will be similar to, but not exactly the same as, the one in the video, due to the different Fiji versions used.)
Save this as "Mask_1.tif".
Define the Area of All the Cells on the duplicated image "Cell-1.tif"
Save this as "Mask_2.tif".
Combine "Mask_1.tif" and "Mask_2.tif"
This starts to segment the individual cells within the image.
Clean Up the Image
Save this as "Mask_3.tif".
Analyze the Data
Remember, the specific values used in this tutorial (such as the threshold value of 388 and the filter value of 250 square pixels) are specific to the image analyzed here. When you're doing this with your own images, you might need to adjust these values based on your specific image and what you're trying to achieve. Don't be afraid to experiment and see what works best for your data!
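If you're curious how this same workflow looks in code, here is a minimal sketch in Python using the scikit-image library. This is purely illustrative and not part of the Fiji tutorial; apart from the tutorial's 388 and 250, the parameter values are guesses that would need tuning.
# Illustrative only: the tutorial's workflow, sketched with scikit-image.
# Assumes a single-channel grayscale image and that scikit-image is installed.
import numpy as np
from skimage import io, measure, segmentation
from skimage.feature import peak_local_max

img = io.imread('Cell.tif')  # load the fluorescence image

# "Identify Each Cell": one seed point per cell, like Fiji's Find Maxima.
# min_distance stands in for Prominence; its value here is a guess.
coords = peak_local_max(img, min_distance=10)
markers = np.zeros(img.shape, dtype=int)
markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)

# "Define the Area of All the Cells": intensity threshold (388 as in the tutorial)
mask = img > 388

# "Combine": grow each seed out to the mask edges via watershed,
# segmenting the individual cells
labels = segmentation.watershed(-img.astype(float), markers, mask=mask)

# "Clean Up": discard objects smaller than 250 square pixels
for region in measure.regionprops(labels):
    if region.area < 250:
        labels[labels == region.label] = 0

# "Analyze the Data": measure what remains
for region in measure.regionprops(labels):
    print(region.label, region.area, region.centroid)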
Please upload screenshots of the following into your Google Sheet:
In this training project, you will execute each step with an Example Sample and then with your own assigned sample (found in the Getting Started section).
[Background/Rationale]
name_of_sample
Manually segment brain regions from a single fish (maybe use stained sample?)
Step 1: Step 2: Step 3:
Video
Manually segment brain regions from a single fish (maybe use stained sample?)
no video
Takeaways:
In the Google Sheet, please:
X
Y
Z
Linux is a command-line-based operating system. For our purposes, Linux = Unix.
The next step of this project is performed in Linux. Linux is a powerful operating system used by many scientists and engineers to analyze data. Linux commands are written on the command line, which means that you tell the computer what you want it to do by typing, not by pointing and clicking as you do in operating systems like Windows, ChromeOS, iOS, or macOS. The commands themselves are written in a language called Bash.
Here's a brief tutorial that covers some concepts and common commands using a sample code.
Concepts
Command Line
The Linux command line is a way of interacting with the computer's operating system by typing in commands in a text-based interface. It's a bit like talking to your computer using a special language, and that language is called Bash.
The command line in the Terminal tab should look something like this:
(base) your-user#@your-AWS-IP-address:~$
(base) your-user#@your-AWS-IP-address:~$ linux-commands-are-typed here
(base) your-user#@your-AWS-IP-address:~$
Using the command line is different than using a graphical user interface (GUI) because instead of clicking on icons and buttons to interact with your computer, you type in text commands. It can take some time to learn the commands and syntax of the command line, but once you do, you can often do things faster and more efficiently than with a GUI. Some people also prefer using the command line because it gives them more control over their computer's settings and can be more powerful for certain tasks.
Let's type a command in the command line.
List the files and directories in the current directory using the ls command:
ls
You should see something like this output to the screen:
It looks like a lot, but don't fret: this is just a bunch of files and directories (i.e., folders). You'll notice the Terminal colors entries by their type; for example, directories are colored blue. A directory is computer-science speak for a folder. So this listing shows you what is in the folder you are currently sitting in.
Directory
As we have mentioned, a directory is another name for a folder in which files are stored.
If you look immediately to the left of the $, you will see what is called the "working directory". The ~ symbol has a special meaning in Linux systems: it refers to the home directory on your system.
Navigate to the CMTK_Analyses directory using the cd command.
cd ~/CMTK_Analyses/
After you execute the command, your command line should look like this:
(base) user2@ip-172-31-25-174:~/CMTK_Analyses$
Now, use the ls command to list the files and directories in this directory:
ls
You should see something like this:
Now, let's say you need to go back to the home (~) directory:
cd ~
You should once again be back in the home (~) directory. It should look like this:
(base) your-user#@your-AWS-IP-address:~$
Now, go back to the CMTK_Analyses directory.
cd CMTK_Analyses
Sometimes we need to create a new directory to store files for a project, just like when you create a new folder on your computer.
Let's create a new directory called test_directory in the current directory using the mkdir command.
mkdir test_directory
Execute ls to confirm that the test_directory directory was indeed created.
Enter the test_directory directory.
cd test_directory
Your terminal should look like this:
(base) user2@ip-172-31-25-174:~/CMTK_Analyses/test_directory$
Return to the ~/CMTK_Analyses directory:
cd ~/CMTK_Analyses
List the last 15 commands run in your terminal so you can submit them in the next checkpoint:
history 15
There's a lot more that we could cover about working with the Linux command line, but this is enough to get started with your bioinformatics analysis.
In the Google Sheet, please:
X
Y
Z
[Background/Rationale]
Run CMTK Registration
name_of_sample
Run CMTK on example sample
Video
Step 1: Step 2: Step 3:
The output should look like this:
Run CMTK on your assigned sample.
In the Google Sheet, please:
X
Y
Z
[Background/rationale]
name_of_sample
Generate volumetric data by region on Example Sample using CMTK or CobraZ (?)
Generate volumetric data by region on Assigned Sample using CMTK or CobraZ (?)
In the Google Sheet, please:
X
Y
Z
For the next set of steps, we'll use the Python Programming Language.
Python Programming Tutorial for High School Students
Part 1: Introduction to Python
Python is a high-level, interpreted, general-purpose dynamic programming language that focuses on code readability. Python's syntax lets programmers accomplish tasks in fewer lines of code than languages like Java or C++.
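For example, a complete Python program can be a single line:
# A complete Python program -- no boilerplate required
print("Hello, world!")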
Part 2: Basics of Python
2.1 Python Variables and Data Types
In Python, variables are created when you assign a value to them. Python has various data types, including numbers (integer, float, complex), string, list, tuple, and dictionary.
# Defining variables in Python
a = 10 # integer
b = 5.5 # float
c = 'Hello World' # string
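For completeness, here is what the other data types mentioned above look like (the values are arbitrary examples):
d = 2 + 3j                             # complex number
e = [1, 2, 3]                          # list (ordered, changeable)
f = (4, 5, 6)                          # tuple (ordered, unchangeable)
g = {'species': 'Astyanax mexicanus'}  # dictionary (key-value pairs)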
2.2 Python Operators
There are various operators in Python, such as arithmetic operators (+, -, *, /, %, **, //), comparison operators (==, !=, >, <, >=, <=), and logical operators (and, or, not).
# Python operators
a = 10
b = 20
print(a + b) # output: 30
print(a > b) # output: False
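The remaining arithmetic and logical operators work the same way (reusing a = 10 and b = 20 from above):
print(a % b)            # modulo (remainder): 10
print(a ** 2)           # exponentiation: 100
print(a // 3)           # floor division: 3
print(a > 5 and b > 5)  # logical and: True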
Part 3: Python Conditional Statements and Loops
3.1 If Else Statement
Python supports the usual logical conditions from mathematics. These can be used in several ways, most commonly in "if statements" and loops.
# Python if else statement
a = 10
b = 20
if a > b:
    print("a is greater than b")
elif a == b:
    print("a is equal to b")
else:
    print("a is less than b")
3.2 For Loop
A for loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set, or a string).
# Python for loop
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    print(x)
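Because a string is also a sequence, a for loop works on it as well:
# Looping over a string, character by character
for letter in "TeO":
    print(letter)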
3.3 While Loop
With the while loop we can execute a set of statements as long as a condition is true.
# Python while loop
i = 1
while i < 6:
    print(i)
    i += 1
Part 4: Python Functions
A function is a block of code which only runs when it is called. You can pass data, known as parameters, into a function. A function can return data as a result.
# Python function
def my_function():
    print("Hello from a function")

my_function() # calling the function
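And here is a function that takes parameters and returns a result, as described above (brain_ratio is a made-up name for illustration):
# A function with parameters and a return value
def brain_ratio(region_volume, total_volume):
    return region_volume / total_volume * 100

print(brain_ratio(5.0, 50.0)) # output: 10.0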
In the Google Sheet, please:
X
Y
Z
[Background/Rationale]
Optic Tectum
Create violin plots to compare volumetric distributions of brain regions in Surface vs. SPF2 fish.
The example sample is the Optic Tectum, or TeO for short.
Open 2_brain-volume-distribution.ipynb and run these steps:
import pandas as pd
from scipy.stats import ttest_ind
import matplotlib.pyplot as plt
import seaborn as sns
# Load the new data
teo_data = pd.read_csv('./TeO.csv')
# Display the first few rows of the dataframe
teo_data.head()
# Create a new 'Category' column based on the 'File' column
teo_data['Category'] = teo_data['File'].apply(lambda x: 'Surface' if 'Surface' in x else 'SPF2')
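# (The lambda above labels each fish 'Surface' if its filename contains
# 'Surface', and 'SPF2' otherwise.)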
# Perform a two-sample Student's t-test for 'TeO'
surface_data = teo_data[teo_data['Category'] == 'Surface']['Sum']
spf2_data = teo_data[teo_data['Category'] == 'SPF2']['Sum']
t_stat, p_val = ttest_ind(surface_data, spf2_data)
print(t_stat)
print(p_val)
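# How to read these numbers: t_stat measures how far apart the two group
# means are relative to the spread of the data, and p_val is the probability
# of seeing a difference at least this large if the groups truly had the same
# mean. By convention, p_val < 0.05 is taken as a significant difference.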
# Create a mapping of p-value to significance indicator
if p_val < 0.0001:
    sig_indicator = '****'
elif p_val < 0.001:
    sig_indicator = '***'
elif p_val < 0.01:
    sig_indicator = '**'
elif p_val < 0.05:
    sig_indicator = '*'
else:
    sig_indicator = ''
# Create the violin plot (without points, p-value, title, y-axis label)
plt.figure(figsize=(10, 6))
sns.violinplot(x='Category', y='Sum', data=teo_data, inner=None, palette="pastel")
# Create the violin plot (with points)
plt.figure(figsize=(10, 6))
sns.violinplot(x='Category', y='Sum', data=teo_data, inner=None, palette="pastel")
# Overlay the datapoints
for category in ['Surface', 'SPF2']:
    category_data = teo_data[teo_data['Category'] == category]
    plt.plot([0 if category == 'Surface' else 1]*len(category_data), category_data['Sum'], 'k.', markersize=5)
# Show the plot
plt.show()
# Create the violin plot (with points, p-value, title, y-axis label)
plt.figure(figsize=(10, 6))
sns.violinplot(x='Category', y='Sum', data=teo_data, inner=None, palette="pastel")
# Overlay the datapoints
for category in ['Surface', 'SPF2']:
    category_data = teo_data[teo_data['Category'] == category]
    plt.plot([0 if category == 'Surface' else 1]*len(category_data), category_data['Sum'], 'k.', markersize=5)
# Add a horizontal line and significance indicator if the p-value is significant
if sig_indicator:
    ymax = teo_data['Sum'].max()
    plt.plot([0, 1], [ymax + 0.1*ymax, ymax + 0.1*ymax], 'k-')
    plt.text(0.5, ymax + 0.12*ymax, sig_indicator, ha='center', fontsize=14)
    plt.ylim(-0.1*ymax, ymax + 0.2*ymax)
# Set the title and labels of the plot
plt.title('TeO')
plt.ylabel('% of Total Brain Volume')
# Show the plot
plt.show()
Now recreate this analysis for your assigned sample. In the template script below:
Replace 'your_sample.csv' with the name of your data file.
Replace your_sample_data with the name of your sample. For example, if you're assigned the Tegmentum (Tg), your_sample_data would be changed to tg_data.
# Load the new data (your_sample_file is also used for the plot title below)
your_sample_file = 'your_sample.csv'  # replace with your data file's name
your_sample_data = pd.read_csv('./' + your_sample_file)
# Display the first few rows of the dataframe
print(your_sample_data.head())
# Create a new 'Category' column based on the 'File' column
your_sample_data['Category'] = your_sample_data['File'].apply(lambda x: 'Surface' if 'Surface' in x else 'SPF2')
# Perform a two-sample Student's t-test for your_sample
surface_data = your_sample_data[your_sample_data['Category'] == 'Surface']['Sum']
spf2_data = your_sample_data[your_sample_data['Category'] == 'SPF2']['Sum']
t_stat, p_val = ttest_ind(surface_data, spf2_data)
print(t_stat)
print(p_val)
# Create a mapping of p-value to significance indicator
if p_val < 0.0001:
    sig_indicator = '****'
elif p_val < 0.001:
    sig_indicator = '***'
elif p_val < 0.01:
    sig_indicator = '**'
elif p_val < 0.05:
    sig_indicator = '*'
else:
    sig_indicator = ''
# Create the violin plot (without points, p-value, title, y-axis label)
plt.figure(figsize=(10, 6))
sns.violinplot(x='Category', y='Sum', data=your_sample_data, inner=None, palette="pastel")
# Create the violin plot (with points, p-value, title, y-axis label)
plt.figure(figsize=(10, 6))
sns.violinplot(x='Category', y='Sum', data=your_sample_data, inner=None, palette="pastel")
# Overlay the datapoints
for category in ['Surface', 'SPF2']:
    category_data = your_sample_data[your_sample_data['Category'] == category]
    plt.plot([0 if category == 'Surface' else 1]*len(category_data), category_data['Sum'], 'k.', markersize=5)
# Add a horizontal line and significance indicator if the p-value is significant
if sig_indicator:
    ymax = your_sample_data['Sum'].max()
    plt.plot([0, 1], [ymax + 0.1*ymax, ymax + 0.1*ymax], 'k-')
    plt.text(0.5, ymax + 0.12*ymax, sig_indicator, ha='center', fontsize=14)
    plt.ylim(-0.1*ymax, ymax + 0.2*ymax)
# Set the title and labels of the plot
plt.title(your_sample_file.replace('.csv', '')) # Use the filename as the title, minus the .csv extension
plt.ylabel('% of Total Brain Volume')
# Show the plot
plt.show()
In the Google Sheet, please:
X
Y
Z
[Background/rationale]
Execute step with n = 5 Example Sample Dataset
# Import libraries that we'll need
import pandas as pd
import numpy as np
import scipy.cluster.hierarchy as spc
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sns
# Import the data and drop columns that we do not need. The result will be a matrix with each of the 180 neuroanatomical regions
# as columns and each row as a fish.
file = 'ExampleSample_5-F2s.xlsx'
# file = 'out_region_size_SPF2_HB_macro_v3.xlsx'
df_small_example = pd.read_excel(file)
df_small_example_final = df_small_example.drop(['Unnamed: 0', 'File', 'BrainSize', 'NucSize', 'SypSize'],axis=1)
df_small_example_final.head(5)
# Generate a heat map with correlations for all 180 regions. This is unclustered
corr_mat_df_small_example = df_small_example_final.corr().to_numpy()
corr_mat_df_small_example[np.isnan(corr_mat_df_small_example)] = 0 # casting nan values to zero correlation
rcParams['figure.figsize'] = 20,20
sns.heatmap(corr_mat_df_small_example)
plt.show()
## Perform clustering
# Cluster
pdist_small_example = spc.distance.pdist(corr_mat_df_small_example)
linkage_small_example = spc.linkage(pdist_small_example, method='complete')
idx_small_example = spc.fcluster(linkage_small_example, 0.5 * pdist_small_example.max(), 'distance')
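# fcluster cuts the clustering tree at half the largest pairwise distance,
# assigning each region a cluster id (1, 2, 3, ...) in idx_small_example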
# print(idx_small_example)
# Get cluster vector
cluster_vector_small_example=np.concatenate((np.argwhere(idx_small_example==1), np.argwhere(idx_small_example==2),np.argwhere(idx_small_example==3),np.argwhere(idx_small_example==4),np.argwhere(idx_small_example==5),np.argwhere(idx_small_example==6),
np.argwhere(idx_small_example==7), np.argwhere(idx_small_example==8), np.argwhere(idx_small_example==9), np.argwhere(idx_small_example==10), np.argwhere(idx_small_example==11), np.argwhere(idx_small_example==12)
))
# print(np.shape(cluster_vector_small))
# Restructure the correlation matrix so regions in the same cluster are adjacent
corr_mat_clustered_small_example = np.copy(corr_mat_df_small_example)
# print(np.shape(corr_mat_df_small_example))
for i in range(len(corr_mat_df_small_example)):
    for j in range(len(corr_mat_df_small_example)):
        corr_mat_clustered_small_example[i,j] = corr_mat_df_small_example[cluster_vector_small_example[i],cluster_vector_small_example[j]]
# Plot
sns.heatmap(corr_mat_clustered_small_example)
plt.show()
Execute step with n = 5 Example Sample Dataset
Execute step with n = 20 Example Sample Dataset
Execute step with n = 20 Your Sample Dataset
Execute with All Samples.
# Import the data and drop columns that we do not need. The result will be a matrix with each of the 180 neuroanatomical regions
# as columns and each row as a fish.
file = 'out_region_size_HB_macro.xlsx'
# file = 'out_region_size_SPF2_HB_macro_v3.xlsx'
df = pd.read_excel(file)
df2 = df.drop(['Unnamed: 0', 'File', 'BrainSize', 'NucSize', 'SypSize'],axis=1)
df2.head(5)
# Generate a heat map with correlations for all 180 regions. This is unclustered
corr_mat = df2.corr().to_numpy()
corr_mat[np.isnan(corr_mat)] = 0 # casting nan values to zero correlation
rcParams['figure.figsize'] = 20,20
sns.heatmap(corr_mat)
plt.show()
## Perform clustering
# Cluster
pdist = spc.distance.pdist(corr_mat)
linkage = spc.linkage(pdist, method='complete')
idx = spc.fcluster(linkage, 0.5 * pdist.max(), 'distance')
# print(idx)
# Get cluster vector
cluster_vector=np.concatenate((np.argwhere(idx==1), np.argwhere(idx==2),np.argwhere(idx==3),np.argwhere(idx==4),np.argwhere(idx==5),np.argwhere(idx==6),
np.argwhere(idx==7), np.argwhere(idx==8), np.argwhere(idx==9), np.argwhere(idx==10), np.argwhere(idx==11), np.argwhere(idx==12)
))
# print(np.shape(cluster_vector))
# Restructure correlation matrix
corr_mat_clustered=np.copy(corr_mat)
# print(np.shape(corr_mat))
for i in range(len(corr_mat)):
    for j in range(len(corr_mat)):
        corr_mat_clustered[i,j] = corr_mat[cluster_vector[i],cluster_vector[j]]
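Finally, plot the clustered matrix, just as in the example run above:
# Plot the clustered correlation matrix
sns.heatmap(corr_mat_clustered)
plt.show()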
In the Google Sheet, please:
X
Y
Z
So, at this point, you have completed the full analysis pipeline, from segmented brain images to clustered, brain-wide volumetric data. Congratulations!
Please complete the Post-VTP Survey.