Full pipeline
After producing accurate annotations for the nuclei and cytosol channels and training U-Net and HoVer-Net models to segment the nuclei channel, we compared the top three models from each architecture. In the end we used the best U-Net model, trained on our own annotated images together with some images from the BBBC image set. The model's performance was validated on the test set, where it reached an 85% F1-score at a 90% overlap threshold and an 88% average Jaccard index.
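The two metrics above can be illustrated with a short sketch; the function names and the toy masks are ours, not the actual evaluation code. A predicted object counts as a match when its Jaccard index (intersection over union) with a ground-truth object exceeds the overlap threshold, and F1 is then computed from the matched and unmatched objects.

```python
import numpy as np

def jaccard_index(mask_a, mask_b):
    """Intersection over union of two binary masks."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # two empty masks are identical by convention
    return np.logical_and(a, b).sum() / union

def f1_score(tp, fp, fn):
    """F1 from counts of matched (tp) and unmatched (fp, fn) objects."""
    return 2 * tp / (2 * tp + fp + fn)
```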
The experiment was conducted in two conditions, oxidative stress (A, B) and non-oxidative stress (C, D); each group has one repetition, and everything is saved in 79 groups covering almost 18,000 genes. The status of each group of raw data, and how we proceeded with the analysis, is summarized in the progress_log/plateProgress_done.xlsx file. Some plates were missing and others existed in several copies, so we had to handle missing or repeated data.
The overall process for each plate is as follows:
Download the plate from Swestore.
Run the extract_conversion.py script over the plate to extract only the d0 channel.
Normalize the images to 8-bit using a bash command.
Save the file names in the 4_filelists folder and remove the .C01 files, keeping only the .png ones (6,144 images in each full plate).
Copy the prediction model over and run it on the images.
Run the area_size.py script over the results and copy the output files to A, B, C, or D.
Run the plotting and visualization scripts over all of them.
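The per-plate steps above can be sketched as a small driver; the script names follow the text, but the driver function itself is a hypothetical outline, not the actual pipeline code (`run` is injectable so the flow can be exercised without the real scripts).

```python
import subprocess

def process_plate(plate, run=subprocess.run):
    """Hypothetical per-plate driver following the steps listed above."""
    # Extract the d0 channel and normalize to 8-bit.
    run(["python", "extract_conversion.py", plate], check=True)
    # Segment the nuclei (the file list is written beforehand with
    # `ls *.png > ../4_filelists/<plate>_names.txt`).
    run(["python", "prediction.py", plate,
         f"../data/4_filelists/{plate}_names.txt"], check=True)
    # Compute per-nucleus areas and counts.
    run(["python", "area_size.py", plate], check=True)
```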
Download and Extract
To download plates from Swestore, we used:
lftp https://username@webdav.swestore.se/snic/folder/
To download a whole plate, we used:
get plate_number1.tar.gz plate_number2.tar.gz ...
For extracting and converting the image format we used the “bfconvert” function of the bftools command-line tool.
This is done either directly from the command line or through the preprocessing/extract_conversion.py script using the subprocess library.
The following command should be run over the images after bfconvert to normalize them to 8-bit format,
ls *.png | while read file; do convert "$file" -auto-level -depth 8 -define quantum:format=unsigned -type grayscale "$file"; done
or through extract_conversion.py script.
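What the auto-level step does can be shown with a minimal Python equivalent; this is a sketch assuming 16-bit grayscale input, not the code actually used in extract_conversion.py:

```python
import numpy as np

def to_8bit(image):
    """Linearly rescale an image so its min..max range maps to 0..255,
    mimicking ImageMagick's -auto-level followed by -depth 8."""
    image = image.astype(np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros(image.shape, dtype=np.uint8)  # flat image
    return ((image - lo) / (hi - lo) * 255).round().astype(np.uint8)
```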
In addition, we extracted multiple plates in parallel on the cluster through:
for FILE in *.tar.gz; do
  echo ${FILE} | cut -d '/' -f 3
  sbatch -A project_name -n 1 -t 5:00:00 --wrap="python extract_conversion.py $(echo ${FILE}|cut -d '/' -f 3)"
  sleep 1 # pause to be kind to the scheduler
done
Image list
ls *.png > ../4_filelists/plate_num_names.txt
Run Prediction script
To run the prediction script, we used CPU multi-threaded parallel prediction as well as the GPU, by adding the following parts to the code.
import os
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from multiprocessing import cpu_count

import tensorflow as tf
from keras.backend import clear_session

import utils.model_builder

global model
global graph

# Read the list of image paths for this plate.
with open(path_files_list) as image_list:
    image_names_all = [os.path.join(path_files + sys.argv[1] + '/', f.strip())
                       for f in image_list]

clear_session()
model = utils.model_builder.get_model_3_class(1104, 1104, 1)
model.load_weights('model_14.hdf5')  # load the best of the U-Net models
model._make_predict_function()       # finalize the model for thread-safe prediction
graph = tf.get_default_graph()

# Sequential prediction: use this if you do not want multi-threading.
# test = [prediction_images(image_names_all[i_batch * 128:(i_batch + 1) * 128])
#         for i_batch in range(48)]

# Multi-threaded prediction.
CPU_LIMIT = 128  # the cluster limit
with ThreadPoolExecutor(min(CPU_LIMIT, cpu_count())) as executor:
    futures = [executor.submit(prediction_images,
                               image_names_all[i_batch * 16:(i_batch + 1) * 16])
               for i_batch in range(384)]
    for future in as_completed(futures):
        print('result:', future.result())
We also predicted several plates using only the GPU by running
python run_prediction.py
where run_prediction.py contains:
import subprocess
subprocess.run("python prediction.py MFGTMPcx7_170801050001 ../data/4_filelists/MFGTMPcx7_170801050001_names.txt", shell=True)
subprocess.run("python prediction.py MFGTMPcx7_170801100001 ../data/4_filelists/MFGTMPcx7_170801100001_names.txt", shell=True)
...
Area and Number of Nuclei
After running the prediction script, a “segm” folder is created for every plate, in which the predicted segmentation masks are stored. Running pipeline/plate_script/area_count.py then produces a .csv file for each plate with the following three columns.
Pred_Object | Image_name | Area
---|---|---
0 | MFGTMPcx7_170525180001_A01f00d0.png | 141
1 | MFGTMPcx7_170525180001_A01f00d0.png | 1545
2 | MFGTMPcx7_170525180001_A01f00d0.png | 2179
... | ... | ...
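A table like this can be produced from a binary segmentation mask with a few lines of Python; this is an illustrative sketch using scipy.ndimage.label, and the actual area_count.py may work differently.

```python
import numpy as np
from scipy import ndimage

def object_areas(mask, image_name):
    """Rows of (object id, image name, area in pixels) for one binary mask."""
    labels, n = ndimage.label(mask)                     # connected components
    areas = ndimage.sum(mask, labels, index=range(1, n + 1))
    return [(i, image_name, int(a)) for i, a in enumerate(areas)]
```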
Visualization
The per-object areas are then saved, averaged over all wells of each plate, and linked to the gene that was knocked down in each well. The visualization and numerical results are all in the screen_visualization and results directories.
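The per-well averaging can be sketched with pandas; parsing the well id (e.g. A01 in MFGTMPcx7_170525180001_A01f00d0.png) out of the file name is our assumption based on the naming shown above, not necessarily how the visualization scripts do it.

```python
import pandas as pd

def mean_area_per_well(df):
    """Average object area per well, given the per-object table above.
    The well id (e.g. 'A01') is extracted from the image file name."""
    df = df.copy()
    df["Well"] = df["Image_name"].str.extract(r"_([A-P]\d{2})f")
    return df.groupby("Well")["Area"].mean()
```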