# Full pipeline

After producing the annotations published in [Nuclei](https://www.sciencedirect.com/science/article/pii/S2352340922009726) and [Cytosol](https://www.sciencedirect.com/science/article/pii/S2352340924011107), we trained U-Net and HoVer-Net models to segment the nuclei channel and compared the three best models from each architecture. In the end we used the best U-Net model, which was trained on our own annotated images together with some images from the [BBBC](https://bbbc.broadinstitute.org/) image set. On the test set, this model reached an F1-score of 85% at a 90% overlap threshold and an average Jaccard index of 88%.

The experiment was conducted in two oxidative stress groups (A, B) and two non-oxidative stress groups (C, D). Each group has one repetition, and all data are saved in 79 groups covering almost 18000 genes. The status of each group of raw data, and how we proceeded with its analysis, is summarized in the `progress_log/plateProgress_done.xlsx` file. Some plates were missing and others existed in several copies, so we had to deal with missing and repeated data.

The overall process for each plate is as follows:

1) Download the plate from Swestore.
2) Run the `extract_conversion.py` script over the plates to extract only the d0 channel.
3) Normalize the images to 8-bit using a bash command.
4) Save the image names in the `4_filelists` folder, then remove the `.C01` files from the folder and keep only the `.png` ones (6144 images in each full plate).
5) Copy the images and run the prediction model over them.
6) Run the `area_size.py` script over them and copy the files to A, B, C, or D.
7) Run the plot and visualization scripts over all of them.

## Download and Extract

To download plates from Swestore, we used:

```bash
lftp https://username@webdav.swestore.se/snic/folder/
```

To download whole plates, we used:

```bash
get plate_number1.tar.gz plate_number2.tar.gz ...
```

To extract the images and convert their format, we used the `bfconvert` tool from the [bftools](https://docs.openmicroscopy.org/bio-formats/5.7.1/users/comlinetools/index.html) command line tools. This is done either directly from the command line or through the `preprocessing/extract_conversion.py` script, which uses the `subprocess` library. After `bfconvert`, the following command should be run over the images to normalize them to 8-bit format:

```bash
# Pipe the file list into the loop and overwrite each image in place.
ls *.png | while read -r file; do convert "$file" -auto-level -depth 8 -define quantum:format=unsigned -type grayscale "$file"; done
```

Alternatively, the normalization can be done through the `extract_conversion.py` script.

We also extracted multiple plates at once through:

```bash
for FILE in *.tar.gz; do
    echo ${FILE} | cut -d '/' -f 3
    sbatch -A project_name -n 1 -t 5:00:00 --wrap="python extract_conversion.py $(echo ${FILE}|cut -d '/' -f 3)"
    sleep 1 # pause to be kind to the scheduler
done
```

## Image list

```bash
ls *.png > ../4_filelists/plate_num_names.txt
```

## Run Prediction script

To run the prediction script, we used multi-threaded parallel prediction on CPU, as well as GPU prediction, by adding the following parts to the code.
```python
# Fragment added to the prediction script. Names such as path_files,
# path_files_list, utils, clear_session, tf and prediction_images are
# defined elsewhere in that script.
import os
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from multiprocessing import cpu_count

global model
global graph

# Build the full path of every image listed for the plate given in sys.argv[1].
with open(path_files_list) as image_list:
    image_names_all = [os.path.join(path_files + sys.argv[1] + '/', f.strip()) for f in image_list]

clear_session()
model = utils.model_builder.get_model_3_class(1104, 1104, 1)
model.load_weights('model_14.hdf5')  # load the best U-Net model
model._make_predict_function()
tf.Graph()
graph = tf.get_default_graph()

# Use this if you do not want multi-threading: 48 batches of 128 images (6144 in total).
test = [prediction_images(image_names_all[i_batch*128:(i_batch+1)*128]) for i_batch in range(48)]

# Use this if you want multi-threading: 384 batches of 16 images (6144 in total).
CPU_LIMIT = 128  # this is the cluster limit
with ThreadPoolExecutor(min(CPU_LIMIT, cpu_count())) as executor:
    print('before submit')
    futures = [executor.submit(prediction_images, image_names_all[i_batch*16:(i_batch+1)*16])
               for i_batch in range(384)]
    for future in as_completed(futures):
        res = future.result()
        print('result: ', res)
```

We also predicted several plates using only the GPU by running

```bash
python run_prediction.py
```

where `run_prediction.py` contains

```python
import subprocess

subprocess.run("python prediction.py MFGTMPcx7_170801050001 ../data/4_filelists/MFGTMPcx7_170801050001_names.txt", shell=True)
subprocess.run("python prediction.py MFGTMPcx7_170801100001 ../data/4_filelists/MFGTMPcx7_170801100001_names.txt", shell=True)
...
```

## Area and Number of Nuclei

After running the prediction script, a `segm` folder is created for every plate, where the predicted segmentation masks are stored. Running `pipeline/plate_script/area_count.py` then produces a `.csv` file for each plate with the following three columns.

| Pred_Object | Image_name                              | Area   |
|-------------|-----------------------------------------|--------|
| 0           | MFGTMPcx7_170525180001_A01f00d0.png     | 141    |
| 1           | MFGTMPcx7_170525180001_A01f00d0.png     | 1545   |
| 2           | MFGTMPcx7_170525180001_A01f00d0.png     | 2179   |
| -           | -                                       | -      |

## Visualization

These values were then saved and averaged for every well of each plate, and each well was connected to the gene that was knocked down in it. The visualizations and numerical results are all in the `screen_visualization` and `results` directories.
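
The per-well averaging described above could look roughly like the following sketch. It is not the code used in the repository and rests on several assumptions: the `.csv` header matches the column names in the table, the well ID can be read from the image name as in `MFGTMPcx7_170525180001_A01f00d0.png` (well `A01`), and the function `summarize_wells` and the `well_to_gene` lookup are hypothetical names introduced only for illustration.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed naming scheme, inferred from the example rows:
# <plate>_<well>f<field>d<channel>.png
WELL_PATTERN = re.compile(r"_([A-Z]\d{2})f\d+d\d+\.png$")


def summarize_wells(csv_file):
    """Return {well: (nuclei_count, mean_area)} from a per-plate area table."""
    areas = defaultdict(list)
    for row in csv.DictReader(csv_file):
        match = WELL_PATTERN.search(row["Image_name"])
        if match is None:
            continue  # skip names that do not follow the assumed pattern
        areas[match.group(1)].append(float(row["Area"]))
    return {well: (len(a), sum(a) / len(a)) for well, a in areas.items()}


# Tiny in-memory example built from the three rows shown in the table.
example = io.StringIO(
    "Pred_Object,Image_name,Area\n"
    "0,MFGTMPcx7_170525180001_A01f00d0.png,141\n"
    "1,MFGTMPcx7_170525180001_A01f00d0.png,1545\n"
    "2,MFGTMPcx7_170525180001_A01f00d0.png,2179\n"
)
summary = summarize_wells(example)

# Hypothetical well-to-gene lookup; the real mapping comes from the plate layout.
well_to_gene = {"A01": "GENE_X"}
for well, (count, mean_area) in summary.items():
    print(well_to_gene.get(well, "unknown"), well, count, round(mean_area, 1))
```

For a real plate, the in-memory example would be replaced by an opened `.csv` file, and the lookup by the actual plate layout.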