A deep learning-based toolkit for 3D nuclei segmentation and quantitative analysis in cellular and tissue context

February 2024

Athul Vijayan, Tejasvinee Atul Mody, Qin Yu, Adrian Wolny, Lorenzo Cerrone, Anna Kreshuk, Kay Schneitz (kay.schneitz@tum.de), Fred A. Hamprecht, Richard S. Smith, Soeren Strauss

Affiliations:
Plant Developmental Biology, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
European Molecular Biology Laboratory, Heidelberg, Germany
Collaboration for joint PhD degree between European Molecular Biology Laboratory and Heidelberg University, Faculty of Biosciences, Heidelberg, Germany
IWR, Heidelberg University, Heidelberg, Germany
Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, Germany
The John Innes Centre, Norwich Research Park, Norwich, UK

Present address Plant Developmental Biology


4,5 Authors contributed equally

Corresponding author: Plant Developmental Biology, TUM School of Life Sciences, Technical University of Munich, Emil-Ramann-Str. 4, D-85354 Freising. Tel: +49 8161 715438

Running title: 3D segmentation of nuclei

Keywords: Arabidopsis, ovule

Summary Statement

We present computational tools that allow versatile and accurate 3D nuclear segmentation in plant organs, enable the analysis of cell-nucleus geometric relationships, and improve the accuracy of 3D cell segmentation.

Abstract

We present a new set of computational tools that enable accurate and widely applicable 3D segmentation of nuclei in various 3D digital organs. We developed a novel approach for ground truth generation and iterative training of 3D nuclear segmentation models, which we applied to the popular Cellpose, PlantSeg, and StarDist algorithms. We provide two high-quality models trained on plant nuclei that enable 3D segmentation of nuclei in datasets obtained from fixed or live samples, acquired from different plant and animal tissues, and stained with various nuclear stains or fluorescent protein-based nuclear reporters. We also share a diverse high-quality training dataset of about 10,000 nuclei. Furthermore, we advanced the MorphoGraphX analysis and visualization software by, among other things, providing a method for linking 3D segmented nuclei to their surrounding cells in 3D digital organs. We found that the nuclear-to-cell volume ratio varies between different ovule tissues and during the development of a tissue. Finally, we extended the PlantSeg 3D segmentation pipeline with a proofreading script that uses 3D segmented nuclei as seeds to correct cell segmentation errors in difficult-to-segment tissues.

Introduction

Tissue morphogenesis is a complex, multi-scale process that ultimately results in an organ or tissue of a specific size and shape and characteristic 3D cellular architecture. Advances in imaging increasingly allow generation of 3D digital organs with cellular resolution, which are useful tools for unraveling the integration and feedback processes between molecular regulatory circuits and the cellular architecture of developing tissues and organs. Plants are excellent systems for generating 3D digital organs because their cells are immobile and the cellular architecture of plant organs can be easily observed using various types of microscopy.

Over the years, and partly through the application of artificial intelligence, powerful open-source software packages have been developed for 3D cell segmentation of confocal microscopy images (Barbier de Reuille et al., 2015; Eschweiler et al., 2019; Fernandez et al., 2010; Schmidt et al., 2014; Sommer et al., 2011; Stegmaier et al., 2016). Machine learning-based software, including Cellpose, PlantSeg, and StarDist, represents a recent advance in this area, providing improved 3D segmentation of tissues at cellular resolution (Eschweiler et al., 2019; Stringer et al., 2021; Weigert et al., 2020; Wolny et al., 2020). The output of such pipelines can then be quantitatively analyzed in image analysis software like MorphoGraphX (Barbier de Reuille et al., 2015; Strauss et al., 2022). The advances in these computational resources have enabled the generation of digital 3D models of a variety of plant organs, which have allowed single-cell analysis in 3D and have been instrumental in gaining fundamental insights into various processes in plants, including embryo, root, and ovule development (Bassel et al., 2014; Fridman et al., 2021; Graeff et al., 2021; Hernandez-Lagana et al., 2021; Lora et al., 2017; Montenegro-Johnson et al., 2015; Ouedraogo et al., 2023; Pasternak et al., 2017; Schmidt et al., 2014; Vijayan et al., 2021; Yoshida et al., 2014).

An important feature that is presently missing from these 3D digital models is the integration of the size and shape of the nuclei into the cellular framework. The ability not only to robustly segment nuclei in 3D, even in deeper tissues, but also to link the 3D architectures of nuclei and their surrounding cells in a tissue-specific context enables the study of central biological processes such as nuclear size control (Cantwell and Nurse, 2019c). Another key process is the control of gene expression. Spatial gene expression patterns as well as expression levels can be assessed with cellular resolution, for example, using ratiometric nuclear reporters driven by gene-specific promoters (Federici et al., 2012).

ClearSee-based protocols for cleared whole-mount preparations of plant organs allow staining of cell walls and nuclei with various cytological dyes without the need for transgenic plants carrying the appropriate reporter constructs, and they maintain compatibility with reporters based on fluorescent proteins (Kurihara et al., 2015; Musielak et al., 2015; Tofanelli et al., 2019; Ursache et al., 2018). The establishment of the 3D digital reference atlas of Arabidopsis ovule development represents a recent example that used this approach (Vijayan et al., 2021). During the preparation of the atlas, ovules were fixed and cleared with ClearSee (Kurihara et al., 2015). Cell outlines were stained with the cell wall stain SCRI Renaissance (SR2200) (Harris et al., 2002; Musielak et al., 2015), while the nuclei were stained with TO-PRO-3 (Bink et al., 2001; Van Hooijdonk et al., 1994). The digital ovule atlas provided detailed insight into the 3D cellular architecture of the ovule but lacked information on the size and shape of the nuclei. TO-PRO-3 stains double-stranded nucleic acids and can therefore be a useful tool for 3D volumetric nuclear extraction. However, the signal of any typical nuclear stain can show variable intensity, scatter, and photobleaching when imaging deeper tissue layers, rendering accurate 3D nuclear segmentation extremely difficult.

Therefore, our overall goal is to accurately segment plant nuclei in 3D images with weakly stained nuclei. Several deep learning-based segmentation algorithms have recently been proposed for this task: PlantSeg (Wolny et al., 2020), Cellpose (Stringer et al., 2021), and StarDist (Weigert et al., 2020). However, none of them can be used out of the box. The pre-trained PlantSeg and Cellpose models have not been exposed to weakly stained plant nuclei, while 3D StarDist does not provide trained models and requires retraining. The main bottleneck for model training is the lack of publicly available 3D ground truth with correctly delineated nuclei. Generating such ground truth is famously labor-intensive even for high-contrast image volumes with a high signal-to-noise ratio (SNR).

In this study, we combine different staining strategies to quickly achieve 3D segmentation ground truth for model training. Together with human-in-the-loop correction, we use this approach to acquire fully annotated volumes of weakly stained nuclei. On this basis, we train highly accurate segmentation networks, which we show to be generalizable to other datasets obtained by various imaging methods and from a variety of plant and animal tissues labeled with different staining methods. In addition, we introduce a combination of processes in MorphoGraphX that associates each nucleus with the cell in which it resides, and that provides the nucleus with the respective cell's tissue label. It allows the investigation of various cell-nucleus relationships, such as the nucleus-to-cell volume (N/C) ratio. We demonstrate the general value and broad applicability of these technical advances in proof-of-concept analyses.

Results

A novel iterative approach to ground truth generation and 3D nuclear model training

In a first attempt at 3D nuclear segmentation of TO-PRO-3-stained ovule nuclei in 3D image stacks, we found that the available plant nuclei segmentation model in PlantSeg did not yield segmented nuclei of sufficient quality for ground truth generation. Thus, we employed Cellpose (Pachitariu and Stringer, 2022; Stringer et al., 2021), as it had an existing nuclei model used for 3D nuclear segmentation. However, we still observed improper segmentation with errors in detecting and separating nuclear borders (Fig. 1A-D). This is probably due to the TO-PRO-3 nuclear staining being variable and often quite weak and diffuse, particularly in deeper layers. In addition, the signal was absent in the nucleolus, resulting in an uneven nuclear surface and a segmentation that looked like a hole extruded from the nuclear surface (Fig. 1C).

To address these issues, we developed a novel strategy based on samples that simultaneously show strong and faint signals in the nuclei that can be collected in separate channels. We first generated a transgenic line expressing a translational fusion of the fluorescent protein tdTomato to histone H2B driven by the UBIQUITIN10 (UBQ) promoter (pUBQ::H2B:tdTomato). Ovules of this reporter line were fixed, cleared, and stained with the cell wall stain SR2200 and the nuclear stain TO-PRO-3. Ovules were imaged and the SR2200, TO-PRO-3, and H2B:tdTomato signals were collected in three separate channels (Fig. 1F,G,I). The broadly expressed nuclear pUBQ::H2B:tdTomato reporter provided a strong and uniform nuclear signal that could be segmented into nuclei using the standard Cellpose nuclear model (Fig. 1D,E,G,H).

We then used the human-proofread instance segmentation of nuclei in the strong H2B:tdTomato reporter channel as the “initial ground truth” for training three sets of initial 3D models: PlantSeg_3Dnuc_initial, StarDist-ResNet_3Dnuc_initial, and Cellpose-Finetune-nuclei_3Dnuc_initial. The PlantSeg and StarDist initial models were trained on the weak TO-PRO-3 nuclear stain channel using the neural networks implemented in the respective pipelines. The Cellpose initial models were trained on the TO-PRO-3 channel by fine-tuning the pretrained Cellpose “nuclei” model. The segmentation results obtained with the initial models were still imperfect and required several corrections by an expert.

To obtain further model improvements, we applied an iterative training strategy (Fig. 1J). We used the StarDist-ResNet_3Dnuc_initial model, which provided the best qualitative results, to segment the original weak TO-PRO-3-based nuclear stain channel, resulting in a modified ground truth. This modified ground truth was then human proofread, resulting in the “gold ground truth”. In a next step, the “gold ground truth” and the original weak TO-PRO-3-based nuclear stain were used to train six sets of 3D “gold models” using one or multiple neural networks implemented in PlantSeg, Cellpose, and StarDist (Table 2), probing for the best parameter settings. In other words, we used the imperfect initial models to generate a modified and improved ground truth through human-in-the-loop (HITL) proofreading before using it for the training that resulted in the “gold models” (Fig. 1J).

We tested how much model performance improved when HITL was involved, i.e., initial vs gold model. To this end, we quantitatively compared the initial and gold PlantSeg, StarDist-ResNet, and Cellpose-Finetune-nuclei models. A detailed description of model training, including the datasets used for training and testing, is provided in the “Model training and score quantification” section of Materials and Methods. Model performance was compared by 5-fold average precision (AP) score quantification (Table 1). The results indicate that all methods show increased performance after gold training. The PlantSeg and StarDist-ResNet gold models turned out to be superior to the Cellpose-Finetune-nuclei gold models and achieved high-precision segmentation compared to the respective initial models.
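As an illustration of how an AP score can be obtained for a pair of label volumes, the following minimal sketch uses a definition that is common in the instance segmentation literature, AP = TP / (TP + FP + FN) at a fixed intersection-over-union (IoU) threshold. The exact definition, thresholds, and 5-fold procedure used in this study are those given in Materials and Methods; the function name `average_precision`, the greedy matching, and the default threshold of 0.5 are assumptions made for this example only.

```python
import numpy as np

def average_precision(gt, pred, iou_threshold=0.5):
    """AP = TP / (TP + FP + FN) for two integer label volumes (0 = background)."""
    gt_ids = np.unique(gt[gt > 0])
    pred_ids = np.unique(pred[pred > 0])
    if len(gt_ids) == 0 and len(pred_ids) == 0:
        return 1.0  # nothing to find and nothing predicted

    # Pairwise IoU between every ground-truth and every predicted instance.
    iou = np.zeros((len(gt_ids), len(pred_ids)))
    for i, g in enumerate(gt_ids):
        g_mask = gt == g
        for j, p in enumerate(pred_ids):
            p_mask = pred == p
            inter = np.logical_and(g_mask, p_mask).sum()
            if inter:
                iou[i, j] = inter / np.logical_or(g_mask, p_mask).sum()

    # Greedy one-to-one matching, best overlaps first (an assumption of this sketch).
    tp, used_gt, used_pred = 0, set(), set()
    for i, j in sorted(np.ndindex(iou.shape), key=lambda ij: -iou[ij]):
        if iou[i, j] < iou_threshold:
            break
        if i not in used_gt and j not in used_pred:
            used_gt.add(i)
            used_pred.add(j)
            tp += 1

    fp = len(pred_ids) - tp  # predicted nuclei without a ground-truth partner
    fn = len(gt_ids) - tp    # ground-truth nuclei that were missed
    return tp / (tp + fp + fn)
```

The pairwise loop is quadratic in the number of nuclei and is meant for small test volumes; for whole organs, a contingency table of overlapping label pairs would be the more practical route.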

Comparisons of the different gold models

Quantitative and qualitative performance comparisons of the different gold models were performed, and the results are presented in Table 2, Fig. 2, and Fig. S1. With the exception of the Cellpose-derived models, all gold models performed excellently on the raw images of nuclear stains, as can be seen by qualitative comparison (Fig. 2, Fig. S1). The weak nuclear signals were reliably detected, especially with the new PlantSeg_3Dnuc_gold, StarDist-ResNet_3Dnuc_gold, and StarDist-UNet_3Dnuc_gold models. Segmented nuclear surfaces were devoid of artifacts such as the extruded hole seen in the segmentation of the raw nuclei images prior to developing this method. The AP scores obtained with these models were very high when compared to the new Cellpose nuclei gold models (Table 2). Average precision graphs also clearly indicate the high precision of the PlantSeg, StarDist-ResNet, and StarDist-UNet gold models and how little they vary compared to the Cellpose gold models (Fig. S1I-N).

StarDist-ResNet and PlantSeg gold models are two highly reliable models

The PlantSeg model was trained to produce a nuclear center probability map and a nuclear envelope probability map (Fig. 2E). The nuclear envelope probability map was processed by the Generalized Algorithm for Signed Graph Partitioning (GASP) (Bailoni et al., 2019) to obtain an initial instance segmentation, which was then filtered according to the nuclear center probability (Fig. 2D,F). Data volumes for both training and inference do not need any changes in terms of isotropy or intensity and can be fed into PlantSeg as they are. Increasing the patch size does not improve accuracy. The downside of PlantSeg is that the post-processing algorithms were designed for dense segmentation and therefore tend to over-segment the background, which can be easily fixed by applying a foreground mask or even manually. PlantSeg assigns very accurate instance masks to most objects because it finds the boundaries of the biological structure of interest and provides a nuclear envelope probability map (Fig. 2D-F). The minor imperfections caused by GASP and the final thresholding in PlantSeg segmentation can be very easily corrected by removing a few false positives and relabeling a few false negatives.
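The filtering step described above can be pictured with the following minimal sketch. The source states only that the initial instances are filtered according to the nuclear center probability; using the mean probability per instance and a cutoff of 0.5 are assumptions of this example, as is the hypothetical function name `filter_by_center_probability`. It is not the PlantSeg implementation.

```python
import numpy as np

def filter_by_center_probability(instances, center_prob, min_prob=0.5):
    """Set to background all instances whose mean center probability is below min_prob.

    instances   : non-negative integer label volume (0 = background)
    center_prob : float volume of the same shape, values in [0, 1]
    """
    n_labels = int(instances.max()) + 1
    flat = instances.ravel()
    # Voxel count and summed probability per label, computed in one pass each.
    voxel_count = np.bincount(flat, minlength=n_labels)
    prob_sum = np.bincount(flat, weights=center_prob.ravel(), minlength=n_labels)
    mean_prob = np.divide(prob_sum, voxel_count,
                          out=np.zeros(n_labels), where=voxel_count > 0)
    keep = mean_prob >= min_prob   # assumed criterion: mean probability per instance
    keep[0] = False                # label 0 always remains background
    return np.where(keep[instances], instances, 0)
```

Because over-segmented background regions have a low center probability, a filter of this kind also removes most of them, which is one way to read the foreground-mask remark above.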

The StarDist-ResNet and StarDist-UNet models output a nuclei probability map (Fig. 2H,K) and a nuclei instance segmentation (Fig. 2G,I,J,L). Both StarDist models produced very smooth and uniform instance masks for all objects because StarDist fits star-convex shapes to objects (Fig. 2G-L). StarDist is sensitive to object shapes; elongated objects are predicted accurately in its probability maps but are then sometimes fitted with small and wrong instance masks. The segmentation always looked clean and smooth. The isotropy of data volumes matters; one can specify a grid parameter that downsamples the input to fit instances into the network’s field of view. A bigger patch size can help in terms of object detection but not mean average precision. The imperfections caused by the size and shape prior in StarDist segmentation can be corrected by merging a few oversegmented instances.

For Cellpose, we fine-tuned two pretrained models (Nuclei, Cyto2) and, in addition, trained a new model from scratch (Fig. S1). Due to the 2D nature of Cellpose, it is recommended that data for either training or future inference be transformed into isotropic volumes for best results. Cellpose is very sensitive to its diameter parameter. In this study, the fixed default object diameter parameter for pretrained models was set to 30 for non-nucleus models and 17 for nucleus models, while that for scratch-trained models was inferred from our data. Cellpose produces good instance masks (Fig. S1) but overall less accurate segmentations than the proposed StarDist and PlantSeg models (Table 2). While the final Cellpose output turned out to be worse than that of StarDist and PlantSeg even after retraining, it is important to remember that Cellpose was the best method (Table S2) for providing a starting point in the absence of human-annotated ground truth in the first step of our experiments.
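A minimal sketch of the isotropy transformation mentioned above, assuming the voxel size of the stack is known: each axis is resampled to the finest voxel size present. The function name `make_isotropic` and the voxel sizes in the usage note are hypothetical; `scipy.ndimage.zoom` is used for the interpolation.

```python
import numpy as np
from scipy import ndimage

def make_isotropic(volume, voxel_size_zyx, order=1):
    """Resample a 3D stack so that all axes share the smallest voxel size.

    voxel_size_zyx : physical voxel size per axis, e.g. in micrometers
    order          : spline order; use 0 (nearest neighbor) for label images
                     so that instance IDs are not blended
    """
    target = min(voxel_size_zyx)
    zoom_factors = [size / target for size in voxel_size_zyx]
    return ndimage.zoom(volume, zoom_factors, order=order)
```

For example, a stack of shape (4, 8, 8) with a hypothetical voxel size of (0.5, 0.25, 0.25) µm is upsampled twofold along z and becomes (8, 8, 8).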

Wide applicability of the PlantSeg_3Dnuc and StarDist-ResNet platinum models

So far, the results indicated that PlantSeg_3Dnuc_gold and StarDist-ResNet_3Dnuc_gold emerged as the preferred models for accurately segmenting 3D plant nuclei. Therefore, we trained two final platinum models based on PlantSeg and StarDist-ResNet, respectively, using all available training datasets (Fig. S2). This resulted in the two 3D platinum models, PlantSeg_3Dnuc_platinum and StarDist-ResNet_3Dnuc_platinum. For nuclei segmentation using the two platinum models, we made available the GoNuclear repository (https://github.com/kreshuklab/go-nuclear) that hosts the pipelines used in this study.

To test the broad applicability of the trained platinum models in 3D nuclear segmentation, we used both platinum models to segment nuclei from diverse and challenging datasets, including a variety of tissues from different plant species as well as early mouse embryos, stained with nuclear stains or expressing nuclear reporters. Our diverse 3D nuclei datasets included a fixed, cleared, TO-PRO-3-stained Antirrhinum majus ovule; a fixed, cleared, DAPI-stained Arabidopsis thaliana ovule; live Arabidopsis sepal nuclei expressing the pATML1::mCitrine-ATML1 reporter (Meyer et al., 2017); live Cardamine hirsuta leaf expressing the ChCUC2g::VENUS reporter (Rast-Somssich et al., 2015); and fixed and cleared Arabidopsis shoot apical meristem nuclei expressing the pFD:3xHA-mCHERRY-FD reporter (Cerise et al., 2023; Martignago et al., 2023). In addition, we segmented nuclei of the BlastoSPIM data set obtained by live 3D imaging of blastocyst stage mouse embryos expressing the nuclear marker H2B-miRFP720 using Selective Plane Illumination Microscopy (SPIM) (Nunley et al., 2023).

Both the PlantSeg_3Dnuc_platinum and StarDist-ResNet_3Dnuc_platinum models resulted in comparable high quality segmentations. The results of segmentation using StarDist-ResNet_3Dnuc_platinum are presented here, as its use is less involved compared to PlantSeg_3Dnuc_platinum (Fig. 3, Fig. S3).

We segmented the above-mentioned datasets using the StarDist-ResNet and PlantSeg platinum models after image preprocessing (Table 3). The preprocessing was required to ensure that the datasets to be segmented matched the training datasets in nuclear size and quality. We observed that the nuclei of all mentioned datasets could be properly segmented in 3D using the proposed models (Fig. 3). Further, even though the models were trained on cleared, high-resolution datasets, they are capable of segmenting nuclei from low-resolution datasets as well, for instance the Cardamine leaf nuclei and mouse embryo nuclei from live samples. A precise segmentation of the pChCUC2g::Venus nuclear signal further allows counting the pChCUC2g::Venus-expressing nuclei and, if required, quantifying their signal. The StarDist-ResNet platinum model could also segment extremely challenging datasets with high variation in intensities after some preprocessing (Fig. S3). The results demonstrate the broad applicability of the platinum models in 3D segmentation of nuclei of different tissues and species.

MorphoGraphX as a platform for mapping 3D nuclei to whole organ cell atlas with single cell and tissue resolution

Multichannel 3D confocal imaging allowed simultaneous imaging of both the cell and nuclear stain channels. MorphoGraphX enables 3D visualization and allows complex annotations and quantifications (Fig. S2F-I, Fig. S3H). We reasoned that it should be possible to combine 3D cell segmentation and 3D nuclear segmentation of the imaged 3D stack. 3D cell segmentation assigns cells their cell IDs and 3D nuclear segmentation assigns nuclei their nuclei IDs; however, they are not directly linked. In MorphoGraphX, these 3D cell and nuclei segmentation images are converted to 3D meshes representing individual objects. To address the issue of linking nuclei and corresponding cell IDs, we developed a novel process in MorphoGraphX that automatically annotates and links nuclei IDs with their corresponding cell IDs (Fig. 4A-H) (see Materials and Methods). The 3D cell meshes can then be assigned tissue labels via manual or semi-automated cell-type labeling (Strauss et al., 2022) (Fig. 4A,B).
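The idea of linking the two sets of IDs can be sketched on voxel label images as follows. This is not the MorphoGraphX process, which operates on 3D meshes and is described in Materials and Methods; here each nucleus is simply assigned to the cell that contains most of its voxels. The function name `link_nuclei_to_cells` and the majority-overlap rule are assumptions of this example.

```python
import numpy as np

def link_nuclei_to_cells(cells, nuclei):
    """Return {nucleus_id: cell_id}, choosing the cell with the largest voxel overlap.

    cells, nuclei : integer label volumes of identical shape (0 = background)
    """
    inside = (nuclei > 0) & (cells > 0)
    if not inside.any():
        return {}
    # Count voxels for every (nucleus, cell) pair that overlaps.
    pairs, counts = np.unique(
        np.stack([nuclei[inside], cells[inside]], axis=1),
        axis=0, return_counts=True)
    parent, best = {}, {}
    for (nuc, cell), count in zip(pairs, counts):
        nuc, cell = int(nuc), int(cell)
        if count > best.get(nuc, 0):
            parent[nuc], best[nuc] = cell, int(count)
    return parent
```

Once such a mapping exists, a nucleus can inherit the tissue label of its cell with a plain lookup, for example `{n: cell_tissue[c] for n, c in parent.items()}`, where `cell_tissue` is a hypothetical dictionary of cell-type labels.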

In addition to linking nuclear and cell IDs, we also added MorphoGraphX tools to quantify the Euclidean distance between 3D cell and 3D nuclear centroids or to map 2.5D cells to underlying 3D nuclei (Fig. S4) (Materials and Methods). 3D cell segmentation can be challenging, especially when working with live images. In such cases, one may have to resort to 2.5D cell segmentation. We present a MorphoGraphX method for associating 2.5D surface cells with 3D nuclei. MorphoGraphX achieves this link by projecting 3D segmented nuclei stacks onto the 2.5D segmented cell mesh (Fig. S4A-C). Additionally, the process “Select Duplicated Nuclei” is a useful tool for identifying cell segmentation errors, as it detects cells with more than one nucleus. This entire collection of processes is included in MorphoGraphX version 2.0.2 and higher and can be found in the process folder “Mesh/Nucleus” (see Materials and Methods). The development of these new MorphoGraphX processes opens up new possibilities to integrate cell features with nuclei features and to study quantitative parameters of nuclei in their cellular context.

Developmental regulation of the nucleus-to-cell volume ratio in Arabidopsis ovules

For more than a century, it has been noted that the N/C ratio is a constant parameter of a given cell type that can vary between cell types in multicellular organisms (Cantwell and Nurse, 2019c; Wilson, 1925). Most of these studies involved selecting a few cells of embryos or single cells, such as yeast, and measurements based on diameter or area values derived from 2D sections. Here, we investigated the N/C ratio in Arabidopsis ovules of different stages and in full 3D tissue context. We measured the nuclear volumes, cell volumes, N/C ratios, and their trends in five stage 2-I ovule primordia and in two more differentiated stage 3-II ovules. In addition, we assessed these parameters during the development of an integumentary cell layer using two ovules per stage (Fig. 4I-N) (Fig. S5).

The dome-shaped Arabidopsis ovule primordium, like the shoot apical meristem, has a layered organization, such that the L1, L2, and L3 are the outer to inner layers, respectively (Jenik and Irish, 2000; Satina et al., 1940; Schneitz et al., 1995). At stage 2-I the primordium is further characterized by the presence of an enlarged L2-derived megaspore mother cell (MMC) at the tip that will undergo meiosis and eventually produce the haploid female gametophyte (Schneitz et al., 1995; Vijayan et al., 2021). We investigated whether nuclear and cell volumes, as well as N/C ratios, differ in a layer-specific manner in the ovule primordium. We observed that the L1 layer can be distinguished from the L2 and L3 layers by its different N/C ratio, as the N/C ratio of L1 cells was statistically different from the N/C ratio of L2 or L3 cells. The L2 and L3 N/C ratios were not noticeably different (Fig. 4I, Fig. S5A,B). Cells of the outermost L1 layer have the highest N/C ratio (0.30 ± 0.08, mean ± SD), followed by the cells of the inner L2 (0.24 ± 0.07) and L3 (0.23 ± 0.09) layers (Fig. 4I). For all three layers, we obtained a positive Pearson correlation coefficient, r, between nuclear and cell volumes; the correlation is strongest in the L2 layer, followed by the L1 and L3 layers, respectively (Fig. 4J-L). When analyzing the average cell and nuclear volumes for each layer, we found that the average cell volumes of the L2 (128.10 ± 47.73 µm³, excluding MMCs) and L3 (132.70 ± 54.53 µm³) layers were similar and markedly larger than the average cell volume of the L1 (98.86 ± 38.06 µm³) layer (Fig. S5A). In contrast, the average nuclear volume remained comparable between the three cell layers, with values of 27.88 ± 9.14 µm³ (L1), 28.92 ± 8.87 µm³ (L2, excluding MMCs), and 27.44 ± 10.53 µm³ (L3) (Fig. S5B). Thus, the difference in the N/C values between the L1 and L2/L3 layers relates to the smaller average cell volume in the L1 compared to the L2 and L3 layers.
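The summary statistics reported for each layer can be computed from paired per-cell measurements as in the following minimal sketch. The function name `nc_statistics` is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text specifies only mean ± SD.

```python
import numpy as np

def nc_statistics(nuclear_volumes, cell_volumes):
    """Return (mean N/C ratio, SD of N/C ratio, Pearson r) for paired volumes."""
    nuc = np.asarray(nuclear_volumes, dtype=float)
    cell = np.asarray(cell_volumes, dtype=float)
    ratios = nuc / cell                    # one N/C ratio per cell
    r = np.corrcoef(nuc, cell)[0, 1]       # Pearson correlation of the two volumes
    return ratios.mean(), ratios.std(ddof=1), r  # ddof=1: sample SD (assumed)
```

Note that the mean of the per-cell ratios is generally not the same as the ratio of the mean nuclear volume to the mean cell volume, so the two should not be compared directly.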

Current evidence suggests that nuclear size scales with cell size and not with the amount of nuclear DNA (Cantwell and Nurse, 2019c). We tested whether this scaling rule holds true for the MMCs (Fig. S5C,D). We found that the average nuclear and cell volumes of the tested MMCs (147.9 ± 27.85 µm³ for the nuclear volume and 845.4 ± 101.5 µm³ for the cell volume) both exceeded the respective values of the other, much smaller L2 cells by approximately a factor of 5. As a result, the N/C ratio values of the MMCs and the other L2 cells were indistinguishable, and thus the MMCs conform to this rule.

To confirm the finding of cell type-specific N/C ratios in the ovule primordium, we explored the more differentiated stage 3-II ovules, which exhibit a clear multi-tissue organization. By this stage the Arabidopsis ovule is composed of the distal nucellus, which contains the developing female gametophyte; the central chalaza with two lateral determinate structures, the integuments; and the proximal funiculus, the stalk that connects the ovule to the placenta (Schneitz et al., 1995; Vijayan et al., 2021). The chalaza can be further divided into an anterior and a posterior chalaza based on morphological criteria such as the different shapes and sizes of its constituent cells. Each integument consists of two cell layers, each one cell thick. The analysis of the average N/C values across different tissues revealed that the nucellus and funiculus exhibited comparable values. In contrast, we found the posterior chalaza to show a higher N/C ratio than the anterior chalaza (Fig. 4M). We also observed that the inner layers of both the outer and inner integuments exhibited a higher N/C ratio than the corresponding outer layers.

To address the question of whether the N/C ratio changes during development of a specific tissue layer, we focused on the outer layer of the inner integument (ii2). We analyzed the ii2 nuclear and cell volumes and the N/C ratios for stages 2-IV, 2-V, 3-I, 3-II, 3-IV, and 3-V. We observed that from stage 2-IV to stage 3-IV, there was a decline in the ii2 N/C ratio (0.29 ± 0.08 towards 0.15 ± 0.07), followed by an increase from stage 3-IV to 3-V (0.14 ± 0.06 versus 0.17 ± 0.07) (Fig. 4N). To assess the basis for this decrease in the N/C ratio during development of the ii2 layer, we analyzed the average nuclear and cell volumes between successive stages (Fig. S5C-D). We found that the average cell volume of ii2 cells increased noticeably, with a value of 129.8 ± 58.06 µm³ at stage 2-IV and 220.50 ± 130.0 µm³ at stage 3-IV (Fig. S5C). In comparison, the average nuclear volume experienced only minor alterations (35.46 ± 13.73 µm³, stage 2-IV; 27.03 ± 10.36 µm³, stage 3-II; 31.93 ± 9.91 µm³, stage 3-V) (Fig. S5C). Thus, we find that the change in the N/C ratio during development of the ii2 cell layer is related to a marked increase in cell volume accompanied by a largely constant nuclear volume. Further estimation of the stagewise Pearson correlation coefficient, r, for ii2 revealed a positive correlation between cell volumes and corresponding nuclear volumes of ii2 across development up to stage 3-IV. By stage 3-V, however, this correlation is noticeably reduced (Fig. S5E-J). In summary, the results suggest that the N/C ratio is specific to a cell type and its developmental stage in the Arabidopsis ovule.

Automatic proofreading of 3D cell segmentation based on reliable 3D nuclear segmentation

Despite the significant improvement in cell boundary prediction provided by the PlantSeg segmentation pipeline, the final image segmentation may still contain some errors in certain regions of the images where cell wall staining is poor. An example is the faint walls around the megaspore mother cell (MMC) in young Arabidopsis ovules (Vijayan et al., 2021) (Fig. 5A-D). From the raw cell wall stain images, it is almost impossible to identify the presence of these faint walls. A similar scenario sometimes applies to cells in the interior chalaza (Fig. 5E-H). The processed (brightened) raw images along with the nuclei stain clearly display the faint wall and the presence of multiple nuclei in this region, confirming the cell segmentation error. We developed a Python script called “proofreading” to automatically correct the 3D cell instance segmentation using a trusted and proofread 3D nuclear segmentation and added it to the growing collection of helper tools for the PlantSeg pipeline (https://github.com/hci-unihd/plant-seg-tools). The script takes the cell boundary prediction, cell segmentation and nuclei segmentation as input images. It automatically finds the erroneous cell segmentation by first quantifying the number of nuclei within a cell. When it finds a cell with more than one nucleus, a bounding box is approximated in 3D around this cell. Further corrections are only made within the bounding box. Corrections are made by resegmenting the erroneous 3D cell using watershed segmentation with nuclei as seeds. The t-merge parameter can be altered to improve the segmentation further if the default value does not seem to improve the result. The method does not apply to a scenario where the segmentation error relates to a missing cell instead of an undersegmented cell. The detailed method is described in the Materials and Methods section.
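The detection step described above (counting nuclei per cell and approximating a 3D bounding box around suspicious cells) can be illustrated with a short NumPy sketch. This is not the code of the proofreading script. The function names are hypothetical, label 0 is assumed to be background, and a nucleus is assigned to the cell that contains most of its voxels, which is an assumption made only for this example.

```python
import numpy as np


def nuclei_per_cell(cells, nuclei):
    """Map each cell label to the set of nucleus labels assigned to it.

    Assumption of this sketch: a nucleus belongs to the cell that contains
    most of its voxels. Label 0 is treated as background in both images.
    """
    assignment = {}
    for nuc_id in np.unique(nuclei):
        if nuc_id == 0:
            continue
        overlapping = cells[nuclei == nuc_id]
        overlapping = overlapping[overlapping != 0]
        if overlapping.size == 0:
            continue  # nucleus lies entirely in background
        ids, counts = np.unique(overlapping, return_counts=True)
        cell_id = int(ids[np.argmax(counts)])
        assignment.setdefault(cell_id, set()).add(int(nuc_id))
    return assignment


def bounding_box(mask):
    """Return a tuple of slices enclosing all True voxels of a 3D mask."""
    coords = np.argwhere(mask)
    lo = coords.min(axis=0)
    hi = coords.max(axis=0) + 1
    return tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))


# Toy label images (z, y, x): two cells, three nuclei.
cells = np.zeros((4, 6, 6), dtype=np.int32)
cells[:, :, :3] = 1
cells[:, :, 3:] = 2
nuclei = np.zeros_like(cells)
nuclei[1:3, 0:2, 0:2] = 10  # inside cell 1
nuclei[1:3, 3:5, 0:2] = 11  # also inside cell 1
nuclei[1:3, 2:4, 4:6] = 12  # inside cell 2

per_cell = nuclei_per_cell(cells, nuclei)
suspect = {c: n for c, n in per_cell.items() if len(n) > 1}
boxes = {c: bounding_box(cells == c) for c in suspect}
```

Here only cell 1 is flagged, and a seeded watershed restricted to `boxes[1]` would be the next step in a full implementation.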

This method now corrects the segmentation error in most cases and leaves other cells without segmentation errors untouched (Fig. 5C,D and G,H). We performed another test by assessing Cardamine parviflora (C. parviflora) ovule primordia (Fig. 5I-L). This species harbors weakly crassinucellate ovule primordia (Endress, 2011), i.e., it develops an additional hypodermal cell layer, with an initial archesporial cell in the L2 undergoing periclinal division resulting in an upper parietal cell and a lower MMC (Harvey and Smith, 2013; Mody et al., 2023). The ability to visualize this is usually lost after standard PlantSeg-based 3D cell segmentation (Fig. 5I,J), but the proofreading script can correct this error (Fig. 5K,L). The proofreading thus minimizes 3D cell segmentation errors and enables the examination of 3D cell volumes for cells that are challenging to segment accurately.

Discussion

We present a collection of computational tools and datasets that extend the capabilities for quantitative analysis of 3D digital organs. We have developed a deep learning-based computational toolkit for 3D nuclear segmentation that enables accurate 3D segmentation of nuclei in a variety of 3D digital organs labeled with a range of nuclear markers or stains, even in faintly stained and noisy images. Importantly, we not only provide a valuable plant nuclear dataset for training 3D nuclear segmentation algorithms but also two accurate platinum models for 3D nuclear segmentation with broad applicability. In addition, we outline novel processes that we have added to MorphoGraphX to enable the analysis of various cell-nucleus geometric parameters in 3D, including the N/C ratio. Finally, we have created a proofreading script that significantly improves the fidelity of 3D cell segmentation. All tools are open source and readily available to the community via public software repositories.

A particular value of the 3D nuclear segmentation toolkit lies in its broad applicability. The method can be successfully used with various nuclear staining methods, ranging from different nuclear stains with variable staining intensities, such as TO-PRO-3 or DAPI, to nuclear reporters based on fluorescent proteins. In addition, nuclei can be segmented in data sets obtained from cleared or live tissue, not only from a range of different plant tissues, but also from animal tissues such as mouse embryos. An optimized workflow from imaging to 3D segmentation of nuclei datasets can be found in the Materials and Methods section.

We used PlantSeg (Wolny et al., 2020), Cellpose (Stringer et al., 2021) and StarDist (Schmidt et al., 2018; Weigert et al., 2020) as three strong baselines for 3D nuclear segmentation and performed a comparative analysis of the performance of the models obtained from each platform. Cellpose was the only tool that provided a pre-trained model which could perform the initial segmentation. However, in the presence of ground truth, it was demonstrated to be less stable, with more variability in the results depending on the training/test split of the data. Re-trained PlantSeg and StarDist both demonstrated excellent, stable performance. The advantage of PlantSeg is its ability to also perform cell segmentation from membrane staining and the absence of an explicit star-convexity prior, which can be harmful for the segmentation of irregular nuclei. However, in very noisy conditions StarDist is preferred, as the shape prior helps it overcome the low SNR. It also needs to be noted that our ground truth annotations are produced through iterative improvement using a StarDist model, so the resulting shapes might be biased towards being more regular and star-convex.

An important feature of MorphoGraphX is the projection of secondary signals onto the cell surfaces, which enables the quantification of nuclei, cell wall or cytoplasmic signal intensities based on the cellular segmentation (Barbier de Reuille et al., 2015; Montenegro-Johnson et al., 2015). What up to now was missing, however, was the integration of the size and shape of the nuclei into the cellular framework. We present [...]

[...] process identifies the cells in which nuclei are located. It is run on the active 3D cell mesh in MorphoGraphX Mesh 1, while the 3D nuclei mesh is loaded in MorphoGraphX Mesh 2. The process assigns cell IDs as “parents” annotation to the nuclei labels, thereby linking cell IDs to nuclei IDs. On the 3D nuclei mesh (active), the “Mesh/Lineage tracking/Save parents” process was used to save the nuclei IDs and their corresponding parent cell IDs in a csv file, followed by the “Mesh/Lineage tracking/Copy parents to labels” process to rewrite the nuclei label IDs to those of the cells. These processes in combination with the “Mesh/Heat map” and “Mesh/Heat map/Operators/Export heat to Attr Map” processes were used to generate csv files containing cell IDs, their corresponding nuclei IDs, parent (tissue) labels, and cell and nuclei geometric attributes.
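The outcome of this linking step, a table of nuclei IDs with their parent cell IDs and geometric attributes, can be mimicked on voxel label images. The sketch below is a voxel-based analogue and not the mesh-based MorphoGraphX implementation. The function name and column names are hypothetical, label 0 is assumed to be background, and the parent cell is taken to be the cell label found at the rounded nucleus centroid, which can fail for strongly non-convex nuclei.

```python
import csv
import io

import numpy as np

COLUMNS = ["nucleus_id", "parent_cell_id", "nucleus_volume", "cell_volume", "nc_ratio"]


def parent_table(cells, nuclei, voxel_volume=1.0):
    """Link each nucleus to the cell found at its (rounded) centroid."""
    rows = []
    for nuc_id in np.unique(nuclei):
        if nuc_id == 0:
            continue
        coords = np.argwhere(nuclei == nuc_id)
        centroid = np.round(coords.mean(axis=0)).astype(int)
        parent = int(cells[tuple(centroid)])
        if parent == 0:
            continue  # centroid lies in background: nucleus stays without parent
        nuc_vol = coords.shape[0] * voxel_volume
        cell_vol = int(np.count_nonzero(cells == parent)) * voxel_volume
        rows.append({
            "nucleus_id": int(nuc_id),
            "parent_cell_id": parent,
            "nucleus_volume": nuc_vol,
            "cell_volume": cell_vol,
            "nc_ratio": nuc_vol / cell_vol,
        })
    return rows


# Toy label images (z, y, x): two cells with one nucleus each.
cells = np.zeros((2, 4, 4), dtype=np.int32)
cells[:, :, :2] = 1
cells[:, :, 2:] = 2
nuclei = np.zeros_like(cells)
nuclei[0:2, 1:3, 0:1] = 5
nuclei[0:2, 1:3, 3:4] = 6

rows = parent_table(cells, nuclei)

# Write the table as CSV text (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Passing the physical volume of one voxel as `voxel_volume` yields volumes in physical units instead of voxel counts.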

Further, we created a process (“Mesh/Nucleus/Select Duplicated Nuclei”) to detect and automatically select nuclei in cells where multiple nuclei were detected. This process was used to detect segmentation errors. Another process (“Mesh/Nucleus/Distance Nuclei”) was implemented to quantify the Euclidean distance between cell centroids and nuclei centroids. We also included a process (“Mesh/Nucleus/Label Nuclei Surface”) to associate 3D segmented nuclei IDs with the cells of curved surface meshes. All these processes are documented within MorphoGraphX (Help/Process Docs). A brief description of each process and its application can be viewed by hovering the mouse over the process.
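The centroid-to-centroid distance can be illustrated on voxel label images as follows. This is a sketch of the underlying geometry only, not the MorphoGraphX process, which operates on meshes. The function names are hypothetical, and the voxel size is assumed to be given in the same axis order as the array (z, y, x).

```python
import numpy as np


def centroid(labels, label_id, voxel_size):
    """Centroid of one label in physical units (same axis order as the array)."""
    coords = np.argwhere(labels == label_id)
    return coords.mean(axis=0) * np.asarray(voxel_size, dtype=float)


def centroid_distance(cells, nuclei, cell_id, nuc_id, voxel_size=(1.0, 1.0, 1.0)):
    """Euclidean distance between a cell centroid and a nucleus centroid."""
    delta = centroid(cells, cell_id, voxel_size) - centroid(nuclei, nuc_id, voxel_size)
    return float(np.linalg.norm(delta))


# Toy example: one flat cell of 5 x 5 voxels and an off-centre one-voxel nucleus.
cells = np.ones((1, 5, 5), dtype=np.int32)
nuclei = np.zeros_like(cells)
nuclei[0, 2, 4] = 7

# Cell centroid is at voxel (0, 2, 2); the nucleus sits two voxels away in x.
d = centroid_distance(cells, nuclei, cell_id=1, nuc_id=7, voxel_size=(0.25, 0.25, 0.25))
```

Scaling the voxel coordinates before taking the norm matters for anisotropic stacks, where a displacement of one voxel in z covers a different physical distance than one voxel in x or y.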

Proofreading cell segmentation using nuclear segmentation

PlantSeg-tools offers this script for proofreading cell segmentation based on nuclei knowledge (https://github.com/hci-unihd/plant-seg-tools). The method is first described in this manuscript and is part of this study. The cell segmentation will be adjusted to resolve any conflict with the respective nuclear segmentation; thus, the accuracy of the nuclear segmentation is extremely important. Errors in nuclear segmentation are propagated to cell segmentation. The script is composed of two subroutines: one for correcting the split errors in cell segmentation and one for fixing the merge mistakes. The split routine checks for each cell whether two or more nuclei (measured as a percentage of the total cell volume) overlap with the cell segmentation by more than a user-defined “threshold-split (t-split)”. If the overlap is above the threshold, the script will use the nuclear segmentation as seed and split the cell using the seeded watershed algorithm. The merge routine checks for each nucleus whether two or more cells (measured as a percentage of the total nucleus volume) overlap a single nucleus segmentation by more than a user-defined “threshold-merge (t-merge)”. If the overlap is above the threshold, the script will merge the cells. The default thresholds provided are 66% for “t-split” and 33% for “t-merge”.
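The merge routine can be sketched with NumPy as shown below. This is an illustration written from the description above and not the PlantSeg-tools code. The function name is hypothetical, label 0 is assumed to be background, and merged cells are given the smallest of the participating labels, which is a choice made only for this example. The split routine is not shown because it additionally requires a seeded watershed.

```python
import numpy as np


def merge_by_nuclei(cells, nuclei, t_merge=0.33):
    """Merge cells that each cover more than t_merge of the same nucleus.

    Overlap is measured as a fraction of the total nucleus volume.
    """
    merged = cells.copy()
    for nuc_id in np.unique(nuclei):
        if nuc_id == 0:
            continue
        inside = merged[nuclei == nuc_id]            # cell labels under this nucleus
        ids, counts = np.unique(inside[inside != 0], return_counts=True)
        fractions = counts / inside.size             # share of the nucleus per cell
        hits = ids[fractions > t_merge]
        if hits.size >= 2:
            # Relabel all participating cells with the smallest label.
            merged[np.isin(merged, hits)] = hits.min()
    return merged


# Toy example (z, y, x): three cells side by side.
cells = np.zeros((1, 4, 6), dtype=np.int32)
cells[:, :, 0:2] = 1
cells[:, :, 2:4] = 2
cells[:, :, 4:6] = 3
nuclei = np.zeros_like(cells)
nuclei[0, 1:3, 1:3] = 9  # half in cell 1, half in cell 2: cells are merged
nuclei[0, 1:3, 4:6] = 8  # entirely in cell 3: nothing to merge

merged = merge_by_nuclei(cells, nuclei)

# A nucleus that only grazes a neighbouring cell does not trigger a merge.
cells_b = np.zeros((1, 1, 20), dtype=np.int32)
cells_b[..., :10] = 1
cells_b[..., 10:] = 2
nuclei_b = np.zeros_like(cells_b)
nuclei_b[..., 1:11] = 5  # 90% in cell 1, 10% in cell 2
unchanged = merge_by_nuclei(cells_b, nuclei_b)
```

With the default threshold of 33%, at most two cells can each exceed the threshold for a given nucleus, so a nucleus never merges more than two cells in one step.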

Optimized workflow from imaging to segmentation of nuclei dataset

We recommend acquiring confocal z-slices with an xyz voxel size ranging from 0.12 x 0.12 x 0.25 µm³ to 0.25 x 0.25 x 0.25 µm³, ensuring that the nuclear signals are visually identifiable and not oversaturated. For optimal results, we propose imaging with a line average of 2 to 5 whenever feasible. We advise using microscope objectives with a high numerical aperture (ideally around 1.2 NA or higher). Nevertheless, both the PlantSeg and the StarDist-ResNet platinum models are quite robust to the imaging conditions, as they were able to process images of varying quality (Table 3). For nuclei segmentation using the two platinum models, we present GoNuclear (https://github.com/kreshuklab/go-nuclear). GoNuclear comes with the PlantSeg and StarDist-ResNet platinum models. Although the results are comparable, we recommend trying StarDist with the StarDist-ResNet platinum model first, as it is a bit less involved compared to the PlantSeg 3D nuclei segmentation pipeline. GoNuclear can batch process nuclei images, and the output segmentation can be saved as a TIFF/HDF5 file, which can be imported into MorphoGraphX. As an alternative, the PlantSeg_3Dnuc_platinum model has been integrated into MorphoGraphX, allowing 3D nuclear predictions to be generated, which can then be 3D segmented using the ITK watershed algorithm, all within MorphoGraphX. MorphoGraphX enables multiple 3D stacks and segmented images to be superimposed on each other, allowing the data sets to be proofread as needed. A 3D nuclei mesh can be created in MorphoGraphX and quantifications can be performed. Numerical results can be exported as a csv file for further processing.
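As a small illustration of how the voxel size enters downstream quantification, the sketch below checks a voxel size against the recommended range and converts per-label voxel counts of a segmented stack into physical volumes. The helper names are hypothetical and are not part of GoNuclear or MorphoGraphX; label 0 is assumed to be background.

```python
import numpy as np

# Recommended xyz voxel size range in micrometres, as stated above.
RECOMMENDED_MIN = (0.12, 0.12, 0.25)
RECOMMENDED_MAX = (0.25, 0.25, 0.25)


def in_recommended_range(voxel_size_xyz, tol=1e-9):
    """True if every axis of the voxel size lies within the recommended range."""
    return all(
        lo - tol <= v <= hi + tol
        for v, lo, hi in zip(voxel_size_xyz, RECOMMENDED_MIN, RECOMMENDED_MAX)
    )


def label_volumes_um3(labels, voxel_size_xyz):
    """Volume of every non-background label in cubic micrometres."""
    voxel_volume = float(np.prod(voxel_size_xyz))
    ids, counts = np.unique(labels[labels != 0], return_counts=True)
    return {int(i): int(c) * voxel_volume for i, c in zip(ids, counts)}


# Toy segmentation: one nucleus of 2 x 2 x 2 voxels.
labels = np.zeros((4, 4, 4), dtype=np.int32)
labels[1:3, 1:3, 1:3] = 1

voxel_size = (0.25, 0.25, 0.25)
ok = in_recommended_range(voxel_size)
volumes = label_volumes_um3(labels, voxel_size)
```

The same conversion applies to cell labels, so nuclear and cell volumes obtained this way can be combined directly into N/C ratios.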

Acknowledgements

We acknowledge support by EMBL IT Services and the Center for Advanced Light Microscopy (CALM) of the TUM School of Life Sciences.

Competing interests
No competing interests declared.

Funding

This work was funded by the German Research Council (DFG) through grant FOR2581 (TP3) to FAH, (TP7) to KS, (TP8) to RSS, (TP9) to MT, and (TPZ2) to AK.

Data availability

Information and code for training and inference using PlantSeg, Cellpose, or StarDist, including how to segment new 3D nuclei volumes as mentioned in this study, can be found in the GoNuclear repository: https://github.com/kreshuklab/go-nuclear. Other software can be downloaded at the following links: MorphoGraphX: https://morphographx.org. PlantSeg: https://github.com/hci-unihd/plant-seg. PlantSeg-tools: https://github.com/hci-unihd/plant-seg-tools. StarDist: https://github.com/stardist. Cellpose: https://github.com/mouseland/cellpose. We provide the two platinum models through the BioImage Model Zoo (https://bioimage.io) for FAIR use through different client tools of our community.

PlantSeg_3Dnuc_platinum: Zenodo ID: 10.5281/zenodo.8401064; Zoo name: efficient-chipmunk. StarDist3DResnet_3Dnuc_platinum: Zenodo ID: 10.5281/zenodo.8421755; Zoo name: modest-octopus. All datasets used for the figures and the entire bundle of models we trained can be downloaded from BioImage Archive (BIA) (https://www.ebi.ac.uk/bioimage-archive/) (Hartley et al., 2022)/BioStudies (https://www.ebi.ac.uk/biostudies/) (Sarkans et al., 2018), accession S-BIAD1026. The MorphoGraphX process “Mesh/Nucleus” is available with version 2.0.2 and above (https://morphographx.org). The data used for quantification of the Arabidopsis ovule N/C ratios include the training datasets generated in this study (BioStudies, accession S-BIAD1026) and were also obtained from Vijayan et al. (2021) (BioStudies, accession S-BSST475). The mouse embryo BlastoSPIM data set (Nunley et al., 2023) can be downloaded from the respective website.

References

Bailoni, A., Pape, C., Hütsch, N., Wolf, S., Beier, T., Kreshuk, A. and Hamprecht, F. A. (2019). GASP, a generalized framework for agglomerative clustering of signed graphs and its application to Instance Segmentation. arXiv [cs.CV].

Barbier de Reuille, P., Routier-Kierzkowska, A.-L., Kierzkowski, D., Bassel, G. W., Schüpbach, T., Tauriello, G., Bajpai, N., Strauss, S., Weber, A., Kiss, A., et al. (2015). MorphoGraphX: A platform for quantifying morphogenesis in 4D. Elife 4, 05864.

Bassel, G. W., Stamm, P., Mosca, G., Barbier de Reuille, P., Gibbs, D. J., Winter, R., Janka, A., Holdsworth, M. J. and Smith, R. S. (2014). Mechanical constraints imposed by 3D cellular geometry and arrangement modulate growth patterns in the Arabidopsis embryo. Proc. Natl. Acad. Sci. U. S. A. 111, 8685–8690.

Bink, K., Walch, A., Feuchtinger, A., Eisenmann, H., Hutzler, P., Höfler, H. and Werner, M. (2001). TO-PRO-3 is an optimal fluorescent dye for nuclear counterstaining in dual-colour FISH on paraffin sections. Histochem. Cell Biol. 115, 292–299.

Cantwell, H. and Nurse, P. (2019a). A systematic genetic screen identifies essential factors involved in nuclear size control. PLoS Genet. 15, e1007929.

Cantwell, H. and Nurse, P. (2019b). A homeostatic mechanism rapidly corrects aberrant nucleocytoplasmic ratios maintaining nuclear size in fission yeast. J. Cell Sci. 132,.

Cantwell, H. and Nurse, P. (2019c). Unravelling nuclear size control. Curr. Genet. 65, 1281–1285.

Cerise, M., da Silveira Falavigna, V., Rodríguez-Maroto, G., Signol, A., Severing, E., Gao, H., van Driel, A., Vincent, C., Wilkens, S., Iacobini, F. R., et al. (2023). Two modes of gene regulation by TFL1 mediate its dual function in flowering time and shoot determinacy of Arabidopsis. Development 150,.

Clough, S. J. and Bent, A. F. (1998). Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J. 16, 735–743.

Conklin, E. G. (1912). Cell size and nuclear size. J. Exp. Zool. 12, 1–98.

Deviri, D. and Safran, S. A. (2022). Balance of osmotic pressures determines the nuclear-to-cytoplasmic volume ratio of the cell. Proc. Natl. Acad. Sci. U. S. A. 119, e2118301119.

Endress, P. K. (2011). Angiosperm ovules: diversity, development, evolution. Ann. Bot. 107, 1465–1489.

Eschweiler, D., Spina, T. V., Choudhury, R. C., Meyerowitz, E., Cunha, A. and Stegmaier, J. (2019). CNN-based preprocessing to optimize watershed-based cell segmentation in 3D confocal microscopy images. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pp. 223–227.

Federici, F., Dupuy, L., Laplaze, L., Heisler, M. and Haseloff, J. (2012). Integrated genetic and computation methods for in planta cytometry. Nat. Methods 9, 483–485.

Fernandez, R., Das, P., Mirabet, V., Moscardi, E., Traas, J., Verdeil, J.-L., Malandain, G. and Godin, C. (2010). Imaging plant growth in 4D: robust tissue reconstruction and lineaging at cell resolution. Nat. Methods 7, 547–553.

Fridman, Y., Strauss, S., Horev, G., Ackerman-Lavert, M., Reiner-Benaim, A., Lane, B., Smith, R. S. and Savaldi-Goldstein, S. (2021). The root meristem is shaped by brassinosteroid control of cell geometry. Nat Plants 7, 1475–1484.

Fulton, L., Batoux, M., Vaddepalli, P., Yadav, R. K., Busch, W., Andersen, S. U., Jeong, S., Lohmann, J. U. and Schneitz, K. (2009). DETORQUEO, QUIRKY, and ZERZAUST represent novel components involved in organ development mediated by the receptor-like kinase STRUBBELIG in Arabidopsis thaliana. PLoS Genet. 5, e1000355.

[...] development contributes to midblastula transition timing. Curr. Biol. 25, 45–52.

Koncz, C. and Schell, J. (1986). The promoter of TL-DNA gene 5 controls the tissue-specific expression of chimaeric genes carried by a novel Agrobacterium binary vector. Mol. Gen. Genet. 204, 383–396.

Kurihara, D., Mizuta, Y., Sato, Y. and Higashiyama, T. (2015). ClearSee: a rapid optical clearing reagent for whole-plant fluorescence imaging. Development 142, 4168–4179.

Lampropoulos, A., Sutikovic, Z., Wenzl, C., Maegele, I., Lohmann, J. U. and Forner, J. (2013). GreenGate - a novel, versatile, and efficient cloning system for plant transgenesis. PLoS One 8, e83043.

Lemière, J., Real-Calderon, P., Holt, L. J., Fai, T. G. and Chang, F. (2022). Control of nuclear size by osmotic forces in Schizosaccharomyces pombe. Elife 11,.

Lora, J., Herrero, M., Tucker, M. R. and Hormaza, J. I. (2017). The transition from somatic to germline identity shows conserved and specialized features during angiosperm evolution. New Phytol. 216, 495–509.

Martignago, D., da Silveira Falavigna, V., Lombardi, A., Gao, H., Korwin Krukowski, P., Galbiati, M., Tonelli, C., Coupland, G. and Conti, L. (2023). The bZIP transcription factor AREB3 mediates FT signalling and floral transition at the Arabidopsis shoot apical meristem. PLoS Genet. 19, e1010766.

Meyer, H. M., Teles, J., Formosa-Jordan, P., Refahi, Y., San-Bento, R., Ingram, G., Jönsson, H., Locke, J. C. W. and Roeder, A. H. K. (2017). Fluctuations of the transcription factor ATML1 generate the pattern of giant cells in the Arabidopsis sepal. Elife 6,.

Mody, T. A., Rolle, A., Stucki, N., Roll, F., Bauer, U. and Schneitz, K. (2023). Diverse 3D cellular patterns underlie the development of Cardamine hirsuta and Arabidopsis thaliana ovules. bioRxiv.

Montenegro-Johnson, T. D., Stamm, P., Strauss, S., Topham, A. T., Tsagris, M., Wood, A. T. A., Smith, R. S. and Bassel, G. W. (2015). Digital single-cell analysis of plant organ development using 3DCellAtlas. Plant Cell 27, 1018–1033.

Musielak, T. J., Schenkel, L., Kolb, M., Henschen, A. and Bayer, M. (2015). A simple and versatile cell wall staining protocol to study plant reproduction. Plant Reprod. 28, 161–169.

Nunley, H., Shao, B., Grover, P., Singh, J., Joyce, B., Kim-Yip, R., Kohrman, A., Watters, A., Gal, Z., Kickuth, A., et al. (2023). A novel ground truth dataset enables robust 3D nuclear instance segmentation in early mouse embryos. bioRxiv.

Ouedraogo, I., Lartaud, M., Baroux, C., Mosca, G., Delgado, L., Leblanc, O., Verdeil, J.-L., Conéjéro, G. and Autran, D. (2023). 3D cellular morphometrics of ovule primordium development in Zea mays reveal differential division and growth dynamics specifying megaspore mother cell singleness. Front. Plant Sci. 14, 1174171.

[...] generalist algorithm for cellular segmentation. Nat. Methods 18, 100–106.

Tofanelli, R., Vijayan, A., Scholz, S. and Schneitz, K. (2019). Protocol for rapid clearing and staining of fixed Arabidopsis ovules for improved imaging by confocal laser scanning microscopy. Plant Methods 15, 120.

Ursache, R., Andersen, T. G., Marhavý, P. and Geldner, N. (2018). A protocol for combining fluorescent proteins with histological stains for diverse cell wall components. Plant J. 93, 399–412.

Vaddepalli, P., de Zeeuw, T., Strauss, S., Bürstenbinder, K., Liao, C.-Y., Ramalho, J. J., Smith, R. S. and Weijers, D. (2021). Auxin-dependent control of cytoskeleton and cell shape regulates division orientation in the Arabidopsis embryo. Curr. Biol.

Van Hooijdonk, C. A., Glade, C. P. and Van Erp, P. E. (1994). TO-PRO-3 iodide: a novel HeNe laser-excitable DNA stain as an alternative for propidium iodide in multiparameter flow cytometry. Cytometry 17, 185–189.

Vijayan, A., Tofanelli, R., Strauss, S., Cerrone, L., Wolny, A., Strohmeier, J., Kreshuk, A., Hamprecht, F. A., Smith, R. S. and Schneitz, K. (2021). A digital 3D reference atlas reveals cellular growth patterns shaping the Arabidopsis ovule. Elife 10,.

Weigert, M., Schmidt, U., Haase, R., Sugawara, K. and Myers, G. (2020). Star-convex polyhedra for 3D object detection and segmentation in microscopy. In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 3655–3662.

Wenzl, C. and Lohmann, J. U. (2023). 3D imaging reveals apical stem cell responses to ambient temperature. Cells Dev 175, 203850.

Wilson, E. B. (1925). The karyoplasmic ratio. In The Cell in Development and Heredity, pp. 727–733. The Macmillan Company.

Wolny, A., Cerrone, L., Vijayan, A., Tofanelli, R., Barro, A. V., Louveaux, M., Wenzl, C., Strauss, S., Wilson-Sánchez, D., Lymbouridou, R., et al. (2020). Accurate and versatile 3D segmentation of plant tissues at cellular resolution. Elife 9, e57613.

Yoshida, S., Barbier de Reuille, P., Lane, B., Bassel, G. W., Prusinkiewicz, P., Smith, R. S. and Weijers, D. (2014). Genetic control of plant development by overriding a geometric division rule. Dev. Cell 29, 75–87.

Figures and figure legends

Fig. 1. 3D dataset for model training. (A) 2D section view of TO-PRO-3-stained nuclei in Arabidopsis ovules. (B) 3D nuclear segmentation of weak nuclei stain performed using Cellpose nuclei model. (C) A zoomed-in view displaying the erroneous segmentation. Typical errors in the segmentation of nuclear stains result in improper size, shape and number of nuclei. (D) Fluorescent nuclei reporter H2B: tdTomato raw image. (E) 3D Cellpose nuclei model segmentation of the bright tdTomato nuclei fluorescence. (F-I) 2D section view from one of the five training datasets. (F) Weak nuclei channel (TO-PRO-3-stained) used for training. (G) Strong nuclei channel (nuclei reporter H2B: tdTomato) used for generating ground truths. (H) Initial ground truth used for training initial model. 3D nuclear segmentation of the strong nuclei channel performed using the Cellpose nuclei model. (I) Raw cell wall stain, PlantSeg cell boundary predictions and cell segmentation available with the training dataset (from left to right). (J) Illustration of model training strategy. Scale bars: 5 µm (A-E); 20 µm (F-I).

[...] models. Qualitative comparison displaying the Arabidopsis ovule testing dataset 1135 (N5 dataset) with trained model (Model-5) using four other training datasets. (A) 3D view of ground truth nuclear segmentation. (B) Zoomed 2D section view of raw weak TO-PRO-3 iodide nuclei stain. (C) Ground truth nuclear segmentation corresponding to the zoomed view in (B). (D-F) PlantSeg predictions and segmentation using the proposed PlantSeg model. (D) 3D PlantSeg GASP segmentation performed using the proposed PlantSeg model. (E) View corresponding to (B) showing PlantSeg nuclei predictions. Top panel: PlantSeg nuclei center predictions. Bottom panel: PlantSeg nuclei envelope prediction from raw data. (F) PlantSeg GASP segmentation of the corresponding section in (B). (G-I) StarDist ResNet nuclei predictions and segmentation using the proposed ResNet model. (G) StarDist ResNet 3D nuclear segmentation performed using the proposed StarDist model. (H) View corresponding to (B) showing StarDist ResNet nuclei predictions. (I) StarDist ResNet nuclear segmentation of the corresponding section in (B). (J-L) StarDist UNet nuclei predictions and segmentation using the proposed UNet model. (J) StarDist UNet 3D nuclear segmentation performed using the proposed StarDist model. (K) View corresponding to (B) showing StarDist UNet nuclei predictions. (L) StarDist UNet nuclear segmentation of the corresponding section in (B). Scale bars: 10 μm.

Fig. 3. Wide applicability of trained nuclei segmentation models in segmenting stained or nuclear reporter-expressing different plant organ nuclei imaged under different conditions. (A-C) Antirrhinum majus ovule nuclei stained with TO-PRO-3 iodide, (D-F) Arabidopsis thaliana ovule nuclei stained with DAPI, (G-I) Arabidopsis sepal nuclei expressing the pATML1::mCitrine-ATML1 reporter, (J-L) Cardamine hirsuta leaf nuclei expressing the pChCUC2g::Venus reporter, (M-O) Mouse embryo nuclei expressing the H2B-miRFP720 reporter. (A,D,G,J,M) 3D confocal images of raw nuclei stained with a nuclear stain or expressing nuclear reporter. Raw images have been adjusted for brightness and contrast for depiction. (B,E,H,K,N) 3D nuclear segmented stacks, segmented using the StarDist-ResNet model generated from this study. Nuclei IDs are represented in different colors. (C,F,I,L,O) Overlay of 3D segmented stack with the corresponding MorphoGraphX-generated 3D nuclear mesh. (A-O) Insets with white outline show the zoomed-in view of 3D nuclei. Scale Bars: 10 μm (organs) and 5 μm (insets).

Fig. 4. MorphoGraphX as a platform for mapping 3D nuclei to whole organ cell atlas at single cell and tissue resolution. (A-H) Stage 3-II 3D cell and nuclei meshes for the same ovule sample generated from corresponding segmented stacks. (A) Mid-sagittal section of 3D mesh showing cell IDs in different colors. (B) Mid-sagittal section of 3D mesh showing cell parent (tissue) labels. (C) Cell-type labeled 3D mesh overlaid with nuclei mesh showing nuclei IDs in different colors. (D) Cell-type labeled 3D mesh overlaid with nuclei mesh showing nuclei lacking parent labels. (E) Cropped section of 3D mesh showing that cell IDs are initially independent of nuclei IDs (cells and their corresponding nuclei in different colors). (F) Cropped section and (G) mid-sagittal section of 3D mesh showing cell IDs mapped onto their corresponding nuclei using MorphoGraphX processes, resulting in the same color for cells and their corresponding nuclei. (H) In the final step parent tissue labels of cells are mapped onto the corresponding nuclei in MorphoGraphX. (I) Plot showing N/C ratio of the radial layers, L1, L2, and L3 of stage 2-I ovule primordia. (J-L) Plots showing correlation between nuclear and cell volumes in different layers of stage 2-I primordia along with the respective Pearson correlation coefficients, r. (J) L1, (K) L2, (L) L3. (M) Plot showing nuclear to cell volume ratio (N/C) of different tissues and tissue layers of stage 3-II ovules. (N) Plot showing N/C ratio of the outer layer of the inner integument (ii2) for different stages of ovule development from 2-IV up to 3-V. Asterisks represent statistical significance (ns, p≥0.05; *, p<0.05; **, p<0.01; ***, p<0.001; ****, p<0.0001; Student’s t-test). Scale bars: 10 μm.

Fig. 5. PlantSeg proofreading tools to correct 3D cell segmentation errors. (A-D) Mid-sagittal section of an Arabidopsis thaliana ovule primordium (dataset 598A, (Vijayan et al., 2021)). (E-H) Cropped section of an Arabidopsis thaliana 3-II ovule (dataset 527, (Vijayan et al., 2021)). (I-L) Mid-sagittal section of a Cardamine parviflora ovule primordium (dataset 1598B, Mody et al., 2023). (A,E,I) 3D cell boundary predictions along with insets showing raw SR2200 (white) and TO-PRO-3 channel (magenta) signals after adjusting for brightness and contrast to show the weak cell wall staining in specific regions (outlined in orange boxes) and resulting in missing or incomplete walls in the cell boundary predictions. (B,F,J) PlantSeg cell segmentations overlaid with cell boundary prediction. Black arrows point to undersegmented cells. (C,G,K) StarDist-segmented nuclei overlaid with cell boundary prediction, showing multiple nuclei in the undersegmented cells in the MMC region (B,J) and in cells of funiculus and chalaza (F). (D,H,L) 3D cell segmentations corrected with PlantSeg proofreading tools (black arrows) and overlaid with the cell boundary prediction. Cardamine parviflora ovule primordia are crassinucellate (K,L); the ability to visualize this is lost after cell segmentation (I,J). PlantSeg proofreading tools enable re-distinguishing the primary parietal cell from the MMC. Scale Bars: 10 μm.

[...] human in the loop to train a gold model.

5-fold AP ± STD

Segmentation of the test dataset was performed using each of the listed initial and gold models and the mean average precision is scored for different methods compared to gold ground truth.

5-fold AP ± STD

Trained from scratch 51.26% ± 13.75%

Segmentation of the test dataset was performed using each of the listed methods and the mean average precision is scored for different methods compared to a human proofread ground truth. The configuration files used for training can be found along [...]

pFD:3xHAmCHERRY-FD
pChCUC2g::Venus
TO-PRO-3 iodide Ovule DAPI
pATML1::mCitrine-ATML1
H2B-miRFP720
SPIM Microscopy
CLSM, 40x 1.25NA Gly objective, cleared sample
CLSM, 16x 0.6NA water dipping objective, live sample
CLSM, 63x 1.3NA Gly objective, cleared sample
CLSM, 63x 1.3NA Gly objective, cleared sample
CLSM, 20x 1.0NA Water objective
Raw data voxel size (xyz, µm³)
Autobright; smooth 3x
Downsampled in x, y and unchanged in z to 0.6 x 0.6 x 2

Dataset Nr: N1 N2 N3 N4 N5
Stage: 3-V 3-IV 3-V 3-III 2-II
Number of cells/nuclei in the image: 3961
Raw image voxel size (xyz, μm³)

Datasets represent confocal 3D z-stacks of Arabidopsis ovules of different stages. Each dataset is given an ID and a dataset Nr to refer to its use in model training as mentioned in Fig. S2A.

Tool used
PlantSeg


1. Supplementary Results
2. Supplementary File
One each of the initial, gold, and platinum models can be downloaded from the [...]
Attached as a separate file.

Caicedo, J. C., Goodman, A., Karhohs, K. W., Cimini, B. A., Ackerman, J., Haghighi, M., Heng, C., Becker, T., Doan, M., McQuin, C., et al. (2019). Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl. Nat. Methods 16, 1247–1253.

Goswami, R., Asnacios, A., Milani, P., Graindorge, S., Houlné, G., Mutterer, J., Hamant, O. and Chabouté, M.-E. (2020). Mechanical shielding in plant nuclei. Curr. Biol. 30, 2013–2025.e3.

[...] (2021). A single-cell morpho-transcriptomic map of brassinosteroid action in the Arabidopsis root. Mol. Plant 14, 1985–1999.

[...] and Ritz, K. (2002). In situ visualisation of fungi in soil thin sections: problems with crystallisation of the fluorochrome FB 28 (Calcofluor M2R) and improved staining by SCRI Renaissance 2200. Mycol. Res. 106, 293–297.

[...] and Brazma, A. (2022). The BioImage Archive - building a home for life-sciences microscopy data. J. Mol. Biol. 434, 167505.

Harvey, R. and Smith, B. I. (2013). Megasporogenesis and megagametogenesis of Cardamine parviflora L. (Brassicaceae). J. Pa. Acad. Sci. 87, 120–124.

Hernandez-Lagana, E., Mosca, G., Mendocilla-Sato, E., Pires, N., Frey, A., Giraldo-Fonseca, A., Michaud, C., Grossniklaus, U., Hamant, O., Godin, C., et al. (2021). Organ geometry channels reproductive cell fate in the Arabidopsis ovule primordium. Elife 10,.

Hertwig, R. (1903). Ueber die Korrelation von Zell- und Kerngrösse und ihre Bedeutung für die geschlechtliche Differenzierung und die Teilung der Zelle. Biol Centralbl 23, 49–62.

Hirling, D., Tasnadi, E., Caicedo, J., Caroprese, M. V., Sjögren, R., Aubreville, M., Koos, K. and Horvath, P. (2023). Segmentation metric misinterpretations in bioimage analysis. Nat. Methods.

Jenik, P. D. and Irish, V. F. (2000). Regulation of cell proliferation patterns by homeotic genes during [...]. Development 127, 1267–1276.

Jevtić, P. and Levy, D. L. (2015). Nuclear size scaling during Xenopus early [...]

Pachitariu, M. and Stringer, C. (2022). Cellpose 2.0: how to train your own model. Nat. Methods 19, 1634–1641.

[...] (2017). A 3D digital atlas of the Nicotiana tabacum root tip and its use to investigate changes in the root apical meristem induced by the Agrobacterium 6b oncogene. Plant J. 92, 31–42.

Sarkans, U., Gostev, M., Athar, A., Behrangi, E., Melnichuk, O., Ali, A., Minguet, J., Rada, J. C., Snow, C., Tikhonov, A., et al. (2018). The BioStudies database - one stop shop for all data supporting a life sciences study. Nucleic Acids Res. 46, D1266–D1270.

Satina, S., Blakeslee, A. F. and Avery, A. G. (1940). Demonstration of the three germ layers in the shoot apex of Datura by means of induced polyploidy in periclinal chimeras. Am. J. Bot. 27, 895–905.

Schmidt, T., Pasternak, T., Liu, K., Blein, T., Aubry-Hivet, D., Dovzhenko, A., Duerr, J., Teale, W., Ditengou, F. A., Burkhardt, H., et al. (2014). The iRoCS Toolbox - 3D analysis of the plant root apical meristem at cellular resolution. Plant J. 77, 806–814.

Schmidt, U., Weigert, M., Broaddus, C. and Myers, G. (2018). Cell detection with star-convex polygons. In Medical Image Computing and Computer Assisted Intervention - MICCAI 2018, pp. 265–273. Cham: Springer International Publishing.

Schneitz, K., Hülskamp, M. and Pruitt, R. E. (1995). Wild-type ovule development in Arabidopsis thaliana: a light microscope study of cleared whole-mount tissue. Plant J. 7, 731–749.

Sommer, C., Straehle, C., Kothe, U. and Hamprecht, F. A. (2011). Ilastik: Interactive learning and segmentation toolkit. pp. 230–233. IEEE.

Stegmaier, J., Amat, F., Lemon, W. C., McDole, K., Wan, Y., Teodoro, G., Mikut, R. and Keller, P. J. (2016). Real-time three-dimensional cell segmentation in large-scale microscopy data of developing embryos. Dev. Cell 36, 225–240.

Strasburger, E. (1893). Ueber die Wirkungssphäre der Kerne und die Zellgrösse. Histolog. Beiträge 5,.

Strauss, S., Runions, A., Lane, B., Eschweiler, D., Bajpai, N., Trozzi, N., Routier-Kierzkowska, A.-L., Yoshida, S., Rodrigues da Silveira, S., Vijayan, A., et al. (2022). Using positional information to provide context for biological image analysis with MorphoGraphX 2.0. Elife 11,.

Stringer, C., Wang, T., Michaelos, M. and Pachitariu, M. (2021). Cellpose: a [...]