Optimized Deep Learning Model for Fire Semantic Segmentation

Source: 优秀文章 (Excellent Articles) Published: 2023-01-15

Songbin Li, Peng Liu, Qiandong Yan and Ruiling Qian

1 Institute of Acoustics, Chinese Academy of Sciences, Beijing, 100190, China

2 Loughborough University, Loughborough, LE11 3TT, United Kingdom

Abstract: Recent deep learning based on convolutional neural networks (CNNs) has significantly advanced fire detection. Existing fire detection methods can efficiently recognize and locate fire. However, they struggle to recover accurate flame boundary and shape information, which makes automated fire region analysis, prediction, and early warning difficult. To this end, we propose a fire semantic segmentation method based on Global Position Guidance (GPG) and Multi-path explicit Edge information Interaction (MEI). Specifically, to address local segmentation errors in the low-level feature space, a top-down global position guidance module is used to restrain the offset of low-level features. In addition, an MEI module is proposed to explicitly extract and utilize edge information to refine the coarse fire segmentation results. We compare the proposed method with existing advanced semantic segmentation and salient object detection methods. Experimental results demonstrate that the proposed method achieves 94.1%, 93.6%, 94.6%, 95.3%, and 95.9% Intersection over Union (IoU) on five test sets, respectively, outperforming the second-best method by a large margin. Our approach also achieves the best score in terms of accuracy.

Keywords: Fire semantic segmentation; local segmentation errors; global position guidance; multi-path explicit edge information interaction; feature fusion

Vision-based fire detection is a difficult but particularly important task for public safety. According to the existing literature, vision-based fire detection methods can be divided into two types. The first judges whether there is a flame in an image [1-5]. The second regards the flame as an object and uses object detection methods to detect fire [6-8]. Compared with the first type, object detection based fire detection can not only recognize the existence of fire but also locate it. However, it lacks accurate flame edge and shape information, which makes it hard to estimate the fire area accurately and automatically. In general, without the precise area, shape, and location of the flame, automated fire intensity analysis, prediction, and early warning are difficult to carry out. Therefore, it is necessary to perform fire semantic segmentation on the image.

The goal of fire semantic segmentation is to recognize whether each pixel belongs to fire (shown in Fig. 1), which is similar to image segmentation tasks. Recently, advances in image processing techniques [9,10] have pushed the state of the art to a new level for many tasks, such as semantic segmentation and salient object detection. However, it is still difficult to accurately resolve flames from a single image. The main reasons may be the diverse backgrounds, the multiple scales of fire at different evolving stages, and the disturbance from fire-like objects. In this paper, we propose a fire semantic segmentation method based on global position guidance and multi-path explicit edge information interaction. Specifically, to alleviate the problem of local segmentation errors in the low-level feature space caused by the disturbance of fire-like objects and background noise, a global position guidance mechanism is proposed. This module uses the accurate position information of top-level features to reconstruct spatial detail information in a top-down manner. Besides, we employ a multi-path explicit edge information interaction module to aggregate coarse segmentation results and edge information and thereby refine the fire boundary. In this module, we first explicitly construct edge information extraction through strongly supervised learning, and then realize the interaction between edge information and coarse segmentation results through a convolutional layer.

Figure 1: The goal of fire semantic segmentation is to recognize whether each pixel belongs to fire. Each column represents an original image and the corresponding fire semantic segmentation map. The pixels belonging to fire are marked as white, and the others are marked as black

The main contributions of this paper can be summarized as follows:

1) We propose a novel fire semantic segmentation method based on global position guidance and multi-path explicit edge information interaction. The experimental results show that our method achieves 94.7% average IoU on five test sets, which outperforms the best semantic segmentation method and the best salient object detection method by 15.9% and 0.8%, respectively. This demonstrates that our method performs better on fire segmentation than previous state-of-the-art semantic segmentation and salient object detection methods.

2) A global position guidance module is proposed to solve the problem of local segmentation errors in the low-level feature space. Besides, a multi-path explicit edge information interaction module based on edge guidance is utilized to aggregate coarse segmentation results and edge information and thereby refine the fire boundary.

3) A fire semantic segmentation dataset of 30000 images is established, which is currently the first fire semantic segmentation dataset in this area. The dataset is created by synthesizing real flame regions with normal images. We randomly select 1100 images from [5] and label them to obtain the real flame regions.

In this section, we give a summary of related works in Tab. 1. Traditional fire detection methods [11-15] mainly focus on handcrafted features, such as color, shape, texture, and motion. They have some defects, such as lacking robustness and failing to detect fire at a long distance or in a challenging environment. Recent data-driven deep learning has promoted the progress of fire detection. Fire detection methods based on deep learning can be divided into two categories: classification-based methods [1-5] and object detection-based methods [6-8]. Classification-based approaches treat fire detection as an image classification task. These methods can judge whether there is fire in an image, but cannot locate the fire. Object detection-based fire detection methods can not only recognize the existence of fire but also locate it. However, the fire position is marked with rectangular boxes, which cannot provide flame edge and shape information. The goal of fire semantic segmentation is to recognize whether each pixel belongs to fire, which is similar to image segmentation tasks. However, it is difficult to obtain good results by directly applying existing deep learning based segmentation methods [16-24] to fire detection. These methods are not specially designed for fire semantic segmentation, so their ability to discriminate fire-like objects is relatively weak, and it is difficult for them to accurately parse the fire boundary. In addition, they perform poorly on local small-scale fires. To this end, we propose a fire semantic segmentation method based on global position guidance and multi-path explicit edge information interaction. The global position guidance mechanism is proposed to alleviate the problem of local segmentation errors in the low-level feature space caused by the disturbance of flame-like objects and background noise. It uses the accurate position information of top-level features to reconstruct spatial detail information in a top-down manner. Besides, the multi-path explicit edge information interaction mechanism is proposed to aggregate coarse segmentation results and edge information and thereby refine the fire boundary.

Table 1: Summary of related works

The CNN-based encoder can extract different feature representations. Top-level semantic features preserve precise fire position information. Low-level spatial detail features contain rich fire boundary information. Both are vital to fire segmentation, and the progressive fusion of features from different levels has a very significant effect on fire segmentation tasks. However, when disturbed by background noise and flame-like objects, the low-level fire spatial features may give rise to local segmentation errors. Consequently, the key to improving the performance of fire semantic segmentation is to restrain the offset of low-level spatial features.

As mentioned above, the receptive field of the top-level features is the largest among the encoded features, and their fire position information is the most accurate. Besides, when the information progressively flows from the top level to the low level, the accurate position information contained in top-level features is gradually diluted. We therefore design a top-down global position guidance mechanism that directly delivers top-level position information to the low-level feature space to restrain local segmentation errors.

In this module, the top-level features $F_e^{t}$ are output by the last layer of the encoder, and the encoded features from the $i$-th layer are denoted $F_e^{i}$, $i \in (1, t-1)$. First, two pointwise convolution layers with batch normalization (BN) and a ReLU activation function are applied to change the number of channels of $F_e^{i}$ and $F_e^{t}$ to $M$. Then, a bilinear interpolation function up-samples $F_e^{t}$ to the same size as $F_e^{i}$. The fused features $F_f^{i}$ can be denoted as:

$$F_f^{i} = \left[\, \omega_i \otimes F_e^{i} + b_i,\ \mathrm{Up}\left(\omega_t \otimes F_e^{t} + b_t\right) \right]$$

where $(\omega_i, b_i)$ and $(\omega_t, b_t)$ are the kernel weights and biases applied to $F_e^{i}$ and $F_e^{t}$, respectively, $\mathrm{Up}$ stands for up-sampling, $\otimes$ denotes the convolution operation, and $[\ldots]$ denotes concatenation. Next, another pointwise convolution layer of the same form squeezes the channels of $F_f^{i}$ to $M$. This yields the relative position attention map $A^{i}$, which carries accurate position information.
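The fusion step described above can be sketched in plain Python on tiny feature maps. This is a minimal illustration under stated assumptions, not the authors' implementation: features are nested lists shaped [channel][row][col], the helper names `pointwise_conv`, `bilinear_upsample`, and `fuse` are hypothetical, batch normalization is omitted, corner-aligned sampling is assumed for the interpolation, and the weights are hand-picked rather than learned.

```python
def pointwise_conv(feat, weights, bias):
    """1x1 convolution followed by ReLU: mixes channels at every pixel.

    feat: [C_in][H][W]; weights: [C_out][C_in]; bias: [C_out].
    """
    h, w = len(feat[0]), len(feat[0][0])
    out = []
    for w_row, b in zip(weights, bias):
        plane = [[max(0.0, b + sum(wc * feat[c][y][x] for c, wc in enumerate(w_row)))
                  for x in range(w)] for y in range(h)]
        out.append(plane)
    return out


def bilinear_upsample(plane, out_h, out_w):
    """Bilinear interpolation of one [H][W] plane (corner-aligned sampling, an assumption)."""
    in_h, in_w = len(plane), len(plane[0])
    out = []
    for y in range(out_h):
        sy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            sx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            top = plane[y0][x0] * (1 - fx) + plane[y0][x1] * fx
            bottom = plane[y1][x0] * (1 - fx) + plane[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def fuse(low_feat, top_feat, w_i, b_i, w_t, b_t):
    """Concatenate projected low-level features with up-sampled top-level features."""
    low = pointwise_conv(low_feat, w_i, b_i)
    top = pointwise_conv(top_feat, w_t, b_t)
    h, w = len(low[0]), len(low[0][0])
    top_up = [bilinear_upsample(p, h, w) for p in top]
    return low + top_up  # list concatenation acts as channel-wise concatenation


# Toy example: a 1-channel 4x4 low-level map and a 1-channel 2x2 top-level map.
low_feat = [[[1.0] * 4 for _ in range(4)]]
top_feat = [[[0.0, 2.0], [2.0, 4.0]]]
fused = fuse(low_feat, top_feat, [[1.0]], [0.0], [[1.0]], [0.0])
```

Here `fused` holds two 4×4 channels: the projected low-level map and the top-level map brought to the same resolution, so that position cues from the coarse map sit alongside every low-level pixel.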

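To further enhance the representation capability of the relative position attention map $A^{i}$, we introduce efficient channel attention. The map is first compressed by a global pooling operation $G$ to obtain the vector $Y$, which holds global contextual information:

$$Y = G(A^{i}) = \frac{1}{W \times H} \sum_{w=1}^{W} \sum_{h=1}^{H} A^{i}(w, h)$$

where $W$ and $H$ denote the width and height of the input, respectively. Then, an efficient fully connected layer is utilized to transform the vector $Y$ into a reconstruction coefficient $\omega$:

$$\omega_m = \sigma\left( \sum_{j=1}^{k} \alpha_j Y_m^{j} \right), \quad Y_m^{j} \in \Omega_m^{k}, \quad m = 1, \ldots, C$$

where $\alpha_j$ represents the weight parameters, $\sigma$ is the sigmoid activation function, $\Omega_m^{k}$ represents the set of $k$ adjacent channels of $Y_m$, and $C$ is the number of channels. Next, a channel-wise multiplication operation is employed to reconstruct the map:

$$\hat{A}^{i} = \omega \odot A^{i}$$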
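The three steps of the channel attention above (global pooling, a shared weighting over k adjacent channels followed by a sigmoid, and channel-wise multiplication) can be sketched as follows. This is a minimal plain-Python illustration, not the authors' code: the function names are hypothetical, the map is a nested list shaped [channel][row][col], and treating channels beyond the boundary as zero is an assumption.

```python
import math


def global_avg_pool(feat):
    """Average each [H][W] plane to one number, giving the vector Y of length C."""
    return [sum(sum(row) for row in plane) / (len(plane) * len(plane[0]))
            for plane in feat]


def channel_coefficients(y, alpha):
    """Sigmoid of a shared weighting over the k adjacent channels of each Y_m.

    alpha holds the k shared weights (k odd); neighbours outside [0, C) count as
    zero, which is an assumption of this sketch.
    """
    k, half = len(alpha), len(alpha) // 2
    coeffs = []
    for m in range(len(y)):
        s = 0.0
        for j in range(k):
            idx = m + j - half
            if 0 <= idx < len(y):
                s += alpha[j] * y[idx]
        coeffs.append(1.0 / (1.0 + math.exp(-s)))
    return coeffs


def reweight(feat, coeffs):
    """Channel-wise multiplication of the map by its reconstruction coefficients."""
    return [[[v * c for v in row] for row in plane]
            for plane, c in zip(feat, coeffs)]


# Toy map with C = 3 channels of size 2x2 and a kernel over k = 3 adjacent channels.
attention_map = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
y = global_avg_pool(attention_map)                 # [0.0, 1.0, 2.0]
omega = channel_coefficients(y, [0.0, 1.0, 0.0])   # identity kernel: sigmoid(Y_m)
refined = reweight(attention_map, omega)
```

Because the k weights are shared across all channels, the number of attention parameters stays at k regardless of C, which is what makes this form of channel attention lightweight.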

As shown in Fig. 2, the baseline without the GPG module produces some wrong segmentations. With the GPG module applied, the local segmentation errors are restrained.

Figure 2: The heat map visualization results of the baseline and the global position guidance module. They demonstrate that the GPG module can effectively restrain the local segmentation errors

Another challenge of fire semantic segmentation is edge prediction. Unlike central pixels, which have higher prediction accuracy due to the internal consistency of the fire, pixels near the boundary are more prone to being misdetected. The main reasons are as follows. Compared with central pixels, the edge of the fire contains less information. Besides, diverse and complex backgrounds suppress edge information. Therefore, to solve the problem of edge segmentation errors caused by the lack of flame edge information, we need to explicitly utilize flame edge information.

To achieve this, the edge information of the flame needs to be extracted explicitly. A simple approach is to construct an edge information extraction branch and train it through strongly supervised learning. First, we apply an edge extraction algorithm (e.g., the Canny, Sobel, or Laplace operator) to the label image $Y_{label}$ to obtain the corresponding edge annotation $Y_{edge}$. To explicitly extract the edge information, the output features $F_d^{l}$ of the last layer of the decoder are fed into the edge information extraction branch. This branch consists of a 3×3 convolution layer, a batch normalization layer, and an activation function. The edge information $I_{edge}$ can be denoted as:

$$I_{edge} = \varnothing\left(\mathrm{BN}\left(\omega_e \otimes F_d^{l} + b_e\right)\right)$$

where $\omega_e$ and $b_e$ represent the kernel parameters and bias, respectively, and $\varnothing$ denotes the activation function. Then, we use three loss functions to train the branch,

where $G_{r,c}$ and $I_{r,c}$ denote the fire edge confidence of the ground truth and the prediction map at pixel $(r, c)$, respectively, $\mu_x$ and $\mu_y$ represent the mean values of the prediction and the ground truth, respectively, $\sigma_{*}$ denotes the variance, and $C_1$ and $C_2$ are two small constants.
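The symbols defined above (per-pixel confidences, means, variances, and two small stabilizing constants) are consistent with a combination of binary cross-entropy, IoU, and structural similarity (SSIM) losses. Assuming that combination, a minimal plain-Python sketch is given below. The choice of these three losses, their equal weighting, the whole-map (non-windowed) SSIM statistics, and the constants C1 = 0.01² and C2 = 0.03² are all assumptions of this sketch, and the function names are hypothetical.

```python
import math


def bce_loss(pred, gt, eps=1e-7):
    """Mean binary cross-entropy over all pixels (r, c)."""
    total, n = 0.0, 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= g * math.log(p) + (1.0 - g) * math.log(1.0 - p)
            n += 1
    return total / n


def iou_loss(pred, gt):
    """1 - soft IoU, with intersection and union computed from confidences."""
    inter = sum(p * g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    union = sum(p + g - p * g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    return 1.0 - inter / union if union > 0 else 0.0


def ssim_loss(pred, gt, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM from whole-map statistics (a simplification: no sliding window)."""
    x = [v for row in pred for v in row]
    y = [v for row in gt for v in row]
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim


def edge_loss(pred, gt):
    """Unweighted sum of the three terms (the weighting is an assumption)."""
    return bce_loss(pred, gt) + iou_loss(pred, gt) + ssim_loss(pred, gt)


gt_edge = [[0.0, 1.0], [1.0, 0.0]]
good = [[0.1, 0.9], [0.9, 0.1]]   # prediction close to the edge annotation
bad = [[0.9, 0.1], [0.1, 0.9]]    # prediction that inverts the annotation
```

In this toy case `edge_loss(good, gt_edge)` is smaller than `edge_loss(bad, gt_edge)`, since all three terms penalize the inverted prediction.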

After the complementary fire edge information is obtained, we aggregate the flame edge information and the flame object features to achieve information interaction, which helps to obtain better flame semantic segmentation results. The decoded features (flame object features) are defined as $F_d^{i}$, $i \in (1, l)$. The information interaction can then be denoted as:

$$O_d^{i} = \mathrm{Conv}\left(\left[\mathrm{Up}(F_d^{i}),\ \mathrm{Up}(I_{edge})\right]\right)$$

where $O_d^{i}$ stands for the refined results.

Algorithm 1: Multi-path Explicit Edge Information Interaction
Input: coarse results F_d^i, i ∈ (1, l); edge information I_edge
Output: refined fire prediction maps O_d^i
if explicit edge extraction then
    I_edge ← F(F_d^l)
    return I_edge
if edge information interaction then
    while i = 1; i ≤ l; i ← i + 1 do
        F_d^i, I_edge ← Up(F_d^i), Up(I_edge)
        O_d^i ← Conv([F_d^i, I_edge])
    return {O_d^i | i = 1, ..., l}
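The interaction loop of Algorithm 1 can be sketched on single-channel maps as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: nearest-neighbour resizing is swapped in for Up to keep the code short, Conv is reduced to a 1×1 convolution over the two concatenated channels with hand-picked weights `w_f`, `w_e`, and `bias`, and the function names are hypothetical.

```python
def nearest_upsample(plane, out_h, out_w):
    """Resize one [H][W] map by nearest-neighbour sampling (stand-in for Up)."""
    in_h, in_w = len(plane), len(plane[0])
    return [[plane[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def interact(coarse_maps, edge_map, out_size, w_f=1.0, w_e=1.0, bias=0.0):
    """Refine every coarse map with the shared edge map.

    Conv is reduced to a 1x1 convolution over the two concatenated channels,
    i.e. O = w_f * F + w_e * I_edge + bias at each pixel.
    """
    out_h, out_w = out_size
    edge_up = nearest_upsample(edge_map, out_h, out_w)
    refined = []
    for coarse in coarse_maps:  # one path per decoder level i
        coarse_up = nearest_upsample(coarse, out_h, out_w)
        refined.append([[w_f * f + w_e * e + bias
                         for f, e in zip(f_row, e_row)]
                        for f_row, e_row in zip(coarse_up, edge_up)])
    return refined


coarse_maps = [[[1.0]], [[0.0, 1.0], [1.0, 0.0]]]  # 1x1 and 2x2 coarse results
edge_map = [[0.0, 0.5], [0.5, 0.0]]
outputs = interact(coarse_maps, edge_map, out_size=(4, 4))
```

Each decoder level produces its own refined map at the common output size, which is what makes the interaction multi-path: the same edge information is injected into every coarse result.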

Based on the above ideas, we design a fire semantic segmentation network based on global position guidance and multi-path explicit edge information interaction. An overview of the proposed model is illustrated in Fig. 3. It consists of a deep encoder, four global position guidance modules with feature fusion operations, an explicit edge information extraction module, and a multi-path explicit edge information interaction module. The input image $X$ is fed into the encoder [5] to obtain the encoded features,

Figure 3: Overview of the architecture of the fire semantic segmentation network based on global position guidance and multi-path explicit edge information interaction

It is worth noting that the encoder includes three main parts, namely multi-scale feature extraction, implicit deep supervision, and a channel attention mechanism. First, to establish a good feature foundation for high-level semantic feature and global position information extraction, a multi-scale feature extraction module is used.

where $A \in \mathbb{R}^{C \times H \times W}$ is the input feature, $h^{k \times k}$ denotes the convolution operation with a kernel size of $k \times k$, and $B$ is the output. Then, three densely connected structures [25], which permit the gradient to flow directly to earlier layers, are employed to enhance the feature representation capability. Finally, the channel attention widely used in computer vision tasks is utilized. Its process can be described as:

where $o$ is the final output, $\omega_2$ and $\omega_1$ are the corresponding weight matrices, and $x_{lb}$ is a reconstruction vector; the other two symbols denote the input and a vector that includes the global information.

Once the encoded features are obtained, we use a convolution layer to squeeze the channels of the top-level features to 256. The features are then fed into the GPG module to restrain the local segmentation errors of the low-level feature space. Besides, we aggregate the information progressively from the top level to the low level, as in the U-Net architecture [26], through a simple feature fusion operation. Finally, as mentioned in Section 4, an MEI module is used to refine the coarse segmentation results. Cross-entropy loss based supervision is applied to train the whole network. It can be represented as:

where $L$ represents the total loss, $O_d^{i}$ is the fire prediction map, $j$ is the number of categories, $G$ stands for the ground truth, and $\alpha$ and $\theta$ are the weight coefficients.

In this section, we first introduce the dataset and evaluation metrics. Then we present the implementation details. Next, a series of ablation studies is conducted to verify the effect of each module. Finally, we carry out experiments on our created dataset to evaluate the performance of the proposed method. Experimental results demonstrate that our method achieves the best performance compared with existing semantic segmentation and salient object detection methods.

6.1 Dataset and Evaluation Metrics

In this paper, we create a fire semantic segmentation dataset (FSSD), which consists of 30000 synthetic images and 1100 real fire images. The dataset is generated as follows. First, we randomly select 1100 images from the dataset of [5] and label them carefully. Then, we extract the real flame regions and synthesize them with normal images to create the dataset. Of the 1100 labeled images, 1000 are used to generate the training data and 100 are used to generate the testing data. Some real fire images and synthetic images are shown in Fig. 4. In total, 26000 images are used for training (25000 synthetic images and 1000 real images). Besides, we divide the test images into five test sets (each containing 1000 images). To improve the performance of fire semantic segmentation, we use the dataset of [5] (except for the 1000 images used to extract the real flame region) to pre-train the encoders of all comparison methods.
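The synthesis step, in which a labeled flame region is combined with a normal image, can be illustrated with a simple mask-based paste. The paper does not specify the blending procedure, so the hard paste below, the grayscale nested-list representation, and the function name `composite` are all assumptions of this sketch.

```python
def composite(background, flame, mask):
    """Paste flame pixels onto the background wherever mask == 1.

    All inputs are [H][W] grayscale lists of equal size.
    Returns (synthetic image, segmentation label).
    """
    image = [[f if m else b for b, f, m in zip(b_row, f_row, m_row)]
             for b_row, f_row, m_row in zip(background, flame, mask)]
    # Fire pixels are white (255) and all others black (0), as in the annotations.
    label = [[255 if m else 0 for m in m_row] for m_row in mask]
    return image, label


background = [[10, 10, 10], [10, 10, 10]]
flame = [[200, 220, 0], [0, 240, 0]]
mask = [[1, 1, 0], [0, 1, 0]]
image, label = composite(background, flame, mask)
# image == [[200, 220, 10], [10, 240, 10]]
# label == [[255, 255, 0], [0, 255, 0]]
```

A useful property of this kind of synthesis is that the pixel-level label comes for free from the flame mask, so no additional annotation is needed for the synthetic images.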

Figure 4: Some visual examples of our created fire semantic segmentation dataset. Each column represents an original image and the corresponding annotation

We use three measurements to evaluate all methods. Mean Absolute Error (MAE) is the average pixel-wise absolute difference between the prediction map and the ground truth. The MAE can be expressed as:

$$\mathrm{MAE} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left| P(x, y) - G(x, y) \right|$$

where $P$ denotes the fire semantic segmentation map, $G$ is the corresponding ground truth, and $W$ and $H$ are the width and height of the map. Intersection over Union (IoU) is widely used in semantic segmentation [27] to evaluate the performance of an algorithm. It represents the degree of overlap between the prediction map and the ground truth. The IoU can be computed by

$$\mathrm{IoU} = \frac{|P \cap G|}{|P \cup G|}$$

The third evaluation metric is accuracy, defined as the ratio of the number of correctly predicted images (the IoU threshold is set to 0.4) to the total number of images. The accuracy can be expressed as:

$$\mathrm{Accuracy} = \frac{M}{N}$$

where $M$ is the number of correctly predicted images and $N$ is the total number of images.
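The three metrics can be computed as in the following plain-Python sketch. The binarization threshold of 0.5 used before computing IoU, the convention that an image counts as correct when its IoU is at least the threshold, and the function names are assumptions of this sketch; maps are nested lists with values in [0, 1].

```python
def mae(pred, gt):
    """Average pixel-wise absolute difference between prediction and ground truth."""
    h, w = len(pred), len(pred[0])
    return sum(abs(p - g) for pr, gr in zip(pred, gt) for p, g in zip(pr, gr)) / (h * w)


def iou(pred, gt, binarize_at=0.5):
    """Overlap of the fire regions after binarizing both maps."""
    inter = union = 0
    for pr, gr in zip(pred, gt):
        for p, g in zip(pr, gr):
            p_fire, g_fire = p >= binarize_at, g >= binarize_at
            inter += p_fire and g_fire
            union += p_fire or g_fire
    return inter / union if union else 1.0  # two empty maps agree perfectly


def accuracy(preds, gts, iou_threshold=0.4):
    """Share of images whose IoU reaches the threshold (M / N)."""
    correct = sum(iou(p, g) >= iou_threshold for p, g in zip(preds, gts))
    return correct / len(preds)


gt = [[1, 1], [0, 0]]
pred_a = [[1, 1], [1, 0]]   # one false-positive pixel
pred_b = [[0, 0], [1, 1]]   # no overlap with the ground truth
```

For this toy data, `mae(pred_a, gt)` is 0.25, `iou(pred_a, gt)` is 2/3, `iou(pred_b, gt)` is 0, and `accuracy([pred_a, pred_b], [gt, gt])` is 0.5.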

6.2 Implementation Details

In this paper, we adopt EFDNet [5] pre-trained on FSSD (encoder only) as our backbone. In the training stage, we resize each image to 320×320 with random flipping, then randomly crop a patch of size 288×288 for training. We use PyTorch to implement our method. Adaptive moment estimation (Adam) is applied to optimize all parameters of the network with a batch size of 8. The hyperparameter values are shown in Tab. 2, following the settings in [5]. To prevent the model from falling into a suboptimal solution, we adopt the "poly" learning rate policy, with an initial learning rate of 1e-5 for the backbone and 0.001 for the other parts. Following [21], the maximum number of training epochs for all methods is set to 30.
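The "poly" policy is commonly written as lr = base_lr × (1 − step / max_steps)^power. A minimal sketch with the two initial learning rates from the text is shown below; the exponent 0.9 is a frequently used value and is an assumption here, as the text does not state it, and the function name is hypothetical.

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Polynomial learning rate decay from base_lr at step 0 to 0 at max_steps."""
    return base_lr * (1.0 - step / max_steps) ** power


max_steps = 100
backbone_lrs = [poly_lr(1e-5, s, max_steps) for s in range(max_steps + 1)]
head_lrs = [poly_lr(0.001, s, max_steps) for s in range(max_steps + 1)]
```

Both schedules start at their initial value and decay monotonically to zero, so the backbone stays two orders of magnitude below the other parts throughout training.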

Table 2: Hyperparameter values

6.3 Ablation Study

In this section, a series of ablation studies is performed to investigate the effect of the proposed GPG and MEI modules. As shown in Tab. 3, the baseline, which does not contain any optimization, achieves 0.008 and 88.3% in terms of MAE and IoU, respectively. With the GPG module applied, both IoU and MAE improve; the MAE decreases by 50.0% compared with the baseline. The IoU with GPG is 91.5%, which outperforms the baseline by 3.2 percentage points, demonstrating that using accurate top-level position information to restrain local fire segmentation errors is very effective. When MEI and GPG are combined, the performance of the proposed approach improves further. In terms of MAE, the final model achieves 0.002, a 75.0% reduction compared with the baseline and a further 50.0% reduction compared with GPG alone. Furthermore, the final model improves the IoU from 91.5% with GPG alone to 94.1%.
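The relative changes in this paragraph can be checked directly from the reported values. In the sketch below, the GPG-only MAE is not quoted in the text; it is inferred from the stated 50.0% reduction relative to the baseline, and the helper name is hypothetical.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


baseline_mae, final_mae = 0.008, 0.002
gpg_mae = baseline_mae * (1 - 0.50)  # inferred from the reported 50.0% reduction

final_vs_baseline = relative_reduction(baseline_mae, final_mae)  # about 0.75
final_vs_gpg = relative_reduction(gpg_mae, final_mae)            # about 0.50
iou_gain_gpg = 91.5 - 88.3                                       # about 3.2 points
iou_gain_final = 94.1 - 91.5                                     # about 2.6 points
```

With these values, an MAE of 0.002 corresponds to a reduction of about 75% relative to the baseline of 0.008 and about 50% relative to the inferred GPG-only value of 0.004.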

Table 3: Quantitative results of the ablation experiment with different components on DS01

6.4 Compared with Existing Deep Learning Based Segmentation Methods

In this section, to demonstrate the performance of our method, we compare it with 9 segmentation methods (5 semantic segmentation methods [16-20] and 4 salient object detection methods [21-24]). For a fair comparison, the fire semantic segmentation results of the different methods are obtained by running their released code with the default parameters. Moreover, we pre-train all encoders on FSSD.

The quantitative comparison results on our created benchmark are shown in Tabs. 4 and 5. Compared with other methods, our method achieves the best performance. In terms of MAE, the proposed method outperforms the other methods by a large margin on all five test sets. The IoU evaluation metric is widely used in the semantic segmentation task. Our method improves it from 93.2% to 94.1% on DS01. Besides, we use accuracy as an evaluation metric for image-level fire detection. The results show that our method achieves an accuracy of 96.2%, which outperforms other methods by a large margin (the threshold $T$ is set to 0.6).

Table 4: The quantitative comparison results with existing semantic segmentation methods on the FSSD dataset. The best result of each evaluation metric is highlighted in boldface

Table 5: The quantitative comparison results with existing salient object detection methods on the FSSD dataset. The best result of each evaluation metric is highlighted in boldface

To compare the performance of different methods comprehensively, we present some visual results. As illustrated in Fig. 5, our method performs better than the previous semantic segmentation methods. Specifically, the proposed method not only highlights the correct fire regions clearly but also suppresses background noise well. Besides, it is robust in dealing with flame-like objects (row 1) and low-contrast backgrounds (row 4). Moreover, compared with other methods, the fire boundary generated by the proposed method is more accurate.

Figure 5: Some visual results of different methods. Each row stands for one original image and the corresponding fire semantic segmentation maps. Each column represents the predictions of one method

6.5 Analysis of Model Parameters

In this subsection, we analyze the parameter sizes of different methods. The results are shown in Tab. 6. The proposed method has only 6.9 MB of parameters, which makes it suitable for resource-constrained devices. Compared with the second-best method, this is a 72.9% reduction.

Table 6: The parameter size of different methods

In this paper, a method based on global position guidance and multi-path explicit edge information interaction is proposed for fire semantic segmentation. First, the existing literature shows that it is challenging to accurately separate fire from diverse backgrounds and flame-like objects. To this end, considering the accurate position information contained in top-level features, we propose a global position guidance module to restrain the feature offset in the low-level feature space, thereby correcting local segmentation errors. Besides, to obtain more accurate boundary predictions, we first explicitly extract the edge information through strong supervision. Then, a multi-path information interaction is designed to refine the coarse segmentation. Experimental results on the FSSD dataset show that the proposed method outperforms previous state-of-the-art methods under three evaluation metrics.

In future work, we intend to introduce multitask learning to further improve the performance of the model, and multi-scale feature extraction to deal with small flame segmentation. Besides, a fast and small model that can be easily deployed on resource-limited mobile devices will also be considered.

Funding Statement: This work was supported in part by the Important Science and Technology Project of Hainan Province under Grant ZDKJ2020010, and in part by the Frontier Exploration Project Independently Deployed by the Institute of Acoustics, Chinese Academy of Sciences under Grant QYTS202015 and Grant QYTS202115.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
