Caffe Time (3): Hands-on with FCN-AlexNet (Part 1)

Training FCN on grayscale images (a step-by-step walkthrough, from training to validation)

First, we assume you have already built Caffe successfully and have its Python interface working; without these, the steps below are unlikely to succeed. Most setup trouble comes down to environment variables, which caused me plenty of grief as well; I'll cover how to configure them in a separate post.

Problems encountered:

First, a record of some problems you may run into. Don't panic when something breaks: Google a lot, analyze carefully, and talk things over with others, and most issues resolve themselves:

  1. status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR Check failure stack trace: Aborted (core dumped) — this error usually means your cuDNN and CUDA versions don't match; updating to matching versions generally fixes it.
  2. warning: libOpenCV_core.so.3.1, needed by //usr/local/lib/libopencv_imgcodecs.so, may conflict with libopencv_core.so.2.4 — this warning indicates an OpenCV version conflict; running sudo apt-get autoremove libopencv-dev usually resolves it.
  3. Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR — this one is usually caused by bad input. Check the format of your input dataset and make sure every image has three channels; if yours are binary single-channel grayscale images, you can convert them with MATLAB (covered in an earlier post of mine), or with the Python sketch right after this list.
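
For that conversion, a minimal Python sketch also works (assuming Pillow is installed; input.png and output.png are placeholder paths):

from PIL import Image

# Replicate the single channel into R, G and B so Caffe's
# data pipeline sees the 3-channel image it expects.
im = Image.open('input.png')    # hypothetical input path
im = im.convert('RGB')
im.save('output.png')           # hypothetical output path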

Here are some friendly links — a few good Caffe discussion (QQ) groups:

  • 453955686 Caffe Deep Learning Group 3
  • 560233379 Image Semantic Segmentation
  • 287096310 AI / Machine Learning
  • 255482257 Caffe / TensorFlow AI
  • 91734887 Caffe Deep Learning Group

Reference links:

Portal → the main reference post

This post covers the case where the input is grayscale images; the next one will cover RGB input and the problems encountered there.

The input images are the ones produced in an earlier post, "Preprocessing Ultra Sound Images".

File structure:

The model used here is the official voc-fcn-alexnet; link → fcn.berkeleyvision.org

Other models work much the same way; adapt the steps to your own situation.

First, a look at the file structure ↓

Configuring the files:

The figures below show img and cls under the train set ↓ (the val set has the same structure and is not shown).

train.txt and val.txt

A look at train.txt and val.txt (the numbers in the first column are gedit's line numbers; ignore them):

# Code that generates train.txt. The names are all zero-padded numbers,
# so we simply write them to the file; another approach is shown in the
# next post. (str(i).zfill(6) replaces the original three padding loops
# and produces the identical 000001 .. 000147 output.)
f = open('train.txt', 'w')
for i in range(1, 148):
    f.write(str(i).zfill(6) + '\n')
f.close()
# Code that generates val.txt (000295 .. 000310)
f = open('val.txt', 'w')
for i in range(295, 311):
    f.write(str(i).zfill(6) + '\n')
f.close()
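
If your file names are not a contiguous numeric range, a sketch along these lines (assuming the train/img layout from the diagram above) builds the list straight from the directory contents:

import os

# Write one basename per line, without the .png extension,
# matching what voc_layers.py expects when it builds paths.
img_dir = 'train/img'  # hypothetical relative path; adjust to your layout
names = sorted(os.path.splitext(f)[0] for f in os.listdir(img_dir)
               if f.endswith('.png'))
with open('train.txt', 'w') as f:
    f.write('\n'.join(names) + '\n')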

solver.prototxt

First, configure solver.prototxt, which holds the hyperparameters for the training run.

# Configure the two lines below for your own setup; the snapshot_prefix path further down must also be changed. I did not touch the other values.
train_net: "/home/deep/fcn.berkeleyvision.org-master/voc-fcn-alexnet/train.prototxt"
test_net: "/home/deep/fcn.berkeleyvision.org-master/voc-fcn-alexnet/val.prototxt"
test_iter: 736
# make test net, but don't invoke it from the solver itself
test_interval: 999999999
display: 20
average_loss: 20
lr_policy: "fixed"
# lr for normalized softmax
base_lr: 1e-4
# standard momentum
momentum: 0.9
# gradient accumulation
iter_size: 20
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "/home/deep/fcn.berkeleyvision.org-master/voc-fcn-alexnet/snapshot/train"
test_initialization: false
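
Before committing to a long run, it can be worth checking that the solver and both net definitions actually parse and that all paths resolve. A minimal sketch, assuming pycaffe is importable and it is run from the voc-fcn-alexnet directory (so the voc_layers module can be found):

import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')  # also parses train.prototxt and val.prototxt
solver.step(1)   # one solver iteration; fails fast on bad paths or blob-shape mismatches
print(solver.net.blobs['score'].data.shape)  # expect (1, 2, H, W) for the 2-class net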

train.prototxt

See the inline comments for details.

layer {
name: "data"
type: "Python"
top: "data"
top: "label"
python_param {
module: "voc_layers"
layer: "SBDDSegDataLayer"
param_str: "{\'sbdd_dir\': \'/home/deep/fcn.berkeleyvision.org-master/voc-fcn-alexnet/Mimage\', \'seed\': 1337, \'split\': \'train\', \'mean\': (125.7807, 125.7807, 125.7807)}" # The first entry is the root directory of the training data; with the directory layout from the diagram above it is written as shown here, otherwise adjust it. I left seed unchanged. split must be train (it selects the train list). The mean values must be passed in BGR order, as explained in voc_layers.py further down.
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
pad: 100
kernel_size: 11
group: 1
stride: 4
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
stride: 1
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm2"
type: "LRN"
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "norm2"
top: "conv3"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 1
stride: 1
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
stride: 1
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
stride: 1
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "Convolution"
bottom: "pool5"
top: "fc6"
convolution_param {
num_output: 4096
pad: 0
kernel_size: 6
group: 1
stride: 1
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "Convolution"
bottom: "fc6"
top: "fc7"
convolution_param {
num_output: 4096
pad: 0
kernel_size: 1
group: 1
stride: 1
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "score_fr_n" #add _n 这里要重新命名,防止直接利用原来的参数,我们要用新的这个配置来初始化网络,这里加了一个"_n"
type: "Convolution"
bottom: "fc7"
top: "score_fr"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 2 # 21 -> 2: this is a binary segmentation problem, so the output becomes 2; set this to your own number of classes
pad: 0
kernel_size: 1
}
}
layer {
name: "upscore_n" #这里也要改名
type: "Deconvolution"
bottom: "score_fr"
top: "upscore"
param {
lr_mult: 0
}
convolution_param {
num_output: 2 # 21 -> 2 here too
bias_term: false
kernel_size: 63
stride: 32
}
}
layer {
name: "score"
type: "Crop"
bottom: "upscore"
bottom: "data"
top: "score"
crop_param {
axis: 2
offset: 18
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score"
bottom: "label"
top: "loss"
loss_param {
ignore_label: 255
normalize: true
}
}
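
The mean tuple in param_str is dataset-specific. A sketch for computing it over your own training images (assuming the Mimage/train/img layout used above; since the images are grayscale replicated to three channels, all three values come out equal):

import os
import numpy as np
from PIL import Image

img_dir = 'Mimage/train/img'  # adjust to your layout
total, count = np.zeros(3), 0
for name in os.listdir(img_dir):
    arr = np.array(Image.open(os.path.join(img_dir, name)).convert('RGB'),
                   dtype=np.float64)
    total += arr.reshape(-1, 3).mean(axis=0)   # per-channel mean of this image
    count += 1
mean_rgb = total / count
print('BGR mean for param_str:', tuple(mean_rgb[::-1]))  # reverse RGB -> BGR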

val.prototxt

See the inline comments for details.

layer {
name: "data"
type: "Python"
top: "data"
top: "label"
python_param {
module: "voc_layers"
layer: "VOCSegDataLayer"
param_str: "{\'voc_dir\': \'/home/deep/fcn.berkeleyvision.org-master/voc-fcn-alexnet/Mimage\', \'seed\': 1337, \'split\': \'val\', \'mean\': (126.0610, 126.0610, 126.0610)}" # Same kind of changes as above; make absolutely sure the path is correct!! Since the three RGB channels are identical here, the three mean values are the same.
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
pad: 100
kernel_size: 11
group: 1
stride: 4
engine: CAFFE
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
stride: 1
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm2"
type: "LRN"
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "norm2"
top: "conv3"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 1
stride: 1
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
stride: 1
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
stride: 1
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "Convolution"
bottom: "pool5"
top: "fc6"
convolution_param {
num_output: 4096
pad: 0
kernel_size: 6
group: 1
stride: 1
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "Convolution"
bottom: "fc6"
top: "fc7"
convolution_param {
num_output: 4096
pad: 0
kernel_size: 1
group: 1
stride: 1
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "score_fr_n" #add _n 这里改名
type: "Convolution"
bottom: "fc7"
top: "score_fr"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 2 # 21 -> 2
pad: 0
kernel_size: 1
}
}
layer {
name: "upscore_n" #记得改名
type: "Deconvolution"
bottom: "score_fr"
top: "upscore"
param {
lr_mult: 0
}
convolution_param {
num_output: 2 # 21 -> 2
bias_term: false
kernel_size: 63
stride: 32
}
}
layer {
name: "score"
type: "Crop"
bottom: "upscore"
bottom: "data"
top: "score"
crop_param {
axis: 2
offset: 18
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "score"
bottom: "label"
top: "loss"
loss_param {
ignore_label: 255
normalize: true
}
}
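
The data layer parses param_str with Python's eval (see voc_layers.py below), so quoting mistakes surface as confusing Caffe errors. You can check that the string parses as a dict before training; a sketch:

# Paste your param_str here exactly as it appears in the prototxt
# (minus the outer double quotes and backslash escapes).
param_str = "{'voc_dir': '/home/deep/fcn.berkeleyvision.org-master/voc-fcn-alexnet/Mimage', 'seed': 1337, 'split': 'val', 'mean': (126.0610, 126.0610, 126.0610)}"
params = eval(param_str)  # same parsing the layer performs in setup()
print(params['voc_dir'], params['split'], params['mean'])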

voc_layers.py

This file is FCN's data layer; it is how the data gets wired into the network. See the inline comments for details.

It has two parts: VOCSegDataLayer, which serves the validation (test) data, and SBDDSegDataLayer, which serves the training data.

import caffe

import numpy as np
from PIL import Image

import random

# val / test
class VOCSegDataLayer(caffe.Layer):
    """
    Load (input image, label image) pairs from PASCAL VOC
    one-at-a-time while reshaping the net to preserve dimensions.

    Use this to feed data to a fully convolutional network.
    """

    def setup(self, bottom, top):
        """
        Setup data layer according to parameters:

        - voc_dir: path to PASCAL VOC year dir
        - split: train / val / test
        - mean: tuple of mean values to subtract
        - randomize: load in random order (default: True)
        - seed: seed for randomization (default: None / current time)

        for PASCAL VOC semantic segmentation.

        example

        params = dict(voc_dir="/path/to/PASCAL/VOC2011",
            mean=(104.00698793, 116.66876762, 122.67891434),
            split="val")
        """
        # config
        params = eval(self.param_str)
        self.voc_dir = params['voc_dir']
        self.split = params['split']
        self.mean = np.array(params['mean'])
        self.random = params.get('randomize', True)
        self.seed = params.get('seed', None)

        # two tops: data and label
        if len(top) != 2:
            raise Exception("Need to define two tops: data and label.")
        # data layers have no bottoms
        if len(bottom) != 0:
            raise Exception("Do not define a bottom.")

        # load indices for images and labels
        split_f = '{}/{}.txt'.format(self.voc_dir,
                self.split)
        self.indices = open(split_f, 'r').read().splitlines()
        self.idx = 0

        # make eval deterministic
        if 'train' not in self.split:
            self.random = False

        # randomization: seed and pick
        if self.random:
            random.seed(self.seed)
            self.idx = random.randint(0, len(self.indices)-1)

    def reshape(self, bottom, top):
        # load image + label image pair
        self.data = self.load_image(self.indices[self.idx])
        self.label = self.load_label(self.indices[self.idx])
        # reshape tops to fit (leading 1 is for batch dimension)
        top[0].reshape(1, *self.data.shape)
        top[1].reshape(1, *self.label.shape)

    def forward(self, bottom, top):
        # assign output
        top[0].data[...] = self.data
        top[1].data[...] = self.label

        # pick next input
        if self.random:
            self.idx = random.randint(0, len(self.indices)-1)
        else:
            self.idx += 1
            if self.idx == len(self.indices):
                self.idx = 0

    def backward(self, top, propagate_down, bottom):
        pass

    def load_image(self, idx):
        """
        Load input image and preprocess for Caffe:
        - cast to float
        - switch channels RGB -> BGR  (images are read as RGB and converted to
          BGR here, which is why the mean passed in val.prototxt must be in
          BGR order)
        - subtract mean
        - transpose to channel x height x width order
        """
        # Path to the raw val images; with the directory layout from the
        # diagram above it is written like this, otherwise adjust it.
        im = Image.open('{}/val/img/{}.png'.format(self.voc_dir, idx))
        in_ = np.array(im, dtype=np.float32)
        in_ = in_[:,:,::-1]  # keep the first two dims, reverse the last one (plain Python slicing)
        in_ -= self.mean
        in_ = in_.transpose((2,0,1))
        return in_

    def load_label(self, idx):
        """
        Load label image as 1 x height x width integer array of label indices.
        The leading singleton dimension is required by the loss.
        """
        # Path to the val label images
        im = Image.open('{}/val/cls/{}.png'.format(self.voc_dir, idx))
        label = np.array(im, dtype=np.uint8)
        label = label[np.newaxis, ...]
        return label


# train
class SBDDSegDataLayer(caffe.Layer):
    """
    Load (input image, label image) pairs from the SBDD extended labeling
    of PASCAL VOC for semantic segmentation
    one-at-a-time while reshaping the net to preserve dimensions.

    Use this to feed data to a fully convolutional network.
    """

    def setup(self, bottom, top):
        """
        Setup data layer according to parameters:

        - sbdd_dir: path to SBDD `dataset` dir
        - split: train / seg11valid
        - mean: tuple of mean values to subtract
        - randomize: load in random order (default: True)
        - seed: seed for randomization (default: None / current time)

        for SBDD semantic segmentation.

        N.B.segv11alid is the set of segval11 that does not intersect with SBDD.
        Find it here: https://gist.github.com/shelhamer/edb330760338892d511e.

        example

        params = dict(sbdd_dir="/path/to/SBDD/dataset",
            mean=(104.00698793, 116.66876762, 122.67891434),
            split="valid")
        """
        # config
        params = eval(self.param_str)
        self.sbdd_dir = params['sbdd_dir']
        self.split = params['split']
        self.mean = np.array(params['mean'])
        self.random = params.get('randomize', True)
        self.seed = params.get('seed', None)

        # two tops: data and label
        if len(top) != 2:
            raise Exception("Need to define two tops: data and label.")
        # data layers have no bottoms
        if len(bottom) != 0:
            raise Exception("Do not define a bottom.")

        # load indices for images and labels (same as above)
        split_f = '{}/{}.txt'.format(self.sbdd_dir,
                self.split)
        self.indices = open(split_f, 'r').read().splitlines()
        self.idx = 0

        # make eval deterministic
        if 'train' not in self.split:
            self.random = False

        # randomization: seed and pick
        if self.random:
            random.seed(self.seed)
            self.idx = random.randint(0, len(self.indices)-1)

    def reshape(self, bottom, top):
        # load image + label image pair
        self.data = self.load_image(self.indices[self.idx])
        self.label = self.load_label(self.indices[self.idx])
        # reshape tops to fit (leading 1 is for batch dimension)
        top[0].reshape(1, *self.data.shape)
        top[1].reshape(1, *self.label.shape)

    def forward(self, bottom, top):
        # assign output
        top[0].data[...] = self.data
        top[1].data[...] = self.label

        # pick next input
        if self.random:
            self.idx = random.randint(0, len(self.indices)-1)
        else:
            self.idx += 1
            if self.idx == len(self.indices):
                self.idx = 0

    def backward(self, top, propagate_down, bottom):
        pass

    def load_image(self, idx):
        """
        Load input image and preprocess for Caffe:
        - cast to float
        - switch channels RGB -> BGR
        - subtract mean
        - transpose to channel x height x width order
        """
        im = Image.open('{}/train/img/{}.png'.format(self.sbdd_dir, idx))  # raw train images
        in_ = np.array(im, dtype=np.float32)
        in_ = in_[:,:,::-1]
        in_ -= self.mean
        in_ = in_.transpose((2,0,1))
        return in_

    def load_label(self, idx):
        """
        Load label image as 1 x height x width integer array of label indices.
        The leading singleton dimension is required by the loss.
        """
        im = Image.open('{}/train/cls/{}.png'.format(self.sbdd_dir, idx))  # train labels
        label = np.array(im, dtype=np.uint8)
        label = label[np.newaxis, ...]
        return label
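
You can exercise the same preprocessing outside Caffe to confirm one sample loads cleanly. A sketch, assuming it is run from the voc-fcn-alexnet directory with the layout above, and that 000001 is a name from train.txt:

import numpy as np
from PIL import Image

idx = '000001'  # hypothetical sample name from train.txt
im = Image.open('Mimage/train/img/{}.png'.format(idx))
in_ = np.array(im, dtype=np.float32)[:, :, ::-1]       # RGB -> BGR
in_ -= np.array((125.7807, 125.7807, 125.7807))        # mean from train.prototxt
in_ = in_.transpose((2, 0, 1))                          # HWC -> CHW
label = np.array(Image.open('Mimage/train/cls/{}.png'.format(idx)), dtype=np.uint8)
print(in_.shape, label.shape, np.unique(label))
# labels should be {0, 1} (plus 255 for ignored pixels, if any)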

solve.py

This is the script we ultimately run; it plays the role of the train.sh used when driving Caffe from the command line. See the inline comments for details.

import caffe
import surgery, score

import numpy as np
import os
import sys

try:
    import setproctitle
    setproctitle.setproctitle(os.path.basename(os.getcwd()))
except:
    pass

# Load the original pretrained model; it can live anywhere,
# as long as the path here points to it.
weights = '/home/deep/fcn.berkeleyvision.org-master/ilsvrc-nets/fcn-alexnet-pascal.caffemodel'

# init
#caffe.set_device(int(sys.argv[1]))
#caffe.set_device(int(0))
caffe.set_mode_gpu()  # run on the GPU
#caffe.set_mode_cpu()

solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from(weights)

# surgeries
# This line explains why the layers had to be renamed in train.prototxt and
# val.prototxt: every layer whose name contains 'up' is re-initialized here.
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)

# scoring
# This must be the val .txt file!
val = np.loadtxt('/home/deep/fcn.berkeleyvision.org-master/voc-fcn-alexnet/Mimage/val/val.txt', dtype=str)

for _ in range(25):
    solver.step(4000)
    score.seg_tests(solver, False, val, layer='score')
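
As an aside on the surgeries step above: surgery.interp fills every selected Deconvolution layer with a fixed bilinear-interpolation kernel, which is also why upscore_n must keep "up" in its name. The kernel it builds looks roughly like this (paraphrased from surgery.py in the FCN repository):

import numpy as np

def upsample_filt(size):
    # 2-D bilinear kernel suitable for initializing a Deconvolution layer
    factor = (size + 1) // 2
    if size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

print(upsample_filt(4))  # e.g. the kernel for a stride-2, kernel-4 deconv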

Finally, open a terminal under voc-fcn-alexnet and run python solve.py to start training!

Testing the trained model

For testing we need to add two more files under the voc-fcn-alexnet folder: infer.py and deploy.prototxt. infer.py can be found under fcn.berkeleyvision.org-master; deploy.prototxt we have to create ourselves.

infer.py

First, modify infer.py. After the changes it reads one test image and uses the model we fine-tuned from the original weights to output a segmentation map for that image. See the inline comments for details.

import numpy as np
from PIL import Image

import caffe
import cv2  # cv2 is used to save the result image

# load image, switch to BGR, subtract mean, and make dims C x H x W for Caffe
im = Image.open('/home/deep/fcn.berkeleyvision.org-master/000308.png')  # path of the image to test
in_ = np.array(im, dtype=np.float32)
in_ = in_[:,:,::-1]
in_ -= np.array((125.92085,125.92085,125.92085))  # mean of the test image, passed in BGR order
in_ = in_.transpose((2,0,1))

# load net
net = caffe.Net('/home/deep/fcn.berkeleyvision.org-master/voc-fcn-alexnet/deploy.prototxt', '/home/deep/fcn.berkeleyvision.org-master/voc-fcn-alexnet/snapshot/train_iter_100000.caffemodel', caffe.TEST)
# shape for input (data blob is N x C x H x W), set data
# N: batch size
# C: number of channels
# H: height of each channel
# W: width of each channel
net.blobs['data'].reshape(1, *in_.shape)
net.blobs['data'].data[...] = in_
# run net and take argmax for prediction
net.forward()
out = net.blobs['score'].data[0].argmax(axis=0)
# the final two lines, added to save the generated image
# (class 1 is written as 0/black, class 0 as 255/white)
result = np.expand_dims(np.array((-255.) * (out-1.)).astype(np.float32), axis = 2)
cv2.imwrite("result308.png", result)
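
To eyeball the result against the input, a small sketch (assuming cv2 is available and the same paths as above) blends the predicted mask over the test image:

import cv2

# infer.py writes class 1 as black (0) and class 0 as white (255),
# so predicted-foreground pixels are the dark ones.
img = cv2.imread('/home/deep/fcn.berkeleyvision.org-master/000308.png')
mask = cv2.imread('result308.png', cv2.IMREAD_GRAYSCALE)
overlay = img.copy()
overlay[mask < 128] = (0, 0, 255)                 # mark foreground in red (BGR)
blended = cv2.addWeighted(img, 0.6, overlay, 0.4, 0)
cv2.imwrite('overlay308.png', blended)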

deploy.prototxt

We create this file ourselves: copy train.prototxt or val.prototxt and make a few changes.

In short, the process is: replace the head, drop the tail, and tweak the middle.

# First, remove the data layer from the original .prototxt and replace it with the Input layer below, since here we only feed in a single test image
layer{
name: "input"
type: "Input"
top: "data"
input_param{
shape{dim: 1 dim: 3 dim: 440 dim: 501} # mainly change the last two dims: the image Height and Width
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
pad: 100
kernel_size: 11
group: 1
stride: 4
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
stride: 1
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm2"
type: "LRN"
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "norm2"
top: "conv3"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 1
stride: 1
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
stride: 1
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
stride: 1
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "Convolution"
bottom: "pool5"
top: "fc6"
convolution_param {
num_output: 4096
pad: 0
kernel_size: 6
group: 1
stride: 1
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
# From the official complete voc-fcn8s demo you can see that deploy.prototxt also drops all Dropout layers; fcn-alexnet has two of them, and we delete both
layer {
name: "fc7"
type: "Convolution"
bottom: "fc6"
top: "fc7"
convolution_param {
num_output: 4096
pad: 0
kernel_size: 1
group: 1
stride: 1
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
# a Dropout layer was removed here as well
layer {
name: "score_fr_n" #这里也记得改名
type: "Convolution"
bottom: "fc7"
top: "score_fr"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 2
pad: 0
kernel_size: 1
}
}
layer {
name: "upscore_n" #这里也要像之前的.prototxt文件一样记得改名
type: "Deconvolution"
bottom: "score_fr"
top: "upscore"
param {
lr_mult: 0
}
convolution_param {
num_output: 2
bias_term: false
kernel_size: 63
stride: 32
}
}
layer {
name: "score"
type: "Crop"
bottom: "upscore"
bottom: "data"
top: "score"
crop_param {
axis: 2
offset: 18
}
}
# note that the original loss layer has been removed
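
The two trailing dims in the Input layer must match the test image. A one-off sketch to read them (assuming Pillow):

from PIL import Image

im = Image.open('/home/deep/fcn.berkeleyvision.org-master/000308.png')
w, h = im.size  # PIL gives (width, height)
print('shape{dim: 1 dim: 3 dim: %d dim: %d}' % (h, w))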

Run python infer.py in the terminal and you will see the output (for problems you may hit along the way, see the section "My comments under the reference blog post" below).

Testing shows the final labeling is still not ideal, because the dataset is simply too small. So please continue to the next post, where FCN training continues.

My comments under the reference blog post

These are replies I left under the blog post linked at the top; they record how I worked through the problems at the time. I hope they help you get the experiment running smoothly.

Replying to myself: the problem should be solved. The all-white output images turned out to be a matter of too few iterations. Impatient to see the model's effect, I tested with intermediate snapshot models while training was still running; but after roughly 19 hours, once all 100k iterations had finished, testing with the final model gave normal results.
To sum up, pay attention to the following points; I hope they help:

  • Change num_output of the final layers from 21 to 2 (i.e. your number of classes; use 3 for three classes, and so on)
  • Rename the Deconvolution layer exactly as the original poster did, but do not drop the "up" from its name
  • Rename the Deconvolution layer in the deploy file the same way as in the previous point. Also, comparing against the official voc-fcn8s demo, the deploy file not only replaces the first layer with Input and deletes the last layer, it removes the Dropout layers as well, so remember to do that too (the same applies to other networks)
  • Preferably wait until training finishes completely and test with the final model (luckily I waited it out; halfway through, the test results looked so bad I almost gave up. Patience, patience)
  • p.s. I also discovered TeamViewer: leave the machine running overnight and check the results remotely from somewhere else

That's roughly my experience and takeaways. I hope everyone manages to train a model they're happy with. Finally, many thanks to the author of the reference post. Thanks♪(・ω・)ノ

Hope you find this useful XD
