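OpenPose Source Code Walkthrough
OpenPose targets skeleton recognition, i.e. detecting (connected) keypoints. Methodologically it resembles YOLO and SSD: it predicts everything in a single pass rather than refining progressively as Mask R-CNN does, so it wins on speed at the cost of slightly lower accuracy.
Paper: OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. It dates from 2017 and has been cited about 13,000 times. I found it very inspiring; readers in a hurry can jump straight to the summary section.
Code: a PyTorch port, and the official openpose.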
1. Dataset
How the COCO 2017 data is organized
$ cat person_keypoints_val2017.json|jq '.|keys'|less
[
"annotations",
"categories",
"images",
"info",
"licenses"
]
For an image object, the fields that matter are id and the path (file_name).
$ cat person_keypoints_val2017.json|jq '.|.images[]'|less
{
"license": 4,
"file_name": "000000397133.jpg",
"coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg",
"height": 427,
"width": 640,
"date_captured": "2013-11-14 17:02:52",
"flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg",
"id": 397133
}
This file has only one category; it is class-level metadata, not an actual annotation.
$ cat person_keypoints_val2017.json|jq '.|.categories'
[
{
"supercategory": "person",
"id": 1,
"name": "person",
"keypoints": [
"nose",
"left_eye",
"right_eye",
"left_ear",
"right_ear",
"left_shoulder",
"right_shoulder",
"left_elbow",
"right_elbow",
"left_wrist",
"right_wrist",
"left_hip",
"right_hip",
"left_knee",
"right_knee",
"left_ankle",
"right_ankle"
],
"skeleton": [
[
16,
14
],
[
14,
12
],
...
[
5,
7
]
]
}
]
The important information lives in the annotations field.
$ cat person_keypoints_val2017.json|jq '.|.annotations[]|keys'|less
[
"area",
"bbox",
"category_id",
"id",
"image_id",
"iscrowd",
"keypoints",
"num_keypoints",
"segmentation"
]
$ cat person_keypoints_val2017.json|jq '.|.annotations[0]'
{
"segmentation": [
[
125.12,
539.69,
140.94,
522.43,
100.67,
496.54,
...
145.26,
567.01,
117.93,
551.19,
133.75,
532.49
]
],
"num_keypoints": 10,
"area": 47803.27955,
"iscrowd": 0,
"keypoints": [
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
142,
309,
1,
177,
320,
2,
191,
398,
2,
237,
317,
2,
233,
426,
2,
306,
233,
2,
92,
452,
2,
123,
468,
2,
0,
0,
0,
251,
469,
2,
0,
0,
0,
162,
551,
2
],
"image_id": 425226,
"bbox": [
73.35,
206.02,
300.58,
372.5
],
"category_id": 1,
"id": 183126
}
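segmentation is a polygon outlining the person and can be ignored here. keypoints holds the body keypoints: 51 numbers that reshape to (17, 3), i.e. 17 keypoints, each stored as (x, y, weight), where weight is COCO's visibility flag. weight=0 means not labeled, 1 means labeled but occluded, and 2 means labeled and visible; both 1 and 2 are usable.
If an image contains several people, there are several annotation objects, each carrying the fields above and linked to the same image through image_id. I had not expected this flat layout; I assumed the file would be organized per image. The advantage is clarity: like a tree that can also be stored as a node list, the file needs no deep nesting, and the tree is rebuilt in code.
In the validation file there are 11004 distinct annotation.id values referencing 2693 distinct annotation.image_id values, about 4 people per image on average.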
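The flat-list-to-tree step described above can be sketched in a few lines of plain Python. This is only an illustration: the helper names group_annotations_by_image and count_labeled are mine, not pycocotools functions, and the toy annotations are made up rather than taken from the real file.

```python
from collections import defaultdict


def group_annotations_by_image(annotations):
    """Rebuild the image -> persons tree from COCO's flat annotation list."""
    by_image = defaultdict(list)
    for ann in annotations:
        flat = ann["keypoints"]
        # Reshape the flat 51-number list into 17 (x, y, v) triples.
        triples = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
        by_image[ann["image_id"]].append({"id": ann["id"], "keypoints": triples})
    return dict(by_image)


def count_labeled(triples):
    """Count keypoints whose flag is 1 (occluded) or 2 (visible)."""
    return sum(1 for _, _, v in triples if v > 0)


# Toy data, invented for illustration only.
toy_annotations = [
    {"id": 1, "image_id": 10, "keypoints": [5, 6, 2] + [0, 0, 0] * 16},
    {"id": 2, "image_id": 10, "keypoints": [7, 8, 1, 9, 9, 2] + [0, 0, 0] * 15},
    {"id": 3, "image_id": 20, "keypoints": [0, 0, 0] * 17},
]
tree = group_annotations_by_image(toy_annotations)
print({image_id: len(persons) for image_id, persons in tree.items()})
```

pycocotools does the same grouping internally, which is why getAnnIds(imgIds=...) can return all persons of one image directly.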
The code that reads category/image/annotation with pycocotools is as follows:
class CocoKeypoints(torch.utils.data.Dataset):
"""`MS Coco Detection <http://mscoco.org/dataset/#detections-challenge2016>`_ Dataset.
Based on `torchvision.dataset.CocoDetection`.
Caches preprocessing.
Args:
root (string): Root directory where images are downloaded to.
annFile (string): Path to json annotation file.
transform (callable, optional): A function/transform that takes in an PIL image
and returns a transformed version. E.g, ``transforms.ToTensor``
target_transform (callable, optional): A function/transform that takes in the
target and transforms it.
"""
def __init__(self, root, annFile, image_transform=None, target_transforms=None,
n_images=None, preprocess=None, all_images=False, all_persons=False, input_y=368, input_x=368, stride=8):
from pycocotools.coco import COCO
self.root = root
self.coco = COCO(annFile)
self.cat_ids = self.coco.getCatIds(catNms=['person'])
if all_images:
self.ids = self.coco.getImgIds()
elif all_persons:
self.ids = self.coco.getImgIds(catIds=self.cat_ids)
else:
self.ids = self.coco.getImgIds(catIds=self.cat_ids)
self.filter_for_keypoint_annotations()
if n_images:
self.ids = self.ids[:n_images]
print('Images: {}'.format(len(self.ids)))
def filter_for_keypoint_annotations(self):
print('filter for keypoint annotations ...')
def has_keypoint_annotation(image_id):
ann_ids = self.coco.getAnnIds(imgIds=image_id, catIds=self.cat_ids)
anns = self.coco.loadAnns(ann_ids)
for ann in anns:
if 'keypoints' not in ann:
continue
if any(v > 0.0 for v in ann['keypoints'][2::3]):
return True
return False
self.ids = [image_id for image_id in self.ids
if has_keypoint_annotation(image_id)]
print('... done.')
2. Building the labels
This is the core logic: how the keypoint heatmaps and the limb vector fields are generated.
2.1 dataset.__getitem__
class CocoKeypoints(torch.utils.data.Dataset):
...
def __getitem__(self, index):
"""
Args:
index (int): Index
Returns:
tuple: Tuple (image, target). target is the object returned by ``coco.loadAnns``.
"""
image_id = self.ids[index]
# Get the ids of all annotations that belong to this image
ann_ids = self.coco.getAnnIds(imgIds=image_id, catIds=self.cat_ids)
# ids -> annotation dicts
anns = self.coco.loadAnns(ann_ids)
anns = copy.deepcopy(anns)
image_info = self.coco.loadImgs(image_id)[0]
self.log.debug(image_info)
with open(os.path.join(self.root, image_info['file_name']), 'rb') as f:
image = Image.open(f).convert('RGB')
meta_init = {
'dataset_index': index,
'image_id': image_id,
'file_name': image_info['file_name'],
}
image, anns, meta = self.preprocess(image, anns, None)
Parsing the JSON into np.array:
class Normalize(Preprocess):
@staticmethod
def normalize_annotations(anns):
anns = copy.deepcopy(anns)
# convert as much data as possible to numpy arrays to avoid every float
# being turned into its own torch.Tensor()
for ann in anns:
ann['keypoints'] = np.asarray(ann['keypoints'], dtype=np.float32).reshape(-1, 3)
ann['bbox'] = np.asarray(ann['bbox'], dtype=np.float32)
ann['bbox_original'] = np.copy(ann['bbox'])
del ann['segmentation']
return anns
def __call__(self, image, anns, meta):
anns = self.normalize_annotations(anns)
if meta is None:
w, h = image.size
meta = {
'offset': np.array((0.0, 0.0)),
'scale': np.array((1.0, 1.0)),
'valid_area': np.array((0.0, 0.0, w, h)),
'hflip': False,
'width_height': np.array((w, h)),
}
return image, anns, meta
After loading, anns is a list whose items look like:
{
'num_keypoints': 9,
'area': 14853.30435,
'iscrowd': 0,
'keypoints': array([[ 199.37424, 249.82243, 2. ], [ 201.70392, 251.37383, 2. ..., 17.88785, 0. ]], dtype=float32),
'image_id': 327701,
'bbox': array([126.61331 , 225.02682 , 119.978905, 118.364204], dtype=float32),
'category_id': 1,
'id': 444736,
'bbox_original': array([329.16, 266.89, 154.5 , 152.59], dtype=float32),
'valid_area': array([ 0., 18., 368., 332.])
}
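2.2 Keypoint heatmaps
Given the image and the annotations anns of every person in it, generate the keypoint heatmaps and the limb PAFs.
The heatmap logic: 17 keypoints plus the neck give 18 keypoints, and one background channel brings the total to 19 channels. Each keypoint gets its own map, onto which the same body part of every person in the image is accumulated. The resulting heatmap has shape (46, 46, 19).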
class CocoKeypoints(torch.utils.data.Dataset):
    def get_ground_truth(self, anns):
        grid_y = int(self.input_y / self.stride)
        grid_x = int(self.input_x / self.stride)  # labels are drawn on the (46, 46) grid that the 368-pixel input becomes after the stride-8 backbone
        channels_heat = (self.HEATMAP_COUNT + 1)
        channels_paf = 2 * len(self.LIMB_IDS)
        heatmaps = np.zeros((int(grid_y), int(grid_x), channels_heat))
        pafs = np.zeros((int(grid_y), int(grid_x), channels_paf))
        keypoints = []
        for ann in anns:
            single_keypoints = np.array(ann['keypoints']).reshape(17, 3)
            single_keypoints = self.add_neck(single_keypoints)
            keypoints.append(single_keypoints)
        keypoints = np.array(keypoints)
        keypoints = self.remove_illegal_joint(keypoints)
        # confidence maps for body parts
        for i in range(self.HEATMAP_COUNT):
            joints = [jo[i] for jo in keypoints]  # joint type i of every person
            for joint in joints:  # iterate over each point
                if joint[2] > 0.5:  # 1 = labeled but occluded, 2 = labeled and visible
                    center = joint[:2]  # point coordinates
                    gaussian_map = heatmaps[:, :, i]
                    heatmaps[:, :, i] = putGaussianMaps(
                        center, gaussian_map,
                        7.0, grid_y, grid_x, self.stride)
        # paf
        ...
        # background
        heatmaps[:, :, -1] = np.maximum(
            1 - np.max(heatmaps[:, :, :self.HEATMAP_COUNT], axis=2),
            0.
        )
        return heatmaps, pafs
The heatmap uses a Gaussian kernel, with values clipped to the range 0 to 1.
def putGaussianMaps(center, accumulate_confid_map, sigma, grid_y, grid_x, stride):
    start = stride / 2.0 - 0.5
    y_range = [i for i in range(int(grid_y))]
    x_range = [i for i in range(int(grid_x))]
    xx, yy = np.meshgrid(x_range, y_range)
    xx = xx * stride + start
    yy = yy * stride + start
    d2 = (xx - center[0]) ** 2 + (yy - center[1]) ** 2
    exponent = d2 / 2.0 / sigma / sigma
    mask = exponent <= 4.6052
    cofid_map = np.exp(-exponent)
    cofid_map = np.multiply(mask, cofid_map)
    accumulate_confid_map += cofid_map  # points of several people accumulate
    accumulate_confid_map[accumulate_confid_map > 1.0] = 1.0
    return accumulate_confid_map
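To see what the magic number 4.6052 does, here is a minimal single-keypoint sketch of the same formula. Since 4.6052 is approximately ln(100), every response below about 0.01 is zeroed. The function gaussian_map and the keypoint position are my own illustration, not part of the repository.

```python
import numpy as np


def gaussian_map(center, sigma, grid_y, grid_x, stride):
    """Confidence map for one keypoint, sampled at the centre of each grid cell."""
    start = stride / 2.0 - 0.5  # input-pixel coordinate of the centre of cell 0
    xs = np.arange(grid_x) * stride + start
    ys = np.arange(grid_y) * stride + start
    xx, yy = np.meshgrid(xs, ys)
    exponent = ((xx - center[0]) ** 2 + (yy - center[1]) ** 2) / (2.0 * sigma ** 2)
    # 4.6052 ~= ln(100): responses below about 0.01 are cut to exactly 0.
    return np.where(exponent <= 4.6052, np.exp(-exponent), 0.0)


# A hypothetical keypoint at input pixel (x=100, y=200) in a 368x368 crop.
heat = gaussian_map(center=(100.0, 200.0), sigma=7.0, grid_y=46, grid_x=46, stride=8)
peak_row, peak_col = np.unravel_index(heat.argmax(), heat.shape)
background = np.maximum(1.0 - heat, 0.0)  # background channel for this single map
print(heat.shape, (int(peak_row), int(peak_col)))
```

With sigma=7 and stride=8 the non-zero blob has a radius of roughly 21 input pixels, i.e. under 3 grid cells, which is why the ground-truth heatmaps look like small dots.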
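2.3 Limb vector fields (PAFs)
Loop over the (centerA, centerB) pairs that make up each limb; if either endpoint is unlabeled, skip it. Each body is given 19 limbs, so the field has shape (46, 46, 19, 2), where the 2 holds the (x, y) components of the vector. The code handles it as the reshaped (46, 46, 38). Predicting a "field" is an interesting idea, somewhat reminiscent of how differential equations are solved.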
class CocoKeypoints(torch.utils.data.Dataset):
    ...
        # pafs
        for i, (k1, k2) in enumerate(self.LIMB_IDS):
            # limb
            count = np.zeros((int(grid_y), int(grid_x)), dtype=np.uint32)  # how many limbs have covered each cell so far
            for joint in keypoints:
                if joint[k1, 2] > 0.5 and joint[k2, 2] > 0.5:
                    centerA = joint[k1, :2]
                    centerB = joint[k2, :2]
                    vec_map = pafs[:, :, 2 * i:2 * (i + 1)]  # the x and y channels of this limb
                    pafs[:, :, 2 * i:2 * (i + 1)], count = putVecMaps(
                        centerA=centerA,
                        centerB=centerB,
                        accumulate_vec_map=vec_map,
                        count=count, grid_y=grid_y, grid_x=grid_x, stride=self.stride
                    )
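When several people are present, the heatmaps are aggregated by summation (clipped at 1). The limb rectangles form a vector map: if limbs of several people pass through the same cell, the vector stored there is the average of the vectors passing through it.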
def putVecMaps(centerA, centerB, accumulate_vec_map, count, grid_y, grid_x, stride):
    centerA = centerA.astype(float)
    centerB = centerB.astype(float)
    thre = 1  # limb width
    centerB = centerB / stride  # map to feature-map coordinates
    centerA = centerA / stride
    limb_vec = centerB - centerA
    norm = np.linalg.norm(limb_vec)  # length of the limb
    if (norm == 0.0):
        # print 'limb is too short, ignore it...'
        return accumulate_vec_map, count
    limb_vec_unit = limb_vec / norm  # unit vector
    # print 'limb unit vector: {}'.format(limb_vec_unit)
    # To make sure not beyond the border of this two points
    min_x = max(int(round(min(centerA[0], centerB[0]) - thre)), 0)  # bounding box of all candidate cells
    max_x = min(int(round(max(centerA[0], centerB[0]) + thre)), grid_x)
    min_y = max(int(round(min(centerA[1], centerB[1]) - thre)), 0)
    max_y = min(int(round(max(centerA[1], centerB[1]) + thre)), grid_y)
    range_x = list(range(int(min_x), int(max_x), 1))
    range_y = list(range(int(min_y), int(max_y), 1))
    xx, yy = np.meshgrid(range_x, range_y)
    ba_x = xx - centerA[0]  # the vector from centerA to (x, y), x and y components
    ba_y = yy - centerA[1]
    limb_width = np.abs(ba_x * limb_vec_unit[1] - ba_y * limb_vec_unit[0])  # projection onto the limb normal, i.e. distance from the limb axis
    mask = limb_width < thre  # mask is 2D; below the threshold means inside the limb
    vec_map = np.copy(accumulate_vec_map) * 0.0  # contribution of this limb only
    vec_map[yy, xx] = np.repeat(mask[:, :, np.newaxis], 2, axis=2)
    vec_map[yy, xx] *= limb_vec_unit[np.newaxis, np.newaxis, :]  # cells inside the limb get the unit direction vector
    mask = np.logical_or.reduce(
        (np.abs(vec_map[:, :, 0]) > 0, np.abs(vec_map[:, :, 1]) > 0))  # which cells of the (46, 46) map this limb covers
    accumulate_vec_map = np.multiply(
        accumulate_vec_map, count[:, :, np.newaxis])  # the stored map is an average; turn it back into a sum
    accumulate_vec_map += vec_map  # add the vectors of the current limb
    count[mask == True] += 1  # covered cells have been visited once more
    mask = count == 0
    count[mask == True] = 1  # avoid dividing by zero for cells never covered
    accumulate_vec_map = np.divide(accumulate_vec_map, count[:, :, np.newaxis])  # average vector
    count[mask == True] = 0  # restore the zeros
    return accumulate_vec_map, count
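The most interesting part of this code is how the rectangle is obtained: take the normal of the limb direction and project each cell's offset from centerA onto it. The absolute value of that projection is the cell's distance from the limb axis, and cells closer than the threshold fall inside the rectangle.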
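The projection trick can be isolated into a small sketch. This is a simplified illustration of the idea, not the repository's function: limb_mask is a name I made up, it works directly in grid coordinates, and it treats the bounding box as inclusive on both sides.

```python
import numpy as np


def limb_mask(center_a, center_b, grid_y, grid_x, half_width=1.0):
    """Cells within half_width of the line through A and B, inside the padded bounding box."""
    a = np.asarray(center_a, dtype=float)
    b = np.asarray(center_b, dtype=float)
    unit = (b - a) / np.linalg.norm(b - a)
    xx, yy = np.meshgrid(np.arange(grid_x), np.arange(grid_y))
    # |2D cross product| of (p - A) with the unit limb vector = distance from p to the limb axis
    dist = np.abs((xx - a[0]) * unit[1] - (yy - a[1]) * unit[0])
    in_box = ((xx >= min(a[0], b[0]) - half_width) & (xx <= max(a[0], b[0]) + half_width) &
              (yy >= min(a[1], b[1]) - half_width) & (yy <= max(a[1], b[1]) + half_width))
    return (dist < half_width) & in_box, unit


# A horizontal limb from (x=2, y=2) to (x=7, y=2) on a 6x10 grid.
mask, unit = limb_mask((2, 2), (7, 2), grid_y=6, grid_x=10)
field = mask[:, :, np.newaxis] * unit  # unit direction inside the limb, zero elsewhere
print(mask.astype(int))
```

The distance test alone would select an infinite strip along the line; the bounding box is what cuts the strip down to a rectangle between the two joints.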
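3. Training
At this point train_loader yields each image together with its labels, the heatmap and the PAF. The image is then passed forward through the model, which turns it into feature maps with the same shapes as the labels, (46, 46, 19) and (46, 46, 38). Finally the L2 loss (MSELoss) is computed.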
img = img.to(device)
heatmap_target = heatmap_target.to(device)
paf_target = paf_target.to(device)
# compute output
_,saved_for_loss = model(img)
total_loss, saved_for_log = get_loss(saved_for_loss, heatmap_target, paf_target)
# compute gradient and do SGD step
optimizer.zero_grad()
total_loss.backward()
optimizer.step()
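saved_for_loss holds 6 groups of output feature maps: besides the first one after the backbone, there are the outputs of 5 further stages built from stride=1 convolutions. The spatial size stays the same while the receptive field grows stage by stage.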
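The per-stage supervision can be written out as a small NumPy sketch. I am assuming here that each stage contributes one (paf, heatmap) pair and that the total is the plain sum of the per-stage mean-squared errors; the real get_loss may weight or organize things differently, and multi_stage_l2_loss is a name invented for this illustration.

```python
import numpy as np


def multi_stage_l2_loss(stage_outputs, heatmap_target, paf_target):
    """Sum the mean-squared error of every stage's (paf, heatmap) prediction."""
    total = 0.0
    per_stage = []
    for paf_pred, heatmap_pred in stage_outputs:
        loss_paf = float(np.mean((paf_pred - paf_target) ** 2))
        loss_heat = float(np.mean((heatmap_pred - heatmap_target) ** 2))
        per_stage.append((loss_paf, loss_heat))
        total += loss_paf + loss_heat
    return total, per_stage


# Toy targets and six identical toy "stage outputs" that are off by 0.1 everywhere.
heatmap_target = np.zeros((46, 46, 19))
paf_target = np.zeros((46, 46, 38))
stage_outputs = [(paf_target + 0.1, heatmap_target + 0.1) for _ in range(6)]
total_loss, per_stage = multi_stage_l2_loss(stage_outputs, heatmap_target, paf_target)
print(round(total_loss, 4), len(per_stage))
```

Supervising every stage gives the early layers a direct gradient signal, even though only the last stage is used at inference time.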
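4. Inference
I found this part the hardest. Before reading the code I had these questions:
- How do you get from a heatmap back to points? Top-K? Wouldn't all K land inside one Gaussian blob? Or is something like k-means used to find K centers?
- Does Hungarian matching require the number of heads to equal the number of necks? What if the previous step cannot guarantee that, for example in a half-body shot where the feet are not visible?
- The edge weight for the Hungarian algorithm is the projection of the segment between two points onto the predicted PAF; how is that coded?
Answers:
- An image-processing tool, maximum_filter, is used to pick out local maxima. Finding more than K does not matter, because the later matching step filters out the suboptimal centers.
- The Hungarian algorithm does not require a square matrix: some workers may get no task, and some tasks may get no worker.
Other notes:
- At inference time the image need not be square; the short side ends up as 46 (the original image is first resized so that its short side is 368, and the long side is scaled proportionally).
- Training emits a loss per stage; at inference the 19-channel and 38-channel outputs are taken from the last stage only, and the earlier stages are used only for the loss.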
# model.forward
>>> predicted_outputs[0].shape
torch.Size([1, 38, 94, 46])
>>> predicted_outputs[1].shape
torch.Size([1, 19, 94, 46])
# reshape, H, W, C
>>> heatmap.shape
(94, 46, 19)
>>> paf.shape
(94, 46, 38)
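4.1 Demo
- An upside-down person is not detected, presumably because the training set has no such examples.
- Missing body parts produce neither points nor edges, so nothing goes wrong.
- Occluded but present parts are still detected correctly, which is impressive; I am not sure how the matching step handles them.
- There are also mistakes, for example my right wrist and the wrists of distant pedestrians.
4.2 Visualizing the predicted heatmaps
To get a feel for the end-to-end output of the model, start with the keypoint heatmaps: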
# plot the 18 keypoint heatmaps using subplots (the last channel is background)
num_joints = heatmap.shape[-1] - 1
fig = plt.figure(figsize=(8, 10))
for i in range(num_joints):
    fig.add_subplot(6, 3, i + 1)
    plt.imshow(heatmap[:, :, i])
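The points are predicted quite precisely, with small variance. That surprised me: there is no Gaussian constraint as in a VAE, and the L2 loss alone keeps the predictions this tight.
Next, plot the x component of the PAF vectors: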
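# plot the x component of the 19 PAFs using subplots
# (assumes channels are interleaved as x0, y0, x1, y1, ..., as in get_ground_truth)
num_pafs = paf.shape[-1] // 2
fig = plt.figure(figsize=(8, 10))
for i in range(num_pafs):
    fig.add_subplot(6, 4, i + 1)
    plt.imshow(paf[:, :, 2 * i])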
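The predicted limb rectangles are also clear, and their width is visible, presumably also a result of the loss. Missing limbs are simply not predicted at this step, just like people who are not detected. So by integrating along a candidate limb and applying a threshold, the trustworthy limbs can be kept (my guess).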
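4.3 From heatmaps to point sets
Non-Maximum Suppression, put positively, means finding all local maxima of the heatmap, so NMS here names a procedure rather than a specific algorithm.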
def NMS(heatmaps, upsampFactor=1., bool_refine_center=True, bool_gaussian_filt=False, config=None):
"""
NonMaximaSuppression: find peaks (local maxima) in a set of grayscale images
:param heatmaps: set of grayscale images on which to find local maxima (3d np.array,
with dimensions image_height x image_width x num_heatmaps)
:param upsampFactor: Size ratio between CPM heatmap output and the input image size.
Eg: upsampFactor=16 if original image was 480x640 and heatmaps are 30x40xN
:param bool_refine_center: Flag indicating whether:
- False: Simply return the low-res peak found upscaled by upsampFactor (subject to grid-snap)
- True: (Recommended, very accurate) Upsample a small patch around each low-res peak and
fine-tune the location of the peak at the resolution of the original input image
:param bool_gaussian_filt: Flag indicating whether to apply a 1d-GaussianFilter (smoothing)
to each upsampled patch before fine-tuning the location of each peak.
:return: a NUM_JOINTS x 4 np.array where each row represents a joint type (0=nose, 1=neck...)
and the columns indicate the {x,y} position, the score (probability) and a unique id (counter)
"""
# MODIFIED BY CARLOS: Instead of upsampling the heatmaps to heatmap_avg and
# then performing NMS to find peaks, this step can be sped up by ~25-50x by:
# (9-10ms [with GaussFilt] or 5-6ms [without GaussFilt] vs 250-280ms on RoG
# 1. Perform NMS at (low-res) CPM's output resolution
# 1.1. Find peaks using scipy.ndimage.filters.maximum_filter
# 2. Once a peak is found, take a patch of 5x5 centered around the peak, upsample it, and
# fine-tune the position of the actual maximum.
# '-> That's equivalent to having found the peak on heatmap_avg, but much faster because we only
# upsample and scan the 5x5 patch instead of the full (e.g.) 480x640
joint_list_per_joint_type = []
cnt_total_joints = 0
# For every peak found, win_size specifies how many pixels in each
# direction from the peak we take to obtain the patch that will be
# upsampled. Eg: win_size=1 -> patch is 3x3; win_size=2 -> 5x5
# (for BICUBIC interpolation to be accurate, win_size needs to be >=2!)
win_size = 2
for joint in range(config.MODEL.NUM_KEYPOINTS):
map_orig = heatmaps[:, :, joint]
peak_coords = find_peaks(config.TEST.THRESH_HEATMAP, map_orig)
peaks = np.zeros((len(peak_coords), 4))
for i, peak in enumerate(peak_coords):
if bool_refine_center:
x_min, y_min = np.maximum(0, peak - win_size)
x_max, y_max = np.minimum(
np.array(map_orig.T.shape) - 1, peak + win_size)
# Take a small patch around each peak and only upsample that
# tiny region
patch = map_orig[y_min:y_max + 1, x_min:x_max + 1]
map_upsamp = cv2.resize(
patch, None, fx=upsampFactor, fy=upsampFactor, interpolation=cv2.INTER_CUBIC)
# Gaussian filtering takes an average of 0.8ms/peak (and there might be
# more than one peak per joint!) -> For now, skip it (it's
# accurate enough)
map_upsamp = gaussian_filter(
map_upsamp, sigma=3) if bool_gaussian_filt else map_upsamp
# Obtain the coordinates of the maximum value in the patch
location_of_max = np.unravel_index(
map_upsamp.argmax(), map_upsamp.shape)
# Remember that peaks indicates [x,y] -> need to reverse it for
# [y,x]
location_of_patch_center = compute_resized_coords(
peak[::-1] - [y_min, x_min], upsampFactor)
# Calculate the offset wrt to the patch center where the actual
# maximum is
refined_center = (location_of_max - location_of_patch_center)
peak_score = map_upsamp[location_of_max]
else:
refined_center = [0, 0]
# Flip peak coordinates since they are [x,y] instead of [y,x]
peak_score = map_orig[tuple(peak[::-1])]
peaks[i, :] = tuple(
x for x in compute_resized_coords(peak_coords[i], upsampFactor) + refined_center[::-1]) + (
peak_score, cnt_total_joints)
cnt_total_joints += 1
joint_list_per_joint_type.append(peaks)
return joint_list_per_joint_type
The helper function it calls:
from scipy.ndimage import gaussian_filter, maximum_filter, generate_binary_structure
def find_peaks(param, img):
    """
    Given a (grayscale) image, find local maxima whose value is above a given
    threshold (param)
    :param param: Threshold that a peak value has to exceed
    :param img: Input image (2d array) where we want to find peaks
    :return: 2d np.array containing the [x,y] coordinates of each peak found
    in the image
    """
    peaks_binary = (maximum_filter(img, footprint=generate_binary_structure(
        2, 1)) == img) * (img > param)
    # Note reverse ([::-1]): we return [[x y], [x y]...] instead of [[y x], [y
    # x]...]
    return np.array(np.nonzero(peaks_binary)[::-1]).T
peak_coords
array([[32, 18],
[41, 21],
[52, 22],
[ 7, 26]])
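maximum_filter and gaussian_filter are both image transforms, a sliding-window maximum and a Gaussian convolution respectively. The figure below shows how the local maxima are found.
To summarize, find_peaks answers the first question above, how to recover the center points from Gaussian heatmaps:
- Threshold the heatmap at 0.1: values below it are discarded, values above it are kept unchanged.
- Apply the off-the-shelf maximum_filter, which replaces each cell with the maximum of its neighborhood.
- Keep the positions whose value is the same before and after filtering; these are the local maxima.
After this step several points can still appear in the same region. What the NMS function then does:
- Divide and conquer: around each local maximum found above, cut out an enlarged square patch.
- Upsample the patch (optionally applying a Gaussian blur) and take the argmax position as the refined center.
- One joint of one person may still yield several points. How is that handled later? Even if 3 heads are detected but only 2 necks, PAF matching lets each neck find only its best head and suppresses the neighboring one (see the next subsection).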
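The find_peaks idea can be reproduced without SciPy, which makes the mechanics explicit. The sketch below is my own NumPy re-implementation for illustration (find_peaks_numpy is not a function from the repository): a cell is a peak when it equals the maximum over itself and its 4 neighbours and also exceeds the threshold.

```python
import numpy as np


def find_peaks_numpy(img, threshold):
    """Return [x, y] coordinates of cells that are >= their 4 neighbours and > threshold."""
    img = np.asarray(img, dtype=float)
    padded = np.pad(img, 1, mode="constant", constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    neighbour_max = np.maximum.reduce([
        center,
        padded[:-2, 1:-1],   # cell above
        padded[2:, 1:-1],    # cell below
        padded[1:-1, :-2],   # cell to the left
        padded[1:-1, 2:],    # cell to the right
    ])
    is_peak = (neighbour_max == center) & (center > threshold)
    ys, xs = np.nonzero(is_peak)
    return np.stack([xs, ys], axis=1)


# Toy heatmap: two Gaussian blobs centred at (x=3, y=4) and (x=8, y=9).
yy, xx = np.mgrid[0:12, 0:12]
toy = (np.exp(-((xx - 3) ** 2 + (yy - 4) ** 2) / 4.0)
       + 0.8 * np.exp(-((xx - 8) ** 2 + (yy - 9) ** 2) / 4.0))
peaks = find_peaks_numpy(toy, threshold=0.1)
print(peaks.tolist())
```

Note that nothing here limits the number of peaks to K: two people give two peaks per joint type, and spurious extra peaks are left for the PAF matching to reject.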
>>> len(joint_list_per_joint_type)
18
>>> joint_list_per_joint_type[1]
array([[254. , 164. , 0.9705171 , 4. ],
[331. , 183. , 0.91956604, 5. ],
[428. , 203. , 0.9459275 , 6. ],
[ 60. , 214. , 0.89996499, 7. ]])
>>> joint_list_per_joint_type[2]
array([[244. , 159. , 0.89307928, 8. ],
[321. , 179. , 0.88130075, 9. ],
[412. , 204. , 0.89099789, 10. ],
[ 54. , 214. , 0.93888676, 11. ]])
>>> joint_list_per_joint_type[3]
array([[218. , 157. , 0.92710817, 12. ],
[308. , 160. , 0.88837695, 13. ],
[403. , 170. , 0.70073223, 14. ],
[ 51. , 221. , 0.77465415, 15. ],
[414. , 233. , 0.71171707, 16. ]])
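Here joint_3 yields 5 peaks, while the other joint types yield 4, one per person.
Once the joints are detected, the next step is to find the best matching between two joint types, for example how type 2 and type 3 should be connected, using the predicted PAF.
4.4 From keypoints to connections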
pafprocess.process_paf(joint_list, heatmap_upsamp, paf_upsamp)
>>> joint_list.shape
(1, 74, 5)
>>> heatmap_upsamp.shape
(368, 616, 19)
>>> paf_upsamp.shape
(368, 616, 38)
joint_list holds the 74 keypoints of one image; each keypoint is a 5-tuple like the ones shown above: x, y, score, rank, part_id.
process_paf is a C++ program; let's look at it:
vector <Peak> peak_infos_line;
const int NUM_PART = 18;
#define PEAKS(i, j, k) peaks[k+p3*(j+p2*i)]
#define HEAT(i, j, k) heatmap[k+h3*(j+h2*i)]
#define PAF(i, j, k) pafmap[k+f3*(j+f2*i)] // these macros are textual substitutions that read element (i, j, k) from the flattened array
int process_paf(int p1, int p2, int p3, float *peaks, int h1, int h2, int h3, float *heatmap, int f1, int f2, int f3,
float *pafmap) {
vector <Peak> peak_infos[NUM_PART];
int peak_cnt = 0;
for (int img_id = 0; img_id < p1; img_id++){
for (int peak_index = 0; peak_index < p2; peak_index++) {
Peak info;
info.id = peak_cnt++;
info.x = PEAKS(img_id, peak_index, 0);
info.y = PEAKS(img_id, peak_index, 1);
info.score = PEAKS(img_id, peak_index, 2);
int part_id = PEAKS(img_id, peak_index, 4);
peak_infos[part_id].push_back(info);
}
}
peak_infos_line.clear();
for (int part_id = 0; part_id < NUM_PART; part_id++) {
for (int i = 0; i < (int) peak_infos[part_id].size(); i++) {
peak_infos_line.push_back(peak_infos[part_id][i]);
}
}
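First, the tuples in joint_list are parsed into Peak objects and bucketed into peak_infos, an array of vectors indexed by part_id. peak_infos_line is the flattened version, mapping rank to Peak, so that a peak can later be fetched directly by its rank id.
Next: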
// Start to Connect
for (int pair_id = 0; pair_id < COCOPAIRS_SIZE; pair_id++) {
vector <ConnectionCandidate> candidates;
vector <Peak> &peak_a_list = peak_infos[COCOPAIRS[pair_id][0]];
vector <Peak> &peak_b_list = peak_infos[COCOPAIRS[pair_id][1]];
// take the two joint types of this limb and fetch both peak lists, ready to be matched against each other
if (peak_a_list.size() == 0 || peak_b_list.size() == 0) {
continue;
}
for (int peak_a_id = 0; peak_a_id < (int) peak_a_list.size(); peak_a_id++) {
Peak &peak_a = peak_a_list[peak_a_id];
for (int peak_b_id = 0; peak_b_id < (int) peak_b_list.size(); peak_b_id++) {
Peak &peak_b = peak_b_list[peak_b_id];
// e.g. between joint2 and joint3 this loops 4 * 5 times
// calculate vector(direction)
VectorXY vec;
vec.x = peak_b.x - peak_a.x;
vec.y = peak_b.y - peak_a.y;
float norm = (float) sqrt(vec.x * vec.x + vec.y * vec.y);
if (norm < 1e-12) continue;
vec.x = vec.x / norm;
vec.y = vec.y / norm;
vector <VectorXY> paf_vecs = get_paf_vectors(pafmap, COCOPAIRS_NET[pair_id][0],
COCOPAIRS_NET[pair_id][1], f2, f3, peak_a, peak_b);
float scores = 0.0f;
// criterion 1 : score treshold count
int criterion1 = 0;
for (int i = 0; i < STEP_PAF; i++) {
float score = vec.x * paf_vecs[i].x + vec.y * paf_vecs[i].y;
scores += score;
if (score > THRESH_VECTOR_SCORE) criterion1 += 1;
}
float criterion2 = scores / STEP_PAF + min(0.0, 0.5 * h1 / norm - 1.0);
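// Per the Python code further below, there are two conditions:
// 1. of the sampled points, at least 80% score above the threshold
// 2. the mean (after penalizing overly long limbs) is > 0
// If both hold, the joint pair becomes a candidate, with the mean used as the matching weight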
if (criterion1 > THRESH_VECTOR_CNT1 && criterion2 > 0) {
ConnectionCandidate candidate;
candidate.idx1 = peak_a_id;
candidate.idx2 = peak_b_id;
candidate.score = criterion2;
candidate.etc = criterion2 + peak_a.score + peak_b.score;
candidates.push_back(candidate);
}
}
}
}
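// This function samples pafmap into VectorXY (x, y) objects. ch_id1 and ch_id2 are the PAF channels holding the x and y components of this limb; peak1 and peak2 are the two candidate joints, e.g. joint2[0] and joint3[1].
// It reads the PAF values along the segment between them. In theory this is a line integral along the path, but the map is discrete, so the PAF vectors at sampled grid points are enough:
// [(x1, y1), (x2, y2), ...]
// The caller then takes the inner product of each with the vector vec from joint2[0] to joint3[1] and sums them.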
vector <VectorXY>
get_paf_vectors(float *pafmap, const int &ch_id1, const int &ch_id2, int &f2, int &f3, Peak &peak1, Peak &peak2) {
vector <VectorXY> paf_vectors;
const float STEP_X = (peak2.x - peak1.x) / float(STEP_PAF);
const float STEP_Y = (peak2.y - peak1.y) / float(STEP_PAF);
for (int i = 0; i < STEP_PAF; i++) {
int location_x = roundpaf(peak1.x + i * STEP_X);
int location_y = roundpaf(peak1.y + i * STEP_Y);
VectorXY v;
v.x = PAF(location_y, location_x, ch_id1);
v.y = PAF(location_y, location_x, ch_id2);
paf_vectors.push_back(v);
}
return paf_vectors;
}
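The discrete line integral and the two acceptance criteria can be tried on a synthetic field. The sketch below follows the same recipe (sample points on the segment, dot the PAF with the unit direction, penalize limbs longer than half the image height), but limb_score, the sample count and the 0.05 threshold are my own illustrative choices, not values taken from the repository.

```python
import numpy as np


def limb_score(paf_x, paf_y, joint_a, joint_b, num_samples=10, thresh=0.05):
    """Return (penalized mean alignment, accepted?) for the candidate limb A -> B."""
    a = np.asarray(joint_a, dtype=float)
    b = np.asarray(joint_b, dtype=float)
    length = np.linalg.norm(b - a)
    if length < 1e-12:
        return 0.0, False
    unit = (b - a) / length
    # Sample points evenly on the segment and round them to grid cells.
    xs = np.round(np.linspace(a[0], b[0], num_samples)).astype(int)
    ys = np.round(np.linspace(a[1], b[1], num_samples)).astype(int)
    dots = paf_x[ys, xs] * unit[0] + paf_y[ys, xs] * unit[1]
    image_height = paf_x.shape[0]
    penalized_mean = float(dots.mean() + min(0.5 * image_height / length - 1.0, 0.0))
    enough_support = np.count_nonzero(dots > thresh) > 0.8 * num_samples
    return penalized_mean, bool(enough_support and penalized_mean > 0)


# Synthetic 20x20 field: one horizontal limb pointing in +x around row 10.
paf_x = np.zeros((20, 20))
paf_y = np.zeros((20, 20))
paf_x[9:12, 2:13] = 1.0
good = limb_score(paf_x, paf_y, (2, 10), (12, 10))  # follows the field
bad = limb_score(paf_x, paf_y, (2, 10), (2, 18))    # perpendicular to the field
print(good, bad)
```

A candidate that runs along the field collects a dot product close to 1 at every sample, while one that crosses it collects nothing, which is what lets the matching step tell apart joints of different people.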
With the matching weights between the two joint types in hand, compute the best assignment:
const int COCOPAIRS_SIZE = 19;
vector <Connection> connection_all[COCOPAIRS_SIZE];
...
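// the final assignment is stored in connection_all
vector <Connection> &conns = connection_all[pair_id];
// sort the candidates between the two joint types by score, descending; comp_candidate is the comparator
sort(candidates.begin(), candidates.end(), comp_candidate);
for (int c_id = 0; c_id < (int) candidates.size(); c_id++) {
ConnectionCandidate &candidate = candidates[c_id];
// iterate over the candidates from highest to lowest weight
bool assigned = false;
// check whether the candidate's start or end point is already in conns;
// if neither is, add the candidate to conns; otherwise skip it.
// So if joint2 has 4 points and joint3 has 5 candidates, and the first 4 are all matched, the 5th is dropped because its start point is already taken. This answers the question above.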
for (int conn_id = 0; conn_id < (int) conns.size(); conn_id++) {
if (conns[conn_id].peak_id1 == candidate.idx1) {
// already assigned
assigned = true;
break;
}
if (assigned) break;
if (conns[conn_id].peak_id2 == candidate.idx2) {
// already assigned
assigned = true;
break;
}
if (assigned) break;
}
if (assigned) continue;
Connection conn;
conn.peak_id1 = candidate.idx1;
conn.peak_id2 = candidate.idx2;
conn.score = candidate.score;
conn.cid1 = peak_a_list[candidate.idx1].id;
conn.cid2 = peak_b_list[candidate.idx2].id;
conns.push_back(conn);
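The paper formulates this as Hungarian matching, whereas this code uses a greedy algorithm (the paper notes that the difference is small and greedy is the fastest):
- Sort the candidates by matching score, descending.
- Check whether the candidate's start or end point is already in conns.
- If neither is, add the candidate to conns.
- Otherwise skip it.
- So if joint_2 has 4 points and joint_3 has 5 candidates, and the first 4 are all matched, the 5th is dropped because its start point is already taken. This answers the question above.
P.S. The Hungarian algorithm can also be used, by calling scipy.optimize.linear_sum_assignment.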
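The greedy steps can be sketched in pure Python and compared against the true optimum on a tiny example. Here the optimum is found by brute force over all assignments, which stands in for the Hungarian algorithm only because the toy matrix is so small; greedy_match, best_total_by_brute_force and the scores are all made up for illustration.

```python
from itertools import permutations


def greedy_match(candidates):
    """candidates: (src_index, dst_index, score). Highest score first, each joint used once."""
    used_src, used_dst, matches = set(), set(), []
    for src, dst, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if src in used_src or dst in used_dst:
            continue  # one endpoint is already connected
        used_src.add(src)
        used_dst.add(dst)
        matches.append((src, dst, score))
    return matches


def best_total_by_brute_force(scores):
    """Maximum total score over all one-to-one assignments (needs rows <= columns)."""
    n_src, n_dst = len(scores), len(scores[0])
    return max(sum(scores[i][j] for i, j in enumerate(perm))
               for perm in permutations(range(n_dst), n_src))


# Toy scores: 2 source joints x 3 destination joints, chosen so that greedy is suboptimal.
scores = [[0.9, 0.8, 0.1],
          [0.85, 0.2, 0.1]]
candidates = [(i, j, s) for i, row in enumerate(scores) for j, s in enumerate(row)]
matches = greedy_match(candidates)
greedy_total = sum(s for _, _, s in matches)
optimal_total = best_total_by_brute_force(scores)
print(matches, greedy_total, optimal_total)
```

This matrix is deliberately adversarial: greedy grabs the 0.9 edge and is then stuck with 0.2. With real PAF scores the correct pair usually dominates its row and column, which is consistent with the note above that the two approaches differ little in practice.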
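What remains is to split the 18 keypoint types into individual people based on the connections. It is somewhat involved but purely mechanical, so I will not go through it in detail: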
vector <vector<float> > subset;
...
// Generate subset
subset.clear();
for (int pair_id = 0; pair_id < COCOPAIRS_SIZE; pair_id++) {
// loop over the limb types
vector <Connection> &conns = connection_all[pair_id];
int part_id1 = COCOPAIRS[pair_id][0]; // e.g. head
int part_id2 = COCOPAIRS[pair_id][1]; // e.g. eye
for (int conn_id = 0; conn_id < (int) conns.size(); conn_id++) {
// loop over every detected instance of this limb type
int found = 0;
int subset_idx1 = 0, subset_idx2 = 0;
for (int subset_id = 0; subset_id < (int) subset.size(); subset_id++) {
if (subset[subset_id][part_id1] == conns[conn_id].cid1 ||
subset[subset_id][part_id2] == conns[conn_id].cid2) {
if (found == 0) subset_idx1 = subset_id;
if (found == 1) subset_idx2 = subset_id;
found += 1;
}
}
if (found == 1) {
if (subset[subset_idx1][part_id2] != conns[conn_id].cid2) {
subset[subset_idx1][part_id2] = conns[conn_id].cid2;
subset[subset_idx1][19] += 1;
subset[subset_idx1][18] += peak_infos_line[conns[conn_id].cid2].score + conns[conn_id].score;
}
} else if (found == 2) {
int membership = 0;
for (int subset_id = 0; subset_id < 18; subset_id++) {
if (subset[subset_idx1][subset_id] > 0 && subset[subset_idx2][subset_id] > 0) {
membership = 2;
}
}
if (membership == 0) {
for (int subset_id = 0; subset_id < 18; subset_id++)
subset[subset_idx1][subset_id] += (subset[subset_idx2][subset_id] + 1);
subset[subset_idx1][19] += subset[subset_idx2][19];
subset[subset_idx1][18] += subset[subset_idx2][18];
subset[subset_idx1][18] += conns[conn_id].score;
subset.erase(subset.begin() + subset_idx2);
} else {
subset[subset_idx1][part_id2] = conns[conn_id].cid2;
subset[subset_idx1][19] += 1;
subset[subset_idx1][18] += peak_infos_line[conns[conn_id].cid2].score + conns[conn_id].score;
}
} else if (found == 0 && pair_id < 18) {
vector<float> row(20);
for (int i = 0; i < 20; i++) row[i] = -1;
row[part_id1] = conns[conn_id].cid1;
row[part_id2] = conns[conn_id].cid2;
row[19] = 2;
row[18] = peak_infos_line[conns[conn_id].cid1].score +
peak_infos_line[conns[conn_id].cid2].score +
conns[conn_id].score;
subset.push_back(row);
}
}
}
// delete some rows
for (int i = subset.size() - 1; i >= 0; i--) {
if (subset[i][19] < THRESH_PART_CNT || subset[i][18] / subset[i][19] < THRESH_HUMAN_SCORE)
subset.erase(subset.begin() + i);
}
Once that is done, the following accessors can be called:
int get_num_humans() {
return subset.size();
}
int get_part_cid(int human_id, int part_id) {
return subset[human_id][part_id];
}
float get_score(int human_id) {
return subset[human_id][18] / subset[human_id][19];
}
4.5 Addendum: the Python version
def paf_to_pose(heatmaps, pafs, config):
# Bottom-up approach:
# Step 1: find all joints in the image (organized by joint type: [0]=nose,
# [1]=neck...)
joint_list_per_joint_type = NMS(heatmaps, upsampFactor=config.MODEL.DOWNSAMPLE, config=config)
# joint_list is an unravel'd version of joint_list_per_joint, where we add
# a 5th column to indicate the joint_type (0=nose, 1=neck...)
joint_list = np.array([tuple(peak) + (joint_type,) for joint_type,
joint_peaks in enumerate(joint_list_per_joint_type) for peak in joint_peaks])
# import ipdb
# ipdb.set_trace()
# Step 2: find which joints go together to form limbs (which wrists go
# with which elbows)
paf_upsamp = cv2.resize(
pafs, None, fx=config.MODEL.DOWNSAMPLE, fy=config.MODEL.DOWNSAMPLE, interpolation=cv2.INTER_CUBIC)
connected_limbs = find_connected_joints(paf_upsamp, joint_list_per_joint_type,
config.TEST.NUM_INTERMED_PTS_BETWEEN_KEYPOINTS, config)
# Step 3: associate limbs that belong to the same person
person_to_joint_assoc = group_limbs_of_same_person(
connected_limbs, joint_list, config)
return joint_list, person_to_joint_assoc
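This time the matching step is implemented in Python (again greedy rather than a true Hungarian assignment), as follows: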
def find_connected_joints(paf_upsamp, joint_list_per_joint_type, num_intermed_pts=10, config=None):
"""
For every type of limb (eg: forearm, shin, etc.), look for every potential
pair of joints (eg: every wrist-elbow combination) and evaluate the PAFs to
determine which pairs are indeed body limbs.
:param paf_upsamp: PAFs upsampled to the original input image resolution
:param joint_list_per_joint_type: See 'return' doc of NMS()
:param num_intermed_pts: Int indicating how many intermediate points to take
between joint_src and joint_dst, at which the PAFs will be evaluated
:return: List of NUM_LIMBS rows. For every limb_type (a row) we store
a list of all limbs of that type found (eg: all the right forearms).
For each limb (each item in connected_limbs[limb_type]), we store 5 cells:
# {joint_src_id,joint_dst_id}: a unique number associated with each joint,
# limb_score_penalizing_long_dist: a score of how good a connection
of the joints is, penalized if the limb length is too long
# {joint_src_index,joint_dst_index}: the index of the joint within
all the joints of that type found (eg: the 3rd right elbow found)
"""
connected_limbs = []
# Auxiliary array to access paf_upsamp quickly
limb_intermed_coords = np.empty((4, num_intermed_pts), dtype=np.intp)
for limb_type in range(NUM_LIMBS):
# List of all joints of type A found, where A is specified by limb_type
# (eg: a right forearm starts in a right elbow)
joints_src = joint_list_per_joint_type[joint_to_limb_heatmap_relationship[limb_type][0]]
# List of all joints of type B found, where B is specified by limb_type
# (eg: a right forearm ends in a right wrist)
joints_dst = joint_list_per_joint_type[joint_to_limb_heatmap_relationship[limb_type][1]]
# print(joint_to_limb_heatmap_relationship[limb_type][0])
# print(joint_to_limb_heatmap_relationship[limb_type][1])
# print(paf_xy_coords_per_limb[limb_type][0])
# print(paf_xy_coords_per_limb[limb_type][1])
if len(joints_src) == 0 or len(joints_dst) == 0:
# No limbs of this type found (eg: no right forearms found because
# we didn't find any right wrists or right elbows)
connected_limbs.append([])
else:
connection_candidates = []
# Specify the paf index that contains the x-coord of the paf for
# this limb
limb_intermed_coords[2, :] = paf_xy_coords_per_limb[limb_type][0]
# And the y-coord paf index
limb_intermed_coords[3, :] = paf_xy_coords_per_limb[limb_type][1]
for i, joint_src in enumerate(joints_src):
# Try every possible joints_src[i]-joints_dst[j] pair and see
# if it's a feasible limb
for j, joint_dst in enumerate(joints_dst):
# Subtract the position of both joints to obtain the
# direction of the potential limb
limb_dir = joint_dst[:2] - joint_src[:2]
# Compute the distance/length of the potential limb (norm
# of limb_dir)
                    limb_dist = np.sqrt(np.sum(limb_dir ** 2)) + 1e-8
                    limb_dir = limb_dir / limb_dist  # Normalize limb_dir to be a unit vector
                    # Linearly distribute num_intermed_pts points from the x
                    # coordinate of joint_src to the x coordinate of joint_dst
                    limb_intermed_coords[1, :] = np.round(np.linspace(
                        joint_src[0], joint_dst[0], num=num_intermed_pts))
                    limb_intermed_coords[0, :] = np.round(np.linspace(
                        joint_src[1], joint_dst[1], num=num_intermed_pts))  # Same for the y coordinate
                    intermed_paf = paf_upsamp[limb_intermed_coords[0, :],
                                              limb_intermed_coords[1, :], limb_intermed_coords[2:4, :]].T
                    score_intermed_pts = intermed_paf.dot(limb_dir)
                    score_penalizing_long_dist = score_intermed_pts.mean(
                    ) + min(0.5 * paf_upsamp.shape[0] / limb_dist - 1, 0)
                    # Criterion 1: At least 80% of the intermediate points have
                    # a score higher than thre2
                    criterion1 = (np.count_nonzero(
                        score_intermed_pts > config.TEST.THRESH_PAF) > 0.8 * num_intermed_pts)
                    # Criterion 2: Mean score, penalized for large limb
                    # distances (larger than half the image height), is
                    # positive
                    criterion2 = (score_penalizing_long_dist > 0)
                    if criterion1 and criterion2:
                        # Last value is the combined paf(+limb_dist) + heatmap
                        # scores of both joints
                        connection_candidates.append(
                            [i, j, score_penalizing_long_dist,
                             score_penalizing_long_dist + joint_src[2] + joint_dst[2]])
            # Sort connection candidates based on their
            # score_penalizing_long_dist
            connection_candidates = sorted(
                connection_candidates, key=lambda x: x[2], reverse=True)
            connections = np.empty((0, 5))
            # There can only be as many limbs as the smallest number of source
            # or destination joints (eg: only 2 forearms if there's 5 wrists
            # but 2 elbows)
            max_connections = min(len(joints_src), len(joints_dst))
            # Traverse all potential joint connections (sorted by their score)
            for potential_connection in connection_candidates:
                i, j, s = potential_connection[0:3]
                # Make sure joints_src[i] or joints_dst[j] haven't already been
                # connected to other joints_dst or joints_src
                if i not in connections[:, 3] and j not in connections[:, 4]:
                    # [joint_src_id, joint_dst_id, limb_score_penalizing_long_dist, joint_src_index, joint_dst_index]
                    connections = np.vstack(
                        [connections, [joints_src[i][3], joints_dst[j][3], s, i, j]])
                    # Exit if we've already established max_connections
                    # connections (each joint can't be connected to more than
                    # one joint)
                    if len(connections) >= max_connections:
                        break
            connected_limbs.append(connections)
    return connected_limbs
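To make the scoring step concrete, here is a minimal, self-contained sketch of the same idea on a synthetic field. The names `paf_x`, `paf_y`, `line_integral_score` and the threshold value 0.05 are made up for illustration and are not part of the repository; only the sampling, the dot product with the unit limb direction, and the long-distance penalty mirror the function above.

```python
import numpy as np


def line_integral_score(paf_x, paf_y, src, dst, num_pts=10):
    """Score a candidate limb src -> dst (both given as (x, y)) against a PAF.

    paf_x / paf_y are hypothetical H x W arrays holding the x and y
    components of the field for a single limb type.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    limb_dir = dst - src
    limb_dist = np.sqrt(np.sum(limb_dir ** 2)) + 1e-8
    limb_dir = limb_dir / limb_dist  # unit vector along the candidate limb

    # Sample num_pts pixels evenly spaced on the segment
    xs = np.round(np.linspace(src[0], dst[0], num=num_pts)).astype(int)
    ys = np.round(np.linspace(src[1], dst[1], num=num_pts)).astype(int)

    # Project the field at each sample onto the limb direction
    samples = np.stack([paf_x[ys, xs], paf_y[ys, xs]], axis=1)  # (num_pts, 2)
    scores = samples.dot(limb_dir)

    # Limbs longer than half the image height receive a negative penalty
    penalty = min(0.5 * paf_x.shape[0] / limb_dist - 1, 0)
    return scores, scores.mean() + penalty


# Synthetic 40 x 40 field: a horizontal limb around row 20, pointing in +x
H, W = 40, 40
paf_x = np.zeros((H, W))
paf_y = np.zeros((H, W))
paf_x[19:22, 5:31] = 1.0

good_scores, good = line_integral_score(paf_x, paf_y, (5, 20), (30, 20))
bad_scores, bad = line_integral_score(paf_x, paf_y, (5, 20), (30, 35))

THRESH = 0.05  # illustrative stand-in for config.TEST.THRESH_PAF
print("aligned:", good, np.count_nonzero(good_scores > THRESH))
print("off-limb:", bad, np.count_nonzero(bad_scores > THRESH))
```

The aligned candidate passes both criteria (every sample exceeds the threshold and the penalized mean stays positive), while the candidate that leaves the limb after the first sample fails both.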
Next, the connected limbs are grouped per person, and each group is labeled with a human_id.
def group_limbs_of_same_person(connected_limbs, joint_list, config):
    """
    Associate limbs belonging to the same person together.
    :param connected_limbs: See 'return' doc of find_connected_joints()
    :param joint_list: unravel'd version of joint_list_per_joint [See 'return' doc of NMS()]
    :return: 2d np.array of size num_people x (NUM_JOINTS+2). For each person found:
    # First NUM_JOINTS columns contain the index (in joint_list) of the joints associated
    with that person (or -1 if their i-th joint wasn't found)
    # 2nd-to-last column: Overall score of the joints+limbs that belong to this person
    # Last column: Total count of joints found for this person
    """
    person_to_joint_assoc = []
    for limb_type in range(NUM_LIMBS):
        joint_src_type, joint_dst_type = joint_to_limb_heatmap_relationship[limb_type]
        for limb_info in connected_limbs[limb_type]:
            person_assoc_idx = []
            for person, person_limbs in enumerate(person_to_joint_assoc):
                if person_limbs[joint_src_type] == limb_info[0] or person_limbs[joint_dst_type] == limb_info[1]:
                    person_assoc_idx.append(person)
            # If one of the joints has been associated to a person, and either
            # the other joint is also associated with the same person or not
            # associated to anyone yet:
            if len(person_assoc_idx) == 1:
                person_limbs = person_to_joint_assoc[person_assoc_idx[0]]
                # If the other joint is not associated to anyone yet,
                if person_limbs[joint_dst_type] != limb_info[1]:
                    # Associate it with the current person
                    person_limbs[joint_dst_type] = limb_info[1]
                    # Increase the number of limbs associated to this person
                    person_limbs[-1] += 1
                    # And update the total score (+= heatmap score of joint_dst
                    # + score of connecting joint_src with joint_dst)
                    person_limbs[-2] += joint_list[limb_info[1]
                                                   .astype(int), 2] + limb_info[2]
            elif len(person_assoc_idx) == 2:  # if found 2 and disjoint, merge them
                person1_limbs = person_to_joint_assoc[person_assoc_idx[0]]
                person2_limbs = person_to_joint_assoc[person_assoc_idx[1]]
                membership = ((person1_limbs >= 0) & (person2_limbs >= 0))[:-2]
                if not membership.any():  # If both people have no same joints connected, merge into a single person
                    # Update which joints are connected
                    person1_limbs[:-2] += (person2_limbs[:-2] + 1)
                    # Update the overall score and total count of joints
                    # connected by summing their counters
                    person1_limbs[-2:] += person2_limbs[-2:]
                    # Add the score of the current joint connection to the
                    # overall score
                    person1_limbs[-2] += limb_info[2]
                    person_to_joint_assoc.pop(person_assoc_idx[1])
                else:  # Same case as len(person_assoc_idx)==1 above
                    person1_limbs[joint_dst_type] = limb_info[1]
                    person1_limbs[-1] += 1
                    person1_limbs[-2] += joint_list[limb_info[1]
                                                    .astype(int), 2] + limb_info[2]
            else:  # No person has claimed any of these joints, create a new person
                # Initialize person info to all -1 (no joint associations)
                row = -1 * np.ones(config.MODEL.NUM_KEYPOINTS + 2)
                # Store the joint info of the new connection
                row[joint_src_type] = limb_info[0]
                row[joint_dst_type] = limb_info[1]
                # Total count of connected joints for this person: 2
                row[-1] = 2
                # Compute overall score: score joint_src + score joint_dst + score connection
                # {joint_src,joint_dst}
                row[-2] = sum(joint_list[limb_info[:2].astype(int), 2]
                              ) + limb_info[2]
                person_to_joint_assoc.append(row)
    # Delete people who have very few parts connected
    people_to_delete = []
    for person_id, person_info in enumerate(person_to_joint_assoc):
        if person_info[-1] < 3 or person_info[-2] / person_info[-1] < 0.2:
            people_to_delete.append(person_id)
    # Traverse the list in reverse order so we delete indices starting from the
    # last one (otherwise, removing item for example 0 would modify the indices of
    # the remaining people to be deleted!)
    for index in people_to_delete[::-1]:
        person_to_joint_assoc.pop(index)
    # Appending items to a np.array can be costly (allocating new memory, copying over the array, then adding new row)
    # Instead, we treat the set of people as a list (fast to append items) and
    # only convert to np.array at the end
    return np.array(person_to_joint_assoc)
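This is somewhat like analog-to-digital conversion: the network first approximates an analog signal, and the decoding step then turns it into discrete keypoint coordinates and a human_id.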
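The merge branch of the grouping function depends on a small trick: empty joint slots hold -1, so adding `other + 1` fills them without disturbing slots that are already set. The toy rows below (5 joint slots, made-up indices and scores) are only an illustration of that arithmetic, not data from the model.

```python
import numpy as np

NUM_JOINTS = 5  # toy value for illustration; the real model has more joints

# Row layout follows the function above: joint indices, then score, then count
person1 = np.array([10., 11., -1., -1., -1., 2.4, 2.])
person2 = np.array([-1., -1., -1., 13., 14., 2.1, 2.])
limb_score = 0.7  # hypothetical PAF score of the limb bridging the two rows

# The two rows are disjoint if no joint slot is filled in both of them
overlap = ((person1 >= 0) & (person2 >= 0))[:-2]
assert not overlap.any()

merged = person1.copy()
# Empty slots hold -1, so adding (other + 1) leaves filled slots unchanged
# and writes the other row's joint index into the empty ones
merged[:-2] += person2[:-2] + 1
merged[-2:] += person2[-2:]  # sum the scores and the joint counts
merged[-2] += limb_score     # add the score of the connecting limb

print(merged[:NUM_JOINTS], merged[-2], merged[-1])
```

The trick is only safe when the rows are disjoint, which is why the function checks `membership.any()` before merging.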
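4.6 Supplement: The Hungarian Algorithm¶
The assignment problem is an integer linear program in which every decision variable is constrained to 0 or 1 (a row is either assigned to a column or not).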
>>> import numpy as np
>>> cost = np.array([[4, 1, 3, 1], [2, 0, 5, 2], [3, 2, 2, 2]])
>>> cost
array([[4, 1, 3, 1],
[2, 0, 5, 2],
[3, 2, 2, 2]])
>>> from scipy.optimize import linear_sum_assignment
>>> row_ind, col_ind = linear_sum_assignment(cost)
>>> row_ind
array([0, 1, 2])
>>> col_ind
array([3, 1, 2])
>>> cost[row_ind, col_ind]
array([1, 0, 2])
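The decoding code earlier walks the score-sorted candidates greedily instead of solving the assignment exactly. As a rough comparison, the sketch below runs both on a small score matrix; the matrix and the helper `greedy_match` are hypothetical, and `maximize=True` assumes SciPy 1.4 or newer.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical limb scores: rows are source joints, columns are destination joints
scores = np.array([[0.9, 0.8],
                   [0.8, 0.1]])


def greedy_match(scores):
    """Take the best remaining pair first, like the sorted traversal above."""
    pairs = sorted(
        ((scores[i, j], i, j)
         for i in range(scores.shape[0])
         for j in range(scores.shape[1])),
        reverse=True)
    used_rows, used_cols, matches = set(), set(), []
    for _, i, j in pairs:
        if i not in used_rows and j not in used_cols:
            matches.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return sorted(matches)


greedy = greedy_match(scores)
greedy_total = sum(scores[i, j] for i, j in greedy)

# Exact assignment that maximizes the total score
row_ind, col_ind = linear_sum_assignment(scores, maximize=True)
optimal_total = scores[row_ind, col_ind].sum()

print("greedy: ", greedy, greedy_total)
print("optimal:", list(zip(row_ind.tolist(), col_ind.tolist())), optimal_total)
```

On this matrix the greedy pass locks in the 0.9 pair and is left with 0.1, while the exact solver prefers the two 0.8 pairs. Whether that gap matters in practice depends on how often such ties occur in real PAF scores.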
5. Summary¶
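- On modeling
- Even an end-to-end model does not output the final result directly. It predicts the key state variables as well as it can, and some strategy built on those variables is still needed afterwards.
- The way the problem is decomposed is excellent: it steps outside the usual mindset of object detection followed by per-object keypoint detection.
- Matching between layers works much like tracking. Here the "layers" are, for example, heads and necks; in tracking they are different frames.
- Because joints are matched, the model is allowed to predict more keypoints than really exist. The matching step filters out the surplus "suboptimal" candidates along the way, which removes an otherwise painful problem.
- The image shrinks after the backbone, so the labels are computed on the downscaled map; all that matters is that the two stay aligned.
- Locating points from heatmaps trained with an L2 loss is already quite accurate.
- On the vector field:
  - Judging from the results, collapsing (x, y) to its magnitude looks almost as good: positions off the limbs are essentially 0, so integrating along a segment is enough to tell whether it is a limb.
  - However, suboptimal candidates may then be indistinguishable. Suppose one person's head is detected at two nearby positions A and B and NMS fails to suppress either. With the magnitude alone, the integrals probably cannot separate the two; with the direction field the score is more discriminative, so one of them can be dropped.
- Matching does not have to use the Hungarian algorithm; a greedy method also works.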
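The magnitude-versus-direction point can be checked on a toy field. The sketch below is an assumption-laden illustration: the band of unit vectors, the coordinates of `neck`, `head_a` and `head_b`, and the helper names are all invented, and the real PAFs are far noisier.

```python
import numpy as np

H, W = 40, 40
paf = np.zeros((H, W, 2))      # last axis holds the (x, y) components
paf[15:26, 0:36] = (1.0, 0.0)  # a wide band of unit vectors pointing in +x


def sample(paf, src, dst, num_pts=10):
    """Return the field vectors at num_pts pixels between src and dst (x, y)."""
    xs = np.round(np.linspace(src[0], dst[0], num=num_pts)).astype(int)
    ys = np.round(np.linspace(src[1], dst[1], num=num_pts)).astype(int)
    return paf[ys, xs]  # shape (num_pts, 2)


def magnitude_score(paf, src, dst):
    """Mean field magnitude along the segment (ignores direction)."""
    return np.linalg.norm(sample(paf, src, dst), axis=1).mean()


def direction_score(paf, src, dst):
    """Mean projection of the field onto the unit vector src -> dst."""
    limb_dir = np.asarray(dst, dtype=float) - np.asarray(src, dtype=float)
    limb_dir /= np.linalg.norm(limb_dir) + 1e-8
    return sample(paf, src, dst).dot(limb_dir).mean()


neck = (5, 20)
head_a = (30, 20)  # lies along the field direction
head_b = (28, 25)  # nearby duplicate detection, slightly off-axis

for name, head in [("A", head_a), ("B", head_b)]:
    print(name, magnitude_score(paf, neck, head), direction_score(paf, neck, head))
```

Both candidates stay inside the band, so the magnitude score is identical for A and B, while the projected score is lower for the off-axis candidate B. The gap in this toy case is small; it only shows that direction carries information the magnitude discards.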