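Reading the OpenPose Source Code

OpenPose targets skeleton recognition, i.e. (connected) keypoints. Methodologically it resembles YOLO and SSD: it predicts in a single pass instead of refining step by step the way Mask R-CNN does, so it wins on speed at the cost of a little accuracy.

Paper: OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, a 2017 paper with about 13,000 citations. I found it very inspiring; you can also jump straight to the summary section.

Code: the PyTorch version of the official OpenPose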

1. Dataset

How the COCO 2017 data is organized

$ cat person_keypoints_val2017.json|jq '.|keys'|less 
[
  "annotations",
  "categories",
  "images",
  "info",
  "licenses"
]

The key fields of an image object are id and the path (file_name).

$ cat person_keypoints_val2017.json|jq '.|.images[]'|less
{
  "license": 4,
  "file_name": "000000397133.jpg",
  "coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg",
  "height": 427,
  "width": 640,
  "date_captured": "2013-11-14 17:02:52",
  "flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg",
  "id": 397133
}

This file has only one category. It holds the meta-information of the class, not the actual annotations.

$ cat person_keypoints_val2017.json|jq '.|.categories'   
[
  {
    "supercategory": "person",
    "id": 1,
    "name": "person",
    "keypoints": [
      "nose",
      "left_eye",
      "right_eye",
      "left_ear",
      "right_ear",
      "left_shoulder",
      "right_shoulder",
      "left_elbow",
      "right_elbow",
      "left_wrist",
      "right_wrist",
      "left_hip",
      "right_hip",
      "left_knee",
      "right_knee",
      "left_ankle",
      "right_ankle"
    ],
    "skeleton": [
      [
        16,
        14
      ],
      [
        14,
        12
      ],
      ...
      [
        5,
        7
      ]
    ]
  }
]

The key information is in the annotations field.

$ cat person_keypoints_val2017.json|jq '.|.annotations[]|keys'|less
[
  "area",
  "bbox",
  "category_id",
  "id",
  "image_id",
  "iscrowd",
  "keypoints",
  "num_keypoints",
  "segmentation"
]

$ cat person_keypoints_val2017.json|jq '.|.annotations[0]'
{
  "segmentation": [
    [
      125.12,
      539.69,
      140.94,
      522.43,
      100.67,
      496.54,
      ...   
      145.26,
      567.01,
      117.93,
      551.19,
      133.75,
      532.49
    ]
  ],
  "num_keypoints": 10,
  "area": 47803.27955,
  "iscrowd": 0,
  "keypoints": [
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    142,
    309,
    1,
    177,
    320,
    2,
    191,
    398,
    2,
    237,
    317,
    2,
    233,
    426,
    2,
    306,
    233,
    2,
    92,
    452,
    2,
    123,
    468,
    2,
    0,
    0,
    0,
    251,
    469,
    2,
    0,
    0,
    0,
    162,
    551,
    2
  ],
  "image_id": 425226,
  "bbox": [
    73.35,
    206.02,
    300.58,
    372.5
  ],
  "category_id": 1,
  "id": 183126
}

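segmentation is the polygon that outlines the body and can be ignored. keypoints holds the body keypoints: 51 numbers that reshape to (17, 3), i.e. 17 keypoints, each stored as (x, y, weight). weight=0 means not labeled, 1 means labeled but occluded, 2 means labeled and visible. Both 1 and 2 are usable.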
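As a minimal sketch (not taken from the repository), the flat keypoints list of the annotation shown above can be reshaped and split by its visibility flag with NumPy. The variable names are illustrative only.

```python
import numpy as np

# The 51 numbers from annotations[0] above: 17 triples of (x, y, v).
flat = [0, 0, 0] * 5 + [
    142, 309, 1, 177, 320, 2, 191, 398, 2, 237, 317, 2,
    233, 426, 2, 306, 233, 2, 92, 452, 2, 123, 468, 2,
    0, 0, 0, 251, 469, 2, 0, 0, 0, 162, 551, 2,
]
keypoints = np.asarray(flat, dtype=np.float32).reshape(-1, 3)

visibility = keypoints[:, 2]
labeled = visibility > 0          # v=1 (occluded) or v=2 (visible)
print(keypoints.shape)            # (17, 3)
print(int(labeled.sum()))         # 10, matching "num_keypoints"
```

The count of labeled points agrees with the num_keypoints field of the same annotation, which is a quick sanity check when parsing the file by hand.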

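If an image contains several people, there are several annotation objects, each carrying the information above, and they are tied to the same image through image_id. I had not expected this flat structure; I assumed the data would be organized per image. The advantage is clarity: just as a tree can also be written as a node list, the file needs no deep nesting, and the tree is rebuilt in code.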
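Rebuilding the per-image grouping from the flat list takes only a few lines. The sketch below uses a toy annotations list with invented ids; in the repository the same lookup is done through pycocotools' getAnnIds(imgIds=...).

```python
from collections import defaultdict

# Toy stand-in for the "annotations" list; only the fields used here.
annotations = [
    {"id": 1, "image_id": 10},
    {"id": 2, "image_id": 11},
    {"id": 3, "image_id": 10},
]

# image_id -> list of annotations (one entry per person in that image)
anns_by_image = defaultdict(list)
for ann in annotations:
    anns_by_image[ann["image_id"]].append(ann)

for image_id, anns in sorted(anns_by_image.items()):
    print(image_id, [a["id"] for a in anns])
```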

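In this validation file there are 11004 distinct annotation.id values spread over 2693 distinct annotation.image_id values, so about 4 people per image on average.

The code that reads the category/image/annotation data above with pycocotools is as follows: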

lib/datasets/datasets.py
class CocoKeypoints(torch.utils.data.Dataset):
    """`MS Coco Detection <http://mscoco.org/dataset/#detections-challenge2016>`_ Dataset.

    Based on `torchvision.dataset.CocoDetection`.

    Caches preprocessing.

    Args:
        root (string): Root directory where images are downloaded to.
        annFile (string): Path to json annotation file.
        transform (callable, optional): A function/transform that  takes in an PIL image
            and returns a transformed version. E.g, ``transforms.ToTensor``
        target_transform (callable, optional): A function/transform that takes in the
            target and transforms it.
    """

    def __init__(self, root, annFile, image_transform=None, target_transforms=None,
                 n_images=None, preprocess=None, all_images=False, all_persons=False, input_y=368, input_x=368, stride=8):
        from pycocotools.coco import COCO
        self.root = root
        self.coco = COCO(annFile)

        self.cat_ids = self.coco.getCatIds(catNms=['person'])
        if all_images:
            self.ids = self.coco.getImgIds()
        elif all_persons:
            self.ids = self.coco.getImgIds(catIds=self.cat_ids)
        else:
            self.ids = self.coco.getImgIds(catIds=self.cat_ids)
            self.filter_for_keypoint_annotations()
        if n_images:
            self.ids = self.ids[:n_images]
        print('Images: {}'.format(len(self.ids)))

    def filter_for_keypoint_annotations(self):
        print('filter for keypoint annotations ...')
        def has_keypoint_annotation(image_id):
            ann_ids = self.coco.getAnnIds(imgIds=image_id, catIds=self.cat_ids)
            anns = self.coco.loadAnns(ann_ids)
            for ann in anns:
                if 'keypoints' not in ann:
                    continue
                if any(v > 0.0 for v in ann['keypoints'][2::3]):
                    return True
            return False

        self.ids = [image_id for image_id in self.ids
                    if has_keypoint_annotation(image_id)]
        print('... done.')

2. Label generation

The core logic is here: how the keypoint heatmaps and the limb vector fields are generated.

2.1 dataset.__getitem__

lib/datasets/datasets.py
class CocoKeypoints(torch.utils.data.Dataset):
    ...

    def __getitem__(self, index):
        """
        Args:
            index (int): Index

        Returns:
            tuple: Tuple (image, target). target is the object returned by ``coco.loadAnns``.
        """
        image_id = self.ids[index]
        # Get the ids of all annotations that belong to this image
        ann_ids = self.coco.getAnnIds(imgIds=image_id, catIds=self.cat_ids)
        # ids -> annotation dicts (parsed JSON)
        anns = self.coco.loadAnns(ann_ids)
        anns = copy.deepcopy(anns)

        image_info = self.coco.loadImgs(image_id)[0]
        self.log.debug(image_info)
        with open(os.path.join(self.root, image_info['file_name']), 'rb') as f:
            image = Image.open(f).convert('RGB')

        meta_init = {
            'dataset_index': index,
            'image_id': image_id,
            'file_name': image_info['file_name'],
        }

        image, anns, meta = self.preprocess(image, anns, None)

The JSON is parsed into np.array:

lib/datasets/transforms.py
class Normalize(Preprocess):
    @staticmethod
    def normalize_annotations(anns):
        anns = copy.deepcopy(anns)

        # convert as much data as possible to numpy arrays to avoid every float
        # being turned into its own torch.Tensor()
        for ann in anns:
            ann['keypoints'] = np.asarray(ann['keypoints'], dtype=np.float32).reshape(-1, 3)
            ann['bbox'] = np.asarray(ann['bbox'], dtype=np.float32)
            ann['bbox_original'] = np.copy(ann['bbox'])
            del ann['segmentation']

        return anns

    def __call__(self, image, anns, meta):
        anns = self.normalize_annotations(anns)

        if meta is None:
            w, h = image.size
            meta = {
                'offset': np.array((0.0, 0.0)),
                'scale': np.array((1.0, 1.0)),
                'valid_area': np.array((0.0, 0.0, w, h)),
                'hflip': False,
                'width_height': np.array((w, h)),
            }

        return image, anns, meta

After loading, anns is a list whose elements look like this:

{
  'num_keypoints': 9, 
  'area': 14853.30435, 
  'iscrowd': 0, 
  'keypoints': array([[ 199.37424,  249.82243,    2.     ], [ 201.70392,  251.37383,    2.   ...,   17.88785,    0.     ]], dtype=float32), 
  'image_id': 327701, 
  'bbox': array([126.61331 , 225.02682 , 119.978905, 118.364204], dtype=float32), 
  'category_id': 1, 
  'id': 444736, 
  'bbox_original': array([329.16, 266.89, 154.5 , 152.59], dtype=float32), 
  'valid_area': array([  0.,  18., 368., 332.])
}

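2.2 Keypoint heatmaps

Given an image and the annotations anns of every person in it, generate the keypoint heatmap and the limb paf.

The heatmap logic: 17 keypoints + neck + background, i.e. 18 keypoints and 19 channels. Each keypoint type gets one map, which overlays the same body part of every person in the image. The resulting heatmap has shape (46, 46, 19).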

lib/datasets/datasets.py
class CocoKeypoints(torch.utils.data.Dataset):

    def get_ground_truth(self, anns):

        grid_y = int(self.input_y / self.stride)
        grid_x = int(self.input_x / self.stride)  # labels are drawn on the small (46, 46) grid the backbone produces from the 368-pixel input (368 / 8 = 46)
        channels_heat = (self.HEATMAP_COUNT + 1)
        channels_paf = 2 * len(self.LIMB_IDS)
        heatmaps = np.zeros((int(grid_y), int(grid_x), channels_heat))
        pafs = np.zeros((int(grid_y), int(grid_x), channels_paf))

        keypoints = []
        for ann in anns:
            single_keypoints = np.array(ann['keypoints']).reshape(17,3)
            single_keypoints = self.add_neck(single_keypoints)
            keypoints.append(single_keypoints)
        keypoints = np.array(keypoints)
        keypoints = self.remove_illegal_joint(keypoints)

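        # confidence maps for body parts
        for i in range(self.HEATMAP_COUNT):
            joints = [jo[i] for jo in keypoints]  # the i-th joint type of every person
            for joint in joints:  # iterate over each point
                if joint[2] > 0.5:  # 1 = labeled but occluded, 2 = labeled and visible
                    center = joint[:2]  # point coordinates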
                    gaussian_map = heatmaps[:, :, i]
                    heatmaps[:, :, i] = putGaussianMaps(
                        center, gaussian_map,
                        7.0, grid_y, grid_x, self.stride)

        # paf
        ...

        # background
        heatmaps[:, :, -1] = np.maximum(
            1 - np.max(heatmaps[:, :, :self.HEATMAP_COUNT], axis=2),
            0.
        )
        return heatmaps, pafs

The heatmap is built from a Gaussian kernel, and the accumulated values are clipped so that they stay within 0 to 1.

lib/datasets/heatmap.py
def putGaussianMaps(center, accumulate_confid_map, sigma, grid_y, grid_x, stride):

    start = stride / 2.0 - 0.5
    y_range = [i for i in range(int(grid_y))]
    x_range = [i for i in range(int(grid_x))]
    xx, yy = np.meshgrid(x_range, y_range)
    xx = xx * stride + start
    yy = yy * stride + start
    d2 = (xx - center[0]) ** 2 + (yy - center[1]) ** 2
    exponent = d2 / 2.0 / sigma / sigma
    mask = exponent <= 4.6052
    cofid_map = np.exp(-exponent)
    cofid_map = np.multiply(mask, cofid_map)
    accumulate_confid_map += cofid_map  # points from several people accumulate
    accumulate_confid_map[accumulate_confid_map > 1.0] = 1.0

    return accumulate_confid_map
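
The constant 4.6052 is ln(100), so the mask drops every value below exp(-4.6052), roughly 0.01. The following self-contained sketch repeats the same computation for a single keypoint. It is a re-implementation for illustration (gaussian_heatmap is not a repository function), assuming the 46 x 46 grid and stride 8 used above.

```python
import numpy as np

def gaussian_heatmap(center, sigma=7.0, grid=46, stride=8):
    """Gaussian bump around `center` (input-image pixels) on a grid x grid map."""
    start = stride / 2.0 - 0.5                 # pixel centre of grid cell 0
    coords = np.arange(grid) * stride + start  # cell centres in input pixels
    xx, yy = np.meshgrid(coords, coords)
    exponent = ((xx - center[0]) ** 2 + (yy - center[1]) ** 2) / (2.0 * sigma ** 2)
    heat = np.exp(-exponent)
    heat[exponent > 4.6052] = 0.0              # cut the tail below ~0.01
    return heat

heat = gaussian_heatmap(center=(100.0, 200.0))
peak_y, peak_x = np.unravel_index(heat.argmax(), heat.shape)
print(peak_x, peak_y)                       # 12 25: the cell nearest to (100, 200)
print(round(float(np.exp(-4.6052)), 4))     # 0.01
```

Cell centres sit at i * 8 + 3.5 input pixels, which is why the keypoint at (100, 200) lands in cell (12, 25).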

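2.3 Limb vector fields

Iterate over the (centerA, centerB) pairs that make up the limbs; if either point is unlabeled, continue. Each body is defined to have 19 limbs, so the shape is (46, 46, 19, 2), where the 2 is there because the vector map stores (x, y) components. The program handles it as reshape(46, 46, 38). Predicting a "field" is an interesting idea, somewhat like the approach taken when solving a differential equation.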

lib/datasets/datasets.py
class CocoKeypoints(torch.utils.data.Dataset):
    ...   
        # pafs
        for i, (k1, k2) in enumerate(self.LIMB_IDS):
            # limb
            count = np.zeros((int(grid_y), int(grid_x)), dtype=np.uint32)  # how many limbs have been written to each cell
            for joint in keypoints:
                if joint[k1, 2] > 0.5 and joint[k2, 2] > 0.5:
                    centerA = joint[k1, :2]
                    centerB = joint[k2, :2]
                    vec_map = pafs[:, :, 2 * i:2 * (i + 1)]  # the x and y channels of this limb

                    pafs[:, :, 2 * i:2 * (i + 1)], count = putVecMaps(
                        centerA=centerA,
                        centerB=centerB,
                        accumulate_vec_map=vec_map,
                        count=count, grid_y=grid_y, grid_x=grid_x, stride=self.stride
                    )

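With several people, the heatmaps are aggregated by sum (clipped at 1.0). The limb rectangles form a vector map: if limbs of several people pass through the same cell, the vector stored there is the average of the vectors passing through it.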

[Figure: image-20240808172146315]
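
A tiny numeric sketch of this averaging at a single grid cell crossed by two limbs of the same type (the vectors are invented for illustration). The stored value is always the running mean, so it is multiplied back by count before the new vector is added, which is the same trick putVecMaps uses on whole arrays.

```python
import numpy as np

avg = np.zeros(2)      # running mean of the PAF vector at one cell
count = 0              # how many limbs have covered this cell so far

for unit_vec in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    total = avg * count + unit_vec   # undo the previous mean, add the new limb
    count += 1
    avg = total / count

print(avg)  # [0.5 0.5]
```

Note that the average of two unit vectors is in general shorter than a unit vector.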

lib/datasets/paf.py
def putVecMaps(centerA, centerB, accumulate_vec_map, count, grid_y, grid_x, stride):
    centerA = centerA.astype(float)
    centerB = centerB.astype(float)

    thre = 1  # limb width
    centerB = centerB / stride  # map into feature-map coordinates
    centerA = centerA / stride

    limb_vec = centerB - centerA
    norm = np.linalg.norm(limb_vec)  # length of the limb
    if (norm == 0.0):
        # print 'limb is too short, ignore it...'
        return accumulate_vec_map, count
    limb_vec_unit = limb_vec / norm  # unit vector
    # print 'limb unit vector: {}'.format(limb_vec_unit)

    # To make sure not beyond the border of this two points
    min_x = max(int(round(min(centerA[0], centerB[0]) - thre)), 0)  # bounding box of all candidate cells
    max_x = min(int(round(max(centerA[0], centerB[0]) + thre)), grid_x)
    min_y = max(int(round(min(centerA[1], centerB[1]) - thre)), 0)
    max_y = min(int(round(max(centerA[1], centerB[1]) + thre)), grid_y)

    range_x = list(range(int(min_x), int(max_x), 1))
    range_y = list(range(int(min_y), int(max_y), 1))
    xx, yy = np.meshgrid(range_x, range_y)
    ba_x = xx - centerA[0]  # the vector from centerA to (x, y), x and y components separately
    ba_y = yy - centerA[1]
    limb_width = np.abs(ba_x * limb_vec_unit[1] - ba_y * limb_vec_unit[0])  # projection of each cell onto the limb normal, i.e. its distance from the limb axis
    mask = limb_width < thre  # mask is 2D; cells below the threshold lie on the limb

    vec_map = np.copy(accumulate_vec_map) * 0.0  # contribution of this call

    vec_map[yy, xx] = np.repeat(mask[:, :, np.newaxis], 2, axis=2)
    vec_map[yy, xx] *= limb_vec_unit[np.newaxis, np.newaxis, :]  # cells on the limb get the unit direction vector, all others stay 0

    mask = np.logical_or.reduce(
        (np.abs(vec_map[:, :, 0]) > 0, np.abs(vec_map[:, :, 1]) > 0))  # which cells of the 46x46 feature map this limb covers

    accumulate_vec_map = np.multiply(
        accumulate_vec_map, count[:, :, np.newaxis])  # the stored map is an average; turn it back into a sum
    accumulate_vec_map += vec_map  # add the vectors of the current limb
    count[mask == True] += 1  # one more limb on these cells

    mask = count == 0

    count[mask == True] = 1  # untouched cells: divide by 1 so they keep their value

    accumulate_vec_map = np.divide(accumulate_vec_map, count[:, :, np.newaxis])  # average vector
    count[mask == True] = 0  # restore the zero counts

    return accumulate_vec_map, count

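The most interesting part of this code is how it obtains a rectangle: compute the normal of the limb direction, then project the vector from centerA to each cell onto that normal. The absolute value of the projection is the cell's distance from the limb axis.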
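
The projection trick can be checked on a toy case. This sketch uses a horizontal limb with invented coordinates; unlike putVecMaps it does not clip to the bounding box of the two endpoints, so the mask here covers the whole line rather than a rectangle.

```python
import numpy as np

center_a = np.array([2.0, 2.0])
center_b = np.array([8.0, 2.0])            # horizontal limb, in grid units
unit = (center_b - center_a) / np.linalg.norm(center_b - center_a)

xx, yy = np.meshgrid(np.arange(10), np.arange(6))
# 2D cross product of (point - A) with the unit limb vector:
# its absolute value is the distance from the point to the line through A and B.
dist = np.abs((xx - center_a[0]) * unit[1] - (yy - center_a[1]) * unit[0])
mask = dist < 1                            # same role as `thre = 1`

print(np.unique(yy[mask]))                 # [2]
```

With thre = 1 and integer grid coordinates, only the row that the limb runs along passes the test.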

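3. Training

At this point train_loader can yield each image together with its labels, i.e. the heatmap and the paf. The image is then passed to the model for the forward pass; the outputs have the same size as the labels, (46, 46, 19) and (46, 46, 38). Finally the L2 loss (MSELoss) can be computed.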

train_VGG19.py
        img = img.to(device)
        heatmap_target = heatmap_target.to(device)
        paf_target = paf_target.to(device)
        # compute output
        _,saved_for_loss = model(img)

        total_loss, saved_for_log = get_loss(saved_for_loss, heatmap_target, paf_target)

        # compute gradient and do SGD step
        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()

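saved_for_loss holds 6 groups of output feature maps: besides the one right after the backbone there are 5 further stages built from stride=1 convolutions. The size stays the same while the receptive field grows step by step.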
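
To make the "six outputs, one loss" structure concrete, here is a NumPy sketch that sums one MSE term per stage. It only illustrates the idea with random arrays: stage_outputs is a hypothetical stand-in for saved_for_loss, and the repository's get_loss works on torch tensors whose exact weighting or masking is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
heat_target = rng.random((19, 46, 46))
paf_target = rng.random((38, 46, 46))

# Hypothetical stand-in for saved_for_loss: one (paf, heatmap) pair per stage.
stage_outputs = [
    (rng.random((38, 46, 46)), rng.random((19, 46, 46))) for _ in range(6)
]

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

stage_losses = [
    mse(paf, paf_target) + mse(heat, heat_target) for paf, heat in stage_outputs
]
total_loss = sum(stage_losses)   # every stage is supervised by the same labels
print(len(stage_losses), total_loss > 0)
```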

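4. Prediction

I found this the hardest part. The questions I had before reading the code were:

  • How do you get from a heatmap back to points? Is it top-K? Wouldn't all K land inside the same Gaussian blob? Or is something like K-means used to find K centers?
  • Does Hungarian matching require the number of heads to equal the number of necks? What if the previous step cannot guarantee that, for example in a half-body shot where the person's feet are not visible?
  • The edge weight for Hungarian matching is the projection of the segment between two points onto the predicted paf field. How is that written in code?

Answers:

  • Local maxima are picked out with the image-processing tool max_filter. Getting more than K is fine, because the later matching step filters out the suboptimal centers.
  • The Hungarian algorithm does not need a square matrix: some workers can be left without a task, and some tasks can be left without a worker.

Other questions:

  • At prediction time the image is not necessarily square. The result is that the short side shrinks to 46 (the original image is first resized so that its short side is 368, with the long side scaled proportionally).
  • Training computes a loss on every stage's output, but at prediction time the 19-channel and 38-channel maps are both taken from the last stage; the earlier ones are used only for the loss.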
evaluate/coco_eval.py
# model.forward
>>> predicted_outputs[0].shape
torch.Size([1, 38, 94, 46])
>>> predicted_outputs[1].shape
torch.Size([1, 19, 94, 46])
# reshape, H, W, C
>>> heatmap.shape
(94, 46, 19)
>>> paf.shape
(94, 46, 38)

4.1 demo

[Figure: image-20240809165343899]

[Figure: image-20240809165116913]

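  • An upside-down person is not recognized, probably because the training set does not cover that case
  • Missing body parts get no points or edges, so nothing goes wrong there
  • Parts that are occluded but present are still recognized correctly, which is impressive; I don't know how the Hungarian step handles this
  • There are also mistakes, such as my right wrist and the wrists of pedestrians in the distance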

4.2 Visualizing the predicted heatmaps

To get an intuitive feel for the model's end-to-end output, first the keypoint heatmaps:

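# plot the 18 keypoint heatmaps using subplots (the last channel is background)
import matplotlib.pyplot as plt

num_joins = heatmap.shape[-1] - 1
fig = plt.figure(figsize=(8, 10))
for i in range(1, num_joins+1):
    fig.add_subplot(6, 3, i)
    plt.imshow(heatmap[:, :, i - 1])  # subplot indices start at 1, channels at 0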

[Figure: image-20240809165551239]

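The points are predicted quite precisely, with small variance, which surprised me: there is no Gaussian constraint like in a VAE, and the L2 loss alone is enough to keep the predictions this tight.

Next, plot the x-component of the PAF vectors: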

# plot the x-component of the 19 pafs using subplots
num_pafs = paf.shape[-1] // 2
fig = plt.figure(figsize=(8, 10))
for i in range(1, num_pafs+1):
    fig.add_subplot(6, 4, i)
    plt.imshow(paf[:, :, 2 * (i - 1)])  # channels 2k and 2k+1 hold x and y of limb k

[Figure: image-20240809170044254]

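The predicted limb rectangles are also quite clear, and the width of each rectangle is visible, presumably also a result of the loss. Missing limbs are simply not predicted at this step, just like people who were not detected. So after integrating along the limb, a threshold can be set to keep only the trustworthy limbs (my guess).

4.3 From heatmaps to point sets

Non Maxima Suppression, put positively, means finding all local maxima of the heatmap, so NMS here names a procedure rather than a specific algorithm.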

lib/utils/paf_to_pose.py
def NMS(heatmaps, upsampFactor=1., bool_refine_center=True, bool_gaussian_filt=False, config=None):
    """
    NonMaximaSuppression: find peaks (local maxima) in a set of grayscale images
    :param heatmaps: set of grayscale images on which to find local maxima (3d np.array,
    with dimensions image_height x image_width x num_heatmaps)
    :param upsampFactor: Size ratio between CPM heatmap output and the input image size.
    Eg: upsampFactor=16 if original image was 480x640 and heatmaps are 30x40xN
    :param bool_refine_center: Flag indicating whether:
     - False: Simply return the low-res peak found upscaled by upsampFactor (subject to grid-snap)
     - True: (Recommended, very accurate) Upsample a small patch around each low-res peak and
     fine-tune the location of the peak at the resolution of the original input image
    :param bool_gaussian_filt: Flag indicating whether to apply a 1d-GaussianFilter (smoothing)
    to each upsampled patch before fine-tuning the location of each peak.
    :return: a NUM_JOINTS x 4 np.array where each row represents a joint type (0=nose, 1=neck...)
    and the columns indicate the {x,y} position, the score (probability) and a unique id (counter)
    """
    # MODIFIED BY CARLOS: Instead of upsampling the heatmaps to heatmap_avg and
    # then performing NMS to find peaks, this step can be sped up by ~25-50x by:
    # (9-10ms [with GaussFilt] or 5-6ms [without GaussFilt] vs 250-280ms on RoG
    # 1. Perform NMS at (low-res) CPM's output resolution
    # 1.1. Find peaks using scipy.ndimage.filters.maximum_filter
    # 2. Once a peak is found, take a patch of 5x5 centered around the peak, upsample it, and
    # fine-tune the position of the actual maximum.
    #  '-> That's equivalent to having found the peak on heatmap_avg, but much faster because we only
    #      upsample and scan the 5x5 patch instead of the full (e.g.) 480x640

    joint_list_per_joint_type = []
    cnt_total_joints = 0

    # For every peak found, win_size specifies how many pixels in each
    # direction from the peak we take to obtain the patch that will be
    # upsampled. Eg: win_size=1 -> patch is 3x3; win_size=2 -> 5x5
    # (for BICUBIC interpolation to be accurate, win_size needs to be >=2!)
    win_size = 2

    for joint in range(config.MODEL.NUM_KEYPOINTS):
        map_orig = heatmaps[:, :, joint]
        peak_coords = find_peaks(config.TEST.THRESH_HEATMAP, map_orig)
        peaks = np.zeros((len(peak_coords), 4))
        for i, peak in enumerate(peak_coords):
            if bool_refine_center:
                x_min, y_min = np.maximum(0, peak - win_size)
                x_max, y_max = np.minimum(
                    np.array(map_orig.T.shape) - 1, peak + win_size)

                # Take a small patch around each peak and only upsample that
                # tiny region
                patch = map_orig[y_min:y_max + 1, x_min:x_max + 1]
                map_upsamp = cv2.resize(
                    patch, None, fx=upsampFactor, fy=upsampFactor, interpolation=cv2.INTER_CUBIC)

                # Gaussian filtering takes an average of 0.8ms/peak (and there might be
                # more than one peak per joint!) -> For now, skip it (it's
                # accurate enough)
                map_upsamp = gaussian_filter(
                    map_upsamp, sigma=3) if bool_gaussian_filt else map_upsamp

                # Obtain the coordinates of the maximum value in the patch
                location_of_max = np.unravel_index(
                    map_upsamp.argmax(), map_upsamp.shape)
                # Remember that peaks indicates [x,y] -> need to reverse it for
                # [y,x]
                location_of_patch_center = compute_resized_coords(
                    peak[::-1] - [y_min, x_min], upsampFactor)
                # Calculate the offset wrt to the patch center where the actual
                # maximum is
                refined_center = (location_of_max - location_of_patch_center)
                peak_score = map_upsamp[location_of_max]
            else:
                refined_center = [0, 0]
                # Flip peak coordinates since they are [x,y] instead of [y,x]
                peak_score = map_orig[tuple(peak[::-1])]
            peaks[i, :] = tuple(
                x for x in compute_resized_coords(peak_coords[i], upsampFactor) + refined_center[::-1]) + (
                              peak_score, cnt_total_joints)
            cnt_total_joints += 1
        joint_list_per_joint_type.append(peaks)

    return joint_list_per_joint_type

The helper it calls:

lib/utils/paf_to_pose.py
from scipy.ndimage.filters import gaussian_filter, maximum_filter
from scipy.ndimage.morphology import generate_binary_structure

def find_peaks(param, img):
    """
    Given a (grayscale) image, find local maxima whose value is above a given
    threshold (param['thre1'])
    :param img: Input image (2d array) where we want to find peaks
    :return: 2d np.array containing the [x,y] coordinates of each peak found
    in the image
    """

    peaks_binary = (maximum_filter(img, footprint=generate_binary_structure(
        2, 1)) == img) * (img > param)
    # Note reverse ([::-1]): we return [[x y], [x y]...] instead of [[y x], [y
    # x]...]
    return np.array(np.nonzero(peaks_binary)[::-1]).T

peak_coords
array([[32, 18],
       [41, 21],
       [52, 22],
       [ 7, 26]])

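max_filter and gaussian_filter are both image filters, a sliding-window maximum and a Gaussian convolution respectively. The figures below show how the local maxima are found:

[Figure: image-20240809175200433]

[Figure: image-20240809180343433]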

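To summarize

find_peaks answers the first question: how to get from a Gaussian heatmap back to center points.

  • Threshold the heatmap at 0.1: positions at or below the threshold are discarded, the rest keep their values
  • Apply the off-the-shelf max_filter, which replaces each pixel with the maximum of its neighbourhood. This smooths over the raw model output, a bit like what a CRF does for NER.
  • Keep the positions whose value is the same before and after the filter; those are essentially the local maxima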
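
The steps above can be sketched without SciPy. find_peaks_np below is a hypothetical NumPy-only re-implementation: it compares each pixel with its 4-neighbourhood, which is the cross-shaped footprint that generate_binary_structure(2, 1) describes, and should give the same peaks as the repository's find_peaks.

```python
import numpy as np

def find_peaks_np(img, threshold):
    """Local maxima over the 4-neighbourhood, returned as [x, y] rows."""
    padded = np.pad(img, 1, mode="constant", constant_values=-np.inf)
    neighbourhood_max = np.max(
        [
            padded[1:-1, 1:-1],   # the pixel itself
            padded[:-2, 1:-1],    # up
            padded[2:, 1:-1],     # down
            padded[1:-1, :-2],    # left
            padded[1:-1, 2:],     # right
        ],
        axis=0,
    )
    peaks_binary = (neighbourhood_max == img) & (img > threshold)
    return np.array(np.nonzero(peaks_binary)[::-1]).T

img = np.zeros((5, 6))
img[1, 2] = 0.9      # a clear peak at x=2, y=1
img[1, 3] = 0.5      # its weaker neighbour: above threshold but not a maximum
img[3, 4] = 0.05     # a local maximum, but below the threshold
print(find_peaks_np(img, threshold=0.1))   # [[2 1]]
```

Only the first pixel survives both tests, which is why a single Gaussian blob yields one point instead of a cluster of top-K values.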

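Even after this, several points can still show up in the same region, and they need not be the true local maximum. What the NMS function does:

  • Divide and conquer: around each local peak from the previous step, cut out a slightly larger square patch,
  • upsample the patch (optionally followed by a Gaussian blur) and take the argmax position as the center
  • One person's joint may still be detected as several points. How is that handled later? => Even if 3 heads but only 2 necks are detected, PAF matching lets each neck pick only its best head and suppresses the neighbouring extra head (see the next subsection)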
>>> len(joint_list_per_joint_type)
18
>>> joint_list_per_joint_type[1]
array([[254.        , 164.        ,   0.9705171 ,   4.        ],
       [331.        , 183.        ,   0.91956604,   5.        ],
       [428.        , 203.        ,   0.9459275 ,   6.        ],
       [ 60.        , 214.        ,   0.89996499,   7.        ]])
>>> joint_list_per_joint_type[2]
array([[244.        , 159.        ,   0.89307928,   8.        ],
       [321.        , 179.        ,   0.88130075,   9.        ],
       [412.        , 204.        ,   0.89099789,  10.        ],
       [ 54.        , 214.        ,   0.93888676,  11.        ]])
>>> joint_list_per_joint_type[3]
array([[218.        , 157.        ,   0.92710817,  12.        ],
       [308.        , 160.        ,   0.88837695,  13.        ],
       [403.        , 170.        ,   0.70073223,  14.        ],
       [ 51.        , 221.        ,   0.77465415,  15.        ],
       [414.        , 233.        ,   0.71171707,  16.        ]])

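Here joint_3 found 5 peaks, while the others found the positions of 4 people.

Once the joints are found, the next step is the optimal matching between two joint types, e.g. how type 2 and type 3 should be connected, using the predicted PAF.

4.4 From keypoints to connections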

lib/utils/paf_to_pose.py
        pafprocess.process_paf(joint_list, heatmap_upsamp, paf_upsamp)
>>> joint_list.shape
(1, 74, 5)
>>> heatmap_upsamp.shape
(368, 616, 19)
>>> paf_upsamp.shape
(368, 616, 38)

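joint_list holds the 74 keypoints of one image. Each keypoint is a 5-tuple like the rows shown above: x, y, score, rank (the running peak id), part_id.

process_paf is a C++ program; let's take a look.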

lib/pafprocess/pafprocess.cpp
vector <Peak> peak_infos_line;
const int NUM_PART = 18;
#define PEAKS(i, j, k) peaks[k+p3*(j+p2*i)]
#define HEAT(i, j, k) heatmap[k+h3*(j+h2*i)]
#define PAF(i, j, k) pafmap[k+f3*(j+f2*i)] // these macros are textual substitution: they read the element at index (i, j, k) from the flattened array

int process_paf(int p1, int p2, int p3, float *peaks, int h1, int h2, int h3, float *heatmap, int f1, int f2, int f3,
                float *pafmap) {
    vector <Peak> peak_infos[NUM_PART];
    int peak_cnt = 0;
    for (int img_id = 0; img_id < p1; img_id++){
        for (int peak_index = 0; peak_index < p2; peak_index++) {
            Peak info;
            info.id = peak_cnt++;
            info.x = PEAKS(img_id, peak_index, 0);
            info.y = PEAKS(img_id, peak_index, 1);
            info.score = PEAKS(img_id, peak_index, 2);
            int part_id = PEAKS(img_id, peak_index, 4);
            peak_infos[part_id].push_back(info);
        }
    }

    peak_infos_line.clear();
    for (int part_id = 0; part_id < NUM_PART; part_id++) {
        for (int i = 0; i < (int) peak_infos[part_id].size(); i++) {
            peak_infos_line.push_back(peak_infos[part_id][i]);
        }
    }

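First, the joint_list tuples are parsed into Peak objects, giving peak_infos, a dictionary keyed by part_id (in practice an array of vectors). peak_infos_line is the flattened version, a rank -> Peak lookup, so that a peak can later be fetched directly by its rank id.

Next: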
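
The PEAKS/HEAT/PAF macros index a flattened row-major array. The following sketch (array sizes chosen arbitrarily) checks that the macro arithmetic agrees with ordinary NumPy indexing on a C-ordered array.

```python
import numpy as np

h1, h2, h3 = 4, 5, 3                       # e.g. height, width, channels
volume = np.arange(h1 * h2 * h3).reshape(h1, h2, h3)
flat = volume.ravel()                      # row-major (C order) flattening

def heat(i, j, k):
    # Same arithmetic as the HEAT(i, j, k) macro.
    return flat[k + h3 * (j + h2 * i)]

print(heat(2, 3, 1) == volume[2, 3, 1])    # True
```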

lib/pafprocess/pafprocess.cpp
    // Start to Connect
    for (int pair_id = 0; pair_id < COCOPAIRS_SIZE; pair_id++) {
        vector <ConnectionCandidate> candidates;
        vector <Peak> &peak_a_list = peak_infos[COCOPAIRS[pair_id][0]];
        vector <Peak> &peak_b_list = peak_infos[COCOPAIRS[pair_id][1]];
        // take the two joint types of this limb and fetch their peak lists, ready to be matched against each other

        if (peak_a_list.size() == 0 || peak_b_list.size() == 0) {
            continue;
        }

        for (int peak_a_id = 0; peak_a_id < (int) peak_a_list.size(); peak_a_id++) {
            Peak &peak_a = peak_a_list[peak_a_id];
            for (int peak_b_id = 0; peak_b_id < (int) peak_b_list.size(); peak_b_id++) {
                Peak &peak_b = peak_b_list[peak_b_id];
                        // e.g. between joint2 and joint3 this loops 4 * 5 times

                // calculate vector(direction)
                VectorXY vec;
                vec.x = peak_b.x - peak_a.x;
                vec.y = peak_b.y - peak_a.y;
                float norm = (float) sqrt(vec.x * vec.x + vec.y * vec.y);
                if (norm < 1e-12) continue;
                vec.x = vec.x / norm;
                vec.y = vec.y / norm;

                vector <VectorXY> paf_vecs = get_paf_vectors(pafmap, COCOPAIRS_NET[pair_id][0],
                                                             COCOPAIRS_NET[pair_id][1], f2, f3, peak_a, peak_b);
                float scores = 0.0f;

                // criterion 1 : score treshold count
                int criterion1 = 0;
                for (int i = 0; i < STEP_PAF; i++) {
                    float score = vec.x * paf_vecs[i].x + vec.y * paf_vecs[i].y;
                    scores += score;

                    if (score > THRESH_VECTOR_SCORE) criterion1 += 1;
                }

                float criterion2 = scores / STEP_PAF + min(0.0, 0.5 * h1 / norm - 1.0);
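                // Judging from the Python code further down, there are two conditions:
                // 1. of the sampled grid points, at least 80% exceed the threshold
                // 2. the mean (after penalizing overly long limbs) is > 0
                // If both hold, the joint pair becomes a candidate, with the mean used as its matching weight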
                if (criterion1 > THRESH_VECTOR_CNT1 && criterion2 > 0) {
                    ConnectionCandidate candidate;
                    candidate.idx1 = peak_a_id;
                    candidate.idx2 = peak_b_id;
                    candidate.score = criterion2;
                    candidate.etc = criterion2 + peak_a.score + peak_b.score;
                    candidates.push_back(candidate);
                }
            }
        }
    }

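// This function reads pafmap into VectorXY (x, y) objects. ch_id1 and ch_id2 are the two PAF channels of this limb (its x and y components); peak1 and peak2 are the two candidate joints, e.g. joint2[0] and joint3[1].
// It collects the PAF values along the segment between the two peaks. In theory this is a line integral along the path, but the map is discrete, so it is enough to read the PAF vectors at sampled grid points:
// [(x1, y1), (x2, y2), ...]
// The caller then takes the inner product of each with the vector vec from joint2[0] to joint3[1] and sums them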
vector <VectorXY>
get_paf_vectors(float *pafmap, const int &ch_id1, const int &ch_id2, int &f2, int &f3, Peak &peak1, Peak &peak2) {
    vector <VectorXY> paf_vectors;

    const float STEP_X = (peak2.x - peak1.x) / float(STEP_PAF);
    const float STEP_Y = (peak2.y - peak1.y) / float(STEP_PAF);

    for (int i = 0; i < STEP_PAF; i++) {
        int location_x = roundpaf(peak1.x + i * STEP_X);
        int location_y = roundpaf(peak1.y + i * STEP_Y);

        VectorXY v;
        v.x = PAF(location_y, location_x, ch_id1);
        v.y = PAF(location_y, location_x, ch_id2);
        paf_vectors.push_back(v);
    }

    return paf_vectors;
}
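
The scoring loop and get_paf_vectors can be condensed into a few lines of Python. This is a sketch, not a port: limb_score is a made-up name, num_steps and thresh are placeholder values standing in for STEP_PAF and THRESH_VECTOR_SCORE, and the length penalty min(0, 0.5 * h1 / norm - 1) from the C++ code is left out.

```python
import numpy as np

def limb_score(paf_x, paf_y, peak_a, peak_b, num_steps=10, thresh=0.05):
    """Sample the PAF between two peaks; return (mean dot product, hits)."""
    a = np.asarray(peak_a, dtype=float)
    b = np.asarray(peak_b, dtype=float)
    norm = np.linalg.norm(b - a)
    if norm < 1e-12:
        return 0.0, 0
    unit = (b - a) / norm
    # Points a, a + step, ..., stopping one step short of b, rounded to pixels.
    ts = np.arange(num_steps) / num_steps
    xs = np.rint(a[0] + ts * (b[0] - a[0])).astype(int)
    ys = np.rint(a[1] + ts * (b[1] - a[1])).astype(int)
    dots = paf_x[ys, xs] * unit[0] + paf_y[ys, xs] * unit[1]
    return float(dots.mean()), int((dots > thresh).sum())

# Toy field: a horizontal limb along row 5 pointing in +x.
paf_x = np.zeros((20, 20))
paf_x[5, :] = 1.0
paf_y = np.zeros((20, 20))
print(limb_score(paf_x, paf_y, (2, 5), (12, 5)))   # (1.0, 10)
print(limb_score(paf_x, paf_y, (2, 5), (2, 15)))   # (0.0, 0)
```

A pair of peaks aligned with the field scores 1.0 at every sample, while a pair perpendicular to it scores 0 and would be rejected by both criteria.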

With the assignment weights between two joint types in hand, compute the best assignment:

lib/pafprocess/pafprocess.cpp
    const int COCOPAIRS_SIZE = 19;
    vector <Connection> connection_all[COCOPAIRS_SIZE];

        ...
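        // The final assignment is stored in connection_all
        vector <Connection> &conns = connection_all[pair_id];
        // Sort the candidates between the two joint types by score (comp_candidate is the comparator, like lambda x: x.score)
        sort(candidates.begin(), candidates.end(), comp_candidate);
        for (int c_id = 0; c_id < (int) candidates.size(); c_id++) {
            ConnectionCandidate &candidate = candidates[c_id];
            // Walk the candidates from the highest weight to the lowest
            bool assigned = false;
            // Check whether the candidate's start or end point is already in conns.
            // If neither is, add the candidate to conns; if either is, skip it.
            // So if joint2 has 4 points and joint3 has 5 candidates, and the first four have found their best match, the remaining 5th point is simply dropped because every possible start point is already taken. This answers the question above.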
            for (int conn_id = 0; conn_id < (int) conns.size(); conn_id++) {
                if (conns[conn_id].peak_id1 == candidate.idx1) {
                    // already assigned
                    assigned = true;
                    break;
                }
                if (assigned) break;
                if (conns[conn_id].peak_id2 == candidate.idx2) {
                    // already assigned
                    assigned = true;
                    break;
                }
                if (assigned) break;
            }
            if (assigned) continue;

            Connection conn;
            conn.peak_id1 = candidate.idx1;
            conn.peak_id2 = candidate.idx2;
            conn.score = candidate.score;
            conn.cid1 = peak_a_list[candidate.idx1].id;
            conn.cid2 = peak_b_list[candidate.idx2].id;
            conns.push_back(conn);

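The paper uses the Hungarian algorithm, while this code uses a greedy algorithm (according to the paper the difference is small, and greedy is the fastest).

  • Sort the candidates by matching score in descending order

  • Check whether the candidate's start or end point is already in conns

  • If neither is, add it to conns
  • If either is, skip it
  • So if joint_2 has 4 points and joint_3 has 5 candidates, and the first four have found their best match, the remaining 5th point is simply dropped because every possible start point is already in the result. This answers the question above.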
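
The greedy loop is short enough to restate in plain Python. greedy_match is an illustrative name and the scores are invented; the logic mirrors the C++ loop, with sets replacing the linear scan over conns.

```python
def greedy_match(candidates):
    """candidates: list of (idx_a, idx_b, score); returns the kept pairs."""
    used_a, used_b, connections = set(), set(), []
    for idx_a, idx_b, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if idx_a in used_a or idx_b in used_b:
            continue                      # one endpoint is already connected
        used_a.add(idx_a)
        used_b.add(idx_b)
        connections.append((idx_a, idx_b, score))
    return connections

# 2 peaks of type A, 3 peaks of type B; scores invented for illustration.
candidates = [
    (0, 0, 0.9), (0, 1, 0.8),
    (1, 1, 0.7), (1, 2, 0.2),
]
print(greedy_match(candidates))   # [(0, 0, 0.9), (1, 1, 0.7)]
```

The third peak of type B never gets a partner, which is the "extra detection is simply dropped" behaviour described above.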

P.S. The Hungarian algorithm could also be used, by calling scipy.optimize.linear_sum_assignment
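
A minimal sketch of that alternative, assuming SciPy 1.4 or later (for the maximize argument). The score matrix is invented and deliberately rectangular, with 2 peaks of one joint type against 3 of the other.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows: 2 peaks of one joint type; columns: 3 peaks of the other.
# Scores are invented for illustration.
scores = np.array([
    [0.9, 0.8, 0.0],
    [0.85, 0.1, 0.0],
])
rows, cols = linear_sum_assignment(scores, maximize=True)
print(list(zip(rows.tolist(), cols.tolist())))   # [(0, 1), (1, 0)]
```

On this matrix a greedy pass would take (0, 0) first and end with a total of 0.9 + 0.1 = 1.0, whereas the optimal assignment reaches 0.8 + 0.85 = 1.65. The unmatched third column shows that a non-square matrix is handled without padding.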

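One thing remains: using the connections to split the 18 keypoint types into individual people. It is a bit involved, but it is utility code, so I will not go through it in detail.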

lib/pafprocess/pafprocess.cpp
vector <vector<float> > subset;
    ...
    // Generate subset
    subset.clear();
    for (int pair_id = 0; pair_id < COCOPAIRS_SIZE; pair_id++) {
        // iterate over the limb types
        vector <Connection> &conns = connection_all[pair_id];
        int part_id1 = COCOPAIRS[pair_id][0]; // e.g. head
        int part_id2 = COCOPAIRS[pair_id][1]; // e.g. eye

        for (int conn_id = 0; conn_id < (int) conns.size(); conn_id++) {
            // iterate over every instance of this limb type
            int found = 0;
            int subset_idx1 = 0, subset_idx2 = 0;
            for (int subset_id = 0; subset_id < (int) subset.size(); subset_id++) {
                if (subset[subset_id][part_id1] == conns[conn_id].cid1 ||
                    subset[subset_id][part_id2] == conns[conn_id].cid2) {
                    if (found == 0) subset_idx1 = subset_id;
                    if (found == 1) subset_idx2 = subset_id;
                    found += 1;
                }
            }

            if (found == 1) {
                if (subset[subset_idx1][part_id2] != conns[conn_id].cid2) {
                    subset[subset_idx1][part_id2] = conns[conn_id].cid2;
                    subset[subset_idx1][19] += 1;
                    subset[subset_idx1][18] += peak_infos_line[conns[conn_id].cid2].score + conns[conn_id].score;
                }
            } else if (found == 2) {
                int membership = 0;
                for (int subset_id = 0; subset_id < 18; subset_id++) {
                    if (subset[subset_idx1][subset_id] > 0 && subset[subset_idx2][subset_id] > 0) {
                        membership = 2;
                    }
                }

                if (membership == 0) {
                    for (int subset_id = 0; subset_id < 18; subset_id++)
                        subset[subset_idx1][subset_id] += (subset[subset_idx2][subset_id] + 1);

                    subset[subset_idx1][19] += subset[subset_idx2][19];
                    subset[subset_idx1][18] += subset[subset_idx2][18];
                    subset[subset_idx1][18] += conns[conn_id].score;
                    subset.erase(subset.begin() + subset_idx2);
                } else {
                    subset[subset_idx1][part_id2] = conns[conn_id].cid2;
                    subset[subset_idx1][19] += 1;
                    subset[subset_idx1][18] += peak_infos_line[conns[conn_id].cid2].score + conns[conn_id].score;
                }
            } else if (found == 0 && pair_id < 18) {
                vector<float> row(20);
                for (int i = 0; i < 20; i++) row[i] = -1;
                row[part_id1] = conns[conn_id].cid1;
                row[part_id2] = conns[conn_id].cid2;
                row[19] = 2;
                row[18] = peak_infos_line[conns[conn_id].cid1].score +
                         peak_infos_line[conns[conn_id].cid2].score +
                         conns[conn_id].score;
                subset.push_back(row);
            }
        }
    }

    // delete some rows
    for (int i = subset.size() - 1; i >= 0; i--) {
        if (subset[i][19] < THRESH_PART_CNT || subset[i][18] / subset[i][19] < THRESH_HUMAN_SCORE)
            subset.erase(subset.begin() + i);
    }

Once the rows are built and filtered, the following accessor functions can be called:

lib/pafprocess/pafprocess.cpp
int get_num_humans() {
    return subset.size();
}

int get_part_cid(int human_id, int part_id) {
    return subset[human_id][part_id];
}

float get_score(int human_id) {
    return subset[human_id][18] / subset[human_id][19];
}
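
To make the row layout concrete, here is a minimal Python sketch (not part of the repository) of how one `subset` row is read. Only the column layout comes from pafprocess.cpp: columns 0..17 hold the peak id of each part (-1 if missing), column 18 the accumulated score, column 19 the part count. The helper names are hypothetical, and the threshold values are an assumption borrowed from the Python version of the same filter (3 parts, mean score 0.2); the C++ constants `THRESH_PART_CNT` and `THRESH_HUMAN_SCORE` may differ.

```python
# Hypothetical helper names; only the column layout comes from pafprocess.cpp.
SCORE_COL = 18   # sum of peak scores and connection scores
COUNT_COL = 19   # number of parts assigned to this person


def human_score(row):
    """Mean score per assigned part, as get_score() computes it."""
    return row[SCORE_COL] / row[COUNT_COL]


def keep_human(row, min_parts=3, min_score=0.2):
    """Inverse of the 'delete some rows' condition (threshold values assumed)."""
    return row[COUNT_COL] >= min_parts and human_score(row) >= min_score


def make_row(parts, score):
    """Build a 20-cell row from {part_id: peak_id} and an accumulated score."""
    row = [-1.0] * 20
    for part_id, cid in parts.items():
        row[part_id] = float(cid)
    row[SCORE_COL] = score
    row[COUNT_COL] = float(len(parts))
    return row


strong = make_row({0: 5, 1: 9, 2: 12}, score=2.4)   # 3 parts, mean score about 0.8
weak = make_row({0: 7, 1: 3}, score=1.8)            # only 2 parts, dropped
humans = [r for r in (strong, weak) if keep_human(r)]
```

The score column is a sum, so `get_score` divides by the part count; a person with many weak parts and a person with few strong parts can end up with similar means, which is why the part-count threshold is checked separately.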

4.5 Supplement: the Python version

lib/utils/paf_to_pose.py
def paf_to_pose(heatmaps, pafs, config):
    # Bottom-up approach:
    # Step 1: find all joints in the image (organized by joint type: [0]=nose,
    # [1]=neck...)
    joint_list_per_joint_type = NMS(heatmaps, upsampFactor=config.MODEL.DOWNSAMPLE, config=config)
    # joint_list is an unravel'd version of joint_list_per_joint, where we add
    # a 5th column to indicate the joint_type (0=nose, 1=neck...)
    joint_list = np.array([tuple(peak) + (joint_type,) for joint_type,
                                                           joint_peaks in enumerate(joint_list_per_joint_type) for peak in joint_peaks])

    # import ipdb
    # ipdb.set_trace()
    # Step 2: find which joints go together to form limbs (which wrists go
    # with which elbows)
    paf_upsamp = cv2.resize(
        pafs, None, fx=config.MODEL.DOWNSAMPLE, fy=config.MODEL.DOWNSAMPLE, interpolation=cv2.INTER_CUBIC)
    connected_limbs = find_connected_joints(paf_upsamp, joint_list_per_joint_type,
                                            config.TEST.NUM_INTERMED_PTS_BETWEEN_KEYPOINTS, config)

    # Step 3: associate limbs that belong to the same person
    person_to_joint_assoc = group_limbs_of_same_person(
        connected_limbs, joint_list, config)

    return joint_list, person_to_joint_assoc

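The matching step (referred to here as the Hungarian algorithm) is implemented in Python this time. Note that the code below does not run a true Hungarian solver: it sorts the candidates by score and picks them greedily.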

lib/utils/paf_to_pose.py
def find_connected_joints(paf_upsamp, joint_list_per_joint_type, num_intermed_pts=10, config=None):
    """
    For every type of limb (eg: forearm, shin, etc.), look for every potential
    pair of joints (eg: every wrist-elbow combination) and evaluate the PAFs to
    determine which pairs are indeed body limbs.
    :param paf_upsamp: PAFs upsampled to the original input image resolution
    :param joint_list_per_joint_type: See 'return' doc of NMS()
    :param num_intermed_pts: Int indicating how many intermediate points to take
    between joint_src and joint_dst, at which the PAFs will be evaluated
    :return: List of NUM_LIMBS rows. For every limb_type (a row) we store
    a list of all limbs of that type found (eg: all the right forearms).
    For each limb (each item in connected_limbs[limb_type]), we store 5 cells:
    # {joint_src_id,joint_dst_id}: a unique number associated with each joint,
    # limb_score_penalizing_long_dist: a score of how good a connection
    of the joints is, penalized if the limb length is too long
    # {joint_src_index,joint_dst_index}: the index of the joint within
    all the joints of that type found (eg: the 3rd right elbow found)
    """
    connected_limbs = []

    # Auxiliary array to access paf_upsamp quickly
    limb_intermed_coords = np.empty((4, num_intermed_pts), dtype=np.intp)
    for limb_type in range(NUM_LIMBS):
        # List of all joints of type A found, where A is specified by limb_type
        # (eg: a right forearm starts in a right elbow)
        joints_src = joint_list_per_joint_type[joint_to_limb_heatmap_relationship[limb_type][0]]
        # List of all joints of type B found, where B is specified by limb_type
        # (eg: a right forearm ends in a right wrist)
        joints_dst = joint_list_per_joint_type[joint_to_limb_heatmap_relationship[limb_type][1]]
        # print(joint_to_limb_heatmap_relationship[limb_type][0])
        # print(joint_to_limb_heatmap_relationship[limb_type][1])
        # print(paf_xy_coords_per_limb[limb_type][0])
        # print(paf_xy_coords_per_limb[limb_type][1])
        if len(joints_src) == 0 or len(joints_dst) == 0:
            # No limbs of this type found (eg: no right forearms found because
            # we didn't find any right wrists or right elbows)
            connected_limbs.append([])
        else:
            connection_candidates = []
            # Specify the paf index that contains the x-coord of the paf for
            # this limb
            limb_intermed_coords[2, :] = paf_xy_coords_per_limb[limb_type][0]
            # And the y-coord paf index
            limb_intermed_coords[3, :] = paf_xy_coords_per_limb[limb_type][1]
            for i, joint_src in enumerate(joints_src):
                # Try every possible joints_src[i]-joints_dst[j] pair and see
                # if it's a feasible limb
                for j, joint_dst in enumerate(joints_dst):
                    # Subtract the position of both joints to obtain the
                    # direction of the potential limb
                    limb_dir = joint_dst[:2] - joint_src[:2]
                    # Compute the distance/length of the potential limb (norm
                    # of limb_dir)
                    limb_dist = np.sqrt(np.sum(limb_dir ** 2)) + 1e-8
                    limb_dir = limb_dir / limb_dist  # Normalize limb_dir to be a unit vector

                    # Linearly distribute num_intermed_pts points from the x
                    # coordinate of joint_src to the x coordinate of joint_dst
                    limb_intermed_coords[1, :] = np.round(np.linspace(
                        joint_src[0], joint_dst[0], num=num_intermed_pts))
                    limb_intermed_coords[0, :] = np.round(np.linspace(
                        joint_src[1], joint_dst[1], num=num_intermed_pts))  # Same for the y coordinate
                    intermed_paf = paf_upsamp[limb_intermed_coords[0, :],
                                              limb_intermed_coords[1, :], limb_intermed_coords[2:4, :]].T

                    score_intermed_pts = intermed_paf.dot(limb_dir)
                    score_penalizing_long_dist = score_intermed_pts.mean(
                    ) + min(0.5 * paf_upsamp.shape[0] / limb_dist - 1, 0)
                    # Criterion 1: At least 80% of the intermediate points have
                    # a score higher than thre2
                    criterion1 = (np.count_nonzero(
                        score_intermed_pts > config.TEST.THRESH_PAF) > 0.8 * num_intermed_pts)
                    # Criterion 2: Mean score, penalized for large limb
                    # distances (larger than half the image height), is
                    # positive
                    criterion2 = (score_penalizing_long_dist > 0)
                    if criterion1 and criterion2:
                        # Last value is the combined paf(+limb_dist) + heatmap
                        # scores of both joints
                        connection_candidates.append(
                            [i, j, score_penalizing_long_dist,
                             score_penalizing_long_dist + joint_src[2] + joint_dst[2]])

            # Sort connection candidates based on their
            # score_penalizing_long_dist
            connection_candidates = sorted(
                connection_candidates, key=lambda x: x[2], reverse=True)
            connections = np.empty((0, 5))
            # There can only be as many limbs as the smallest number of source
            # or destination joints (eg: only 2 forearms if there's 5 wrists
            # but 2 elbows)
            max_connections = min(len(joints_src), len(joints_dst))
            # Traverse all potential joint connections (sorted by their score)
            for potential_connection in connection_candidates:
                i, j, s = potential_connection[0:3]
                # Make sure joints_src[i] or joints_dst[j] haven't already been
                # connected to other joints_dst or joints_src
                if i not in connections[:, 3] and j not in connections[:, 4]:
                    # [joint_src_id, joint_dst_id, limb_score_penalizing_long_dist, joint_src_index, joint_dst_index]
                    connections = np.vstack(
                        [connections, [joints_src[i][3], joints_dst[j][3], s, i, j]])
                    # Exit if we've already established max_connections
                    # connections (each joint can't be connected to more than
                    # one joint)
                    if len(connections) >= max_connections:
                        break
            connected_limbs.append(connections)

    return connected_limbs
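
The selection loop at the end of `find_connected_joints` can be isolated as a small sketch. This is a simplified stand-in written for illustration, not the repository's code: `greedy_match` is a hypothetical name, candidates are plain `(src_index, dst_index, score)` tuples, and the scores are made up.

```python
def greedy_match(candidates, num_src, num_dst):
    """Pick candidates in descending score order; each joint is used at most once."""
    used_src, used_dst, chosen = set(), set(), []
    max_connections = min(num_src, num_dst)
    for i, j, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if i in used_src or j in used_dst:
            continue  # one endpoint already belongs to a better-scoring limb
        chosen.append((i, j, score))
        used_src.add(i)
        used_dst.add(j)
        if len(chosen) >= max_connections:
            break
    return chosen


# Two elbows (src) and two wrists (dst); scores are invented for illustration.
candidates = [(0, 0, 0.9), (0, 1, 0.8), (1, 0, 0.7), (1, 1, 0.1)]
limbs = greedy_match(candidates, num_src=2, num_dst=2)
# limbs == [(0, 0, 0.9), (1, 1, 0.1)]
```

In this toy case the greedy total is 1.0, while pairing (0, 1) with (1, 0) would total 1.5, so the greedy pass is an approximation of the optimal assignment rather than an equivalent of it.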

Next, the connected limbs are grouped by person, so that each group receives a human_id:

lib/utils/paf_to_pose.py
def group_limbs_of_same_person(connected_limbs, joint_list, config):
    """
    Associate limbs belonging to the same person together.
    :param connected_limbs: See 'return' doc of find_connected_joints()
    :param joint_list: unravel'd version of joint_list_per_joint [See 'return' doc of NMS()]
    :return: 2d np.array of size num_people x (NUM_JOINTS+2). For each person found:
    # First NUM_JOINTS columns contain the index (in joint_list) of the joints associated
    with that person (or -1 if their i-th joint wasn't found)
    # 2nd-to-last column: Overall score of the joints+limbs that belong to this person
    # Last column: Total count of joints found for this person
    """
    person_to_joint_assoc = []

    for limb_type in range(NUM_LIMBS):
        joint_src_type, joint_dst_type = joint_to_limb_heatmap_relationship[limb_type]

        for limb_info in connected_limbs[limb_type]:
            person_assoc_idx = []
            for person, person_limbs in enumerate(person_to_joint_assoc):
                if person_limbs[joint_src_type] == limb_info[0] or person_limbs[joint_dst_type] == limb_info[1]:
                    person_assoc_idx.append(person)

            # If one of the joints has been associated to a person, and either
            # the other joint is also associated with the same person or not
            # associated to anyone yet:
            if len(person_assoc_idx) == 1:
                person_limbs = person_to_joint_assoc[person_assoc_idx[0]]
                # If the other joint is not associated to anyone yet,
                if person_limbs[joint_dst_type] != limb_info[1]:
                    # Associate it with the current person
                    person_limbs[joint_dst_type] = limb_info[1]
                    # Increase the number of limbs associated to this person
                    person_limbs[-1] += 1
                    # And update the total score (+= heatmap score of joint_dst
                    # + score of connecting joint_src with joint_dst)
                    person_limbs[-2] += joint_list[limb_info[1]
                                                       .astype(int), 2] + limb_info[2]
            elif len(person_assoc_idx) == 2:  # if found 2 and disjoint, merge them
                person1_limbs = person_to_joint_assoc[person_assoc_idx[0]]
                person2_limbs = person_to_joint_assoc[person_assoc_idx[1]]
                membership = ((person1_limbs >= 0) & (person2_limbs >= 0))[:-2]
                if not membership.any():  # If both people have no same joints connected, merge into a single person
                    # Update which joints are connected
                    person1_limbs[:-2] += (person2_limbs[:-2] + 1)
                    # Update the overall score and total count of joints
                    # connected by summing their counters
                    person1_limbs[-2:] += person2_limbs[-2:]
                    # Add the score of the current joint connection to the
                    # overall score
                    person1_limbs[-2] += limb_info[2]
                    person_to_joint_assoc.pop(person_assoc_idx[1])
                else:  # Same case as len(person_assoc_idx)==1 above
                    person1_limbs[joint_dst_type] = limb_info[1]
                    person1_limbs[-1] += 1
                    person1_limbs[-2] += joint_list[limb_info[1]
                                                        .astype(int), 2] + limb_info[2]
            else:  # No person has claimed any of these joints, create a new person
                # Initialize person info to all -1 (no joint associations)
                row = -1 * np.ones(config.MODEL.NUM_KEYPOINTS + 2)
                # Store the joint info of the new connection
                row[joint_src_type] = limb_info[0]
                row[joint_dst_type] = limb_info[1]
                # Total count of connected joints for this person: 2
                row[-1] = 2
                # Compute overall score: score joint_src + score joint_dst + score connection
                # {joint_src,joint_dst}
                row[-2] = sum(joint_list[limb_info[:2].astype(int), 2]
                              ) + limb_info[2]
                person_to_joint_assoc.append(row)

    # Delete people who have very few parts connected
    people_to_delete = []
    for person_id, person_info in enumerate(person_to_joint_assoc):
        if person_info[-1] < 3 or person_info[-2] / person_info[-1] < 0.2:
            people_to_delete.append(person_id)
    # Traverse the list in reverse order so we delete indices starting from the
    # last one (otherwise, removing item for example 0 would modify the indices of
    # the remaining people to be deleted!)
    for index in people_to_delete[::-1]:
        person_to_joint_assoc.pop(index)

    # Appending items to a np.array can be costly (allocating new memory, copying over the array, then adding new row)
    # Instead, we treat the set of people as a list (fast to append items) and
    # only convert to np.array at the end
    return np.array(person_to_joint_assoc)
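
The merge line `person1_limbs[:-2] += (person2_limbs[:-2] + 1)` is terse, so here is a minimal sketch of why it works, assuming (as the code above does) that unset joints are stored as -1 and that the two people share no joint. `merge_people` is a hypothetical name and the rows are shortened to 4 joints for readability.

```python
def merge_people(person1, person2, limb_score):
    """Merge two disjoint person rows; the last two cells are [score, count]."""
    merged = list(person1)
    for k in range(len(person1) - 2):
        # Both unset: -1 + (-1 + 1) = -1. One set to x: -1 + (x + 1) = x.
        merged[k] = person1[k] + (person2[k] + 1)
    # Scores add up, plus the score of the limb that joins the two people.
    merged[-2] = person1[-2] + person2[-2] + limb_score
    merged[-1] = person1[-1] + person2[-1]
    return merged


# 4 joints for brevity (the real rows have NUM_KEYPOINTS + 2 cells).
p1 = [3, 7, -1, -1, 1.5, 2]    # joints 0 and 1 found
p2 = [-1, -1, 4, 9, 1.2, 2]    # joints 2 and 3 found
merged = merge_people(p1, p2, limb_score=0.5)
```

The trick only holds when the rows are disjoint: if both rows had a joint set at the same index, the sum would produce a meaningless id, which is why the code checks `membership` first.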

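The whole pipeline is somewhat like converting an analog signal into a digital one: the network first approximates the analog signal (heatmaps and PAFs), which is then decoded into discrete keypoint coordinates and human_ids.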

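4.6 Supplement: the Hungarian algorithm

The assignment problem is a linear integer program whose decision variables are restricted to the integers 0/1. The example below uses scipy's linear_sum_assignment to solve it: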

>>> import numpy as np
>>> cost = np.array([[4, 1, 3, 1], [2, 0, 5, 2], [3, 2, 2, 2]])
>>> cost
array([[4, 1, 3, 1],
       [2, 0, 5, 2],
       [3, 2, 2, 2]])
>>> from scipy.optimize import linear_sum_assignment
>>> row_ind, col_ind = linear_sum_assignment(cost)
>>> row_ind
array([0, 1, 2])
>>> col_ind
array([3, 1, 2])
>>> cost[row_ind, col_ind]
array([1, 0, 2])
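
As a sanity check on the result above, the same cost matrix can be solved by exhaustive search using only the standard library. This is a brute-force sketch for small matrices, not how `linear_sum_assignment` works internally; `best_assignment` is a hypothetical name, and it assumes there are at least as many columns as rows.

```python
from itertools import permutations


def best_assignment(cost):
    """Try every way of giving each row a distinct column; keep the cheapest."""
    n_rows, n_cols = len(cost), len(cost[0])
    best_total, best_cols = None, None
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if best_total is None or total < best_total:
            best_total, best_cols = total, cols
    return best_total, best_cols


cost = [[4, 1, 3, 1], [2, 0, 5, 2], [3, 2, 2, 2]]
total, cols = best_assignment(cost)
# total == 3, cols == (3, 1, 2): the same columns scipy returned
```

Exhaustive search grows factorially with the matrix size, which is the reason a dedicated algorithm is used in practice.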

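5. Summary

  • On modeling
    • Even an end-to-end model does not output the final result directly; it predicts the key state quantities as well as it can, and a downstream strategy still has to act on them
    • The decomposition of the problem is really good: it breaks away from the habitual framing of object detection followed by per-object keypoint detection
    • Layer-to-layer matching is similar to tracking; here the layers are, for example, head and neck, while in tracking they are different frames
    • Thanks to matching, the model is allowed to output more keypoint candidates than actually exist; the matching algorithm filters out the extra "suboptimal" candidates along the way, which removes a painful problem
  • After the backbone the image is downsampled, so the labels are computed on the downsampled map; they only need to stay aligned with it
  • For locating points from heatmaps, an L2 loss is already quite accurate
  • On the vector field
    • Judging from the results, reducing (x, y) to its magnitude seems to work almost as well: positions off the limb are basically 0, so integrating along a segment is enough to decide whether it is a limb
    • However, suboptimal candidates could then no longer be told apart. For example, if one person's head is detected at two nearby positions A and B and NMS fails to suppress one of them, the integral of the magnitude alone probably cannot distinguish them. Using the direction field is more accurate and lets one of them be discarded.
  • Matching does not have to use the Hungarian algorithm; a greedy method also works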
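
The point about direction versus magnitude can be illustrated with a toy field. The sketch below is an assumption-laden illustration, not code from the repository: `paf_at` is a synthetic PAF that points along +x inside a horizontal band, and `line_scores` samples it along a candidate segment in the same spirit as `find_connected_joints`, without the length penalty or thresholds.

```python
import math


def paf_at(x, y):
    """Synthetic PAF: unit vector along +x inside a band around y = 0."""
    return (1.0, 0.0) if abs(y) <= 2 else (0.0, 0.0)


def line_scores(src, dst, num_pts=10):
    """Return (mean dot-product score, mean magnitude score) along src -> dst."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    length = math.hypot(dx, dy) + 1e-8
    ux, uy = dx / length, dy / length   # unit direction of the candidate limb
    dot_sum = mag_sum = 0.0
    for k in range(num_pts):
        t = k / (num_pts - 1)
        vx, vy = paf_at(src[0] + t * dx, src[1] + t * dy)
        dot_sum += vx * ux + vy * uy    # direction-aware score
        mag_sum += math.hypot(vx, vy)   # magnitude-only score
    return dot_sum / num_pts, mag_sum / num_pts


aligned = line_scores((0, 0), (10, 0))    # candidate along the true limb direction
skewed = line_scores((0, -2), (10, 2))    # nearby candidate that stays inside the band
```

Both candidates get the same magnitude-only score, because every sample lies inside the band, while the dot-product score of the skewed candidate is lower. That gap is what lets the matcher prefer one of two nearby candidates.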
