Overview of Depth Estimation Modules in BEV Models

Multi-view 3D

Systematic Review

Published: 2022-07-26

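This article surveys the depth estimation modules used in BEV models.

1. Direct convolution on image features

LSS, Fiery, BEVDet

Depth estimation module (all three use the original LSS module; Fiery's module is taken as the example here)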

[Figure: screenshot of the Fiery depth estimation module]
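As a rough illustration of what "direct convolution on image features" means in the LSS family, here is a minimal sketch of a lift head. It assumes a single 1x1 convolution that outputs D depth logits plus C context channels, followed by a softmax over depth and an outer product. The class name `LiftHead` and all sizes are hypothetical and are not taken from the Fiery code.

```python
import torch
import torch.nn as nn


class LiftHead(nn.Module):
    """LSS-style head (sketch): one conv predicts depth logits and context."""

    def __init__(self, in_channels: int, depth_bins: int, context_channels: int):
        super().__init__()
        self.depth_bins = depth_bins
        # A single 1x1 conv outputs D depth logits plus C context channels.
        self.head = nn.Conv2d(in_channels, depth_bins + context_channels,
                              kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        out = self.head(feats)                             # [B, D + C, H, W]
        depth = out[:, :self.depth_bins].softmax(dim=1)    # [B, D, H, W]
        context = out[:, self.depth_bins:]                 # [B, C, H, W]
        # Outer product: copy each context vector to every depth bin,
        # weighted by the predicted probability of that bin.
        return depth.unsqueeze(1) * context.unsqueeze(2)   # [B, C, D, H, W]


# Hypothetical sizes, chosen only to keep the example small.
head = LiftHead(in_channels=64, depth_bins=8, context_channels=16)
frustum = head(torch.randn(2, 64, 4, 6))
print(frustum.shape)  # torch.Size([2, 16, 8, 4, 6])
```

Note that the camera parameters play no role in predicting the depth distribution in this sketch; the head sees only the image features.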

Loss computation

No separate loss is computed for the depth branch; the loss is computed directly on the 3D boxes.

2. Camera-aware convolution

BEVDepth

Depth estimation module
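The idea behind a camera-aware depth head can be sketched as follows: the camera parameters are lifted to the feature width by an MLP and then used to re-weight the image feature channels in a Squeeze-and-Excitation style before the depth distribution is predicted. This is only a minimal sketch under that assumption. The class name `CameraAwareDepthHead`, the layer sizes, and the choice of a flattened 3x3 intrinsics matrix as input are hypothetical and do not reproduce the BEVDepth implementation.

```python
import torch
import torch.nn as nn


class CameraAwareDepthHead(nn.Module):
    """Sketch: gate image features with camera parameters, then predict depth."""

    def __init__(self, in_channels: int, depth_bins: int, cam_dim: int):
        super().__init__()
        # MLP lifts the flattened camera parameters to the feature width.
        self.cam_mlp = nn.Sequential(
            nn.Linear(cam_dim, in_channels),
            nn.ReLU(inplace=True),
            nn.Linear(in_channels, in_channels),
        )
        self.depth_conv = nn.Conv2d(in_channels, depth_bins, kernel_size=1)

    def forward(self, feats: torch.Tensor, cam_params: torch.Tensor) -> torch.Tensor:
        # feats: [B, C, H, W]; cam_params: [B, cam_dim]
        gate = torch.sigmoid(self.cam_mlp(cam_params))     # [B, C]
        feats = feats * gate[:, :, None, None]             # SE-style channel gating
        return self.depth_conv(feats).softmax(dim=1)       # [B, D, H, W]


# Hypothetical sizes; cam_dim=9 stands for a flattened 3x3 intrinsics matrix.
head = CameraAwareDepthHead(in_channels=32, depth_bins=8, cam_dim=9)
probs = head(torch.randn(2, 32, 4, 6), torch.randn(2, 9))
print(probs.shape)  # torch.Size([2, 8, 4, 6])
```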

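Loss computation

A BCE loss is computed between the [H/16, W/16] feature map and the corresponding ground truth.
The ground truth is built by first projecting the LiDAR point cloud into the pixel coordinate system using the camera intrinsics and extrinsics, then applying the resize processing to obtain a [256, 704] depth ground truth. This map is divided into 16×16 patches, and each patch takes the minimum depth among its pixels that contain a depth point; if that value falls outside the depth range, it is set to 0. The result is a [16, 44] gt_depth.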
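The first step, projecting LiDAR points into the image to form a sparse depth map, can be sketched as below. The sketch assumes the points have already been transformed into the camera frame (extrinsics applied) and omits the resize step. The function name `lidar_to_depth_map`, the toy intrinsics, and the rule of keeping the nearest point per pixel are assumptions made for illustration. `scatter_reduce_` requires a reasonably recent PyTorch release.

```python
import torch


def lidar_to_depth_map(points_cam, intrinsics, height, width):
    """Project camera-frame points [N, 3] to a sparse [height, width] depth map.

    Pixels that receive no LiDAR point stay 0.
    """
    in_front = points_cam[:, 2] > 0            # drop points behind the camera
    pts = points_cam[in_front]
    depth = pts[:, 2]
    uvw = pts @ intrinsics.T                   # homogeneous pixel coordinates
    u = torch.round(uvw[:, 0] / depth).long()
    v = torch.round(uvw[:, 1] / depth).long()
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, depth = u[inside], v[inside], depth[inside]
    # Keep the nearest point per pixel via a scatter-min on the flattened map.
    flat = torch.full((height * width,), float("inf"))
    flat.scatter_reduce_(0, v * width + u, depth, reduce="amin")
    flat[torch.isinf(flat)] = 0.0              # pixels with no LiDAR point
    return flat.view(height, width)


# Toy intrinsics and points (hypothetical values).
K = torch.tensor([[100.0, 0.0, 50.0],
                  [0.0, 100.0, 40.0],
                  [0.0, 0.0, 1.0]])
points = torch.tensor([[0.0, 0.0, 10.0],    # lands on pixel (u=50, v=40)
                       [0.0, 0.0, 30.0],    # same pixel, farther away
                       [1.0, 1.0, 20.0],    # lands on pixel (u=55, v=45)
                       [0.0, 0.0, -5.0]])   # behind the camera, dropped
depth_map = lidar_to_depth_map(points, K, height=80, width=100)
```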

Code for downsampling the LiDAR point-cloud ground truth

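import torch
import torch.nn.functional as F

# Method of the depth module. It relies on self.downsample_factor,
# self.depth_channels and self.dbound ([0] = minimum depth, [2] = bin size).
def get_downsampled_gt_depth(self, gt_depths):
    B, N, H, W = gt_depths.shape
    # Split each image into downsample_factor x downsample_factor patches.
    gt_depths = gt_depths.view(B * N, H // self.downsample_factor,
                               self.downsample_factor,
                               W // self.downsample_factor,
                               self.downsample_factor, 1)
    gt_depths = gt_depths.permute(0, 1, 3, 5, 2, 4).contiguous()
    gt_depths = gt_depths.view(
        -1, self.downsample_factor * self.downsample_factor)
    # Pixels without a LiDAR point are 0; replace them with a large value
    # so that they never win the per-patch minimum.
    gt_depths_tmp = torch.where(gt_depths == 0.0,
                                1e5 * torch.ones_like(gt_depths),
                                gt_depths)
    gt_depths = torch.min(gt_depths_tmp, dim=-1).values
    gt_depths = gt_depths.view(B * N, H // self.downsample_factor,
                               W // self.downsample_factor)
    # Convert metric depth to a bin index; index 0 means "no valid depth".
    gt_depths = (gt_depths -
                 (self.dbound[0] - self.dbound[2])) / self.dbound[2]
    # Out-of-range indices (including empty patches) fall back to 0.
    gt_depths = torch.where(
        (gt_depths < self.depth_channels + 1) & (gt_depths >= 0.0),
        gt_depths, torch.zeros_like(gt_depths))   # shape e.g. [6, 16, 44]
    # One-hot encode, then drop class 0 so invalid cells become all-zero rows.
    gt_depths = F.one_hot(gt_depths.long(),
                          num_classes=self.depth_channels + 1).view(
                              -1, self.depth_channels + 1)[:, 1:]
    return gt_depths.float()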
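Given one-hot targets in the layout returned by get_downsampled_gt_depth, the BCE depth loss might be computed roughly as follows. This is a sketch, not the BEVDepth source: the function name `depth_bce_loss`, the masking of cells without a valid depth, and the normalization by the number of valid cells are assumptions.

```python
import torch
import torch.nn.functional as F


def depth_bce_loss(depth_probs, gt_one_hot):
    """depth_probs: [B*N, D, H, W] softmax output; gt_one_hot: [B*N*H*W, D]."""
    num_bins = depth_probs.shape[1]
    # Move the depth axis last and flatten to match the target layout.
    probs = depth_probs.permute(0, 2, 3, 1).reshape(-1, num_bins)
    # Cells whose target row is all zeros had no valid LiDAR depth; skip them.
    fg_mask = gt_one_hot.max(dim=1).values > 0
    loss = F.binary_cross_entropy(probs[fg_mask], gt_one_hot[fg_mask],
                                  reduction="sum")
    return loss / max(1.0, float(fg_mask.sum()))


# Toy case: one image, a 2x2 feature map, 4 depth bins (hypothetical sizes).
gt = torch.tensor([[0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0],   # no valid depth, ignored
                   [0.0, 0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0, 0.0]])
uniform = torch.full((1, 4, 2, 2), 0.25)
loss_uniform = depth_bce_loss(uniform, gt)
```

A prediction that concentrates its probability on the correct bin yields a lower loss than the uniform prediction above.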

Camera-aware convolution in a monocular model

DD3D

Depth estimation module

The depth network is pre-trained first.
“Two paths in DD3D from the input image to the 3D bounding box and to the dense depth prediction differ only in the last 3×3 convolutional layer, and thus share nearly all parameters.”
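The parameter sharing described in the quote can be illustrated with a shared convolutional tower followed by two separate 3×3 output convolutions. This is a minimal sketch of that structure only: the class name `SharedTowerHeads`, the tower depth, and the 10-channel box output are hypothetical and do not reproduce the DD3D architecture.

```python
import torch
import torch.nn as nn


class SharedTowerHeads(nn.Module):
    """Sketch: box and depth outputs differ only in the final 3x3 conv."""

    def __init__(self, channels: int, box_dims: int):
        super().__init__()
        # Shared layers, used by both the 3D box path and the depth path.
        self.tower = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # The two paths split only here.
        self.box3d_head = nn.Conv2d(channels, box_dims, 3, padding=1)
        self.depth_head = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, feats: torch.Tensor):
        shared = self.tower(feats)
        return self.box3d_head(shared), self.depth_head(shared)


# Hypothetical sizes for illustration.
model = SharedTowerHeads(channels=16, box_dims=10)
boxes, depth = model(torch.randn(2, 16, 8, 8))
print(boxes.shape, depth.shape)
```

With this layout, pre-training on dense depth updates the shared tower, so the 3D box path starts from weights that already encode depth cues.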