MongoDB Aggregation Optimization

Aggregation pipeline optimization

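1. $group does not use indexes. To get index support, you have to go through a $match or $sort placed before it.

2. $match and $sort can both use an index, but the keys they reference must not go beyond the index. They should be drawn from the index's fields, starting with its leading field.

3. When a $sort is followed by a $match, the aggregation optimizer automatically moves the $match in front of the $sort.

4. You can use $match to force the index to be used. For every index field you don't actually want to filter on, add a condition that it is not equal ($ne) to some value it is very unlikely to hold.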
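Rules 2 and 4 both come down to whether the fields a stage uses line up with the leading fields of a compound index. The toy helpers below model that check in plain JavaScript. `sortCanUseIndex` and `matchedIndexPrefix` are hypothetical names, and the model is deliberately simplified. For example, it ignores sorts that can still use an index thanks to equality conditions on leading fields. It is a sketch of the rule, not MongoDB's query planner.

```javascript
// Hypothetical helpers: a simplified model of when a compound index can serve
// a $sort or a $match. This is NOT MongoDB's actual planner logic.

// A $sort can use an index when its keys are a prefix of the index keys,
// with either all the same directions or all of them reversed.
function sortCanUseIndex(sortSpec, indexSpec) {
  const sortKeys = Object.keys(sortSpec);
  const indexKeys = Object.keys(indexSpec);
  if (sortKeys.length === 0 || sortKeys.length > indexKeys.length) return false;

  let sameDirection = true;
  let reversedDirection = true;
  for (let i = 0; i < sortKeys.length; i++) {
    if (sortKeys[i] !== indexKeys[i]) return false; // not a prefix of the index
    if (sortSpec[sortKeys[i]] !== indexSpec[indexKeys[i]]) sameDirection = false;
    if (sortSpec[sortKeys[i]] !== -indexSpec[indexKeys[i]]) reversedDirection = false;
  }
  return sameDirection || reversedDirection;
}

// Number of leading index fields constrained by a $match. In this simplified
// model, 0 means the index's first field is not filtered, so the index is
// treated as unusable for the match.
function matchedIndexPrefix(matchSpec, indexSpec) {
  let count = 0;
  for (const key of Object.keys(indexSpec)) {
    if (!(key in matchSpec)) break;
    count++;
  }
  return count;
}

const index = { type: 1, vertical: 1 };
console.log(sortCanUseIndex({ type: 1, vertical: 1 }, index));    // true
console.log(sortCanUseIndex({ vertical: 1 }, index));             // false
console.log(matchedIndexPrefix({ score: { $gte: 100 } }, index)); // 0
```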

Example

1. Goal: group by type and vertical, then count the documents in each group

I created a compound index on type and vertical and ran:

db.getCollection('articleCollection').aggregate([
    {
        $group: {
            _id: {
                type: "$type",
                vertical: "$vertical"
            },
            count: {$sum: 1}
        }
    }
])

The collection holds fewer than 50,000 documents, yet the query took 6 s!
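That $group stage buckets documents by their (type, vertical) pair and adds 1 per document. The plain-JavaScript sketch below reproduces those semantics on an in-memory array to make them concrete. `groupCount` and the sample documents are hypothetical. The sketch says nothing about how MongoDB executes the stage or why it was slow. MongoDB also does not guarantee the order of $group output, whereas this sketch returns groups in first-seen order.

```javascript
// Hypothetical in-memory equivalent of
//   { $group: { _id: { type: "$type", vertical: "$vertical" }, count: { $sum: 1 } } }
// Assumes every document has all the grouping fields.
function groupCount(docs, fields) {
  const groups = new Map(); // serialized _id -> { _id, count }
  for (const doc of docs) {
    const id = {};
    for (const f of fields) id[f] = doc[f];
    const key = JSON.stringify(id);
    if (!groups.has(key)) groups.set(key, { _id: id, count: 0 });
    groups.get(key).count += 1; // $sum: 1 adds one per document
  }
  return [...groups.values()];
}

const docs = [
  { type: "news", vertical: "tech" },
  { type: "news", vertical: "tech" },
  { type: "video", vertical: "sports" },
];
console.log(groupCount(docs, ["type", "vertical"]));
// two groups: news/tech with count 2, video/sports with count 1
```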

2. Analyzing the query with explain showed that no index was used

db.getCollection('articleCollection').aggregate([
    {
        $group: {
            _id: {
                type: "$type",
                vertical: "$vertical"
            },
            count: {$sum: 1}
        }
    }
], {explain: true})

The stage in winningPlan is COLLSCAN, which means a full collection scan with no index:

{
    ...,
    "winningPlan" : {
        "stage" : "COLLSCAN",
        "direction" : "forward"
    }
    ...,
}
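Reading winningPlan by eye works for a small plan like this one. However, the stage that matters can be nested (for example under inputStage), and the position of winningPlan in aggregate explain output can differ between server versions. A hypothetical helper that walks the explain document and collects every stage name is a simple way to check for COLLSCAN versus IXSCAN. It is only a sketch: it relies on the "stage" field name shown above and assumes rejected plans sit under a rejectedPlans key, which it skips.

```javascript
// Hypothetical helper: recursively collect every "stage" value in an explain()
// result, wherever it is nested (inputStage, inputStages, ...).
function collectStages(node, found = []) {
  if (Array.isArray(node)) {
    for (const item of node) collectStages(item, found);
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") found.push(node.stage);
    for (const [key, value] of Object.entries(node)) {
      if (key === "rejectedPlans") continue; // only look at the chosen plan
      collectStages(value, found);
    }
  }
  return found;
}

// The winningPlan fragment from the explain output above.
const explainOutput = {
  winningPlan: { stage: "COLLSCAN", direction: "forward" },
};
const stages = collectStages(explainOutput);
console.log(stages);                      // [ 'COLLSCAN' ]
console.log(stages.includes("COLLSCAN")); // true -> full collection scan, no index
```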

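3. The documentation confirmed that $group does not use indexes, so getting index support requires a different approach

Here I put a $sort in front of the $group: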

db.getCollection('articleCollection').aggregate([
    {
        $sort: {
            type: 1,
            vertical: 1
        }
    },
    {
        $group: {
            _id: {
                type: "$type",
                vertical: "$vertical"
            },
            count: {$sum: 1}
        }
    }
])

The query time dropped to 0.187 s! That basically solved my problem.
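With a leading $sort on (type, vertical) that the type/vertical index can serve, documents presumably reach $group already ordered, so equal keys are adjacent. The sketch below only illustrates that property: with sorted input, counts can be produced in a single pass by comparing each document's key with the previous one. `countSortedRuns` is a hypothetical helper. This is not a claim about how MongoDB's $group works internally; $group itself still does not use the index.

```javascript
// Hypothetical helper: count groups in input that is already sorted by `fields`.
// Equal keys are adjacent, so one pass tracking the "current run" is enough.
function countSortedRuns(sortedDocs, fields) {
  const result = [];
  let currentKey = null;
  for (const doc of sortedDocs) {
    const id = {};
    for (const f of fields) id[f] = doc[f];
    const key = JSON.stringify(id);
    if (key !== currentKey) {
      result.push({ _id: id, count: 0 }); // a new (type, vertical) run starts
      currentKey = key;
    }
    result[result.length - 1].count += 1;
  }
  return result;
}

// Input in the order { $sort: { type: 1, vertical: 1 } } would produce.
const sorted = [
  { type: "news", vertical: "sports" },
  { type: "news", vertical: "tech" },
  { type: "news", vertical: "tech" },
  { type: "video", vertical: "tech" },
];
console.log(countSortedRuns(sorted, ["type", "vertical"]));
// three groups: news/sports 1, news/tech 2, video/tech 1
```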

4. However, I also had other parameters, such as score and pushTime, that needed to be filtered with $match

Using $sort and $match together:

db.getCollection('articleCollection').aggregate([
    {
        $sort: {
            type: 1,
            vertical: 1
        }
    },
    {
        $match: {
            score: {
                $gte: 100
            },
            pushTime: {
                $gte: 100
            }
        }
    },
    {
        $group: {
            _id: {
                type: "$type",
                vertical: "$vertical"
            },
            count: {$sum: 1}
        }
    }
])

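The query time went back up to 6 s!

The documentation explains why: MongoDB's aggregation optimizer moves a $match that follows a $sort in front of it. The pipeline therefore effectively started with a $match on score and pushTime, neither of which is in the type/vertical index, and the index was no longer used.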
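The reordering can be modeled as a rewrite rule on the pipeline array. Whenever a $sort is immediately followed by a $match, the two are swapped, because filtering first gives the same result with fewer documents to sort. The sketch below applies only that single rule. `moveMatchBeforeSort` is a hypothetical name, and the real optimizer applies many more rules than this one.

```javascript
// Hypothetical model of one optimizer rule: a $sort immediately followed by a
// $match becomes $match then $sort. Returns a new array; the input is untouched.
function moveMatchBeforeSort(pipeline) {
  const out = [...pipeline];
  let changed = true;
  while (changed) {
    changed = false;
    for (let i = 0; i + 1 < out.length; i++) {
      if ("$sort" in out[i] && "$match" in out[i + 1]) {
        [out[i], out[i + 1]] = [out[i + 1], out[i]];
        changed = true;
      }
    }
  }
  return out;
}

// The pipeline from step 4.
const pipeline = [
  { $sort: { type: 1, vertical: 1 } },
  { $match: { score: { $gte: 100 }, pushTime: { $gte: 100 } } },
  { $group: { _id: { type: "$type", vertical: "$vertical" }, count: { $sum: 1 } } },
];
console.log(moveMatchBeforeSort(pipeline).map((stage) => Object.keys(stage)[0]));
// [ '$match', '$sort', '$group' ]
```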

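5. The approach I settled on after several experiments

I created a compound index on type, vertical, score and publishTime, and put all four fields in the $match: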

db.getCollection('articleCollection').aggregate([
    {
        $match: {
            type: {$ne: "any-unlikely-value"},
            vertical: {$ne: "any-unlikely-value"},
            score: {$gte: 100},
            publishTime: {$gte: 0}
        }
    },
    {
        $group: {
            _id: {
                type: "$type",
                vertical: "$vertical"
            },
            count: {$sum: 1}
        }
    }
])

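Here type and vertical only need to be not equal to the given placeholder value, which no real document holds. The conditions do not filter anything out; they just make the $match reference the leading fields of the index.

The query now returns in under 0.2 s!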
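The trick in this $match (rule 4 at the top) is mechanical. Every field of the compound index gets a condition, and fields you don't really want to filter on get a $ne against a value no document holds. A hypothetical helper `buildIndexMatch` that generates such a stage from the index spec and the real filters might look like this. The sentinel string is an assumption and must be a value that never occurs in the data.

```javascript
// Hypothetical helper: build a $match that mentions every field of a compound
// index, in index order. Real filters are kept as-is; the remaining index
// fields get { $ne: sentinel }, which any document not holding the sentinel matches.
function buildIndexMatch(indexSpec, realFilters, sentinel) {
  const match = {};
  for (const field of Object.keys(indexSpec)) {
    match[field] = field in realFilters ? realFilters[field] : { $ne: sentinel };
  }
  // Keep filters on fields outside the index as well.
  for (const [field, cond] of Object.entries(realFilters)) {
    if (!(field in match)) match[field] = cond;
  }
  return { $match: match };
}

const stage = buildIndexMatch(
  { type: 1, vertical: 1, score: 1, publishTime: 1 },
  { score: { $gte: 100 }, publishTime: { $gte: 0 } },
  "any-unlikely-value"
);
console.log(JSON.stringify(stage));
// one condition per index field: $ne on type and vertical, the real $gte filters on the rest
```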

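6. The final approach

Comparing the results against the data, I found that some documents were missing from the counts.

It turned out that some older documents have no score or publishTime field at all. A query treats a missing field like null, and the $gte conditions (such as >= 0) never match null or a missing field, so those documents were filtered out before the $group stage.

So I simply changed every condition to "not equal to some value" ($ne), which also matches documents that lack the field: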

db.getCollection('articleCollection').aggregate([
    {
        $match: {
            type: {$ne: "any-unlikely-value"},
            vertical: {$ne: "any-unlikely-value"},
            score: {$ne: "any-unlikely-value"},
            publishTime: {$ne: "any-unlikely-value"}
        }
    },
    {
        $group: {
            _id: {
                type: "$type",
                vertical: "$vertical"
            },
            count: {$sum: 1}
        }
    }
])

This fully met our requirements.
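The final version counts the documents the previous one dropped because of how the two operators treat missing fields. In a MongoDB query, $ne also matches documents that lack the field, whereas a numeric $gte matches only documents whose field holds a number; missing and null never match. The simplified matcher below models just those two operators to show the difference on sample data. `matches` and the documents are hypothetical, and real MongoDB comparison semantics are richer (other BSON types, arrays).

```javascript
// Simplified model of two MongoDB query operators, enough to show how they
// treat a missing field. Not a full implementation of MongoDB semantics.
function matchesCondition(value, cond) {
  if ("$ne" in cond) {
    // $ne matches when the field is missing or holds a different value.
    return value !== cond.$ne;
  }
  if ("$gte" in cond) {
    // A numeric $gte only matches numeric values; missing/null never match.
    return typeof value === "number" && value >= cond.$gte;
  }
  throw new Error("operator not modeled: " + Object.keys(cond).join(","));
}

function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) =>
    matchesCondition(doc[field], cond)
  );
}

const docs = [
  { type: "news", vertical: "tech", score: 150, publishTime: 10 },
  { type: "news", vertical: "tech" }, // no score / publishTime
  { type: "video", vertical: "sports", score: 120, publishTime: 5 },
];

const sentinel = "any-unlikely-value";
const gteFilter = { // the step 5 $match
  type: { $ne: sentinel },
  vertical: { $ne: sentinel },
  score: { $gte: 100 },
  publishTime: { $gte: 0 },
};
const neFilter = { // the step 6 $match
  type: { $ne: sentinel },
  vertical: { $ne: sentinel },
  score: { $ne: sentinel },
  publishTime: { $ne: sentinel },
};

console.log(docs.filter((d) => matches(d, gteFilter)).length); // 2
console.log(docs.filter((d) => matches(d, neFilter)).length);  // 3
```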
