Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
219 views
in Technique[技术] by (71.8m points)

indexing - MongoDB Time-Spatial Query is slower with 2dsphere Index

Recently, I started to investigate the performance of MongoDB with AIS Data. I used a collection with 19m documents with proper field types as described in the definition. I also created a new geoloc field with type: (Point) from coordinates (lon,lat) in this same collection.

The query under investigation is:

db.nari_dynamic.explain('executionStats').aggregate
(
[
  {
      "$match": {
           "geoloc": {
               "$geoWithin": {
                   "$geometry": {
                       "type" : "Polygon" ,
                       "coordinates": [ [ [ -5.00, 45.00 ], [ +0.00, 45.00 ], [ +0.00, 50.00 ], [ -5.00, 50.00 ], [ -5.00, 45.00 ] ] ]
              }}}}
  },

  { "$group": {"_id": "$sourcemmsi", "PointCount": {"$sum" : 1}, "MinDatePoint": {"$min" : {"date": "$t3" }}, "MaxDatePoint": {"$max" : {"date": "$t3" }} }},
  { "$sort": {"_id":1} },
  { "$limit":100 },
  { "$project": {"_id":1, "PointCount":1, "MinDatePoint":1, "MaxDatePoint":1} }
],
{ explain:true}
)

During investigation and testing I found the following:

  1. Without any index: 94s
  2. With geoloc-2dsphere index: 280s

Here are the Execution Stats: Without the Index

{ stages: 
   [ { '$cursor': 
        { queryPlanner: 
           { plannerVersion: 1,
             namespace: 'mscdata.nari_dynamic',
             indexFilterSet: false,
             parsedQuery: 
              { geoloc: 
                 { '$geoWithin': 
                    { '$geometry': 
                       { type: 'Polygon',
                         coordinates: [ [ [ -5, 45 ], [ 0, 45 ], [ 0, 50 ], [ -5, 50 ], [ -5, 45 ] ] ] } } } },
             queryHash: '6E2EAB94',
             planCacheKey: '6E2EAB94',
             winningPlan: 
              { stage: 'PROJECTION_SIMPLE',
                transformBy: { sourcemmsi: 1, t3: 1, _id: 0 },
                inputStage: 
                 { stage: 'COLLSCAN',
                   filter: 
                    { geoloc: 
                       { '$geoWithin': 
                          { '$geometry': 
                             { type: 'Polygon',
                               coordinates: [ [ [ -5, 45 ], [ 0, 45 ], [ 0, 50 ], [ -5, 50 ], [ -5, 45 ] ] ] } } } },
                   direction: 'forward' } },
             rejectedPlans: [] } } },
     { '$group': 
        { _id: '$sourcemmsi',
          PointCount: { '$sum': { '$const': 1 } },
          MinDatePoint: { '$min': { date: '$t3' } },
          MaxDatePoint: { '$max': { date: '$t3' } } } },
     { '$sort': { sortKey: { _id: 1 }, limit: 100 } },
     { '$project': 
        { _id: true,
          PointCount: true,
          MaxDatePoint: true,
          MinDatePoint: true } } ],
  serverInfo: 
   { host: 'ubuntu16',
     port: 27017,
     version: '4.4.1',
     gitVersion: 'ad91a93a5a31e175f5cbf8c69561e788bbc55ce1' },
  ok: 1 }

Here are the Execution Stats: With the Index

{ stages: 
   [ { '$cursor': 
        { queryPlanner: 
           { plannerVersion: 1,
             namespace: 'mscdata.nari_dynamic',
             indexFilterSet: false,
             parsedQuery: 
              { geoloc: 
                 { '$geoWithin': 
                    { '$geometry': 
                       { type: 'Polygon',
                         coordinates: [ [ [ -5, 45 ], [ 0, 45 ], [ 0, 50 ], [ -5, 50 ], [ -5, 45 ] ] ] } } } },
             queryHash: '6E2EAB94',
             planCacheKey: 'F35B194B',
             winningPlan: 
              { stage: 'PROJECTION_SIMPLE',
                transformBy: { sourcemmsi: 1, t3: 1, _id: 0 },
                inputStage: 
                 { stage: 'FETCH',
                   filter: 
                    { geoloc: 
                       { '$geoWithin': 
                          { '$geometry': 
                             { type: 'Polygon',
                               coordinates: [ [ [ -5, 45 ], [ 0, 45 ], [ 0, 50 ], [ -5, 50 ], [ -5, 45 ] ] ] } } } },
                   inputStage: 
                    { stage: 'IXSCAN',
                      keyPattern: { geoloc: '2dsphere' },
                      indexName: 'geoloc-field',
                      isMultiKey: false,
                      multiKeyPaths: { geoloc: [] },
                      isUnique: false,
                      isSparse: false,
                      isPartial: false,
                      indexVersion: 2,
                      direction: 'forward',
                      indexBounds: 
                       { geoloc: 
                          [ '[936748722493063168, 936748722493063168]',
                            '[954763121002545152, 954763121002545152]',
                            '[959266720629915648, 959266720629915648]',
                            '[960392620536758272, 960392620536758272]',
                            '[960674095513468928, 960674095513468928]',
                            '[960744464257646592, 960744464257646592]',
                            '[960762056443691008, 960762056443691008]',
                            '[960766454490202112, 960766454490202112]',
                            '[960767554001829888, 960767554001829888]',
                            '[960767828879736832, 960767828879736832]',
                            '[960767897599213568, 960767897599213568]',
                            '[960767914779082752, 960767914779082752]',
                            '[960767919074050048, 960767919074050048]',
                            '[960767920147791872, 960767920147791872]',
                            '[960767920416227328, 960767920416227328]',
                            '[960767920483336192, 960767920483336192]',
                            '[960767920500113408, 960767920500113408]',
                            '[960767920504307712, 960767920504307712]',
                            '[960767920505356288, 960767920505356288]',
                            '[960767920505618432, 960767920505618432]',
                            '[960767920505683968, 960767920505683968]',
                            '[960767920505683969, 960767920505716735]',
                            '[1345075088707977217, 1345075088708009983]',
                            '[1345075088708009984, 1345075088708009984]',
                            '[1345075088708075520, 1345075088708075520]',
                            '[1345075088708337664, 1345075088708337664]',
                            '[1345075088709386240, 1345075088709386240]',
                            '[1345075088713580544, 1345075088713580544]',
                            '[1345075088730357760, 1345075088730357760]',
                            '[1345075088797466624, 1345075088797466624]',
                            '[1345075089065902080, 1345075089065902080]',
                            '[1345075090139643904, 1345075090139643904]',
                            '[1345075094434611200, 1345075094434611200]',
                            '[1345075111614480384, 1345075111614480384]',
                            '[1345075180333957120, 1345075180333957120]',
                            '[1345075455211864064, 1345075455211864064]',
                            '[1345076554723491840, 1345076554723491840]',
                            '[1345080952770002944, 1345080952770002944]',
                            '[1345098544956047360, 1345098544956047360]',
                            '[1345168913700225024, 1345168913700225024]',
                            '[1345450388676935680, 1345450388676935680]',
                            '[1346576288583778304, 1346576288583778304]',
                            '[1351079888211148800, 1351079888211148800]',
                            '[1369094286720630784, 1369094286720630784]',
                            '[5116089176692883456, 5116089176692883456]',
                            '[5170132372221329408, 5170132372221329408]',
                            '[5179139571476070401, 5179702521429491711]',
                            '[5179702521429491713, 5180265471382913023]',
                            '[5180265471382913024, 5180265471382913024]',
                            '[5183643171103440896, 5183643171103440896]',
                            '[5187020870823968768, 5187020870823968768]',
                            '[5187020870823968769, 5187583820777390079]',
                            '[5187583820777390081, 5188146770730811391]',
                            '[5188146770730811393, 5197153969985552383]',
                            '[5206161169240293376, 5206161169240293376]',
                            '[5218264593238851584, 5218264593238851584]',
                            '[5218264593238851585, 5218405330727206911]',
                            '[5218546068215562240, 5218546068215562240]',
                            '[5218546068215562241, 5219109018168983551]',
                            '[5219671968122404864, 5219671968122404864]',
                            '[5220234918075826177, 5220797868029247487]',
                            '[5220797868029247488, 5220797868029247488]',
                            '[5220938605517602817, 5221079343005958143]',
                            '[5221079343005958144, 5221079343005958144]',
                            '[5260204364768739328, 5260204364768739328]' ] } } } },
             rejectedPlans: [] } } },
     { '$group': 
        { _id: '$sourcemmsi',
          PointCount: { '$sum': { '$const': 1 } },
          MinDatePoint: { '$min': { date: '$t3' } },
          MaxDatePoint: { '$max': { date: '$t3' } } } },
     { '$sort': { sortKey: { _id: 1 }, limit: 100 } },
     { '$project': 
        { _id: true,
          MinDatePoint: true,
          MaxDatePoint: true,
          PointCount: true } } ],
  serverInfo: 
   { host: 'ubuntu16',
     port: 27017,
     version: '4.4.1',
     gitVersion: 'ad91a93a5a31e175f5cbf8c69561e788bbc55ce1' },
  ok: 1 }

Of course, I understand that is more complex as the query has a grouping function, but the idea is that usually, we will get something quicker and not slower with the index unless the index causes a different sorting inside the engine as geoNear does.

Also, there is a complete analysis from MongoDB if how the queries and index improvements have an impact on the queries, but not so much info for geoWithin. MongoDB states that the results are not sorted with GeoWithin, so I don't find the reason for the delay. https://www.mongodb.com/blog/post/geospatial-performance-improvements-in-mongodb-3-2

Any ideas or opinions, why the query with the index is slower?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

After a lot of investigation, appears that once a query is requesting more than 70% of the dataset, in this case, 95% having an index is slower than not having that index.

This situation is also present with other indexes than geospatial, like simple indexes in numeric or descriptive columns (ship_name, ship_number, or timestamp).

This is happening because the RDBMS has to search the keys of the index and also the keys of the documents, this results in higher execution times.

On the other hand, this should not be happening as the Mongo-Planner should be able to address this problem and not give the index for further use, keeping the accessing of the keys low.

The issue opened in MongoDB support, and can be found here:

https://jira.mongodb.org/browse/SERVER-53709


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...