Monday, February 11, 2013

Aggregation framework in MongoDB.

Aggregation framework was added to MongoDB version 2.2 to give functionality of aggregation of data in the collection and its processing.
Basic syntax looks like this:
  db.<collection name>.aggregate([...])
where <collection name> – a placeholder for your collection,
[...] – pipelines of operators, described below.

There are 7 different aggregation stages:
  • $project – select keys we interested in, reshape the document.
  • $match – filter document.
  • $group – aggregate documents by certain keys.
  • $sort – sort documents.
  • $skip – skip number of documents.
  • $limit – limit result to specific number of documents.
  • $unwind – unjoin data, produces docs for every element in unwinding array in a document.
All stages are executes one by one and as an input receives result of executing of previous stage, and it's result is passed to next one stage.

$group

Base syntax:
$group: {
  _id: "$<group_by_field_name>",
  <operator_result_name>: {
    <operator>: "$<field name>"
  }
} 

_id points on a key grouping should be performed by. If we need to group by few fields as a value for _id should be passed document containing all needed fields:
_id: {
  <group_key1>: "$<group_by_field_name1>",
  <group_key2>: "$<group_by_field_name2>"
}

There are many operators acceptable to group stage. Result of executing every of them is assigned to operator_result_name. Here some of them:
  • $sum – sum by the specified key field or simple value.
  • Sum values of field:
    $sum: "$<field>"
    
    Add 1 for every document in a group:
    $sum: 1
    
  • $avg – average value:
  • $avg: "$<field>"
    
  • $min/$max – minimum/maximum value in a group:
  • $min: "$<field>"
    
  • $push – add every value in a group to array. Produces array of values with duplicates. To get array of unique values next operator used.
  • $addToSet – add every value in a group to a set of unique elements.
  • $first/$last – the first/last of the values in a group. Have sense to use after $sort stage.

    $project

    Reshape input document:
    • remove keys
    • add new keys
    • reshape keys
    • use simple functions on keys. Operators acceptable for $group stage can be used here:
      • $toUpper – translate field value to lower case:
        <field name>: { $toUpper: "$<field>" }
      • $toLower – translate field value to upper case
      • $add – add to field value. For example add 10 to existing field and write result to new field (source and result field names can be same):
        <field name>: { $add: ["$<field>", 10] }
      • ...
    Base syntax:
    db.<collection>.aggregate([
      { 
        $project: {
          <new_field>: <new_value>,
          <field>: { $toLower: "$<field>" },
          <field_to_exclude>: 0,
          <field_to_pass_through>: 1,
          <field_new_name>: <field_old_name>,
          <doc_field>: {
            <new_field>: "$<field>",
            <new_field>: { $multiply: ["$<field>", <value>] }
          },
          ...
        }
      }
    ])
    
    To exclude field from the projection simply do:
    <field_to_exclude>: 0
    
    To include field as is:
    <field_to_pass>: 1
    
    By default all fields but _id are excluded.

    $match

    Return only documents which satisfy matching expression.
    Performs as a filter. Has reducing effect.
    { $match: { <field>: <looking value> } }
    

    $sort

    Sorts documents in a specified order by a set of fields.
    (!) Can be a real memory hug.
    { $sort: { <field>: <order> } }
    
    Order element can have values:
    • 1 – ASC/ascending order,
    • -1 – DESC/descending order.

    $skip and $limit

    Performs as usual skip and limit functions in MongoDB: skips/limits output to first n documents.
    Not very useful without $sort stage.
    { $skip: <skip value> }
    
    { $limit: <limit value> }

    $unwind

    Unwinds arrays in the document on a couple of new documents with only one value for array field instead of a whole array.
    Example:
    { a:1, b:2, c:['c1', 'c2', 'c3'] }
    
    Unwinding of field c:
    { $unwind: "$c" }
    
    will result into three separate documents:
    { a:1, b:2, c: "c1" },
    { a:1, b:2, c: "c2" },
    { a:1, b:2, c: "c3" }
    

    Conclusion

    Aggregation Framework is a powerful feature of MongoDB but it has to be used with wisdom. It gives almost all functionality as SQL has, but has several limitations except binding to one collection:
    • result set limited to 16MB of memory;
    • cannot use more then 10% of memory on a machine;
    • on sharding environment after first $group or $sort data should be collected on mongos.
    SQL to Aggregation Framework:

    WHERE
    GROUP BY
    HAVING
    SELECT
    ORDER BY
    LIMIT
    SUM()
    COUNT()
    JOIN

    More at Aggregation Framework Reference.