Using mongodb aggregation pipeline

Share

Until recently, relational databases were the only option, however, over the past few years nosql bases are becoming increasingly used

Often we choose inertia of known technology, but it is not always a good choice. When you planning a new project, it is preferable to assume that the most important developments of the same, as well as its use, which are the specific features that are important to us.

In this article I will talk about mongodb – NoSQL database that is not relational, that stores json (in fact bson – binary JSON) documents, specifically on its “aggregation pipeline”.

What are the benefits of MongoDB and when it makes sense to choose for a project?

You need a big speed: MongoDB with the default settings (so-called “fire and forget” mode of entry) favors speed in relation to the reliability of transactions, so if the data is not critical and can not afford to be a negligible percentage of them is lost, there is mongo right choice. If you want to be sure to enroll in mongo database, can be sent to option: safe = true and thus at the expense of speed of entry to get the certainty that every data entered.

Mongodb is naturally distributed, and is suitable for horizontal scaling, so if you expect to work with a large amount of data it can be a good choice.

Adding a new column to the collection (the equivalent table in relational databases) is trivial, just add field json file and it is recorded in the mongo collection, while other data can be the same without the field.

One should be careful in writing the query and as documents of different formats in the same collection may bring some problems.

Query in mongo are written in JavaScript, the logic is different and demanding writing queries for us greatly facilitates the “aggregation pipeline.”

It consists of 10 types of transformations that are connected to each other (in that pipeline, as “piping” in unix terminal).

These are:

  •     $geoNear
  •     $match
  •     $project
  •     $redact
  •     $unwind
  •     $group
  •     $limit
  •     $skip
  •     $sort
  •     $out

In order not to expand too much the topic by explaining what each of them does, here’s a link to the documentation http://docs.mongodb.org/manual/reference/operator/aggregation/ and here I will show several examples of use.

First fill some data collection:

db.Student.insert ({Student_Name:"Kalki",  Class: "2", Mark_Scored:100, Subject: ["Tamil", "English", "Maths"]})

db.Student.insert ({Student_Name:"Matsya", Class: "1", Mark_Scored:10,  Subject: ["Tamil", "English"]})


db.Student.insert ({Student_Name:"Krishna",Class: "1", Mark_Scored:50,  Subject: ["Tamil"]})

db.Student.insert ({Student_Name:"Buddha", Class: "2", Mark_Scored:60,  Subject: ["Tamil"]})

db.Student.insert ({Student_Name:"Rama",   Class: "2", Mark_Scored:80,  Subject: ["Tamil"]})


db.Student.insert ({Student_Name:"Krishna",Class: "1", Mark_Scored:50,  Subject: ["English"]})

db.Student.insert ({Student_Name:"Buddha", Class: "2", Mark_Scored:60,  Subject: ["English"]})

db.Student.insert ({Student_Name:"Rama",   Class: "2", Mark_Scored:80,  Subject: ["English"]})


db.Student.insert ({Student_Name:"Matsya", Class: "1", Mark_Scored:67,  Subject: ["Maths"]})

db.Student.insert ({Student_Name:"Krishna",Class: "1", Mark_Scored:95,  Subject: ["Maths"]})

db.Student.insert ({Student_Name:"Buddha", Class: "2", Mark_Scored:88,  Subject: ["Maths"]})

db.Student.insert ({Student_Name:"Rama",   Class: "2", Mark_Scored:40,  Subject: ["Maths"]})

Collection Student enrollment is automatically created in the first document, and used as an index _id field by default.

1) We want to reach all of the students who attended Class 2 and having more than 80 points on the test

db.Student.aggregate ([

  {

     "$match":

     {

        "Class":"2",

     }

  },

  {

     "$match":

     {

        "Mark_Scored":

        {

           "$gte": 80

        }

     }

  }

  ])

gives the result:

{

            "result" : [

                        {

                                    "_id" : ObjectId("554a5921c1be050f65fbae96"),

                                    "Student_Name" : "Kalki",

                                    "Class" : "2",

                                    "Mark_Scored" : 100,

                                    "Subject" : [

                                                "Tamil",

                                                "English",

                                                "Maths"

                                    ]

                        },

                        {

                                    "_id" : ObjectId("554a5921c1be050f65fbae9a"),

                                    "Student_Name" : "Rama",

                                    "Class" : "2",

                                    "Mark_Scored" : 80,

                                    "Subject" : [

                                                "Tamil"

                                    ]

                        },

                        {

                                    "_id" : ObjectId("554a5921c1be050f65fbae9d"),

                                    "Student_Name" : "Rama",

                                    "Class" : "2",

                                    "Mark_Scored" : 80,

                                    "Subject" : [

                                                "English"

                                    ]

                        },

                        {

                                    "_id" : ObjectId("554a5921c1be050f65fbaea0"),

                                    "Student_Name" : "Buddha",

                                    "Class" : "2",

                                    "Mark_Scored" : 88,

                                    "Subject" : [

                                                "Maths"

                                    ]

                        }

            ],

            "ok" : 1

}

2) When data is stored in the form of a series, we need $ unwind:

db.Student.aggregate ([

   {

      "$match":

      {

         "Student_Name": "Kalki",

      }

   },

   {

      "$unwind": "$Subject"

   }

])


gives :


{

            "result" : [

                        {

                                    "_id" : ObjectId("554a5921c1be050f65fbae96"),

                                    "Student_Name" : "Kalki",

                                    "Class" : "2",

                                    "Mark_Scored" : 100,

                                    "Subject" : "Tamil"

                        },

                        {

                                    "_id" : ObjectId("554a5921c1be050f65fbae96"),

                                    "Student_Name" : "Kalki",

                                    "Class" : "2",

                                    "Mark_Scored" : 100,

                                    "Subject" : "English"

                        },

                        {

                                    "_id" : ObjectId("554a5921c1be050f65fbae96"),

                                    "Student_Name" : "Kalki",

                                    "Class" : "2",

                                    "Mark_Scored" : 100,

                                    "Subject" : "Maths"

                        }

            ],

            "ok" : 1

}

– We can see that we get the individual documents for each array element in which is applied $ unwind (unwinding literally).

3) This is an example of aggregate functions:

– average grades of students in class 2

db.Student.aggregate ([

   {

      "$match":

      {

         "Class": "2"

      }

   },

   {

      "$unwind": "$Subject"

   },

   {

      "$group":

      {

         "_id":

         {

            "Student_Name" : "$Student_Name"

         },

         "Avg_Marks":

         {

            "$avg": "$Mark_Scored"

         }

      }

   }

])

gives result:

{

            "result" : [

                        {

                                    "_id" : {

                                                "Student_Name" : "Rama"

                                    },

                                    "Avg_Marks" : 66.66666666666667

                        },

                        {

                                    "_id" : {

                                                "Student_Name" : "Buddha"

                                    },

                                    "Avg_Marks" : 69.33333333333333

                        },

                        {

                                    "_id" : {

                                                "Student_Name" : "Kalki"

                                    },

                                    "Avg_Marks" : 100

                        }

            ],

            "ok" : 1

}

and if we want to get more orderly output data, at the end add $ project team that projects what we need in the new pipe in a more orderly form

db.Student.aggregate ([

   {

      "$match":

      {

         "Class": "2"

      }

   },

   {

      "$unwind": "$Subject"

   },

   {

      "$group":

      {

         "_id":

         {

            "Student_Name" : "$Student_Name"

         },

         "Avg_Marks":

         {

            "$avg": "$Mark_Scored"

         }

      }

   },

   {

      "$project":

      {

         "_id":0,

         "Name":  "$_id.Student_Name",

         "Average": "$Avg_Marks"

      }

   }

])

gives result

{

            "result" : [

                        {

                                    "Name" : "Rama",

                                    "Average" : 66.66666666666667

                        },

                        {

                                    "Name" : "Buddha",

                                    "Average" : 69.33333333333333

                        },

                        {

                                    "Name" : "Kalki",

                                    "Average" : 100

                        }

            ],

            "ok" : 1

}

and if we want to sort it eventually add $ sort pajp, where 1 means asc -1 means desc

db.Student.aggregate ([

   {

      "$match":

      {

         "Class": "2"

      }

   },

   {

      "$unwind": "$Subject"

   },

   {

      "$group":

      {

         "_id":

         {

            "Student_Name" : "$Student_Name"

         },

         "Avg_Marks":

         {

            "$avg": "$Mark_Scored"

         }

      }

   },

   {

      "$project":

      {

         "_id":0,

         "Name":  "$_id.Student_Name",

         "Average": "$Avg_Marks"

      }

   },

   {

      "$sort":

      {

         "Average": -1

      }

   }

])

and gives result:

{

            "result" : [

                        {

                                    "Name" : "Kalki",

                                    "Average" : 100

                        },

                        {

                                    "Name" : "Buddha",

                                    "Average" : 69.33333333333333

                        },

                        {

                                    "Name" : "Rama",

                                    "Average" : 66.66666666666667

                        }

            ],

            "ok" : 1

}

I hope it will be interesting to play around with this and maybe mongo choose as a base for a project.

Good luck with piping!

 

Share

Sign in to get new blogs and news first:

Leave a Reply

Miodrag Opačić

Senior Back-end developer @Cosmic Development
mm

Miodrag Opacic has 10 years of experience and has a great knowledge in different areas of information technology. He works one year in company Cosmic Development as a senior back-end developer. In his spare time makes a variety of culinary specialties and enjoys listening panel.

Sign in to get new blogs and news first.

Categories