/ mongodb

MongoDB - find and findOne with nested array filtering. Finally!

There is a TL;DR at the bottom. Cheers!

There are some unique issues that come with taking the document database approach to your app. One of them, that caught be by surprise, was filtering a nested array.

Consider the following very simple Posts collection:

{
  "author": {},
  "message": "This is post1",
  "comments": [
    {
      "message": "Im number 1!!!",
      "state": {
        "deleted": false
      }
    },
    {
      "message": "YOU MOTHERF****R",
      "state": {
        "deleted": true
      }
    },
    {
      "message": "tHIS IS GREAT!",
      "state": {
        "deleted": false
      }
    },
    {
      "message": "I can type better than you guys",
      "state": {
        "deleted": false
      }
    }
  ]
},
{
  "author": {},
  "message": "This is post 2",
  "comments": [
    {
      "message": "This is bulls**t",
      "state": {
        "deleted": true
      }
    },
    {
      "message": "YOU MOTHERF****R",
      "state": {
        "deleted": true
      }
    },
    {
      "message": "I hate u!",
      "state": {
        "deleted": true
      }
    },
    {
      "message": "I wanna have your children",
      "state": {
        "deleted": false
      }
    }
  ]
}

Suppose I want to display all the posts, coupled with the comments that were not deleted. I.E. filter out the deleted comments from the comments array.
Well - this turns out to be not so trivial.

The simple solution

In the MongoDB world it is OK to distribute a job between the client and the server. The official approach to these type of issues is to do the filtering on the client side.
So one would db.getCollection('posts').find() whatever posts are needed and then filter the nested comments array using Array.map() (yes, we're JSing now).

The problem

Turns out that my specific case has 1000s of deleted comments per post. Transferring so much redundant data over the network just to have it thrown away by the client seems wrong. So I had to find out a way of doing the filtering on the server.

So many options, so many fail

If you're just here for the solution, jump over to the next paragraph.

I'll briefly go over the solutions one might find when Googling this issue:
db.getCollection('posts').find({"comments.state.deleted": false}) returns only posts that have legal comments, but also the entire comments array per post document. There is no way of projecting just the necessary data.

Some SO answers point to aggregation. So I've tried $elemMatch but it only returns the first element in the array.

Lots of answers advise you to $unwind the array first (meaning to flatten it) and then filter and group the results. This creates many problems, most of them could be summed up by not being able to end up with an array structure that is identical to the original one.

MongoDB does have an aggregation filtering stage called $redact but sadly it enforces the same condition on all of the document levels.
So

{
     $redact: {
         $cond: {
              if: {$eq: ["$state.deleted", false]},
              then: "$$DESCEND",
              else: "$$PRUNE"
          },
    },
},

Requires you to have a state object on the post document as well. And also on every nested document.

This got me thinking about a way to make $redact aware of its position in the tree. And apparently I'm not the only one. But alas, this is not supported by MongoDB:

 {$ne: ["$$ROOT", "$$CURRENT"]}] // We dont wanna $redact the top level

Finally, I don't know what this is all about, but I sure can spot an overkill when I see one. So $map is not the way to go.

Simply $filtering the results does what $match does - mainly filter the top level posts. Not the comments, or vise versa - filter the comments array alone.

pexels-photo-408517

TL;DR: The solution

So it turns out that aggregation is the way to go. We just need to employ a little trick.

MongoDB has an aggregation pipe called $addFields which allows you to... add fields to the result. And also the afformentioned $filter pipe.

What if we could filter the array only, but instead of returning the filtered array as the result, use it to override the original one? Turns out we can:

db.getCollection('posts').aggregate(
    {$match: {"author.id": authorId}},
	{$addFields : {"comments":{$filter:{ // We override the existing field!
		input: "$comments",
		as: "comment",
		cond: {$eq: ["$$comment.state.deleted", false]}
	}}}}
);

That's it. This little trick allows nested array filtering. I also suspect that I could patch it for nested nested array filtering, but that's for another post.

B.t.w the results:

{
  "author": {},
  "message": "This is post1",
  "comments": [
    {
      "message": "Im number 1!!!",
      "state": {
        "deleted": false
      }
    },
    {
      "message": "tHIS IS GREAT!",
      "state": {
        "deleted": false
      }
    },
    {
      "message": "I can type better than you guys",
      "state": {
        "deleted": false
      }
    }
  ]
},
{
  "author": {},
  "message": "This is post 2",
  "comments": [
    {
      "message": "I wanna have your children",
      "state": {
        "deleted": false
      }
    }
  ]
}