Eloquent ORM Chunk method and very weird results

First, let me say that this is not an issue with Eloquent ORM, but an issue with how databases return results, and it could be an issue with SQLSRV more than anything. However, I have not tested anything on MySQL or MariaDB just yet.

So let's establish the use case for chunk and why I am using it to begin with.

The client project has many users that have many related records...essentially like an Employer and Employees, where some Employers have 3k or more Employee records. You could query and grab all the records at once, but depending on your setup, you might run out of allocated memory for PHP. This is where the chunk method enters. Chunk lets you take your 3k results and break them into sets of a size you choose; I chose 50. Then you can act on each collection of 50 records just as you would when looping over all 3k, but without running out of memory.
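For reference, here is roughly what a basic chunk call looks like. This is a minimal sketch; the Employee model is a stand-in for the client's actual models.

```php
use App\Models\Employee; // hypothetical model for the scenario above

// Process all employees 50 at a time instead of loading all 3k at once.
Employee::query()->chunk(50, function ($employees) {
    foreach ($employees as $employee) {
        // act on each of the 50 records in this chunk
    }
});
```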

This is great, and generally works perfectly. And you should reach for it to make sure your applications run smoothly.

There are a couple of pitfalls to be aware of, though, and thinking through why those pitfalls exist can actually teach you a little more about how relational databases work, or should work.

Chunk() and deleting records

I think a great example, where it is easy to see what is going on, is when you have many records and you want to delete them. Imagine a set of some five thousand records that you need to delete. You determine you can't simply fetch the entire set because of memory limits, so you add the chunk method to accomplish your task.

You run your code, then you check the count, and for some reason you are left with roughly half the records. You pause for a moment: but why? I wanted it to delete all of the records.

You think about it, and eventually you trace through the queries that the chunk method introduced. Each time the chunk method executes, it grabs another set of data. This seems fine, but you have to really think about the details of that sentence. The first chunk grabs "Page 1" (the first 50) of results from the whole set of 5k...ok, no problem. But the second time the chunk method executes, it grabs "Page 2" of a set that is now only 4,950, because the first 50 were just deleted. This easy-to-overlook detail, that the result set shrinks with each chunk execution, means a page of results is skipped on every iteration.
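Sketched in code, the broken pattern looks something like this (same hypothetical Employee model; the comments trace the paging described above):

```php
// BROKEN: each pass deletes rows, shifting the pages underneath the next query.
Employee::query()->chunk(50, function ($employees) {
    foreach ($employees as $employee) {
        $employee->delete();
    }
});
// Pass 1: "Page 1" (rows 1-50) of the 5,000 is fetched and deleted.
// Pass 2: "Page 2" (offset 50) is fetched from the remaining 4,950 rows,
//         so the 50 rows that just slid into Page 1 are silently skipped.
```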

The solution of wrapping the query in a while loop means it will continue to execute until all the results are gone; see the sketch below. It works, even if it's not as clean as we might like.
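A minimal sketch of that workaround, again with the hypothetical Employee model:

```php
// Keep grabbing the *first* 50 rows and deleting them until none remain.
// There is no offset here, so nothing can be skipped.
while (Employee::query()->count() > 0) {
    Employee::query()
        ->take(50)
        ->get()
        ->each(function ($employee) {
            $employee->delete();
        });
}
```

Laravel also ships a chunkById() method that pages by primary key instead of by offset, which sidesteps this problem when deleting or updating rows inside the chunk.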

The example above shows that you need to be cautious when using the chunk method, but the next example really drives the point home.

Chunk() and selecting records

The issue I experienced, though, was not as easy to figure out as the deleting example, because only selecting records (and maybe updating them) leaves the same number of records in place, and therefore seems to work unless you really dig into the data. When deleting, you know something is happening because your initial result set is shrinking. A set that stays the same size gives far fewer clues about what is happening.

Additionally, I was running the code in a local environment, a stage environment, and a production environment. Locally, things were working as expected, while stage and production were inconsistent.

Working with a co-worker, we figured that inspecting a few columns of the result set might prove useful. We created a script that would grab the results, echo an HTML table, and create an md5 hash of the data, which would let us see whether the results were in fact changing. We built it in the local environment and ran it a few times, and the md5 hash matched every time. This indicated that we were in fact getting the correct results from our query (Eloquent calls) and that our results were consistent.
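The script was along these lines; a rough sketch, with the model, columns, and chunk size as stand-ins:

```php
$hashInput = '';

Employee::query()->chunk(50, function ($employees) use (&$hashInput) {
    foreach ($employees as $employee) {
        // Echo a row for eyeballing, and accumulate the values for hashing.
        echo "<tr><td>{$employee->id}</td><td>{$employee->name}</td></tr>";
        $hashInput .= $employee->id . '|' . $employee->name . "\n";
    }
});

// Identical data in identical order produces an identical hash across runs.
echo md5($hashInput);
```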

We moved the exact same code to stage and checked our hash. The first time we ran it, we got something different from what we got locally, and I will point out that the data in the two environments matched, because we had made a backup and restored the db locally. The data was identical. We ran it again and got a different hash from the first one on stage. A third time, a fourth time, a fifth time: all different hashes from the one we got locally, and all different from each other.

Discussions were happening, but we could not put our finger on why any of this was occurring. We checked versions of SQLSRV, PHP, Apache, IIS*, and anything else we could think of. Our next plan: get the raw SQL query from Eloquent and see what results we get from the DB in a GUI client, removing any potential issues that PHP or Apache/IIS might be introducing.

We dumped the SQL, replaced the prepared-statement '?' placeholders, and ran it locally. The results always stayed the same. We then ran it on stage, and the data initially appeared to be different each time we ran the query. We dug into the data...we had an ID field, so we could see whether we were getting duplicates or not...and to our surprise, we were not. So we sorted the results, compared them to the local results, and they matched.
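Pulling the raw query out of Eloquent looks roughly like this; toSql() and getBindings() are the actual builder methods, while the query itself is a stand-in:

```php
$employerId = 42; // hypothetical filter value

$query = Employee::query()->where('employer_id', $employerId);

echo $query->toSql();            // the SQL with ? placeholders
var_dump($query->getBindings()); // the values to substitute for each ?
```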

So we were getting the same raw data back, but something was just off. Since we now had the data in the same order, we ran it through the md5 hash again, and the hash finally matched our original local hash, the one that never changed.

And the "Aha" moment hit us

It seems the data is right, but the order is changing every time in the stage environment. We quickly added an "Order By" clause in the GUI DB program on stage, and across four executions of the SQL, our result order matched every time. We then updated our Eloquent calls with an orderBy() call and checked our md5 hash....it matched every time. "Aha."
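The fix, sketched against the same hypothetical model, just adds a deterministic sort before chunking:

```php
// Ordering by a unique column (the primary key) makes every "page" stable,
// so repeated runs return, and hash, identically.
Employee::query()
    ->orderBy('id')
    ->chunk(50, function ($employees) {
        foreach ($employees as $employee) {
            // act on each record
        }
    });
```

Note that the column should be unique: ordering by a non-unique column can still leave tied rows in arbitrary order across page boundaries.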

A Google search later, and our discovery was confirmed: a database returns results in whatever order it wants unless a specific order is requested. (This is standard SQL behavior; without an ORDER BY clause, row order is simply not guaranteed.)

But why is it an issue? Why does the order matter?

Thinking about the situation again, I would suggest revisiting the deleting example. The first chunk is deleted, leaving a set of 4,950; the second chunk query offsets by 50, meaning it skips the first 50 rows of that smaller set the second time around.

It might help to think about it more literally. Imagine you have a bag with 5,000 stones in it: some black, some white, some gray, some green, some blue. The first 50 you pull out will be a mix of the stones. But the second time you go to pull 50, the original 50 go back into the bag first (a select doesn't remove rows from the table), and since the stones come out in random order, the second pull of 50 may include some of the first 50 along with some new ones. Repeat this 100 times, because you know there are exactly 5,000 stones, and some stones/rows get pulled multiple times while others are never pulled at all, which produces wildly unpredictable results.

Now, if you ordered the stones in the bag a specific way (by color or id), then pulled the first 50, then the next 50, then the next 50, and so on, you would work your way through all the stones/rows in the set. No stone/row is pulled multiple times, and every stone/row gets pulled.

Wrapping up

I did some Googling, trying to find out why the order was stable locally but not in the stage environment, and was not able to find any conclusive information. Maybe there was a change between the SQLSRV version locally and the stage/production version, but I was not able to find anything describing a change like that. I am also not sure whether this is how all relational databases work or whether it is specific to SQLSRV. Still, if your row counts match but the data is mismatching, this could be the culprit.

*We were running Apache locally and IIS on stage/prod...a circumstance of the project.