I discovered that MySQL sometimes returns the same entries multiple times when paginating a result set. I noticed that it does not happen when sorting in ascending order or when leaving out the order by clause from the query.
MySQL returns duplicates when sorting in descending order.
So I got a mail from a client this morning saying that there was an error in the API I’m working on because it was returning duplicates. The client noticed that in the result set that was paginated over four pages, pages two and three had two duplicate IDs in them. This shouldn’t be possible since these IDs are in an auto-increment column in the database table.
This application is built with the Laravel based framework Lumen so I figured that it had to have something to do with the logic in the Laravel ORM, Eloquent. I thought that it might have been some bug that got fixed in more recent versions of the framework so I started googling for similar cases. I couldn’t find any information on similar issues so I figured that I would just extract the raw queries from the ORM and run them directly in MySQL so I fired up Sequel Pro and started querying the database.
I’m running MySQL 5.7.21 in Docker and haven’t tested other MySQL versions so I’m not sure if this is specific to this version. However, once I ran the queries in the database I found out that the problem was not in the Eloquent ORM but in MySQL itself. I was relieved that it was not the application but at the same time perplexed by not knowing how to fix this issue since I do not know the internals of MySQL.
The dataset that I was working with has a lot of created_at timestamps with the exact same date. MySQL returned the correct results when I was not using offset and limit to paginate the result. The same went for when I was sorting the data in ascending order and also when leaving out the order by clause in the query. The problem occurred when using offset and limit to paginate the result set and at the same time sorting it by created_at timestamp in descending order.
The following four queries were the ones that I executed to paginate the total 190 entries into four pages. Sure enough, the IDs 21990 and 21991 appeared on both page number 2 and 3.
I’m pretty sure this will not happen if you do not have many entries with the same value. That’s because if you are sorting on one column with many duplicates the order of them is not guaranteed if you are not specifying a secondary sorting column. So, in this case, the duplicate data was because of this specific dataset. That said, it is good practice to always sort by a second column as well, especially when working with pagination that is sorted on a column with many duplicates.
I sure learned my lesson and I hope that this was helpful for anyone out there in a similar situation.
I’ll finish the article by dropping the whole dataset here, have a good one!