This is something I ran across as part of my job. The query at first seems to work by magic, but some thought into what it must be doing helped. I share my conclusions here. I looked up some Microsoft documentation on it, which was tantalizing and confusing at the same time, as is typical.
Here's the Microsoft documentation: http://www.databasejournal.com/features/mssql/article.php/3485291
Here's the science behind the magic.
Let's talk about this query:
select * from credential c
where exists (select null from credential c2
where c.key = c2.key and c.id > c2.id
What this query does is that given the case where a set of records have a certain key, this query will return all records from that set EXCEPT for the one with the lowest id.
Where this is particularly powerful is in a case like this:
delete c from credential c
where exists (select null from credential c2
where c.key = c2.key and c.id > c2.id
That is, for a given set of records with a certain key, this query will DELETE all records from that set EXCEPT the one with the lowest id (presumably the oldest one). VERY useful when cleaning up data corruption having to do with duplicate records!
What's it doing?
Note the where clause. First, it ensures that the 'key' field matches. Otherwise, the query is doing a cross-product of the credential table with itself. It then asks the question, Is the id from table 1 greater than table 2? If it is, it returns (deletes) that record. Otherwise, it does not.
Let's look at that with some data.
| Key |
Id |
| 100 |
10 |
| 100 |
20 |
| 100 |
30 |
| 200 |
40 |
| 200 |
50 |
Below is the cross-product it does, and the answer to its question, by row:
| Key0 |
Id0 |
Key1 |
Id1 |
Id0 > Id1? |
| 100 |
10 |
100 |
10 |
No |
| 100 |
10 |
100 |
20 |
No |
| 100 |
10 |
100 |
30 |
No |
| 100 |
20 |
100 |
10 |
Yes |
| 100 |
20 |
100 |
20 |
No |
| 100 |
20 |
100 |
30 |
No |
| 100 |
30 |
100 |
10 |
Yes |
| 100 |
30 |
100 |
20 |
Yes |
| 100 |
30 |
100 |
30 |
No |
| 200 |
40 |
200 |
40 |
No |
| 200 |
40 |
200 |
50 |
No |
| 200 |
50 |
200 |
40 |
Yes |
| 200 |
50 |
200 |
50 |
No |
...which means that, for the delete query, it will remove records as follows. The Id=30 is repeated and will be "deleted" twice.
| Key |
Id |
| 100 |
20 |
| 100 |
30 |
| 200 |
50 |
This leaves, of course, what we wanted, which was to have a single record for each Key, and the Id that is left over is the lowest one for each Key:
So, not really magic after all. And very powerful. I don't know about you, but I love to know what is going on behind the scenes -- that's what makes the creativity and the fun possible!