If you’re modeling relational data, it doesn’t seem like you can get around using a DB that uses SQL, which to me is the worst: most programmers aren’t DB experts and the SQL they output is quite often terrible.
Not to dunk on the Lemmy devs, they do a good job, but they themselves know that their SQL is bad. Luckily, some community members have stepped up and are doing a great job of fixing the numerous performance issues and tuning the DB settings, but not everybody has that kind of support, or the time.
Also, the translation step from binary (program) -> text (SQL) -> binary (server) just feels quite wrong. For HTML and CSS it's fine, but for SQL, where injection is still in the top 10 security risks, is there something better?
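One partial mitigation for the injection concern is that drivers can keep values out of the SQL text entirely by binding them as parameters. Here's a minimal sketch with Python's built-in `sqlite3` module. The `artists` table, its rows and the `find_artist` helper are hypothetical. It doesn't remove the text step for the statement itself, but user input never becomes part of that text.

```python
import sqlite3

# In-memory database with a hypothetical "artists" table, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO artists (name) VALUES (?)",
    [("Alpha",), ("Bravo",)],
)

def find_artist(name: str) -> list[tuple[int, str]]:
    # The SQL text is fixed; `name` is bound to the ? placeholder by the driver
    # instead of being spliced into the statement, so it can't change its structure.
    return conn.execute(
        "SELECT id, name FROM artists WHERE name = ?", (name,)
    ).fetchall()

print(find_artist("Bravo"))         # [(2, 'Bravo')]
print(find_artist("x' OR '1'='1"))  # [] because the input is compared as a plain string
```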
Yes, there are ORMs, but some languages don't have them (Rust has Diesel, for example, which still requires you to write SQL), and it would be great to "just" have a DB with a binary protocol that makes writing an ORM unnecessary.
Does such a thing exist? Is there something better than SQL out there?
Simple queries don't result in simple SQL. How many joins and subqueries do you think an SQL query would require in order to fulfill "Give me the top 10 artists of the 90s whose albums were nominated for the MTV awards but didn't win"?
In Django it looks something like this:
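```python
from datetime import date

from django.db.models import Q, Sum

# datetime.date takes (year, month, day)
nineties = (date(1990, 1, 1), date(1999, 12, 31))
album_range = Q(albums__release_date__range=nineties)

artists = (
    Artists.objects.annotate(
        # only count sales of albums released in the 90s
        albums_sold=Sum("albums__sales", filter=album_range),
    )
    .filter(
        album_range,
        nominations__date__range=nineties,
        nominations__won=False,
    )
    .order_by("-albums_sold")
)
top_artists = artists[:10]
```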
What if one method wants the result of that but only needs the artists' names, while another wants additional or different fields? In Django you could simply use `artists.only(*field_names)` and each method would provide a different set of field names. What would that look like without a capable ORM? Do you think somebody would refactor the method to add a `field_names` argument? In my experience, the result is a bunch of copy-pasted queries, each edited to add its own field names.

Another common thing is querying related objects. Say you wanted information about the record label of the aforementioned artists while handling the artists. That's a many-to-one relationship (an artist has one record label, a record label has many artists). You could either
access `artist.record_label` in your for-loop, which triggers a query for every artist (the N+1 problem), or, in Django, use `artists.select_related("record_label")`, which fetches all the record labels in the same query.

If it's a many-to-many relationship, for example "festivals", then `artists.prefetch_related("festivals")` will first select the artists, then make a second query for the festivals of those artists, and `artist.festivals.all()` is served from that prefetched result.

An ORM like Django makes that simple. SQL does not.
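To make the round-trip counts concrete, here's a rough sketch in plain SQL using Python's built-in `sqlite3` module. The schema, data and function names are hypothetical, and the functions only approximate the idea behind Django's `select_related` and `prefetch_related` (Django generates its own SQL).

```python
import sqlite3

# Hypothetical schema: artists -> record_labels is many-to-one,
# artists <-> festivals is many-to-many through artist_festivals.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record_labels (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT, record_label_id INTEGER);
CREATE TABLE festivals (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE artist_festivals (artist_id INTEGER, festival_id INTEGER);
INSERT INTO record_labels VALUES (1, 'Label X'), (2, 'Label Y');
INSERT INTO artists VALUES (1, 'Alpha', 1), (2, 'Bravo', 2), (3, 'Charlie', 1);
INSERT INTO festivals VALUES (1, 'Fest One'), (2, 'Fest Two');
INSERT INTO artist_festivals VALUES (1, 1), (1, 2), (3, 2);
""")

query_log: list[str] = []

def run(sql: str, params: tuple = ()) -> list[tuple]:
    """Execute one statement and log it, so round trips can be counted."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()

def labels_n_plus_1() -> dict[str, str]:
    # 1 query for the artists, then 1 extra query per artist: the N+1 pattern
    labels = {}
    for name, label_id in run("SELECT name, record_label_id FROM artists"):
        labels[name] = run("SELECT name FROM record_labels WHERE id = ?", (label_id,))[0][0]
    return labels

def labels_joined() -> dict[str, str]:
    # Roughly the idea behind select_related(): a single query with a JOIN
    return dict(run(
        "SELECT a.name, l.name FROM artists a "
        "JOIN record_labels l ON l.id = a.record_label_id"
    ))

def festivals_prefetched() -> dict[str, list[str]]:
    # Roughly the idea behind prefetch_related(): one query for the artists,
    # one query for all of their festivals, then stitch them together in Python
    artists = run("SELECT id, name FROM artists ORDER BY id")
    ids = tuple(artist_id for artist_id, _ in artists)
    placeholders = ", ".join("?" for _ in ids)
    rows = run(
        "SELECT af.artist_id, f.name FROM artist_festivals af "
        "JOIN festivals f ON f.id = af.festival_id "
        f"WHERE af.artist_id IN ({placeholders}) ORDER BY f.id",
        ids,
    )
    by_artist: dict[int, list[str]] = {artist_id: [] for artist_id in ids}
    for artist_id, festival in rows:
        by_artist[artist_id].append(festival)
    return {name: by_artist[artist_id] for artist_id, name in artists}

for func in (labels_n_plus_1, labels_joined, festivals_prefetched):
    query_log.clear()
    result = func()
    print(f"{func.__name__}: {len(query_log)} queries -> {result}")
```

With three artists, the first function sends 4 queries and the other two send 1 and 2. The N+1 count grows with the number of artists, while the other two stay constant.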
So, before we even get to the DB optimisation part (which indices to create, whether a view is better or not, which storage engine to use, WAL size, yadayadayada), there's an entire interface / language that makes writing bad code very easy.
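As for what the `field_names` refactor asked about above could look like without an ORM, here's a minimal sketch with Python's built-in `sqlite3`. The table, columns, data and the `artists_founded_in_90s` function are hypothetical. It also hints at why it's tedious: column names are identifiers, which can't be passed as `?` parameters, so the function has to check them against an allow-list before building the SQL string.

```python
import sqlite3

# Hypothetical table; the column names are assumptions for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT, country TEXT, founded INTEGER)")
conn.executemany(
    "INSERT INTO artists (name, country, founded) VALUES (?, ?, ?)",
    [("Alpha", "UK", 1989), ("Bravo", "US", 1991), ("Charlie", "UK", 1994)],
)

# Identifiers (column names) cannot be bound as ? parameters, so only
# names from this allow-list may be interpolated into the SQL text.
ALLOWED_FIELDS = frozenset({"id", "name", "country", "founded"})

def artists_founded_in_90s(field_names=("id", "name")) -> list[tuple]:
    if not field_names:
        raise ValueError("at least one field name is required")
    unknown = set(field_names) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown field names: {sorted(unknown)}")
    columns = ", ".join(field_names)
    sql = f"SELECT {columns} FROM artists WHERE founded BETWEEN ? AND ? ORDER BY name"
    # Values (the year range) are still bound as parameters
    return conn.execute(sql, (1990, 1999)).fetchall()

print(artists_founded_in_90s(["name"]))             # [('Bravo',), ('Charlie',)]
print(artists_founded_in_90s(["name", "country"]))  # [('Bravo', 'US'), ('Charlie', 'UK')]
```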
I'm too lazy to convert that by hand, but here's what ChatGPT converted it to in SQL, for the sake of discussion:
```sql
SELECT a.id, a.artist_name -- or whatever the name column is in the 'artists' table
FROM artists a
JOIN albums al ON a.id = al.artist_id
JOIN nominations n ON al.id = n.album_id -- assuming nominations are for albums
WHERE al.release_date BETWEEN '1990-01-01' AND '1999-12-31'
  AND n.award = 'MTV' -- assuming there's a column that specifies the award name
  AND n.won = FALSE
GROUP BY a.id, a.artist_name -- or whatever the name column is in the 'artists' table
ORDER BY COUNT(DISTINCT n.id) DESC, a.artist_name -- ordering by the number of nominations, then by artist name
LIMIT 10;
```
I like Django's ORM just fine, but that SQL isn't too bad (it's also slightly different from your version, but it works fine as an example). I also sometimes like PyPika for building queries when I'm not using Django or SQLAlchemy; here's that version:
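```python
from pypika import Order, Query, Table
from pypika import functions as fn

artists = Table("artists")
albums = Table("albums")
nominations = Table("nominations")

q = (
    Query.from_(artists)
    .join(albums).on(artists.id == albums.artist_id)
    .join(nominations).on(albums.id == nominations.album_id)
    .select(artists.id, artists.artist_name)  # assuming the column is named artist_name
    .where(albums.release_date.between("1990-01-01", "1999-12-31"))
    .where(nominations.award == "MTV")
    .where(nominations.won == False)  # PyPika overloads ==, so this builds SQL, not a bool
    .groupby(artists.id, artists.artist_name)
    # orderby() applies its `order` to every field in the call,
    # so the descending count and the ascending name go in separate calls
    .orderby(fn.Count(nominations.id), order=Order.desc)
    .orderby(artists.artist_name)
    .limit(10)
)
```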
I think PyPika answers your concerns about
It’s just regular Python code, same as the Django ORM.
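For anyone who wants to try the SQL version above, here's a small runnable check with Python's built-in `sqlite3` module. The schema and data are hypothetical, made up to match the assumptions in the query's comments. `won` is stored as 0/1 because SQLite has no separate boolean type, and the date range, award name and limit are passed as bound parameters instead of literals.

```python
import sqlite3

# Hypothetical schema and data matching the assumptions in the SQL comments
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists (id INTEGER PRIMARY KEY, artist_name TEXT);
CREATE TABLE albums (id INTEGER PRIMARY KEY, artist_id INTEGER, release_date TEXT);
CREATE TABLE nominations (id INTEGER PRIMARY KEY, album_id INTEGER, award TEXT, won INTEGER);
INSERT INTO artists VALUES (1, 'Alpha'), (2, 'Bravo'), (3, 'Charlie'), (4, 'Delta');
INSERT INTO albums VALUES
  (1, 1, '1991-05-01'), (2, 1, '1995-03-10'), (3, 2, '1993-07-20'),
  (4, 3, '1988-01-01'), (5, 4, '1997-09-09');
INSERT INTO nominations VALUES
  (1, 1, 'MTV', 0), (2, 2, 'MTV', 0), (3, 3, 'MTV', 0),
  (4, 3, 'Grammy', 0), (5, 4, 'MTV', 0), (6, 5, 'MTV', 1);
""")

TOP_ARTISTS_SQL = """
SELECT a.id, a.artist_name
FROM artists a
JOIN albums al ON a.id = al.artist_id
JOIN nominations n ON al.id = n.album_id
WHERE al.release_date BETWEEN ? AND ?
  AND n.award = ?
  AND n.won = 0  -- SQLite stores booleans as 0/1
GROUP BY a.id, a.artist_name
ORDER BY COUNT(DISTINCT n.id) DESC, a.artist_name
LIMIT ?
"""

rows = conn.execute(TOP_ARTISTS_SQL, ("1990-01-01", "1999-12-31", "MTV", 10)).fetchall()
print(rows)  # [(1, 'Alpha'), (2, 'Bravo')]
```

Charlie drops out because its only MTV-nominated album is from 1988, and Delta because its nomination was a win. Note that `won = 0` filters individual nominations, so an artist who won one MTV award and lost another would still be listed.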