I'm just starting to learn about database stuff, playing around with PostgreSQL (and spgsql). So, if I have a data structure that contains an ordered list of another data structure, what's the best way to design the tables? I can think of at least two ways to do it, but I'm not sure I like either: add ordinal numbers to the second table, or store arrays in the first table. In the former, I'd have to renumber if I want to insert something in the middle of a list; in the latter, I'd have to generate a synthetic primary key to use in the arrays. Which I guess isn't so bad, but I don't know how to do this atomically in SQL. Any hints? Is there some other standard idiom for ordered lists? Maybe I need a good online tutorial for database design, any pointers?

From: [identity profile] petdance.livejournal.com


Basically yes, you need ordinal numbers to say what order they should go in.

Any of Joe Celko's books are good for reading up on SQL stuff, except for "Instant SQL Programming" which he disclaimed this summer when I heard him speak. (oh, and don't hear him speak)

Whyncha post some tables that we can see?

From: [identity profile] dougo.livejournal.com


Yeah, I read that section, but I'd still need to put foreign keys into the array manually, right?

From: [identity profile] ahkond.livejournal.com


I might use ordinal numbers in the second table because then you get the benefits of table implementation for the second structure. If you put arrays in the first table there's no quickie way to ensure that they're all set up right.

Another possibility that still lets you put the second structure in a table is to have each row in that second table have a field containing the ID value(s) of the NEXT such instance. That is, each row in the second table acts like a member of a linked list.

The second table would be like this:

unique ID
foreign key field pointing to table 1 for the big grouping
foreign key pointing to table 2 to reference the next entry after this one
(maybe ditto for previous entry)
data a
data b
data c
etc.

In this case, if you need to insert new members, all you have to do is update some rows, like stitching something into the middle of a linked list.

A foreign key that points from a table to itself is no big deal. Typically you leave it NULL to indicate the beginning and end of the list; then you can easily find the beginning of the list by querying for that.

The downside is that it's hard to say "give me all the rows in table 2 that belong to element x of table 1 ... IN ORDER". Instead you have to select them all and put them into a linked list in your client program (if such a thing exists), and then traverse the linked list outside the DB. If you need to go through the list in order as part of a data query then ordinals would probably be easier.

Dealing with ordinals isn't all that bad because you can say "update FOO set ordinalfield = ordinalfield + 1 where ordinalfield > (some breakpoint)" to shift everybody up one (above a certain point) to make a gap in the middle.


From: [identity profile] rikchik.livejournal.com


I'm not sure I understand the question.

Is it a 1-to-many relationship with the many needing to be in a particular order? Or is it many-to-many? Either way, I think I'd go with ordinals somewhere, either in the many table or the relationship table.

From: [identity profile] rikchik.livejournal.com


Also, good (non-mySQL) DBMSes allow you to create arbitrarily complex atomic actions. I don't know if PostgreSQL has this ability, but I'd expect it to.

From: [identity profile] dougo.livejournal.com


Huh, cute. There is something appealing about using a linked list and mapping it directly to a Scheme list, but I think that would probably end up being more trouble than it's worth. Sounds like ordinals will be less painful than I was thinking.

Do I still need that unique ID, though? The combination of the ordinal plus the foreign key back-pointer will do as a primary key, but I don't expect to have any references into this table anyway since the association has been inverted.

From: [identity profile] dougo.livejournal.com


Well, yeah, I guess I could make a stored procedure that does nextID++ atomically, but I was wondering if there was some mechanism that already did this. I'm sort of surprised/annoyed that rows don't already have unique identities, but I guess that'd just be an OODB. (Which is maybe what I really want, hm.)

From: [identity profile] dougo.livejournal.com


By the way, I guess it wasn't clear, I was intending to put arrays of foreign keys into the first table, not arrays with actual data.

From: [identity profile] ahkond.livejournal.com


In that case you probably don't need the unique ID. For me it's just become a habit.

From: [identity profile] ahkond.livejournal.com


Ugh! Let the database deal with the foreign keys using its own foreign key machinery. I don't know about PostgreSQL, but the stuff I use at work lets you declare foreign keys as such and it will enforce them for you. If you hide them in arrays the system won't verify them for you, no?

Besides the question of enforcing data integrity, the database might be smart enough to use real foreign keys to help it optimize queries.

In other words, if your data has relationships, try to embody them using the machinery of the database system (indexes, keys, etc.) and your DB will perform better.

From: [identity profile] ahkond.livejournal.com


In some database systems you can declare a field using a special data type which is effectively a system-generated unique ID. When you insert rows into the table you omit that field from the list of values you provide, and the system generates the unique ID using a seed/increment system. Microsoft SQL Server and Access both have this, for instance. But I've found that they're a lot of trouble (in terms of managing things six months later when we have to change the table because of a new feature) and I only use them as a last resort.

This is generally not there by default because the philosophy of a relational database is that the rows of a table are inherently un-ordered. They're just instances of a class-like sort of thing. For example, states of the union could be ordered by name, by population, by date they joined the union. No particular ordering is inherently the default ordering. So if you set up a table for "States" with those fields and started typing, the DB is free to store them in whatever order it likes and if you select * from the States table you won't necessarily get them back in the order you typed. You could say "ORDER BY name" or "ORDER BY area" and that would work, but how it works is the DB's business.

You can use unique ID numbers for setting up incoming foreign keys but these should be inherently meaningless, just placeholders for the relationship. The database generally makes no guarantees about what order rows are stored in a table unless you create a clustered (sorted) index.

From: [identity profile] dougo.livejournal.com


Yeah, I was thinking I could tell it that the array had foreign keys, but I guess you can only tell it whether a whole column is a single foreign key. So, dumb idea.

From: [identity profile] dougo.livejournal.com


I understand that rows (and columns) are unsorted, that's why I have to deal with ordinals in the first place. But generating unique IDs is independent of ordering-- I don't care if the IDs are in order, or even consecutive, I just need to guarantee they're unique, even if two INSERTs happen concurrently.

After a little browsing around, I see that PostgreSQL does in fact have this, in the form of CREATE SEQUENCE and sequence functions. Good to know, but I think I don't need it yet.

Anyway, thanks for all your advice.

From: [identity profile] dougo.livejournal.com


Turns out it's even easier: PostgreSQL has a SERIAL column keyword that will handle the auto-increment automatically whenever a row is inserted with that column being DEFAULT (or no value). Cool.
.