27 Jul 2010, 19:14
Generic-user-small

Tas Pro (3 posts)

Hi

I just got the book and have been able to digest 4 chapters. It is really an eye opening read.

We are working on a Health Information Management System. In the module where we capture surgery data, we have several types of surgeries for (180 surgery types currently) and the number seems to increase with time. All the surgical procedures have a common set of info (we call them ‘parameters’). On the other hand, each procedure also has a set specific parameters too. There are (currently) three Categories of parameters identified (in addition to common parameters): Per-Operative Settings, Per-Operative Complications, Post-Operative Evaluation. We solved this situation with implementing EAV pattern. A meta data table defines parameters for each surgery procedure, along with an FK for parameter category and type information. The EAV table stores patient’s ID, patient’s surgery session id, patient’s surgery id, surgery type FK and the name-value pair.

However, this (anti-)pattern solved our data capture problem, we have begun to worry about reports.

I would like your opinion about addressing such a situation and whether to modify our data model.

Thanks.

27 Jul 2010, 19:49
Bk2_pragsmall

Bill Karwin (19 posts)

One of my SQL developer friends said, “EAV makes it easy to enter data, but hard to get data out.” So you’re right to worry about the reporting.

At one level, EAV is just like a many-to-many intersection table. As long as you treat the attribute type column as data, and don’t try to think of it as metadata in the same way a column is metadata.

The place where people get into trouble is trying to pivot the EAV data, pretending that they have strings that name conventional columns. Don’t try to do that. Just fetch all your EAV attributes back one row at a time and write application code to digest them. I think I gave an example near the end of the EAV chapter of some PHP code that creates an array of objects, each object with its own custom attributes. Treat it like an intersection table, and you’ll be better off. Don’t mix data and metadata in SQL.

Of course this complicates your reporting. You have to write more code to do it, instead of using reporting tools that consume the result set from an SQL query.

27 Jul 2010, 20:20
Generic-user-small

Tas Pro (3 posts)

Thanks for a prompt reply.

We are already using reporting methods that mostly rely on code rather than on reporting engines. Many reports are done with web-reports, while others using a reporting engine that takes a disconnected recordset as the datasource.

As you rightly pointed out, I am of course worried about pivoting the data, coz biostats is a major concern of our system design and implementation.

How about transforming various entity types using materialized views for reporting purpose? Rotate the attributes to columns and values to rows?

27 Jul 2010, 23:07
Bk2_pragsmall

Bill Karwin (19 posts)

The problem is that defining the view to rotate the attributes involves the same kind of non-scalable query that querying the data directly does. That is, a whole lot of LEFT OUTER JOINs, one per attribute. Even though the materialized view reduces the frequency of running this query, you’re still left the with possibility that even running the query once is impractical.

You could write a stored procedure to iterate over the EAV data and populate a series of tables for reporting. But that’s just the same as your application-level reporting code, it’s just located in the database instead of in the app.

Another choice is to use a non-relational database, one of the “NoSQL” branded technologies that have a lot of marketing energy behind them right now. Like CouchDB, MongoDB, Cassandra, Hadoop, etc. You could even use Apache Solr in creative ways for non-relational data.

28 Jul 2010, 02:47
Generic-user-small

Tas Pro (3 posts)

Hmm, but this (surgery module) is the only area where we have run into the need for resorting to EAV. No other part of our design needs non-relational techniques. So RDBMS is well off everywhere except here.

Well, Thanks for your kind help.

28 Jul 2010, 04:12
Bk2_pragsmall

Bill Karwin (19 posts)

You might check out this solution:

http://bret.appspot.com/entry/how-friendfeed-uses-mysql

They use Martin Fowler’s “Serialized LOB” pattern, storing a bunch of non-relational data in JSON format. Then for fields they want to index, they create a kind of mapping table for each respective field. Thus you get extensible data and also indexed lookups. You sacrifice referential integrity, but the advantage is that it can live alongside relational data in the same database.

Though it would still be hard to create reports; you’re still required to wrote code to prepare result sets in application space.

02 Aug 2010, 01:56
Generic-user-small

Joel Mamedov (2 posts)

Hi. I just purchased the book and finished chapter 6. it is reference to a “Class Table Inheritance” section of chapter 6. The way you presented this solution has a major problem. There is a nothing to prevent from sharing the same “issue_id” by both child tables. Potentially, the same “issue_id” can end up in “Bugs “and “FuatureRequests” tables. Which is logically incorrect and only way to prevent it is in application code.

02 Aug 2010, 05:48
Bk2_pragsmall

Bill Karwin (19 posts)

Hi Joel, thanks for your comment.

There actually is a way to solve the scenario you mention using SQL:

CREATE TABLE Issues (issue_id INT PRIMARY KEY, issue_type CHAR(1), UNIQUE KEY (issue_id, issue_type));

CREATE TABLE Bugs (issue_id INT PRIMARY KEY, issue_type CHAR(1) CHECK (issue_type = ‘B’), FOREIGN KEY (issue_id, issue_type) REFERENCES Issues (issue_id, issue_type));

Likewise for FeatureRequests, but issue_type must be ‘F’ in that table.

Little-known fact is that a foreign key in a child table can reference a UNIQUE key in the parent table, as well as the more common use of referencing the parent’s primary key. Given the above constraints, there is no way a given issue_id could be referenced as both ‘B’ and ‘F’.

Sorry this option didn’t make it into the book, we were constrained for space.

02 Aug 2010, 14:43
Generic-user-small

Joel Mamedov (2 posts)

HI. That’s certainly will work. But, this way: 1. We introduce a redundant data (issue_type). 2. Discriminator which is according to you is one of the sql anipatterns. 3. Lost metadata.(reliance of data value (issue type = ‘B’). 4. Because of #3 issue it still must be enforced via application code.

11 Aug 2010, 16:33
Bk2_pragsmall

Bill Karwin (19 posts)

I agree this solution is a bit awkward and skirts the boundary toward a non-normalized design. It would take some thought to prove one way or the other.

But the key advantage of my solution is that you can declare a foreign key constraint to enforce referential integrity. In the antipattern you cannot.

When you have complex problems (e.g. multiple issue subtypes) you naturally have complex solutions. You have to declare the relationship somewhere. I stand by my position that relying on application code to enforce referential integrity is the wrong solution. See the “Keyless Entry” chapter.

  You must be logged in to comment