NBLUG Library ?

Ron Wickersham rjw at alembic.com
Fri Oct 25 01:17:55 PDT 2002


On Thu, 24 Oct 2002 hanksdc at plug.org wrote:

> On Thu, 24 Oct 2002, Eric Eisenhart wrote:
> 
> > On Thu, Oct 24, 2002 at 03:42:06PM -0700, Ron Wickersham wrote:
> > > this is not something i understand very well, but the problem shows up in
> > > the structure and rules for the MARC records, which fit a multi-valued
> > > database model (like Pick).
> > >
> > > a search in the koha archives didn't yield the pages i read, but i did
> > > find this link that sounds similar to the issues raised on the koha list:
> > > http://sunsite.berkeley.edu/XML4Lib/archive/0112/0010.html
> >
> > That page talks about the proper relational method for creating
> > "multi-valued fields" being complicated enough that it's a hassle.
> >
> > Example:
> > DB with multi-valued fields where you give up on proper relational stuff:
> >
> > Foos
> > ----------
> > id  |int
> > name|varchar
> > bar |multi-value [one, two, three]
> 
> While I'd need to look at the MARC format a little more to fully
> understand this, a design much like this could still adhere to the
> relational model (and yes, still be 'proper'). Essentially you would have
> to design a user-defined type (let's call it multi-value-triple), along
> with the appropriate operators to manipulate that type, as well as
> means to enforce the allowable domain of values possible for that type.
> Postgres allows you to do this. But despite the apparent lack of atomicity,
> such a design would still be in 1st normal form, and hence, faithfully
> adhere to the relational model, which prescribes that each table be at
> least in 1NF.
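dan's user-defined-type idea can be sketched even without postgres handy.  here's
a toy version using the adapter/converter hooks in python's stdlib sqlite3 -- the
MultiValue class and all the sample values are invented for illustration, and
postgres would do this properly with CREATE TYPE and real operators.  the point
is just the encapsulation: the column stays atomic as far as the table is
concerned, because the outside world only ever touches the defined operators.

```python
# toy "user-defined type" in the sqlite3 stdlib module; names are made up.
import sqlite3

class MultiValue:
    """a multi-valued field manipulated only through its operators;
    the internal representation (a sorted tuple) is hidden."""
    def __init__(self, *values):
        self._values = tuple(sorted(set(values)))

    # the only operators the outside world gets:
    def contains(self, v):
        return v in self._values

    def add(self, v):
        return MultiValue(*(self._values + (v,)))

    def __conform__(self, protocol):   # sqlite3's adapter hook
        if protocol is sqlite3.PrepareProtocol:
            return "|".join(self._values)

# converter: rebuild the type when the column comes back out of the db
sqlite3.register_converter("MULTIVALUE",
                           lambda b: MultiValue(*b.decode().split("|")))

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("CREATE TABLE foos (id INTEGER, name TEXT, bar MULTIVALUE)")
con.execute("INSERT INTO foos VALUES (1, 'a', ?)",
            (MultiValue("one", "two", "three"),))
(mv,) = con.execute("SELECT bar FROM foos").fetchone()
print(mv.contains("two"))
```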

but the cataloging data were entered by different people at different times
and from different institutions, so there is no single "user-defined" type
for each tag or subfield.  a 'proper' MARC record is not a unique map from
the real world.

> Extending this idea, lots of complex data could be considered atomic, as
> long as the internal representation of the data type is invisible to the
> user, and as long as an 'object' of the data type can be manipulated only
> by operators defined for that type. In that sense, it is atomic.
> 
> To quote C.J. Date on the subject (he explains it better anyway):
> 
> "Thus we can have domains of audio recordings, domains of maps, domains of
> video recordings, domains of engineering drawings, domains of legal
> documents . . . The only requirement is that . . . the values in the
> domain must be manipulable <i>solely</i> by means of the operators defined
> for the domain in question; the internal representation must be hidden."
> [1]

in this case perhaps artificial intelligence could be used to define operators
for the type, but that may be a harder job than you want just to force the
data into table form.  the MARC records already exist and one can't afford to
re-catalog from scratch.

> More on this can be found in the first chapter of Fabian Pascal's
> excellent book, 'Practical Issues in Database Management'. (Also see
> www.dbdebunk.com for more of this type of discussion).
> 
> A lot of argument today against the relational model is its perceived lack
> of support for complex data, but in this light I don't think that
> argument holds water.
> 
> Now I'd need to look more at the MARC format, but I'd venture to guess it
> wouldn't be as difficult as it might appear at first glance to design a
> relational database to comfortably hold it, given the utility of
> the appropriate user-defined data types to accommodate the data.

but the (undisciplined) users have already chosen the 'user-defined' data types,
so the database designer has to fit this (existing) data into the machine.

> Just my $0.02.
> 
> -- Dan Hanks
> 
> [1] C.J. Date, Hugh Darwen, Foundation for Future Database Systems: The
> Third Manifesto, 2nd ed.
> 
> Also,
> http://www.columbia.edu/cu/libraries/inside/projects/metadata/model/whitepaper.html
> looks fairly interesting.

the Columbia paper starts off well, fairly showing (some) problems with
MARC, and enthusiastically proposes a relational database model for various
reasonable arguments.  but the author does admit that the fit is poor, and
argues (when written in 1998) that you should still choose relational.
then the paper's section "A Proposed Entity Relational Model using MARC fields"
shows why it's really, really hard to drive a MARC stake into the target he's
proposed.  see "Subject Metadata" and the following three examples.

indeed, looking at the progress at Columbia's site, they have limited their
work to newly cataloged "digital" holdings (approx. 75,000) and have not yet
implemented MARC book or other conventional records.

i think this field is interesting, but what do i know?

-ron

> 
> 
> 
> >
> > DB without multi-valued fields, doing single values:
> >
> > Foos
> > ----------
> > id  |int
> > name|varchar
> > bar |int
> >
> > Bars
> > ----------
> > id  |int
> > name|varchar
> >
> >
> > DB without multi-valued fields, doing real multi-valued stuff:
> >
> > Foos
> > ----------
> > id  |int
> > name|varchar
> >
> > Bars
> > ----------
> > id  |int
> > name|varchar
> >
> > FooBar
> > ----------
> > foo |int
> > bar |int
> >
> >
> > In other words, in a database with explicit "multi-valued fields", you have
> > a single table with one field that's "extra special".  Without explicit
> > "multi-valued fields" you create one table holding the potential values and
> > another table holding the link between available values and the record.
> >
> 
> 



More information about the talk mailing list