[NBLUG/talk] Transferring data from Oracle to an open-source database

Lincoln Peters sampln at sbcglobal.net
Wed May 24 16:25:06 PDT 2006


Eric Eisenhart wrote:
> On Tue, May 23, 2006 at 09:17:13PM -0700, Lincoln Peters wrote:
>   
>> I have a project I've been working on for the database class at SSU.  
>> It's starting to look like my project will be useful well beyond this 
>> class, so I'd like to transfer it from the CS department's Oracle server 
>> ("kirby") to my own server.  I'd rather not have to deal with the 
>> licensing issues (not to mention cost) of running my own Oracle server, 
>> so I'd like to migrate it to an open-source SQL database such as MySQL 
>> or PostgreSQL.
>>     
>
> What are you going to use the DB server for?  Oracle will let you run a
> development-only, single-user version for free and has had that arrangment
> for a long time.  

I know, but that license agreement doesn't fit my goals.

I'm building a wildlife database, that can be used to identify plants 
and animals, or even to generate field guides (in print or digital).  A 
few volunteers at the Fairfield Osborn Preserve are planning to survey 
the plants and animals that live there over the summer, and they have 
expressed interest in using my database to store their findings.

> They've also got a newer "Express Edition" thing that's
> more limited in some ways, but has much less restrictive licensing
> (redistributable in commercial products!) and is a lot easier to install. 
> (restrictions are things like 4GB of data, 1 CPU used, 1GB of RAM used)
>
> http://www.oracle.com/technology/products/database/xe/
> http://www.oracle.com/database/product_editions.html
> http://www.oracle.com/technology/software/htdocs/devlic.html
> http://www.puschitz.com/OracleOnLinux.shtml
>   

I don't know how big the database will get.  I doubt it would exceed 
4GB, but it's just too soon to tell.

>   
>> What I'd like to find out is:
>>
>> * Which open-source database is preferred?  As far as I can tell, MySQL 
>> is the most popular, but I've heard good things about PostgreSQL as well.
>>     
>
> It depends.
>
> Basically, PostgreSQL has fuller support for the ANSI SQL standard.  MySQL's
> been catching up, especially if you use the InnoDB table format that gets
> you transactions.  If you're doing complicated SQL with sub-SELECTs and
> things like that, you're gonna want PostgreSQL.  Another example: PostgreSQL
> uses sequences; MySQL uses a marker on a column that sets it as
> auto-increment.  (you can emulate MySQL's behavior in PostgreSQL or Oracle
> with a sequence and a trigger).  Heck, for that matter, PostgreSQL can do
> triggers and I don't think MySQL does them yet.
>   

Most of the queries involved in my database are some combination of 
joins and "IN" operators, such as:


--Display all traits of the organism called 'Pinus monticola'.
SELECT tname FROM Organism o, OrganismHasTrait ot, Trait t WHERE 
o.oid=ot.oid AND ot.tid=t.tid AND o.sname='Pinus monticola';


--Display all organisms that have needles.
SELECT o.oid, o.sname FROM Organism o, OrganismHasTrait ot WHERE 
o.oid=ot.oid AND ot.tid IN (SELECT tid FROM Organism WHERE tname='has 
needles') AND ot.truefalse='T';


--Display all traits possessed by organisms that have needles, and how 
many organisms possess each trait, sorted by number of organisms in 
descending order (i.e. starting with the trait with the most associated 
organisms).
SELECT ot.oid, t.tname, count(*) Organisms FROM OrganismHasTrait ot, 
Trait t WHERE t.tid=ot.tid AND ot.tid IN (SELECT tid FROM Organism WHERE 
tname='has needles') AND ot.truefalse='T' GROUP BY ot.tid, t.tname ORDER 
BY Organisms DESC;


Given what you said about PostgreSQL having better support for 
sub-SELECT queries (which may play a major role in future versions 
involving more complex data sets), it sounds like PostgreSQL might 
actually be preferable to MySQL, at least in this regard (I'd rather not 
have to completely rewrite these queries so that I don't have to use 
sub-SELECT queries).

> On the other hand, having dealt with both, MySQL is a lot easier to manage
> that PostgreSQL.  PostgreSQL has some weird ideas about things with trying
> to match DB accounts to user accounts.
>   

Are you saying that every DB account has to correspond to a user 
account?  Actually, I think I can live with that.

> Also, PgSQL has (had? I haven't kept up) issues with CLOBs and BLOBs, and
> the DB driver doesn't support placeholders, forcing DBI to emulate them
> without the full security benefits of placeholders.
>   

I don't use CLOBs or BLOBs in this database, so this shouldn't be an 
issue.  I had been thinking about having it store images (photos of 
organisms, illustrations of traits, etc.), but I think I could just as 
easily store them outside the database and use a VARCHAR attribute to 
store the path to an image.


When you say placeholders, are you referring to bind values?  For example:

my $insert_sql = 'insert into foo (a,b,c) values (?,?,?);


I was aware that placeholders (bind values) could yield some major 
performance gains over interpolated statements, but I hadn't considered 
the security benefits (there is no data in here that needs to be secured 
against unauthorized access, so the only security issues I'd be worried 
about are unauthorized inserting and deleting of data).

>   
>> * Based on my understanding of the Perl DBI (which I've studied but have 
>> not yet used), it should be a simple matter to transfer all of the 
>> existing records from the Oracle database into any other RDBMS supported 
>> by the Perl DBI.  Assuming that I use either MySQL or PostgreSQL, will 
>> this be a simple task?
>>     
>
> Reasonably so, yes.  Especially if you're willing to write the new table
> definitions by hand and you're only trying to script the transfer of data.
>
> Something like this:
[snip]


Looks like what I had in mind.  The table definitions are surprisingly 
simple, so rewriting the definitions by hand isn't a big deal.  In fact, 
I ran across a few deficiencies in my original design (mostly 1-to-n 
relationships that should have been n-to-m), and I don't think it would 
be difficult to fix them during the conversion process.

> Or even easier if you find something that does the work for you.  Say this:
> http://www.mysql.com/products/tools/migration-toolkit/
>   

I don't know if I'll end up using MySQL, but this looks like something 
worth remembering for the future.

> (nothing quite so easy for Pg that I could find in under 30 seconds of
> googling.  http://techdocs.postgresql.org/techdocs/oracle-to-pg-porting.php
> is the best I could do...)
>   

As far as I can tell from this page, the features that differ between 
Oracle and PostgreSQL are all features that I'm not using, so it 
shouldn't be a problem.  The Perl DBI trick should work, then.


-- 
Lincoln Peters
<sampln at sbcglobal.net>

I just remembered something about a TOAD!




More information about the talk mailing list