Re: odbc vs jdbc for database access in Mathematica
- To: mathgroup at smc.vnet.net
- Subject: [mg38108] Re: [mg38039] odbc vs jdbc for database access in Mathematica
- From: Todd Gayley <tgayley at wolfram.com>
- Date: Tue, 3 Dec 2002 04:32:08 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
Alexander, My guess is that your speed problems are due to the overhead of making so many calls from Mathematica into Java. There is a more or less constant overhead of making a call into Java that is due to the internals of MathLink and J/Link. On a low-to-mid-range PC, each call to Java costs about a millisecond (plus the processing time for the actual Java code to run, which is usually negligible). For many programs, this overhead is not a concern, but your code is calling into Java at least twice for every item read from the database, and if there are 100,000 items to read then you will have a slow program. There is one simple optimization you can make that should cut the time about in half. Your Switch statement tests against the constants Types`CHAR, Types`DOUBLE, etc. These harmless-looking expressions trigger a call into Java each time they are evaluated. You can eliminate these unnecessary calls by caching the values ahead of time: {$charType, $doubleType, $dateType, ...etc. } = {Types`CHAR, Types`DOUBLE, Types`DATE, ...etc.}; Then the Switch becomes: Switch[type, $charType, rs@getString[i], $doubleType, rs@getDouble[i], $dateType, rs@getDate[i], ... etc. ] This one change will drop the number of calls into Java from two to one for each item. If you have lots of TIMESTAMP values in your data, you can also speed up your handling of these types. Right now you make 6 calls into Java for each TIMESTAMP. You could probably drop this to 2. In your DataSourceEvaluate function, create a DateFormat instance ahead of time: LoadJavaClass["java.text.DateFormat"]; $dateFormatter = DateFormat`getDateInstance[]; Then in GetColumn you would use it like this: $dateFormatter@format[rs@getDate[i]] If the default DateFormat doesn't look like you want, you can set a style when you call DateFormat`getDateInstance[styleArg]; If these optimizations are still not enough, the only thing to do is to move at least some of the loop (NestWhileList) down into Java code. Then you could call into Java once for each result set instead of once for each data item. Just to be clear, the reason we move the loop is not because Java is faster than Mathematica but because we want to minimize the number of times we cross the Mathematica-Java barrier. A simple first try would be to write Java code that gets the contents of a single row. Then you could use essentially all of your current Mathematica code unchanged. With this technique, you would only have one call into Java for each row instead of each row x col. That might be enough optimization. I like this approach because it conforms with a general principle that I try to adhere to when doing mixed-language programming: Write as much as possible in the highest-level language. With J/Link, start out writing everything in Mathematica, and only if the performance isn't acceptable do you consider writing Java. Then try to find the smallest possible piece of functionality to move down into Java. Here is the RowGetter class. It is little more than a direct translation of your GetColumn function into Java. The getRow() method is a good example of a method that sends its result back to Mathematica manually instead of simply relying on J/Link's automatic behavior of sending back the method's return value. "Manual" functions are discussed in section 1.2.18 of the J/Link User Guide. ///////////////////// RowGetter class ////////////////////// import java.sql.*; import com.wolfram.jlink.*; import java.text.DateFormat; public class RowGetter { private int[] types; private DateFormat dateFormatter = DateFormat.getDateInstance(); public RowGetter(int[] types) { this.types = types; } public void getRow(ResultSet rs) throws MathLinkException, SQLException { KernelLink ml = StdLink.getLink(); ml.beginManual(); ml.putFunction("List", types.length); for (int i = 0; i <= types.length; i++) { switch (types[i]) { case Types.VARCHAR: case Types.CHAR: ml.put(rs.getString(i)); break; case Types.DOUBLE: ml.put(rs.getDouble(i)); break; case Types.TIME: case Types.DATE: ml.put(rs.getDate(i)); break; case Types.NUMERIC: case Types.INTEGER: ml.put(rs.getInt(i)); break; case Types.TIMESTAMP: ml.put(dateFormatter.format(rs.getDate(i))); break; default: // Do something to handle the fallthrough case: ml.putSymbol("$UnhandledJDBCType"); } } } } /////////////////// End RowGetter class ///////////////////// Then nothing about your Mathematica code has to change except the NestWhileList: rowGetter = JavaNew["RowGetter", types]; Rest[ NestWhileList[ rowGetter@getRow[rs]&, 1, rs@next[]& ] ] Todd Gayley Wolfram Research At 11:50 PM 11/25/02, you wrote: >I have a Mathematica program that uses DatabaseAccess package to >retrieve large amount of data from Sybase database. DatabaseAccess is >available for Windows only since it uses odbc. Now, I wanted to run >the program under Linux so I used JLink for jdbc-based database >access. At the end of this message I'll insert the code that emulates >DataSourceEvaluate and OpenDataSource from DatabaseAccess but uses >jdbc. The implementation is simple and straightforward. The problem is >the code works MUCH slower than the equivalent odbc-based one on NT. I >retrieve A LOT of data so this is really annoying. So I have two >questions: > >1. Is it one of the Mathematica functions that make it slow (the >conversion of the result of the sql query into Mathematica table) or >is it some fundamental java-related problem (JLink, jdbc etc)? > >2. How can I make it run faster? > >Thanks in advance and best regards, >Alexander > >PS My correct email address is >"rze90bdsv02\[RawAt]sneakemail\[RawDot]com" > >InstallJava[]; >AddToClassPath[ > "jconn2.jar", "jdbc2_0-stdext.jar"]; > >LoadJavaClass["java.sql.Types"]; >LoadJavaClass["java.util.Calendar"]; > >Clear[GetColumn]; >GetColumn[i_, type_, rs_] := Switch[type, > Types`CHAR,rs@getString[i], > Types`DOUBLE,rs@getDouble[i], > Types`DATE,rs@getDate[i], > Types`INTEGER,rs@getInt[i], > Types`VARCHAR,rs@getString[i], > Types`TIME,rs@getDate[i], > Types`TIMESTAMP,JavaBlock[ > Module[{cal = Calendar`getInstance[]},cal@setTime[rs@getDate[i]]; > ToString[cal@get[Calendar`MONTH] + 1]<>"/"<> > ToString[cal@get[Calendar`DATE]]<>"/"<> > ToString[cal@get[Calendar`YEAR]]] > ], > Types`NUMERIC,rs@getInt[i]] > >ClearAll[DataSourceEvaluate]; >DataSourceEvaluate[conn_,sql_String] := > JavaBlock[ > Module[{st = conn@createStatement[],rs,md, types,res}, > res = If [st@execute[sql], rs = st@getResultSet[]; > md = rs@getMetaData[]; > types = Array[md@getColumnType[#]&,{md@getColumnCount[]}]; > Rest[ > NestWhileList[ > Array[GetColumn[#,types[[#]], rs] &,{Length@types}] &, > 1, > rs@next[] &]], > {}]; > st@close[]; > res]] > >Clear[OpenDataSource]; >OpenDataSource[database_String, user_String,password_String] := > Module[{driver = JavaNew["com.sybase.jdbc2.jdbc.SybDriver"], > dbUrl = "jdbc:sybase:Tds:"<>$DBServer<>":"<>$DBPort<>"/"<>database, > prop = JavaNew["java.util.Properties"]}, > prop@setProperty["user",user]; > prop@setProperty["password",password]; > driver@connect[dbUrl,prop]];