MathGroup Archive 2002

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: odbc vs jdbc for database access in Mathematica

  • To: mathgroup at smc.vnet.net
  • Subject: [mg38108] Re: [mg38039] odbc vs jdbc for database access in Mathematica
  • From: Todd Gayley <tgayley at wolfram.com>
  • Date: Tue, 3 Dec 2002 04:32:08 -0500 (EST)
  • Sender: owner-wri-mathgroup at wolfram.com

Alexander,

My guess is that your speed problems are due to the overhead of making so 
many calls from Mathematica into Java. There is a more or less constant 
overhead of making a call into Java that is due to the internals of 
MathLink and J/Link. On a low-to-mid-range PC, each call to Java costs 
about a millisecond (plus the processing time for the actual Java code to 
run, which is usually negligible). For many programs, this overhead is not 
a concern, but your code is calling into Java at least twice for every item 
read from the database, and if there are 100,000 items to read then you 
will have a slow program.

There is one simple optimization you can make that should cut the time 
about in half. Your Switch statement tests against the constants 
Types`CHAR, Types`DOUBLE, etc. These harmless-looking expressions trigger a 
call into Java each time they are evaluated. You can eliminate these 
unnecessary calls by caching the values ahead of time:

    {$charType, $doubleType, $dateType, ...etc. } = {Types`CHAR, 
Types`DOUBLE, Types`DATE, ...etc.};

Then the Switch becomes:

    Switch[type,
        $charType, rs@getString[i],
        $doubleType, rs@getDouble[i],
        $dateType, rs@getDate[i],
        ... etc.
    ]

This one change will drop the number of calls into Java from two to one for 
each item.

If you have lots of TIMESTAMP values in your data, you can also speed up 
your handling of these types. Right now you make 6 calls into Java for each 
TIMESTAMP. You could probably drop this to 2. In your DataSourceEvaluate 
function, create a DateFormat instance ahead of time:

    LoadJavaClass["java.text.DateFormat"];
    $dateFormatter = DateFormat`getDateInstance[];

Then in GetColumn you would use it like this:

     $dateFormatter@format[rs@getDate[i]]

If the default DateFormat doesn't look like you want, you can set a style 
when you call DateFormat`getDateInstance[styleArg];

If these optimizations are still not enough, the only thing to do is to 
move at least some of the loop (NestWhileList) down into Java code. Then 
you could call into Java once for each result set instead of once for each 
data item.

Just to be clear, the reason we move the loop is not because Java is faster 
than Mathematica but because we want to minimize the number of times we 
cross the Mathematica-Java barrier.

A simple first try would be to write Java code that gets the contents of a 
single row. Then you could use essentially all of your current Mathematica 
code unchanged. With this technique, you would only have one call into Java 
for each row instead of each row x col. That might be enough optimization. 
I like this approach because it conforms with a general principle that I 
try to adhere to when doing mixed-language programming: Write as much as 
possible in the highest-level language. With J/Link, start out writing 
everything in Mathematica, and only if the performance isn't acceptable do 
you consider writing Java. Then try to find the smallest possible piece of 
functionality to move down into Java.

Here is the RowGetter class. It is little more than a direct translation of 
your GetColumn function into Java. The getRow() method is a good example of 
a method that sends its result back to Mathematica manually instead of 
simply relying on J/Link's automatic behavior of sending back the method's 
return value. "Manual" functions are discussed in section 1.2.18 of the 
J/Link User Guide.

/////////////////////  RowGetter class  //////////////////////

import java.sql.*;
import com.wolfram.jlink.*;
import java.text.DateFormat;

public class RowGetter {

     private int[] types;
     private DateFormat dateFormatter = DateFormat.getDateInstance();

     public RowGetter(int[] types) {
         this.types = types;
     }

     public void getRow(ResultSet rs) throws MathLinkException, SQLException {

         KernelLink ml = StdLink.getLink();
         ml.beginManual();
         ml.putFunction("List", types.length);
         for (int i = 0; i <= types.length; i++) {
             switch (types[i]) {
                 case Types.VARCHAR:
                 case Types.CHAR:
                     ml.put(rs.getString(i));
                     break;
                 case Types.DOUBLE:
                     ml.put(rs.getDouble(i));
                     break;
                 case Types.TIME:
                 case Types.DATE:
                     ml.put(rs.getDate(i));
                     break;
                 case Types.NUMERIC:
                 case Types.INTEGER:
                     ml.put(rs.getInt(i));
                     break;
                 case Types.TIMESTAMP:
                     ml.put(dateFormatter.format(rs.getDate(i)));
                     break;
                 default:
                     // Do something to handle the fallthrough case:
                     ml.putSymbol("$UnhandledJDBCType");
             }
         }
     }

}
///////////////////  End RowGetter class  /////////////////////


Then nothing about your Mathematica code has to change except the 
NestWhileList:

    rowGetter = JavaNew["RowGetter", types];
    Rest[
        NestWhileList[
            rowGetter@getRow[rs]&,
            1,
            rs@next[]&
        ]
    ]


Todd Gayley
Wolfram Research


At 11:50 PM 11/25/02, you wrote:
>I have a Mathematica program that uses DatabaseAccess package to
>retrieve large amount of data from Sybase database. DatabaseAccess is
>available for Windows only since it uses odbc. Now, I wanted to run
>the program under Linux so I used JLink for jdbc-based database
>access. At the end of this message I'll insert the code that emulates
>DataSourceEvaluate and OpenDataSource from DatabaseAccess but uses
>jdbc. The implementation is simple and straightforward. The problem is
>the code works MUCH slower than the equivalent odbc-based one on NT. I
>retrieve A LOT of data so this is really annoying. So I have two
>questions:
>
>1. Is it one of the Mathematica functions that make it slow (the
>conversion of the result of the sql query into Mathematica table) or
>is it some fundamental java-related problem (JLink, jdbc etc)?
>
>2. How can I make it run faster?
>
>Thanks in advance and best regards,
>Alexander
>
>PS My correct email address is
>"rze90bdsv02\[RawAt]sneakemail\[RawDot]com"
>
>InstallJava[];
>AddToClassPath[
>     "jconn2.jar", "jdbc2_0-stdext.jar"];
>
>LoadJavaClass["java.sql.Types"];
>LoadJavaClass["java.util.Calendar"];
>
>Clear[GetColumn];
>GetColumn[i_, type_, rs_] := Switch[type,
>     Types`CHAR,rs@getString[i],
>     Types`DOUBLE,rs@getDouble[i],
>     Types`DATE,rs@getDate[i],
>     Types`INTEGER,rs@getInt[i],
>     Types`VARCHAR,rs@getString[i],
>     Types`TIME,rs@getDate[i],
>     Types`TIMESTAMP,JavaBlock[
>       Module[{cal = Calendar`getInstance[]},cal@setTime[rs@getDate[i]];
>         ToString[cal@get[Calendar`MONTH] + 1]<>"/"<>
>           ToString[cal@get[Calendar`DATE]]<>"/"<>
>           ToString[cal@get[Calendar`YEAR]]]
>       ],
>     Types`NUMERIC,rs@getInt[i]]
>
>ClearAll[DataSourceEvaluate];
>DataSourceEvaluate[conn_,sql_String] :=
>   JavaBlock[
>     Module[{st = conn@createStatement[],rs,md, types,res},
>       res = If [st@execute[sql], rs = st@getResultSet[];
>           md = rs@getMetaData[];
>           types = Array[md@getColumnType[#]&,{md@getColumnCount[]}];
>           Rest[
>             NestWhileList[
>               Array[GetColumn[#,types[[#]], rs] &,{Length@types}] &,
>               1,
>               rs@next[] &]],
>           {}];
>       st@close[];
>       res]]
>
>Clear[OpenDataSource];
>OpenDataSource[database_String, user_String,password_String] :=
>     Module[{driver = JavaNew["com.sybase.jdbc2.jdbc.SybDriver"],
>         dbUrl = "jdbc:sybase:Tds:"<>$DBServer<>":"<>$DBPort<>"/"<>database,
>         prop = JavaNew["java.util.Properties"]},
>       prop@setProperty["user",user];
>       prop@setProperty["password",password];
>       driver@connect[dbUrl,prop]];



  • Prev by Date: copying cells in Mathematica 4.2--problem solved!
  • Next by Date: Re: Re: copying cells in Mathematica 4.2
  • Previous by thread: copying cells in Mathematica 4.2--problem solved!
  • Next by thread: Re: odbc vs jdbc for database access in Mathematica