[blf@Logging /~]:

March 25, 2008

On JDBC, ORM, OLTP, OLAP…

Filed under: PostgreSQL, database — blowfisher @ 9:11 am

Non-ORM layers over JDBC?

If you want JDBC access that is more closely integrated into the language I
would suggest using Groovy. It REALLY simplifies JDBC access because of
Groovy’s dynamic typing, which is basically the same thing as using variant
data types in C++, at least syntactically. Groovy’s way of executing JDBC’s
statements is also much easier to use. Groovy compiles to Java class files
and the JVM doesn’t know the difference. The groovy runtime/library is just
a jar file that you stick on your classpath.

ORM for me works really well in OLTP situations. If I am doing pure OLTP I
rarely need to go outside of my ORM access layer, which is Hibernate.
Hibernate’s query language (HQL) has lots of features to make writing SQL
queries easier and lots of features to minimize performance problems. If you
are used to SQL, it may take a little getting used to because HQL is more
abstract than SQL. It’s like making a jump from C to Java: more abstraction,
less code, less raw power.

If you have lots of screens where users are basically building up SQL queries
using forms, then Hibernate’s query by criteria makes this easy because you
are no longer building up SQL (or HQL) queries by hand (which is
really error prone). All of my complicated search screens use this feature
of Hibernate.
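
To make the point about hand-built queries concrete, here is a minimal sketch of the idea behind query by criteria. It is not Hibernate's API: `SearchCriteria`, its `eq` and `startsWith` methods, and the `customer` table and column names are all hypothetical, invented for illustration. Each optional form field adds one restriction, and the SQL text only ever contains placeholders, so there is no string concatenation of user input to get wrong.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical mini criteria builder (NOT Hibernate's API): each optional
 * search-form field adds one restriction, and the builder emits a
 * parameterized WHERE clause plus the values to bind.
 */
final class SearchCriteria {
    private final String table;
    private final List<String> conditions = new ArrayList<>();
    private final List<Object> parameters = new ArrayList<>();

    SearchCriteria(String table) {
        this.table = table;
    }

    /** Adds "column = ?" only when the form field was actually filled in. */
    SearchCriteria eq(String column, Object value) {
        if (value != null) {
            conditions.add(column + " = ?");
            parameters.add(value);
        }
        return this;
    }

    /** Adds "column LIKE ?" as a prefix match when some text was entered. */
    SearchCriteria startsWith(String column, String prefix) {
        if (prefix != null && !prefix.isEmpty()) {
            conditions.add(column + " LIKE ?");
            parameters.add(prefix + "%");
        }
        return this;
    }

    /** Renders the SQL text; user values never appear in it, only placeholders. */
    String toSql() {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return sql.toString();
    }

    /** Returns a copy of the values to bind, in placeholder order. */
    List<Object> parameters() {
        return new ArrayList<>(parameters);
    }

    public static void main(String[] args) {
        // Simulated form input: city and a name prefix were filled in,
        // the status field was left empty (null).
        SearchCriteria criteria = new SearchCriteria("customer")
                .eq("city", "Austin")
                .eq("status", null)
                .startsWith("last_name", "Cl");
        System.out.println(criteria.toSql());
        // prints: SELECT * FROM customer WHERE city = ? AND last_name LIKE ?
        System.out.println(criteria.parameters());
        // prints: [Austin, Cl%]
    }
}
```

The sketch assumes table and column names come from the program, never from the user, and it does not escape `%` or `_` inside the prefix. A real criteria API handles far more (joins, ordering, paging), but the shape is the same: restrictions are added as objects, and the SQL is generated once at the end.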

ORM falls down badly for two things: 1) OLAP-style database work and 2) batch
processing. OLAP depends way too much on specific database facilities to
make things fast, which Hibernate can’t take advantage of. Batch processing
chokes because Hibernate will cache too much, since it is trying to optimize
OLTP-style interactions.
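
The caching problem can be illustrated with a toy model. The sketch below is not Hibernate code: `ToySession`, `flushAndClear`, and `runBatch` are hypothetical names, and the "cache" is just a list that keeps every saved object until it is cleared. It compares a batch that never clears mid-run with one that flushes and clears every 50 rows. Hibernate's own `Session` does offer `flush()` and `clear()`, and calling them periodically is a commonly used way to keep batch jobs bounded, though the text above does not say whether its author did so.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of a session-level cache (hypothetical, NOT Hibernate code):
 * every saved object stays referenced until the cache is cleared.
 */
final class ToySession {
    private final List<Object> cached = new ArrayList<>();
    private int pending = 0;     // saved but not yet "written"
    private int written = 0;     // rows pushed to the pretend database
    private int peakCached = 0;  // largest cache size seen so far

    /** Registers an object with the session; it is now held in memory. */
    void save(Object entity) {
        cached.add(entity);
        pending++;
        peakCached = Math.max(peakCached, cached.size());
    }

    /** Pretends to write pending rows, then drops all cached references. */
    void flushAndClear() {
        written += pending;
        pending = 0;
        cached.clear();
    }

    int written() {
        return written;
    }

    int peakCached() {
        return peakCached;
    }

    /**
     * Saves the given number of rows. A batchSize of 0 or less means the
     * cache is only cleared once, at the very end of the run.
     */
    static ToySession runBatch(int rows, int batchSize) {
        ToySession session = new ToySession();
        for (int i = 1; i <= rows; i++) {
            session.save("row-" + i);
            if (batchSize > 0 && i % batchSize == 0) {
                session.flushAndClear();
            }
        }
        session.flushAndClear(); // write whatever is left over
        return session;
    }

    public static void main(String[] args) {
        ToySession naive = runBatch(10_000, 0);
        ToySession chunked = runBatch(10_000, 50);
        System.out.println("naive peak cache:   " + naive.peakCached());   // 10000
        System.out.println("chunked peak cache: " + chunked.peakCached()); // 50
    }
}
```

Both runs write the same 10,000 rows. The only difference is how many objects are held at once, which is the part that hurts when a batch touches millions of rows.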

–David Clark

March 20, 2008

Terabyte-Scale PostgreSQL: Odds and Ends

Filed under: FreeBSD, PostgreSQL, database — blowfisher @ 10:45 am

Some information about PostgreSQL databases of a terabyte and larger:

Well I can’t speak to MS SQL-Server because all of our clients run
PostgreSQL ;) .. I can tell you we have many that are in the 500GB - 1.5TB
range.

All perform admirably as long as you have the hardware behind it and are
doing correct table structuring (such as table partitioning).

Sincerely,

–Joshua D. Drake

We have a several-TB database in production and it works well on
HP rx1620 dual Itanium2, MSA 20, running Linux. It’s read-only storage for
astronomical catalogs with about 4 billion objects. We have a custom
index for spherical coordinates which provides great performance.

–Oleg

I had a client that tried to use MS SQL Server to run a 500GB+ database.
The database simply collapsed. They switched to Teradata and it is
running well. This database is now 1.5TB+.

Currently I have clients running huge PostgreSQL databases and they are
happy. In one client’s database the biggest table is 237GB+ (only 1
table!) and PostgreSQL runs the database without problems using
partitioning, triggers and rules (using PostgreSQL 8.2.5).

–Pablo

I think either would work; both PostgreSQL and MS SQL Server have
success stories out there running VLDBs. It really depends on what you
know and what you have. If you have a lot of experience with Postgres
running on Linux, and not much with SQL Server on Windows, of course the
former would be a better choice for you. You stand a much better chance
working with tools you know.

–Pablo Alcaraz

All of those responses have cooked up quite a few topics into one. Large
databases might mean text warehouses, XML message stores, relational
archives and fact-based business data warehouses.

The main thing is that TB-sized databases are performance critical. So
it all depends upon your workload really as to how well PostgreSQL, or
any other RDBMS, can handle them.

Anyway, my reason for replying to this thread is that I’m planning
changes for PostgreSQL 8.4+ that will allow us to get bigger and
faster databases. If anybody has specific concerns then I’d like to hear
them so I can consider those things in the planning stages.

–Simon Riggs

March 5, 2008

Oracle on eBay goes over 5 petabytes

Filed under: database — blowfisher @ 8:23 am

Curt Monash, the Harvard teen PhD genius, notes that eBay’s Oracle database has now exceeded five petabytes:

From Oliver Ratzesberger’s LinkedIn profile:

Our systems process in excess of 10 billion records per day, serving thousands of users and delivering hundreds of millions of queries per month in a true global 24×7 operation with distributed teams around the globe on systems over 5 PB in size (largest single system >1.4PB).

With multi-petabyte databases becoming common, it’s only a matter of time until Oracle will support Petafiles . . . .

eBay is noted as one of Oracle’s premier web apps, with over 20 billion transactions per day:

* Over 212 million registered users
* Two Petabytes of Data
* 26 billion SQL executions per day
* 99.94% available
* One billion page views per day
* Uses Sun e10k servers

Curt Monash also notes:

“eBay’s biggest analytic database is 1.4 petabytes of disk, holding between 1/2 and 1 petabyte of user data, and running (I’m pretty certain) on Teradata.”

www.blowfisher.net  |  Powered by WP