January 4, 2010
November 17, 2009
“Removing (Much of) the Suck from ORMs”
November 6, 2009
Why ORMs Suck
September 23, 2009
April 25, 2009
postgres, mysql, oracle Job Trends
April 22, 2009
[WHY] “You can’t kill open-source projects…”
http://forums.netbeans.org/post-32334.html
http://forums.netbeans.org/topic11438.html
—
“Companies come and go, but popular open-source programs like MySQL just keep rolling on.”
—
“If netbeans goes private, there will be a fork. Netbeans or whatever new name it would have, the project will move on. Very Happy . you just can’t kill it.
Making a fork only cost a one line command : hg clone , anyone can do it.”
—
“netbeans has taught us so much that we our self can manage netbeans and if ortacle does not support it we are there for netbeans”
—
“Netbeans just changed the way i learned Java.”
March 27, 2009
pgAdmin moves to BSD licence!
http://www.pgadmin.org/licence.php
pgAdmin v1.10 and later versions are released under the BSD licence. Earlier versions are released under the Artistic licence.
The BSD Licence
pgAdmin III
Copyright (c) 2002 – 2009, The pgAdmin Development Team
Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.
IN NO EVENT SHALL THE PGADMIN DEVELOPMENT TEAM BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE PGADMIN DEVELOPMENT TEAM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
THE PGADMIN DEVELOPMENT TEAM SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN “AS IS” BASIS, AND THE PGADMIN DEVELOPMENT TEAM HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
October 16, 2008
September 17, 2008
[ZT] Why isn’t PostgreSQL using my index?
...PostgreSQL's query optimizer is smart, really smart and unless you have evidence otherwise you should trust what it is doing. In this particular case, the optimizer realizes that if a table has only a few rows that using the index is actually slower than just spinning through the entire table. Just because PostgreSQL isn't using your index today with a small number of rows, does not mean it won't choose to use it later when you have more data or the query changes...August 28, 2008
OpenSQL Camp 2008: a community-organized database conference
From announcing-opensql-camp-2008:
Key facts:
- It is of, by and for the community (you).
- At this event, all open-source databases are created equal. We’ll learn together and grow together.
It’s a combination conference and hackathon. - It’s free.
- It is Friday night Nov 14, 18:00 through Sun the 16th at 18:00 in Charlottesville, Virginia USA in a very cool location.
- The website, where all details will be posted: http://opensqlcamp.pbwiki.com/
- The mailing group, where details will be discussed, decided and arranged: http://groups.google.com/group/opensqlcamp
- Date, time and place are confirmed and will not change. You can make your travel plans now. There’s travel information on the wiki.
- We already have some great speakers who have offered to give great talks. These and other details will show up on the wiki as they’re finalized.
- Sponsors are needed. See the wiki.
June 5, 2008
[PostgreSQL] pgAdmin III v1.8.4 released
2008-06-05 – pgAdmin III v1.8.4 released.
May 31, 2008
[HACKERS] Core team statement on replication in PostgreSQL (备查)
built-in replication in PostgreSQL 看来有戏了,留个记号在此备查!
* From: Tom Lane
* To: pgsql-hackers(at)postgreSQL(dot)org
* Subject: Core team statement on replication in PostgreSQL
* Date: Thu, 29 May 2008 10:12:55 -0400
* Message-id: <26529(dot)1212070375(at)sss(dot)pgh(dot)pa(dot)us>
http://archives.postgresql.org/pgsql-hackers/2008-05/msg00913.php
The Postgres core team met at PGCon to discuss a few issues, the largest
of which is the need for simple, built-in replication for PostgreSQL.
Historically the project policy has been to avoid putting replication
into core PostgreSQL, so as to leave room for development of competing
solutions, recognizing that there is no “one size fits all” replication
solution. However, it is becoming clear that this policy is hindering
acceptance of PostgreSQL to too great an extent, compared to the benefit
it offers to the add-on replication projects. Users who might consider
PostgreSQL are choosing other database systems because our existing
replication options are too complex to install and use for simple cases.
In practice, simple asynchronous single-master-multiple-slave
replication covers a respectable fraction of use cases, so we have
concluded that we should allow such a feature to be included in the core
project. We emphasize that this is not meant to prevent continued
development of add-on replication projects that cover more complex use
cases.We believe that the most appropriate base technology for this is
probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.
We hope that such a feature can be completed for 8.4. Ideally this
would be coupled with the ability to execute read-only queries on the
slave servers, but we see technical difficulties that might prevent that
from being completed before 8.5 or even further out. (The big problem
is that long-running slave-side queries might still need tuples that are
vacuumable on the master, and so replication of vacuuming actions would
cause the slave’s queries to deliver wrong answers.)Again, this will not replace Slony, pgPool, Continuent, Londiste, or
other systems for many users, as it will be not be highly scalable nor
support long-distance replication nor replicating less than an entire
installation. But it is time to include a simple, reliable basic
replication feature in the core system.—regards, tom lane
May 26, 2008
[Jignesh K. Shah] Problems with PG on Multi-Cores with Multi-TB Data
Problems with PostgreSQL on Multi-core Systems with Multi-Terabyte Data.
– by Jignesh K. Shah

PDF: http://www.blowfisher.net/pubs/pgcon_problems-JigneshK.Shah.pdf
Current Market Trends in Systems
• Quad-core sockets are current market standards
> Also 8-core sockets available now and could become a standard in next couple of year• Most common rack servers now have two sockets
> 8-core (or more ) systems are the norm with trend going to 12-16 core systems soon• Most Servers have internal drives > 146 GB
> Denser in capacity, smaller in size but essentially same or lower speed
> More denser in case of SATA-II drives
Current Market Trends in Software
• Software (including Operating Systems) have yet to fully catch up with multi-core systems
> “tar” still single process utility• Horizontal Scaling helps a lot but not a good clean solution for multi-core systems
• Virtualization is the new buzzword for Consolidations
> Hides the fact that the software is not able to fully capitalize the extra cores![]()
• Research being done on new paradigms
> Complexity of parallelized software is huge
Current Market Trends in Data
• 12 years ago, a 20GB data warehouse was considered a big database
• Now everybody talks about 200GB-5TB databases
• Some 2005 Survey numbers:
> Top OLTP DB sizes = 5,973 GB to 23,101 GB
> Top DW DB Sizes = 17,685 GB to 100,386 GB
> Source http://www.wintercorp.com/VLDB/2005_TopTen_Survey/TopTenWinners_2005.asp• Some 2007 Survey numbers:
> Top DB sizes = 20+ TB to 220 TB ( 6+ PB on tape)
> Source http://www.businessintelligencelowdown.com/2007/02/top_10_largest.html
April 9, 2008
[mailist] How to handling large volumes of data on PostgreSQL?
mailing list: pgsql-admin.postgresql.org
from: Johann Spies
..loaded about 4,900,000,000 in one of two tables with 7200684 in the second table in database ‘firewall’, built one index using one date-field (which took a few days) and used that index to copy about 3,800,000,000 of those records from the first to a third table, deleted those copied record from the first table and dropped the third table.
This took about a week on a 2xCPU quadcore server with 8Gb RAM..—
Table paritioning is need.
distribute tables across different disks through tablespaces.Tweak the shared buffers and work_mem settings.
RAID5/6 are very,very slow when it comes to small disk *writes*.
At least a hardware RAID controller with RAID 0 or 10 should be used, with 10krpm or 15krpm drives. SAS preferred.
as on SATA the only quick disks are Western Digital Raptor.
look at a view called pg_stat_activity. Do: select * from pg_stat_activity;
March 25, 2008
关于 JDBC, ORM, OLTP, OLAP…
Non-ORM layers over JDBC?
If you want JDBC access that is more closely integrated into the language I
would suggest using Groovy. It REALLY simplifies JDBC access because of
Groovy’s dynamic typing, which is basically the same thing as using variant
data types in C++, at least syntactically. Groovy’s way of executing JDBC’s
statements is also much easier to use. Groovy compiles to Java class files
and the JVM doesn’t know the difference. The groovy runtime/library is just
a jar file that you stick on your classpath.ORM for me works really well in OLTP situations. If I am doing pure OLTP I
rarely need to go outside of my ORM access layer, which is Hibernate.
Hibernate’s query language (HQL) has lots of features to make writing SQL
queries easier and lots of features to minimize performance problems. If you
are used to SQL, it make take a little getting used to because HQL is more
abstract than SQL. It’s like making a jump from C to Java, more abstraction,
less code, less raw power.If you have lots of screens where users are basically building up sql queries,
using forms, then Hibernate’s query by criteria makes this easy because you
are not longer manually building up SQL (or HQL) queries by hand (which is
really error prone). All of my complicated search screens use this feature
of Hibernate.ORM falls down badly for two things: 1) OLAP style database work and 2) Batch
processing. OLAP depends way too much on specific database facilities to
make things fast, which Hibernate can’t take advantage of. Batch processing
chokes because Hibernate will cache too much because it is trying to optimize
OLTP style interactions.–David Clark
March 20, 2008
TB级 PostgreSQL 拾零
有关TB级以上 PostgreSQL 数据库的一些信息:
Well I can’t speak to MS SQL-Server because all of our clients run
PostgreSQL.. I can tell you we have many that are in the 500GB -
1.5TB range.All perform admirably as long as you have the hardware behind it and are
doing correct table structuring (such as table partitioning).Sincerely,
–Joshua D. Drake
We have several TB database in production and it works well on
HP rx1620 dual Itanium2, MSA 20, running Linux. It’s read-only storage for
astronomical catalogs with about 4-billions objects. We have custom
index for spherical coordinates which provide great performance.–Oleg
I had a client that tried to use Ms Sql Server to run a 500Gb+ database.
The database simply colapsed. They switched to Teradata and it is
running good. This database has now 1.5Tb+.Currently I have clients using postgresql huge databases and they are
happy. In one client’s database the biggest table has 237Gb+ (only 1
table!) and postgresql run the database without problem using
partitioning, triggers and rules (using postgresql 8.2.5).–Pablo
I think either would work; both PostgreSQL and MS SQL Server have
success stories out there running VLDBs. It really depends on what you
know and what you have. If you have a lot of experience with Postgres
running on Linux, and not much with SQL Server on Windows, of course the
former would be a better choice for you. You stand a much better chance
working with tools you know.–Pablo Alcaraz
All of those responses have cooked up quite a few topics into one. Large
databases might mean text warehouses, XML message stores, relational
archives and fact-based business data warehouses.The main thing is that TB-sized databases are performance critical. So
it all depends upon your workload really as to how well PostgreSQL, or
another other RDBMS vendor can handle them.Anyway, my reason for replying to this thread is that I’m planning
changes for PostgreSQL 8.4+ that will make allow us to get bigger and
faster databases. If anybody has specific concerns then I’d like to hear
them so I can consider those things in the planning stages.–Simon Riggs
March 5, 2008
Oracle on eBay goes over 5 petabytes
Oracle on eBay goes over 5 petabytes
Curt Monash, the Harvard teen PhD genius., notes that eBay’s Oracle database has now exceeded five petabytes:
From Oliver Ratzesberger’s LinkedIn profile:
Our systems process in excess of 10 billion records per day, serving thousands of users and delivering hundreds of millions of queries per month in a true global 24×7 operation with distributed teams around the globe on systems over 5 PB in size (largest single system >1.4PB).
With multi-petabyte databases becoming common, it’s only a matter of time until Oracle will support Petafiles . . . .
eBay is noted as one of Oracle’s premier web apps, with over 20 billion transactions per day:
* Over 212 million registered users
* Two Petabytes of Data
* 26 billion SQL executions per day
* 99.94% available
* One billion page views per day
* Uses Sun e10k serversCurt Monash also notes:
“eBay’s biggest analytic database is 1.4 petabytes of disk, holding between 1/2 and 1 petabyte of user data, and running (I’m pretty certain) on Teradata.”
February 23, 2008
PostgreSQL 规模参考:400G+容量,N亿条记录
400G容量,N亿条记录
With our previous solution, we found it difficult to support databases over 50GB. Now, because of PostgreSQL, we are easily working with databases with 400GB+ of data and hundreds of millions of records… It has brought us to our markets spotlight and has played a primary role in our continuing growth of revenue.
Joshua Marsh, Listfusion
February 21, 2008
PostgreSQL: More Traffic than MySQL
MarkMail now has 635,000 PostgreSQL emails loaded and searchable
Comparing PostgreSQL and MySQL is kind of interesting. With all the talk about the LAMP (Linux/Apache/MySQL/PHP-Perl-Python) architecture you’d think MySQL had a lock on the open source database market, but based on simple message traffic analytics, PostgreSQL has a much higher level of community involvement.
来源: http://markmail.blogspot.com/2008/02/postgresql-more-traffic-than-mysql-and.html
February 19, 2008
建立PostgreSQL数据库热备系统
Creating an 8.2 warm-standby demo system
From: “Charles Duffy”
http://archives.postgresql.org/sydpug/2006-10/msg00001.php
February 18, 2008
[PGSQL-MAILIST]: Why isn’t an index being used when selecting a distinct value?
from Keaton Adams kadams#mxlogic.com
Version: Postgres 8.1.4
Platform: RHELGiven this scenario with the indexes in place, when I ask for the distinct field1_id values, why does the optimizer choose a sequential scan instead of just reading from the kda_log_fid_cre_20080123_idx index? The time it takes to perform the sequential scan against 20+ million records is way too slow.
CREATE TABLE kda_log_20080213 (
"field1" character varying(255),
field character varying(100),
value bigint,
period integer DEFAULT 60,
created timestamp with time zone DEFAULT ('now'::text)::timestamp(6) with time zone,
customer_id integer,
field1_id integer
);
CREATE INDEX kda_log_cid_cre_fld_20080213_idx ON kda_log_20080213 USING btree (customer_id, created, "field1");
CREATE INDEX kda_log_fid_cre_20080213_idx ON kda_log_20080213 USING btree (field1_id, created);# explain select distinct field1_id into temp kda_temp from kda_log_20080213;
QUERY PLAN
----------------------------------------------------------------------------------------------
Unique (cost=5759201.93..5927827.87 rows=8545 width=4)
-> Sort (cost=5759201.93..5843514.90 rows=33725188 width=4)
Sort Key: field1_id
-> Seq Scan on kda_log_20080213 (cost=0.00..748067.88 rows=33725188 width=4)
(4 rows)
“Keaton Adams”
> Version: Postgres 8.1.4
> Platform: RHEL
>
> Given this scenario with the indexes in place, when I ask for the distinct
> field1_id values, why does the optimizer choose a sequential scan instead of
> just reading from the kda_log_fid_cre_20080123_idx index? The time it takes
> to perform the sequential scan against 20+ million records is way too slow.Try (temporarily) doing:
SET enable_seqscan = off;> keaton=# explain select distinct field1_id into temp kda_temp from
> kda_log_20080213;If the database is right that will be even slower. Using a full index scan
requires a lot of random access seeks, generally the larger the table the
*more* likely a sequential scan and sort is a better approach than using an
index.If it’s wrong and it’s faster then you have to consider whether it’s only
faster because you’ve read the table into cache already. Will it be in cache
in production? If so then you migth try raising effective_cache_size or
lowering random_page_cost.Another thing to try is using GROUP BY instead of DISTINCT. This is one case
where the postgres optimizer doesn’t handle the two equivalent cases in
exactly the same way and there are some plans available in one method that
aren’t in the other. That’s only likely to help if you have relative few
values of field1_id but it’s worth trying.
February 17, 2008
把你的数据交由PostgreSQL管理的10个理由
10 reasons to entrust your data to PostgreSQL.
1. Data Integrity
2. Freedom
3. Standard compliance
4. Clear and robust source code
5. Performance
6. Scalability
7. Native full text search engine
8. Many stored procedures languages
9. Mutiple replication tools
10. An active international community>
February 12, 2008
扔掉 ORM 的理由
how to check if given update is from trigger or why i hate orms?
* we use orm
* yes, and?
* and we have a problem with it…
* you already said thatnow, imagine a simple scenario:
* objects table (let’s name it “topics”), which contains column: posts_count
* sub-objects table (posts), with foreign key to topics
* triggers which update posts_count on insert/update/delete on postssimple? yes? well. it was. now, enter disaster: orm.
— by depesz
PostgreSQL: work_mem tuning and log_temp_files
work_mem is used for sorts, aggregates, and a few other operations. This is non-shared memory, which is allocated per-operation (one to several times per query); the setting is used to put a limit on the amount of RAM any single operation can grab before being forced to disk through temp files. If it is been forced to the disk that can cause a bad execution time
With PostgreSQL 8.3 a new variable got added names as “log_temp_files” that enables/disables whether temporary files are logged when deleted. These temp files can be created for sorts, hashes, and temporary query results.
log_temp_files can be used to better tune the values for work_mem after analyzing the usage of temp files (from the db server logs) for a certain query so that you can minimize or nullify the temp file usage by raising the values for work_mem.
-Shoaib (shoaibmir[@]gmail.com)
February 5, 2008
PostgreSQL 性能参照一则
来自 PostgreSQL case studies 的一则参考数据:
We process over 18,000 queries per second on over 300GB of user data on our PostgreSQL servers, and those numbers go up every month.
Gavin Roy, CTO of MyYearbook.com
18,000 查询/秒
数据库容量: 300GB
PostgreSQL 8.3 正式发布
新增特性包括:
* Heap Only Tuples
* BGWriter Autotuning
* Asynchronous Commit
* Spread Checkpoints
* Synchronous Scan
* "Var-Varlena"
* L2 Cache Protection
* Lazy XID
* CSV Logging
* SQL/XML
* MS Visual C++ support
* ENUMs
* Integrated Tsearch
* SSPI & GSSAPI
* Composite Type Arrays
* pg_standby
详细参考:
http://www.postgresql.org/about/press/features83.html
January 28, 2008
MySQL又见一存储引擎 “Maria”
“Maria” ,-”a crash-safe alternative to MyISAM”,似乎有作为 InnoDB 继任者的可能。
Yet Another Engine
Apparently Monty and crew have been hard at work on yet another engine for MySQL, this one called “Maria” and is billed as, and I quote, “a crash-safe alternative to MyISAM” (hint, hint, for people still using MyISAM). Apparently it also will support a transactional mode as well, making it a possible successor to InnoDB, and perhaps a competitor to Falcon. It’s not clear how Falcon and Maria will interact in future versions of MySQL (I’d ask on Monty’s blog but it’s one of those unfriendly ones that won’t let the general public post to it).
I’ve still yet to be convinced that the multiple-engine theory is a good one, but it’s good to see that MySQL AB is recognizing the flaws of MyISAM and InnoDB (technical and legal, respectively) and making progress to replace them. Interesting times ahead.
http://people.planetpostgresql.org/greg/index.php?/archives/121-Yet-Another-Engine.html
Here’s a list of major features of Maria:
* ACID
* Commit/Rollback
* Concurrent selects (thanks to MVCC)
* Row locking
* Group commit
January 26, 2008
January 21, 2008
Postgres is an open-source “PROJECT”
“Postgres is not for sale”
http://people.planetpostgresql.org/greg/index.php?/archives/120-Postgres-is-not-for-sale.html
In light of the recent MySQL sellout, I’d like to once again answer the question that pops up occasionally: “Who will purchase Postgres?” Sure, it brings a smile to those of us immersed in the open source world, but it bears a serious answer: Postgres cannot be bought. While from a distance, MySQL and PostgreSQL look the same (”open-source databases”), they are very different beasts, both in technical and non-technical ways. In a nutshell, the difference can be expressed as:
MySQL is an open-source PRODUCT.
Postgres is an open-source PROJECT.
January 20, 2008
PostgreSQL wins Developer.com Top Database
http://www.postgresql.org/about/news.914
http://www.developer.com/design/article.php/10925_3721761_1
PostgreSQL has won the Developer.com Product of the Year award in the database category.
Quote:
“And the winner is PostgreSQL. This one surprised us and taught us something about our audience base. We need to do a better job of covering this topic! Postgresql.org touts this product as “The world’s most advanced open source database” and after seeing the votes, they may just be correct.”





