[blf@Logging /~]:

October 16, 2008

PostgreSQL 最常用类型存储空间速查

Filed under: database, PostgreSQL — blowfisher @ 9:39 am

PG Types Storage Comparison Sheet:

September 17, 2008

[ZT] Why isn’t PostgreSQL using my index?

Filed under: database, PostgreSQL — blowfisher @ 9:24 am
    ...PostgreSQL's query optimizer is smart, really smart and unless you have evidence otherwise you should trust what it is doing. In this particular case, the optimizer realizes that if a table has only a few rows that using the index is actually slower than just spinning through the entire table. Just because PostgreSQL isn't using your index today with a small number of rows, does not mean it won't choose to use it later when you have more data or the query changes...

August 28, 2008

OpenSQL Camp 2008: a community-organized database conference

Filed under: database, PostgreSQL — blowfisher @ 8:46 am

From announcing-opensql-camp-2008:

Key facts:

  • It is of, by and for the community (you).
  • At this event, all open-source databases are created equal. We’ll learn together and grow together.
    It’s a combination conference and hackathon.
  • It’s free.
  • It is Friday night Nov 14, 18:00 through Sun the 16th at 18:00 in Charlottesville, Virginia USA in a very cool location.
  • The website, where all details will be posted: http://opensqlcamp.pbwiki.com/
  • The mailing group, where details will be discussed, decided and arranged: http://groups.google.com/group/opensqlcamp
  • Date, time and place are confirmed and will not change. You can make your travel plans now. There’s travel information on the wiki.
  • We already have some great speakers who have offered to give great talks. These and other details will show up on the wiki as they’re finalized.
  • Sponsors are needed. See the wiki.

June 5, 2008

[PostgreSQL] pgAdmin III v1.8.4 released

Filed under: database, PostgreSQL — blowfisher @ 7:28 pm

2008-06-05 - pgAdmin III v1.8.4 released.

http://www.postgresql.org/about/news.965

下载地址:
http://www.pgadmin.org/download/

May 31, 2008

[HACKERS] Core team statement on replication in PostgreSQL (备查)

Filed under: database, PostgreSQL — blowfisher @ 10:09 pm

built-in replication in PostgreSQL 看来有戏了,留个记号在此备查!

* From: Tom Lane
* To: pgsql-hackers(at)postgreSQL(dot)org
* Subject: Core team statement on replication in PostgreSQL
* Date: Thu, 29 May 2008 10:12:55 -0400
* Message-id: <26529(dot)1212070375(at)sss(dot)pgh(dot)pa(dot)us>

http://archives.postgresql.org/pgsql-hackers/2008-05/msg00913.php

The Postgres core team met at PGCon to discuss a few issues, the largest
of which is the need for simple, built-in replication for PostgreSQL.
Historically the project policy has been to avoid putting replication
into core PostgreSQL, so as to leave room for development of competing
solutions, recognizing that there is no “one size fits all” replication
solution. However, it is becoming clear that this policy is hindering
acceptance of PostgreSQL to too great an extent, compared to the benefit
it offers to the add-on replication projects. Users who might consider
PostgreSQL are choosing other database systems because our existing
replication options are too complex to install and use for simple cases
.
In practice, simple asynchronous single-master-multiple-slave
replication covers a respectable fraction of use cases, so we have
concluded that we should allow such a feature to be included in the core
project. We emphasize that this is not meant to prevent continued
development of add-on replication projects that cover more complex use
cases.

We believe that the most appropriate base technology for this is
probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.
We hope that such a feature can be completed for 8.4. Ideally this
would be coupled with the ability to execute read-only queries on the
slave servers, but we see technical difficulties that might prevent that
from being completed before 8.5 or even further out. (The big problem
is that long-running slave-side queries might still need tuples that are
vacuumable on the master, and so replication of vacuuming actions would
cause the slave’s queries to deliver wrong answers.
)

Again, this will not replace Slony, pgPool, Continuent, Londiste, or
other systems for many users, as it will be not be highly scalable nor
support long-distance replication nor replicating less than an entire
installation. But it is time to include a simple, reliable basic
replication feature in the core system.

—regards, tom lane

May 26, 2008

[Jignesh K. Shah] Problems with PG on Multi-Cores with Multi-TB Data

Filed under: database, PostgreSQL — blowfisher @ 5:00 pm

Problems with PostgreSQL on Multi-core Systems with Multi-Terabyte Data.

– by Jignesh K. Shah

PDF: http://www.blowfisher.net/pubs/pgcon_problems-JigneshK.Shah.pdf

Current Market Trends in Systems

Quad-core sockets are current market standards
> Also 8-core sockets available now and could become a standard in next couple of year

• Most common rack servers now have two sockets
> 8-core (or more ) systems are the norm with trend going to 12-16 core systems soon

• Most Servers have internal drives > 146 GB
> Denser in capacity, smaller in size but essentially same or lower speed
> More denser in case of SATA-II drives

Current Market Trends in Software

• Software (including Operating Systems) have yet to fully catch up with multi-core systems
> “tar” still single process utility

Horizontal Scaling helps a lot but not a good clean solution for multi-core systems

• Virtualization is the new buzzword for Consolidations
> Hides the fact that the software is not able to fully capitalize the extra cores :-(

• Research being done on new paradigms
> Complexity of parallelized software is huge

Current Market Trends in Data

• 12 years ago, a 20GB data warehouse was considered a big database

• Now everybody talks about 200GB-5TB databases

• Some 2005 Survey numbers:
> Top OLTP DB sizes = 5,973 GB to 23,101 GB
> Top DW DB Sizes = 17,685 GB to 100,386 GB
> Source http://www.wintercorp.com/VLDB/2005_TopTen_Survey/TopTenWinners_2005.asp

• Some 2007 Survey numbers:
> Top DB sizes = 20+ TB to 220 TB ( 6+ PB on tape)
> Source http://www.businessintelligencelowdown.com/2007/02/top_10_largest.html

April 9, 2008

[mailist] How to handling large volumes of data on PostgreSQL?

Filed under: database, PostgreSQL — blowfisher @ 7:14 pm

mailing list: pgsql-admin.postgresql.org

from: Johann Spies

..loaded about 4,900,000,000 in one of two tables with 7200684 in the second table in database ‘firewall’, built one index using one date-field (which took a few days) and used that index to copy about 3,800,000,000 of those records from the first to a third table, deleted those copied record from the first table and dropped the third table.
This took about a week on a 2xCPU quadcore server with 8Gb RAM..

Table paritioning is need.

distribute tables across different disks through tablespaces.Tweak the shared buffers and work_mem settings.

RAID5/6 are very,very slow when it comes to small disk *writes*.

At least a hardware RAID controller with RAID 0 or 10 should be used, with 10krpm or 15krpm drives. SAS preferred.

as on SATA the only quick disks are Western Digital Raptor.

look at a view called pg_stat_activity. Do: select * from pg_stat_activity;

March 25, 2008

关于 JDBC, ORM, OLTP, OLAP…

Filed under: database, PostgreSQL — blowfisher @ 9:11 am

Non-ORM layers over JDBC?

If you want JDBC access that is more closely integrated into the language I
would suggest using Groovy. It REALLY simplifies JDBC access because of
Groovy’s dynamic typing, which is basically the same thing as using variant
data types in C++, at least syntactically. Groovy’s way of executing JDBC’s
statements is also much easier to use. Groovy compiles to Java class files
and the JVM doesn’t know the difference. The groovy runtime/library is just
a jar file that you stick on your classpath.

ORM for me works really well in OLTP situations. If I am doing pure OLTP I
rarely need to go outside of my ORM access layer, which is Hibernate.
Hibernate’s query language (HQL) has lots of features to make writing SQL
queries easier and lots of features to minimize performance problems
. If you
are used to SQL, it make take a little getting used to because HQL is more
abstract than SQL. It’s like making a jump from C to Java, more abstraction,
less code, less raw power.

If you have lots of screens where users are basically building up sql queries,
using forms, then Hibernate’s query by criteria makes this easy because you
are not longer manually building up SQL (or HQL) queries by hand (which is
really error prone). All of my complicated search screens use this feature
of Hibernate.

ORM falls down badly for two things: 1) OLAP style database work and 2) Batch
processing
. OLAP depends way too much on specific database facilities to
make things fast, which Hibernate can’t take advantage of
. Batch processing
chokes because Hibernate will cache too much because it is trying to optimize
OLTP style interactions.

–David Clark

March 20, 2008

TB级 PostgreSQL 拾零

Filed under: database, FreeBSD, PostgreSQL — blowfisher @ 10:45 am

有关TB级以上 PostgreSQL 数据库的一些信息:

Well I can’t speak to MS SQL-Server because all of our clients run
PostgreSQL ;).. I can tell you we have many that are in the 500GB -
1.5TB
range.

All perform admirably as long as you have the hardware behind it and are
doing correct table structuring (such as table partitioning).

Sincerely,

–Joshua D. Drake

We have several TB database in production and it works well on
HP rx1620 dual Itanium2, MSA 20, running Linux. It’s read-only storage for
astronomical catalogs with about 4-billions objects. We have custom
index for spherical coordinates which provide great performance.

–Oleg

I had a client that tried to use Ms Sql Server to run a 500Gb+ database.
The database simply colapsed. They switched to Teradata and it is
running good. This database has now 1.5Tb+.

Currently I have clients using postgresql huge databases and they are
happy. In one client’s database the biggest table has 237Gb+ (only 1
table!)
and postgresql run the database without problem using
partitioning, triggers and rules (using postgresql 8.2.5).

–Pablo

I think either would work; both PostgreSQL and MS SQL Server have
success stories out there running VLDBs. It really depends on what you
know and what you have. If you have a lot of experience with Postgres
running on Linux, and not much with SQL Server on Windows, of course the
former would be a better choice for you. You stand a much better chance
working with tools you know.

–Pablo Alcaraz

All of those responses have cooked up quite a few topics into one. Large
databases might mean text warehouses, XML message stores, relational
archives and fact-based business data warehouses
.

The main thing is that TB-sized databases are performance critical. So
it all depends upon your workload really as to how well PostgreSQL, or
another other RDBMS vendor can handle them.

Anyway, my reason for replying to this thread is that I’m planning
changes for PostgreSQL 8.4+ that will make allow us to get bigger and
faster databases. If anybody has specific concerns then I’d like to hear
them so I can consider those things in the planning stages.

–Simon Riggs

March 5, 2008

Oracle on eBay goes over 5 petabytes

Filed under: database — blowfisher @ 8:23 am

Oracle on eBay goes over 5 petabytes

Curt Monash, the Harvard teen PhD genius., notes that eBay’s Oracle database has now exceeded five petabytes:

From Oliver Ratzesberger’s LinkedIn profile:

Our systems process in excess of 10 billion records per day, serving thousands of users and delivering hundreds of millions of queries per month in a true global 24×7 operation with distributed teams around the globe on systems over 5 PB in size (largest single system >1.4PB).

With multi-petabyte databases becoming common, it’s only a matter of time until Oracle will support Petafiles . . . .

eBay is noted as one of Oracle’s premier web apps, with over 20 billion transactions per day:

* Over 212 million registered users
* Two Petabytes of Data
* 26 billion SQL executions per day
* 99.94% available
* One billion page views per day
* Uses Sun e10k servers

Curt Monash also notes:

“eBay’s biggest analytic database is 1.4 petabytes of disk, holding between 1/2 and 1 petabyte of user data, and running (I’m pretty certain) on Teradata.”

February 23, 2008

PostgreSQL 规模参考:400G+容量,N亿条记录

Filed under: database, PostgreSQL — blowfisher @ 7:30 pm

400G容量,N亿条记录

With our previous solution, we found it difficult to support databases over 50GB. Now, because of PostgreSQL, we are easily working with databases with 400GB+ of data and hundreds of millions of records… It has brought us to our markets spotlight and has played a primary role in our continuing growth of revenue.

Joshua Marsh, Listfusion

http://www.postgresql.org/about/quotesarchive

February 21, 2008

PostgreSQL: More Traffic than MySQL

Filed under: database, mysql, PostgreSQL — blowfisher @ 11:41 pm

MarkMail now has 635,000 PostgreSQL emails loaded and searchable

Comparing PostgreSQL and MySQL is kind of interesting. With all the talk about the LAMP (Linux/Apache/MySQL/PHP-Perl-Python) architecture you’d think MySQL had a lock on the open source database market, but based on simple message traffic analytics, PostgreSQL has a much higher level of community involvement.

来源: http://markmail.blogspot.com/2008/02/postgresql-more-traffic-than-mysql-and.html

February 19, 2008

建立PostgreSQL数据库热备系统

Filed under: database, PostgreSQL — blowfisher @ 5:09 pm

Creating an 8.2 warm-standby demo system

From: “Charles Duffy”
http://archives.postgresql.org/sydpug/2006-10/msg00001.php

February 18, 2008

[PGSQL-MAILIST]: Why isn’t an index being used when selecting a distinct value?

Filed under: database, PostgreSQL — blowfisher @ 3:11 pm

from Keaton Adams kadams#mxlogic.com

Version: Postgres 8.1.4
Platform: RHEL

Given this scenario with the indexes in place, when I ask for the distinct field1_id values, why does the optimizer choose a sequential scan instead of just reading from the kda_log_fid_cre_20080123_idx index? The time it takes to perform the sequential scan against 20+ million records is way too slow.

CREATE TABLE kda_log_20080213 (
"field1" character varying(255),
field character varying(100),
value bigint,
period integer DEFAULT 60,
created timestamp with time zone DEFAULT ('now'::text)::timestamp(6) with time zone,
customer_id integer,
field1_id integer
);
CREATE INDEX kda_log_cid_cre_fld_20080213_idx ON kda_log_20080213 USING btree (customer_id, created, "field1");
CREATE INDEX kda_log_fid_cre_20080213_idx ON kda_log_20080213 USING btree (field1_id, created);

# explain select distinct field1_id into temp kda_temp from kda_log_20080213;

QUERY PLAN
----------------------------------------------------------------------------------------------
Unique (cost=5759201.93..5927827.87 rows=8545 width=4)
-> Sort (cost=5759201.93..5843514.90 rows=33725188 width=4)
Sort Key: field1_id
-> Seq Scan on kda_log_20080213 (cost=0.00..748067.88 rows=33725188 width=4)
(4 rows)

“Keaton Adams” writes:

> Version: Postgres 8.1.4
> Platform: RHEL
>
> Given this scenario with the indexes in place, when I ask for the distinct
> field1_id values, why does the optimizer choose a sequential scan instead of
> just reading from the kda_log_fid_cre_20080123_idx index? The time it takes
> to perform the sequential scan against 20+ million records is way too slow.

Try (temporarily) doing:

SET enable_seqscan = off;

> keaton=# explain select distinct field1_id into temp kda_temp from
> kda_log_20080213;

If the database is right that will be even slower. Using a full index scan
requires a lot of random access seeks
, generally the larger the table the
*more* likely a sequential scan and sort is a better approach than using an
index
.

If it’s wrong and it’s faster then you have to consider whether it’s only
faster because you’ve read the table into cache already. Will it be in cache
in production? If so then you migth try raising effective_cache_size or
lowering random_page_cost
.

Another thing to try is using GROUP BY instead of DISTINCT. This is one case
where the postgres optimizer doesn’t handle the two equivalent cases in
exactly the same way and there are some plans available in one method that
aren’t in the other. That’s only likely to help if you have relative few
values of field1_id but it’s worth trying.

February 17, 2008

把你的数据交由PostgreSQL管理的10个理由

Filed under: database, PostgreSQL — blowfisher @ 8:48 pm

10 reasons to entrust your data to PostgreSQL.

1. Data Integrity
2. Freedom
3. Standard compliance
4. Clear and robust source code
5. Performance
6. Scalability
7. Native full text search engine
8. Many stored procedures languages
9. Mutiple replication tools
10. An active international community>

February 12, 2008

扔掉 ORM 的理由

Filed under: database — blowfisher @ 11:49 pm

how to check if given update is from trigger or why i hate orms?

* we use orm
* yes, and?
* and we have a problem with it…
* you already said that

now, imagine a simple scenario:

* objects table (let’s name it “topics”), which contains column: posts_count
* sub-objects table (posts), with foreign key to topics
* triggers which update posts_count on insert/update/delete on posts

simple? yes? well. it was. now, enter disaster: orm.

— by depesz

PostgreSQL: work_mem tuning and log_temp_files

Filed under: database, PostgreSQL — blowfisher @ 6:48 pm

work_mem 优化及 log_temp_files

work_mem is used for sorts, aggregates, and a few other operations. This is non-shared memory, which is allocated per-operation (one to several times per query); the setting is used to put a limit on the amount of RAM any single operation can grab before being forced to disk through temp files. If it is been forced to the disk that can cause a bad execution time

With PostgreSQL 8.3 a new variable got added names as “log_temp_files” that enables/disables whether temporary files are logged when deleted. These temp files can be created for sorts, hashes, and temporary query results.

log_temp_files can be used to better tune the values for work_mem after analyzing the usage of temp files (from the db server logs) for a certain query so that you can minimize or nullify the temp file usage by raising the values for work_mem.

-Shoaib (shoaibmir[@]gmail.com)

相关参考: PostgreSQL Blog’s 8.3 Feature Round-Up

February 5, 2008

PostgreSQL 性能参照一则

Filed under: database, PostgreSQL — blowfisher @ 11:21 pm

来自 PostgreSQL case studies 的一则参考数据:

We process over 18,000 queries per second on over 300GB of user data on our PostgreSQL servers, and those numbers go up every month.
Gavin Roy, CTO of MyYearbook.com

18,000 查询/秒
数据库容量: 300GB

PostgreSQL 8.3 正式发布

Filed under: database, PostgreSQL — blowfisher @ 11:11 pm

新增特性包括:
* Heap Only Tuples
* BGWriter Autotuning
* Asynchronous Commit
* Spread Checkpoints
* Synchronous Scan
* "Var-Varlena"
* L2 Cache Protection
* Lazy XID
* CSV Logging
* SQL/XML
* MS Visual C++ support
* ENUMs
* Integrated Tsearch
* SSPI & GSSAPI
* Composite Type Arrays
* pg_standby

详细参考:
http://www.postgresql.org/about/press/features83.html

各历史版本特性对照:
http://www.postgresql.org/about/featurematrix

January 28, 2008

MySQL又见一存储引擎 “Maria”

Filed under: database, mysql — blowfisher @ 10:43 pm

“Maria” ,-”a crash-safe alternative to MyISAM”,似乎有作为 InnoDB 继任者的可能。

Yet Another Engine

Apparently Monty and crew have been hard at work on yet another engine for MySQL, this one called “Maria” and is billed as, and I quote, “a crash-safe alternative to MyISAM” (hint, hint, for people still using MyISAM). Apparently it also will support a transactional mode as well, making it a possible successor to InnoDB, and perhaps a competitor to Falcon. It’s not clear how Falcon and Maria will interact in future versions of MySQL (I’d ask on Monty’s blog but it’s one of those unfriendly ones that won’t let the general public post to it).

I’ve still yet to be convinced that the multiple-engine theory is a good one, but it’s good to see that MySQL AB is recognizing the flaws of MyISAM and InnoDB (technical and legal, respectively) and making progress to replace them. Interesting times ahead.

http://people.planetpostgresql.org/greg/index.php?/archives/121-Yet-Another-Engine.html

Here’s a list of major features of Maria:

* ACID
* Commit/Rollback
* Concurrent selects (thanks to MVCC)
* Row locking
* Group commit

http://java.dzone.com/news/mysql-creator-releases-maria-n

January 26, 2008

PostgreSQL vs MySQL on 16-core xeon, Freebsd 7.0

Filed under: mysql, FreeBSD, PostgreSQL — blowfisher @ 1:48 am
Jeff Roberson: non-uniform cpu scheduling.

from Planet FreeBSD by nospam@nospam.com


January 21, 2008

Postgres is an open-source “PROJECT”

Filed under: mysql, PostgreSQL — blowfisher @ 9:25 am

“Postgres is not for sale”

http://people.planetpostgresql.org/greg/index.php?/archives/120-Postgres-is-not-for-sale.html

In light of the recent MySQL sellout, I’d like to once again answer the question that pops up occasionally: “Who will purchase Postgres?” Sure, it brings a smile to those of us immersed in the open source world, but it bears a serious answer: Postgres cannot be bought. While from a distance, MySQL and PostgreSQL look the same (”open-source databases”), they are very different beasts, both in technical and non-technical ways. In a nutshell, the difference can be expressed as:

MySQL is an open-source PRODUCT.

Postgres is an open-source PROJECT.

January 20, 2008

PostgreSQL wins Developer.com Top Database

Filed under: PostgreSQL — blowfisher @ 12:15 am

http://www.postgresql.org/about/news.914
http://www.developer.com/design/article.php/10925_3721761_1

PostgreSQL has won the Developer.com Product of the Year award in the database category.

Quote:

“And the winner is PostgreSQL. This one surprised us and taught us something about our audience base. We need to do a better job of covering this topic! Postgresql.org touts this product as “The world’s most advanced open source database” and after seeing the votes, they may just be correct.”

January 17, 2008

Sun 10 亿美元收购 MySQL

Filed under: sun, mysql — blowfisher @ 9:54 am

2008-1-16: Sun 10 亿美元收购 MySQL。

Sun Microsystems Announces Agreement to Acquire MySQL, Developer of the World’s Most Popular Open Source Database

Sun Growth Strategy Accelerates With New Position in $15 Billion Database Market

SANTA CLARA, CA January 16, 2008 Sun Microsystems, Inc. (NASDAQ: JAVA) today announced it has entered into a definitive agreement to acquire MySQL AB, an open source icon and developer of one of the world’s fastest growing open source databases for approximately $1 billion in total consideration. The acquisition accelerates Sun’s position in enterprise IT to now include the $15 billion database market. Today’s announcement reaffirms Sun’s position as the leading provider of platforms for the Web economy and its role as the largest commercial open source contributor.

http://mysql.com/news-and-events/sun-to-acquire-mysql.html
http://www.sun.com/aboutsun/media/presskits/2008-0116/index.jsp?intcmp=hp2008jan16_mysql_learn

Couple of random thoughts

* First, better Sun than Oracle. The thought of Oracle eating up MySQL has always been rather disturbing to us especially since we do a fair amount of MySQL consulting and don’t care much for Oracle as a company. I suppose it could still happen.
* Given the fact that Sun is a large contributor to the PostgreSQL project, does this mean PostgreSQL fans can’t make fun of MySQL anymore? Are we like friends now? This could take away some fun and add a little fun at the same time.
* Will this mean MySQL will have no qualms of using PostgreSQL underlying storage engine and what would it be called? - MyPost

http://www.postgresonline.com/journal/index.php?/archives/23-guid.html

January 9, 2008

PostgreSQL 8.3 RC1 发布

Filed under: PostgreSQL — blowfisher @ 9:41 am

PostgreSQL 8.3 RC1 版本已发布。
8.3 RC1 包含了所有自 2008-01-07 的累积安全更新

下载:
二进制包
源代码

http://www.postgresql.org/about/news.907

January 7, 2008

PostgreSQL 发布更新版本 8.2.6, 8.1.11, 8.0.15, 7.4.19 & 7.3.21

Filed under: PostgreSQL — blowfisher @ 9:55 pm

1月7日,PostgreSQL 全球开发组发布了 PostgreSQL 更新版本,修正了 5 个可能的安全漏洞。

更新版本号:
8.2.6, 8.1.11, 8.0.15, 7.4.19, 7.3.21

http://www.postgresql.org/about/news.905

下载地址:
http://www.postgresql.org/ftp/binary

January 6, 2008

PostgreSQL 速查表

Filed under: Photo, PostgreSQL — blowfisher @ 2:23 am

两张 老外 制作的 PostgreSQL 日常使用速查表,非常方便实用。

PDF HTML

下载图片: 1.1724×2227 2.761×1107

January 4, 2008

Google Maps 上的奇景!

Filed under: Photo, PostgreSQL, Map, Google — blowfisher @ 3:57 pm

Google Maps 上的 奇景
把地图缩放到最大,注意不要拖动! 直接点 Zoom…

几张 截图: 1 2 3 4

链接来源:
http://maps.google.com/maps?z=1&t=h&ll=10.903707,19.933068

www.blowfisher.net  |  Powered by WP