123
 123

Tip: 看不到本站引用 Flickr 的图片? 下载 Firefox Access Flickr 插件 | AD: 订阅 DBA notes --

2010-06-03 Thu

12:34 Beginner/Expert (1 Bytes) » Oracle Scratchpad
09:17 Presentation: Cassandra Basics - Indexing (2170 Bytes) » myNoSQL

A very informative presentation by Benjamin Black on Cassandra indexing:

There are so many interesting things to learn from these slides. Benjamin is briefly introducing the main Cassandra terms — if you are not familiar with them you can read more in this Cassandra tutorial — and moves to explain how column sorting and partitioning strategies should be used. Also to mention, some really quotable fragments from the deck:

Relational stores are schema oriented. Start from your schema & work forwards

Column stores are query oriented. Start from your queries & work backwards

Cassandra is an index construction kit

07:09 A Universal SQL Client (1024 Bytes) » DBA@SKY-MOBI
非常有幸能够管理越来越多的数据库,不光是数量多,平台也越来越多,管理软件也是层出不穷。 就拿开发工具来说,PostgreSQL下面用pgadmin的比较多,oracle平台下面用pl/psql的比较多,mysql下面用navicat 比较多。还有SQLSERVER,DB2等等。 这些都是与数据库绑定性比较高的开发管理工具,可能能用到的特性也比较多。 但是如果只是平常用用的话,其实可以选择一个可以通用的SQL 开发GUI工具,如采用JAVA开发的使用JDBC连接数据库的开发工具。 SQuirrel SQL是其中的代表作之一。 可以支持的数据库平台非常之多: ORACLE POSTGRESQL DB2 GREENPLUM FIREBIRD SYBASE H2 HSQLDB HXTT INFORMIX MYSQL 等等。 当然除了支持很多平台之外也有其他非常优秀的特点: 可编辑SQL结果。 ER图展示。 等等 还有很多插件可供下载: 数据导入插件。 SQL语法高亮显示插件。 SQL脚本执行插件。 等等。 是一个非常好用的工具之一.
06:39 Oracle读取事件的命名理由 (10880 Bytes) » dbthink

db file sequential read与db file scattered read这两个等待事件在今天又让我们重温了一下2年前的一次争论, 大家对于这两个事件以及其名称的理解进行了多个角度的解读, 我觉得我有责任将Jeff Holt在多年前的这篇旧文翻译出来, 以助后来者理解这两个让人迷惑的概念. 翻译的比较仓促, 不足之处尽情谅解

本文的原文链接: Why are Oracle’s Read Events Named Backwards?
原文版权归Jeff Holt以及Hotsos公司所有.

为什么Oracle的读取事件使用”向后兼容的命名方式”?
By Jeff Holt Translated by Jametong

在几乎所有的Oracle跟踪文件中,都有两个出现频繁的事件:db file sequential readdb file scattered read.这些事件表明Oracle内核请求从磁盘读取数据块.

当我们想到磁盘时,我们熟悉的顺序访问(sequential read)的概念是,一个进程从磁盘读取大块的连续数据.当Oracle执行一次全表扫描时,会在一次大I/O中读取多个Oracle数据块的数据,它是一次顺序读取.当然,在我们想到磁盘时,也会习惯于随机访问(random access)这个概念. 在一个有合理设计的索引的查询中,SQL 语句通常会使用随机访问来完成单块读取(single block read)调用.

然而,如果你曾经读过任何关于Oracle等待事件的内容,都会发现db file sequential read是用来表示随机读取(例如,索引扫描),db file scattered read是用来表示顺序读取(例如,全表扫描). Oracle使用这个术语表示确实不是很直观. 这使得部分猜测Oracle的事件命名是一种不经意的向后兼容命名. 嗯,这个理论很有意思,不过事实不是这样. 这些事件之所以如此命名确实有个很好的理由.

事件名db file sequential read与db file scattered read描述的是如何将数据块存储到内存中的,而不是如何从磁盘进行读取. Oracle对这两个事件的命名类似于Unix对两类读取调用的命名,函数read()与readv()是其中的代表. Unix的read()函数读取文件中的一段连续的内容,并将数据的不同片段存储到不同的内存区域,此区域由一个内存应用数组所支配. 由类似于read()调用执行的Oracle磁盘读被记录为db file sequential read事件,由类似于readv()调用执行的磁盘读被记录为db file scattered read 事件.

如果填充磁盘读取的内容的内存是连续的,发生的磁盘读就是db file sequential read.

  • Oracle 为所有的单块读取生成db file sequential read事件.Oracle始终将单个数据块存储在单个缓存缓冲块(cache buffer)中,因此单块读取永远不会产生db file scattered read事件.
  • 在Oracle 7.3以及后期版本中,当一个独占进程从磁盘中读取临时段时,Oracle会产生多块(multi-block) db file sequential read事件. Oracle早期版本会使用db file scattered read读取临时段的数据到数据库的高速缓冲池. 更新的版本发现临时段的数据几乎无法被共享或被重新访问到,因此就将其直接读取到服务器进程的程序全局区(PGA, Program Global Area)中了.

当填充从磁盘读取的数据的内存的连续性无法被保证的时候,发生的磁盘读就是db file scattered read. 为了填充一次read()调用读取的数据,Oracle内核可能要花费更多的时间来寻找一组连续的内存.只有当read()调用于readv()调用相对readv()调用的速度优势足够抵消这部分额外的时间消耗时,使用这种方式才会有净收益. 找到连续缓冲块(buffer)的代价可能远远超过read()调用相对于readv()调用带来的好处,推动Oracle的内核开发者选择了他们现在使用的这种算法.

  • 在大部分情况下,全表扫描与快速全索引扫描都会产生一次或多次db file scattered read. 不过,有时,这些扫描只会产生db file sequential read.

例如: 从一个Oracle跟踪文件摘录的如下片段显示,一个全表扫描仅仅使用到了db file sequential read. 这个表存储在文件2的块2到20的连续块中,它的高水位线在第16个块上. 数据库的块大小(block size)是8KB. 由于在此查询执行之前,所有的偶数块都已经存在于数据库高速缓冲池(buffer cache)中,这次扫描的读操作全都是单块读取.

PARSING IN CURSOR #1 len=31 dep=0 uid=18 oct=3 lid=18 tim=26532087 hv=2854987545 ad='82061e04'
select count(1) from test_tab
END OF STMT
PARSE #1:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=4,tim=26532087
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=26532087
WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=3 p3=1
WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=5 p3=1
WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=7 p3=1
WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=9 p3=1
WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=11 p3=1
WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=13 p3=1
WAIT #1: nam='db file sequential read' ela= 0 p1=2 p2=15 p3=1
FETCH #1:c=1,e=1,p=7,cr=14,cu=3,mis=0,r=1,dep=0,og=4,tim=26532088

下面这些来自Solaris truss命令的输出结果展示了这个查询执行的系统调用.第23行到27行展示了文件是如何打开,以及如何以文件句柄409进行引用的.第31行显示了一次pread()系统函数的调用,除了不会与read()一样移动文件指针外,pread()的行为与read()完全一致,它接收文件偏移量作为它的第四个参数.它的返回值是成功读取的字节的大小. 由于timed_statistices实例参数为true, times()调用提供了计时信息.

*** SUID: ruid/euid/suid = 4076 / 108 / 108 ***
read(10, 0x010ACACE, 2064)	(sleeping...)
read(10, "\097\0\006\0\0\0\0\003 ^".., 2064) = 151
times(0xEFFFE360)	= 162380015
times(0xEFFFDA80)	= 162380015
times(0xEFFFD9C0)	= 162380015
times(0xEFFFD9C0)	= 162380015
time()	= 948902570
brk(0x010E9A60)	= 0
brk(0x010EBA60)	= 0
brk(0x010EBA60)	= 0
brk(0x010EDA60)	= 0
times(0xEFFFD9C0)	= 162380015
times(0xEFFFD9C0)	= 162380015
times(0xEFFFC4A0)	= 162380015
times(0xEFFFC4A0)	= 162380015
time()	= 948902570
times(0xEFFFD7A8)	= 162380015
times(0xEFFFD760)	= 162380015
times(0xEFFFC5D8)	= 162380015
times(0xEFFFC578)	= 162380015
times(0xEFFFBE90)	= 162380015
open64("/home/jeffh/test/data/tools01.dbf", O_RDWR|O_DSYNC) = 15
getrlimit(RLIMIT_NOFILE, 0xEFFFBE98)	= 0
fstat64(409, 0xEFFFBE00)	Err#9 EBADF
fcntl(15, F_DUP2FD, 0x00000199)	= 409
close(15)	= 0
fcntl(409, F_SETFD, 0x00000001)	= 0
ioctl(409, 0x0403, 0xEFFFBE5C)	Err#25 ENOTTY
times(0xEFFFBE48)	= 162380016
pread(409, "0602\0\0\080\003\0\0 + B".., 8192, 24576) = 8192
times(0xEFFFC530)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC578)	= 162380016
pread(409, "0602\0\0\080\005\0\0 + V".., 8192, 40960) = 8192
times(0xEFFFC530)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC578)	= 162380016
pread(409, "0602\0\0\080\007\0\0 + V".., 8192, 57344) = 8192
times(0xEFFFC530)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC578)	= 162380016
pread(409, "0602\0\0\080\0\t\0\0 + V".., 8192, 73728) = 8192
times(0xEFFFC530)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC578)	= 162380016
pread(409, "0602\0\0\080\0\v\0\0 + V".., 8192, 90112) = 8192
times(0xEFFFC530)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC578)	= 162380016
pread(409, "0602\0\0\080\0\r\0\0 + V".., 8192, 106496) = 8192
times(0xEFFFC530)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC5D8)	= 162380016
times(0xEFFFC5D8)	= 162380017
times(0xEFFFC578)	= 162380017
pread(409, "0602\0\0\080\00F\0\0 + V".., 8192, 122880) = 8192
times(0xEFFFC530)	= 162380017
times(0xEFFFC5D8)	= 162380017
times(0xEFFFC5D8)	= 162380017
times(0xEFFFDB40)	= 162380017
times(0xEFFFE3A8)	= 162380017
write(13, "\0B0\0\006\0\0\0\0\010\0".., 176) = 176
read(10, "\0 y\0\006\0\0\0\0\003 ^".., 2064) = 121
times(0xEFFFE360)	= 162380017
times(0xEFFFDA80)	= 162380017
times(0xEFFFDB40)	= 162380017
times(0xEFFFE3A8)	= 162380017
times(0xEFFFE360)	= 162380017
lseek(14, 37376, SEEK_SET)	= 37376
read(14, "\0\v04E2\0\0\0 J04E3\0\0".., 512) = 512
times(0xEFFFE3A8)	= 162380017
write(13, "\0 U\0\006\0\0\0\0\004\0".., 85) = 85
read(10, 0x010ACACE, 2064)	(sleeping...)

下表描述了Oracle产生db file sequential readdb file scattered read事件的相关情况.

Oracle 事件 产生单块读取事件的操作(p3=1) 产生多块读取事件的操作(p3>1)
db file sequential read 索引扫描,
在区间内跳过块进行读取的全表扫描,
根据rowid进行的表访问等
临时段读取
db file scattered read 绝对不会 全表扫描,快速全索引扫描等.

引用:
HOLT, JEFF. 2000. “Predicting Multi-Block Read Call Sizes,” Hotsos Journal, (Jan. 2000): 8–9. http://www.hotsos.com.

Jeff Holt是位于得克萨斯州盐湖城的Hotsos LLC公司的一位系统性能优化专家.

No related posts.

05:56 Using “GET DIAGNOSTICS integer_var = ROW_COUNT;” capture rows effect by the last SQL (1094 Bytes) » DBA@SKY-MOBI
在PostgreSQL中,执行insert,update,delete时,数据库都会返回给客户端有多少条SQL被insert,update或delete了。 如: oup=> insert into dual values (‘ok’),(‘ok’); INSERT 0 2 Time: 1.208 ms oup=> update dual set p_1=1; UPDATE 3 Time: 1.250 ms oup=> delete from dual; DELETE 3 同样,执行select时,除了返回给客户端所需要的记录之外,在结果的最后,也会附上有多少条记录返回了。 oup=> select * from dual; p_1 —– ok ok ok ok (4 rows) 在PLPGSQL中,如何来获得这些值呢? PostgreSQL提供了一个方法,GET DIAGNOSTICS variable = item [ , ... ]; 通过这个语法可以获得最近一条sql的row_count也可以获得最近一条SQL插入的表的OID; 如: 首先创建一个测试表 create table tbl_test(id int,col2 text); insert into tbl_test values (1,’a'),(2,’b'),(3,’c'),(4,’d'),(5,’e'); 然后创建一个函数: create or replace function f_test (i_1 text) returns setof int as $BODY$ declare v_1 int; begin update tbl_test [...]
04:38 [discard result sets]deal with jdbc batch select query abnormal (1582 Bytes) » DBA@SKY-MOBI
批量插入或修改时,使用和不使用JDBC Batch,其性能将有数量级的差距。所以很多适合的场合都会用到BATCH。但是用BATCH来做SELECT的场合真的很少见(或者用来调用函数)。 今天一位同事告诉我他用batch调用函数操作时,会抛出异常。但是调用update,insert,delete都不会异常。 首先分析一下,调用函数使用的是select 语句,有返回,而update,insert,delete都是没有返回(这里说的返回是指消息流里边定义的结果集)的语句。因为把函数的返回值写成VOID也不行。还是报错 一般SELECT返回的是这样的结果: oup=> select * from tbl_test; id | col2 —-+—— 2 | 123 3 | 123 1 | ok (3 rows) insert返回的是这样的结果: oup=> insert into  dual values(‘ok’); INSERT 0 1 这让我想到了在PostgreSQL的函数里面如果写了select语句的话,调用这个函数的时候会报如下错误: ERROR:  query has no destination for result data CONTEXT:  PL/pgSQL function “f_test” line 2 at SQL statement 使用perform替换SELECT的话就OK了,PERFORM丢弃结果集。 那么在batch中调用select是否会有同样的情况发生呢。 我让这位同事做了一个测试来印证一下我的想法: 将select转换为insert操作. 首先见一个规则表。 create table dual (p_1 type1,p_2 type2,p_3 type3) ; 这些字段用于给函数传递参数。 然后创建一个规则: create rule r_insert_dual as on delete  to dual  do instead select [...]
03:30 Sequential和Scattered的含义 (1987 Bytes) » Hello DBA

之前,我对Sequential和Scattered的理解是Oracle读取IO的顺序不同,今天又讨论起这个问题,另外一种解释是两者在内存中的保存方式有差异,Sequential read在内存中是连续的,而Scattered read是离散的。以下解释来自于Oracle文档:

Scattered read:

This event signifies that the user process is reading buffers into the SGA buffer cache and is waiting for a physical I/O call to return. A db file scattered read issues a scatter-read to read the data into multiple discontinuous memory locations. A scattered read is usually a multiblock read. It can occur for a fast full scan (of an index) in addition to a full table scan.

Sequential read:

This event signifies that the user process is reading buffers into the SGA buffer cache and is waiting for a physical I/O call to return. This call differs from a scattered read, because a sequential read is reading data into contiguous memory space. A sequential read is usually a single-block read.

这也从另外一个方面解释了Sequential和Scattered的含义。

–EOF–

03:26 Project Voldemort Performance Tool (1135 Bytes) » myNoSQL
Project Voldemort Performance Tool:

Project Voldemort gets a performance tool from Roshan Sumbaly :

  • Run using bin/voldemort-performance-tool.sh
  • Has a warmup phase to insert records (—record-count)
  • Various record selection distributions
  • Can fix client throughput to measure latency under certain load
03:01 Amazon Elastic MapReduce Upgrades Hadoop, Hive and Pig (1690 Bytes) » myNoSQL
Amazon Elastic MapReduce Upgrades Hadoop, Hive and Pig:

Amazon upgraded the set of tools to work with NoSQL data (and not only):

Customers can now take advantage of improved Hadoop performance and the following new features:

  • Multiple inputs class for reading multiple types of data.
  • Multiple outputs class for writing multiple types of data.
  • ChainMapper and ChainReducer which allows users to perform M+RM* within one Hadoop job. Previously customers could only run one mapper and one reducer per job.
  • Skip bad records in the dataset that cause jobs to fail. This allows a job to complete even if some records in a dataset are erroneous.
  • JVM reuse across task boundaries to increase performance when processing small files.
  • Support for bzip2 compression.
01:29 Comparing Document Databases to Key-Value Stores (4224 Bytes) » myNoSQL

Oren Eini has an interesting ☞ post emphasizing the main differences between document databases (f.e. CouchDB, MongoDB, etc.) and key-value stores (f.e. Redis, Project Voldemort, Tokyo Cabinet):

The major benefit of using a document database comes from the fact that while it has all the benefits of a key/value store, you aren’t limited to just querying by key.

One of the main advantages of data transparency (as opposed to opaque data) is that the engine will be able to perform additional work without having to translate the data into an intermediary or a format that it understands. Querying by non primary key is such an example. The various document stores provide different implementation flavors depending on index creation time, index update strategy, etc. Oren goes on and covers the query behavior for CouchDB, Raven and MongoDB:

In the first case (nb indexes prepared ahead of time), you define an indexing function (in Raven’s case, a Linq query, in CouchDB case, a JavaScript function) and the server will run it to prepare the results, once the results are prepared, they can be served to the client with minimal computation. CouchDB and Raven differs in the method they use to update those indexes, Raven will update the index immediately on document change, and queries to indexes will never wait. […] With CouchDB, a view is updated at view query time, which may lead to a long wait time on the first time a view is accessed if there were a lot of changes in the meanwhile. […]

Note that in both CouchDB and Raven’s cases, indexes do not affect write speed, since in both cases this is done at a background task.

MongoDB, on the other hand, allows ad-hoc querying, and relies on indexes defined on the document values to help it achieve reasonable performance when the data size grows large enough. MongoDB’s indexes behave in much the same way RDBMS indexes behave, that is, they are updated as part or the insert process, so large number of indexes is going to affect write performance.

Another good resource explaining the differences between MongoDB and CouchDB queries is Rick Osbourne’s ☞ article.

After RavenDB made his appearance in the NoSQL space we’ll probably have to compare it to the existing CouchDB and MongoDB features.

This is not to say that some of this functionality cannot be achieved with pure key-value stores, but these seem to be focused mainly on single/multi key lookups and most probably you’ll have to build this additional layer by yourself.

00:00 Links for 2010-06-02 [del.icio.us] (244 Bytes) » Chanel [K]

2010-06-02 Wed

17:19 PBXT in tpcc-like benchmark » MySQL Performance Blog