123
 123

Tip: 看不到本站引用 Flickr 的图片? 下载 Firefox Access Flickr 插件 | AD: 订阅 DBA notes --

2008-05-09 Fri

22:36 国家干部和烦人 (1375 Bytes) » 玉面飞龙的BLOG
昨天看完国家干部的大结局。最后一集市长夏中民住进了医院,来医院看望他的父老乡亲很多,场面描绘很感动。代表劳苦大众利益的官员最终战胜了贪污腐败谋取私利的官员。 这种”与人斗其乐无穷”的电视剧比同名小说好看多了。 根据同名小说改编的电视连续剧《国家干部》,深刻描写了党的执政者以人民利益为最高利益,立党为公、执政为民,坚持与地方势力、宗法势力和大小既得利益者进行斗争的感人故事。讴歌了党在执政能力建设中,众多可歌可泣的、用生命维护人民利益的…… 整部电视剧我就看到以王志文演的夏中民所带领的一小搓官员,一点点关乎民生的小事也要亲自过问亲自抓,表现出其它职位低的干部则阳奉阴违全不在心民生。也难怪中国政府的政令不通,完全是由于政府官员选拔的体制问题。 ===========分割线================= 出去旅游发现挺烦人的:要带手机,就要带手机充电器;要带相机,就要带相机电池和相机充电器。如果手机型号不同,多人旅游就要带多种充电器。 抽屉里又有充电的,又有用来连接电脑的数据线,错中复杂,纠缠在一起,不能够统一。这种”高科技”太让人烦了。
22:08 婚礼 (603 Bytes) » Uploads from dbanotes

dbanotes posted a photo:

婚礼

有点喜剧效果的一张。

老婆披的橙色纱巾,是公司的主 Logo 颜色。“阿里橙”。
后面的墙,是阿里巴巴“五年陈”员工的签名墙。临时做道具。

自己没有带相机,各对新人的摄影师不知道照片什么时候能给过来。

18:01 获取导致导入失败的数据(二) (686 Bytes) » yangtingkun
前不久从一个数据库执行导出操作时报错,通过直接路径方式跳过后,导入时候再次报错。推测是由于源数据库出现的异常导致表中数据超过表定义的精度。由于源数据库中错误记录已经被删除,因此只能想办法从导出的dmp文件中获取错误的记录。导出、导入过程的描述可以参考:EXP在9R2上导出时报错ORA-3113和ORA-24324:http://yangtingkun.itpub.net/post/468/460647EXP在9R2上导出时报错ORA-3113和ORA-24324(二):http://yangtingkun.itpub.net/post/468/460831上一篇通过触发器的方式没有获取到具体的错误数据,这篇继续修正方法继续找出异常...
11:40 Friday round-up (2337 Bytes) » Red Hat Magazine

Coming to the Red Hat Summit in a few weeks? Join us on Facebook.

The sci-fi and fantasy worlds must have been busy on the web this week, because that’s what’s been catching our eyes.

  • Have you ever wished you could play Lego Star Wars while it was being projected from a mobile R2D2 who could also dock your iPod? Now you can.
  • Comic book superhero Iron Man is on the side of… proprietary software? “According to series writer Matt Fraction, the battle between Tony Stark and new bad guy Ezekiel Stane is really just an allegory for the battle Bill Gates wages against smaller software providers every single day of his life.”
  • Would you rather read more about Tron Guy than Paris Hilton? ROFLCon, held a couple of weekends ago, describes itself as “a group dissection of internet culture.” The io9 blog poses it as “the future of celebrity.” Wired has a photo gallery from the event.
10:57 邀请您与YUPOO合作 (3818 Bytes) » Fenng's shared items in Google Reader

邀请您与YUPOO合作

    55VIP服务推出,和贴片广告试运行以来,我们收到了来自大家的各种声音,几天来我们并没有停滞不前,我们没有闭而不听。根据大家的意见和外链的实际需求,YUPOO团队设想了这个“与YUPOO合作”的服务,以谋求用户与YUPOO的双赢。

 

为什么要与Yupoo合作?

Yupoo广告互换”是Yupoo特别为博客或网站用户提供的资源合作服务,通过这一服务,实现用户与Yupoo的双赢。

谁能与Yupoo合作?

所有YUPOO注册用户都可以自愿加入,与Yupoo合作。

如何与Yupoo合作?

1合作页面获取Yupoo提供的广告代码添加到博客或网站首页。

2 提交审核通知Yupoo,通过审核即可。

如何实现双赢?

1 Yupoo用户在博客或网站首页为Yupoo提供展示区域。

2. 通过审核后,Yupoo为用户取消外链至该博客或网站上图片的贴片广告。审核工作将在一个工作日完成。

我们也会继续考虑其他的服务,与用户实现共赢。如果你愿意尝试,你可以加入Yupoo合作”.

 

此外,YUPOO已经努力请网络运营商提前清理服务器缓存,如果已经对你的帐号进行了取消水印的操作,那么缓存将在0~48小时内清除,您个人可以使用CTRL+F5清除本地缓存。

 

这里特别感谢用户PiscdongYupoo blog上留给我们的建议,很感谢大家直言不讳提出自己的意见,很感谢大家长期以来的支持与理解,Yupoo会在成为最好的照片分享社区继续努力。

 

09:37 Log Buffer #96: a Carnival of the Vanities for DBAs (5739 Bytes) » Pythian Group Blog » Log Buffer

This is the 96th edition of the weekly review of database blogs, Log Buffera>.

Let’s start this one in SQL Server Land, with a question from Dennis Goboshould SQL Server have the CREATE [OR REPLACE] PROCEDURE syntax? There are, he writes, advantages: “When scripting out a database you don’t have to generate if exists…..drop statements,” and disadvantages: “I can overwrite a proc without even knowing it.” Of course, the commenters have opinions of their own, and the piece becomes a straw poll for the desirability of that syntax as a feature.

Aaron Bertrand has one too: when was my database/table last accessed? Writes Aaron, “SQL Server does not track this information for you. SELECT triggers still do not exist. Third party tools are expensive and can incur unexpected overhead. And people continue to be reluctant or unable to constrain table access via stored procedures, which could otherwise perform simple logging.” He looks at 2008’s built-in auditing, and for those who can’t wait for that, illustrates a workaround for 2005.

Linchi Shea explores something else from 2008, Page Compression, focusing on how the number of processors affects the rebuilding a table with page compression.

Jamie Thomson, the SSIS Junkie writes that he has made a submission to Connect on the matter of absolute and relative paths in SSIS. “. . . I have always agreed that stipulating the use of absolute paths within SSIS was the right thing to do (and indeed I have championed it) however of late I have changed my mind. Support for relative paths would greatly simplify package deployment and package management . . . What do you think? Should SSIS support relative paths?” So far, it looks like a shoo-in.

Brian Knight also explains another little quirk, SSIS Case Sensitivity: “The case sensitivity can in some cases create behavior that is not expected and may give you bad results if you’re not careful.  . . . One such example is with the Lookup Transform, where comparisons against the cache are case sensitive. If you do not expect this, you may have a miss in a match that is actually a hit.”

In the MySQL ’sphere this week, there is plenty of talk about the openness or otherwise of MySQL. (more…)

Bookmark online using:These icons link to social bookmarking sites where readers can share and discover new web pages.
  • del.icio.us
  • digg
  • Reddit
  • Spurl
  • Furl
  • blogmarks
06:59 五一肇庆游 (1144 Bytes) » OracleBlog.cn

五一去了肇庆玩,肇庆的鼎湖山的风景还是很不错的,上面很多佛,老子遇神拜神遇佛拜佛……

06:26 记一次BT的数据订正—-全角半角统一 (3750 Bytes) » Alibaba DBA Team

最近业务部门提出一个需求, 数据库里有些字段中保存的信息里全角和半角字符参杂, 影响市容, 要求做数据订正, 全部统一.
其中英文/数字/英文标点符号 都统一用半角字符, 而日文字符都统一用全角字符. 而且提出订正的字段中还包含一些clob字段.
 
那就函数+游标, 慢慢订正吧, 分析了一下, 要写四个函数:
1. varchar2数据类型的, 全角英文变半角.
2. clob数据类型的, 全角英文变半角.
3. varchar2数据类型的, 半角日文变全角.
4. clob数据类型的, 半角日文变全角.
 
开工:
第一个, 还好, 有现成函数: to_single_byte.
TO_SINGLE_BYTE returns char with all of its multibyte characters converted to their corresponding single-byte characters. char can be of datatype CHAR, VARCHAR2, NCHAR, or NVARCHAR2. The value returned is in the same datatype as char.
Any multibyte characters in char that have no single-byte equivalents appear in the output as multibyte characters. This function is useful only if your database character set contains both single-byte and multibyte characters.
可惜呢, 不支持clob
Note:
This function does not support CLOB data directly. However, CLOBs can be passed in as arguments through implicit data conversion. Please refer to “Datatype Comparison Rules” for more information.
我把clob输入进去, 发现被自动截取了4000个字节, 然后做了处理, 输出的只有这4000个字节, 后面的字节全丢掉了.
   
第二个, 思路, 循环把clob拆成1000个字符(为什么是1000个字符, 因为我的字符集是UTF8的, 每个字符最多可能占用4个字节, 4000个字节悲观估计就只能容纳1000个字符), 每1000个字符调用to_single_byte, 最后把它们拼起来.
这里有一点要注意的是, 把它门拼起来的时候如果这样写:
v_clob:= v_clob || v_varchr2;
那得到的结果的数据类型还是varchar2, 而不是clob.
正确的写法是:
v_clob:= v_clob || to_clob(v_varchar2);
连接操作符”||”虽然对clob重载了, 但是当这个操作符的两边一边是varchar2另一边是clob时, 它是把clob转化为varchar2.
  
第三个, 惨. 没有现成的函数可以用. 网上查了下又咨询了懂日文的同事, 日文字符中可以用半角来写的字符不超过100个(替自己擦了把汗).
那就用translate吧, 结果发现不行. 因为有几个半角的日文字符, 它对应的全角字符不是一个而是半个,也就是说有些半角字符必须和另一个半角字符连在一起才表示一个全角字符的意思, 而translate函数只能一个字符变一个字符, 不能两个字符变一个字符.
只有用replace了. 思路, 把这几十个字符对应关系保存到一个表里, 然后对每条记录都replace. 效率是低了点, 但是总算可以实现功能.
  
第四个, 惨了, replace函数不支持clob输入.还好网上有现成的replace_clob, 抄下来用吧.
  
总算把四个函数都写出来了.
  
现在写游标来做订正了, 噼里啪啦, 也都写好了.
用的时候发现问题了, 有一个表, 上百万笔记录, clob字段的, 也要做这个订正.
我写的第三/四个函数本来效率就低, 又碰上大数据量的clob, 结果测试库一跑就报错了, 跑到20多万笔的时候把temp表空间撑爆了.
怎么办呢, 20多万笔报错, 那就一次只执行5万笔. 然后多次(30次)调用这个脚本.
终于解决了,脚本跑了7个小时,正确出来了结果.

06:13 Manual Optimisation - 2 (1 Bytes) » Oracle Scratchpad
05:15 大表驱动hash导致额外物理读 (6519 Bytes) » Alibaba DBA Team

以前对hash join时应该用大记录集做驱动表还是用小记录集做驱动表一直存在疑虑, 直到最近做了一个实验, 才真正体会到其间的差别.实验过程如下.
用大记录集做驱动的执行计划如下:
—————————————————————————————
| Id  | Operation              |  Name                | Rows  | Bytes |TempSpc| Cost  |
—————————————————————————————
|   0 | SELECT STATEMENT       |                      |     1 |    12 |       |  1784 |
|   1 |  SORT AGGREGATE        |                      |     1 |    12 |       |       |
|*  2 |   HASH JOIN            |                      |  1820K|    20M|    31M|  1784 |
|   3 |    INDEX FAST FULL SCAN| SAMPLES_MEMINFO_IND  |  1820K|    10M|       |  1077 |
|   4 |    INDEX FAST FULL SCAN| MEMBER_PK            |   373K|  2188K|       |   106 |
—————————————————————————————
用小记录集做驱动的执行计划如下:
—————————————————————————————
| Id  | Operation              |  Name                | Rows  | Bytes |TempSpc| Cost  |
—————————————————————————————
|   0 | SELECT STATEMENT       |                      |     1 |    12 |       |  1784 |
|   1 |  SORT AGGREGATE        |                      |     1 |    12 |       |       |
|*  2 |   HASH JOIN            |                      |  1820K|    20M|  6568K|  1784 |
|   3 |    INDEX FAST FULL SCAN| MEMBER_PK            |   373K|  2188K|       |   106 |
|   4 |    INDEX FAST FULL SCAN| SAMPLES_MEMINFO_IND  |  1820K|    10M|       |  1077 |
—————————————————————————————
看起来cost都是一样的, 只是对TempSpc(temp表空间)的估算不太一样, 大记录集的是31M, 小记录集的是6568K.
  
  
Set autot trace, 各自执行看看.
大记录集作驱动:
12499  consistent gets
 2849  physical reads
小记录集做驱动:
12499  consistent gets
    0  physical reads
以上结果是反复执行多次之后的结果.
可以看到:
1. 逻辑读都是一样的.
2. 大记录集做驱动时多了很多物理读, 而且无论我怎么反复执行这个SQL, 这些物理读总是一个也不少.
那么这些物理读是哪儿冒出来的呢?
  
  
看看10104 trace吧, 差别一下子就出来了.
首先来看hash桶的一些统计,
大记录集作驱动:
### Hash table overall statistics ###
Total buckets: 4194304 Empty buckets: 4173603 Non-empty buckets: 20701
Total number of rows: 1820941
Maximum number of rows in a bucket: 26932
Average number of rows in non-empty buckets: 87.963915
小记录集作驱动:
### Hash table overall statistics ###
Total buckets: 524288 Empty buckets: 257306 Non-empty buckets: 266982
Total number of rows: 373588
Maximum number of rows in a bucket: 7
Average number of rows in non-empty buckets: 1.399300
可以看到, hash桶的总数/构建hash的总记录数/平均每个hash桶里的记录数 都是前者大. 而hash size是有限的, 那么内存中能承载的hash桶的数目也是有限的, 过多的hash桶会导致频繁的访问io, 读取hash桶. 而平均每个hash桶里的记录数太大, 又导致每个hash到的值在一个桶内匹配计算的次数增加(这些是CPU的负担, 在trace里甚至都看不到).
  
  
但是真正导致物理读的原因还不在于hash匹配, 而是创建hash桶. 我们再来看看10104 trace中对hash桶构建时的信息记录.
在大记录集做驱动时10104 trace文件中的构建hash桶这段, 多了很多这种东西:
kxhfRead(): Reading dba=156615 into slot=80
kxhfIsDone: waiting slot=80 lbs=0xb442fd40
我grep了一下, 执行一次这个SQL, Reading/waiting 总共有80次.而小记录集做驱动时一个这个操作都没有.
  
  
结论:
1. hash的时候一定要用小记录集做驱动.
2. 大/小记录集作驱动时, 读取数据文件的cost两者是一样的.但是前者的记录集在大到一定程度的时候, 在构建hash桶会产生很多物理读, 而且这些物理读根本无法消除, 每次执行都会产生.
   
结合工作:
1. 我们平常接触的OLTP系统可能用hash的时候比较少, 即使用到hash, 也不会是join太大的记录集, 除了搜索引擎的dump等特别SQL.
2. 但是对于OLAP系统来说, 几乎所有的SQL都是这样的大记录集的hash join, 这个时候就要尤其注意用小记录集来驱动大记录集.昨天帮数据仓库部门优化了一个SQL, 四个表做join, 其中三个表的有效记录集都是1千万左右, 另一表的有效记录集则只有几十笔. 原来的执行计划里用大表做驱动, 跑了几个小时, 最后把temp表空间撑爆了也没有执行出结果. 后来我就用这个小表做驱动, 结果两分钟就跑出了结果.

05:03 随笔 (1284 Bytes) » Think in 88
        今天叶辉结婚,我的第一个女朋友也是今天结婚~祝他们幸福~
   晚上公司发项目,大概2,3点结束~希望一切顺利~
   公司股票真奇怪,业绩越好,跌得越惨~forget it~
        以前没接触过AIX,觉得很神秘,现在发现如果只是简单的维护,确实很简单~很多东西不想懂的时候,懂的时候却发现不过如此~hacmp也是这样,存储也是这样~公司同工不同酬的现象确实挺重的~anyway~努力吧~快乐最重要~宽慰一下~
   周末本想去书店的,看来都泡汤了,周日下午还有个项目预发布,期待下一个周末吧~
   MM去宁波了,去读在职研究生前的培训,希望她可以顺利上线读研~   
   
04:18 Speak to Me (665 Bytes) » eagle's home

两年前我推荐给大家两个锻炼英语口语的播客网站:推荐两个不错的英文播客

现在EnlishPod已经不再更新了,但是以前的节目还是可以下载的,聊天性质的podcast,我蛮喜欢的

ESLPOD还是在持续不断的更新,语速慢了点,侧重于语言点的讲解

今天看到另一个学习英语的网站,办的也蛮不错的,推荐给大家: Speak2me

BTW, if you are learning Chinese, here is a wonderful chinese podcast website for you: ChinesePod

03:03 微软发布 Windows XP SP3 补丁包 (3308 Bytes) » Oracle Life

©作者:eygle 发布在 eygle.com

整天研究Oracle的Patch Set了,现在微软终于也出了一个Windows Xp的SP3补丁包。

今天在微软的Update站点上自动更新已经可以下载这个补丁包了
在线更新补丁包大小仅为60.1M,补丁号为:KB936929

不过微软的站点更新一贯非常缓慢,我1M的带宽,显示需要2个多小时来完成下载。

无奈还是去下载一个独立安装包,独立安装包大小为300M左右,使用迅雷可以在40分钟内完成下载,以下是几个常用版本下载地址:
简体中文版:
http://download.microsoft.com/download/e/d/9/ed9b7861-4e43-4c62-a007-4e93aa109602/WindowsXP-KB936929-SP3-x86-CHS.exe
英文版 | 繁体中文 | 香港中文

除此之外,微软还提供ISO镜像文件下载。

此前,很多宣传声称Xp Sp3会显著提升性能,不过在这个发布中,微软提示:Windows XP SP3 包含少量新的更新程序,但不会显著改变Windows Xp体验

看来对这个补丁我们还是不能有太多的期待。

-The End-

相关文章|Related Articles

评论数量(0)|Add Comments

本文网址:

01:19 MSTR 的deshboard特色 (3658 Bytes) » 淘宝数据仓库团队

MSTR 在8.1 版本后就有了DESHBOARD 仪表盘功能,使用后总结了一些特点,供大家参考

MSTR仪表盘 的特色

一:MSTR的DASHBOARD的原理,是将所有已经建立好的报表作为数据集,导入到仪表盘编辑页面。并且可选择需要展现的维和度量作为内容,来展现。也就是一张报表可以展现成多种数据形式,这点相对于BIEE显得灵活,BIEE的是将已经建立号的报表直接插入DASHBORADS中,内容和报表中完全一样。

二:MSTR的仪表盘相对来说操作很方便。他可以插入的东西主要有:
面板集,控制器,文本框,矩形框,仪表盘的WIDGET,图片,表格,直线这7项。
感觉面板集可以作为展现的主要框架,也可以当作一个“组合”来用,是在仪表盘设置中使用比较多的一个工具。

三:MSTR的控制器是操作相当方便的控制器。
之前使用过BIEE的列选择器和筛选、分页等功能,其实在MSTR中这样的功能实现起来相对来说比BIEE简便。
控制器可以选择控制三种:面板集,度量和实体。而控制器的展现形式也很多样化:可以水平展现,也可以垂直展现。可以选择按照按钮条、下拉条、复选框、列表框、单选框、单选按钮、滚动条、还有连接条,各种形式展现。很有特色的是滑动条,在控制时间的时候,可以随意拖动到开始时间和结束时间来控制图表。

四 HEATMAP 是MSTR的全新的一种图形展现形式,通过面积和颜色来显示各个纬度的值得大小。

五 MSTR DESHBOARD 的一大特色就是可以展现FLASH模式,可以将页面上的各种展现效果做成动画形式,很吸引眼球。不过FLASH模式还不是很成熟,在有些图表不支持FLASH的时候要不是直接报错,要不就是整个画面出不来。而且饼图会比交互模式下的大,变的很模糊。使用FLASH模式在打开的时候速度也会变得很慢。
所以FLASH是让人又爱又恨的一个功能。

MSTR的Deshboard  的一些问题:
一: 不方便钻取数据,图形形式下 不支持钻取 这点不如BIEE 。

二:图形展现方面,MSTR没有BIEE展现的漂亮。另外,在设计中,对于哪个线条按什么展现这样的设置,在BIEE中直接用选择就可以实现。但是MSTR中要不断的调整行列和各种设置,才能达到预期的效果,对于整体的画面效果,也都需要精心调制才能达到很美观的效果。UI 的工作量比较大。

三 :DESHBOARD开发中,要察看展现的效果需要不停的切换于各个模式之间。这个给开发效率造成影响。

四:对于文本,感觉没有BIEE有特色,因为BIEE的文本有叙述和静态,也可以自行编写HTML。即使不自己写HTML,也有可以来回滚动等功能,所以文本展现上还是BIEE有特色。

MSTR仪表盘 展现例子: 

desh.JPG

00:45 RAC的高可用也不是那么好用 (24505 Bytes) » OracleBlog.cn

oracle一直在“鼓吹”着他的rac是如何如何的高可用,如何如何的可以实现针对应用透明的failover,但是,在实际的使用过程中,要完全实现这样的透明,条件是何等的苛刻。

先从一次故障说起吧。某天中午正在吃饭的时候,突然接到电话做应用程序连接数据库挂死了,并且也在客户端测试连接数据库也是挂死,长时间没有响应。

该省的数据库是2节点的rac数据库,登录数据库后发现rac2已经宕机,只留着侦听还没宕。观察到rac2的alertlog为:

Tue Apr 29 10:12:10 2008
Thread 2 advanced to log sequence 11999
  Current log# 8 seq# 11999 mem# 0: /dev/vg_rac1/rG1_redo_512m_204
  Current log# 8 seq# 11999 mem# 1: /dev/vg_rac2/rG2_redo_512m_214
Tue Apr 29 10:12:10 2008
  Current log# 8 seq# 11999 mem# 2: /dev/vg_rac3/rG3_redo_512m_224
Tue Apr 29 10:12:10 2008
ARC1: Evaluating archive   log 6 thread 2 sequence 11998
Tue Apr 29 10:12:10 2008
ARC1: Beginning to archive log 6 thread 2 sequence 11998
Creating archive destination LOG_ARCHIVE_DEST_1: '/archlog2/fjmisc2_2_11998.arc'
Tue Apr 29 10:12:32 2008
ARC1: Completed archiving  log 6 thread 2 sequence 11998
Tue Apr 29 10:42:18 2008
alter tablespace DATA_PORTAL add datafile '/dev/vg_rac1/rG1_data_2g_185' size 2047m
Tue Apr 29 10:42:50 2008
Completed: alter tablespace DATA_PORTAL add datafile '/dev/vg
Tue Apr 29 12:16:59 2008
Communications reconfiguration: instance 0
Waiting for clusterware split-brain resolution
Tue Apr 29 12:17:32 2008
Trace dumping is performing id=[80429121659]
Tue Apr 29 12:18:09 2008
Trace dumping is performing id=[80429121745]
Tue Apr 29 12:27:04 2008
Errors in file /oracle/app/oracle/admin/fjmisc/bdump/fjmisc2_lmon_6757.trc:
ORA-29740: evicted by member 0, group incarnation 5
Tue Apr 29 12:27:04 2008
LMON: terminating instance due to error 29740
Tue Apr 29 12:27:04 2008
Errors in file /oracle/app/oracle/admin/fjmisc/bdump/fjmisc2_pmon_6753.trc:
ORA-29740: evicted by member , group incarnation
Tue Apr 29 12:27:04 2008
Errors in file /oracle/app/oracle/admin/fjmisc/bdump/fjmisc2_dbw0_6765.trc:
ORA-29740: evicted by member , group incarnation
Tue Apr 29 12:27:05 2008
Errors in file /oracle/app/oracle/admin/fjmisc/bdump/fjmisc2_lck0_6793.trc:
ORA-29740: evicted by member , group incarnation
Instance terminated by LMON, pid = 6757

根据报错信息,看到是ora-29740的错误,这个一般是rac之间的通信问题,导致一个节点被踢出cluster group。具体的错误分析,还是要看trace文件:

*** 2008-04-29 12:16:59.317
kjxgrgetresults: Detect reconfig from 0, seq 4, reason 3
kjxgrrcfgchk: Initiating reconfig, reason 3
*** 2008-04-29 12:17:04.332
kjxgmrcfg: Reconfiguration started, reason 3
kjxgmcs: Setting state to 4 0.
*** 2008-04-29 12:17:04.347
     Name Service frozen
kjxgmcs: Setting state to 4 1.
*** 2008-04-29 12:17:04.392
kjxgrrecp2: Waiting for split-brain resolution, upd 0, seq 5
*** 2008-04-29 12:27:04.479
Voting results, upd 0, seq 5, bitmap: 0
*** 2008-04-29 12:27:04.479
kjxgrdtrt: Evicted by 0, seq (5, 5)
IMR state information
  Member 1, thread 2, state 4, flags 0x00a1
  RR seq 5, propstate 3, pending propstate 0
  rcfg rsn 3, rcfg time 2235139911, mem ct 2
  master 1, master rcfg time 2235139911
 
  Member information:
    Member 0, incarn 4, version 122818100
      thrd 1, prev thrd 65535, status 0x0047, err 0x0002
      valct 0
    Member 1, incarn 4, version 134210
      thrd 2, prev thrd 65535, status 0x0107, err 0x0000
      valct 2
 
Group name: FJMISC
Member id: 1
Cached SKGXN event: 0
Group State:
  State: 4 1
  Commited Map: 0 1
  New Map: 0 1
  SKGXN Map: 0 1
  Master node: 0
  Memcnt 2  Rcvcnt 0
  Substate Proposal: false
Inc Proposal:
  incarn 0  memcnt 0  master 0
  proposal false  matched false
  map:
Master Inc State:
  incarn 0  memcnt 0  agrees 0  flag 0x1
  wmap:
  nmap:
  ubmap:
Submitting asynchronized dump request [1]
error 29740 detected in background process
ORA-29740: evicted by member 0, group incarnation 5
ksuitm: waiting for [5] seconds before killing DIAG
oracle@FJ_DB02:/oracle/app/oracle/admin/fjmisc/bdump >

在这里我们看到是由于ora-29740的reason 3引起宕机的,查询metalink(Note:219361.1)上得知:

For troubleshooting ORA-29740 errors, the 'reason' will be very important.
In the above example, the first section indicates the reason for the
initiated reconfiguration. The reasons are as follows:
 
Reason 0 = No reconfiguration
Reason 1 = The Node Monitor generated the reconfiguration.
Reason 2 = An instance death was detected.
Reason 3 = Communications Failure
Reason 4 = Reconfiguration after suspend
 
For ORA-29740 errors, you will most likely see reasons 1, 2, or 3.
 
-----------------------------------------------------------------------------
 
Reason 1: The Node Monitor generated the reconfiguration. This can happen if:
 
a) An instance joins the cluster
b) An instance leaves the cluster
c) A node is halted
 
It should be easy to determine the cause of the error by reviewing the alert
logs and LMON trace files from all instances. If an instance joins or leaves
the cluster or a node is halted then the ORA-29740 error is not a problem.
 
ORA-29740 evictions with reason 1 are usually expected when the cluster
membership changes. Very rarely are these types of evictions a real problem.
 
If you feel that this eviction was not correct, do a search in Metalink or
the bug database for:
 
ORA-29740 'reason 1'
 
Important files to review are:
 
a) Each instance's alert log
b) Each instance's LMON trace file
c) Statspack reports from all nodes leading up to the eviction
d) Each node's syslog or messages file
e) iostat output before, after, and during evictions
f) vmstat output before, after, and during evictions
g) netstat output before, after, and during evictions
 
There is a tool called "OS Watcher" that is being developed that helps gather
this information. For more information on "OS Watcher" see Note 301137.1
"OS Watcher User Guide".
 
-----------------------------------------------------------------------------
 
Reason 2: An instance death was detected. This can happen if:
 
a) An instance fails to issue a heartbeat to the control file.
 
When the heartbeat is missing, LMON will issue a network ping to the instance
not issuing the heartbeat. As long as the instance responds to the ping,
LMON will consider the instance alive. If, however, the heartbeat is not
issued for the length of time of the control file enqueue timeout, the
instance is considered to be problematic and will be evicted.
 
Common causes for an ORA-29740 eviction (Reason 2):
 
a) NTP (Time changes on cluster) - usually on Linux, Tru64, or IBM AIX
b) Network Problems (SAN).
c) Resource Starvation (CPU, I/O, etc..)
d) An Oracle bug.
 
 
Common bugs for reason 2 evictions:
Bug 2820871 - Abrupt time adjustments can crash instance with ORA-29740
(Reason 2) (Linux Only)
Fixed-Releases: 9204+ A000
Bug 3917158 - ORA-29740 and a false instance eviction can occur (Reason 2)
(IBM AIX Only)
Fixed Releases: 9206+
 
If you feel that this eviction was not correct, do a search in Metalink or the
bug database for:
 
ORA-29740 'reason 2'
 
Important files to review are:
 
a) Each instance's alert log
b) Each instance's LMON trace file
c) Statspack reports from all nodes leading up to the eviction
d) The CKPT process trace file of the evicted instance
e) Other bdump or udump files...
f) Each node's syslog or messages file
g) iostat output before, after, and during evictions
h) vmstat output before, after, and during evictions
i) netstat output before, after, and during evictions
 
There is a tool called "OS Watcher" that is being developed that helps gather
this information. For more information on "OS Watcher" see Note 301137.1
"OS Watcher User Guide".
 
-----------------------------------------------------------------------------
 
Reason 3: Communications Failure. This can happen if:
 
a) The LMON processes lose communication between one another.
b) One instance loses communications with the LMS, LMD, process of
another instance.
c) The LCK processes lose communication between one another.
d) A process like LMON, LMD, LMS, or LCK is blocked, spinning, or stuck
and is not responding to remote requests.
 
In this case the ORA-29740 error is recorded when there are communication
issues between the instances. It is an indication that an instance has been
evicted from the configuration as a result of IPC send timeout. A
communications failure between processes across instances will also generate a
ORA-29740 with reason 3. When this occurs, the trace file of the process
experiencing the error will print a message:
 
Reporting Communication error with instance:
 
If communication is lost at the cluster layer (for example, network cables
are pulled), the cluster software may also perform node evictions in the
event of a cluster split-brain. Oracle will detect a possible split-brain
and wait for cluster software to resolve the split-brain. If cluster
software does not resolve the split-brain within a specified interval,
Oracle proceeds with evictions.
 
Oracle Support has seen cases where resource starvation (CPU, I/O, etc...) can
cause an instance to be evicted with this reason code. The LMON or LMD process
could be blocked waiting for resources and not respond to polling by the remote
instance(s). This could cause that instance to be evicted. If you have
a statspack report available from the time just prior to the eviction on the
evicted instance, check for poor I/O times and high CPU utilization. Poor I/O
times would be an average read time of > 20ms.
 
Common causes for an ORA-29740 eviction (Reason 3):
 
a) Network Problems.
b) Resource Starvation (CPU, I/O, etc..)
c) Severe Contention in Database.
d) An Oracle bug.

OK,那么我们基本将问题定位在网络方面的原因,继续查看系统的syslog:

Apr 29 11:59:10 FJ_DB02 su: + 0 root-oracle
Apr 29 12:11:02 FJ_DB02 cmcld: lan1 failed
Apr 29 12:11:02 FJ_DB02 cmcld: Subnet 10.203.17.0 switching from lan1 to lan2
Apr 29 12:11:02 FJ_DB02 cmcld: lan1 switching to lan2
Apr 29 12:11:02 FJ_DB02 cmcld: Subnet 10.203.17.0 switched from lan1 to lan2
Apr 29 12:11:02 FJ_DB02 cmcld: lan1 switched to lan2
Apr 29 12:11:22 FJ_DB02 cmcld: lan1 recovered
Apr 29 12:11:22 FJ_DB02 cmcld: Subnet 10.203.17.0 switching from lan2 to lan1
Apr 29 12:11:22 FJ_DB02 cmcld: lan2 switching to lan1
Apr 29 12:11:22 FJ_DB02 cmcld: Subnet 10.203.17.0 switched from lan2 to lan1
Apr 29 12:11:22 FJ_DB02 cmcld: lan2 switched to lan1
Apr 29 12:14:15 FJ_DB02 vmunix: NFS server FJ_DB01 not responding still trying
Apr 29 12:14:59 FJ_DB02 cmcld: HB connection to 10.203.17.1 not responding, closing
Apr 29 12:14:59 FJ_DB02 cmcld: GS connection to 10.203.17.1 not responding, closing
Apr 29 12:18:59 FJ_DB02 cmcld: GS connection to 10.203.17.1 is responding
Apr 29 12:19:26 FJ_DB02 vmunix: NFS server FJ_DB01 ok
Apr 29 12:20:59 FJ_DB02 cmcld: HB connection to 10.203.17.1 is responding
Apr 29 12:22:47 FJ_DB02 xntpd[3717]: synchronized to 218.200.246.196, stratum=1

发现主机确实出现了网络问题,在rac2上连接不到rac1(10.203.17.1),此时应用挂死,怀疑是应用发起的连接去了rac2,而rac2的lsnr还是打开的但是instance已经宕机,由于配置了local lsnr和remote lsnr,在客户端也配置了load balance和failover,当应用的连接随机的连接到实例时候,由于配置了load balance和remote lsnr,需要检查对方节点的服务器负载,由于网络不正常,无法确定对方节点情况,因此挂死。关闭rac2上的lsnr后,应用恢复正常!!

从主机的syslog中进一步观察,主机的网络在12:18已经恢复,网络开始responding。ping rac1也是正常,检查cluster软件没有问题后,开始重启rac2的instance。

问题是,启动的时候再次出现了问题,本来rac1上的连接都是好的,且应用也是顺利的连接到rac1,可是在重启的时候,所有的应用挂死。由于考虑到此时两个节点是在做内存同步,对split brain的东西做回复。在这段时间内的rac2和rac1都会很慢,就让应用先等了一会……谁知过了15分钟了,rac2还是没起来,而且rac2 instance再次被lmon终止了。

汗!看来网络还是有问题!!白天不敢动了,怕再次影响业务,到晚上继续捣鼓……

晚上拉上了网络工程师,发现还是有“FJ_DB02 cmcld: GS connection to 10.203.17.1 not responding, closing”的报错,仔细的把网络检查了一遍:

$ sqlplus "/ as sysdba"
SQL> oradebug setmypid
SQL> oradebug ipc
SQL> exit;

发现在trace文件中:

SSKGXPT 0x1aa5e4 flags SSKGXPT_READPENDING     info for network 0
    socket no 9     IP 10.203.17.2     UDP 57854
<------此处
    
sflags SSKGXPT_UP
    
info for network 1
    
socket no 0     IP 0.0.0.0     UDP 0
    
sflags SSKGXPT_DOWN
context timestamp 0
    
no ports
    
sconno     accono   ertt  state   seq#   sent  async   sync rtrans   acks
      
ach     accono     sconno      admno  state lstack nxtseq    rcv rtrans   acks credts

发现此处配的10.203.17.2,即应用的连接的网段,没有用private网段,进一步检查:

SQL> show parameter cluster             
 
NAME                                 TYPE        VALUE
----------------------------------
-- ----------- ------------------------------
cluster_database                     boolean     TRUE
cluster_database_instances           integer     2
cluster_interconnects                string

发现也是没有配private的网卡。

由于这个省的rac的网络配置是这样的:
lan0 : 192.168.0.0
lan1 : 10.103.17.0
lan2 : 0.0.0.0
其中,lan2是lan1的备用。网卡lan1之间的连接是经过两台互为冗余的交换机,而lan0则是通过交叉线直接连接。应用程序通过10.103.17.0网段的IP连接数据库,因此,而lan0对于数据库来讲目前是没有用上的。lan1上的问题导致了此次rac2的宕机。

通过网络工程师的进一步ping包,发现lan1的接口上有丢包的现象,由于是少量的不稳定的丢包,并不是网络全断,因此也没有切换到lan2上。因此造成因为丢包而split brain,从而rac2实例宕机。而进一步的咨询现场了解到,在当天中午的12点多确实有人拔动过lan1的网线。可能是拔动后没有插紧,造成了这次宕机。

解决方法:重新插拔lan1的网线,更换网线,更换网口后,节点间的通信回复正常。顺利的启动rac2.

后续改进:将直连的interconnect改为通过交换机连接(需千兆网络),配置cluster_interconnects=ip_private.

总结:rac其实要求还是挺严格的。首先,在硬件上,要求千兆private网络用于interconnect,如果有多块网卡且既能做为interconnect通信又能用于应用连接,需要配置cluster_interconnects进行指定。load balance其实在2处地方可以做load balance:一处是在服务器端,根据配置的remote lsnr和local lsnr来做负载均衡,即在客户端指定连接rac1也有可能在服务器端做再次调整连去rac2;另一处是在客户端,通过客户端的tnsnames来配置load balance,客户端根据tnsnames中的address list随机的连接服务器端。至于最终连接到哪个节点,要根据服务器端的lsnr再做调整(感觉这不是很好控制,如果同时配置了客户端的load balance和服务器端的load balance,基本不可预知在客户端发起的连接会去哪个节点。更何况,oracle美其名曰说是通过cpu的负载来实现load balance,至于其中如何的运算cpu的比例,还是一个黑盒子。)。至于failover这个功能,感觉只是和已经连接上rac的实例select有关,update、insert还是基本不能实现TAF,至于新发起的连接,和failover没啥关系了……

2008-05-08 Thu

17:59 广州招聘初级DBA,欢迎大家投递简历 » OracleDBA Blog---请享受无法回避的痛苦!
17:31 BEFORE触发器修正数据错误 » yangtingkun
08:15 Convert Images to Base64 TextMate bundle » Fenng's shared items in Google Reader
06:44 Oracle Database 10.2.0.4 Released » Ricky's Test Blog
06:00 iPhone 251 » Uploads from dbanotes
06:00 D4: 门口的广告牌 » Uploads from dbanotes
06:00 D4: 有些坐不下 » Uploads from dbanotes
06:00 iPhone 267 » Uploads from dbanotes
06:00 D4: 边吃边听 » Uploads from dbanotes
06:00 iPhone 270 » Uploads from dbanotes
06:00 iPhone 276 » Uploads from dbanotes
06:00 D4: 评估公式 » Uploads from dbanotes
06:00 iPhone 272 » Uploads from dbanotes
06:00 明天结婚 » DBA notes
06:00 iPhone 252 » Uploads from dbanotes
06:00 D4: 站着也有不少同学 » Uploads from dbanotes
06:00 iPhone 253 » Uploads from dbanotes
06:00 iPhone 257 » Uploads from dbanotes
06:00 iPhone 236 » Uploads from dbanotes
05:59 iPhone 260 » Uploads from dbanotes
05:59 iPhone 242 » Uploads from dbanotes
05:59 iPhone 266 » Uploads from dbanotes
05:59 iPhone 237 » Uploads from dbanotes
05:59 iPhone 254 » Uploads from dbanotes
04:30 了解 EFS » developerWorks : AIX 专区的文章,教程
03:09 关于用户体验 » Fenng's shared items in Google Reader
02:42 武夷山旅游记 » 玉面飞龙的BLOG
01:38 Facebook 将改进个人资料页面的细节 » Fenng's shared items in Google Reader
00:20 JavaOne: Day 3 » Red Hat Magazine
00:14 关于Microsoft SQL Server的Scalability之讨论(1) » Fenng's shared items in Google Reader
00:10 AbiWord team interview » Red Hat Magazine

2008-05-07 Wed

23:10 让手中的低档佳能照相机脱胎换骨 » Fenng's shared items in Google Reader
22:15 很白很强大 » Fenng's shared items in Google Reader
19:41 10G里面启用资源计划管理 » Alibaba DBA Team
19:30 探索 AIX 6:WPAR 基本管理 » developerWorks : AIX 专区的文章,教程
18:02 获取导致导入失败的数据 » yangtingkun
13:32 Multi Table Loop » DBASupport
13:26 JavaOne: Day 2 » Red Hat Magazine
09:14 生活锁事::无线网络 » AnySQL.net
07:34 When Rebuild INDEX? » Alibaba DBA Team
07:15 PR=4 » Ricky's Test Blog
07:08 小猪猪新发型 » Ricky's Test Blog
01:49 您会选择什么编码? » Fenng's shared items in Google Reader
00:02 Protected: 绝对爆料 » OracleDBA Blog---请享受无法回避的痛苦!

2008-05-06 Tue