
2012-04-06 Fri

19:38 MongoDB Architecture Overview (1952 Bytes) » myNoSQL
MongoDB Architecture Overview:

Ricky Ho takes a look at MongoDB and summarizes his thoughts about:

  • Major differences between MongoDB and RDBMS
  • MongoDB query processing, data update, and transaction
  • Storage model
  • Replication model
  • Sharding model
  • MongoDB Map-Reduce execution

concluding:

One thing I am very impressed by MongoDb is that it is extremely easy to use and the underlying architecture is also very easy to understand.

He’s definitely not the only one.

MongoDB Storage Model

Original title and link: MongoDB Architecture Overview (NoSQL database©myNoSQL)

19:31 Cardinality Estimation Algorithms: Memory Efficient Solutions for Counting 1 Billion Distinct Objects (2001 Bytes) » myNoSQL
Cardinality Estimation Algorithms: Memory Efficient Solutions for Counting 1 Billion Distinct Objects:

Matt Abrams from Clearspring:

Cardinality estimation algorithms trade space for accuracy. To illustrate this point we counted the number of distinct words in all of Shakespeare’s works using three different counting techniques. Note that our input dataset has extra data in it so the cardinality is higher than the standard reference answer to this question. The three techniques we used were Java HashSet, Linear Probabilistic Counter, and a Hyper LogLog Counter. Here are the results:

Cardinality estimation algorithms
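
To make the space-for-accuracy trade concrete, here is a minimal sketch of a linear probabilistic counter, one of the three techniques named in the quote. It is not Clearspring's implementation: the class name, the SHA-1 hash, the bitmap size, and the synthetic `word-N` input are all assumptions chosen for illustration.

```python
import hashlib
import math


class LinearCounter:
    """Linear probabilistic counter: one bitmap of num_bits bits, one hash.

    Hypothetical illustration only; names and parameters are assumptions.
    """

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        self.bitmap = bytearray((num_bits + 7) // 8)

    def add(self, item: str) -> None:
        # Hash the item to a bit position and set that bit.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % self.num_bits
        self.bitmap[index // 8] |= 1 << (index % 8)

    def estimate(self) -> float:
        # Estimate n as -m * ln(V), where V is the fraction of bits still zero.
        set_bits = sum(bin(byte).count("1") for byte in self.bitmap)
        zero_bits = self.num_bits - set_bits
        if zero_bits == 0:
            return float("inf")  # bitmap saturated, no usable estimate
        return -self.num_bits * math.log(zero_bits / self.num_bits)


def demo() -> tuple[int, float]:
    # Synthetic input: 50,000 items, of which 10,000 are distinct.
    words = [f"word-{i % 10_000}" for i in range(50_000)]
    exact = len(set(words))  # exact count, keeps every distinct item in memory
    counter = LinearCounter(num_bits=1 << 16)  # 8 KiB bitmap
    for word in words:
        counter.add(word)
    return exact, counter.estimate()


if __name__ == "__main__":
    exact, approx = demo()
    print(f"exact={exact} estimate={approx:.0f}")
```

The set-based count has to hold every distinct item, while the counter holds a fixed 8 KiB bitmap regardless of input size. The price is an approximate answer, and a bitmap that is too small for the data saturates and stops giving a usable estimate.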

Original title and link: Cardinality Estimation Algorithms: Memory Efficient Solutions for Counting 1 Billion Distinct Objects (NoSQL database©myNoSQL)

19:22 A Different Big Data Definition and What Data Scientists Are and Do (1588 Bytes) » myNoSQL
A Different Big Data Definition and What Data Scientists Are and Do:

Dr Rami Mukhtar, cited by Divina Paredes reporting for PCAdvisor from the Big Data Symposium in Sydney:

Big Data is the opportunity to really collect data sources, both big and small, in their source form, or their raw form, in one location or one place, unencumbered by the boundaries of a business or the boundaries of information silos across the business

Original title and link: A Different Big Data Definition and What Data Scientists Are and Do (NoSQL database©myNoSQL)

19:15 MongoDB Features Roadmap: Full Text Search, Data Compression, Schema Validation (1855 Bytes) » myNoSQL
MongoDB Features Roadmap: Full Text Search, Data Compression, Schema Validation:

Kenneth Falck shares on his blog what he learned at the recent MongoDB event in Stockholm, covering:

  • MongoDB indexing
  • MongoDB replica sets
  • MongoDB sharding and performance

The one bit I want to emphasize before you read his post:

10gen has a shortlist of features they would like to develop soon. Full text search is at the top. Other things included are at least data compression and possibly schema validation as a related feature.

From a developer-friendliness perspective, MongoDB is the most attractive NoSQL database, and these features will make it even more so.

Original title and link: MongoDB Features Roadmap: Full Text Search, Data Compression, Schema Validation (NoSQL database©myNoSQL)

19:02 What Are the Pros and Cons of Running Cloudera's Distribution for Hadoop vs Amazon Elastic MapReduce Service? (3765 Bytes) » myNoSQL

Old Quora question, but still very relevant. Top response from Jeff Hammerbacher:

Elastic MapReduce Pros:

  • Dynamic MapReduce cluster sizing.
  • Ease of use for simple jobs via their proprietary web console.
  • Great documentation.
  • Integrates nicely with other Amazon Web Services.

Cloudera Distribution for Hadoop:

  • CDH is open source; you have access to the source code and can inspect it for debugging purposes and make modifications as required.
  • CDH can be run on a number of public or private clouds using an open source framework, Whirr, so you’re not tied to a single cloud provider
  • With CDH, you can move your cluster to dedicated hardware with little disruption when the economics make sense. Most non-trivial applications will benefit from this move.
  • CDH packages a number of open source projects that are not included with EMR: Sqoop, Flume, HBase, Oozie, ZooKeeper, Avro, and Hue. You have access to the complete platform composed of data collection, storage, and processing tools.
  • CDH packages a number of critical bug fixes and features and the most recent stable releases, so you’re usually using a more stable and feature-rich product.
  • You can purchase support and management tools for CDH via Cloudera Enterprise.
  • CDH uses the open source Oozie framework for workflow management. EMR implemented a proprietary “job flow” system before major Hadoop users standardized on Oozie for workload management.
  • CDH uses the open source Hue framework for its user interface. If you require new features from your web interface, you can easily implement them using the Hue SDK.
  • CDH includes a number of integrations with other software components of the data management stack, including Talend, Informatica, Netezza, Teradata, Greenplum, Microstrategy, and others. […]
  • CDH has been designed and deployed in common Linux environments and you can use standard tools to debug your programs. […]

Make sure you also read Hadoop in the Cloud: Pros and Cons which addresses (almost) the same question.

A Twitter-style answer to this question would be: “Control and customization vs Automated and Managed Service”. 80 characters left to add your own perspective.

Original title and link: What Are the Pros and Cons of Running Cloudera’s Distribution for Hadoop vs Amazon Elastic MapReduce Service? (NoSQL database©myNoSQL)

18:22 DynamoDB Libraries, Mappers, and Mock Implementations (2029 Bytes) » myNoSQL
DynamoDB Libraries, Mappers, and Mock Implementations:

A list of DynamoDB libraries covering quite a few popular languages and frameworks:

DynamoDB Libraries, Mappers, and Mock Implementations

A couple of things I’ve noticed (and that could be helpful to other NoSQL database companies):

  1. Amazon provides official libraries for several major programming languages (Java, .NET, PHP, Ruby)
  2. Amazon is not shy about promoting libraries that are not official but have established themselves as good libraries (e.g. Python’s Boto)
  3. The list doesn’t seem to include anything for C or Objective-C (Objective-C is the language of iOS and Mac apps)

Original title and link: DynamoDB Libraries, Mappers, and Mock Implementations (NoSQL database©myNoSQL)

14:00 Log Buffer #266, A Carnival of the Vanities for DBAs (437 Bytes) » The Pythian Blog
The purpose of technology is to make life easier and more quality oriented. It is this virtue of technology which makes it evergreen and sustainable. The unique feature of technology is its innovative nature. Technology blogging is a way for technologists to rant about technology and throw light on known and unknown corners [...]

10:25 A Guess at the ORA-00600 ktspReadExtents:range Error (7190 Bytes) » Oracle Life

Author: eygle, published on eygle.com

A customer's system reported the following error:
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [ktspReadExtents:range], [1], [1967145737], [0], [0], [], [], []
Current SQL statement for this session:

This error code is not explained on MOS and turns up nothing on Google, but from the error name we can roughly guess the cause:
KTSP - Kernel Transaction SPace Transaction
ReadExtents - Read segments Extents info
Range - Extents Range

This is most likely a problem hit while a space transaction was scanning extents. The rest of the trace file allows further analysis:
----- Call Stack Trace -----
calling              call     entry                argument values in hex     
location             type     point                (? means dubious value)    
-------------------- -------- -------------------- ----------------------------
ksedmp+0148          bl       ksedst               1029746FC ?
ksfdmp+0018          bl       01FD4014            
kgerinv+00e8         bl       _ptrgl              
kgeasnmierr+004c     bl       kgerinv              000000000 ? 000000000 ?
                                                   000000000 ?
                                                   41EFFFFFFFC00000 ?
                                                   40DFFFC000000000 ?
ktspReadExtentsFrom  bl       kgeasnmierr          110006308 ? 1103994E8 ?
L1+027c                                            102A91EC0 ? 400000004 ?
                                                   000000004 ? 000000001 ?
                                                   000000004 ? 075404309 ?
ktspGenExtentMap+02  bl       ktspReadExtentsFrom  11037C5F0 ? 11037C698 ?
9c                            L1                   760045892F1F5EB0 ?
                                                   300000020 ? B1100989E8 ?
kteinmap+0150        bl       ktspGenExtentMap     11037C470 ? 11037BFA0 ?

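Here we can see that the scan error occurred in the Level 1 bitmap scan, and the address is: 075404309 = 1967145737

So of the two arguments given by this ORA-600 error, the first is the bitmap level and the second is the address.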
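
The hex-to-decimal match can be checked with a few lines of Python. The second half of this sketch goes beyond the original analysis: it assumes the value is a classic 32-bit Oracle data block address (upper 10 bits relative file number, lower 22 bits block number), which the trace itself does not confirm. The variable names are illustrative.

```python
# Hex argument value copied from the ktspReadExtentsFromL1 frame of the call stack.
HEX_ARGUMENT = "075404309"

# Convert to decimal; this should equal the second ORA-600 argument.
address = int(HEX_ARGUMENT, 16)

# Assumption, not stated in the trace: interpret the value as a 32-bit
# data block address, with the relative file number in the upper 10 bits
# and the block number in the lower 22 bits.
relative_file_no = address >> 22
block_no = address & ((1 << 22) - 1)

if __name__ == "__main__":
    print(f"decimal address  : {address}")
    print(f"relative file no : {relative_file_no} (assumed layout)")
    print(f"block no         : {block_no} (assumed layout)")
```

If that layout assumption holds, the file and block numbers could be used to locate the segment that owns the bitmap block.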

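In the customer's environment the error did not recur when the SQL was run again, so it was ignored.
The analysis is offered here for reference.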


04:08 The Three Pillars of Data-Based Computing: SQL, Hadoop And (2087 Bytes) » myNoSQL
The Three Pillars of Data-Based Computing: SQL, Hadoop And:

IBM’s Arvind Krishna in an interview for The Register:

Krishna said he sees the potential for three pillars of data-based computing: SQL – to give a language and syntax for programming; Hadoop – to provide a MapReduce semantic; and a third pillar which is yet to be decided upon. That could be a MongoDB or HBase, but the market will pick a winner. “There’s a whole set: one will survive,” Krishna said.

I’m pretty sure that last part (i.e. “that could be MongoDB or HBase”) is a misquote, as the rest of what Krishna is saying makes a lot of sense:

“Wherever open source is mature I will leverage it; I won’t compete with it. To believe one can be monolithic, proprietary and closed and … succeed is a foolish proposition. One has to embrace open source and work with an ecosystem. Clients are looking to you to add value.”

Original title and link: The Three Pillars of Data-Based Computing: SQL, Hadoop And (NoSQL database©myNoSQL)

02:52 Challenges With MongoDB (1646 Bytes) » myNoSQL

Stone Gao: “This talk is not Yet Another Talk about it’s Awersomeness but challenges with MongoDB”. Plus workarounds.

Challenges with MongoDB

Original title and link: Challenges With MongoDB (NoSQL database©myNoSQL)

02:17 Why I Love NodeJS and Redis | Erronis (1436 Bytes) » myNoSQL
Why I Love NodeJS and Redis | Erronis:

Erwin van der Koogh:

All of this allows me, a fairly decent Java developer with hardly any Javascript skills, to solve real world problems in record time. And that’s why I love Node and Redis.

It’s perfect if it works for you. But please do not automatically generalize from it.

Original title and link: Why I Love NodeJS and Redis | Erronis (NoSQL database©myNoSQL)

02:11 Cloud Computing Lets Us Rethink How We Use Data (2139 Bytes) » myNoSQL
Cloud Computing Lets Us Rethink How We Use Data:

But not everything we do in a database needs guaranteed transactional consistency.

Imagine you are charged with designing a system to collect data on temperature, air flow and electricity use in a building every few minutes from hundreds of locations. The system will be used to make the building more energy efficient. Now imagine you lose a few data points every day.  The cause isn’t important but it could be a glitch with a sensor, a dropped packet, or an incomplete write operation in the database.

Do you care?

It depends on the angle from which I look at this question. If I’m the producer of the sensor, I do care if it has a glitch. If I’m a network administrator, I do care that there are dropped packets. And if I am a database system, I do care if I’m dropping write operations. I also have to tell whoever is using me whether I am able to receive operations: am I available when I’m needed?

Original title and link: Cloud Computing Lets Us Rethink How We Use Data (NoSQL database©myNoSQL)

2012-04-05 Thu

04:01 Changed the blog template » 存储部落
01:49 Philosophy 16 » Oracle Scratchpad

2012-04-04 Wed