Ricky Ho takes a look at MongoDB and summarizes his thoughts about:
- Major differences between MongoDB and RDBMS
- MongoDB query processing, data update, and transaction
- Storage model
- Replication model
- Sharding model
- MongoDB Map-Reduce execution
One thing that impresses me about MongoDB is that it is extremely easy to use, and the underlying architecture is also very easy to understand.
Original title and link: MongoDB Architecture Overview ( ©myNoSQL)
Matt Abrams from Clearspring:
Cardinality estimation algorithms trade space for accuracy. To illustrate this point we counted the number of distinct words in all of Shakespeare’s works using three different counting techniques. Note that our input dataset has extra data in it so the cardinality is higher than the standard reference answer to this question. The three techniques we used were Java HashSet, Linear Probabilistic Counter, and a Hyper LogLog Counter. Here are the results:
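To make the space-for-accuracy trade-off concrete, here is a minimal Python sketch of a linear probabilistic counter next to an exact set-based count. It is not Clearspring's implementation: the class name, the SHA-1 hash, the bitmap size, and the synthetic word list are all assumptions chosen for illustration.

```python
import hashlib
import math


class LinearCounter:
    """Linear probabilistic counter: a fixed-size bitmap instead of a set.

    Hypothetical illustration class, not taken from the original post.
    """

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        self.bitmap = bytearray((num_bits + 7) // 8)

    def add(self, item: str) -> None:
        # Hash the item to a bit position and set that bit.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        pos = int.from_bytes(digest[:8], "big") % self.num_bits
        self.bitmap[pos // 8] |= 1 << (pos % 8)

    def estimate(self) -> float:
        # n is approximately -m * ln(V), where V is the fraction of zero bits.
        set_bits = sum(bin(byte).count("1") for byte in self.bitmap)
        zero_bits = self.num_bits - set_bits
        if zero_bits == 0:
            return float("inf")  # bitmap saturated; num_bits was too small
        return -self.num_bits * math.log(zero_bits / self.num_bits)


# Synthetic input (an assumption): 20,000 tokens with 5,000 distinct values.
words = [f"word{i % 5000}" for i in range(20000)]

exact = len(set(words))                      # exact count, memory grows with n
counter = LinearCounter(num_bits=1 << 16)    # fixed 8 KiB bitmap
for w in words:
    counter.add(w)

print(exact, round(counter.estimate()))
```

The set keeps every distinct word in memory, while the counter's footprint stays at the bitmap size regardless of input; the price is a small estimation error that grows as the bitmap fills up.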
Original title and link: Cardinality Estimation Algorithms: Memory Efficient Solutions for Counting 1 Billion Distinct Objects ( ©myNoSQL)
Dr Rami Mukhtar cited by Divina Paredes reporting for PCAdvisor from Big Data Symposium in Sydney:
Big Data is the opportunity to really collect data sources, both big and small, in their source form, or their raw form, in one location or one place, unencumbered by the boundaries of a business or the boundaries of information silos across the business
Original title and link: A Different Big Data Definition and What Data Scientists Are and Do ( ©myNoSQL)
Kenneth Falck shares on his blog what he learned at the recent MongoDB event in Stockholm, covering:
- MongoDB indexing
- MongoDB replica sets
- MongoDB sharding and performance
The one bit I wanted to emphasize before reading his post:
10gen has a shortlist of features they would like to develop soon. Full text search is at the top. Other things included are at least data compression and possibly schema validation as a related feature.
From a developer-friendliness perspective, MongoDB is the most attractive NoSQL database. And these features will make it even more so.
Original title and link: MongoDB Features Roadmap: Full Text Search, Data Compression, Schema Validation ( ©myNoSQL)
Old Quora question, but still very relevant. Top response from Jeff Hammerbacher:
Elastic MapReduce Pros:
- Dynamic MapReduce cluster sizing.
- Ease of use for simple jobs via their proprietary web console.
- Great documentation.
- Integrates nicely with other Amazon Web Services.
Cloudera Distribution for Hadoop:
- CDH is open source; you have access to the source code and can inspect it for debugging purposes and make modifications as required.
- CDH can be run on a number of public or private clouds using an open source framework, Whirr, so you’re not tied to a single cloud provider
- With CDH, you can move your cluster to dedicated hardware with little disruption when the economics make sense. Most non-trivial applications will benefit from this move.
- CDH packages a number of open source projects that are not included with EMR: Sqoop, Flume, HBase, Oozie, ZooKeeper, Avro, and Hue. You have access to the complete platform composed of data collection, storage, and processing tools.
- CDH packages a number of critical bug fixes and features and the most recent stable releases, so you’re usually using a more stable and feature-rich product.
- You can purchase support and management tools for CDH via Cloudera Enterprise.
- CDH uses the open source Oozie framework for workflow management. EMR implemented a proprietary “job flow” system before major Hadoop users standardized on Oozie for workload management.
- CDH uses the open source Hue framework for its user interface. If you require new features from your web interface, you can easily implement them using the Hue SDK.
- CDH includes a number of integrations with other software components of the data management stack, including Talend, Informatica, Netezza, Teradata, Greenplum, Microstrategy, and others. […]
- CDH has been designed and deployed in common Linux environments and you can use standard tools to debug your programs. […]
Make sure you also read Hadoop in the Cloud: Pros and Cons which addresses (almost) the same question.
A Twitter-style answer to this question would be: “Control and customization vs Automated and Managed Service”. 80 characters left to add your own perspective.
Original title and link: What Are the Pros and Cons of Running Cloudera’s Distribution for Hadoop vs Amazon Elastic MapReduce Service? ( ©myNoSQL)
A list of DynamoDB libraries covering quite a few popular languages and frameworks:
A couple of things I’ve noticed (and that could be helpful to other NoSQL database companies):
- Amazon provides official libraries for a couple of major programming languages (Java, .NET, PHP, Ruby)
- Amazon is not shy to promote libraries that are not official, but established themselves as good libraries (e.g. Python’s Boto)
- The list doesn’t seem to include anything for C or Objective-C (Objective-C is the language of iOS and Mac apps)
Original title and link: DynamoDB Libraries, Mappers, and Mock Implementations ( ©myNoSQL)
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [ktspReadExtents:range], , , , , , , 
Current SQL statement for this session:
KTSP - Kernel Transaction SPace Transaction
ReadExtents - Read segments Extents info
Range - Extents Range
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp+0148          bl       ksedst               1029746FC ?
ksfdmp+0018          bl       01FD4014
kgerinv+00e8         bl       _ptrgl
kgeasnmierr+004c     bl       kgerinv              000000000 ? 000000000 ?
ktspReadExtentsFrom  bl       kgeasnmierr          110006308 ? 1103994E8 ?
L1+027c                                            102A91EC0 ? 400000004 ?
                                                   000000004 ? 000000001 ?
                                                   000000004 ? 075404309 ?
ktspGenExtentMap+02  bl       ktspReadExtentsFrom  11037C5F0 ? 11037C698 ?
9c                            L1                   760045892F1F5EB0 ?
                                                   300000020 ? B1100989E8 ?
kteinmap+0150        bl       ktspGenExtentMap     11037C470 ? 11037BFA0 ?
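Here we can see that the error occurred during the Level 1 bitmap scan, and the address is: 075404309 (hex) = 1967145737 (decimal)
So of the two arguments given by this ORA-600 error, the first is the bitmap level and the second is the address.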
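The hex-to-decimal conversion above can be checked with a short Python sketch. The second step is an assumption, not something the trace states: it treats the address as a standard (smallfile) Oracle data block address, where the top 10 bits are the relative file number and the low 22 bits are the block number, the same split that DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DATA_BLOCK_ADDRESS_BLOCK apply. The helper name decode_dba is hypothetical.

```python
from typing import Tuple


def decode_dba(dba: int) -> Tuple[int, int]:
    """Split a 32-bit data block address into (relative file#, block#).

    Assumes the smallfile layout: 10 bits of file number, 22 bits of block number.
    """
    return dba >> 22, dba & ((1 << 22) - 1)


# The last argument value from the ktspReadExtentsFromL1 frame in the trace.
address = int("075404309", 16)
print(address)              # 1967145737
print(decode_dba(address))  # (469, 17161)
```

If the assumption holds, the result points at the file and block to inspect; for a bigfile tablespace the split would be different.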
- Data recovery: resolving ORA-600 kccpb_sanity_check_2
- Oracle data recovery: resolving the ORA-00600 6002 error
- Oracle data recovery: ORA-00600 6749 and ORA-8102
- ORA-600 ON SYSMAN.MGMT_METRICS_RAW
- PGA memory failure caused by ORA-600 [kghasp1]
IBM’s Arvind Krishna in an interview for The Register:
Krishna said he sees the potential for three pillars of data-based computing: SQL – to give a language and syntax for programming; Hadoop – to provide a MapReduce semantic; and a third pillar which is yet to be decided upon. That could be a MongoDB or HBase, but the market will pick a winner. “There’s a whole set: one will survive,” Krishna said.
I’m pretty sure that last part (i.e. “that could be MongoDB or HBase”) is a mis-quote as the rest of what Krishna is saying makes a lot of sense:
“Wherever open source is mature I will leverage it; I won’t compete with it. To believe one can be monolithic, proprietary and closed and … succeed is a foolish proposition. One has to embrace open source and work with an ecosystem. Clients are looking to you to add value.”
Original title and link: The Three Pillars of Data-Based Computing: SQL, Hadoop And ( ©myNoSQL)
Stone Gao: “This talk is not Yet Another Talk about its Awesomeness but challenges with MongoDB”. Plus workarounds.
Original title and link: Challenges With MongoDB ( ©myNoSQL)
Erwin van der Koogh:
It’s perfect if it works for you. But please do not automatically generalize it.
Original title and link: Why I Love NodeJS and Redis | Erronis ( ©myNoSQL)
But not everything we do in a database needs guaranteed transactional consistency.
Imagine you are charged with designing a system to collect data on temperature, air flow and electricity use in a building every few minutes from hundreds of locations. The system will be used to make the building more energy efficient. Now imagine you lose a few data points every day. The cause isn’t important but it could be a glitch with a sensor, a dropped packet, or an incomplete write operation in the database.
Do you care?
It depends on the angle from which I look at this question. If I’m the producer of the sensor, I do care if it has a glitch. If I’m a network administrator, I do care that packets are dropped. And if I am a database system, I do care if I’m dropping write operations. I also have to tell whoever is using me whether I am able to receive operations: am I available when I’m needed?
Original title and link: Cloud Computing Lets Us Rethink How We Use Data ( ©myNoSQL)