2012-01-26 Thu
David Menninger, commenting on the results of a Big Data survey run by Ventana Research:
This research shows that big data is not a single thing with one uniform set of requirements. Hadoop, a well-publicized technology for dealing with big data, gets a lot of attention (including from me), but there are other technologies being used to store and analyze big data.
Nobody said Hadoop is the only solution for Big Data. But Hadoop is a leading technology in the Big Data market.
One of the most interesting aspects of the survey is captured by the following:
Research participants cited real-time capabilities and integration as their key technical challenges.
Integration in the world of Big Data is like the old saying about successful web sites: “the more you send them away, the more they will come back”.
Update: Here is what Ventana Research was saying about Hadoop adoption in July 2011:
The research findings indicate that Hadoop is already being used in one third of big data environments and evaluated in nearly another fifth.
While the current survey reports:
One-third (34%) are using data warehouse appliances, which typically combine relational database technology with massively parallel processing. About as many (33%) are using in-memory databases. Each of these alternatives is being more widely used than Hadoop. As well, 15% use specialized databases such as columnar technologies, and one-quarter (26%) are using other technologies.
Original title and link: Big Data Is More Than Hadoop (©myNoSQL)
Mavuno is an open source, modular, scalable text mining toolkit built upon Hadoop. It supports basic natural language processing tasks (e.g., part of speech tagging, chunking, parsing, named entity recognition), is capable of large-scale distributional similarity computations (e.g., synonym, paraphrase, and lexical variant mining), and has information extraction capabilities (e.g., instance and semantic relation mining). It can easily be adapted to new input formats and text mining tasks.
I’d love to hear from people with more knowledge in the field how Mavuno compares to Mahout.
Original title and link: Mavuno: A Hadoop-Based Text Mining Toolkit (©myNoSQL)
Adam Gray[1]:
In this article, I’ll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.
If you put together Amazon S3, Amazon DynamoDB, Amazon RDS, and Amazon Elastic MapReduce, you have a complete polyglot persistence solution in the cloud[2].
Original title and link: Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials (©myNoSQL)
2012-01-25 Wed
My list of the 8 most interesting companies for the future of Hadoop didn’t try to include everyone with a product that has the word Hadoop in it. But the list from InformationWeek does. To save you 15 clicks, here’s their list:
- Amazon Elastic MapReduce
- Cloudera
- Datameer
- EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
- Hadapt
- Hortonworks
- IBM (InfoSphere BigInsights)
- Informatica (for HParser)
- Karmasphere
- MapR
- Microsoft
- Oracle
Original title and link: 12 Hadoop Vendors to Watch in 2012 (©myNoSQL)
2012-01-24 Tue
Jonathan Hsieh provides a summary of the new features in HBase 0.92.0 by splitting them into user features:
- HFile v2, a new more efficient storage format
- Faster recovery via distributed log splitting
- Lower latency region-server operations via new multi-threaded and asynchronous implementations
operator features:
- An enhanced web UI that exposes more internal state
- Improved logging for identifying slow queries
- Improved corruption detection and repair tools
and developer features:
- Coprocessors
- Build support for Hadoop 0.20.20x, 0.22, 0.23.
- Experimental: offheap slab cache and online table schema change
Earlier today, when covering the HBase 0.92.0 release, I wrote that coprocessors are the highlight of this release. I’ll take that back. There are way too many interesting features in HBase 0.92.0 to highlight just one of them.
Original title and link: More Details About Apache HBase 0.92.0 (©myNoSQL)