2012-01-26 Thu

21:40 Big Data Is More Than Hadoop (2855 Bytes) » myNoSQL
Big Data Is More Than Hadoop:

David Menninger, commenting on the results of a Big Data survey run by Ventana Research:

This research shows that big data is not a single thing with one uniform set of requirements. Hadoop, a well-publicized technology for dealing with big data, gets a lot of attention (including from me), but there are other technologies being used to store and analyze big data.

Nobody said Hadoop is the only solution for Big Data. But Hadoop is a leading technology in the Big Data market.

One of the most interesting aspects of the survey is captured by the following:

Research participants cited real-time capabilities and integration as their key technical challenges.

Integration in the world of Big Data is like the old saying about successful web sites: “the more you send them away, the more they will come back”.

Update: Here is what Ventana Research was saying about Hadoop adoption in July 2011:

The research findings indicate that Hadoop is already being used in one third of big data environments and evaluated in nearly another fifth.

While this one reports:

One-third (34%) are using data warehouse appliances, which typically combine relational database technology with massively parallel processing. About as many (33%) are using in-memory databases. Each of these alternatives is being more widely used than Hadoop. As well, 15% use specialized databases such as columnar technologies, and one-quarter (26%) are using other technologies.

Original title and link: Big Data Is More Than Hadoop (NoSQL database©myNoSQL)

21:11 Mavuno: A Hadoop-Based Text Mining Toolkit (1930 Bytes) » myNoSQL
Mavuno: A Hadoop-Based Text Mining Toolkit:

Mavuno is an open source, modular, scalable text mining toolkit built upon Hadoop. It supports basic natural language processing tasks (e.g., part of speech tagging, chunking, parsing, named entity recognition), is capable of large-scale distributional similarity computations (e.g., synonym, paraphrase, and lexical variant mining), and has information extraction capabilities (e.g., instance and semantic relation mining). It can easily be adapted to new input formats and text mining tasks.

I’d love to hear from people with more knowledge in the field how Mavuno compares to Mahout.

Ryan Rosario

Original title and link: Mavuno: A Hadoop-Based Text Mining Toolkit (NoSQL database©myNoSQL)

20:12 Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials (2168 Bytes) » myNoSQL
Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials:

Adam Gray[1]:

In this article, I’ll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.

If you put together Amazon S3, Amazon DynamoDB, Amazon RDS, and Amazon Elastic MapReduce, you have a complete polyglot persistence solution in the cloud[2].


  1. Adam Gray is Product Manager on the Elastic MapReduce Team  

  2. Complete in the sense of core building blocks.  

Original title and link: Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials (NoSQL database©myNoSQL)

03:24 Pythian at RMOUG Training Days 2012 (428 Bytes) » The Pythian Blog
Pythian is very excited to return to the much-awaited RMOUG 12 held in Denver, Colorado from February 14-16, 2012. Keep your eyes open for Alex Gorbachev, Marc Fielding, Don Seiler and Gwen Shapira in attendance. We have a fantastic line-up of speakers this year featuring a total of seven papers presented by Alex, Marc, Don [...]

2012-01-25 Wed

21:40 12 Hadoop Vendors to Watch in 2012 (1958 Bytes) » myNoSQL

My list of the 8 most interesting companies for the future of Hadoop didn’t try to include every company with a product that has the word Hadoop in it, but the list from InformationWeek does. To save you 15 clicks, here’s their list:

  • Amazon Elastic MapReduce
  • Cloudera
  • Datameer
  • EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
  • Hadapt
  • Hortonworks
  • IBM (InfoSphere BigInsights)
  • Informatica (for HParser)
  • Karmasphere
  • MapR
  • Microsoft
  • Oracle

Original title and link: 12 Hadoop Vendors to Watch in 2012 (NoSQL database©myNoSQL)

01:11 T-SQL: Retrieve all users and associated roles for ALL databases (514 Bytes) » The Pythian Blog
A frequent inquiry concerning database security is how to retrieve the database role(s) associated with each user for auditing or troubleshooting purposes. Each database user (principal) can be retrieved from sys.database_principals and the associated database roles can be retrieved from sys.database_role_members. The following code runs against ALL the databases using SP_MSForeachdb and all roles for one [...]
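
The code from the original post is cut off in this excerpt, so what follows is not the author's script. It is only a minimal sketch of the approach the excerpt describes: join sys.database_role_members to sys.database_principals twice (once for the role, once for the member) and run that in every database. It assumes the undocumented sp_MSforeachdb procedure is available on the instance; the column aliases are made up for illustration.

```sql
-- Sketch only: list each database user with its database roles,
-- for every database on the instance.
-- Assumption: sp_MSforeachdb (undocumented) exists and replaces
-- the "?" placeholder with each database name in turn.
EXEC sp_MSforeachdb N'
USE [?];
SELECT DB_NAME() AS database_name,   -- current database
       m.name    AS user_name,       -- member principal
       r.name    AS role_name        -- role the member belongs to
FROM sys.database_role_members AS drm
JOIN sys.database_principals AS r
  ON r.principal_id = drm.role_principal_id
JOIN sys.database_principals AS m
  ON m.principal_id = drm.member_principal_id
ORDER BY user_name, role_name;';
```

Because both joins are inner joins, principals that belong to no role do not appear in the output. A LEFT JOIN starting from sys.database_principals would be needed to list them as well.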

2012-01-24 Tue

23:09 More Details About Apache HBase 0.92.0 (2418 Bytes) » myNoSQL
More Details About Apache HBase 0.92.0:

Jonathan Hsieh provides a summary of the new features in HBase 0.92.0 by splitting them into user features:

  • HFile v2, a new more efficient storage format
  • Faster recovery via distributed log splitting
  • Lower latency region-server operations via new multi-threaded and asynchronous implementations

operator features:

  • An enhanced web UI that exposes more internal state
  • Improved logging for identifying slow queries
  • Improved corruption detection and repair tools

and developer features:

  • Coprocessors
  • Build support for Hadoop 0.20.20x, 0.22, 0.23
  • Experimental: offheap slab cache and online table schema change

Earlier today, when covering the HBase 0.92.0 release, I wrote that coprocessors are the highlight of this release. I’ll take that back. There are way too many interesting features in HBase 0.92.0 to highlight just one of them.

Original title and link: More Details About Apache HBase 0.92.0 (NoSQL database©myNoSQL)