
2012-01-25 Wed

21:40 12 Hadoop Vendors to Watch in 2012 (1958 Bytes) » myNoSQL

My list of the 8 most interesting companies for the future of Hadoop didn't try to include every company with a Hadoop-branded product, but the list from InformationWeek does. To save you 15 clicks, here's their list:

  • Amazon Elastic MapReduce
  • Cloudera
  • Datameer
  • EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
  • Hadapt
  • Hortonworks
  • IBM (InfoSphere BigInsights)
  • Informatica (for HParser)
  • Karmasphere
  • MapR
  • Microsoft
  • Oracle

Original title and link: 12 Hadoop Vendors to Watch in 2012 (NoSQL database©myNoSQL)

01:11 T-SQL: Retrieve all users and associated roles for ALL databases (514 Bytes) » The Pythian Blog
A frequent request concerning database security is to retrieve the database role(s) associated with each user, for auditing or troubleshooting purposes. Each database user (principal) can be retrieved from sys.database_principals, and the associated database roles can be retrieved from sys.database_role_members. The following code runs against ALL the databases using SP_MSForeachdb and all roles for one [...]
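The post's T-SQL is cut off in this excerpt, so it is not reproduced here. As a rough illustration of the lookup it performs, the Python sketch below joins rows shaped like sys.database_principals (principal_id, name, type) with rows shaped like sys.database_role_members (role_principal_id, member_principal_id), once per database. The database, user names, and IDs are hypothetical sample data; on a real server the equivalent is a LEFT JOIN between the two catalog views, run inside each database.

```python
# Hypothetical sample data. Principal rows mimic sys.database_principals
# (principal_id, name, type); membership rows mimic sys.database_role_members
# (role_principal_id, member_principal_id).
USER_TYPES = {"S", "U", "G"}  # SQL user, Windows user, Windows group


def roles_by_user(principals, role_members):
    """Map each user in one database to the sorted list of roles it belongs to."""
    names = {pid: name for pid, name, _ in principals}
    # Start every user with an empty list so users without roles still appear,
    # like a LEFT JOIN from principals to role memberships.
    result = {name: [] for _, name, ptype in principals if ptype in USER_TYPES}
    for role_id, member_id in role_members:
        member = names.get(member_id)
        if member in result:  # skip members that are not users (e.g. nested roles)
            result[member].append(names[role_id])
    return {user: sorted(roles) for user, roles in result.items()}


def roles_for_all_databases(databases):
    """Apply the per-database lookup to every database, like a foreach-db loop."""
    return {db: roles_by_user(p, m) for db, (p, m) in databases.items()}


databases = {
    "SalesDB": (
        [(1, "dbo", "S"), (5, "app_user", "S"), (6, "report_user", "S"),
         (16384, "db_owner", "R"), (16390, "db_datareader", "R")],
        [(16384, 1), (16390, 5)],
    ),
}
print(roles_for_all_databases(databases))
```

Keeping users with an empty role list in the output mirrors what an audit usually wants: accounts that exist but belong to no role are often as interesting as over-privileged ones.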

2012-01-24 Tue

23:09 More Details About Apache HBase 0.92.0 (2418 Bytes) » myNoSQL
More Details About Apache HBase 0.92.0:

Jonathan Hsieh provides a summary of the new features in HBase 0.92.0 by splitting them into user features:

  • HFile v2, a new more efficient storage format
  • Faster recovery via distributed log splitting
  • Lower latency region-server operations via new multi-threaded and asynchronous implementations.

operator features:

  • An enhanced web UI that exposes more internal state
  • Improved logging for identifying slow queries
  • Improved corruption detection and repair tools

and developer features:

  • Coprocessors
  • Build support for Hadoop 0.20.20x, 0.22, 0.23.
  • Experimental: offheap slab cache and online table schema change

Earlier today, when covering the HBase 0.92.0 release, I wrote that coprocessors are the highlight of this release. I take that back: there are way too many interesting features in HBase 0.92.0 to highlight just one of them.

Original title and link: More Details About Apache HBase 0.92.0 (NoSQL database©myNoSQL)

20:24 Google Research: Let's Make TCP Faster (2140 Bytes) » myNoSQL
Google Research: Let's Make TCP Faster:

Google is actively researching ways to improve TCP:

Our research shows that the key to reducing latency is saving round trips. We’re experimenting with several improvements to TCP. Here’s a summary of some of our recommendations to make TCP faster:

  1. Increase TCP initial congestion window to 10 (IW10). The amount of data sent at the beginning of a TCP connection is currently 3 packets, implying 3 round trips (RTT) to deliver a tiny 15KB-sized content.
  2. Reduce the initial timeout from 3 seconds to 1 second.
  3. Use TCP Fast Open (TFO).
  4. Use Proportional Rate Reduction for TCP (PRR).
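The round-trip arithmetic behind recommendation 1 can be checked with an idealized slow-start model. The sketch below assumes a 1460-byte MSS, no packet loss, and a congestion window that doubles every round trip; it ignores the handshake, delayed ACKs, and the receiver window. The function name is illustrative, not part of any library.

```python
import math


def slow_start_rounds(total_bytes: int, initial_window: int, mss: int = 1460) -> int:
    """Round trips needed to deliver total_bytes under idealized slow start."""
    segments = math.ceil(total_bytes / mss)
    cwnd, sent, rounds = initial_window, 0, 0
    while sent < segments:
        sent += cwnd   # one full window of segments goes out per round trip
        cwnd *= 2      # slow start doubles the window after each round trip
        rounds += 1
    return rounds


for iw in (3, 10):
    print(f"IW{iw}: {slow_start_rounds(15_000, iw)} round trips for 15,000 bytes")
```

Under these assumptions, 15,000 bytes (11 segments) takes 3 round trips with an initial window of 3 (3 + 6 + 12 segments), matching the quote, and 2 with IW10; anything up to 10 segments (14,600 bytes) fits in the very first round trip with IW10.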

The database world has attacked network latency with connection pools and pipelining, and has reduced network round trips with JOINs or denormalized data. But all software architectures will benefit from a faster TCP.

Andrei Savu

Original title and link: Google Research: Let’s Make TCP Faster (NoSQL database©myNoSQL)

18:19 Apache Hadoop 1.0 Doesn’t Clear Up Trunks and Branches Questions. Do Distributions? (2591 Bytes) » myNoSQL

It looks like the three pictures about Hadoop versions (the first two by Cloudera and the third by Konstantin I. Boudnik & Cos) are actually worth 1066 Gartner words.

On the other hand, to address the question in the title—would custom distributions clarify Hadoop versions—I think that while custom distributions might be helpful for experimenting or getting started with Hadoop, long term they’ll actually lead to more segmentation in the market and bigger maintenance and upgrade costs for end users.

There are just a few companies with a track record of maintaining and distributing open source projects; in the Hadoop space these are Cloudera and Hortonworks (note that Hortonworks supports the Apache Hadoop distribution). So if a vendor tries to sell you a Hadoop package, ask them about their history of managing open source distributions.

Original title and link: Apache Hadoop 1.0 Doesn’t Clear Up Trunks and Branches Questions. Do Distributions? (NoSQL database©myNoSQL)

17:52 A Cost Analysis of DynamoDB for Tarsnap (2725 Bytes) » myNoSQL
A Cost Analysis of DynamoDB for Tarsnap:

Tarsnap is a service offering secure online backups. Colin Percival details the costs Tarsnap would have for using Amazon DynamoDB:

For each TB of data stored, this gives me 30,000,000 blocks requiring 60,000,000 key-value pairs; these occupy 2.31 GB, but for DynamoDB pricing purposes, they count as 8.31 GB, or $8.31 per month. That’s about 2.7% of Tarsnap’s gross revenues (30 cents per GB per month); significant, but manageable. However, each of those 30,000,000 blocks need to go through log cleaning every 14 days, a process which requires a read (to check that the block hasn’t been marked as deleted) and a write (to update the map to point at the new location in S3). That’s an average rate of 25 reads and 25 writes per second, so I’d need to reserve 50 reads and 50 writes per second of DynamoDB capacity. The reads cost $0.01 per hour while the writes cost $0.05 per hour, for a total cost of $0.06 per hour — or $44 per month. That’s 14.6% of Tarsnap’s gross revenues; together with the storage cost, DynamoDB would eat up 17.3% of Tarsnap’s revenue — slightly over $0.05 from every $0.30/GB I take in.

To put it differently, an 82.7% profit margin sounds like a good deal, but without knowing the costs of the other components (S3, EC2, data transfer) it's difficult to conclude whether this solution would remain profitable at a good margin. Still, an interesting aspect of this solution is that the costs of some major components of the platform (S3, DynamoDB) would scale linearly with revenue.
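Percival's figures can be re-derived in a few lines. In the sketch below, the block count, raw size, 14-day cleaning interval, and hourly throughput prices come from the quote. The $1 per GB-month storage price and the 100-byte per-item overhead are implied by it ($8.31 for 8.31 GB, and (8.31 - 2.31) GB spread over 60,000,000 items), while 730 hours per month is an assumption.

```python
HOURS_PER_MONTH = 730    # assumption: 8,760 hours per year / 12 months
SECONDS_PER_DAY = 86_400


def dynamodb_cost_per_tb(
    blocks=30_000_000,           # blocks per TB stored (from the post)
    pairs_per_block=2,           # 60,000,000 key-value pairs / 30,000,000 blocks
    raw_gb=2.31,                 # raw size of those pairs (from the post)
    overhead_bytes=100,          # per-item overhead implied by 8.31 GB vs 2.31 GB
    storage_per_gb_month=1.0,    # implied by 8.31 GB -> $8.31 per month
    cleaning_days=14,            # every block is cleaned once per 14 days
    read_cost_per_hour=0.01,     # price quoted for 50 reads/s of capacity
    write_cost_per_hour=0.05,    # price quoted for 50 writes/s of capacity
    revenue_per_tb_month=300.0,  # $0.30 per GB-month * 1,000 GB
):
    items = blocks * pairs_per_block
    billed_gb = raw_gb + items * overhead_bytes / 1e9
    storage_usd = billed_gb * storage_per_gb_month
    # Each block needs one read and one write per cleaning cycle.
    avg_ops_per_sec = blocks / (cleaning_days * SECONDS_PER_DAY)
    throughput_usd = (read_cost_per_hour + write_cost_per_hour) * HOURS_PER_MONTH
    return {
        "billed_gb": billed_gb,
        "storage_usd": storage_usd,
        "avg_ops_per_sec": avg_ops_per_sec,
        "throughput_usd": throughput_usd,
        "share_of_revenue": (storage_usd + throughput_usd) / revenue_per_tb_month,
    }


for key, value in dynamodb_cost_per_tb().items():
    print(f"{key}: {value:.4f}")
```

With these inputs the average cleaning rate is about 24.8 reads and 24.8 writes per second (hence the 50/s reservation with headroom), throughput costs about $43.80 per month, and DynamoDB takes about 17.37% of revenue, in line with the quoted 17.3%.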

Original title and link: A Cost Analysis of DynamoDB for Tarsnap (NoSQL database©myNoSQL)

13:46 Latest NoSQL Releases: HBase 0.92, DataStax Community Server, Hortonworks Data Platform, SolrCloud (4064 Bytes) » myNoSQL

Just a quick roundup of the latest releases and announcements.

Hortonworks Data Platform (HDP) version 2

HDP v2 will include:

  • NextGen MapReduce architecture
  • HDFS NameNode HA
  • HDFS Federation
  • up-to-date HCatalog, HBase, Hive, Pig

According to the announcement:

In order to avoid confusion, let me explain the two versions of HDP:

  • HDP v1 is based upon Apache Hadoop 1.0 (which comes from the 0.20.205 branch). It is the most stable, production-ready version of Hadoop that is currently found in many large enterprise deployments. HDP v1 is currently available as a private technology preview. A public technology preview will be made available later this quarter.
  • HDP v2 is based upon Apache Hadoop 0.23, which includes the next generation advancements mentioned above. It’s an important step forward in terms of scalability, performance, high availability and data integrity. A technology preview will also be made publicly available later in Q1.

SolrCloud Completes Phase 2

Mark Miller about the completion of phase 2:

The second phase of SolrCloud has been in full swing for a couple of months now and it looks like we are going to be able to commit this work to trunk very soon! In Phase1 we built on top of Solr’s distributed search capabilities and added cluster state, central config, and built-in read side fault tolerance. Phase 2 is even more ambitious and focuses on the write side. We are talking full-blown fault tolerance for reads and writes, near real-time support, real-time GET, true single node durability, optimistic locking, cluster elasticity, improvements to the Phase 1 features, and more.

Not there yet, but it’s coming.

DataStax Community Server 1.0.7

A new release of DataStax's distribution of Cassandra, incorporating Cassandra 1.0.7.

HBase 0.92

Don’t let the version number trick you. This is an important release for HBase featuring:

  • coprocessors
  • security
  • new (self-migrating) file format
  • AWS improvements: EBS support, building an HA cluster

The list of new features, improvements, and bug fixes in HBase 0.92 is impressive. But in my opinion the highlight of this release is HBase coprocessors (Jira entry HBASE-200).

I’m leaving you with Andrew Purtell’s slides about HBase Coprocessors:

Original title and link: Latest NoSQL Releases: HBase 0.92, DataStax Community Server, Hortonworks Data Platform, SolrCloud (NoSQL database©myNoSQL)

13:29 Introducing Amazon DynamoDB Slidesdeck (1636 Bytes) » myNoSQL

An official slidedeck to introduce Amazon DynamoDB to your team. My notes about DynamoDB could be a nice addition.

Original title and link: Introducing Amazon DynamoDB Slidesdeck (NoSQL database©myNoSQL)

03:37 Solr Index Replication at Etsy: From HTTP to BitTorrent (2130 Bytes) » myNoSQL
Solr Index Replication at Etsy: From HTTP to BitTorrent:

Etsy went from using HTTP to BitTorrent for replicating Solr indexes:

By integrating BitTorrent protocol into Solr we could replace HTTP replication. BitTorrent supports updating and continuation of downloads, which works well for incremental index updates. When we use BitTorrent for replication, all of the slave servers seed index files allowing us to bring up new slaves (or update stale slaves) very quickly.

[…]

Our Ops team started experimenting with a BitTorrent package herd, which sits on top of BitTornado. Using herd they transferred our largest search index in 15 minutes. They spent 8 hours tweaking all the variables and making the transfer faster and faster. Using pigz for compression and herd for transfer, they cut the replication time for the biggest index from 60 minutes to just 6 minutes!
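Why having every slave seed the index helps can be seen with the idealized fluid model of file distribution commonly used in networking textbooks: with HTTP from a single master, the master's uplink must carry all N copies, while with BitTorrent every receiver's uplink adds capacity. The sketch below computes both lower bounds. The index size and link speeds are hypothetical, and the model says nothing about Etsy's pigz/herd tuning or their measured times.

```python
def client_server_time(n: int, file_mbit: float, server_up: float, min_down: float) -> float:
    """Lower bound (seconds) when one server uploads a full copy to each of n clients."""
    return max(n * file_mbit / server_up, file_mbit / min_down)


def p2p_time(n: int, file_mbit: float, server_up: float, peer_up: float, min_down: float) -> float:
    """Lower bound (seconds) when all n peers also upload (seed) the pieces they hold."""
    return max(
        file_mbit / server_up,                      # server must send every bit at least once
        file_mbit / min_down,                       # slowest peer must download the whole file
        n * file_mbit / (server_up + n * peer_up),  # total bits over total upload capacity
    )


# Hypothetical numbers: an 8,000 Mbit (1 GB) index and 1,000 Mbit/s links everywhere.
for n in (10, 100):
    print(n, client_server_time(n, 8_000, 1_000, 1_000), p2p_time(n, 8_000, 1_000, 1_000, 1_000))
```

In this model the client-server bound grows linearly with the number of slaves (80 s for 10, 800 s for 100), while the peer-to-peer bound stays at roughly the time needed to push a single copy (8 s), which matches the intuition behind bringing up new slaves quickly once all of them seed.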

Make sure you don’t miss the part where they were experimenting with multicast UDP rsync.

Original title and link: Solr Index Replication at Etsy: From HTTP to BitTorrent (NoSQL database©myNoSQL)

00:22 Jelastic Database Marketshare: MySQL, MongoDB, MariaDB (2154 Bytes) » myNoSQL
Jelastic Database Marketshare: MySQL, MongoDB, MariaDB:

Jelastic, a company offering a cloud platform for Java server hosting, has published some stats about the databases used by their over 7000 users:

Jelastic Database Marketshare

While it would be wrong to generalize these results to overall database market share, it is interesting nonetheless to see that MongoDB is already ahead of PostgreSQL as the second most used database, and that CouchDB, which was added only one month ago, is already used by 5% of Jelastic's users. MySQL holds first place with over 40% of users, roughly double the share of MongoDB in second place.

These numbers would be even more interesting if they accounted for real usage, such as database sizes or query volumes.

Mat Keep

Original title and link: Jelastic Database Marketshare: MySQL, MongoDB, MariaDB (NoSQL database©myNoSQL)

2012-01-23 Mon