
2017-07-28 Fri

21:58 Episode 14 – DBA Career & Automation (4005 Bytes) » Official Pythian Blog

In this episode we continue talking about the evolution of the DBA career, specifically as it pertains to automation. Today we are under more pressure than ever to automate our tasks. This is driven by cloud-first initiatives and the need to increase development velocity, as well as practices such as Agile and DevOps. As DBAs we need to know more languages than ever, we need to know more platforms than ever, and we need to do more work, or rather, to be more productive than ever. This means we also have to change our approach.

Joining us today we have Derek Downey, Practice Advocate at Pythian, who will represent open-source technologies; Warner Chaves, an Internal Principal Consultant at Pythian and Microsoft MVP, who will represent SQL Server; and Bjoern Rost, an Internal Principal Consultant at Pythian, Oracle ACE Director and a member of the brand new Oracle Developer Champion Program, representing the Oracle Database.

Key Points From This Episode:

• Derek tells us how he is now able to implement better and more robust tasks in OpenSource.
• How the provisioning in Oracle has changed, and what that means.
• Find out how the landscape is very similar in the Windows world, according to Warner.
• Warner explains how they are required to do more and manage more servers with less time.
• Understand why automation is not for everybody; opinions differ.
• Discover what has been influencing and driving the need to automate.
• Hear how mistakes can be very expensive and the impact that company cultures have.
• Why DBAs will never be totally automated out of their jobs, if they know other disciplines.
• The pitfalls of automation and what things we should not be automating.
• The importance of having fail safes and understanding exactly what you are automating.
• Discover some best automation practices to mitigate risks.
• Understand how to operate on the principle of least privilege.
• Learn why it is important to verify as much as possible through your scripts.
• Find out why databases are different from web servers; mitigation versus data loss.
• Understand how to measure success as a manager of DBAs.
• The importance of knowing the clear objectives of a team and whether they are well matched.
• Learn how to distinguish hard work from good work.
• Embracing the tools that already exist versus trying to reinvent the wheel.
• Understand more about the trends in automation today.
• And much more!

Links Mentioned in Today’s Episode:

Derek Downey — https://www.pythian.com/blog/author/downey/
Derek on Twitter — https://twitter.com/derek_downey/
Pythian — https://www.pythian.com/
Warner Chaves — https://www.pythian.com/experts/warner-chaves/
Warner on Twitter — https://twitter.com/warchav
Bjorn Rost — https://www.pythian.com/experts/bj%D3%A7rn-rost/
Bjorn on Twitter — https://twitter.com/brost
Oracle — https://www.oracle.com/index.html
Opensource — https://opensource.org/
Agile — http://www.agilenutshell.com/
DevOps — https://devops.com/
Amazon S3 — https://aws.amazon.com/s3/
Microsoft — https://www.microsoft.com/en-ca/
AWS — https://aws.amazon.com/
Management Studio — https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms
PowerShell — https://msdn.microsoft.com/en-us/powershell/mt173057.aspx
.Net — https://www.microsoft.com/net/
Puppet — https://puppet.com/
Python — https://www.python.org/
Oribi — https://oribisoftware.com/en/

01:03 Words I Don’t Use, Part 2: “Holistic” (728 Bytes) » Cary Millsap
The second “word I do not use” is holistic.

When people use the word “holistic” in my industry (Oracle), it means that they’re paying attention to not just an individual subcomponent of a system, but to a whole system, including (I hope) even the people it serves.

But trying to differentiate technology services by saying “we take a holistic view of your system” is about like differentiating myself by saying I’ll wear clothes to work. Saying “holistic” would make it look like I’ve only just recently become aware that optimizing a system’s individual subsystems is not a reliable way to optimize the system itself. This should not be a distinctive revelation.
00:43 Automation, PowerShell and Word Templates – Let the Technician Do Tech (9483 Bytes) » Official Pythian Blog

In my work as a Database Consultant, I enjoy technical challenges, but it’s always a painful process when I need to transfer my technical knowledge into a report. Writing the report isn’t really the difficult part; the real challenge is making it accessible for those who don’t have the same degree of technical knowledge.

Part of this can be minimized by creating report templates. Let’s say that you need to create a performance analysis to identify and report problems and recommend solutions. By creating a report template, with the basics already filled in, then you only need to add the technical information to the template, interpret the results and share your conclusions.

Using a template makes the process less painful and faster. You will also want a standard template to reuse when you have to perform a similar review for other servers or customers. However, collecting the data, adding it to the template and formatting it is still something that may take a lot of time, and it is not the most exciting task for a highly technical person. As someone who solves problems for a living, I had a thought: “Why can’t we just automate it?”

“Automation” isn’t just a buzzword

With Microsoft supporting the open-source community more than ever, projects like the outstanding dbatools are popping up and getting stronger. PowerShell is our best friend for this challenge, and the rule is always the same: if a task is likely to be repeated, automate it!


Sometimes preparing all the automation can take a long time; just remember, this is an investment in the future…

A real “automation” case

Last year we worked on a project where several SQL Server boxes were to be deployed, some as simple standalone instances and others with the AlwaysOn Availability Groups feature ready to be used. In the worst-case scenario, the following tasks had to be performed (after the base VMs were deployed):

  1. Format the SQL Server disks with a 64 KB allocation unit size
  2. Configure OS best practices
  3. Create a two-node Windows Server Failover Cluster
  4. Install a standalone SQL Server instance on both servers
  5. Configure SQL Server based on best practices
  6. Enable AlwaysOn AG and create a group with a listener and an IP

How long would it take to complete all those steps manually on hundreds of servers? Well, a lot of time. To automate this, we took advantage of PowerShell Desired State Configuration, mixed with some custom code. In the end, we produced a set of scripts that gave us a two-node cluster with AlwaysOn AG ready in a matter of minutes.

Of course it took some time to develop and test, but now we are ready to face another similar challenge and start producing, with minor changes.
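As an illustration of what the declarative approach looks like, here is a hedged sketch of a DSC configuration for just the first step (formatting a data disk at 64 KB). The StorageDsc module, its Disk resource and all names here are assumptions for illustration; the actual project scripts are not shown in the post.

```powershell
# Sketch only: formats disk 1 as E: with a 64 KB allocation unit size.
# Assumes the StorageDsc community module is installed (hypothetical choice).
Configuration SqlDataDisk {
    Import-DscResource -ModuleName StorageDsc

    Node 'localhost' {
        Disk SqlData {
            DiskId             = '1'
            DriveLetter        = 'E'
            FSLabel            = 'SQLData'
            AllocationUnitSize = 64KB
        }
    }
}

SqlDataDisk                                   # compile the configuration to a MOF
Start-DscConfiguration -Path .\SqlDataDisk -Wait -Verbose
```

The remaining steps (cluster creation, SQL Server install, AG setup) follow the same declarative pattern with other DSC resources.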

Going back to the report 

OK, now that I’ve shown that automation is the way to go, let’s go back to the report issue: how did I solve the problem?

Let’s summarize it…

  • Objective: Find a way to insert data into the Word template.
  • Used Technology: PowerShell
  • Strategy: We have a Word template with the common report text. We need to complete it by placing the results of the collected data in specific zones of the template.

With this, I am going to list all the challenges I encountered and how I solved them:

How can I edit a Word file using PowerShell? 

After some research, I found out that we can create a COM object based on Word.Application. Using the following function, you can open a Word file:

Function OpenWordDoc($Filename) {
    # Create a Word COM object and open the requested document
    $Word = New-Object -ComObject Word.Application
    Return $Word.Documents.Open($Filename)
}

With the document opened, I could perform all of the necessary changes and then save the file with a function like the following (a minimal sketch, since the body was not shown):

Function SaveAsWordDoc($Document, $FileName) {
    # Save the document under the given name and close it
    $Document.SaveAs([ref]$FileName)
    $Document.Close()
}

But what about the document changes? I found two challenges: how to replace a text tag (like <TAG_YOUR_NAME>) and how to add an image at a specific location. Here’s how I did it:

Replacing a text tag: This one was simple. I just wanted to identify specific text in the document and replace all the instances with other text. In order to achieve that, I used the following function:

Function ReplaceTag($Document, $FindText, $ReplaceWithText) {
    $FindReplace = $Document.ActiveWindow.Selection.Find
    $matchCase = $false
    $matchWholeWord = $true
    $matchWildCards = $false
    $matchSoundsLike = $false
    $matchAllWordForms = $false
    $forward = $true
    $format = $false
    $matchKashida = $false
    $matchDiacritics = $false
    $matchAlefHamza = $false
    $matchControl = $false
    $replace = 2   # wdReplaceAll
    $wrap = 1      # wdFindContinue

    $FindReplace.Execute($FindText, $matchCase, $matchWholeWord, $matchWildCards, $matchSoundsLike, $matchAllWordForms, $forward, $wrap, $format, $ReplaceWithText, $replace, $matchKashida, $matchDiacritics, $matchAlefHamza, $matchControl) | Out-Null
}

Adding an image: I spent quite some time trying to find a way to achieve this. Basically, I wanted to add an image from a file located on my computer, in a specific place. I tried creating a text tag and replacing it, but that didn’t work. I then found a way to add images to a Word document using the method $Document.ActiveWindow.Selection.InlineShapes.AddPicture(). This partially solved my problem: with this method I could add an image to the document, at the location of the pointer.

How did I solve this? As creating text tags was not the way to go, I learned that I could achieve it by creating bookmarks (in Word, go to Insert -> Bookmark); this way I could move the pointer to a bookmark and then add the image. Here is the function to do this:

Function AddImage($Document, $BookmarkName, $ReplaceWithImage) {
    # Move the pointer to the bookmark, then insert the picture at that spot
    $Document.Bookmarks.Item($BookmarkName).Select()
    $Document.ActiveWindow.Selection.InlineShapes.AddPicture($ReplaceWithImage) | Out-Null
}

Note that the bookmark won’t be removed once you add the image, so if you try to add multiple images on the same bookmark, it will still work.


Now that we have the main functions, how can we make all of this work together? It is simple:

$TemplateFile = "C:\reports\Template_Report.docx"
$FinalFile = "C:\reports\FinalReport.docx"

# Open template file
$Doc = OpenWordDoc -Filename $TemplateFile

# Replace text tags
ReplaceTag -Document $Doc -FindText '<client_name>' -ReplaceWithText "Pythian"
ReplaceTag -Document $Doc -FindText '<server_name>' -ReplaceWithText "WINSRV001"

# Add image
AddImage -Document $Doc -BookmarkName 'img_SomeBookmark' -ReplaceWithImage "C:\reports\img.png"

# Save final report
SaveAsWordDoc -Document $Doc -FileName $FinalFile




With automation we can achieve more than purely technical tasks. As shown, we can take advantage of it and use this time to focus on what matters by leaving the repetitive and easy work to our “personal robot”. With a template and a few PowerShell lines, we can automate reports, which will help standardize our work and provide a more in-depth report because there is more time for technical input and analysis.


Looking for ways to resolve recurring problems in your business through the strategic use of automation? Pythian can help! Click here to learn more.

2017-07-27 Thu

02:44 How a successful cloud migration boosted SaaS platform performance for Mapp Digital (2174 Bytes) » Official Pythian Blog

Today’s innovative organizations have come to rely on the flexibility, modularity, and scalability of the cloud for their business-critical SaaS platforms and associated data. Mapp Digital’s cloud migration project is just one example of how Pythian can help SaaS companies take full advantage of the benefits of the cloud.

Marketing technology provider Mapp Digital needed to increase the performance of their customer engagement platform, specifically during peak times (for example on Black Friday), refresh the current hardware platform and consolidate different databases. Migrating to the cloud was the answer. When it came to making the move, they turned to their trusted IT partner, Pythian.

“Pythian has been supporting our databases for multiple years, and knows the complexities and requirements specific to our environment. Collaborating with Pythian, we were able to successfully plan, evaluate, test, and migrate our databases from an on-site datacenter to AWS Cloud.” Dean Smith, Director IT Operations, Mapp Digital

With Pythian’s help they migrated to the Amazon Web Services (AWS) platform. Pythian helped the company with all aspects of the migration, from planning, to testing, to implementation, and continues to monitor the company’s database environments for outages while making recommendations on configurations for optimizing availability and on opportunities to save on cost.

Read the full story to find out how Mapp Digital migrated from an on-premises datacenter to the cloud with help from Pythian.

2017-07-26 Wed

23:37 A Gentle Introduction to Google Cloud Platform for Machine Learning Practice (11234 Bytes) » Official Pythian Blog

In a previous post titled Machine Learning on the Cloud, we examined in plain language what machine learning is, what the cloud is, and the merits of leveraging cloud resources for machine learning practice.

In this post, we introduce the Google Cloud Platform (GCP) as a simple yet powerful and cost-effective cloud option for performing machine learning. What’s more, we provide a simple walkthrough on how to set up the environment for machine learning model development on GCP.

Why GCP?

One of the big competitors in the cloud computing space is Google, with their cloud resource offering termed “Google Cloud Platform”, popularly referred to as GCP for short.

Google is also one of the top technology leaders in the internet space, with a range of top web products such as Gmail, YouTube, Google Hangouts, Google Maps, and Google+, to mention just a few. These products generate, store and process many terabytes of data each day from internet users around the world.

To deal with this volume of data, Google has over the years invested heavily in processing and storage research and infrastructure. Google today boasts some of the most impressive data center designs and technology in the world to support its computational demands and computing services.

Google Cloud Platform makes lightning-fast computational speed (and it is getting faster!) and high-tech storage capabilities available to the public, with extremely low latency (meaning minimal delays in data transfer) and high throughput (the amount of work completed per unit of time), all made possible by state-of-the-art networking technology and infrastructure.

What is simply amazing about this is that the storage and processing platform on which products like Gmail and Google Docs are built is now accessible to, and available for, everyone to utilize.

Giving the public access to powerful, highly-distributed computational power has ushered in a new wave of “real-time intelligence” to software products. Moreover, this has pushed the boundaries and can yet extend the frontiers of machine learning research.

As we see later in this post, utilizing GCP for machine learning practice is extremely easy, and gets the reader one step closer to using bleeding-edge machine learning technology in the cloud in no time.

Storage & Processing: The Core of Intelligence

The end goal of artificial intelligence (of which machine learning is a subset) is to build machines that think like humans and think rationally, as well as machines that act like humans and act rationally. To achieve this feat, figuring out how to store information received from the environment, as well as how to process that information to uncover new levels of intelligence, is critical. Hence, we see that the two immovable pillars of artificial intelligence are storage and processing, with networking coming in between to facilitate data flow.


An Inspiration from Natural Intelligence

As with the artificial, so it is with the natural. Human intelligence is likewise predicated on our uncanny ability to store and process information at incomprehensible speeds (think networking!). I call it the “speed of thought.”

Let’s now touch briefly on the key components of GCP storage and processing capabilities for machine learning practice.

Storage: Google Cloud Storage

Google Cloud Storage (GCS) on GCP is the place for storing virtually any data: flat files, video, audio, you name it!

Google Cloud Storage is immensely scalable; that is to say, the storage space grows as storage needs increase. There is virtually no limit to the volume of data stored. Data stored on Google Cloud Storage is available anytime, and from any location around the world. What’s more, this massive storage power comes at an almost negligible cost, taking into consideration the size and economic value of the stored data. Moreover, acknowledging the accessibility, security, and consistency provided by cloud storage, the cost seems like nothing.

Processing: Google Compute Engine

Google Compute Engine on GCP makes available to users virtual machines that are physically running in Google’s data centers around the world. As with cloud storage, the virtual machines scale (i.e. grow) as your processing needs increase.

From a previous post on Machine Learning on the Cloud, we saw that the real gains of leveraging cloud compute resources come when we use as much computational resource as is needed to solve a particular task, and relinquish those resources after processing.

Playground: Google Datalab

Google Datalab on GCP is where machine learning model design and development take place. This platform is where we run our packages for data cleaning, data exploration, data inference and data preparation for machine learning, and see the results instantly in an interactive manner. It is here that the iterative process of machine learning model development happens.

Datalab runs on a Google compute engine instance and provides access to data stored on Google cloud storage.

Setting Up Datalab for Machine Learning: A step-by-step walkthrough

Step 1: Create a free account on GCP with a $300 starting credit
1). Go to Google Cloud Platform
2). Create a free account – A credit card is needed to open this free account, although Google does not issue any billing without consent.

Google Cloud Platform

Step 2: Create a project
A project on GCP is like a container of some sort, which houses all GCP resources used.

GCP Project Dashboard

Step 3: Upload data to Google Cloud Platform
1). Go to Storage from the project dashboard

Google Cloud Storage

2). Create a bucket on Google cloud storage to store data

Create bucket

– Choose a unique bucket name and click create

Cloud storage bucket

3). Click Upload files to upload data set to GCS. If data is in a folder, use Upload folder

Upload files to GCS
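If you prefer the command line, the same bucket-and-upload steps can be done from Cloud Shell with the gsutil tool; the bucket and file names below are placeholders:

```shell
# Create a bucket (bucket names are globally unique across GCS)
gsutil mb gs://my-unique-ml-bucket/

# Upload a single file, or a whole folder recursively
gsutil cp dataset.csv gs://my-unique-ml-bucket/
gsutil cp -r ./data gs://my-unique-ml-bucket/
```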

Step 4: Launch a Datalab interactive environment running on Google compute instance

1). Click on the icon circled in red below to activate Google cloud shell.

activate cloud shell

2). From cloud shell, run the command:
datalab create my-first-lab

start datalab

3). From the list of zones, select a geographically close zone. Here, we use [24] us-east1-b

select zones

4). When prompted to continue, Press Y and then press Enter

press Y

5). When asked for a passphrase, simply skip for now by pressing Enter. When asked again, press Enter once more.


6). After the connection to Datalab is established, click on web preview and select Port 8081 to start using Datalab

start datalab

7). The Datalab interface opens up in a new tab as shown below

datalab interface

8). Click on notebook in the top left corner to start a Datalab notebook.
9). Congratulations! We are now ready for Machine Learning practice using Datalab on Google Cloud Platform.

new datalab notebook

Note: This was originally published on ekababisong.org

22:33 Replicating from a higher to lower MySQL version (3560 Bytes) » Official Pythian Blog

As we know, replication is only supported officially between consecutive major MySQL versions, and only from a lower version master to a higher version slave.

This means for example, the following scenario is supported:
5.6 master –> 5.7 slave

while these two scenarios are not supported:
5.5 master –> 5.7 slave
5.7 master –> 5.6 slave

That being said, in some contexts (e.g. a MySQL upgrade) it can be valuable to be able to replicate from a master that is using a newer version of MySQL to an older version slave.

This could be used as part of a rollback strategy, and/or be needed in the case of upgrading a master-master replication topology.

The idea of this article is to provide you with instructions on how to make replication work in scenarios where the master is a higher version than the slaves.

Replicating from a 5.6 master to a 5.5 slave

The following parameters need to be present on a 5.6 master’s my.cnf to be able to replicate to a lower version slave:

slave_sql_verify_checksum = OFF
binlog_checksum = NONE

NOTE: remember to bounce the server after any needed parameter changes
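Both variables are dynamic in MySQL 5.6, so (from an account with the SUPER privilege) you can also set them at runtime and verify the values, in addition to persisting them in my.cnf; a sketch:

```sql
-- On the 5.6 master
SET GLOBAL binlog_checksum = 'NONE';
SET GLOBAL slave_sql_verify_checksum = OFF;

-- Confirm the running values
SHOW GLOBAL VARIABLES LIKE '%checksum%';
```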

The temporal datatypes (TIME/DATETIME/TIMESTAMP) support fractional values starting from MySQL 5.6.4. Hence the storage requirement and encoding differ in comparison to pre-MySQL 5.6.4 temporal datatypes.

For this reason, writing to tables with TIME/DATETIME/TIMESTAMP columns created AFTER the upgrade to 5.6 will break replication to 5.5, unless the binlog format is set to STATEMENT.

Replicating temporal columns back using the new format with ROW-based logging will not work.

If using MIXED binlog format, unsafe statements are logged using ROW format, so it may or may not work depending on the presence of said unsafe statements.
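If the 5.6 master must keep replicating such newly created temporal columns to a 5.5 slave, one option (with the usual caveats of statement-based replication) is to combine the checksum settings with a statement-based binlog format in the master’s my.cnf; a sketch, to be adjusted for your environment:

```ini
[mysqld]
# Needed for a 5.5 slave to read 5.6 binlogs
binlog_checksum           = NONE
slave_sql_verify_checksum = OFF

# Log changes as statements so the new temporal encoding
# never appears in ROW events sent to the 5.5 slave
binlog_format             = STATEMENT
```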

Replicating from a 5.7 master to a 5.6 slave

There is no need for any special my.cnf parameters in this case. However, there is a bug with ignorable events in older MySQL 5.6 versions that prevents replication from a 5.7 master.

This is fixed on 5.6.24, so if you want to replicate from 5.7 master to 5.6, make sure the slave is using MySQL >=5.6.24.

Finally, avoid using SET PASSWORD statements (those are deprecated anyway), as they will lock you out of your slave in a mixed 5.7/5.6 topology.

You can use ALTER USER user IDENTIFIED BY 'auth_string'; instead. There is a bug reported about this as well (closed as Won't fix).
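For illustration (the account name here is a placeholder), the deprecated statement and the suggested replacement look like this:

```sql
-- Avoid in a mixed 5.7/5.6 topology:
SET PASSWORD FOR 'app'@'%' = PASSWORD('new_password');

-- Use instead:
ALTER USER 'app'@'%' IDENTIFIED BY 'new_password';
```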


While not a supported configuration, replicating from a higher to lower MySQL version is still possible. Keep in mind that there may be some edge cases where you still encounter problems.

I look forward to hearing about your experiences in the comments section below.