Friday, December 19, 2014

MongoDB - Backing up a Sharded Cluster (is tricky)

*UPDATE*
I am blind.  Just found this:
http://docs.mongodb.org/manual/tutorial/backup-sharded-cluster-with-database-dumps/



Maybe I'm blind, but I haven't seen a concise, high-level summary of how to back up a sharded cluster.


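  1. halt the balancer (sh.stopBalancer()).
  2. make sure no chunks are flying around (sh.isBalancerRunning() tells you if a chunk is still moving).
  3. back up each shard (the tutorial linked above dumps from a secondary of each shard's replica set).
  4. back up the config server data.
  5. re-enable the balancer (sh.startBalancer()).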

THE SHARD BACKUPS AND THE CONFIG SERVER BACKUP ARE ONE UNIT.  AND MUST BE RESTORED AS ONE UNIT.
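
The ordering of those five steps is the whole point, so here is a rough Python sketch of it. Everything in it is hypothetical: the `cluster` object and its method names are stand-ins for whatever actually talks to mongos and runs the dumps, and `FakeCluster` only exists so the sequence can be checked without a real cluster. The one design choice worth noting is the try/finally, so the balancer comes back on even if a dump blows up.

```python
def backup_sharded_cluster(cluster, shards):
    """Run the backup steps in order; always re-enable the balancer."""
    cluster.stop_balancer()                   # 1. halt the balancer
    try:
        while cluster.balancer_running():     # 2. wait for in-flight chunk moves
            cluster.wait()
        for shard in shards:                  # 3. back up every shard
            cluster.dump_shard(shard)
        cluster.dump_config()                 # 4. back up config server data
    finally:
        cluster.start_balancer()              # 5. re-enable the balancer


class FakeCluster(object):
    """Hypothetical stand-in that records calls instead of touching MongoDB."""

    def __init__(self, busy_polls=2):
        self.calls = []
        self.busy_polls = busy_polls

    def stop_balancer(self):
        self.calls.append("stop_balancer")

    def start_balancer(self):
        self.calls.append("start_balancer")

    def balancer_running(self):
        # Pretend a chunk migration is still running for the first few polls.
        self.busy_polls -= 1
        return self.busy_polls >= 0

    def wait(self):
        self.calls.append("wait")

    def dump_shard(self, shard):
        self.calls.append("dump " + shard)

    def dump_config(self):
        self.calls.append("dump config")


fake = FakeCluster()
backup_sharded_cluster(fake, ["shard01", "shard02"])
print(fake.calls)
```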

Thursday, December 18, 2014

Puppet - Containment Again

Most of the content is a poor regurgitation of these pretty damn good links:
http://blog.mayflower.de/4573-The-Puppet-Anchor-Pattern-in-Practice.html
http://bombasticmonkey.com/2011/12/27/stop-writing-puppet-modules-that-suck/

include does NOT cause the included class to be Contained within the including class
require -does- .....

When you write a module, make sure you CONTAIN ALL THE THINGS.
Otherwise you may screw a user of your module... as your classes 'float off' and may not be applied when the user anticipated.

Your containing class could use a buncha REQUIREs to contain other classes, BUT
if you need to declare parameterized classes, you are SOL, and
if you need the classes you declare applied in a certain sequence, you'll have to use another option:


You'll either need "The Anchor Pattern" or "Contain" (contain is only available in 3.4+)
-------------------------------------
"Anchor Pattern" example:
class wrapper {
  anchor { 'wrapper::begin': } ->
  class { 'foo': }             ->
  class { 'bar': }             ->
  class { 'end': }             ->
  anchor { 'wrapper::end': }
}
-------------------------------------
"Contain" Example:
class wrapper {
  contain foo
  contain bar
  contain end

  Class['foo'] ->
  Class['bar'] ->
  Class['end']
}
-------------------------------------

StatsD - links

https://github.com/etsy/statsd#concepts
https://github.com/etsy/statsd#more-specific-topics
** Sneaky Caveat with Graphite **
https://github.com/etsy/statsd/blob/master/docs/graphite.md



Tuesday, December 16, 2014

MongoDB - Finding Slow Operations

db.currentOp()
db.currentOp({"secs_running" : { "$gt" : 2 }})

currentOp.client can give you the IP and Port of where the query originated from.
(in our case, we have many Services....... this can help isolate commands coming from a specific Service)

http://docs.mongodb.org/manual/reference/method/db.currentOp/#db.currentOp
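
To isolate which Service is sending the slow stuff, the currentOp output can be grouped by client. A minimal Python sketch, assuming the output has already been loaded as a dict with an "inprog" list whose entries carry "secs_running" and "client" (those field names come from currentOp itself; the sample documents and IPs are made up):

```python
from collections import Counter


def slow_ops_by_client(current_op, min_secs=2):
    """Count operations running longer than min_secs, grouped by client IP."""
    counts = Counter()
    for op in current_op.get("inprog", []):
        if op.get("secs_running", 0) > min_secs and "client" in op:
            # "10.0.0.5:51234" -> "10.0.0.5" (drop the ephemeral port)
            ip = op["client"].rsplit(":", 1)[0]
            counts[ip] += 1
    return counts


# Made-up sample shaped like db.currentOp() output.
sample = {"inprog": [
    {"opid": 1, "secs_running": 5, "client": "10.0.0.5:51234"},
    {"opid": 2, "secs_running": 1, "client": "10.0.0.5:51240"},
    {"opid": 3, "secs_running": 9, "client": "10.0.0.7:40000"},
    {"opid": 4, "secs_running": 3, "client": "10.0.0.5:51301"},
]}
slow = slow_ops_by_client(sample)
print(slow)
```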


Saturday, December 13, 2014

Linux - netcat aka nc

So for some reason some of our VMs don't seem to be able to deliver their collectd data to the graphite VM.  So I wondered about port blocking..

Ran the following commands:

SERVER:
  nc -lu 25827

CLIENT 1:
  echo 'test1' |nc -u -w1 thatServer 25827
CLIENT 2:
  echo 'test2' |nc -u -w1 thatServer 25827
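
The same UDP smoke test can be sketched in Python if nc isn't installed. This is only an illustration: it runs listener and sender in one process on localhost and lets the OS pick a free port, whereas the real check above goes between two boxes on port 25827.

```python
import socket


def udp_round_trip(message, host="127.0.0.1", timeout=1.0):
    """Send one datagram to a local listener and return what arrived."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind((host, 0))              # port 0 = let the OS choose a free port
    server.settimeout(timeout)          # raise socket.timeout if nothing arrives
    port = server.getsockname()[1]
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        client.sendto(message, (host, port))
        data, _addr = server.recvfrom(1024)
        return data
    finally:
        client.close()
        server.close()


print(udp_round_trip(b"test1\n"))
```

If the datagram is blocked, recvfrom times out instead of returning, which is the Python version of nc sitting there printing nothing.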

Linux - APT

I wish I could get a count of how many times I've used these commands, forgotten them, and had to google them again. 


(show what version of package is installed)
dpkg -s collectd-core
  blah blah blah
  Version: 5.4.0-2ppa1

(uninstall package)
sudo apt-get remove collectd-core

(show versions available for the package)
apt-cache showpkg collectd-core
  blah blah blah
  Provides:
  5.4.0-2ppa1 -
  5.2.0-2ubuntu1 -
  4.10.1-2.1ubuntu7 -

(install a specific version)
sudo apt-get install collectd-core=5.2.0-2ubuntu1

(show where the various package versions are coming from)
apt-cache madison collectd-core
  collectd-core | 5.4.0-2ppa1 | http://something.com/junk/ precise/main amd64 Packages
  collectd-core | 5.2.0-2ubuntu1 | http://barf.com/apt/ precise/main amd64 Packages
  collectd-core | 4.10.1-2.1ubuntu7 | http://trash.com/ubuntu/ precise/universe amd64 Packages





(show where your box is configured to pull packages from)
/etc/apt# cat sources.list
  ###### Ubuntu Main Repos
  blah blah
  ###### Ubuntu Update Repos
  blah blah

Tuesday, December 9, 2014

MongoDB - Delayed Secondaries Timeline

Just an illustration of a sample sequence of events that could cause you to need to recover from a secondary.  (or other backup option)

Monday, December 8, 2014

Puppet - links

How NOT to write a puppet module:
http://bombasticmonkey.com/2011/12/27/stop-writing-puppet-modules-that-suck/

Puppet-Lint 1.0
http://bombasticmonkey.com/2011/12/27/stop-writing-puppet-modules-that-suck/

RSpec Puppet 1.0
http://bombasticmonkey.com/2013/12/05/rspec-puppet-1.0.0/

Puppet Resource Ordering
http://blog.mayflower.de/4573-The-Puppet-Anchor-Pattern-in-Practice.html

Misc - Scoville Units

I've had a slightly strange obsession with hot sauces over the past few years.
Here are two crazy varieties from Tropical Pepper Co:
 
Scorpion Pepper:  780,000 units
and
Ghost Pepper:  500,000 units


Tuesday, December 2, 2014

MongoDB - Delayed Secondaries

Setting up a delayed secondary is a good way to help you recover from a fat-finger error.
In the picture below, you'll notice that a test object can be found on the Primary, but not yet on the delayed Secondary.  To accomplish this, you need to set 'hidden' and 'priority' and 'slaveDelay' attributes.
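
For reference, the three attributes live on the member's entry in the replica set config. A small Python sketch of what that entry looks like; the host name and the one hour delay are made-up values, and the helper function is hypothetical:

```python
def delayed_member(member_id, host, delay_secs=3600):
    """Build a replica set member entry for a hidden, delayed secondary."""
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,             # can never be elected primary
        "hidden": True,            # not advertised to clients for reads
        "slaveDelay": delay_secs,  # seconds to lag behind the primary
    }


member = delayed_member(2, "mongo-delayed.example:27017")
print(member)
```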

More about recovering with a delayed secondary by tweaking the OpLog:
http://stackoverflow.com/questions/15444920/modify-and-replay-mongodb-oplog/15451297#15451297


Monday, December 1, 2014

Puppet - Containment Revisited

So as of Puppet 3.4.0, they addressed classes-containing-classes.


Read this as they will describe it much better than me.
http://puppetlabs.com/blog/class-containment-puppet

So the latest docs mention the old anchor pattern:
https://docs.puppetlabs.com/puppet/latest/reference/lang_containment.html#anchor-pattern-containment-for-compatibility-with-puppet--340

Some of the older docs mention the new-in-3.4.0 'contain' function:
https://docs.puppetlabs.com/puppet/3/reference/lang_containment.html#in-puppet-340--puppet-enterprise-32-and-later

And here's the 'contain' function again in the new docs:
https://docs.puppetlabs.com/puppet/3.5/reference/lang_containment.html#the-contain-function

Thus no more Anchor Pattern needed:
http://projects.puppetlabs.com/projects/puppet/wiki/Anchor_Pattern



Vagrant - revisited

Happy Thanksgiving.

Revisiting Vagrant.

Apparently there is a v2 now with new Vagrantfile format.
Also found a VagrantCloud.com.
Used to be frustrating locating trustworthy vagrant boxes.
Got myself an Ubuntu 14.04 LTS box:  https://vagrantcloud.com/ubuntu/boxes/trusty64

Thursday, November 20, 2014

Linux - curl example

Recently was asked to try a curl by a teammate.
Turns out it was kinda ugly and I had to use:
  • cookies
  • body
  • header
  • redirect
 This is what it ended up looking like:

curl -b cookies.txt -c cookies.txt -L -v -H "Content-Type:application/x-www-form-urlencoded" --data 'login_fail_url=http://junksample.com/damnitall&login_success_url=http://junksample.com/woopeedoo' 'http://junksample.com/v1/login'

the "-L" portion told curl to follow the 302 redirect that comes back
the "-b cookies.txt -c cookies.txt" portion stored a cookie that came back and then used the cookie when it followed the redirect
the "-H ...." portion was for adding a header
the "--data ...." portion was for specifying the POST body

Monday, November 17, 2014

MongoDB - MMS and alerting

Seems to be a relatively new feature, but we now have Alerting from MMS:


Metrics - Statsd to CYA


So we have a dependency on a 3rd party service.
We publish events to this message bus.
We ran performance tests in our PQA environment one day and we were very disappointed with the results: some 1.5 second response times.
So we came up with the idea of NOT publishing events.
Response times dropped to 0.1 seconds.

Now we should wrap the call with a Statsd timer so we can let the other service folks know they need to improve performance.
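
A rough Python sketch of what "wrap the call with a Statsd timer" could look like. The packet format (name:milliseconds|ms) is the StatsD timer format; the metric name "events.publish", the helper names and the injected `send` callable are all made up for illustration. In real use `send` would push the packet over UDP to the statsd host.

```python
import time
from contextlib import contextmanager


def timer_packet(name, millis):
    """Format a StatsD timer line: <name>:<milliseconds>|ms"""
    return "{}:{}|ms".format(name, int(round(millis))).encode("ascii")


@contextmanager
def statsd_timer(name, send, clock=time.time):
    """Time the wrapped block and hand the resulting packet to send()."""
    start = clock()
    try:
        yield
    finally:
        # Report even if the wrapped call raised; slow failures matter too.
        send(timer_packet(name, (clock() - start) * 1000.0))


sent = []
with statsd_timer("events.publish", sent.append):
    time.sleep(0.01)   # stand-in for the call to the 3rd party service
print(sent)
```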


"You should have published your events asynchronously"
I know.  shut up... :)

Tuesday, November 11, 2014

The Big Time

Wow, I actually had a few page views from France and one from Germany.
Kinda cool.. first time I've noticed any hits that are obviously not me.

Fabric - Fab Squared

So we have some vCloud boxes that I set up a week back.  (See lame picture below)
My local VM would blow up when I tried to run too many Fab tasks in parallel.
So I decided to farm out the fab tasks to some dedicated VMs.
What better tool to run Fab on remote boxes than Fab itself?   :)
fab -H 10.10.10.10 -- 'cd /home/urmom/fabstuffs; fab svcs.provision:prd,standby,use1b,piapi-affiliation,1.0.8,count=2' </dev/null > b_affiliation.out 2>&1 &

Yes, that hideous command above is redirecting StdOut, StdErr and StdIn.


AWS - Evaluating instance types & quantities to meet RAM and Disk Capacity requirements

A screenshot of a spreadsheet I made:


Wednesday, October 29, 2014

Python - Revisiting Python 2.7

Tutorial
https://docs.python.org/2.7/tutorial/index.html

Python Standard Library
https://docs.python.org/2.7/library/index.html#library-index

Language Reference
https://docs.python.org/2.7/reference/index.html


Sunday, October 26, 2014

AWS - Disks and MongoDB - The shocking conclusion

Probably not so shocking.

We spun up some new primaries on aws "m3.2xl" instances.
These had enough drive capacity for our data.
Yes, we would have liked to use "i2" instance types, but they weren't available to us (don't ask).

We ran 'mongoperf' on them and here is a sample of the results:
------------------------------------------------------------------------------
EBS
  • 986 ops/sec 3 MB/sec
  • 1154 ops/sec 4 MB/sec
  • 1139 ops/sec 4 MB/sec
  • 1268 ops/sec 4 MB/sec
  • 1119 ops/sec 4 MB/sec
  • 930 ops/sec 3 MB/sec
  • 929 ops/sec 3 MB/sec
  • 1330 ops/sec 5 MB/sec
  • 1341 ops/sec 5 MB/sec
  • 946 ops/sec 3 MB/sec
  • 892 ops/sec 3 MB/sec
  • 1131 ops/sec 4 MB/sec
  • 1153 ops/sec 4 MB/sec
  • 1073 ops/sec 4 MB/sec
  • 1071 ops/sec 4 MB/sec
  • 1316 ops/sec 5 MB/sec

SSD (ephemeral)
  • 8790 ops/sec 34 MB/sec
  • 8879 ops/sec 34 MB/sec
  • 8986 ops/sec 35 MB/sec
  • 8976 ops/sec 35 MB/sec
  • 8109 ops/sec 31 MB/sec
  • 8610 ops/sec 33 MB/sec
  • 8942 ops/sec 34 MB/sec
  • 8574 ops/sec 33 MB/sec
  • 8565 ops/sec 33 MB/sec
  • 8292 ops/sec 32 MB/sec
  • 8357 ops/sec 32 MB/sec
  • 8722 ops/sec 34 MB/sec
  • 7123 ops/sec 27 MB/sec
  • 7863 ops/sec 30 MB/sec
------------------------------------------------------------------------------------------
 
So like.. 7x to 8x the performance.
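
The arithmetic behind that, as a quick Python check. The numbers are the ops/sec samples listed above; nothing else is assumed.

```python
# ops/sec samples copied from the mongoperf runs above
ebs = [986, 1154, 1139, 1268, 1119, 930, 929, 1330,
       1341, 946, 892, 1131, 1153, 1073, 1071, 1316]
ssd = [8790, 8879, 8986, 8976, 8109, 8610, 8942,
       8574, 8565, 8292, 8357, 8722, 7123, 7863]


def mean(values):
    return sum(values) / float(len(values))


ratio = mean(ssd) / mean(ebs)
print("EBS avg %.0f ops/sec, SSD avg %.0f ops/sec, ratio %.1fx"
      % (mean(ebs), mean(ssd), ratio))
```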
 
Needless to say... when we ran our performance tests again, we Crushed our old timings.
With EBS, our slowest 4 API calls were taking (90th percentile):  2.5, 2.4, 2.4 and 2.4 seconds.
With SSD, our slowest 4 API calls were taking (90th percentile):  0.8, 0.8,  0.4 and 0.4 seconds.
 
 
 
Oh.. and why did we think we were I/O bound in the first place?
Cuz IOSTAT said this about our EBS:

 
 
 
And maybe i'll get around to posting some IOSTAT graphs w/the SSD measurements...
 

Tuesday, October 21, 2014

AWS - Disks and MongoDB - Part 1

So the general consensus that I'm getting is that Ephemeral Storage is generally quicker than EBS storage.

EBS volumes are accessed via network.

Good article about Mongoperf & IOStat:
http://java.dzone.com/articles/tips-check-and-improve-your

Read the conclusions here:
http://victortrac.com/ec2-ephemeral-disks-vs-ebs-volumes-in-raid.html

Note Ephemeral VS EBS comparisons:
http://stu.mp/2009/12/disk-io-and-throughput-benchmarks-on-amazons-ec2.html


Tools Takeaway:
mongoperf
iostat
hdparm -t
bonnie++
iozone

Monday, October 20, 2014

TODO - Read SICP

http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-4.html#%_toc_start


Java - Lambda

(String first, String second)
     -> Integer.compare(first.length(), second.length())

Just 2 parts:
A lambda expression is simply:
  a block of code,
  together with the specification of any variables that must be passed to the code.
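
For comparison, a rough Python counterpart of the same "block of code plus its parameters" idea. This is not the Java API, just the nearest equivalent: a two-argument lambda that compares strings by length, adapted for sorted() with functools.cmp_to_key. The word list is made up.

```python
from functools import cmp_to_key

# Two parameters (first, second) plus a body that returns -1, 0 or 1,
# mirroring what Integer.compare does with the two lengths.
by_length = lambda first, second: (len(first) > len(second)) - (len(first) < len(second))

words = ["kiwi", "fig", "banana"]
ordered = sorted(words, key=cmp_to_key(by_length))
print(ordered)
```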

http://www.drdobbs.com/jvm/lambda-expressions-in-java-8/240166764
http://java.dzone.com/articles/why-we-need-lambda-expressions
http://java.dzone.com/articles/why-we-need-lambda-expressions-0

http://stackoverflow.com/questions/3259322/why-use-lambda-functions

Monday, October 13, 2014

Linux - Wait To Run script

Had to wait for a process to finish before I could run a cleanup job.
Got sick of waiting for process to finish so scripted a "wait loop":


# pgrep never matches itself, so there is no stray grep process to subtract
while pgrep -f dupl > /dev/null
do
  echo waiting....
  sleep 30
done

duply prd-usw2a-pr-05-prdmdbshard-modb-s02-0001.prv-openclass.com purge --force

Tuesday, October 7, 2014

Linux - IOSTAT

Ended up in the weeds on this topic in a hurry....

http://www.r71.nl/index.php?option=com_content&view=article&catid=7:technical-docs&id=185:disk-queue-length-vs-disk-latency-times-which-is-best-for-measuring-database-performance&Itemid=50

http://www.pythian.com/blog/basic-io-monitoring-on-linux/

http://blog.jcole.us/2007/05/08/on-iostat-disk-latency-iohist-onward/

http://dom.as/2009/03/11/iostat/

http://www.admin-magazine.com/HPC/Articles/Monitoring-Storage-with-iostat

https://www.igvita.com/2009/06/23/measuring-optimizing-io-performance/

http://linux.die.net/man/1/iostat

Saturday, October 4, 2014

Graphite - Resources

We are going to have to scale graphite sooner or later... the following links include info on the topic:
http://www.aosabook.org/en/graphite.html
http://graphite.readthedocs.org/en/latest/carbon-daemons.html



Carbon Daemons:
  1. carbon-cache.py
  2. carbon-relay.py  - replication and sharding
  3. carbon-aggregator.py - for buffering metrics to reduce I/O (data is available later though)
 Config Files:
  1. carbon.conf
  2. relay-rules.conf - send certain metrics to certain backends (sharding)
  3. storage-schemas.conf - defines retention policies. Whisper preallocates these files.
  4. storage-aggregation.conf - defines how to aggregate data to lower-precision retentions (dflt is avg)
  5. aggregation-rules.conf - allows you to add several metrics together as they come in

Send data to graphite (actually carbon component) via:
  1. plain text
  2. pickle
  3. amqp


Wednesday, October 1, 2014

AWS - Xen Reboot

So... AWS reboot impacted us a bit.
We had to provision all our instances in 'us east 1c'
Then we had to start adding the use1c haproxies to the ELB so that traffic would start flowing into use1c.  Then we had to disconnect the use1b haproxies so traffic would stop going to use1b.
Then we just waited until AWS finished doing use1b reboots.

Next step is to bring up live nodes in both AZs.   Yay redundancy.

Friday, September 26, 2014

Puppet - GraphViz and showing Dependency Cycles

can run puppet with --graph  and it'll make a dot file.
i think it automatically makes a dot file if it detects a dependency cycle.. 
 
1) apt-get install graphviz 
2) dot -Tpng somePuppetGraph.dot > output.png
 
References:
https://docs.puppetlabs.com/guides/faq.html#how-do-i-use-puppets-graphing-support 
http://stackoverflow.com/questions/1494492/graphviz-how-to-go-from-dot-to-a-graph

Thursday, September 25, 2014

Linux - A to Z of linux commands

http://ss64.com/bash/

Puppet - Containment

INCLUDE  !=  CONTAINMENT

Containment of CLASSES and containment of RESOURCES are treated differently.

Classes never contain the classes they include.

Classes contain the resources they declare.
Defined type instances contain the resources they declare.

The purpose of containment, in general, is to let you control where and when certain parts of your Puppet code are executed.

The anchor pattern is a MUST for all modules which may be used with Puppet 2.6.x or Puppet Enterprise 1.x.    A module author SHOULD contain the declared classes by declaring a begin and an end anchor, then relating the declared classes to these anchors.

Paraphrasing Again:
A class contains all of its resources. This means any relationships formed with the class as a whole will be extended to every resource in the class.
Classes can also contain other classes, but you must manually specify that a class should be contained.
Then there is another thing called require that I am not talking about.
Also not talking about contain which is relatively new.

http://blog.mayflower.de/4573-The-Puppet-Anchor-Pattern-in-Practice.html
http://puppetlabs.com/blog/class-containment-puppet
http://projects.puppetlabs.com/projects/puppet/wiki/Anchor_Pattern


https://docs.puppetlabs.com/puppet/latest/reference/lang_containment.html

Monday, September 22, 2014

AWS - S3

So we use Duply to back up LVM snapshots of our mongo database.
And Duply is configured to send the backups to a bucket on S3.
Well.... occasionally we'd delete a Mongo VM and then we'd end up with orphan data in S3.
Stumbled around and found a tool to view S3 easily and realized we had quite a buncha junk in there.
So wrote some python to list the files in S3 and then to delete the junk.
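
The interesting part of that python was deciding what counts as junk. A hedged sketch of that step only, assuming (and this is purely an assumption about the bucket layout) that every key starts with the name of the VM that wrote it, followed by a slash. The host and key names are made up. With boto, the key names would come from iterating bucket.list() and the result would be handed to bucket.delete_keys().

```python
def find_orphans(keys, live_hosts):
    """Return keys whose leading path component is not a live host."""
    live = set(live_hosts)
    return [key for key in keys if key.split("/", 1)[0] not in live]


# Made-up key names, assuming a "<hostname>/<backup file>" layout.
keys = [
    "mongo-s01/duplicity-full.vol1.difftar.gpg",
    "mongo-s02/duplicity-full.vol1.difftar.gpg",
    "mongo-old/duplicity-full.vol1.difftar.gpg",
]
orphans = find_orphans(keys, ["mongo-s01", "mongo-s02"])
print(orphans)
```

Listing first and deleting second, as two separate runs, makes it a lot harder to nuke a live backup by accident.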


 CyberDuck:
http://cyberduck.ch/

Boto.S3:
http://boto.readthedocs.org/en/latest/s3_tut.html

Sunday, September 21, 2014

Linux - Disown

scp -r [email protected]:/home/sbuttrick/jenkins/jobs .
CTRL + Z       //to stop
bg                    //to background
jobs                 //to see job number
disown %n      //where n is the job number.. and yes.. it requires the weird % thing in there



AND if you had planned ahead in the first place (crazy huh?) then you should have used "nohup"

Friday, September 19, 2014

Linux - Packages

Why does a particular version of a package get installed on Ubuntu?

apt-get install jenkins
(installs jenkins)

dpkg -l |grep jenk
(to list what version is installed)
ii  jenkins     1.424.6+dfsg-1ubuntu0.2     Continuous Integration and Job Scheduling Server

Then head over to http://packages.ubuntu.com/
and pick your version of Ubuntu and do a SEARCH.

And hopefully what comes back is the same as what you had installed.

ALSO..........
The package puts an upstart script in place in  /etc/init/jenkins
which reads from /etc/default/jenkins

Monday, September 15, 2014

AWS - Disk IO

Goal:
  • Test a few different instance sizes and storage configurations to gauge their performance relative to each other.
 Constraints:
  • Our provider will not support EBS Optimized instances.
Other things to keep in mind:
  • Critical to test multiple instances of same flavor (link)
  • It's important to test the Ephemeral/Instance storage because it's physically attached to the machine (link) and that probably offers performance benefits.  *Of course if we use instance storage we need to keep data safe by using a replication strategy across multiple instances and storing backups in Amazon S3.
  • R3.large & Family:  These instances are the sweet spot for your MongoDB instances. They have the right balance of memory and compute power. They are good candidates to run your larger MongoDB server. MongoDB is mainly a memory game – the more memory you supply the better it works and these instance types offer the most memory. The previous generation of these instances used to be called M2. If your MongoDB server is still getting disk bound then I would consider the High IO instances.  (link)
So here are the combinations I'd like to test:


m1.large
  • m1.large with EBS storage
  • m1.large with Ephemeral/Instance storage (412gb spindisk)
  • cost:  $.175 / hour  (link)
  • "moderate" network performance  (link)
  • *optional* ebs optimized  (link)
  • regular spinning disk  (link)

m3.large
  • m3.large with EBS storage
  • m3.large with Ephemeral/Instance storage  (32gb ssd) 
  • cost: $.140 / hour  (link)
  • "moderate" network performance  (link)
  • *NOT* ebs optimized  (link)
  • SSD disk  (link)

m3.xl
  • m3.XL with EBS storage
  • m3.XL with Ephemeral/Instance storage  (80gb ssd)
  • cost: $.280 / hour
  • "high" network performance
  • SSD disk


Measurements with FIO
(doing measurements with fio)
(interpreting link)

Results:
  1. m1.large & ephemeral & read
  2. m1.large & ephemeral & write
  3. m1.large & ebs & read
  4. m1.large & ebs & write
  5. m3.large & ephemeral & read
  6. m3.large & ephemeral & write
  7. m3.large & ebs & read
  8. m3.large & ebs & write
  9. m3.xl & ephemeral & read
  10. m3.xl & ephemeral & write
  11. m3.xl & ebs & read
  12. m3.xl & ebs & write



Appendix:

ephemeral/instance storage types
  1. spinning disk
  2. ssd disk
network performance values
  1. low
  2. moderate
  3. high
  4. 10gigabit

Friday, September 12, 2014

Linux - lsof

list open files.

awesome for figuring out what files a program is using...
maybe to determine where logs are being written...
maybe to show which data files mongo is using...

lots of use cases

Sunday, September 7, 2014

Linux - mdadm adventure

Saturday night we were alerted to high load on various mongo-prod-boxes.
I ssh-ed into the machine and found a few processes that I was unfamiliar with consuming about 10% cpu.
They were  md0_raid10   and   md0_resync .

Googled around a bit.... seems to be something to do with the RAID array configured for us.
 
Used tooling to run "sudo mdadm -D /dev/md0" across various machines and found a good chunk of them with "Check Status : 93% complete" or similar percentages.

Seems that some event happened cuz these all flaked about the same time.
 
 still not sure what the root cause was.....
 
 
 

Friday, September 5, 2014

Puppet - setting up a Jenkins Server

Used this module to stand up a Jenkins Master and some Slaves:
https://forge.puppetlabs.com/rtyler/jenkins


Eventually hope to use this python API to attach the slaves to the master after their creation:
https://pypi.python.org/pypi/jenkinsapi

Unfortunately the Jenkins-Swarm plugin doesn't work on AWS cuz they block UDP Broadcasts.

You can also poke at the Jenkins API through your browser:
 http://someJenkinsHost:8080/api/json?pretty&depth=3


and for some reason our legacy Jenkins server has like 90 plugins... so we'll probly want to weed that down quite a bit.

Tuesday, September 2, 2014

Linux - making sense of all those attached devices

-----------------------------------------------------------------------------------------------------
intmdbshard-modb-s11-0001:~$ lsblk

NAME                       MAJ:MIN RM   SIZE RO TYPE   MOUNTPOINT

xvda1                      202:1    0     8G  0 disk   /
xvdb                       202:16   0    30G  0 disk
xvdf                       202:80   0   100G  0 disk
└─md0                        9:0    0 199.9G  0 raid10
  └─data (dm-0)            252:0    0 199.9G  0 crypt
    └─datavg-datalv (dm-1) 252:1    0 159.9G  0 lvm    /data
xvdg                       202:96   0   100G  0 disk
└─md0                        9:0    0 199.9G  0 raid10
  └─data (dm-0)            252:0    0 199.9G  0 crypt
    └─datavg-datalv (dm-1) 252:1    0 159.9G  0 lvm    /data
xvdh                       202:112  0   100G  0 disk
└─md0                        9:0    0 199.9G  0 raid10
  └─data (dm-0)            252:0    0 199.9G  0 crypt
    └─datavg-datalv (dm-1) 252:1    0 159.9G  0 lvm    /data
xvdi                       202:128  0   100G  0 disk
└─md0                        9:0    0 199.9G  0 raid10
  └─data (dm-0)            252:0    0 199.9G  0 crypt
    └─datavg-datalv (dm-1) 252:1    0 159.9G  0 lvm    /data

-------------------------------------------------------------------------------------------------

intmdbshard-modb-s11-0001:/mnt$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/xvda1                 7.9G  7.3G  207M  98% /
udev                       3.7G  8.0K  3.7G   1% /dev
tmpfs                      1.5G  264K  1.5G   1% /run
none                       5.0M     0  5.0M   0% /run/lock
none                       3.7G     0  3.7G   0% /run/shm
/dev/mapper/datavg-datalv  158G   18G  133G  12% /data
/dev/xvdb                   30G  173M   28G   1% /testingMount


-------------------------------------------------------------------------------------------------

src:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html

Puppet - Dependency Graphs

run puppet with the  --graph  option and Puppet will create a DOT file.
View the DOT file with Omnigraffle or even LucidCharts.

https://docs.puppetlabs.com/guides/faq.html#how-do-i-use-puppets-graphing-support
http://bitfieldconsulting.com/puppet-dependency-graphs

Saturday, August 30, 2014

TODO - Play more with Sensu

Esp after reading this guy's thoughts:

http://petey5king.github.io/2012/03/30/sensu-a-collectd-replacement.html

TODO - Grafana


Grafana is a frontend for Graphite

http://play.grafana.org/#/dashboard/file/default.json

http://grafana.org/blog/


TODO - ElasticSearch Marvel

I think marvel is to elasticsearch as mms is to mongodb...

http://www.elasticsearch.org/guide/en/marvel/current/

http://grey-boundary.com/elasticsearch-marvel-is-awesome/

Work - VM count and breakdown

2 person team (myself included) responsible for about 700 nodes:

  • ~200 production
  • ~200 performance
  • ~200 integration
  • ~50 test
  • ~50 dev

Production breakdown:
  • APIs
    • 14 apis x 4 instances x 2 stacks = 112
    • 3 apis x 8 instances x 2 stacks = 48
  • Mongo
    • 8 shards x 5 replica set members  = 40
    • 3 config nodes
  • Infrastructure
    • metrics (graphite / statsd / collectd)
    • logging (logstash / elasticsearch / kibana)

Thursday, August 28, 2014

Metrics - StatsD and Graphite

Nice write up:
http://blog.pkhamre.com/2012/07/24/understanding-statsd-and-graphite/

Gritty details for Graphite:
https://graphite.readthedocs.org/en/latest/

Nice article about scaling Graphite:
http://grey-boundary.com/the-architecture-of-clustering-graphite/

A billion tools that work with Graphite:
http://graphite.readthedocs.org/en/latest/tools.html

Wednesday, August 27, 2014

Tuesday, August 26, 2014

Metrics - Beauty in Metrics

Load of 30ish mongodb servers.


Disk Time Writes for some mongo nodes in an availability zone.
These nodes were randomly assigned a time every hour to backup.
Graph shows six hours so you can see the repetition.

Hadoop - Hive and Pig and Impala

Hive and Pig seem to both be query languages for Hadoop.
Hive seems to have a more familiar SQL style language.
Hive is a framework for performing analytic queries.
Pig uses a new language (pig latin).


http://vision.cloudera.com/impala-v-hive/ 
http://www-01.ibm.com/software/data/infosphere/hadoop/hive/
http://www-01.ibm.com/software/data/infosphere/hadoop/pig/




Other notes:

Hadoop is basically 2 things - a Distributed FileSystem(HDFS) + a Computation or Processing framework(MapReduce).

HBase sits on top of HDFS. HBase provides random read/write access.
HBase is similar to Google's BigTable.

Kafka.. eventing & queuing ....somewhat similar to Rabbit

Friday, August 22, 2014

Kibana - Great start for search syntax

http://www.elasticsearch.org/guide/en/kibana/current/working-with-queries-and-filters.html

Monday, August 18, 2014

TechTodo - Storm, Mesos, Chronos

Maybe someday I'll read these in full:

http://storm.incubator.apache.org/

http://mesos.apache.org/


http://airbnb.github.io/chronos/

Linux - netstat almost made my head explode

Created a newer faster metrics server cuz our old metrics server was too heavily loaded.
Updated the DNS entry to point to the new machine.
Kicked a buncha clients so they'd read the updated DNS entry.
ran  'netstat -pant' on the old metrics server and found a TON of IPs still connected to it.
scripted a reverse-dns-lookup against all the IPs.

Turns out there were a buncha hosts from OTHER teams connected to our server !
(they should not be!!!!)
Now I have to go track them all down and tell them to go away.
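
The reverse-dns-lookup script was roughly along these lines. This is a reconstruction as a Python sketch, not the original script: it assumes the usual 'netstat -pant' column layout (foreign address in the fifth column, state in the sixth), only looks at IPv4 "tcp" lines, and the sample output and host names are made up.

```python
import socket


def foreign_ips(netstat_output):
    """Collect the remote IPs of ESTABLISHED IPv4 tcp connections."""
    ips = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "tcp" and fields[5] == "ESTABLISHED":
            ips.add(fields[4].rsplit(":", 1)[0])   # strip the remote port
    return sorted(ips)


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the host name for ip, or None when there is no PTR record."""
    try:
        return resolver(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


# Made-up sample of 'netstat -pant' output.
sample = """\
Proto Recv-Q Send-Q Local Address   Foreign Address   State       PID/Program name
tcp        0      0 10.0.0.1:2003   10.0.0.5:51234    ESTABLISHED 1234/python
tcp        0      0 10.0.0.1:2003   10.0.0.7:40000    ESTABLISHED 1234/python
tcp        0      0 10.0.0.1:2003   10.0.0.5:51301    ESTABLISHED 1234/python
tcp        0      0 0.0.0.0:22      0.0.0.0:*         LISTEN      900/sshd
"""
print(foreign_ips(sample))
```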

Linux - one of the best commands ever

sudo du -sh *

Gives you the total size of each file and subdirectory in the current directory.  Awesome.

Wednesday, August 13, 2014

Puppet, Hiera - Some notes on Hiera

Puppet Class Parameters priority:
  1. use value that was explicitly set
  2. check hiera for ClassName::ParameterName
  3. use default value in class def
  4. fail compilation with error
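
That priority order, modeled as a tiny Python function. This is only an illustration of the lookup order, not Puppet's actual implementation; the dicts standing in for explicit values, hiera data and class defaults are hypothetical, as is the ntp example.

```python
def resolve_parameter(class_name, param, explicit, hiera, defaults):
    """Pick a class parameter value using the four-step priority order."""
    key = "{}::{}".format(class_name, param)
    if param in explicit:        # 1. value that was explicitly set
        return explicit[param]
    if key in hiera:             # 2. hiera lookup for ClassName::ParameterName
        return hiera[key]
    if param in defaults:        # 3. default value in the class definition
        return defaults[param]
    raise KeyError("no value for " + key)   # 4. compilation fails


hiera_data = {"ntp::servers": ["from-hiera"]}
class_defaults = {"servers": ["from-default"]}
print(resolve_parameter("ntp", "servers", {}, hiera_data, class_defaults))
```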

Hiera lookups get a copy of ALL variables currently available to puppet.
Hiera can use these in "interpolation tokens"

Hiera lookup functions:
  hiera - priority lookup
  hiera_array - array merge lookup
  hiera_hash - hash merge lookup (odd)


src:  https://docs.puppetlabs.com/hiera/1/puppet.html#hiera-lookup-functions

MongoDB, Metrics - Taste the Rainbow


If that graph doesn't make you say "holy shit", then you just don't know what you are looking at.

Tuesday, August 5, 2014

ElasticSearch - good resources


good intro
http://joelabrahamsson.com/elasticsearch-101/

elasticsearch guide
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/index.html

elasticsearch reference
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index.html

exploring elasticsearch Beta book
http://exploringelasticsearch.com/overview.html

Tuesday, July 29, 2014

Puppet - Trials and Tribulations

So... one day all of our 'deploy service to VM' Jenkins jobs failed (about 12 of them) due to an error related to the 'sensu-puppet' module we use.

The codebase that pulls the service WARs from Nexus and configures Tomcat and the rest of the stuff on the VMs had not changed.

The sensu-puppet module itself had also not changed.

Was stumped as to what other variables there were in the equation...

Finally found out that the 'sensu-puppet' module defaulted the Sensu package version to "latest".  And someone finally got around to updating the (ubuntu) package.  And the new package didn't have a particular script that the 'sensu-puppet' module was looking for... and thus was failing.

ugly.