Wednesday, 25 May 2016

Advanced Metrics Visualization Dashboarding With Apache Ambari






by Sarang Nagmote



Category - Data Analysis
More Information & Updates Available at: http://vibranttechnologies.co.in




At Hortonworks, we work with hundreds of enterprises to ensure they get the most out of Apache Hadoop and the Hortonworks Data Platform. A critical part of making that possible is ensuring operators can quickly identify the root cause if something goes wrong.
A few weeks ago, we presented our vision for Streamlining Apache Hadoop Operations, and today we are pleased to announce the availability of Apache Ambari 2.2.2, which delivers on the first phase of this journey. With this latest update to Ambari, we can now put the most critical operational metrics in the hands of operators, allowing them to:
  • Gain a better understanding of cluster health and performance metrics through advanced visualizations and pre-built dashboards.
  • Isolate critical metrics for core cluster services such as HDFS, YARN, and HBase.
  • Reduce time to troubleshoot problems, and improve the level of service for cluster tenants.

Advanced Metrics Visualization and Dashboarding

Apache Hadoop components produce a lot of metric data, and the Ambari Metrics System (introduced about a year ago as part of Ambari 2.0) provides a scalable low-latency storage system for capturing those metrics. Understanding which metrics to look at and why takes experience and knowledge. To help simplify this process, and be more prescriptive in choosing the right metrics to review when problems arise, Grafana (a leading graph and dashboard builder for visualizing time-series metrics) is now included with Ambari Metrics. The integration of Grafana with Ambari brings the most important metrics front-and-center. As we continue to streamline the operational experience of HDP, operational metrics play a key role, and Grafana provides unique value to our operators.

How It Works

Grafana is deployed, managed and pre-configured to work with the Ambari Metrics service. We are including a curated set of dashboards for core HDP components, giving operators at-a-glance views of the same metrics Hortonworks Support & Engineering review when helping customers troubleshoot complex issues.
Metrics displayed on each dashboard can be filtered by time, component, and contextual information (YARN queues for example) to provide greater flexibility, granularity and context.

Download Ambari and Get Started

We look forward to your feedback on this phase of our journey and encourage you to visit the Hortonworks Documentation site for information on how to download and get started with Ambari. Stay tuned for updates as we continue on the journey to streamline Apache Hadoop operations.

Fixing MySQL Scalability Problems With ProxySQL or Thread Pool






by Sarang Nagmote



Category - Databases




In this blog post, we’ll discuss fixing MySQL scalability problems using either ProxySQL or thread pool.
In the previous post, I showed that even MySQL 5.7 in read-write workloads is not able to maintain throughput. Oracle’s recommendation to play black magic with innodb_thread_concurrency and innodb_spin_wait_delay doesn’t always help. We need a different solution to deal with this scaling problem.
All the conditions are the same as in my previous run, but I will use:
  • ProxySQL limited to 200 connections to MySQL. ProxySQL has a capability to multiplex incoming connections; with this setting, even with 1000 connections to the proxy, it will maintain only 200 connections to MySQL.
  • Percona Server with the thread pool enabled and a thread pool size of 64.
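To make the multiplexing idea concrete, here is a minimal Python sketch. It is not ProxySQL's implementation or API: it is a toy proxy that accepts any number of client threads but never uses more than a fixed number of backend slots at once. The class name MultiplexingProxy, the run_clients helper, and the short sleep that stands in for query execution are all assumptions made for illustration.

```python
import threading
import time


class MultiplexingProxy:
    """Toy stand-in for a proxy that caps backend connections (hypothetical)."""

    def __init__(self, max_backend_connections):
        self._slots = threading.BoundedSemaphore(max_backend_connections)
        self._lock = threading.Lock()
        self._in_use = 0
        self.peak_in_use = 0

    def execute(self, query):
        # A client blocks here until one of the backend slots is free.
        with self._slots:
            with self._lock:
                self._in_use += 1
                self.peak_in_use = max(self.peak_in_use, self._in_use)
            try:
                time.sleep(0.001)  # pretend the backend is doing work
                return f"ok: {query}"
            finally:
                with self._lock:
                    self._in_use -= 1


def run_clients(proxy, client_count):
    """Start client_count threads that each send one query through the proxy."""
    results = []
    results_lock = threading.Lock()

    def client(i):
        reply = proxy.execute(f"SELECT {i}")
        with results_lock:
            results.append(reply)

    threads = [threading.Thread(target=client, args=(i,)) for i in range(client_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With max_backend_connections=200 and 1000 client threads, peak_in_use can never exceed 200, which mirrors the setup described in the first bullet.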
You can see final results here:
Fixing MySQL scalability problems
Both solutions have pros and cons. With ProxySQL, there is a visible overhead at lower thread counts, but it keeps very stable throughput after 200 threads.
With Percona Server thread pool, there is little-to-no overhead if the number of threads is less than thread pool size, but after 200 threads it falls behind ProxySQL.
Here is the chart with response times:
I would say the correct solution depends on your setup:
  • If you already use or plan to use ProxySQL, you may use it to prevent MySQL from saturation
  • If you use Percona Server, you might consider trying to adjust the thread pool

Summary

Coupling in Distributed Systems






by Sarang Nagmote



Category - Cloud Computing




Coupling and cohesion are key quality indicators. We strive for highly cohesive and loosely coupled systems, but high doesn't mean pure. The same goes for functional programming: we aim to isolate and reduce side effects, but we need them unless we want a useless system. It's good to modularise our systems, but whenever those modules need to talk to each other they'll effectively couple themselves. Our work is to create cohesive modules and minimize coupling as much as possible.
Let's provide an example. Our system has the following structure:
  1. Different deployables, aka, microservices architecture.
  2. Intracommunication through Kafka (pub-sub messaging). No HTTP involved.
  3. 1 Producer to N Consumers scenarios.
  4. JSON for data serialization.
The messages that are published and consumed in this system have a schema, and it's our choice whether to make it implicit or explicit and whether to validate that schema at compile time or at runtime. Before analysing the trade-offs of each approach, let's say a few words about compile-time versus runtime approaches.

Proving Software Correctness as Soon as Possible

I've been a user of statically typed languages for most of my career, so I'm really biased on this topic. I strongly believe in Lean concepts such as the importance of minimizing waste. At the same time, I love the therapeutic ideas behind Agile, TDD, or BDD about exposing the truth as soon as possible. Static types, and in the end the compiler, help me to achieve those goals.
I would prefer spending my time creating tests under the motivations of providing living documentation, easing future refactors, or helping me to drive the design, more than helping to catch bugs that the type system should take care of. Writing a test that checks the behaviour of a method when receiving null is a waste of time if we can make it impossible to write a line of code that passes a null.
The compile-time world is not perfect though, as it's definitely slower in development and constrains developers (someone could say that less freedom might be a nice-to-have in this context).

Runtime Approaches

Now that I've been honest with you about my compile-time bias, I can explain the different approaches and trade-offs for the schema validation problem.

Implicit Schemas

The first runtime approach is the loosest one: using implicit schemas and trusting in the good will of producers. As nobody checks the validity of messages before they are published to Kafka, consumers could blow up.
The first corrective measure is ensuring that only the processing of the poisoned message blows up, not the whole consumer. An example of that would be providing a resume supervision strategy on Akka Streams for when the message doesn't hold the expected implicit schema.
The second corrective measure is not simply swallowing those crashes but communicating them to the proper actors (be they humans or software). A good practice is to provide dead letter queues for poisoned messages, in case we want to manipulate and retry the processing of those messages at that level.
Before getting into explicit schemas, I would say that those measures are usually not enough but they are a good safety net, as shit happens, and we need to be prepared.
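The two corrective measures can be sketched in a few lines. This is a plain Python illustration rather than Akka Streams or Kafka code: the consumer loop resumes after a poisoned message and parks it in a dead letter list together with the error. The handler, the order_id field, and the in-memory lists standing in for topics are all hypothetical.

```python
import json


def handle(raw_message):
    """Hypothetical handler: expects a JSON object with a numeric 'order_id'."""
    payload = json.loads(raw_message)
    return payload["order_id"] + 0  # raises if the field is missing or not a number


def consume(messages, handler):
    """Process every message; failures go to a dead letter list instead of stopping the loop."""
    processed, dead_letters = [], []
    for raw in messages:
        try:
            processed.append(handler(raw))
        except (ValueError, KeyError, TypeError) as error:
            # Resume with the next message, but keep the poisoned one for inspection or retry.
            dead_letters.append({"message": raw, "error": repr(error)})
    return processed, dead_letters
```

Keeping the raw message next to the error is what makes a later manual fix and retry possible.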

Explicit Schemas

If we want to avoid poisoned messages getting into our topics, we could provide a middle-man service to intercept and validate explicit schemas. Schema Registry is an example of that for Kafka, and its documentation is full of insights about how to implement that in a distributed, highly available, and scalable way.
That's an integration service that could be a single point of failure but, at the same time, it could be valuable to have a centralized repo of schemas when we have a lot of consumers and the complexity of the system would be hard to grasp in a decentralized fashion. That service will be stateless, so to avoid a single point of failure we could make it redundant in a farm of services to allow high availability.
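As a rough illustration of validating against an explicit schema before anything reaches a topic, here is a small Python sketch. It does not use Schema Registry or any Kafka client; the ORDER_SCHEMA mapping, the validate and publish functions, and the list standing in for a topic are assumptions made for the example.

```python
import json

# Hypothetical explicit schema: field name -> required Python type.
ORDER_SCHEMA = {"order_id": int, "customer": str, "total": float}


def validate(payload, schema):
    """Return a list of problems; an empty list means the payload conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    return problems


def publish(topic, payload, schema):
    """Append the serialized payload to the topic only if it passes validation."""
    problems = validate(payload, schema)
    if problems:
        raise ValueError("; ".join(problems))
    topic.append(json.dumps(payload))
```

The point of the gate is that a rejected message fails at the producer side, so consumers never see it.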

Compile Approaches

The last approach would be creating software that makes it impossible to create messages that do not hold the expected schema. Assuming Scala, we could create a jar that contains case classes that are object materializations of a schema.
What are the benefits of this approach?
  1. Fail early. We don't have to wait until testing or production to verify that the messages published by some producer are correct.
  2. Centralized knowledge.
What is the solvable problem?
  • Cascade updates. If our microservices live in different repos, then we need to make sure that updates to that common binary are applied to the producer and the consumers. That's cumbersome, and if it's not done it could generate unexpected bugs, as we introduced a false sense of security with that library. That could be solved using a monorepo.
What is the biggest problem?
  • Breaking isolation of deployables. One of the points of microservices is being able to deploy services independently. If you're forced to redeploy N consumer services every time you upgrade the producer with a non-backward compatible change of the schema library, then you're losing that perk. Being able to do small releases is a big enabler of Continuous Delivery, so it's a remarkable loss.
You could argue that only non-backward compatible changes force a redeploy of consumers and that we should design our schemas in a way that predicts and minimizes those kinds of changes.

Generalizing the Coupling Problem

If we generalize the problem, we'll see that there are two kinds of coupling: avoidable and mandatory.
Avoidable coupling comes when we strive to reduce duplication in our codebase. Let's say that we want to extract some requestId from the header of an HTTP request and put it into some MDC in order to be able to trace logs across different threads or services. That code will hardly vary from service to service, so it's a good candidate for extraction into a shared library, which adds some coupling between services. Before doing that, it's good to think about the following:
  1. Coupling is the enemy of microservices and its effects in the future are not easily visible.
  2. Following Conway's law, breaking isolation of your services breaks the isolation of your teams, so be sure that your organization is able to cope with that level of communication and integration.
  3. The key measure is the rate of change. A library that is going to be constantly updated (as your schema library could be) will be more painful to manage as a common dependency than some fairly static library.
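For a sense of how small the request ID helper mentioned above really is, here is a hedged Python analogue of the MDC idea. It uses the standard contextvars and logging modules; the header name X-Request-Id, the logger name, and the function names are assumptions, and a JVM service would use its logging framework's MDC instead.

```python
import contextvars
import logging

# Context variable playing the role of the MDC entry (hypothetical name).
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def build_logger(stream):
    """Create a logger whose lines are prefixed with the current request ID."""
    logger = logging.getLogger("request-id-sketch")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(request_id)s] %(message)s"))
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(headers, logger):
    # "X-Request-Id" is an assumed header name, not one taken from the article.
    token = request_id_var.set(headers.get("X-Request-Id", "-"))
    try:
        logger.info("handling request")
    finally:
        request_id_var.reset(token)
```

Code this stable is cheap to share, which is exactly why the rate of change in point 3 matters more than the size of the shared code.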
Mandatory coupling comes when some info needs to reside in a third entity, as it doesn't make sense for it to be held by one of the integration entities or it's not worth sharing and duplicating that info in every single entity.

Conclusion

Even if I am a strong supporter of compiled languages, I think that sharing code through binaries in a distributed environment deserves a deep analysis of the structure and needs of your system. I hope that this post has provided some valuable insights into this topic.

Running a Multi-Container Application Using Docker Compose






by Sarang Nagmote



Category - Enterprise Integration




Running an application typically involves multiple processes whether that’s just an application server and database or multiple service implementations in a microservice architecture. Docker Compose provides a quick and easy mechanism to build, deploy, and run your application, whether locally on your system for development purposes or on a platform such as Docker Datacenter for production.
For the purposes of this example we have taken Docker’s example voting application and modified it so that the voting and results web applications are Java applications running on WebSphere Liberty. You can clone the updated application from GitHub and, assuming that you have Docker and Docker Compose installed, run it using the following simple commands:
git clone https://github.com/kavisuresh/example-voting-app.git
cd example-voting-app
docker-compose up
The following video gives an introduction to the example voting application and how it can be run using Docker Compose:

The following video talks about Docker Datacenter configuration and how the application can be run in Docker Datacenter using Docker Compose:

Unconventional Logging: Game Development and Uncovering User Behaviour






by Sarang Nagmote



Category - Website Development




Not long ago, I was given the chance to speak with Loggly’s Sven Dummer about the importance of logging for game development. However, I got a lot more than just that. Sven gave me a comprehensive tour of Loggly via screenshare, telling me a bit about the basics of logging (its purpose and how it’s done) and the tools Loggly offers to help those trying to sort through and make sense of the endless haystack of data that logs serve up. After my crash course in logging and Loggly, Sven did indeed deliver a use case for logging that was particular to game development, though I think it can be applied creatively elsewhere as well.
First off, let me recap a bit of information that Sven provided about logging and log management. If you want to skip ahead to the game dev use-case, feel free.

Crash Course: My Experience With Logging and Loggly

Upon sharing his screen with me, Sven first took me to the command line of his Mac to illustrate just how much information logging generates. He entered a command revealing a list of all the processes currently running on his laptop and, as you’ve probably already guessed, there was a lot of information to show. Data spat out onto the page in chunks, and I quickly became overwhelmed by the velocity and disorganization of words and numbers perpetually scrolling onto the screen. This information, some of it obviously very useful to those who know what they’re looking for, was delivered piece by piece very quickly. The format of the data was “pretty cryptic to people like you and me,” as Sven put it, but what we were looking at was easy compared to the data formats of some logs.
And that’s just it, there is no standard format for log data. It can come in a variety of file types and is displayed differently depending on the type. In Sven’s words:
“Every application or component typically writes its own log data in its own log file, and there is no one standardized format, so these log files can look very different. So, if you want to make sense of the data, you have to be somewhat familiar with the formats.”
And continuing on to explain how this can become even more difficult to manage when pulling data from a multitude of sources, Sven gave this example:
“Let’s imagine you’re running a large complex web application… you’re in a business that’s running a webstore. In that case, you might have a very complicated setup with a couple of databases, a web server, a Java application doing some of your business logic—so you have multiple servers with multiple components, which all do something that basically makes up your application. And so, if something goes wrong, then the best way to trace things down is in the log data. But you have all these different components generating different log files in different formats. And, if your application is somewhat complex and you have 25 different servers, they all write the log data locally to the hard drive so you can imagine that troubleshooting that can become quite difficult.”
He continued on to explain how a log management tool like Loggly can gather together these many different logs (it supports parsing of many different formats out of the box) and display them in a unified format—not only to make the information more presentable, but also to securely provide access to an entire team:
“Let’s say you have an operations team and these folks are tasked with making sure that your system is up and running. If they were supposed to look at the log data on all these individual servers and on all these components, they would have to know how to: 1. log into those servers, 2. be able to reach the servers inside your network, 3. have credentials there, 4. know where the log files reside; and then, they would still be looking at all of these individual log files without getting the big picture.
However, instead, they could send all of their logs to Loggly, locating them in one place. This not only allows for a cohesive picture of all the logs that make up a web application [for example], but it also removes the need for everyone on your operations team to log into every single server in order to get access to the logs, which is important from a security perspective. Because, you don’t necessarily want everybody to be able to have administrative privileges to all these different servers.”
At this point, Sven launched Loggly and it was a complete sea-change. Rather than a black terminal window overflowing with indistinguishable blobs of text, the interface proved to be much more organized and user-friendly. With Loggly, Sven showed me how to search for particular logs, filter out unwanted messages, drill down to a specific event and grab surrounding logs, and display information in one standardized flow, so that it is much easier for viewers to sort, scan, and find what they are after. He also pointed out how one might automate the system to track and deliver specific error messages (or other information) to the team members best placed to act on them. Through Loggly’s available integrations, one might have this information delivered via a specific medium, like Slack or Hipchat, so that team members receive these notifications in real time and can act in the moment. Honestly, Sven showed me so many features that I don’t think I can cover them all in this post. If you want to see more about the features, take a look around this page for a while and explore the tutorial section.
Loggly integrated with Hipchat.

One thing I remember saying to Sven is that the command line view of logs looks a bit like the familiar green-tinged code lines that frantically scatter across the monitors in The Matrix, endlessly ticking out information on and on and on… and on. He was quick to point out that Loggly still provides a command line-esque view of logs via its Live Tail feature, but with more control. I highly recommend checking it out.
What logs look like to the untrained eye.
Loggly Live Tail in action... running in an OS X Terminal and on Windows PowerShell.

The Importance of Logging for Game Development

So, typically one might use logging to discover performance bottlenecks, disruptions in a system, or various other disturbances which can be improved after some careful detective work. However, when looked at through another lens, logs can offer up much more interesting information, revealing user behavior patterns that can be analyzed to improve a game’s design and creative direction. Let me take you through a couple of examples that Sven offered up which shed light on how one might use logs to uncover interesting conclusions.
Sven launched a Star Fox-esque flying game in his browser (a game written in Unity, a format which Loggly supports out of the box) and began guiding his spaceship through rings that were floating in the air. It was pretty basic, the point being to make it through each ring without crashing into the edges.
This is an image of Loggly's demo game... not Star Fox!

While flying, he opened up Live Tail and showed me the logs coming in at near real-time (he explained there was a very small network delay). Back in the game, he began switching camera angles, and I could see a corresponding log event every time he triggered the command. This is where it gets interesting…
“The camera changes are being recorded in the log and I can see them here. Now, this is interesting because it will also tell me from which IP address they’re coming. And this is just a very simple demo game, but I could also log the ID of the user for example and many more things to create a user behaviour profile. And, that is very interesting because it helps me to improve my game based on what users do.
For example, let’s say I find out that users always change the camera angle right before they fly through ring five, and perhaps a lot of users fail when they fly through this ring. And, maybe that’s a source of customer frustration… maybe they bounce the site when they reach that point. And maybe then, I realize that people are changing the camera because there’s a cloud in the way that blocks the view of ring five. Then I can tell my design team you know, maybe we should take the cloud out at this point. Or, we can tell the creative team to redesign the cloud and make it transparent. So, now we’re getting into a completely different area other than just IT operations here. When you can track user behavior you can use it to improve things like visuals and design in the game.”
Logs gathered from the change in camera angles within the demo game.
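The kind of analysis Sven describes can be approximated with a few lines of Python once the events are available as structured records. This is only a sketch: the event shape (user, type, ring) and the event type names are hypothetical and are not Loggly's log format or API.

```python
from collections import Counter


def summarize_by_ring(events):
    """Count camera changes and crashes per ring from a list of log events."""
    camera_changes = Counter()
    crashes = Counter()
    for event in events:
        if event["type"] == "camera_change":
            camera_changes[event["ring"]] += 1
        elif event["type"] == "crash":
            crashes[event["ring"]] += 1
    return camera_changes, crashes


# Hypothetical events; "ring" is the ring the player is approaching.
sample_events = [
    {"user": "u1", "type": "camera_change", "ring": 5},
    {"user": "u1", "type": "crash", "ring": 5},
    {"user": "u2", "type": "camera_change", "ring": 5},
    {"user": "u2", "type": "crash", "ring": 5},
    {"user": "u3", "type": "camera_change", "ring": 2},
    {"user": "u3", "type": "ring_passed", "ring": 2},
]

camera_changes, crashes = summarize_by_ring(sample_events)
```

If one ring dominates both counters, as ring five does in this made-up sample, that is the cue to go and look at what is blocking the view there.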

I found this idea fascinating, and Sven went on to describe a conversation he had with an unnamed mobile game publisher who recently used Loggly in a similar way…
“We showed this demo at GDC and there was actually somebody who visited our booth who I had a very interesting conversation with on this topic. Basically, they told me that they develop mobile games for smart phones and had plans to introduce a new character in one of their games for gamers to interact with. Their creative team had come up with 6 or 7 different visuals for characters, so their idea was to do simple A/B testing and find out which of these characters resonated best with their users through gathering and studying their logs in Loggly. Then they planned on scrapping the character models that didn’t do well.
However, when they did their A/B testing they got a result that they were not at all prepared for. There was no distinctive winner, but the regional differences were huge. So, people in Europe vs. Asia vs. America—there was no one winner, but rather there were clear winners by region. So, that was something that they were not expecting nor prepared for. And they told me that they actually reprioritized their road map and the work that they had planned for their development team and decided to redesign the architecture of their game so that it would support serving different players different characters based on the region that they were in. They realized, they could be significantly more successful and have a bigger gaming audience if they designed it this way.”
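The "no overall winner, clear regional winners" result is easy to miss if the logs are only aggregated globally. The following Python sketch shows the difference between the two views; the event fields (region, character) and the sample data are invented for illustration and do not come from the publisher's logs.

```python
from collections import Counter, defaultdict


def tally(events):
    """Return overall counts per character and the top character per region."""
    overall = Counter()
    by_region = defaultdict(Counter)
    for event in events:
        overall[event["character"]] += 1
        by_region[event["region"]][event["character"]] += 1
    regional_winner = {
        region: counts.most_common(1)[0][0] for region, counts in by_region.items()
    }
    return overall, regional_winner


def make_events(region, character, count):
    """Build `count` identical hypothetical interaction events."""
    return [{"region": region, "character": character} for _ in range(count)]


# Hypothetical interactions: each character wins exactly one region.
sample_events = (
    make_events("Europe", "A", 3) + make_events("Europe", "B", 1)
    + make_events("Asia", "B", 3) + make_events("Asia", "C", 1)
    + make_events("America", "C", 3) + make_events("America", "A", 1)
)

overall, regional_winner = tally(sample_events)
```

In this sample the overall counts are tied, while each region has an unambiguous favourite, which is the shape of result described in the quote.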
Once again, I found this extremely interesting. The idea of using logs creatively to uncover user behavior was something completely novel to me. But apparently, user behavior is not the only type of behavior that can be uncovered.
“Just recently there was an article on Search Engine Land where an SEO expert explained how to answer some major questions about search engine optimization by using Loggly and log data. Again, a completely different area where someone is analyzing not user behavior but, in this case, search engine robots—when do they come to the website, are they blocked by something, do I see activity on webpages where I don’t want these search robots to be active? So, I think you get the idea, really these logs can be gathered and used for any sort of analysis.”
And that’s basically the idea. By pulling in such vast amounts of data, each piece offering its own clues as to what is going on in an application, log management tools like Loggly act as a kind of magnifying glass for uncovering meaningful conclusions. And what does meaningful mean? Well, it’s not limited to performance bottlenecks and operational business concerns; it can also provide genuine insights into user behavior or inform creative decision-making based on analytics.

Effective and Faster Debugging With Conditional Breakpoints






by Sarang Nagmote



Category - Developer




To find and resolve defects that prevent our code from operating correctly, we mostly rely on debugging. Through this process, in a sense, we "tease out" the code and observe the values of variables at runtime. Sometimes, this life-saving process can be time-consuming.
Today, most IDEs and even browsers make debugging possible. With effective use of these tools, we can make the debugging process faster and easier.
Below, I want to share some methods that help make the debugging process fast and effective. You will see Eclipse IDE and Chrome browser samples, but you can apply these methods in other IDEs and browsers, too.
To debug our Java code in Eclipse, we put a breakpoint on the line we want to observe.
When we run our code in debug mode, execution suspends every time it reaches the line with the breakpoint. While execution is suspended, we can also observe the current values of variables.
When we know the cause of a defect, instead of observing the values of variables in every iteration, we can specify a condition in the properties of the breakpoint. Execution then suspends only when the condition is met, so we can quickly reach the state we expect.
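Conceptually, a conditional breakpoint is just a boolean expression that the debugger evaluates each time the line is reached. The Python sketch below models that behaviour without a debugger, so it is an illustration of the idea rather than Eclipse's or Chrome's mechanism; the function name and the sample data are hypothetical.

```python
def find_suspensions(values, condition):
    """Return the iterations at which a conditional breakpoint would suspend."""
    suspended_at = []
    for index, value in enumerate(values):
        # An unconditional breakpoint would stop here on every iteration.
        # A conditional one evaluates an expression first and stops only if it is true.
        if condition(index, value):
            suspended_at.append((index, value))  # a debugger would pause here
    return suspended_at


# Only the negative readings are interesting, so only they trigger a "suspend".
hits = find_suspensions([3, -1, 7, -5], lambda index, value: value < 0)
```

Python's own debugger offers the same feature: in pdb, a condition can be given after the location, for example `break script.py:12, value < 0` (the file name and line number here are placeholders).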
With the help of this property, we can even run arbitrary code when execution passes the breakpoint, without suspending execution.
We can also change the current value of variables when execution passes the breakpoint, which lets us prevent the case that makes the code throw an exception.
With the help of this property, we can also throw an exception from the breakpoint. This way, it is possible to observe the handling of a rare exception.
Conditional breakpoints are available in Chrome, too. This time, we will debug our JavaScript code. To do this, we can press F12 or open the "Sources" panel under Tools > Developer Tools, select the code we want to debug, and add a breakpoint. After that, we can also specify a condition so that execution suspends only when the condition is met.

Swiftenv: Swift Version Manager






by Sarang Nagmote



Category - Mobile Apps Development




Swift 3 development is so fast at the moment that a new development snapshot is coming out every couple of weeks. To manage this, Kyle Fuller has rather helpfully written swiftenv, which works on both OS X and Linux.
Once installed, usage is really simple. To install a new snapshot:

swiftenv install {version}

Where {version} is something like DEVELOPMENT-SNAPSHOT-2016-05-09-a, though you can also use the full URL from the swift.org download page.
The really useful feature of swiftenv is that you can set the Swift version on a per-project basis. As change is so fast, projects are usually a version or so behind; e.g. at the time of writing, Kitura's current release (0.12.0) works with DEVELOPMENT-SNAPSHOT-2016-04-25-a.
We register a project specific Swift version using:

swiftenv local {version}
For example, for Kitura 0.12: swiftenv local DEVELOPMENT-SNAPSHOT-2016-04-25-a
Nice and easy!

The Top 100 Java Libraries in 2016 After Analyzing 47,251 Dependencies






by Sarang Nagmote



Category - Programming




Top 100 Java Libraries
Who’s on top and who’s left behind? We analyzed 47,251 dependencies on GitHub and pulled out the top 100 Java libraries.
Our favorite pastime for long weekends is to go through GitHub and search for popular Java libraries. We decided to share the fun and the information with you.
We analyzed 47,251 import statements of 12,059 unique Java libraries that are used by the top 3,862 Java projects on GitHub. From that list we extracted the top 100, and now we’re sharing the results. Cue the drum roll.

The Top 20 Most Popular Java Libraries

Top 20 Libraries
Holding the crown from last year, junit is the most popular Java library on GitHub. The Java logging API slf4j reached second place, while log4j came in fourth.
A rising trend in the list is Google’s open-source Guava, which reached third place. It contains a range of core Java libraries that were born internally at Google. If you’re not familiar with Guava or if you’re not sure how to use it, you can read our post about some of the lesser known features of Google Guava that every developer should know.

The Rise of Spring Libraries

The Spring framework became popular in the Java community as a main competitor to Java EE, and this popularity is also reflected in GitHub; out of the 100 most popular libraries, 44 are Spring related. The most interesting part here is the meteoric rise of Spring Boot, which allows developers to create Spring-powered applications and services with minimum boilerplate. Do you want to get a production-ready Java application off the ground in the shortest time possible? Check out our post about Java Bootstrap: Dropwizard vs. Spring Boot.
Top Spring Libraries
#13 – springframework.spring-context
#17 – springframework.spring-test
#22 – springframework.spring-webmvc
#24 – springframework.spring-core
#27 – springframework.spring-web
#36 – springframework.spring-jdbc
#37 – springframework.spring-orm
#38 – springframework.spring-tx
#40 – springframework.spring-aop
#47 – springframework.spring-context-support
#72 – springframework.boot.spring-boot-starter-web
#81 – springframework.security.spring-security-web
#82 – springframework.security.spring-security-config
#88 – springframework.boot.spring-boot-starter-test
#99 – springframework.security.spring-security-core

The Most Popular JSON Libraries

Since Java doesn’t have native support for JSON (although it almost made it into Java 9!), we wanted to see how popular these libraries are among GitHub projects.
You shouldn’t judge a library by its cover. Not all JSON libraries perform the same, and picking the right one for your environment can be critical. If you want to know which one you should use, check out our latest JSON benchmark.
The Top JSON Libraries Are…
#14 – fasterxml.jackson.core.jackson-databind
#19 – google.code.gson.gson
#43 – json.json
#80 – googlecode.json-simple.json-simple
#89 – thoughtworks.xstream.xstream

The Fantastic 4 (That’s Worth Mentioning)

There are plenty of interesting and even new libraries that caught our attention, but we decided to focus on these 4:
#68 – projectlombok.lombok – This project aims to reduce boilerplate in Java, replacing some of the worst offenders with a simple set of annotations.
#90 – jsoup.jsoup – A Java library for working with real-world HTML. It provides an API for extracting and manipulating data using DOM manipulation, CSS and jquery-like methods.
#92 – io.netty.netty-all – A network application framework for quick and easy development of maintainable high-performance protocol servers & clients.
#98 – dom4j.dom4j – Open source framework for processing XML. It’s integrated with XPath and offers full support for DOM, JAXP and Java platform.

Top 100 Libraries by Type

Top Types

The Math Behind the Magic (or: How Did We Come Up With Our List)

You’re probably asking yourself how we got this information. We first pulled the top Java projects from GitHub by their ratings. From that data we extracted the projects that use Maven or Ivy for dependency management, to gain quick access to their pom.xml / ivy.xml dependencies. This left us with 47,251 data points.
We did some mad crunching and analyzing, which left us with 12,059 unique Java libraries that are used by the top 3,862 Java projects on GitHub. From there it was easy to get the top 100 libraries, based on the number of GitHub projects they appear in.
If you want to look into our raw data, the file is available here. Although we were sober this time around, you’re still welcome to take a look and make sure we didn’t miss any interesting insight.
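The counting step described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the script used for the study: the function names are made up, only Maven pom.xml files are handled (no Ivy), and a library is counted once per project no matter how many times the project declares it.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Maven POM files declare this XML namespace on their elements.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def dependencies_in_pom(pom_xml):
    """Return the set of 'groupId.artifactId' names declared in one pom.xml."""
    root = ET.fromstring(pom_xml)
    found = set()
    for dep in root.findall(".//m:dependencies/m:dependency", POM_NS):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS).strip()
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS).strip()
        if group and artifact:
            found.add(f"{group}.{artifact}")
    return found


def rank_libraries(poms):
    """Count in how many projects each library appears, most used first."""
    counts = Counter()
    for pom_xml in poms:
        counts.update(dependencies_in_pom(pom_xml))
    return counts.most_common()
```

Calling rank_libraries with one pom.xml string per project returns (library, project count) pairs, so the first 100 entries would be a top 100 in the sense used here.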

Final Thoughts

When we compare our current Top 100 list with last year’s results, we can detect some movement among smaller libraries, with a rising interest in Spring and the departure of MongoDB.
However, the majority of Java developers are pretty consistent when it comes to their choice of GitHub libraries. It’s not a big surprise, considering the number of existing projects using these libraries that will keep on using them through 2017 and beyond.
If you already have your choice of libraries but you’re still looking for the ultimate tools, we have the perfect advice for you. Check out The Top 15 Tools Java Developers Use After Major Releases.

Tuesday, 24 May 2016

State of Global Wealth Management—Ripe for Technology Disruption






by Sarang Nagmote



Category - Data Analysis




“If (wealth management advisors) continue to work the way you have been, you may not be in business in five years” – Industry leader Joe Duran, 2015 TD Ameritrade Wealth Advisor Conference.
The wealth management segment is a potential high growth business for any financial institution. It is the highest customer touch segment of banking and is fostered on long term and extremely lucrative advisory relationships. It is also the ripest segment for disruption due to a clear shift in client preferences and expectations for their financial future. This three-part series explores the industry trends, business use cases mapped to technology and architecture and disruptive themes and strategies.

Introduction to Wealth Management 

There is no one universally accepted definition of wealth management as it broadly refers to an aggregation of financial services. These include financial advisory, personal investment management and planning disciplines directly for the benefit of high-net-worth (HNW) clients. But wealth management has also become a highly popular branding term that advisors of many different kinds increasingly adopt. Thus this term now refers to a wide range of possible functions and business models.
Trends related to shifting client demographics, evolving expectations from HNW clients regarding their needs (including driving social impact), technology and disruptive competition are converging. New challenges and paradigms are afoot in the wealth management space, but on the other side of the coin, so is a lot of opportunity.
A wealth manager is a specialized financial advisor who helps a client construct an entire investment portfolio and advises on how to prepare for present and future financial needs. The investment portion of wealth management normally entails both asset allocation of a whole portfolio and the selection of individual investments. The planning function of wealth management often incorporates tax planning around the investment portfolio as well as estate planning for both individuals and family estates.
There is no trade certification for a wealth manager. Several titles are commonly used such as advisors, family office representatives, private bankers, etc. Most of these professionals are certified CFPs, CPAs and MBAs as well. Legal professionals are also sometimes seen augmenting their legal expertise with these certifications.

State of Global Wealth Management 

Private banking services are delivered to high net worth individuals (HNWI). These are the wealthiest clients that demand the highest levels of service and more customized product offerings than are provided to regular clients. Typically, wealth management is a subsidiary of a larger investment or retail banking conglomerate. Private banking also includes other services like estate planning and tax planning as we will see in a few paragraphs.
The World Wealth Report for 2015 was published jointly by RBC Wealth Management (Royal Bank of Canada) and Capgemini. Notable highlights from the report include:
  1. Nearly 1 million people in the world attained millionaire status in 2014
  2. The collective investible assets of the world’s HNWI totaled $56 trillion
  3. By 2017, the total assets under management for global HNWIs will rise beyond $70 trillion
  4. Asia Pacific has the world’s highest number of millionaires with both India and China posting the highest rates of growth respectively
  5. Asia Pacific also led the world in the increase in HNWI assets at 8.5%. North America was a close second at 8.3%. Both regions surpassed their five year growth rates for high net worth wealth
  6. Equities were the most preferred investment vehicle for global HNWI with cash deposits, real estate and other alternative investments forming the rest
  7. The HNWI population is also highly credit friendly
Asia Pacific is gradually becoming the financial center of the world, a fact that has not gone unnoticed among the banking community. Thus banks need, as a general rule, to become more global and to focus on markets beyond the traditional ones (North America and Western Europe).
The report also makes the point that despite the rich getting richer, global growth this year was more modest compared to previous years, with a slowdown of 50% in the production of new HNWIs. This slower pace of growth means that firms need to move to a more relationship centric model, specifically among a highly coveted segment: younger investors. The report stresses that wealth managers are currently unable to serve the different needs of HNW clients under the age of 45 from a mindset, business offering and technology capability standpoint.
The Broad Areas of Wealth Management

[Figure: wealth management areas schematic] The Components of Wealth Management Businesses

As depicted above, full service wealth management firms broadly provide services in the following areas:
  • Investment Advisory: A wealth manager is a personal financial advisor who helps a client construct an investment portfolio that prepares them for life changes, based on their risk appetite and time horizon. The financial instruments invested in range from the mundane (equities, bonds, etc.) to the arcane (hedging derivatives, etc.).
  • Retirement Planning: Retirement planning is an obvious part of a client’s personal financial journey. For HNWIs, there is a need to provide complex retirement services while balancing taxes, income needs, estate preservation and the like.
  • Estate Planning Services: A key function of wealth management is to help clients pass on their assets via inheritance. Wealth managers help construct wills that leverage trusts and forms of insurance to help facilitate smooth inheritance.
  • Tax Planning: Wealth managers help clients manage their wealth so that the impact of taxation (e.g., by the IRS in the US) is reduced. As the pools of wealth increase, even small rates of taxation can have a magnified impact either way. The ability to achieve the right mix of investments from a tax perspective is a key capability.
  • Full Service Investment Banking: For sophisticated institutional clients, the ability to offer a raft of investment banking services is an extremely attractive capability.
  • Insurance Management: A wealth manager needs to be well versed in the kinds of insurance purchased by their HNWI clients so that the appropriate hedging services can be put in place.
  • Institutional Investments: Some wealth managers cater to institutional investors like pension funds and hedge funds and offer a variety of back office functions.
    It is to be noted that the wealth manager is not necessarily an expert in all of these areas but rather works well with the different areas of an investment firm from a planning, tax and legal perspective to ensure that their clients are able to accomplish the best outcomes.
    Client Preferences and Trends
    HNWI preferences are clearly changing, including:
    1. While older clients gave strong satisfaction scores to their existing wealth managers, younger clients’ needs are largely being missed by the wealth management community.
    2. Regulatory and cost pressures are rising, leading to the commoditization of services.
    3. Innovative automation and usage techniques of data assets among new entrants (aka the FinTechs) are leading to the rise of “roboadvisor” services which have already begun disrupting existing players in a massive manner in certain HNWI segments.
    4. A need to offer holistic financial services tailored to the behavioral needs of the HNWI investors.

    Technology Trends

    The ability to sign up new accounts and offer them services spanning the above areas provides growth in the wealth management business. There has been a perception that wealth management as a sub sector has trailed other areas within banking from a technology and digitization standpoint. As with wider banking organizations, the wealth management business has been under significant pressure from the perspective of technology and the astounding pace of innovation seen over the last few years from a cloud, Big Data and open source standpoint. Here are a few trends to keep an eye on:
  • The Need for the Digitized Wealth Office: The younger HNWI clients (defined as under 45) use mobile technology as a way of interacting with their advisors. They demand a seamless experience across all of the above services using digital channels. This is a huge issue for established players, as their core technology is still years behind providing a Web 2.0 like experience. The vast majority of applications are still separately managed, with distinct user experiences ranging from client onboarding to servicing to transaction management. There is a crying need for IT infrastructure modernization across the industry, from cloud computing to Big Data to micro-services to agile cultures promoting techniques such as a DevOps approach.
  • The Need for Open and Smart Data Architecture: Siloed functions have led to siloed data architectures operating on custom built legacy applications, all of which inhibit the applications from using data in a manner that constantly and positively impacts the client experience. There is clearly a need to have an integrated digital experience both regionally and globally and to do more with existing data assets. Current players possess a huge first mover advantage as they offer highly established financial products across their large (and largely loyal and sticky) customer bases, wide networks of physical locations, and rich troves of data that pertain to customer accounts and demographic information. However, it is not enough to just possess the data. They must be able to drive change through legacy thinking and infrastructures as the entire industry struggles to adapt to a major new segment (millennial customers) who increasingly use mobile devices and demand more contextual services and a seamless, highly analytic-driven, unified banking experience, akin to what consumers commonly experience on web properties like Facebook, Amazon, Google, Yahoo and the like.
  • Demands for Increased Automation: The need to forge a closer banker/client experience is not just driving demand around data silos and streams themselves. It’s forcing players to move away from paper based models to a more seamless, digital and highly automated model, and to rework countless existing back and front office processes, which are the weakest link in the chain. While “Automation 1.0” focuses on digitizing processes, rules and workflow, “Automation 2.0” implies strong predictive modeling capabilities working at large scale: systems that constantly learn and optimize products & services based on client needs and preferences.
  • The Need to “Right-size” or Change Existing Business Models Based on Client Tastes and Feedback: The clear ongoing theme in the wealth management space is constant innovation. Firms need to ask themselves if they are offering the right products that cater to an increasingly affluent yet dynamic clientele.

    Conclusion

    The next post in this series will focus on the business lifecycle of wealth management. We will begin by describing granular use cases across the entire lifecycle from a business standpoint, and we’ll then examine the pivotal role of Big Data enabled architectures along with a new age reference architecture.
    In the third and final post in this series, we round off the discussion with an examination of strategic business recommendations for wealth management firms: recommendations which I believe will drive immense business benefits by delivering innovative offerings and ultimately a superior customer experience.

    A Procedure for the SLM Clustering Algorithm






    by Sarang Nagmote



    Category - Databases




    In the middle of last year, I blogged about the Smart Local Moving algorithm, which is used for community detection in networks. With the upcoming introduction of procedures in Neo4j, I thought it’d be fun to make that code accessible as one.
    If you want to grab the code and follow along it’s sitting on the SLM repository on my GitHub.
    At the moment, the procedure is hardcoded to work with a KNOWS relationship between two nodes but that could easily be changed.
    To check that it’s working correctly, I thought it’d make most sense to use the Karate Club data set described on the SLM home page. I think this data set is originally from Networks, Crowds, and Markets.
    I wrote the following LOAD CSV script to create the graph in Neo4j:
    LOAD CSV FROM "file:///Users/markneedham/projects/slm/karate_club_network.txt" AS row
    FIELDTERMINATOR " "
    MERGE (person1:Person {id: row[0]})
    MERGE (person2:Person {id: row[1]})
    MERGE (person1)-[:KNOWS]->(person2)

    [Figure: the Karate Club graph loaded into Neo4j]

    Next, we need to call the procedure, which will add an appropriate label to each node depending on which community it belongs to. This is what the procedure code looks like:
    public class ClusterAllTheThings
    {
        @Context
        public org.neo4j.graphdb.GraphDatabaseService db;

        @Procedure
        @PerformsWrites
        public Stream<Cluster> knows() throws IOException
        {
            String query = "MATCH (person1:Person)-[r:KNOWS]->(person2:Person) " +
                           "RETURN person1.id AS p1, person2.id AS p2, toFloat(1) AS weight";

            Result rows = db.execute( query );

            ModularityOptimizer.ModularityFunction modularityFunction = ModularityOptimizer.ModularityFunction.Standard;
            Network network = Network.create( modularityFunction, rows );

            double resolution = 1.0;
            int nRandomStarts = 1;
            int nIterations = 10;
            long randomSeed = 0;

            double modularity;

            Random random = new Random( randomSeed );

            double resolution2 = modularityFunction.resolution( resolution, network );

            Map<Integer, Node> cluster = new HashMap<>();
            double maxModularity = Double.NEGATIVE_INFINITY;

            for ( int randomStart = 0; randomStart < nRandomStarts; randomStart++ )
            {
                network.initSingletonClusters();

                int iteration = 0;
                do
                {
                    network.runSmartLocalMovingAlgorithm( resolution2, random );
                    iteration++;

                    modularity = network.calcQualityFunction( resolution2 );
                } while ( (iteration < nIterations) );

                if ( modularity > maxModularity )
                {
                    network.orderClustersByNNodes();
                    cluster = network.getNodes();
                    maxModularity = modularity;
                }
            }

            for ( Map.Entry<Integer, Node> entry : cluster.entrySet() )
            {
                Map<String, Object> params = new HashMap<>();
                params.put("userId", String.valueOf(entry.getKey()));

                db.execute("MATCH (person:Person {id: {userId}}) " +
                           "SET person:`" + (format( "Community-%d`", entry.getValue().getCluster() )),
                        params);
            }

            return cluster
                    .entrySet()
                    .stream()
                    .map( ( entry ) -> new Cluster( entry.getKey(), entry.getValue().getCluster() ) );
        }

        public static class Cluster
        {
            public long id;
            public long clusterId;

            public Cluster( int id, int clusterId )
            {
                this.id = id;
                this.clusterId = clusterId;
            }
        }
    }
    I’ve hardcoded some parameters to use defaults which could be exposed through the procedure to allow more control if necessary. The Network#create function assumes it is going to receive a stream of rows containing columns ‘p1’, ‘p2’, and ‘weight’ to represent the ‘source’, ‘destination’, and ‘weight’ of the relationship between them.
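To make the quality function a bit more concrete, here is a rough Python sketch of standard modularity computed from rows shaped like the ones above ('p1', 'p2', 'weight'). It is not the code from the SLM repository: the function name and the community_of mapping are my own, the graph is treated as undirected, and the resolution is applied directly to the expected-weight term.

```python
from collections import defaultdict


def modularity(rows, community_of, resolution=1.0):
    """Standard (Newman-Girvan) modularity of a partition of an undirected graph.

    rows: iterable of dicts with keys 'p1', 'p2', 'weight' (one per relationship).
    community_of: dict mapping node id -> community id.
    """
    total_weight = 0.0
    internal = defaultdict(float)  # edge weight that stays inside each community
    degree = defaultdict(float)    # summed weighted degree of each community
    for row in rows:
        a, b, w = row["p1"], row["p2"], float(row["weight"])
        total_weight += w
        degree[community_of[a]] += w
        degree[community_of[b]] += w
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += w
    if total_weight == 0:
        return 0.0
    # For each community: share of edge weight inside it, minus the share
    # expected if edges were placed at random with the same node degrees.
    return sum(
        internal[c] / total_weight
        - resolution * (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )
```

A partition that keeps densely connected nodes together scores higher than one that puts every node in a single community, which always scores 0 at resolution 1.0. That difference is what the algorithm is trying to maximise.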
    We call the procedure like this:
    CALL org.neo4j.slm.knows()
    It will return each of the nodes and the cluster it’s been assigned to and if we then visualize the network in the Neo4j browser we’ll see this:

    [Figure: the clustered network visualised in the Neo4j browser]

    Which is similar to the visualisation from the SLM home page:
    [Figure: the visualisation from the SLM home page]
    If you want to play around with the code feel free. You’ll need to run the following commands to create the JAR for the plugin and deploy it.
    $ mvn clean package
    $ cp target/slm-1.0.jar /path/to/neo4j/plugins/
    $ ./path/to/neo4j/bin/neo4j restart
    And, you’ll need the latest milestone of Neo4j which has procedures enabled.

    Using Hybrid Clouds for App Migration






    by Sarang Nagmote



    Category - Cloud Computing




    This post is part 3 in a series on how companies use the hybrid cloud to solve real-world problems. Part 1 covers using hybrid clouds to add data center capacity, while part 2 addresses leveraging hybrid clouds to incorporate cloud-based functionality. 
    For many companies, the goal isn’t to share their applications across both their own data center and the public cloud. Rather, they want to move some of their applications lock, stock, and barrel to the cloud. If some of the company’s apps live in the cloud while others remain in the on-premise data center, then intentionally or not, these companies also have hybrid clouds.
    The process of migrating entire apps to the cloud, virtually unchanged, is sometimes called “Lift and Shift.” In some cases, companies lift and shift entire applications to the cloud pretty much as they are, while others may re-architect those applications to make them better cloud citizens, or to make greater use of cloud features.
    App migration is often part of the process of outsourcing as much of a company’s data center infrastructure as possible. Many of these migrations are in process at companies of all sizes, and most companies choose to migrate some applications but not others. Typically, “internal only” apps are migrated first, while the big, clunky mainframe apps are the last to be moved, and often they never make the transition.
    In fact, many companies stop their app-migration process after moving only some of their applications to the cloud, usually for some business or technical reason. They may find that the cost/benefit ratio for moving some applications, such as older or “problematic” applications, is not worth the effort. This creates an ongoing hybrid cloud architecture.

    Monitoring Challenges for App Migration

    You need a solid monitoring story to understand how your application works both before and after an app migration. That’s because you need to compare your application’s performance before the migration and after the migration. Variations in performance between the two could indicate a problem, or a need for further tuning and refinement in order for the application to function successfully in the cloud.
    In order to monitor the results before and after migration, you need to use the same monitoring tools in both environments or the comparison may not be meaningful. This implies using a monitoring tool that works in the cloud and on-premise.
    Even if you plan on completing the “lift and shift” maneuver and move 100% to the cloud, it is important that your monitoring solution work with your entire infrastructure, including both on-premise and cloud infrastructure components, during the migration itself. Depending on the size, complexity, and number of applications in question, that process could take months or even years.
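As a rough illustration of the before/after comparison, the Python sketch below summarises two sets of response-time samples and flags any metric that got noticeably worse. Everything in it is an assumption made for the example: the function names, the choice of median and 95th percentile, and the 10% tolerance. A real monitoring tool would collect and compare far more than this.

```python
import statistics


def summarize(samples_ms):
    """Median and 95th-percentile latency for a list of response times in ms."""
    cuts = statistics.quantiles(samples_ms, n=20)  # 19 cut points; the last is p95
    return {"median": statistics.median(samples_ms), "p95": cuts[-1]}


def compare(before_ms, after_ms, tolerance=0.10):
    """Flag metrics that got worse by more than `tolerance` (10% by default)."""
    before, after = summarize(before_ms), summarize(after_ms)
    report = {}
    for name in before:
        change = (after[name] - before[name]) / before[name]
        report[name] = {
            "before": before[name],
            "after": after[name],
            "change": change,
            "regressed": change > tolerance,
        }
    return report
```

The comparison only means something if both sample sets were gathered the same way, which is the reason for using one monitoring tool on both sides of the migration.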
    Want the opportunity to learn more about the hybrid cloud? Be sure to check out this recording of Lee’s super-informative webinar on Monitoring the Hybrid Cloud: How do you measure and make decisions across on-premises data centers, dynamic clouds, and hybrid clouds?

    Kotlin Meets Gradle






    by Sarang Nagmote



    Category - Enterprise Integration




    Many readers will be familiar with JetBrains’ excellent Kotlin programming language. It’s been under development since 2010, had its first public release in 2012, and went 1.0 GA earlier this year.
    We’ve been watching Kotlin over the years, and have been increasingly impressed with what the language has to offer, as well as with its considerable uptake—particularly in the Android community.
    Late last year, Hans sat down with a few folks from the JetBrains team, and they wondered together: what might it look like to have a Kotlin-based approach to writing Gradle build scripts and plugins? How might it help teams—especially big ones—work faster and write better structured, more maintainable builds?
    The possibilities were enticing.
    Because Kotlin is a statically-typed language with deep support in both IDEA and Eclipse, it could give Gradle users proper IDE support from auto-completion to refactoring and everything in-between. And because Kotlin is rich with features like first-class functions and extension methods, it could retain and improve on the best parts of writing Gradle build scripts—including a clean, declarative syntax and the ability to craft DSLs with ease.
    So we got serious about exploring these possibilities, and over the last several months we’ve had the pleasure of working closely with the Kotlin team to develop a new, Kotlin-based build language for Gradle.
    We call it Gradle Script Kotlin, and Hans just delivered the first demo of it onstage at JetBrains’ Kotlin Night event in San Francisco. We’ve released the first milestone towards version 1.0 of this work today, along with open-sourcing its repository at https://github.com/gradle/gradle-script-kotlin.
    So what does it look like, and what can you do with it? At a glance, it doesn’t look too different from the Gradle build scripts you know today:
    [Figure: screenshot of a Gradle build script written in Kotlin]
    But things get very interesting when you begin to explore what’s possible in the IDE. You’ll find that, suddenly, the things you usually expect from your IDE just work, including:
    • auto-completion and content assist
    • quick documentation
    • navigation to source
    • refactoring and more
    The effect is dramatic, and we think it’ll make a big difference for Gradle users. Now, you might be wondering about a few things at this point—like whether existing Gradle plugins will work with Gradle Script Kotlin (yes, they will), and whether writing build scripts in Groovy is deprecated (no, it’s not). You can find complete answers to these and other questions in the project FAQ. Do let us know if you have a question that’s not answered there.
    Of course, all this is just the beginning. We’re happy to announce that Kotlin scripting support will be available in Gradle 3.0, and we’ll be publishing more information about our roadmap soon. In the meantime, there’s no need to wait—you can try out Gradle Script Kotlin for yourself right now by getting started with our samples.
    And we hope you do, because we’d love your feedback. We’d love to hear what you think, and how you’d like to see this new work evolve. You can file issues via the project’s GitHub Issues and please come chat with us in the #gradle channel of the public Kotlin Slack.
    I’d like to say a big thanks to my colleague Rodrigo B. de Oliveira for the last few months of working together on this project—it’s been a lot of fun! And a big thanks to the Kotlin team, in particular Ilya Chernikov and Ilya Ryzhenkov for being so responsive in providing us with everything we needed in the Kotlin compiler and Kotlin IDEA plugin. Onward!

    An Intro to Encryption in Python 3






    by Sarang Nagmote



    Category - Website Development




    Python 3 doesn’t have very much in its standard library that deals with encryption. Instead, you get hashing libraries. We’ll take a brief look at those in the chapter, but the primary focus will be on the following 3rd party packages: PyCrypto and cryptography. We will learn how to encrypt and decrypt strings with both of these libraries.

    Hashing

    If you need secure hashes or message digest algorithms, then Python’s standard library has you covered in the hashlib module. It includes the FIPS secure hash algorithms SHA1, SHA224, SHA256, SHA384, and SHA512 as well as RSA’s MD5 algorithm. Python also supports the adler32 and crc32 hash functions, but those are in the zlib module.
    One of the most popular uses of hashes is storing the hash of a password instead of the password itself. Of course, the hash has to be a good one or it can be cracked. Another popular use case for hashes is to hash a file and then send the file and its hash separately. Then the person receiving the file can run a hash on the file to see if it matches the hash that was sent. If it does, then that means no one has changed the file in transit.
    Let’s try creating an md5 hash:
    >>> import hashlib
    >>> md5 = hashlib.md5()
    >>> md5.update('Python rocks!')
    Traceback (most recent call last):
      File "<pyshell#5>", line 1, in <module>
        md5.update('Python rocks!')
    TypeError: Unicode-objects must be encoded before hashing
    >>> md5.update(b'Python rocks!')
    >>> md5.digest()
    b'\x14\x82\xec\x1b#d\xf6N}\x16*+[\x16\xf4w'
    Let’s take a moment to break this down a bit. First off, we import hashlib and then we create an instance of an md5 HASH object. Next, we add some text to the hash object and we get a traceback. It turns out that to use the md5 hash, you have to pass it a byte string instead of a regular string. So we try that and then call its digest method to get our hash. If you prefer the hex digest, we can do that too:
    >>> md5.hexdigest()
    '1482ec1b2364f64e7d162a2b5b16f477'
    There’s actually a shortcut method of creating a hash, so we’ll look at that next when we create our sha1 hash:
    >>> sha = hashlib.sha1(b'Hello Python').hexdigest()
    >>> sha
    '422fbfbc67fe17c86642c5eaaa48f8b670cbed1b'
    As you can see, we can create our hash instance and call its digest method at the same time. Then we print out the hash to see what it is. I chose to use the sha1 hash as it has a nice short hash that will fit the page better. But it’s also less secure, so feel free to try one of the others.
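The file-verification use case mentioned earlier in this section can be sketched with nothing but the standard library. This is just a minimal example: the function names are made up, and I picked SHA-256 and a 64 KB chunk size for illustration.

```python
import hashlib
import hmac


def file_sha256(path, chunk_size=65536):
    """Hex SHA-256 digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True when the file's digest equals the digest that was sent separately."""
    return hmac.compare_digest(file_sha256(path), expected_hex.lower())
```

The receiver would call matches with the downloaded file and the hash they were sent. A False result means the file is not the one that was hashed.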

    Key Derivation

    Python has pretty limited support for key derivation built into the standard library. In fact, the only method that hashlib provides is the pbkdf2_hmac method, which is the PKCS#5 password-based key derivation function 2. It uses HMAC as its pseudorandom function. You might use something like this for hashing your password as it supports a salt and iterations. For example, if you were to use SHA-256 you would need a salt of at least 16 bytes and a minimum of 100,000 iterations.
    As a quick aside, a salt is just random data that you use as additional input into your hash to make it harder to “unhash” your password. Basically it protects your password from dictionary attacks and pre-computed rainbow tables.
    Let’s look at a simple example:
    >>> import binascii
    >>> dk = hashlib.pbkdf2_hmac(hash_name='sha256',
            password=b'bad_password34',
            salt=b'bad_salt',
            iterations=100000)
    >>> binascii.hexlify(dk)
    b'6e97bad21f6200f9087036a71e7ca9fa01a59e1d697f7e0284cd7f9b897d7c02'
    Here we create a SHA256 hash on a password using a lousy salt but with 100,000 iterations. Of course, SHA is not actually recommended for creating keys from passwords. You should use something like scrypt instead. Another good option would be the 3rd party package, bcrypt. It is designed specifically with password hashing in mind.
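For comparison, here is roughly what the same pbkdf2_hmac call might look like with a proper random salt and a verification step. Treat it as a sketch: the helper names are invented for this example, and the 16 byte salt and 100,000 iterations are simply the minimums quoted above, not a recommendation for your application.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # the minimum quoted in the text above


def hash_password(password, salt=None):
    """Return (salt, derived_key) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # 16 random bytes, a fresh salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password, salt, expected_key):
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)
```

You would store the salt next to the derived key. Because the salt is random, the same password produces a different key for every user, which is what defeats pre-computed tables.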

    PyCryptodome

    The PyCrypto package is probably the most well known 3rd party cryptography package for Python. Sadly, PyCrypto’s development stopped in 2012. Others have continued to release the latest version of PyCrypto so you can still get it for Python 3.5 if you don’t mind using a 3rd party’s binary. For example, I found some binary Python 3.5 wheels for PyCrypto on Github (https://github.com/sfbahr/PyCrypto-Wheels).
    Fortunately, there is a fork of the project called PyCryptodome that is a drop-in replacement for PyCrypto. To install it for Linux, you can use the following pip command:

     pip install pycryptodome
     
    Windows is a bit different:

     pip install pycryptodomex
     
    If you run into issues, it’s probably because you don’t have the right dependencies installed or you need a compiler for Windows. Check out the PyCryptodome website for additional installation help or to contact support.
    Also worth noting is that PyCryptodome has many enhancements over the last version of PyCrypto. It is well worth your time to visit their home page and see what new features exist.

    Encrypting a String

    Once you’re done checking their website out, we can move on to some examples. For our first trick, we’ll use DES to encrypt a string:
    >>> from Crypto.Cipher import DES
    >>> key = 'abcdefgh'
    >>> def pad(text):
            while len(text) % 8 != 0:
                text += ' '
            return text
    >>> des = DES.new(key, DES.MODE_ECB)
    >>> text = 'Python rocks!'
    >>> padded_text = pad(text)
    >>> encrypted_text = des.encrypt(text)
    Traceback (most recent call last):
      File "<pyshell#35>", line 1, in <module>
        encrypted_text = des.encrypt(text)
      File "C:\Programs\Python\Python35-32\lib\site-packages\Crypto\Cipher\blockalgo.py", line 244, in encrypt
        return self._cipher.encrypt(plaintext)
    ValueError: Input strings must be a multiple of 8 in length
    >>> encrypted_text = des.encrypt(padded_text)
    >>> encrypted_text
    b>üx‡²“üHÕ9VQ
    This code is a little confusing, so let’s spend some time breaking it down. First off, it should be noted that the key size for DES encryption is 8 bytes, which is why we set our key variable to an eight letter string. The string that we will be encrypting must be a multiple of 8 in length, so we create a function called pad that can pad any string out with spaces until it’s a multiple of 8. Next we create an instance of DES and some text that we want to encrypt. We also create a padded version of the text. Just for fun, we attempt to encrypt the original unpadded variant of the string which raises a ValueError. Here we learn that we need that padded string after all, so we pass that one in instead. As you can see, we now have an encrypted string!
    Of course, the example wouldn’t be complete if we didn’t know how to decrypt our string:
    >>> des.decrypt(encrypted_text)
    b'Python rocks!   '
    Fortunately, that is very easy to accomplish as all we need to do is call the decrypt method on our des object to get our decrypted byte string back. Our next task is to learn how to encrypt and decrypt a file with PyCrypto using RSA. But first we need to create some RSA keys!
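One drawback of padding with spaces is that after decryption you can’t tell padding apart from spaces that were really at the end of the message. A common alternative is PKCS#7 padding, where each padding byte stores the number of bytes added. The pure-Python sketch below shows the idea on byte strings. It is a different technique from the pad function above, and the function names are my own. PyCryptodome also ships pad and unpad helpers in Crypto.Util.Padding if you would rather not write this yourself.

```python
def pkcs7_pad(data, block_size=8):
    """Append N bytes of value N so len(result) is a multiple of block_size."""
    n = block_size - (len(data) % block_size)  # always between 1 and block_size
    return data + bytes([n]) * n


def pkcs7_unpad(padded, block_size=8):
    """Strip PKCS#7 padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("input is not a whole number of blocks")
    n = padded[-1]
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]
```

Note that input which is already a multiple of the block size still gets a full extra block of padding. That is what makes the padding unambiguous to remove.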

    Create an RSA Key

    If you want to encrypt your data with RSA, then you’ll need to either have access to a public / private RSA key pair or you will need to generate your own. For this example, we will just generate our own. Since it’s fairly easy to do, we will do it in Python’s interpreter:
    >>> from Crypto.PublicKey import RSA
    >>> code = 'nooneknows'
    >>> key = RSA.generate(2048)
    >>> encrypted_key = key.exportKey(passphrase=code, pkcs=8,
            protection="scryptAndAES128-CBC")
    >>> with open('/path_to_private_key/my_private_rsa_key.bin', 'wb') as f:
            f.write(encrypted_key)
    >>> with open('/path_to_public_key/my_rsa_public.pem', 'wb') as f:
            f.write(key.publickey().exportKey())
    First, we import RSA from Crypto.PublicKey. Then we create a silly passcode. Next we generate an RSA key of 2048 bits. Now we get to the good stuff. To generate a private key, we need to call our RSA key instance’s exportKey method and give it our passcode, which PKCS standard to use and which encryption scheme to use to protect our private key. Then we write the file out to disk.
    Next, we create our public key via our RSA key instance’s publickey method. We used a shortcut in this piece of code by just chaining the call to exportKey with the publickey method call to write it to disk as well.

    Encrypting a File

    Now that we have both a private and a public key, we can encrypt some data and write it to a file. Here’s a pretty standard example:
    from Crypto.PublicKey import RSA
    from Crypto.Random import get_random_bytes
    from Crypto.Cipher import AES, PKCS1_OAEP

    with open('/path/to/encrypted_data.bin', 'wb') as out_file:
        recipient_key = RSA.import_key(
            open('/path_to_public_key/my_rsa_public.pem').read())
        session_key = get_random_bytes(16)

        cipher_rsa = PKCS1_OAEP.new(recipient_key)
        out_file.write(cipher_rsa.encrypt(session_key))

        cipher_aes = AES.new(session_key, AES.MODE_EAX)
        data = b'blah blah blah Python blah blah'
        ciphertext, tag = cipher_aes.encrypt_and_digest(data)

        out_file.write(cipher_aes.nonce)
        out_file.write(tag)
        out_file.write(ciphertext)
    The first three lines cover our imports from PyCryptodome. Next we open up a file to write to. Then we import our public key into a variable and create a 16-byte session key. For this example we are going to be using a hybrid encryption method: RSA with PKCS#1 OAEP (Optimal Asymmetric Encryption Padding) encrypts only the session key, which we write to the file first, and AES encrypts the data itself. This allows us to write data of an arbitrary length to the file. Then we create our AES cipher, create some data and encrypt the data. This will return the encrypted text and the MAC. Finally we write out the nonce, MAC (or tag) and the encrypted text.
    As an aside, a nonce is an arbitrary number that is used only once in a cryptographic communication. They are usually random or pseudorandom numbers. When you don’t supply one, PyCryptodome’s EAX mode generates a random 16-byte nonce for AES. Feel free to try opening the encrypted file in your favorite text editor. You should just see gibberish.
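    As a rough illustration of what “random and used only once” means in practice, the sketch below draws nonces from the operating system’s cryptographic random number generator using only the standard library. The new_nonce helper is a hypothetical name, and the 16-byte size is an assumption chosen to match the example above; this is not how PyCryptodome generates its nonce internally.

```python
import secrets


def new_nonce(size: int = 16) -> bytes:
    """Return `size` random bytes from the OS's cryptographic RNG."""
    return secrets.token_bytes(size)


# Draw a fresh nonce for every message instead of reusing one.
nonces = [new_nonce() for _ in range(1000)]
print(len(nonces[0]))
print(len(set(nonces)) == len(nonces))
```

    With 128 random bits per nonce, a repeat is so unlikely that drawing a new value for each message is a common way to satisfy the “only once” requirement.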
    Now let’s learn how to decrypt our data:
    from Crypto.PublicKey import RSA
    from Crypto.Cipher import AES, PKCS1_OAEP

    code = 'nooneknows'

    with open('/path/to/encrypted_data.bin', 'rb') as fobj:
        # Load the private key file written in the key-generation example
        private_key = RSA.import_key(
            open('/path_to_private_key/my_private_rsa_key.bin').read(),
            passphrase=code)

        # Fields appear in the order they were written
        enc_session_key, nonce, tag, ciphertext = [
            fobj.read(x) for x in (private_key.size_in_bytes(), 16, 16, -1)]

        cipher_rsa = PKCS1_OAEP.new(private_key)
        session_key = cipher_rsa.decrypt(enc_session_key)

        cipher_aes = AES.new(session_key, AES.MODE_EAX, nonce)
        data = cipher_aes.decrypt_and_verify(ciphertext, tag)

    print(data)
    If you followed the previous example, this code should be pretty easy to parse. In this case, we are opening our encrypted file for reading in binary mode. Then we import our private key. Note that when you import the private key, you must give it your passcode. Otherwise you will get an error. Next we read in our file. You will note that we read in the encrypted session key first (it is as long as the RSA key, hence the call to size_in_bytes), then the next 16 bytes for the nonce, followed by the next 16 bytes for the tag and finally the rest of the file, which is our data.
    Then we need to decrypt our session key, recreate our AES cipher with the same nonce, and decrypt the data, which also verifies the tag.
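    The way the file is sliced into fields can be tried out without any keys at all. The sketch below uses only the standard library and placeholder bytes in place of real ciphertext; the split_fields helper and the stand-in values are hypothetical, and the 256-byte first field assumes the 2048-bit RSA key generated earlier.

```python
import io


def split_fields(stream, sizes):
    """Read consecutive fields from stream; a size of -1 means 'the rest'."""
    return [stream.read(size) for size in sizes]


# Placeholder values standing in for real encrypted material.
enc_session_key = b"K" * 256   # a 2048-bit RSA key yields 256 bytes
nonce = b"N" * 16
tag = b"T" * 16
ciphertext = b"encrypted payload of any length"

# Same layout the encryption example writes: key, nonce, tag, data.
blob = io.BytesIO(enc_session_key + nonce + tag + ciphertext)
fields = split_fields(blob, (256, 16, 16, -1))
print([len(field) for field in fields])   # [256, 16, 16, 31]
```

    Only the last field may vary in length, which is why the fixed-size pieces are written before the ciphertext.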
    You can use PyCryptodome to do much, much more. However we need to move on and see what else we can use for our cryptographic needs in Python.

    The Cryptography Package

    The cryptography package aims to be “cryptography for humans” much like the requests library is “HTTP for Humans”. The idea is that you will be able to create simple cryptographic recipes that are safe and easy to use. If you need to, you can drop down to low-level cryptographic primitives, which require you to know what you’re doing or you might end up creating something that’s not very secure.
    If you are using Python 3.5, you can install it with pip, like so:

     pip install cryptography
     
    You will see that cryptography installs a few dependencies along with itself. Assuming that they all completed successfully, we can try encrypting some text. Let’s give the Fernet symmetric encryption algorithm a try. The Fernet algorithm guarantees that any message you encrypt with it cannot be manipulated or read without the key you define. Fernet also supports key rotation via MultiFernet. Let’s take a look at a simple example:
    >>> from cryptography.fernet import Fernet
    >>> cipher_key = Fernet.generate_key()
    >>> cipher_key
    b'APM1JDVgT8WDGOWBgQv6EIhvxl4vDYvUnVdg-Vjdt0o='
    >>> cipher = Fernet(cipher_key)
    >>> text = b'My super secret message'
    >>> encrypted_text = cipher.encrypt(text)
    >>> encrypted_text
    (b'gAAAAABXOnV86aeUGADA6mTe9xEL92y_m0_TlC9vcqaF6NzHqRKkjEqh4d21PInEP3C9HuiUkS9f'
     b'6bdHsSlRiCNWbSkPuRd_62zfEv3eaZjJvLAm3omnya8=')
    >>> decrypted_text = cipher.decrypt(encrypted_text)
    >>> decrypted_text
    b'My super secret message'
    First off we need to import Fernet. Next we generate a key. We print out the key to see what it looks like. As you can see, it’s a random byte string. If you want, you can try running the generate_key method a few times. The result will always be different. Next we create our Fernet cipher instance using our key.
    Now we have a cipher we can use to encrypt and decrypt our message. The next step is to create a message worth encrypting and then encrypt it using the encrypt method. I went ahead and printed out the encrypted text so you can see that you can no longer read the text. To decrypt our super secret message, we just call decrypt on our cipher and pass it the encrypted text. The result is we get a plain text byte string of our message.
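    The two guarantees mentioned earlier, tamper detection and key rotation, can be sketched as follows. This assumes a version of the cryptography package recent enough to provide MultiFernet.rotate; the variable names and the choice of which character to alter are made up for illustration.

```python
from cryptography.fernet import Fernet, InvalidToken, MultiFernet

old_cipher = Fernet(Fernet.generate_key())
token = old_cipher.encrypt(b"My super secret message")

# Tamper detection: alter one character in the middle of the token.
tampered = bytearray(token)
tampered[20] = ord("A") if tampered[20] != ord("A") else ord("B")
try:
    old_cipher.decrypt(bytes(tampered))
    tamper_detected = False
except InvalidToken:
    tamper_detected = True
print(tamper_detected)

# Key rotation: list the new key first so it is used for encryption,
# while the old key stays available for decrypting existing tokens.
new_cipher = Fernet(Fernet.generate_key())
rotator = MultiFernet([new_cipher, old_cipher])
rotated = rotator.rotate(token)
print(new_cipher.decrypt(rotated))
```

    Once every stored token has been rotated, the old key can be dropped from the list passed to MultiFernet.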

    Wrapping Up

    This chapter barely scratched the surface of what you can do with the PyCryptodome and cryptography packages. However, it does give you a decent overview of what can be done with Python with regard to encrypting and decrypting strings and files. Be sure to read the documentation and start experimenting to see what else you can do!

    Related Reading

    Related Refcard: