27 January 2016

Magento database clone script

We recently added a new gist to our GitHub account - a Magento 1 command line database syncing tool.  Most store owners will find the need to have at least a staging site for their store, and probably a development site as well.  While it's easy enough to get the staging environment mirroring production in terms of configuration, and the fileset handled by version control, syncing the database contents is less straightforward - but nonetheless desirable.

In an environment where you want to mirror production as closely as possible both in terms of configuration and data, having database contents lagging far behind production isn't ideal.  Take for instance a situation where you find a bug that is apparent on production but not in any other environment.  In my experience, 99% of the time you eventually find issues like this to be down to differences in admin configuration, and therefore the database.  So it's important to have your store's staging environment properly synced to production to give yourself the best chance of finding any bugs before code is deployed to your production environment.

That's exactly what our syncing tool does - it requires n98-magerun and takes an already exported dump of your production database, imports it into your staging site and updates the install-specific settings such as base URL, cookie domain and secure front/backend settings.
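For anyone curious what those steps look like, here is a rough sketch of the process. To be clear, this is not the actual gist - the dump filename and domains are made-up example values, and it assumes n98-magerun.phar is on your PATH and that you run it from the staging Magento root. The config paths are the standard Magento 1 core_config_data paths.

```shell
#!/bin/sh
# Hypothetical outline of the sync process - run from the staging
# Magento root, with a dump already exported from production.
# Filenames and domains here are example values only.
set -e

DUMP=production.sql
STAGING_URL='http://staging.example.com/'
COOKIE_DOMAIN='staging.example.com'

# Import the production dump into the staging database
n98-magerun.phar db:import "$DUMP"

# Update the install-specific settings for staging
n98-magerun.phar config:set web/unsecure/base_url "$STAGING_URL"
n98-magerun.phar config:set web/secure/base_url "$STAGING_URL"
n98-magerun.phar config:set web/cookie/cookie_domain "$COOKIE_DOMAIN"

# Staging often has no SSL certificate, so disable secure front/backend
n98-magerun.phar config:set web/secure/use_in_frontend 0
n98-magerun.phar config:set web/secure/use_in_adminhtml 0

n98-magerun.phar cache:flush
```

The actual script in the gist handles more than this, so treat the above as an illustration of the idea rather than a replacement for it.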

You can find the script here with full usage instructions underneath.  I hope you find it useful!

20 January 2016

What's the point of unit testing?

I'll start this post by saying that I have never been a huge fan of unit testing (in PHP, which is where my experience lies), and though I have read a lot about it and have written unit tests myself, I have never read about or experienced anything with unit tests that really shows me its worth in my development cycle.

Let me explain why.  Firstly I'll make this clear - I can absolutely see the importance of testing code and I always thoroughly test every execution path of every method I write.  Releasing code that you just write and don't test is clearly not a good idea.  So it's not that I am against testing the code I write at all.

The reason I don't see the point in unit testing is that I develop, and always have developed, using an IDE and xdebug (which is what I would hope any serious developer is doing).  This allows me to fully test any code I want by setting breakpoints and stepping through code execution to inspect what is happening on a line-by-line basis.  To me, stepping through code execution like this - seeing exactly what variables are being set to, inspecting objects and their contents, stepping into/out of/over code - offers so much more than unit testing when it comes to testing your code and verifying that it is working correctly.  That applies both at a single method/class level, as with unit testing, and at a wider level relying on external objects and services, as with integration testing.

While a unit test allows you to verify the end result of a particular code execution path, there are serious limitations on what can realistically be tested - for instance methods which don't return anything and essentially just contain business logic, or protected/private methods.  You can't directly test these with unit testing (or at least not without some kind of hacky approach using reflection and so on), but you can step through all of this with xdebug and breakpoints, because you are testing the actual code execution flow on the actual class and method you want to test.

You will probably not have missed the fairly recent general release of Magento 2, and may well have noted that it ships with its own testing framework for unit and integration testing.  Finding unit testing to now be such an integral part of the Magento codebase prompted me to revisit it in a bid to find what I may have missed before.  Surely there is some important point to unit testing I just haven't managed to grasp yet?  Well... not that I have managed to find at the time of writing.

One of the things I have been doing recently is deciding whether or not to port our current Magento 1 extensions to work under Magento 2.  I say port, but in reality it's basically a total rewrite from scratch for every extension (which isn't a thrilling proposition, but that's another story).  I have been really enjoying working with the Magento 2 codebase and I think it genuinely is a long way ahead of Magento 1.  So rewriting extensions under Magento 2 is largely an enjoyable, if substantial, task.

So why not write unit tests at the same time for these extensions?  This is how I was prompted to revisit unit testing.  After a quick refresher on assertions, mock objects, mock and stub methods etc., I came to writing some tests for one of our modules, which quickly moved to looking at the existing unit tests in core modules.  I came across the following test for the catalog module, in the test class Magento\Catalog\Test\Unit\Block\NavigationTest:
public function testGetIdentities()
{
    $this->assertEquals(
        [\Magento\Catalog\Model\Category::CACHE_TAG, \Magento\Store\Model\Group::CACHE_TAG],
        $this->block->getIdentities()
    );
}
To me this is a good test to demonstrate the title of this post: what's the point of unit testing?

So as you can probably see, this is a simple test that checks that an array containing two class constants is the same as the return value of the getIdentities() method called on an object. Looking at $this->block, we see that it is an instance of the class Magento\Catalog\Block\Navigation instantiated by the object manager in the setUp() method (so not a mock). Looking at the class, we can see that the getIdentities() method does the following:
public function getIdentities()
{
    return [\Magento\Catalog\Model\Category::CACHE_TAG, \Magento\Store\Model\Group::CACHE_TAG];
}
Yes, the method returns exactly the class constants tested for in the assert. Now unless I am missing something pretty fundamental, what is the point of this test exactly? It is nothing more than testing whether a value equals itself, which of course it does. This is a test which can never fail - whatever either class constant may be changed to in the future, it will still equal itself. To my mind this test just has no value at all, and due to the limited scope unit tests are designed for, it summarises the overall usefulness of any unit test.

I could understand it if the class constant needed to be a certain value in order to, say, pull some data from a database (which would cause the test to fail in the case of a change), but allowing reliance on external dependencies like this is not what unit tests are for. Any external dependency like this would be mocked, with any methods which return data collected from an external source stubbed and their return values hard coded. Again, it wouldn't matter what the class constants were changed to - the test would still pass because of the hard-coded return value in the stub method. To me it seems that with the removal of dependencies, all unit tests boil down to essentially the above and actually achieve nothing of worth.

So if somebody can tell me that the above test is just a poor example and enlighten me on the real point of unit testing I have been missing all this time, that would be great. At the moment the only point I can see to unit testing is catching badly written code where the developer doesn't really understand the classes they are working with or integrating - and no, I don't think that's a good enough reason for unit testing being as prevalent as it is.

I do understand that unit testing has side benefits, such as acting as a type of documentation for the methods being tested, and giving developers who might be looking at code for the first time the confidence to make changes by allowing them to run tests after making those changes. But that still doesn't get away from the fact that the developer has to come to the point where they do understand the code they are working with - and in my company at least, if you don't yet completely understand the code you are working with, the code you write doesn't get released until you do. Otherwise you are straying into cut-and-paste territory, which is somewhere no worthwhile developer should ever be found.

Another side benefit, and perhaps the greatest, is the way unit testing forces you to write your code. If you want to unit test you need to write code that can be unit tested, and this generally results in each method being as short and concise as possible, achieving only one goal. Dependencies should be passed to the methods that need them, or stored on the object, rather than instantiated in the method itself - though Magento 2 pretty much has that covered with the object manager's automatic dependency injection for class constructors. Also, logic which queries or uses any external dependency or service should live in its own method, so that when the object is mocked, that method can be stubbed to remove the dependency.

These side benefits are all great, and if wanting to write unit tests in a roundabout way forces you to follow these kinds of coding conventions and principles, then that can only be a good thing. But again, I would hope that any developer with reasonable knowledge and experience would automatically be writing short, concise, sensibly named methods which perform only one task anyway. Regardless of coding this way being advantageous when it comes to writing unit tests, common sense should tell you it is a sensible way of doing things - it makes your code much more maintainable because it is much quicker and easier to understand.

So please, use the comments below to give me your opinion and convince me I should be writing unit tests alongside testing my code with xdebug!

18 January 2016

Magento 2 FPC and Evolved Caching



Magento 2 finally saw its general availability release late last year, and the rebuilt platform brings with it some valuable new additions such as Ajax add to cart, editable admin grid rows and, perhaps most significantly, a full page cache for both Community and Enterprise versions of the platform.  These are great additions - it means the core team has really been listening to what customers want, and has built into the new version features which were previously the preserve of go-to extensions for Magento 1.

I have really enjoyed working with the Magento 2 codebase and I'll be honest, it really does feel light years ahead of Magento 1.  Removal of the Mage 'god' class in favour of the functionality offered by the object manager, and other more minor (but still significant) steps such as no more code pools and all extension files residing in the same directory location, are really sensible.  As well as listening to their customers, it seems the core team have also really learnt from how Magento 1 was developed, used and customised by third parties.  Magento 2 feels like a really solid, well-defined platform to develop on, with many areas that were considered open to interpretation by developers in Magento 1 being clearly defined in Magento 2.

If anything, you could argue that some of the functionality offered by the object manager goes a bit too far - basically overkill, because at the moment it seems to cater only for real edge cases and might never prove that useful to third party developers.  However, these opinions may change as the platform matures and people become familiar with the full breadth of the codebase and what can be done with it.  Ultimately, we developers certainly can't complain about Magento giving us more rather than fewer options; we just need to be sure we are using the right tools for the right application.

If you are interested in getting to grips with the object manager in Magento 2 then Alan Storm has a great series you can have a look at.

So finally getting to the point of the post: the addition of a built-in full page cache in Magento 2 creates a dilemma for third party extension developers (such as ourselves) who have developed their own caching extensions for Magento 1.  We have to make a choice - do we port our caching solution to Magento 2, or does the built-in solution negate the need for it?  Unfortunately there is very little documentation covering the functionality of the full page cache in Magento 2, so we have instead been diving into the codebase in order to answer that question.

It appears the core team have taken steps to make full page caching easier for their built in solution.  They have made some noteworthy changes in Magento 2, such as category filtering now being driven only by the URL (whereas it was also stored in the session in Magento 1), and the frontend form key being stored in a cookie for quick and easy access (whereas it could only be found in the session in Magento 1).  So removing dependency on the session when serving cached content seems to have been, to some extent, the order of the day.

When reviewing the full page cache logic, my point of greatest interest was where and how the request is intercepted to serve cached content, as opposed to the framework processing the request as normal.  I expected to find an equivalent of the early request_processors hook-in point used by the Enterprise full page cache solution in Magento 1, but on investigation it seems there is no equivalent in Magento 2.  The core team have stuck firmly within the confines of the framework as it stands and used plugins to intercept the request, deciding at that point whether to serve cached content or send the request into the framework as normal.

My reason for being most interested in this intercept point is that full page caching is of course all about how fast you can deliver cached content to the browser, with the TTFB defining how long it takes for the page to start rendering in the browser - the longer this takes, the 'slower' the page is for the user.  The choice of using a plugin is why I am rather disappointed with the performance of the built in full page cache for Magento 2 - it just isn't all that fast.

It's doing the right thing in development terms - using a plugin to intercept the request stays within the boundaries set by the framework, and is traceable and easy to find for third party developers as it doesn't use something else entirely (such as request_processors) to serve content - but the performance it gives is just pretty average.  I wonder if any of the core team wanted to implement a faster solution with an earlier hook-in point?  We will probably never know.

The problem with using plugins is that in order to do so you need to instantiate the object manager, and this pretty much means instantiating the entire framework.  Doing the equivalent of this in Magento 1 was slow, but Magento 2 is that much more complex and so takes even longer.  An example: in my local development environment the full page cache brings the TTFB down to around 33% of the uncached time - pages are served in around 6 seconds with caching disabled, and 2 seconds with caching enabled (using the built in cache rather than Varnish).  While this is a significant improvement, and production environments with a faster setup would complete this process more quickly, the percentage improvement would still be the same.

My issue here is that with Evolved Caching under Magento 1 you are looking at less than 1% of the uncached time for TTFB (see the benchmarks), which is possible because full page cached content is served entirely outside of the Magento framework, making it massively faster.  There just really isn't any comparison between the two, with around 33% being rather disappointing compared to less than 1%.

So to answer the original question, should we port Evolved Caching to Magento 2?  Considering the performance we are seeing with the built in full page cache, I think the answer has to be yes.  The challenge is building it in such a way that it gives equivalent performance to its Magento 1 counterpart, yet falls within best development practices for Magento 2.

Keep your eyes open, but the Magento 2 version of Evolved Caching is under development and we will let you know when it's released.

14 January 2016

An effective git branching strategy

We have been using git for a number of years now to manage the codebase for all of our client projects, and over this time have gradually refined the branching strategy we use to the point where, for our setup (which I would think is a pretty common one), it really works well.


Why not git flow?

 

I'm sure if you are reading this post you have probably come across git flow, a popular strategy, but one that has always struck me as rather over-engineered and actually unworkable for us. I really wanted to like git flow, but the strategy just seems (unless I have totally the wrong end of the stick) to have major flaws when working with new features, mainly revolving around the fact that for normal development - ignoring hotfixes - you only ever branch off develop. Another issue though is that the first time code is pushed remotely from a developer's local environment, it should be production ready - because it won't be tested in any remote server environment until the code reaches the master branch.  If it's not production ready then it will hold up future commits and deployments of other features. More often than not this initial push of local code won't be production ready immediately, and the problem with git flow is that there is no support for deploying from multiple branches, each backing a separate testing environment before code finally reaches production (yes, you can deploy to multiple environments from the same branch, but this still holds up production deployments).

Take for example the following scenario.  You develop in a new feature branch (based off the develop branch), the feature is completed and merged back into develop.  Work then starts in another feature branch (based off the updated develop branch), and this also is completed and merged back into develop.  While the second feature is being developed, however, a bug is discovered in the first feature, meaning it's not production ready.  At this point, because you have developed the second feature branch based off the updated develop branch (which now contains non production ready code), you can't release either the first or the second feature, as the commits for the production ready second feature sit in front of the commits for the non production ready first feature.  Essentially there can be no release of any feature until all features in develop are production ready.
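This scenario is easy to reproduce in a scratch repo if you want to see it concretely. The following is a throwaway sketch (branch names, files and commit messages are all invented for illustration) showing that feature 2's history necessarily contains feature 1's broken commit:

```shell
#!/bin/sh
# Throwaway demo: feature 1 (broken) is merged into develop, feature 2
# is then branched off the updated develop, so feature 2 carries
# feature 1's broken code with it.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email dev@example.com
git config user.name dev
git config commit.gpgsign false

echo base > app.txt
git add app.txt
git commit -qm 'initial production code'
git branch develop

# Feature 1 - later discovered to contain a bug
git checkout -q -b feature-1 develop
echo broken > feature1.txt
git add feature1.txt
git commit -qm 'feature 1 (turns out to be broken)'
git checkout -q develop
git merge -q --no-ff -m 'merge feature 1' feature-1

# Feature 2 - branched off develop, which now includes feature 1
git checkout -q -b feature-2 develop
echo ok > feature2.txt
git add feature2.txt
git commit -qm 'feature 2 (production ready)'
git checkout -q develop
git merge -q --no-ff -m 'merge feature 2' feature-2

# feature-2 cannot be released without also releasing feature-1
if git merge-base --is-ancestor feature-1 feature-2; then
    echo 'feature-2 contains feature-1'
fi
```

Because feature-1's tip is an ancestor of feature-2, there is no commit you can release from develop that includes the second feature but not the first.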

Now this is what release branches are designed for, but realistically the workflow is rarely that cut and dried, such that you can take a single point in the develop branch where everything is ready to go to production and create a release there.  Also, because the bad code still exists in the develop branch, all further releases are still held up until a fix has been added to the release branch and the changes merged back into develop.  Realistically, work will always be required in the release branch, and because this can only be merged back into the develop and master branches, the feature branch then lacks a complete history of that feature.  Having a complete history is not only extremely useful in the future should you need to work more on that feature, it also makes the changes made to build that feature far easier to review than they would be in the merged develop or master branches.  This is because unless a merge only contains a single commit, you can only review the sum total changes of the merge, not the individual commits which make it up.

This is a simplified scenario, but with a team of multiple developers and multiple features on the go at any one time, you can imagine things getting very messy very quickly, with the only real options to fix the situation being:
  • Work on the broken feature until it is production ready holding up all releases until it's fixed.
  • Revert all of the commits for the broken feature so that production can continue.
  • Cherry pick the required commits from develop into the release branch excluding the broken code - but this breaks merging and comparison as cherry picks are totally new commits.  It's also time consuming and open to human error as it's basically a manual merge.  Finally it still holds up further releases as the feature is still broken in develop.
  • Deliberately release the broken code and hotfix it once it's in the master branch.
  • Release anyway but deploy commits out of order, skipping over the bad code (which again is open to human error, can produce undesired results further down the line when fixing the deploy order, and leaves the server out of sync with the master branch)
None of the above are great options, but the first I would at least consider to be the right option in development terms (but perhaps not in business terms if the fix is not quick).
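For completeness, the revert option is worth a quick sketch, because reverting a merge needs the -m flag to tell git which parent of the merge commit is the mainline. Again, this is a hypothetical scratch-repo example with invented branch and file names:

```shell
#!/bin/sh
# Throwaway demo of backing a broken feature merge out of develop
# with git revert. -m 1 makes the first parent (develop) the mainline.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.email dev@example.com
git config user.name dev
git config commit.gpgsign false

echo base > app.txt
git add app.txt
git commit -qm 'initial'

git checkout -q -b broken-feature develop
echo bug > broken.txt
git add broken.txt
git commit -qm 'broken feature'
git checkout -q develop
git merge -q --no-ff -m 'merge broken feature' broken-feature

# Revert the merge commit itself - develop is releasable again, but
# the merge stays in history and the feature branch is untouched
git revert -m 1 --no-edit HEAD
test ! -f broken.txt && echo 'broken code removed from develop'
```

Note that the broken code still exists on the broken-feature branch, so work can continue there and the feature can be merged again once fixed - though anyone who branched off develop in the meantime has the usual reverted-merge complications to deal with.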

You could restrict yourself to developing only one feature branch per release, but however you work it, if non production ready code makes it into the develop branch there can't be any releases until that code has been fixed or removed - and if it's not a straightforward issue to resolve, new commits can really pile up in front of the bad feature merge, seriously holding up the project.

In the real world you are inevitably, from time to time, going to get non production ready code in the develop branch however much you test it in your feature branch - that's just the nature of code.  This is particularly true as these feature branches generally only exist in the developer's repo, meaning only the developer has tested the code until it gets pushed to a remote repo.  This very real scenario is an absolute showstopper when it comes to git flow for us, and I really can't see how it's workable in many git flow setups without majorly holding up production.

Something much simpler



After a fair amount of consideration and testing over time, I eventually came to the conclusion that something much simpler than the approach taken by git flow was a far better solution for us.  The approach we use is highly flexible, simple for any new developer to pick up, and scalable whatever the size of your development team.  It also does not need any software running on top of git - it is in fact somewhat similar to the approach taken by GitHub, who also did not find git flow to be a good fit.

First let me explain the setup we use for our client projects.  In all of our projects we have various tools we use to manage the project, including ticketing for the client to assign tasks to us.  The client will typically have at least one live testing environment we will always deploy to before anything is deployed to production (regardless of how trivial the change is).  The client can then test and approve the changes, and once they are happy things are working correctly, the code gets deployed to the next closest environment to production (which could be production).

So for instance, often the client may have a development environment which is a very clean setup without any of the optimisation measures you are probably going to find on production (i.e. caching).  After that they might also have a staging environment which is a very close clone of production, so that the code can be tested somewhere that has all the optimisation measures (or any other changes for that matter) that production has, but development does not.  Once code has been tested on staging and approved, you've got about as close as you realistically can to testing the code as it will run on production.  At this point the code can be considered production ready and therefore be deployed to production for customer use.

In other cases the client won't have a complex production setup, and so only one development/staging environment is required in front of production.

Our branching strategy



So that's the workflow we use, and I'm sure nothing I've described above is unfamiliar to you.  The base idea behind our branching strategy is that every new branch is based off the production branch that is furthest behind (generally master).  By doing this you can be sure that any new feature you develop in its own branch can be safely merged into any other branch, including your production branch, and that you will only ever add the code for that feature into the branch and nothing else (which is not the case with git flow).  Doing this keeps things very clean and structured from the outset.

So our strategy has the master branch as the branch from which you deploy to production, and everything that is developed is done so in a branch which is based off master.  As well as master you have one additional branch for each live testing environment, so you might have main branches of master, staging and development.  These extra branches start out life also based off the master branch, and along with master are considered the 'main' repository branches.

So the starting point for the repository is the master branch containing up to date production environment code, with one extra branch for each live testing environment in the project.  These extra branches are based off, and are initially identical to the master branch.

From this starting point each developer creates a new branch based off master to work on their particular task.  In our case, because we use ticketing, we use the ticket number as the branch name, which also makes it quick to visually link an existing branch to a ticket.  If the task needs to be worked on by multiple developers then this new branch can be pushed remotely, but typically a ticket will cover work which can be handled by a single developer, and so these branches will mostly only be found in the developer's local environment.

Once the developer is happy that the work in that branch is ready for testing, they can merge the changes into the branch furthest from production (so development, in the three-environment setup described above).  From here the changes can be deployed to the relevant live testing environment for approval, according to your deployment setup for that branch.  Once approved, the branch would then be merged into staging and deployed, and finally merged into master and deployed to production once approved under staging.  This can be happening simultaneously on any number of branches with any number of features, and you are never in danger of putting code onto production that isn't already there, as your branch is only ever based off master.  You also have the flexibility to deploy any combination of changes to any combination of testing environments or production.  The only time you should ever get non production ready code in the master branch is when testing in the other live environments, by all the developers involved, has failed to reveal a bug which is only discovered later.

Typically you will find the most code on the branch furthest from production, which in this case is development.  This is for obvious reasons - all of the code which needs testing will be found in this branch, and it can often be the case that code in this branch stays here for further development for some time before making it to production.  It's also quite possible that code in this branch may never reach production at all if a change is scrapped.  This is why we never base branches for new tasks on anything other than the master branch.

Initial setup


So to setup ready to use the branching strategy, first initialise the repo with the production codebase:
git init
git add .
git commit -m 'initial commit'
Then create the extra main branches you need based off master, one for each live testing environment:
git branch staging
git branch development
At this point you should push the repo to your remote git hosting and you are ready to start work on the project.

Ready to work


Each new developer should clone the repo to their local development environment. Once done and they have a task to complete they should create their new branch and work on that:
git checkout -b 43 master
This will create, and switch to, a branch named '43' - the branch can be called anything, but this is an example of naming a branch after a ticket number. You then work on and commit the changes for the task, and once happy they are ready for testing, merge into the branch furthest from master. A note here - when merging back into any of the 'main' branches (master, staging, development etc.) from working on a task like this, you should use the --no-ff argument so the branch you are merging shows in the repo history.
git checkout development
git merge --no-ff 43
Once merged, push the branch ready for deployment:
git push origin development
This same process should be repeated for each main branch as required so:
git checkout staging
git merge --no-ff 43
git push origin staging
to deploy and test in the staging environment, and finally:
git checkout master
git merge --no-ff 43
git push origin master
when the code is production ready. If a bug is found on production, you just follow the same procedure as normal fixing it in the relevant branch, then merge, push and deploy in the relevant environments. If the fix must go straight to production that's fine, but remember to also merge into the other main branches - the other main branches should always contain everything master does.

If while working on a branch it is decided that the task should be passed to another developer, or that multiple developers should work on it then that branch in the developers local setup can just be pushed to the remote repo. From here all relevant developers can just pull down the branch and work on it.

A side note: as branching is so cheap in git, you shouldn't find the need to delete your local branches, so we generally keep them all - you never know, that old ticket could still be re-opened. This means that over time the number of branches in your local repo will gradually increase, and if you need to work on an old branch it could be quite out of date. Here you just need to merge master back in to update the branch (without --no-ff):
git checkout 43
git merge master

Conclusion


Not much to say here except that's it, pretty simple isn't it? Feel free to give this branching strategy a try yourself and if you have any feedback leave it in the comments below.