
AEM Solution: AEM Author activity reports

AEM CMS lacks many fundamental features, and one of the critical gaps is author activity reporting. AEM does ship with reports: disk usage, user activity, page activity, workflow instances, and so on. In practice, though, none of them cover the basics of what a reporting feature should do.

In my opinion, outside of the ACS Commons tool, AEM as a CMS provides few operational capabilities beyond the Touch UI. Every team ends up building custom solutions to support day-to-day operations; migrating content from one environment to another is one example. I will cover more AEM issues in other posts, but for now let's explore the reporting feature in AEM.

Scenarios: why a CMS needs a reporting feature

Let's say many websites and brands are hosted in one AEM author environment, and multiple content teams are editing content at the same time. For a large team, page deletion, modification, and so on are normal activity, and teams often struggle to find out who modified or deleted their pages. The biggest question is how to restore the content, but keeping track of everyday activities is just as essential.

Page Activity Report Solution

AEM ships with a reporting capability called the page activity report, but it lacks the following basic features:

  • It is unresponsive & provides only very basic information.
  • Filtering by date, author, etc. isn't provided.
  • There is no querying capability.
  • There is no way to check which section of a page was modified.

Solutions

The AEM OOTB (out of the box) page activity report can be helpful if you already know the page name or title and want to track that specific page. In the snapshot above, the filter setting provides a way to find the page.

Custom Solution using PageEvent Handler

Here is one custom solution to track all page events in AEM: register a PageEvent handler and push each activity into a JCR node or another store.

import java.util.Iterator;

import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Property;
import org.apache.felix.scr.annotations.Service;
import org.osgi.service.event.Event;
import org.osgi.service.event.EventHandler;

import com.day.cq.dam.api.DamEvent;
import com.day.cq.wcm.api.PageEvent;
import com.day.cq.wcm.api.PageModification;

@Component
@Service
@Property(name = "event.topics", value = {DamEvent.EVENT_TOPIC, PageEvent.EVENT_TOPIC})
public class PageActivityReport implements EventHandler {

    /* Modification types available on PageModification.ModificationType:
       CREATED, DELETED, MODIFIED, MOVED, VERSION_CREATED, RESTORED */

    @Override
    public void handleEvent(Event event) {
        PageEvent pageEvent = PageEvent.fromEvent(event);
        if (pageEvent != null) {
            Iterator<PageModification> modifications = pageEvent.getModifications();
            while (modifications.hasNext()) {
                PageModification modification = modifications.next();
                PageModification.ModificationType type = modification.getType();
                if (PageModification.ModificationType.CREATED.equals(type)) {
                    // Log it or write code to save created pages.
                } else if (PageModification.ModificationType.DELETED.equals(type)) {
                    // Log it or save deleted pages. A notification or alert can be triggered from here.
                } else if (PageModification.ModificationType.MODIFIED.equals(type)) {
                    // Log it or write code to save modified pages.
                }
            }
        }
    }
}

In the code above, we have separate event-specific blocks where custom reporting code can be written. One way to record these activities is to create a simple JCR node for each activity. The data model for reporting could be as follows.

Path of the node could be: /page-report/<today's date in yyyy/mm/dd>/<current time in hours>/<page-path-with-slashes-replaced-by-hyphens>/<author>

  • path: /content/abc/en/example.html
  • pageTitle: <title of the page>
  • event: delete/modify/move
  • activityBy: the user who performed the activity
  • timestamp: <the format should be consistent so that it can be queried>
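As a sketch of the storage layout above, the activity node path can be derived from the timestamp, the page path, and the author. This is a minimal, standalone illustration of the path convention only; the class and method names are my own and not part of any AEM API:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ActivityPathBuilder {

    // Date/hour bucket: yyyy/MM/dd/HH, matching the path convention above.
    private static final DateTimeFormatter BUCKET =
            DateTimeFormatter.ofPattern("yyyy/MM/dd/HH");

    /**
     * Builds /page-report/<yyyy/MM/dd/HH>/<page-path-with-hyphens>/<author>
     * following the data model described above.
     */
    public static String buildPath(LocalDateTime when, String pagePath, String author) {
        // Drop the leading slash, then flatten the remaining slashes to hyphens.
        String flattened = pagePath.replaceAll("^/", "").replace('/', '-');
        return "/page-report/" + BUCKET.format(when) + "/" + flattened + "/" + author;
    }

    public static void main(String[] args) {
        System.out.println(buildPath(LocalDateTime.now(), "/content/abc/en/example", "jdoe"));
    }
}
```

In the real event handler, each branch would create a node at this path via the JCR API and set the properties from the data model above.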

Final Thoughts

The solution above can be implemented or scaled for other types of reporting, for example tracking asset activity. One challenge in scaling it is keeping activity records in JCR nodes while still fetching them quickly. The data model above also needs more thought based on what the queries will look like when generating the final reports.

AEM Solution: How to Clear dispatcher cache by myself?

In any web application, caching has significant value for overall performance. At the same time, developers want to test their changes and need to clear the cache frequently. AEM content caching happens in two places: the web server (e.g. Apache) and the CDN (e.g. Akamai). AEM provides a dispatcher module within the web server to handle cache requests coming from the AEM author environment.

Basically, whenever a content author activates a content path from the AEM author environment, an HTTP request goes to the AEM publish server, which triggers another HTTP push request to the dispatcher module (via the web server). This dispatcher push request purges the cache for the requested path by changing the timestamp of the stat file. A full explanation of how the dispatcher works isn't required here, so we can skip that part.

Clearing the AEM cache depends entirely on content paths and page types. For example, to clear the cache of an AEM page or image, you can simply re-publish it from the AEM author and the cache gets refreshed, provided the dispatcher flush is configured on the publish server.

Problems in cache clearing

Clearing the cache of AEM pages and assets may seem easy, but it is problematic in the following cases:

  • Clearing the cache of a minified JavaScript file, where the cached file path and the client library path do not match at all.
  • Clearing the cache of a content request served by a servlet path that does not exist in the real content hierarchy, e.g. /bin/myapp/servlet/abc.html.
  • Clearing the cache of a vanity URL.
  • Clearing the cache of a URL whose path differs from the content path and is resolved via AEM mappings. For instance, the live URL is /myapp/abc/xyz.html but the content hierarchy is /content/myapp/en/1/abc/xyz.html.

What are the traditional solutions?

  • Ask someone with web server access to log in and clear the cache manually. But here is the catch: how many times can you ask while you are iterating on your JavaScript code?
  • Run a curl command to clear the cache. For this you need to know the web server's dispatcher IP or domain, and if there are multiple web servers you have to clear the cache one server at a time.
  • Run a Jenkins job that clears the entire cache, which can be problematic on stage or production.
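For reference, the curl variant from the second bullet usually looks like the sketch below. The host name is a placeholder, and the invalidation endpoint assumes a standard dispatcher setup with cache invalidation enabled:

```shell
# Build the dispatcher invalidation URL for a given host (standard endpoint).
invalidate_url() {
    printf 'http://%s/dispatcher/invalidate.cache' "$1"
}

# Flush the cache for one content path on one dispatcher host.
flush_path() {  # usage: flush_path <dispatcher-host> <content-path>
    curl -s -X POST "$(invalidate_url "$1")" \
         -H "CQ-Action: Activate" \
         -H "CQ-Handle: $2" \
         -H "Content-Length: 0"
}

# Example (placeholder host):
# flush_path disp1.example.com /content/myapp/en/home
```

With multiple web servers you would have to loop flush_path over every host, which is exactly the tedium described above.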

Easiest Solution

None of the problems above is that bad, and each has a solution. However, as a developer I would like a quick and easy way to clear the cache by myself. The AEM dispatcher module purges the cache based on path, and you can use that behavior to clear the cache of any file, path, or asset. Follow the steps below to clear the cache without anyone's help.

Let's take the example of purging the cache of your minified JavaScript file. The path of the file is /etc/designs/myapp/core.mini.js.

  • Create a dummy file with the same name & path in the author.
  • Activate the file.
  • The dispatcher will update the cached file and start serving your dummy file as the new content.
  • De-activate the same dummy file right away. This is required because your dummy file does not have the correct content or code, so make sure you de-activate it.
  • Once the file is de-activated, client libraries and AEM path resolution will work as normal.
  • You can keep the dummy file in the author for future use, or delete it.

The solution above works with any other path or file, be it JSON, XML, HTML, etc. The only condition is that the path you want to clear from the cache has to be created in the author first.

Final Thoughts

These solutions are tested, but I can't claim they solve every problem. You can post your queries in the comment section; I will look into them and get back to you.

AEM Security: How to secure the AEM application?

Overview

There is a set of security practices followed by every development team working with Adobe Experience Manager (AEM). Most of them are fairly straightforward and suggested by Adobe as best practices; however, there are many other security issues of equal importance.

So let's look at how to secure your application by putting the right rules in place in your AEM environment.

All other recommendations from the Open Web Application Security Project (OWASP) should be applied as well. The recommendations below are specific to AEM technology and AEM infrastructure.

There are many problems that remain unknown to AEM solution providers and put the whole setup at risk. Here is one example to showcase the kind of security problems found in AEM.

Use the Google query below to find out whether your author instance is indexed by Google. It is a very basic query; try it, and you will be surprised how many author instances are open to exploits. You might wonder how anyone could log in to those authors. That is fairly easy once you know who has authored the pages.

Google Query: inurl:aemauthor

AEM Author Security

First and foremost, make sure your AEM author instance is not searchable by search engines and is not accessible outside the intranet without a VPN. Follow the author security guidelines below:

  • Keep a robots.txt for all your domains, including the authoring environment, and make sure Google does not index the author domain.
  • Enable HTTPS on the AEM author.
  • Change the admin password on every AEM instance (i.e. server).
  • Create groups for assigning access and follow the least-privilege principle. Instead of denying access across many hierarchies, simply allow what each group needs.
  • Create a separate replication user for the replication agent configuration. The admin user should not be used for replication anywhere.
  • Limit the number of users in admin groups.
  • WebDAV, CRX Explorer & CRXDE on the production author should be disabled or limited to certain users.
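For the first bullet, a deny-all robots.txt served from the author domain is enough to tell well-behaved crawlers to stay out. It does not replace network-level protection, only keeps the domain out of search indexes:

```text
# robots.txt on the author domain: disallow all crawling
User-agent: *
Disallow: /
```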

AEM Publish Security

Just like the AEM author, publish instances should not be accessible from outside the intranet, and connections to web servers, the author, etc. should be internal. The most important part of publish security is handling request inputs and using proper request sessions. Serving requests with an admin session or a privileged user is a big problem.

If you have to read data that the anonymous user does not have permission for, avoid using an admin session; instead, have a dedicated user to read/write that content for those requests. Follow these other guidelines for AEM publish security:

  • Anonymous permissions should be checked, making sure that not every directory is accessible to the anonymous user. Even under /etc/designs, there should be a proper permission setup for cloud services and the like.
  • The Apache Sling Referrer Filter must be configured to reject unwanted publish requests.
  • The cross-site request forgery (CSRF) framework should be enabled to filter requests.
  • All default tools (CRX Explorer, CRXDE, WebDAV, etc.) should be disabled.
  • No one should be able to access the publish server directly, or to install packages on it directly.
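For the Sling Referrer Filter bullet, an OSGi configuration along these lines restricts state-changing requests to known referrers. This is a sketch in the Felix .config format; the host name is a placeholder for your own domains:

```text
# org.apache.sling.security.impl.ReferrerFilter.config
allow.empty=B"false"
allow.hosts=["www.example.com"]
filter.methods=["POST","PUT","DELETE","COPY","MOVE"]
```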

Dispatcher security

When anyone thinks of AEM security, most of us only think of the rules and filters in the dispatcher.any configuration file. But there are many more cases where things get ugly if you have not taken care of security:

  • Do not configure a dispatcher flush agent on the AEM author. If it is enabled, use an HTTPS call for flushing the cache; otherwise the author flush agent exposes your web server IP and credentials.
  • Limit the request header information. Request headers are passed to AEM publish on every request, based on the dispatcher configuration.
  • Do not allow cross-origin requests; set the same-origin headers at the web server level.
  • Proper input validation should be done on POST requests, and the dispatcher filter should only allow specific POST paths.
  • Cacheable selectors and URL extensions should be defined explicitly; not every selector or extension should be cacheable. Otherwise, DoS or DDoS attacks are very easy to mount against an AEM application.
  • Website URLs should not expose internal directories.
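As a baseline for the filter rules mentioned above, a deny-by-default /filter section in dispatcher.any might look like the sketch below. The allowed paths are placeholders for your own content roots, and a real configuration needs more allow rules (client libraries, assets, etc.):

```text
/filter {
    # Deny everything by default
    /0001 { /type "deny"  /glob "*" }
    # Allow only GET requests for public content and site client libraries
    /0002 { /type "allow" /method "GET" /url "/content/myapp/*" }
    /0003 { /type "allow" /method "GET" /url "/etc/designs/myapp/*" }
}
```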

Final thought

We have to secure the infrastructure of every important environment. Once you have secured the author and publish instances and have a proper dispatcher configuration, you stand a much better chance of protecting your application. Application security is another aspect; follow Adobe's security recommendations for that.

Developer Opinion: A few basic problems in Sightly template

AEM Sightly has been a big buzz in AEM technology, and thousands of articles have been written on it. Interestingly, most of them only cover syntax and how to use Sightly code, which is great for developers who just want to reuse snippets.

In this post, I would like to raise some important questions that I have not found addressed in any article or post. If anyone has an answer or explanation, I would be thankful if they let me know in the comments. I will keep recording other Sightly problems, but for now here are some of them:

Why is the else condition not supported?

As far as I know, most languages support an else condition, whether scripting, templating, or object-oriented, except the AEM Sightly template language. We don't know what super logic or architectural drawback prevents supporting else. One argument is that Adobe does not want conditional logic inside HTML tags. Fair enough, but what problem could else itself cause? I understand that nobody wants business logic in HTML, but that is a best practice Adobe can simply recommend.

Let's walk through a scenario for clarity, and to see why else is equally important.

Say that, based on an authoring checkbox, you would like to serve different HTML content (i.e. two types of DOM structure). Without else, the developer has to write one if with the boolean condition and another if with the negated condition.

<sly data-sly-test="${IsFlagChecked}"><!-- show checked content --></sly>

<sly data-sly-test="${!IsFlagChecked}"><!-- show non checked content --></sly>

The workaround with the NOT operator is fine, and by this logic someone might think an else statement isn't required at all. My argument is that a workaround should not be the objective when you design a new template language; there must be some solid use case for not supporting it.
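One small refinement to the workaround: HTL lets you store the test result in a variable (data-sly-test.varName) so the condition is evaluated only once and the negation stays readable. The property name below is hypothetical:

```html
<sly data-sly-test.isChecked="${properties.flagChecked}">show checked content</sly>
<sly data-sly-test="${!isChecked}">show non-checked content</sly>
```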

Why are no arguments allowed in getter methods?

Consider a scenario where you want different output from a Sling Model at different places in the HTML, where the output varies based on parameters passed to a method. Can you pass parameters to a getter method of the USE class? The answer is no.

Passing parameters from Sightly is only allowed at USE class initialization time. That means you must know all the inputs to your various methods before any getter is called, and Sightly forces you to initialize global variables when the USE class is initialized. In general, it is not a good idea to keep all business logic in global state. It seems to me this works for AEM only because the objective is to generate HTML and cache it, and nothing else.
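This is the init-time parameter passing being described: options can be handed to the USE class only when it is created, as in the sketch below. The class name and the size option are made up for illustration:

```html
<div data-sly-use.model="${'com.example.MyUseClass' @ size='large'}">
    ${model.output}
</div>
```

Inside the USE class, the option is read once during initialization (e.g. via get("size", String.class) in a WCMUsePojo), after which the getter can only rely on that stored state.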

Why is method overloading not supported?

When you have a common abstract USE class with a bunch of predefined methods and you wish to overload one of them with different parameters, that is simply not possible: you can't call your overloaded method. Your USE class must have its own method plus a supporting getter for the global variables. You end up writing extra code that may not be required at all.

Again, I won't deny that this can be solved with workarounds. However, there should be some explanation for departing from concepts we have practiced and grown up learning across many common design principles.

Final Thoughts

I believe there may be good reasons for the way it is now, but logically it doesn't seem right to me. My intention is to get those answers through this article, hoping someone might know them.

AEM Solution: The easiest way to copy content from one AEM to another.

Moving content in AEM is a regular, sizable task; in my personal opinion, it is a big task for everybody. Let me explain in detail. Consider a scenario where you want to move content from one AEM environment to another. The easy thing to do is to use the AEM Package Manager: build a package on one AEM instance, download it, and install it somewhere else. Sounds like an easy process? It is not. From a business perspective, the Package Manager tool falls badly short, for the following reasons:

Lack of basic features in Package Manager: Many basic features are missing. Some of them:

  • There is no way to schedule activation of a content package as a whole; if 100 pages need to be scheduled, each individual page must be scheduled for replication.
  • There is no way to upload individual pages from one environment to another when those pages are parents in the content hierarchy; all the content under them has to be overridden.
  • It is not easy to revert specific content installed by the Package Manager; either the whole package content is reverted or nothing is.

Not easy for a non-technical person to use: The authoring team must have working knowledge of the Package Manager tool. You might ask, "working knowledge?" My answer is yes: someone needs to know how to upload, build, install, download, and uninstall packages, and needs access to packages that someone else could misuse.

Time-consuming, and it does not work in many cases: Downloading from one environment and uploading to another is old-fashioned and time-consuming. For heavy content, in the gigabyte range, it does not even work.

So here is the list of possible solutions:

  • TWC Grabbit is one of them. It was developed by one of our team members, though I am not sure it works on all AEM versions. It has many dependencies and needs to be installed and managed on both source and destination, but it is quite a good one.
  • The AEM Package Manager, out of the box.
  • Copy the whole source crx-quickstart folder and override the destination. This is not feasible when content has to move from stage to production, nor when you only want to move a few pages or images. It is not a bad option for dev and QA, but it comes with a lot of maintenance once the content is overridden.

The easiest way to move content regularly

All of the solutions above require some level of additional maintenance, but there is another, much easier one. You need just two things: a servlet in your source code, and a replication agent in the source AEM instance configured to point at the destination. Follow the steps below.

Pros of this solution:

  • First, it is pretty easy, and you can replicate any JCR path: a content package, a page with child pages, an image or a set of images. If you replicate a content package, there is no need to install it in the destination environment. It is especially helpful when you just need some pages in QA or dev from stage, not the whole content set.
  • No dependencies, no installation; just one servlet and a replication agent, using out-of-the-box APIs.
  • Pretty extensible: you can build a fancy UI on top of it and make a full tool out of it.
  • Cross-environment replication, dedicated to content movement. Any environment can be a source or a destination, and having a separate replication agent just for copying content does not cause replication queue issues.
  • The one con is that it still uses the replication API rather than any fancy third-party solution.

NOTE: I have built a tool that solves all the issues a content package has, but I am not yet sure whether I can simply share the source code here. Let me know if you need help or ideas to understand the full solution.

Agent configuration in the AEM source instance: the source is the AEM instance (author or publish) from which you fetch content, and the destination is where you want to upload it.

Replication agent authoring: nothing different from other replication agents except the Triggers configuration. Configure it as shown in the snapshot.

Hit this URL from a browser after your servlet and agent are in place: http://localhost:4502/bin/support/content/publisher?path=/etc/packages/abc.zip&destEnvName=QA&publishChildNodes=true. The publishChildNodes parameter is required when you also want to publish child nodes.

Replication Request Handler

import com.day.cq.replication.*;
import org.apache.commons.lang3.StringUtils;
import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Reference;
import org.apache.felix.scr.annotations.sling.SlingServlet;
import org.apache.http.HttpStatus;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.servlets.SlingAllMethodsServlet;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.jcr.Session;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Sample URL: http://localhost:4502/bin/support/content/publisher?path=/etc/packages/abc.zip&destEnvName=QA&publishChildNodes=true
 */
@SlingServlet (paths = "/bin/support/content/publisher",
 methods = "GET", metatype = true, label = "Content publisher to publish content across environments")
public class PackagePublisher extends SlingAllMethodsServlet {
    private static final Logger LOGGER = LoggerFactory.getLogger(PackagePublisher.class);

    @Reference
    private Replicator replicator;
    private List<String> activatedPathsList;
    @Override
    public final void doGet(final SlingHttpServletRequest request, final SlingHttpServletResponse response) throws IOException {
        String requestPath = request.getParameter("path");
        String publishChildNodes = request.getParameter("publishChildNodes");
        final String destEnvName = request.getParameter("destEnvName");
        if (StringUtils.isNotBlank(requestPath) && StringUtils.isNotBlank(destEnvName)) {
            activatedPathsList = new ArrayList<String>();
            Session userSession = request.getResourceResolver().adaptTo(Session.class);
            ReplicationOptions replicationOptions = new ReplicationOptions();
            // Only replicate through agents whose id matches the requested destination environment.
            AgentFilter agentFilter = new AgentFilter() {
                public boolean isIncluded(Agent agent) {
                    return agent.getId().toLowerCase().contains(destEnvName.toLowerCase());
                }
            };
            replicationOptions.setFilter(agentFilter);
            LOGGER.info("replication starting ");
            try {
                replicator.replicate(userSession, ReplicationActionType.ACTIVATE, requestPath, replicationOptions);
                Resource childResource = request.getResourceResolver().getResource(requestPath);
                if ("true".equalsIgnoreCase(publishChildNodes)) {
                       publishChildPages(childResource, userSession, replicationOptions);
                }
                for (String path : activatedPathsList) {
                    LOGGER.info("Activated path: {}", path);
                }
                response.setStatus(HttpStatus.SC_OK);
                response.getWriter().print("The given path was replicated. Check the destination environment.");
            } catch (ReplicationException e) {
                response.setStatus(HttpStatus.SC_BAD_REQUEST);
                response.getWriter().print("Check parameters, and check the author replication agents for " + destEnvName);
                LOGGER.error("Replication failed for {}", requestPath, e);
            } catch (Exception ex) {
                response.setStatus(HttpStatus.SC_BAD_REQUEST);
                response.getWriter().print("Something went wrong!");
                LOGGER.error("Unexpected error", ex);
            }
        } else {
            response.setStatus(HttpStatus.SC_BAD_REQUEST);
            response.getWriter().print("Required parameters were not passed.");
        }
    }

    private void publishChildPages(Resource childResource, Session userSession,
                                   ReplicationOptions replicationOptions) throws ReplicationException {
        if (childResource != null) {
            Iterator<Resource> itr = childResource.listChildren();
            while (itr.hasNext()) {
                Resource temp = itr.next();
                // Skip ACL and jcr:content nodes; replicate everything else recursively.
                if (!temp.getPath().contains("rep:policy") && !temp.getPath().contains("jcr:content")) {
                    if (temp.hasChildren()) {
                        publishChildPages(temp, userSession, replicationOptions);
                    }
                    activatedPathsList.add(temp.getPath());
                    replicator.replicate(userSession, ReplicationActionType.ACTIVATE, temp.getPath(), replicationOptions);
                }
            }
        }
    }
}

Final Thought

I found this very easy in day-to-day work when you want to move content here and there. If there is any confusion or question, leave a comment and I will respond as soon as possible. Thanks.

You can further extend this utility with an automated script to package and transport content from the source instance to the destination.