Clean Browser Automation

I try not to pollute my machine with libraries and apps due to my experiments, so I cannot do without Homebrew and Docker. Here’s how I quickly set up an isolated environment for some dirty “browser automation” I needed 😉

https://github.com/SeleniumHQ/docker-selenium

Good quick start guide. As there is no selenium for M1/arm yet, it errors. Adding the platform (and an optional name) works wonders.

projects % docker run -d -p 4444:4444 -p 7900:7900 --shm-size="2g" selenium/standalone-firefox:4.1.1-20211217 
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

projects % docker run -d -p 4444:4444 -p 7900:7900 --shm-size="2g" --platform linux/amd64 --name selfox selenium/standalone-firefox:4.1.1-20211217
ce4ee1ba7b31c59b7c9964abd1c219b87a4ab49098fb4436dd0a1ba797a6896b

As mentioned in the quick start, browse to http://localhost:4444/ to check your sessions and http://localhost:7900/ for VNC to look at the browser.

For the code, here’s Kotlin-in-a-script:

import org.openqa.selenium.*
import org.openqa.selenium.firefox.FirefoxOptions
import org.openqa.selenium.remote.RemoteWebDriver
import java.net.URL

fun WebDriver.f(selector: String): WebElement = this.findElement(By.cssSelector(selector))

fun main() {
    val driver = RemoteWebDriver(URL("http://localhost:4444"), FirefoxOptions())
    try {
        driver.get("https://www.google.com")
        val text = driver.f("input[name=btnI]").getAttribute("value")
        println(text)
    } finally {
        driver.quit()
    }
}

Now I can destroy, recreate and reuse it for other projects easily.

ng-admin + JAX-RS: 400 Bad Request on DELETE

I’m tried of building admin UIs and I’m trying out ng-admin. It’s pretty straightforward to setup given the guides and demos.

List, create, updates were fine until I got to the DELETE method. The server was throwing 400 Bad Requests and upon Chrome network inspection I discover that ng-admin was sending a JSON body in the request. I don’t really care who is “following the standard” as long they work together (think browsers and jquery), so I’m fine to fix either side to either the client not send the body, or the server accepting the non-empty body.

ng-admin uses Restangular under the hood to make REST requests. Restangular did have this FAQ about DELETEs with body(s).

A little refactoring and presto! DELETE now works.


app.config(['RestangularProvider', function(RestangularProvider) {
  RestangularProvider.setRequestInterceptor(function(elem, operation) {
    return (operation === "remove") ? undefined : elem;
  });
}]);

Wrong encoding when JSP includes HTML

We had a standard JSP header with logo and menu, and we used it to include a few static HTML pages (e.g. privacy policy and about us). We had multiple versions of the static content in different languages, and the non-English ones shows up in gibberish, although the menu shows up fine.

We already specified the encoding in our enclosing JSP page.

<%@ page pageEncoding="UTF-8"%>

as well as specified the charset in the meta header

A common “workaround” is to convert the html to JSP to specify the encoding. However being lazy developers, we want a better solution. Eventually we found the hint to set the encoding in web.xml. We tweaked the extension from .jsp to .html and it works!



  
    *.html
    UTF-8
  

double HTTP requests

We first discovered that our webapp was receiving duplicate HTTP AJAX requests from clients that results in a database insert. Fortunately jQuery had a nocache timestamp as part of the AJAX request so we could recognize it as a duplicate and reject the 2nd request.

As we tried to narrow down the cause we found that it happens on both GET/POST, as well as on a variety of browsers and network providers. After days of trying to reproducing the behavior we resorted to using iMacro to do browser automation and finally manage to reproduce it occasionally. What was surprising was that we were receiving the response of the 2nd response, which was the rejection message! (while the database insert was successful) It was absolutely confusing and we set up the automation again with Wireshark.

The packet analysis confirmed that the client browser was sending the duplicate request. However we also scanned Firebug and confirmed that only a single AJAX call was made! It didn’t make any sense to any of us until I noticed a pattern in the Wireshark logs that the 1st request made was always terminated without a response (either a TCP Connection Reset or a proper TCP Connection Close initiated by the server), and the 2nd request would be made on another kept-alive TCP connection. Following that symptom with Google I chanced upon a blog that highlighted HTTP 1.1 RFC 8.2.4.

If an HTTP/1.1 client sends a request which includes a request body, but which does not include an Expect request-header field with the “100-continue” expectation, and if the client is not directly connected to an HTTP/1.1 origin server, and if the client sees the connection close before receiving any status from the server, the client SHOULD retry the request.

Could this be it? We mocked up a HTTP server using raw Java ServerSockets by intentionally closing the first connection and allowing the second. We logged the requests on the server-side and voila! The mock server showed that it received two HTTP requests, Firebug showed that it sent one only, and Wireshark showed that the browser sent two.

Essentially, we just proved that the browsers adhered to HTTP 1.1 RFC 8.2.4…….

Trimpath unterminated string literal in IE

We’re using an ancient library called TrimPath to render templated data with JavaScript, and a peer hit the “unterminated string literal” in eval() on IE only. I assisted to solve it, by extracting the offending string and isolated the cause to a ‘\n’ within the string. It was easy to cut away the additional newlines and IE is happy. (This is on the assumption that the newline are decorative and not necessary as part of the output)

However, I wasn’t satisfied what caused the newline to be parsed wrongly, and why IE only. The good thing with Open Source is we get to dig in ourselves to troubleshoot issues.

After stepping through the cause is identified: multiple continuous \n not parsed correctly. Apparently someone hit this at least 5 years ago and there was a fix for it, by detecting the additional \n and eating it up. But it still fails if there are more \n.

Why IE only? We were using script tags as the template, and in IE script.innerHTML is prefixed with an additional \n.

The following snippet will fail in IE, but pass in FF.



    
        
    
    
        
        

If another newline is added (with no additional whitepsace), FF will fail with the same error.
If <textarea> instead of <script> is used, it will pass, but adding more newlines will also cause it to fail.

As an anecdote, I’m not sure how our project got to use 1.1.2, when the last on the download site is 1.0.38 (legacy stuff…). But a search on the web shows it’s not just us

TimeUnit

How many times have you got a millisecond long timestamp and had to compare it to a duration e.g. 1000 * 60 * 60 * 24 * … ? I stumbled upon this more readable and less error prone way of representing, say, 5 hours, by using the java.util.concurrent.TimeUnit.


private static final long FIVE_HOURS = TimeUnit.MILLISECONDS.convert(5, TimeUnit.HOURS);
...
if (timestamp < System.currentTimeMillis() - FIVE_HOURS) {
    // do something.
}

The above will check if a timestamp is overdue for more than five hours. The variable twoDays will have the value 172800000. TimeUnit supports from nanoseconds, micro-, milli-, all the way up to days.

However, if the date/locale is important to you (e.g. daylight savings) then you should use the Calendar API/JodaTime rather than this "duration"-based TimeUnit.

maven compile jrxml to jasper

Unless your jasper reports change at runtime, .jrxml templates should be compiled at compile time into .jasper, and you will not need JDT at runtime, nor need to re-compile the reports each time it is run.

If you’re on maven, simply paste the usage guide into your pom and change your JasperCompileManager.compileReport(InputStream) into JRLoader.loadObject(InputStream).

When they say HashMap is not thread-safe, it means it is not.

We had this deadlock-like behavior where the UI froze sometimes when performing a particular action. But when run with our profiler, it did not report a deadlock when it froze.

After hours of tracing, the cause was identified as an infinite loop on the HashMap.get() method, running in the AWT Event Dispatch thread. The HashMap is the constraints object in a javax.swing.SpringLayout. Why would that happen???

More tracing revealed that there were multiple threads trying to add components to this container, triggering SpringLayout to add new constraints to its map. With reasonable luck the map hits its threshold and tries to resize itself. If you understand hashing you know that this is when it picks a new table size and re-hashes all the objects into its new buckets. When two of this happen at the same time the linked list screws up, and with more luck a later element in the link list points to the next element which is earlier in the same list. The worse part is this is a hidden problem, it doesn’t “surface” until you try to get() an object that’s even later than the later element, that’s when it iterates the bucket past the later element, back to the earlier element, and repeats.

The typical solution to swap the HashMap for a Hashtable is not available here, because the map exists in the core JDK. So it’s a good time to re-iterate to the team the significance of “single threaded model” of AWT/Swing. To be explicit, this issue was resolved by queuing the adding of components to the container on the Event Dispatch thread. Of course a more direct approach would be to synchronize the points where the components were added, but you shouldn’t be doing such things outside the AWT Event Dispatch thread anyway.

When I googled about the HashMap.get() infinite loop I realize it occurs in other areas as well, e.g. a Servlet’s session may be represented as a HashMap. Concurrent put()s and then a get() can cause the same problem.

abc.getXyz()

Why is it not good to keep using “abc.getXyz()” in the same method?

1. This is especially important if you are expecting the same value to be returned for each call. A 2nd call can potentially give you a different result from the first, therefore caching it in a local variable is a good idea.

2. The call may be expensive. Each call may be going through a web service to query a database for the return value. If you save it in a local variable it will only be done once. Even if you “know” that it doesn’t now, you should expect polymorphism.