NodeJS: flush fs.createWriteStream()

Initial problem: Simple. Given a folder of .JSON files, extract attributes and write them to another file. Instead of relying on my trusty Groovy, I took this opportunity to implement it in NodeJS.

First attempt was straightforward: read the folder; for each file, parse the JSON, open a new file, and write it out.

var fs = require('fs');
var path = require('path');
var util = require('util');

var folder = '/temp/json/';
for (var file of fs.readdirSync(folder)) {
  var json = JSON.parse(fs.readFileSync(path.join(folder, file)));
  var out = fs.createWriteStream(path.join(folder, file.slice(0, -5) + '.csv'));
  for (var item of json.item) {
    out.write(util.format('%s,%s\n', item.id, item.title)); // item.id is a placeholder attribute
  }
}

Note: Exception handling, file type checking, etc. were removed to keep the examples concise and focused on the relevant aspects.

Tested this on a folder with 1 file first. Good, output was correct. Tested on 10 files: same correct output. Now for the first batch of 1000.

It took some time to run, but only 0-byte output files were created, and the rate of new file creation slowed down over time. More tests with fewer files showed that all output was written only after the program ended. Aha! Buffered writes.

That’s still fine, since I get the correct results at the end of the batch. But I get this error before I reach the end, which discards all my buffered writes…

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory

Not ready to give up (nor just repeat runs with smaller batches), I turned to Google.

This guy had the same problem: nothing gets written before the program ends.

Event-driven model… awkward for this case, but I refactored the script to defer each file to process.nextTick().

var folder = '/temp/json/';
for (var file of fs.readdirSync(folder)) {
  process.nextTick(function(file) {
    var json = JSON.parse(fs.readFileSync(path.join(folder, file)));
    var out = fs.createWriteStream(path.join(folder, file.slice(0, -5) + '.csv'));
    for (var item of json.item) {
      out.write(util.format('%s,%s\n', item.id, item.title));
    }
  }.bind(null, file)); // bind the loop variable so each callback sees its own file
}

Nope, didn’t help. Is it because all calls were scheduled on the same “next tick”?
Let’s push each file to the subsequent tick.

var folder = '/temp/json/';
var files = fs.readdirSync(folder);

function json2csv(index) {
  if (index >= files.length) return;
  var file = files[index];

  var json = JSON.parse(fs.readFileSync(path.join(folder, file)));
  var out = fs.createWriteStream(path.join(folder, file.slice(0, -5) + '.csv'));
  for (var item of json.item) {
    out.write(util.format('%s,%s\n', item.id, item.title));
  }
  process.nextTick(json2csv.bind(null, index+1));
}

process.nextTick(json2csv.bind(null, 0));

Still no. Time to try the 2nd suggestion. out.write() did indeed return false after some writes, signalling that the buffer was full.

var folder = '/temp/json/';

function json2csv(files, start) {
  for (var i = start; i < files.length; i++) {
    var json = JSON.parse(fs.readFileSync(path.join(folder, files[i])));
    var out = fs.createWriteStream(path.join(folder, files[i].slice(0, -5) + '.csv'));
    var ok = true;
    for (var item of json.item) {
      ok = out.write(util.format('%s,%s\n', item.id, item.title));
    }
    out.end();
    if (!ok) {
      // buffer is full: continue with the next file only after this stream drains
      out.once('drain', json2csv.bind(null, files, i + 1));
      return;
    }
  }
}

json2csv(fs.readdirSync(folder), 0);

And... it works! So much for starting with a 10-line script.

It may not be the best tool for the job (subjective), but sometimes it's more efficient to work with a tool you already know; a NodeJS developer without Groovy knowledge would probably find this easier to write in Node than in Groovy/Bash/Perl/Python.

Disclaimer: I decided to keep pushing writes even when out.write() returns false to simplify the implementation, because I know each input file is only around 1MB, which is safe to buffer. If the input size is unknown, writes within the same file may also need to be deferred until the stream drains (maybe by transforming the items into an input stream).
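
If the inputs were bigger, that per-item deferral could look something like this minimal sketch (same placeholder item fields as above): write until the buffer fills, then resume on 'drain'.

function writeItems(out, items, i, done) {
  while (i < items.length) {
    var line = util.format('%s,%s\n', items[i].id, items[i].title);
    i++;
    if (!out.write(line)) {
      // buffer full: resume from the next item once the stream drains
      out.once('drain', writeItems.bind(null, out, items, i, done));
      return;
    }
  }
  out.end();
  done();
}

Each file's inner loop would then become writeItems(out, json.item, 0, nextFile), where nextFile moves on to the next input file.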

HACK: Change issue creator in Bitnami Redmine

I own the administrator account of a Bitnami Redmine instance that I installed, but I usually work from a regular user account (the Unix rule of not working as root). Unfortunately I made the unforgivable mistake of creating a regular issue with the admin account. For correctness' sake, I searched for a way to modify the creator… (talk about non-repudiation…)

Nope: there is no built-in way; it requires a plugin. I don't intend to do this regularly, so I don't really need a plugin. I decided to mess with the database directly and see if the schema was easy to understand. Turns out it was quite straightforward.

Notes:
  1. The MySQL root password is the same as the Redmine admin password.
  2. I am using a Bitnami Redmine 3.1.0-0 instance; you may need "SHOW DATABASES;" to figure out which database is yours.
  3. In the process I used "SHOW TABLES;" and "DESC issues;" to probe the schema. I am only showing the final necessary commands to run.
  4. You can get the issue ID from the URL when the issue is displayed in your browser.
  5. You can mouse over the desired user in the browser to peek at the user ID to use as the author_id.
> ./mysql -u root -p
Enter password: 
mysql> USE bitnami_redmine;
Database changed

mysql> UPDATE issues SET author_id=3 WHERE id=59;
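
A quick query confirms the change (issue 59 and user 3 are just the values from my instance):

mysql> SELECT id, author_id FROM issues WHERE id=59;
+----+-----------+
| id | author_id |
+----+-----------+
| 59 |         3 |
+----+-----------+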

Refresh your browser.

R: Find rows that contain a vector

Imagine stations on several train lines. Given a station pair, find the lines that allow travel between these stations (no transfers!)

> # install.packages("qpcR")
> library(qpcR)
> stations <- cbind.na(EWL=c("Pasir Ris", "Tampines", "Simei", "Tanah Merah", "Bedok", "Kembangan", "Eunos", "Paya Lebar", "Aljunied", "Kallang", "Lavender", "Bugis", "City Hall", "Raffles Place", "Tanjong Pagar", "Outram Park", "Tiong Bahru", "Redhill", "Queenstown", "Commonwealth", "Buona Vista", "Dover", "Clementi", "Jurong East", "Chinese Garden", "Lakeside", "Boon Lay", "Pioneer", "Joo Koon"),
+                 NSL=c("Jurong East", "Bukit Batok", "Bukit Gombak", "Choa Chu Kang", "Yew Tee", "Kranji", "Marsiling", "Woodlands", "Admiralty", "Sembawang", "Canberra", "Yishun", "Khatib", "Yio Chu Kang", "Ang Mo Kio", "Bishan", "Braddell", "Toa Payoh", "Novena", "Newton", "Orchard", "Somerset", "Dhoby Ghaut", "City Hall", "Raffles Place", "Marina Bay", "Marina South Pier"), 
+                 NEL=c("HarbourFront", "Outram Park", "Chinatown", "Clarke Quay", "Dhoby Ghaut", "Little India", "Farrer Park", "Boon Keng", "Potong Pasir", "Woodleigh", "Serangoon", "Kovan", "Hougang", "Buangkok", "Sengkang", "Punggol"),
+                 CCL=c("Dhoby Ghaut", "Bras Basah", "Esplanade", "Promenade", "Nicoll Highway", "Stadium", "Mountbatten", "Dakota", "Paya Lebar", "MacPherson", "Tai Seng", "Bartley", "Serangoon", "Lorong Chuan", "Bishan", "Marymount", "Caldecott", "Bukit Brown", "Botanic Gardens", "Farrer Road", "Holland Village", "Buona Vista", "one-north", "Kent Ridge", "Haw Par Villa", "Pasir Panjang", "Labrador Park", "Telok Blangah", "HarbourFront"),
+                 DTL=c("Bukit Panjang", "Cashew", "Hillview", "Beauty World", "King Albert Park", "Sixth Avenue", "Tan Kah Kee", "Botanic Gardens", "Stevens", "Newton", "Little India", "Rochor", "Bugis", "Promenade", "Bayfront", "Downtown", "Telok Ayer", "Chinatown"))

> apply(stations, 2, function(route) { all(c("Dhoby Ghaut", "Bishan") %in% route) })
  EWL   NSL   NEL   CCL   DTL 
FALSE  TRUE FALSE  TRUE FALSE 

Both NSL and CCL can take you from Dhoby Ghaut to Bishan without a transfer.

Excel VLOOKUP in R via Rolling Join

Imagine a car park that charges a different cost for each hour of parking or part thereof. Assume also that there is no pattern, hence a mapping table of hour -> cost:

hr cost
0 0.30
1 0.60
2 0.80
3 1.20
4 1.30
5+ 1.60

Parking beyond 5 hours will max your charges at $1.60.

In Excel there is the VLOOKUP function; with Range_lookup=TRUE it finds the closest match less than or equal to the lookup value.

In R we can do a rolling join on a data.table. Without the roll, it works like Range_lookup=FALSE: it finds only exact matches.

> # install.packages("data.table")
> library(data.table)
> fees <- data.table(hr=c(0, 1, 2, 3, 4, 5), 
                   cost=c(0.3, 0.6, 0.8, 1.2, 1.3, 1.6))
> fees
   hr cost
1:  0  0.3
2:  1  0.6
3:  2  0.8
4:  3  1.2
5:  4  1.3
6:  5  1.6
> query <- data.table(parked=c(0.4, 1.5, 2, 2.14, 4.5, 10))
> setkey(fees, hr)
> fees[query]
      hr cost
1:  0.40   NA
2:  1.50   NA
3:  2.00  0.8
4:  2.14   NA
5:  4.50   NA
6: 10.00   NA
> fees[query, roll=TRUE]
      hr cost
1:  0.40  0.3
2:  1.50  0.6
3:  2.00  0.8
4:  2.14  0.8
5:  4.50  1.3
6: 10.00  1.6

PostgreSQL 9.4 on CentOS 6.6

As usual there are many guides out there for installing something on some OS, but with Linux I never found one guide that could bring me straight through (every environment and every version requires a different setup). So here are my very own steps for installing PostgreSQL 9.4 on CentOS 6.6 (also for my future self-reference).

Prerequisites: ensure DNS and HTTP(S) are working for yum, otherwise you may encounter "Host not found" and similar errors. (This is out of scope, as it may be nameserver or firewall settings.)

1. Configure yum repo

sudo rpm -Uvh http://yum.postgresql.org/9.4/redhat/rhel-6-x86_64/pgdg-centos94-9.4-1.noarch.rpm
sudo yum install postgresql94-server postgresql94 postgresql94-contrib

2. Initialize the database

sudo service postgresql-9.4 initdb
sudo service postgresql-9.4 start

3. Connect and create the database
After default installation, only the “postgres” user can access the database, but it has no password.
Create the database and grant a user access; you will use this user to manage the database from now on (don't use the "postgres" user).

sudo -u postgres psql postgres
    CREATE USER devuser WITH PASSWORD 'devpass';
    CREATE DATABASE devdb;
    GRANT ALL ON DATABASE devdb TO devuser;

4. Allow remote connections
pg_hba.conf allows any IP to connect ( and authenticate using md5. You can also restrict this to your webserver's IP only.
postgresql.conf will let the server listen on all attached IPs.

sudo vi /var/lib/pgsql/9.4/data/pg_hba.conf
	host    all     all     md5

sudo vi /var/lib/pgsql/9.4/data/postgresql.conf
	listen_addresses = '*'

sudo service postgresql-9.4 restart
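
From another machine, a quick test confirms remote access (the IP here is an example; use your server's):

psql -h -U devuser devdb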

5. Move the data to another disk
My main disk was a default 10GB, enough for the OS and programs but not for the database data. I have a spanking new 300GB disk attached, and I want to move the data onto the new disk.
There were several methods involving specifying the data directory, but I found it easier to just symlink it.

sudo service postgresql-9.4 stop
sudo mv /var/lib/pgsql/9.4/data /media/xvdb1/pgsql/9.4/
sudo ln -s /media/xvdb1/pgsql/9.4/data/ /var/lib/pgsql/9.4/data
sudo chown postgres:postgres /var/lib/pgsql/9.4/data
sudo service postgresql-9.4 start

6. Autostart
Finally, configure PostgreSQL to start itself on boot.

sudo chkconfig postgresql-9.4 on

Ready-to-use PostgreSQL.

ng-admin + JAX-RS: 400 Bad Request on DELETE

I'm tired of building admin UIs, so I'm trying out ng-admin. It's pretty straightforward to set up given the guides and demos.

List, create, and update were fine until I got to the DELETE method. The server was throwing 400 Bad Request, and upon inspecting the network tab in Chrome I discovered that ng-admin was sending a JSON body with the request. I don't really care who is "following the standard" as long as the two sides work together (think browsers and jQuery), so I was fine fixing either side: make the client not send the body, or make the server accept the non-empty body.

ng-admin uses Restangular under the hood to make REST requests, and Restangular has an FAQ entry about DELETE requests with bodies.

A little refactoring and presto! DELETE now works.

app.config(['RestangularProvider', function(RestangularProvider) {
  RestangularProvider.setRequestInterceptor(function(elem, operation) {
    // drop the request body for DELETEs; send everything else unchanged
    return (operation === "remove") ? undefined : elem;
  });
}]);
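
For completeness, the other direction (making the server tolerate a body on DELETE) could look roughly like this in JAX-RS; the resource names here are made up, and whether this alone clears the 400 depends on your JAX-RS implementation:

import javax.ws.rs.Consumes;
import javax.ws.rs.DELETE;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/items")
public class ItemResource {

  @DELETE
  @Path("/{id}")
  @Consumes(MediaType.APPLICATION_JSON)
  public Response delete(@PathParam("id") long id, String ignoredBody) {
    // consume (and ignore) whatever body the client sends
    return Response.noContent().build();
  }
}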

multiple definition of `R_running_as_main_program`

I inherited a C++ application that was using Rcpp to embed R. After R was upgraded to 3.2, the build failed miserably with the error:

file1.o:(.bss+0x0): multiple definition of `R_running_as_main_program'
file2.o:(.bss+0x0): first defined here

This was found to be caused by Rinterface.h defining the variable instead of merely declaring it: every translation unit that includes the header emits its own definition of R_running_as_main_program, and the linker refuses to merge them.
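
The effect is easy to reproduce with any header that defines, rather than declares, a global. A minimal sketch (hypothetical file names):

// shared.h: a definition, not a declaration
int some_global;

// a.cpp
#include "shared.h"
int main() { return some_global; }

// b.cpp
#include "shared.h"

Compiling and linking the two (g++ a.cpp b.cpp) fails with the same "multiple definition" error; declaring the variable extern in the header and defining it in exactly one source file resolves it.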

To overcome this, edit /usr/share/R/include/Rinterface.h and search for this line:

int R_running_as_main_program;

Add the “extern” keyword at the start and save it:

extern int R_running_as_main_program;

After that there should be no problem building the program.

Lenovo T440p keyboard

I was just issued a Lenovo ThinkPad T440p (not that I got to choose the model). Immediately I got down to customizing away its quirks (quirks to me, at least).

1. Fn/Ctrl key swap
Ctrl-C, Ctrl-V, Ctrl-Z, Ctrl-W: none of them work, because an Fn key sits in Ctrl's position. This link explains why (so you can find the Fn key for the ThinkLight in the dark), but I didn't really need the ThinkLight, so I swapped them anyway.

Go to BIOS > Keyboard/Mouse > Fn/Ctrl Swap > Enabled.

2. F1-F12 lock
F1-F12 can only be accessed with the Fn key, so F2 to rename becomes Fn-F2, and F5 to refresh becomes Fn-F5.

Press Fn-Esc to lock the Function keys.

3. Trackpad Scroll
I use a MBA at home, so the two-finger scroll direction felt reversed.

To keep myself sane, go to Control Panel > Mouse > Change Mouse Settings > ThinkPad > Advanced > Scroll > Two-Finger Scrolling > Check Switch Direction.

4. Lenovo Message Center Plus
The big red icon on the taskbar was an eyesore to me.

Right click on the taskbar > Toolbars > Uncheck Lenovo Solution Center.

Glassfish timezone different from OS (Ubuntu)

We added a new Ubuntu server to deploy one of our new features in isolation from the core modules. The new module did not work, and we narrowed the cause down to System.currentTimeMillis() returning a time in the future.

NTP was active, and the Linux "date" command showed the correct date. But Glassfish's JVM report showed a different user.timezone, and the offset coincided with the time difference we observed. The straightforward answer: set -Duser.timezone in the JVM options and we are good to go.

But wait, why didn't our original servers have this issue? We checked an original server, and there was no such JVM option set. After some tracing it turned out that /etc/timezone on the new server was incorrect, and that influenced the timezone Glassfish used. Finally we matched the two servers' settings and reverted the user.timezone JVM option.
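
For future reference, the relevant commands, assuming Ubuntu and Glassfish's asadmin on the PATH (the timezone value is only an example; substitute your own):

# what the JVM picks up on Ubuntu
cat /etc/timezone
date

# fix the OS timezone properly (the fix we kept)
sudo dpkg-reconfigure tzdata

# or the JVM-level workaround (which we reverted)
asadmin create-jvm-options "-Duser.timezone=Asia/Singapore"
asadmin restart-domain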

mysql ibdata keeps growing

I didn't manage to collect complete information for this post, so I'll just write down whatever I have.

It all began when our development database server ran out of disk space and crashed. We mounted a temporary virtual hard disk and moved the database there, and all was well for a while. With df and du we narrowed the culprit down to MySQL's ibdata file, which was growing so fast that we would run out of disk space again soon enough. Public information says ibdata is expected to only ever grow, but we did not expect our dev ibdata to grow at this rate. After multiple searches, this blog post finally closed in on it.

With SHOW ENGINE INNODB STATUS, we saw that the InnoDB history list length was a very large number (>1 mil) and kept going up. A check on our test and production databases showed that their length rises to a few hundred at most and drops back down: a significant difference. Restarting the database didn't help.

Querying information_schema.INNODB_TRX showed that we had 16 XA transactions that were started 2 days ago but never committed or rolled back. They were not locked, so their trx_mysql_thread_id was 0. We immediately linked the causes together: stuck XA transactions -> history list growing.
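
The check was a query along these lines (INNODB_TRX has more columns; these were the relevant ones):

mysql> SELECT trx_id, trx_state, trx_started, trx_mysql_thread_id
    -> FROM information_schema.INNODB_TRX;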

According to the MySQL docs, XA ROLLBACK can roll back these transactions. The user comment there was especially helpful on how to reconstruct the xid, reproduced verbatim here:

To rollback the transaction, first get its xid:

mysql> xa recover;

+----------+--------------+--------------+------------------------------------------------------------+
| formatID | gtrid_length | bqual_length | data                                                       |
+----------+--------------+--------------+------------------------------------------------------------+
|   131075 |           30 |           28 | 1-a00640d:c09d:4ac454ef:b284c0a00640d:c09d:4ac454ef:b284c2 |
+----------+--------------+--------------+------------------------------------------------------------+
1 row in set (2.13 sec)

The xid is present in this output, but you have to perform a little string manipulation to get it. The format of a xid is: gtrid,bqual,formatID. The column 'data' contains a concatenation of 'gtrid' and 'bqual'. The columns 'gtrid_length' and 'bqual_length' specify how many bytes each of these values uses; use them to split apart 'data'. In this example, the result is:

mysql> xa rollback '1-a00640d:c09d:4ac454ef:b284c0','a00640d:c09d:4ac454ef:b284c2',131075;

ERROR 1402 (XA100): XA_RBROLLBACK: Transaction branch was rolled back

The tricky part here was that my data contained binary characters, which I couldn't directly copy and paste in MySQL Workbench. I couldn't bear to write a program to read the value and write it back either, so I poked around for solutions. From the same MySQL doc page:

gtrid and bqual must be string literals, each up to 64 bytes (not characters) long. gtrid and bqual can be specified in several ways. You can use a quoted string ('ab'), hex string (0x6162, X'ab'), or bit value (b'nnnn').

Good, I could write the xid in hex. So I right-clicked the data column and chose "Open Value in Viewer". In the Binary tab I copied down the hex values and reconstructed the xid as described in the helpful comment.

XA ROLLBACK X'7e3ae860eb21de21b84d392cb03bf8363b41482b9b1207f6e6823355012e91858c',X'526801e7500b06fd05a2f5882d20be19982a46aed4b6c26dc63887',4264851;

Voila, one by one the transactions were gone. Once the last one was rolled back, the history list started to decrease and behave in a similar pattern to our other databases, and the ibdata file stopped growing at that crazy rate. The one part I still haven't figured out: how do I copy the hex values out of MySQL Workbench directly, or how do I make XA RECOVER show the data column in hex?