Importing Excel XLSX files in script

Started by Manj75, October 11, 2021, 12:44:14 PM

Previous topic - Next topic

Manj75

I was wondering if anyone has had any success in using SheetJS (https://sheetjs.com) library to create an Excel .xlsx import script?

I have previously created an Excel export script using the myExcel library package, but this has no ability to import in.  Searching this forum I found ony one topic where JB suggested to use SheetJS (Topic 1086: https://forum.archimatetool.com/index.php?topic=1086.msg5801#msg5801).

I have tried to use SheetJS but it's not a simple API for loading in the xlsx file from localhost.  All examples used are web based and use HTTP methods to fetch the file, which seems complex for my entry level ability ;-)

What I have tried is to use the java.io.FileReader to read the file byte at a time and then convert to a typed buffer (Uint8Buffer), but when parsing it into XLSX.read(data, {type:'array'})

it just reads it as a new workbook and does not parse it.  A work book is returned but the first sheet name is 'Sheet 1' when it should be 'Test Sheet 1', and there is no data read in when there is in the xlsx file.

I would welcome any help from anyone who has successfully used SheetJS in a jArchi script - let me know if you want me to post the code snippet.

Phil Beauvoir

If you can post the code that you tried so far, I can take a look at it.
If you value and use Archi please consider making a donation! https://www.archimatetool.com/donate

Jean-Baptiste Sarrodie

October 15, 2021, 07:10:27 AM #2 Last Edit: October 15, 2021, 07:29:17 AM by Jean-Baptiste Sarrodie
Hi,

Quote from: Manj75 on October 11, 2021, 12:44:14 PM
I was wondering if anyone has had any success in using SheetJS (https://sheetjs.com) library to create an Excel .xlsx import script?

I've tried several time (and tried again since you published your message) and using different libraries, but I've never been able to make it work. As suggested by Phil, please share your current attempts and maybe together we'll make it work.

On my side, I've tested the following libs:

For most if not all of them, I had to use GraalVM script engine. And for some of them which can only be loaded through 'require()', I had to use jvm-npm.

In addition, some rely on readfully for which you can use this polyfill (which works most of the time):

function readFully(url) {
    var result = "";
    var imports = new JavaImporter(java.net, java.lang, java.io, java.nio.charset);

    with (imports) {

        var urlObj = null;

        try {
            urlObj = new URL(url);
        } catch (e) {
            // If the URL cannot be built, assume it is a file path.
            urlObj = new URL(new File(url).toURI().toURL());
        }

return new String(urlObj.openStream().readAllBytes(), StandardCharsets.UTF_8);
    }

    return result;
}


But at the end, I was not able to read the content of an XLSX file.

Regards,

JB

Edit: just in case, I've found another simpler, maybe useful, polyfill for "require". It is not meant to be used outside a brower (relies on WMLHttpRequest) but could easily be adapted for Nashorn and GraalVM. You can find it here.
If you value and use Archi please consider making a donation! https://www.archimatetool.com/donate

Jean-Baptiste Sarrodie

Hi,

FWIW, I've discovered that GraalVM includes a native module loader (ie. an implementation of require()). We might enable it by default in future version of jArchi, but I've found a workaround to enable it in current version.

For it to work:

1. Create a "node_modules" folder in your "scripts" folder. You can even copy one from your Node.js if you have one.
2. Add this to the top of your script, it will bootstrap the setup:

if($.process.engine != "com.oracle.truffle.js.scriptengine.GraalJSScriptEngine") {
  console.log("This script only works with GraalVM script engine.")
  exit();
}

var System = Java.type("java.lang.System");
if(System.getProperties().get("polyglot.js.commonjs-require") != "true") {
  System.getProperties().put("polyglot.js.commonjs-require", "true");
  System.getProperties().put("polyglot.js.commonjs-require-cwd", __SCRIPTS_DIR__);
  console.log("GraalJS Module loading was previously disabled. Restart your script");
  exit();
}


To test this setup, simply create a dummy module under "node_modules" folder, for example this one named "add.js":
// add.js
function add (a, b) {
  return a + b
}

module.exports = add


And then, you can use it in your own script:
console.clear();

if($.process.engine != "com.oracle.truffle.js.scriptengine.GraalJSScriptEngine") {
  console.log("This script only works with GraalVM script engine.")
  exit();
}

var System = Java.type("java.lang.System");
if(System.getProperties().get("polyglot.js.commonjs-require") != "true") {
  System.getProperties().put("polyglot.js.commonjs-require", "true");
  System.getProperties().put("polyglot.js.commonjs-require-cwd", __SCRIPTS_DIR__);
  console.log("GraalJS Module loading was previously disabled. Restart your script");
  exit();
}

const add = require('add');

console.log(add(4, 5))


With this, it should become possible to use a local install of NPM to dowload required modules and then load them from within jArchi. This should help, but will it be enough, I don't know.

Regards,

JB
If you value and use Archi please consider making a donation! https://www.archimatetool.com/donate

yveszoundi

October 18, 2021, 20:17:55 PM #4 Last Edit: October 19, 2021, 20:42:27 PM by yveszoundi
It is possible to successfully import Excel spreadsheets via 2 approaches:
  • JavaScript libraries such as SheetJS. I was only successful with SheetJS.
  • Java reflection load a jar with some Java code leveraging Apache POI or similar library and process the spreadsheet (i.e. return an array of rows from Java to JavaScript).

A) How to with SheetJS
I had to google for xslx.full.min.js, as I was getting some weird errors: I could share the file here, I don't recall exactly where I downloaded it from. Loading those npm libraries occasionally requires repackaging with rollup or webpack from my experience, sometimes it just doesn't work at all!

load(__DIR__ + "lib/xlsx.full.min.js")

let Base64 = Java.type("java.util.Base64")
let Files = Java.type("java.nio.file.Files")
let Paths = Java.type("java.nio.file.Paths")
let filePath = "/home/me/spreadsheet.xlsx"

let fileBytes = Files.readAllBytes(Paths.get(filePath))
let buffer = Base64.getEncoder().encodeToString(fileBytes)
let workbook = XLSX.read(buffer, { type: 'base64'})

let worksheet = workbook.Sheets['Test'];
let rowData  =  XLSX.utils.sheet_to_row_object_array(worksheet)

for (let i in rowData) {
  let rowDataByColumnName = rowData[i]
  // do something with the Map rowDataByColumnName (first row contains the column names)
}

B) How to with a custom jar leveraging Apache POI or similar
This is your typical URLClassLoader code to load an external jar (i.e. readExcel.jar) and call a method via reflection (method.invoke...).

The other "un-described option" is to run an external process and parse a resulting file in a structured format (JSON, etc.).

Jean-Baptiste Sarrodie

Hi,

Quote from: yveszoundi on October 18, 2021, 20:17:55 PMIt is possible successfully import Excel spreadsheets via 2 approaches:

Thank you for sharing !

Using your code and testing different version of SheetJS I was able to make it work in both Nashorn (ES6 mode) and GraalVM.

The only version that seems to work with both script engines is the upcoming v0.17.3 in its 'core' variant ('full' works in GraalVM but makes Archi hang with Nashorn). The version I've used for my test is available here: https://raw.githubusercontent.com/SheetJS/sheetjs/09bd0755b51591f6438ebc03013d70fab1337609/dist/xlsx.core.min.js

Here's my test code (heavily based on yours):
console.clear();

// SheetJS doesn't work with Nashorn in ES5 mode
// Only SheetJS v0.17.3alpha works with Nashorn in ES6 mode
// SheetJS v0.16.8, v0.17.0 and v0.17.3 (still not published) are known to work with GraalVM
load(__DIR__ + 'SheetJS/v0.17.3-09bd075/xlsx.core.min.js');

let filePath = __DIR__ + 'Test.xlsx';
let workbook = XLSX.read(base64encode(readFileSync(filePath)), { type: 'base64'});

// Dump file content in JSON
workbook.SheetNames.forEach(function(sheetName) {
  var XL_row_object = XLSX.utils.sheet_to_json(workbook.Sheets[sheetName]);
  var json_object = JSON.stringify(XL_row_object);
  console.log(json_object);
})

//=================================================================
function readFileSync(path) {
  let Files = Java.type("java.nio.file.Files");
  let Paths = Java.type("java.nio.file.Paths");
  return Files.readAllBytes(Paths.get(path));
}

function base64encode(byteArray) {
  let Base64 = Java.type("java.util.Base64");
  return Base64.getEncoder().encodeToString(byteArray);
}

Regards,

JB
If you value and use Archi please consider making a donation! https://www.archimatetool.com/donate

yveszoundi

Thanks for taking the time to test against different engines, I forgot to mention that I only tested my approach against the "GraalVM script engine". I'll also bookmark the specific SheetJS Git revision that you referenced.