Best way to upload & download documents using SAP Document Management Service

Let's test and compare approaches to find one of the best ways to upload and download documents using the SAP Document Management Service.

Uploading documents to a server usually consumes memory and disk space, so let's see whether we can optimize the resources required.

The Solution

Let's build a Node.js server and measure the difference between the various approaches to uploading and downloading files.

Why Node.js?

Node.js is built on an event-driven, non-blocking I/O model, making it particularly suitable for handling concurrent I/O operations. This asynchronous nature allows Node.js to efficiently handle a large number of concurrent connections without the overhead of threads, making it well suited for applications requiring high scalability.
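
As a minimal illustration of this model (assuming two local files a.txt and b.txt exist), the two reads below are started concurrently without spawning any threads:

const fs = require('fs/promises');

// both reads begin immediately and run concurrently; the event loop stays
// free to handle other work while the disk I/O is in flight
async function readBoth() {
    const [a, b] = await Promise.all([
        fs.readFile('a.txt', 'utf8'),
        fs.readFile('b.txt', 'utf8')
    ]);
    console.log(a.length, b.length);
}

readBoth().catch(console.error);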

We will be using the SAP Document Management Service client described in this blog post: Node Js Client

Node Server

const express = require('express');
const cors = require('cors');
// CmisSessionManager and sdmCredentials come from the SDM client setup
// described in the Node Js Client blog post linked above
const REPOSITORY_ID = "com.demo.test.sdm";

let sm = new CmisSessionManager(sdmCredentials);
// create the repository if it does not exist yet
// await sm.createRepositoryIfNotExists(REPOSITORY_ID, "provider", {});


const app = express();
const port = 3000;
app.use(cors());

app.get('/', (req, res) => {
    res.send('This is a test Server!');
});

app.listen(port, () => {
    console.log(`Server is running on port ${port}`);
});

This is a simple Node.js server that accepts incoming requests.

Download (Stream)

Let's create a download API that receives an object path from the client and returns the document; the catch is that the process should be very memory efficient.

app.get('/download', async (req, res) => {
    // create or reuse a CMIS session
    let session = await sm.getOrCreateConnection(REPOSITORY_ID, "provider");
    // get the object by path, e.g. req.query.objectPath = "/temp/doc.pdf"
    let obj = await session.getObjectByPath(req.query.objectPath);
    // get the content stream
    let result = await session.getContentStream(obj.succinctProperties["cmis:objectId"]);

    // set the required headers
    res.header('Content-Type', obj.succinctProperties["cmis:contentStreamMimeType"]);
    res.header('Content-Length', obj.succinctProperties["cmis:contentStreamLength"]);
    res.header('Content-Disposition', `attachment; filename="${obj.succinctProperties["cmis:name"]}"`);

    // pipe the document store response to the client
    result.body.pipe(res);
});

The above approach has the following advantages:

1. Memory Efficiency: Streams process data in smaller chunks, allowing you to handle files larger than the available memory.

2. Scalability: When dealing with multiple streams of data or concurrent I/O operations, using streams enables efficient handling of concurrent requests without blocking other operations.

3. Reduced Latency: Streams can start processing data as soon as they receive the initial chunks, reducing overall latency.
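
For completeness, here is a minimal sketch of consuming this endpoint from a Node.js client. It assumes Node 18+ (for the built-in fetch); the object path and target file name are placeholders:

const fs = require('fs');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

async function downloadToFile(objectPath, targetFile) {
    const res = await fetch(`http://localhost:3000/download?objectPath=${encodeURIComponent(objectPath)}`);
    if (!res.ok) throw new Error(`Download failed with status ${res.status}`);
    // stream the response body straight to disk instead of buffering it in memory
    await pipeline(Readable.fromWeb(res.body), fs.createWriteStream(targetFile));
}

downloadToFile('/temp/doc.pdf', 'doc.pdf').catch(console.error);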

Upload (using Multer)

Let's create an upload API using Multer, where the client uploads a single file and the server forwards it to DMS; again, we want to do this in the most memory-efficient way.

const multer = require('multer');
const fs = require('fs');

// Set up disk storage for uploaded files
const storage = multer.diskStorage({
    destination: function (req, file, cb) {
        cb(null, 'uploads/');
    }
});

// Create a multer instance with the storage configuration
const upload = multer({storage: storage});

app.post('/upload', upload.single('file'), async (req, res) => {
    if (req.file) {
        // create or reuse the DMS session
        let session = await sm.getOrCreateConnection(REPOSITORY_ID, "provider");
        // multer collects the file parts and writes them to a temp file;
        // let's create a read stream from that path
        let readStream = fs.createReadStream(req.file.path);
        // upload the file to DMS
        let response = await session.createDocumentFromStream("/temp", readStream, req.file.originalname);

        // remove the temp file so uploads do not accumulate on disk
        fs.unlink(req.file.path, () => {});

        res.status(200).end(response.data);
    } else {
        res.status(400).end('Uploaded file not found.');
    }
});

In this approach we use Multer to parse the multipart form data in which the file is uploaded. Since we use the disk storage option, the uploaded file is cached on the server's disk, and Multer passes a file reference along with the request object for the controller to use. Next, we stream the file to DMS and store it there.
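
For reference, the client side of this approach is a plain multipart form upload. A minimal browser-side sketch might look like this (the uploadFile element id is an assumption; the field name must match upload.single('file') on the server):

// assumes an <input type="file" id="uploadFile"> element on the page
async function uploadWithFormData() {
    const file = document.getElementById("uploadFile").files[0];
    const formData = new FormData();
    formData.append('file', file);
    const response = await fetch('http://localhost:3000/upload', {method: 'POST', body: formData});
    console.log(await response.text());
}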

Results

Total Data Uploaded: 2 GB
Number of files: 10 txt files
Time Taken: 12 mins
Total Memory Used: 3.8 GB
Server Disk Used: 2.1 GB

Some advantages and disadvantages of this approach:

Advantages:

  • Server-Side Validation: With Multer, developers can perform server-side validation on uploaded files (a sketch follows this list).
  • Network Faults: It removes network faults that can happen on the client side; the server receives the entire file before it is uploaded to DMS.
  • Ease of Handling File Uploads: Multer simplifies the process of handling multipart/form-data uploads in Node.js.
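
To illustrate the validation point, Multer's limits and fileFilter options can reject oversized or unexpected files before they ever reach DMS. A minimal sketch, with illustrative values:

// a sketch of server-side validation with Multer: cap the file size
// and accept only PDF uploads (both values are illustrative)
const validatedUpload = multer({
    storage: storage,
    limits: {fileSize: 100 * 1024 * 1024}, // 100 MB
    fileFilter: function (req, file, cb) {
        cb(null, file.mimetype === 'application/pdf');
    }
});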

Disadvantages

  • Server-Side Resource Consumption: When using Multer with disk storage, uploaded files are temporarily stored on the server’s disk before processing. This might consume server disk space, especially when dealing with large or numerous uploads.
  • Handling Large Files: Multer might encounter limitations when handling very large files that exceed server memory or disk space.

This approach is only a good fit when you are prototyping an MVP or handling small files.

Upload (using DMS Append)

Let's create another upload API where we try to remove the disk and memory limitations of the Multer approach.

app.post('/upload-optimised', async (req, res) => {

    // get the file metadata from custom headers.
    const fileName = req.headers["cs-filename"];
    const opType = req.headers["cs-operation"];
    const mimeType = req.headers["content-type"];

    // create or reuse the DMS session
    let session = await sm.getOrCreateConnection(REPOSITORY_ID, "provider");
    let response = {success: "false"};

    // if operation is "create" then create the document in DMS with initial chuck
    if (opType === "create") {
        // create a document from the request stream
        response = await session.createDocumentFromStream("/temp", req, fileName);
    }

    // if operation is "append" then append the content an existing file
    if (opType === "append") {
        const obj = await session.getObjectByPath("/temp/" + fileName);
        // get the object id from the object path.
        const objId = obj.succinctProperties["cmis:objectId"];
        // append the content to the previously created file.
        response = await session.appendContentFromStream(objId, req);
    }

    res.json(response);
});

In this approach, we append the file content over multiple HTTP requests; because of this, custom client-side handling is required.

Below is an HTML page that uploads a file to our server, manually breaking the file into chunks and using the append functionality. Note that it awaits each request before sending the next, since chunks must be appended in order.

<html>
<head>
    <title>TEST</title>
</head>
<body>
<h1>Upload a File</h1>
<div>
    <input id="uploadFile" type="file" name="fileInput">
    <input value="Upload" onclick="uploadFile(this)">
</div>
<script>

    // trigger when upload button is clicked
    function uploadFile(event) {
        let elementById = document.getElementById("uploadFile");
        // get the selected file
        const file = elementById.files[0];
        if (file) {
            // read the file content and upload in chunks
            const reader = new FileReader();
            reader.onload = function (event) {
                const contents = event.target.result;
                console.log('File contents:', contents.length);
                uploadFileInChunks(file, contents);
            };
            // readAsText is fine for the plain-text files used in this test;
            // binary files would need readAsArrayBuffer instead
            reader.readAsText(file);
        }
    }

    async function uploadFileInChunks(file, content) {
        // Specify your desired chunk size (1 KB here for illustration; real
        // uploads would use much larger chunks to reduce request overhead)
        const chunkSize = 1024;

        // total number of chunks to be uploaded; could be used to drive a progress bar
        const totalChunks = Math.ceil(content.length / chunkSize);

        for (let i = 0; i < totalChunks; i++) {
            // calculate start of the chunk
            const start = i * chunkSize;
            // calculate end of the chunk
            const end = Math.min(start + chunkSize, content.length);
            // get the chunk from the entire content
            const chunk = content.slice(start, end);
            // log progress (chunk size only, to avoid dumping file content)
            console.log('Chunk', i + 1, 'of', totalChunks, ':', chunk.length);
            // create if first chunk or else append
            const operation = i === 0 ? "create" : "append";

            const myHeaders = new Headers();
            myHeaders.append("cs-filename", file.name);
            myHeaders.append("cs-operation", operation);
            myHeaders.append("Content-Type", file.type);

            const requestOptions = {
                method: 'POST',
                headers: myHeaders,
                body: chunk,
                redirect: 'follow'
            };

            // upload to the server
            const response = await fetch("http://localhost:3000/upload-optimised/", requestOptions);
            console.log(await response.json());
        }
    }
</script>
</body>
</html>

Results

Total Data Uploaded: 2 GB
Number of files: 10 txt files
Time Taken: 9 mins
Total Memory Used: 200 MB
Server Disk Used: 0 bytes

The results make the difference in server resource usage clear: memory consumption drops from 3.8 GB to about 200 MB, and no server disk space is used at all.

This method has its own set of advantages and disadvantages:

Advantages of using the Append Approach:

  • Support for Large Files: By breaking the file into chunks, this approach allows uploading large files that may exceed the server’s memory or disk space limitations.
  • Resume-able Uploads: If an upload is interrupted due to network issues or other reasons, the append approach facilitates resuming the upload from where it left off (see the sketch following this list).
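
To illustrate the resume point, here is a minimal sketch of a hypothetical /upload-status endpoint. It is not part of the server above, but it is built from the same session API used earlier (getObjectByPath and the cmis:contentStreamLength property):

// Hypothetical status endpoint: reports how many bytes of a file have
// already been stored in DMS, so a client can resume an interrupted upload.
app.get('/upload-status', async (req, res) => {
    let session = await sm.getOrCreateConnection(REPOSITORY_ID, "provider");
    try {
        const obj = await session.getObjectByPath("/temp/" + req.query.fileName);
        res.json({uploadedBytes: Number(obj.succinctProperties["cmis:contentStreamLength"])});
    } catch (e) {
        // document not created yet, start from the beginning
        res.json({uploadedBytes: 0});
    }
});

The client would query this endpoint first and start its chunk loop at Math.floor(uploadedBytes / chunkSize) instead of 0, sending "append" operations from there.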

Disadvantages of using the Append Approach:

  • Complexity in Client-Side Handling: Implementing the append approach requires significant client-side code to break the file into chunks, manage chunking logic, handle HTTP requests for each chunk, and manage the upload process.
  • Network Overhead: Breaking the file into chunks and sending multiple HTTP requests introduces additional network overhead compared to a single-file upload.
  • Loss of Transactional Integrity: Append uploads do not guarantee transactional integrity during the upload process. In case of failure or interruptions, managing the state of uploaded chunks and ensuring the file’s consistency may pose challenges.
  • Limited Compatibility: Not all server-side storage or services may support append-style uploads. Compatibility issues might arise when integrating with certain storage systems or APIs that expect a complete file upload in a single request.

Conclusion

The choice of approach depends on specific project requirements, including file sizes, network conditions, server capabilities, and desired functionalities. Each approach has its strengths and trade-offs, and selecting the most suitable approach should be based on a careful consideration of these factors.
