Convert PDF to XML via command prompt

Hello.

I need to convert a PDF file into an XML file using Studio Pro. I know there isn’t an activity for that, so I thought maybe I could do it using the command prompt. I have researched online, but all I’ve found so far are online converters. I don’t want that, because the idea is to do the conversion quickly, since the bot is quite slow. Does anybody know a way to convert PDF into XML via command prompt?

Hi @cris-dsc,

If you don’t want to rely on an external website that offers PDF to XML conversion (via interface or API), I would recommend looking into Python libraries that you can install on your PC and interact with using the Execute Python activity. A quick Google search tells me there are many, but I have not tried any myself, nor does ElectroNeek endorse or recommend any of them.

In any case, might I ask what problem you are trying to solve by converting the PDF to XML? PDF to XML conversion won’t always provide useful data, and there may be better ways to solve the same problem that we can help you find and execute.

Let me know if any Python libraries worked for you, and if you’d like to discuss any alternative solutions. Thank you!

Thank you for the suggestion. I think I’ve found a different, much easier solution.

I need to send both the PDF and the XML files of invoices to the client via e-mail. It’s a SAP automation, so the bot downloads the PDF and then downloads the XML from SAP by interacting with the GUI. However, it’s taking too long and the client wants the bot to work faster. I thought that if the bot didn’t have to download both files from SAP it would be faster, and one way to do that would be to download only the PDF and then convert it to XML via command prompt.

I’ve since found the folder where SAP keeps the XML files of all invoices. So the bot could simply attach the XML file to the e-mail without the need to download it from inside SAP GUI. That would make the bot a little bit faster.

Hi @cris-dsc, thanks for following up!

I understand. I believe converting PDF to XML for that purpose would have been overcomplicating things, for sure. System delays, such as the time that an application takes to process actions, are often the largest “unmovable objects” in terms of speed optimization - even if the bot moves as fast as possible, it still needs to wait for the systems/applications (such as SAP GUI) to do their job in order to move forward.

The workaround you found, using the paths of the XML files directly, sounds fantastic. That should definitely make the bot faster, since it skips the system delays it would otherwise encounter when downloading those same files from SAP GUI.
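
In case it helps, here is a rough Python sketch of that idea, which could be adapted for the Execute Python activity. It only uses the standard library. Everything specific in it is an assumption on my side, not something I know about your setup: the function names `find_invoice_xml` and `build_invoice_email` are made up for illustration, and it assumes the XML file name contains the invoice number, which may not match how SAP names the files in your folder.

```python
from email.message import EmailMessage
from pathlib import Path
from typing import Optional


def find_invoice_xml(xml_dir: Path, invoice_id: str) -> Optional[Path]:
    """Return the newest .xml file in xml_dir whose name contains invoice_id.

    Assumption: the invoice number appears in the file name. Adjust the
    matching rule to whatever naming scheme the SAP folder really uses.
    """
    matches = [p for p in xml_dir.glob("*.xml") if invoice_id in p.name]
    # If several files match, prefer the most recently modified one.
    return max(matches, key=lambda p: p.stat().st_mtime, default=None)


def build_invoice_email(sender: str, recipient: str,
                        pdf_path: Path, xml_path: Path) -> EmailMessage:
    """Build (but do not send) an e-mail with the PDF and XML attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Invoice {pdf_path.stem}"
    msg.set_content("Please find the invoice PDF and XML attached.")
    msg.add_attachment(pdf_path.read_bytes(), maintype="application",
                       subtype="pdf", filename=pdf_path.name)
    msg.add_attachment(xml_path.read_bytes(), maintype="application",
                       subtype="xml", filename=xml_path.name)
    return msg
```

The sketch stops at building the message on purpose: sending it could be done with Python's `smtplib` or with whatever e-mail step the bot already uses, in which case only the file lookup part would be needed.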

Let me know if there’s anything else I can help you with in this case!
