Pros and Cons of LabVIEW FPGA

Since I started developing this LabVIEW FPGA project, which uses a MicroBlaze soft processor to process TCP streams, I have learned a lot and can now comment on the pros and cons of using LabVIEW FPGA versus a traditional Xilinx/Altera-based FPGA development approach.

For starters, LabVIEW FPGA blows every single other FPGA development system out of the water when it comes to developing prototypes.  I made a prototype for implementing a Monero miner in record time.  I don’t remember how long it took, but you can see my commit history here: https://github.com/JohnStratoudakis/CryptoCurrencies

Then I was able to implement a UDP based orderbook proof of concept, again in record time, see my commit history here: https://github.com/JohnStratoudakis/LabVIEW_Fpga/tree/master/MarketData/MarketData_02/Fpga

Then I decided that I wanted to make my orderbook support TCP/IP, which is what most Market Data Feeds are using, so I embarked on learning how to make LabVIEW FPGA play well with Xilinx Vivado.  I did not realize it at the time, but the knowledge I have gained over the past year is enough that one no longer has to live with any of the cons that LabVIEW FPGA comes with.

  • I have learned how to integrate basic VHDL/Verilog IP into a LabVIEW FPGA project.
  • I have learned how to integrate more complex Xilinx IP such as Adder/Subtractors, Fast Fourier Transforms, and AXI Stream FIFOs.
  • I have learned how to integrate an entire soft-core processor based system as well, including both the simplified MicroBlaze MCS and the more complex MicroBlaze processors developed by Xilinx.
  • Furthermore, I have been able to communicate between LabVIEW FPGA and the MicroBlaze processor via AXI Stream FIFOs and General Purpose Input/Output registers, and I have implemented interrupt handlers.

Using all of this together, I can very efficiently develop a prototype, all from within LabVIEW FPGA, that combines existing Xilinx IP, IP from opencores.org, or proprietary IP with a MicroBlaze soft-core processor.  This serves as a great risk-mitigating factor, in that one can tell early whether an FPGA will be a viable solution for a particular type of problem.  Then, one can choose to keep the LabVIEW FPGA implementation and scale it out, or one can rewrite the portions written in LabVIEW in another language such as Verilog or VHDL.

Usually, the first product that works is what makes it to market and is successful, not because it is the best, but because it is the most adaptable to change. Think Evolution… think VHS, think DVDs, think about the iPod.  These products were market leading because they got the job done right now, not later when all of the features were fully implemented.  Additionally these products were easy to use.

Anyway, I have fully wired up the 10 Gigabit Transceiver into my MicroBlaze, and have wired the MicroBlaze to my host application, and I am anxiously waiting for FPGA synthesis to complete so I can test it out…

10 Gigabit FPGA-based Network Card

So here is the simplest FPGA-based Network Interface Card that I know of.

This application will start Port 0 of the 10 Gigabit Network interface that is provided by the PXIe-6592R (http://www.ni.com/en-us/support/model.pxie-6592.html) board by National Instruments, and will allow you to do any of the following:

  • Check if any new ethernet frames have been received, and display the information, including the raw bytes of any such received frame
  • Send a raw ethernet frame out of Port 0

I have included the necessary code to parse and generate the following types of packets, enabling you to communicate with another computer on your network that supports:

  • Ethernet II
  • ARP
  • ICMP
  • IPv4
  • UDP

The VIs to do this are located in the directory “Tests/MAC/Protocols”.  Simply wire the incoming frame data to the “Parse” VIs, or write the parameters into the “Create” VIs.
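
For readers without LabVIEW at hand, here is a rough text-based sketch of what the “Parse” side does for Ethernet II, IPv4 and UDP.  It is written in Python, not LabVIEW, and the function names (parse_ethernet_ii, parse_ipv4, parse_udp) are made up for this illustration; they are not the names of the VIs in the repository.  It assumes the frame starts at the destination MAC address, with no preamble.

```python
import struct


def parse_ethernet_ii(frame: bytes) -> dict:
    """Split an Ethernet II frame into its 14-byte header and payload."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than the 14-byte Ethernet II header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": dst.hex(":"),
        "src": src.hex(":"),
        "ethertype": ethertype,  # 0x0800 = IPv4, 0x0806 = ARP
        "payload": frame[14:],
    }


def parse_ipv4(packet: bytes) -> dict:
    """Parse the fixed part of an IPv4 header and slice out its payload."""
    if len(packet) < 20:
        raise ValueError("packet is shorter than the minimum IPv4 header")
    (version_ihl, _tos, total_length, _ident, _frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (version_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    if header_len < 20:
        raise ValueError("invalid IPv4 header length")
    return {
        "src": ".".join(str(octet) for octet in src),
        "dst": ".".join(str(octet) for octet in dst),
        "ttl": ttl,
        "protocol": protocol,  # 1 = ICMP, 17 = UDP
        # total_length trims any Ethernet padding that follows the packet
        "payload": packet[header_len:total_length],
    }


def parse_udp(segment: bytes) -> dict:
    """Parse the 8-byte UDP header and slice out the datagram payload."""
    if len(segment) < 8:
        raise ValueError("segment is shorter than the 8-byte UDP header")
    src_port, dst_port, length, _checksum = struct.unpack("!HHHH", segment[:8])
    return {"src_port": src_port, "dst_port": dst_port, "payload": segment[8:length]}
```

Each parser returns the payload for the next layer, so a received frame is handled by chaining the three calls, much like wiring one “Parse” VI into the next.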

How to Parse Incoming Ethernet Frames

For an example of how to parse an incoming frame see the “Poll RX” case inside the bottom While Loop of the “MAC-Tester” vi:

How to Create Ethernet Frames

For an example of how to create a valid outgoing ethernet frame with a valid CRC32 on the end, see the “Transmit Packet” case inside the bottom While Loop of the “MAC-Tester” vi:

This vi calls the “UDP-Create.vi” and wires the size – in bytes – and the frame data in 64-bit words to the transmit FIFO.
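
As a text-based illustration of the same idea, here is a minimal Python sketch that builds an Ethernet II frame, appends the CRC32 frame check sequence, and packs the result into 64-bit words together with the size in bytes.  The function names are hypothetical, and the byte order inside each 64-bit word (first byte in the most significant position) is an assumption; the transmit FIFO in the actual project may expect a different ordering.

```python
import struct
import zlib
from typing import List, Tuple

MIN_PAYLOAD = 46  # shortest Ethernet payload; shorter payloads are zero-padded


def build_frame(dst: bytes, src: bytes, ethertype: int, payload: bytes) -> bytes:
    """Return dst + src + EtherType + padded payload + CRC32 (FCS)."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes long")
    body = dst + src + struct.pack("!H", ethertype) + payload.ljust(MIN_PAYLOAD, b"\x00")
    # Ethernet uses the same CRC-32 as zlib; the FCS goes out low byte first.
    fcs = struct.pack("<I", zlib.crc32(body))
    return body + fcs


def to_u64_words(frame: bytes) -> Tuple[int, List[int]]:
    """Return (size in bytes, frame data as 64-bit words, zero-padded at the end)."""
    padded = frame + b"\x00" * (-len(frame) % 8)
    words = list(struct.unpack(f">{len(padded) // 8}Q", padded))  # assumed byte order
    return len(frame), words
```

A quick sanity check for the CRC: running zlib.crc32 over the complete frame, FCS included, always yields the constant 0x2144DF1C for a correctly formed frame.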

Full Source Code

See the source code on GitHub here:

https://github.com/JohnStratoudakis/LabVIEW_Fpga/tree/master/07_10_Gigabit_CLIP

See the README.md for more documentation.

Next?

Now I have to take this code and wire it up to my MicroBlaze implementation that also sits inside the FPGA project.  The only problem right now is that I have only figured out how to configure a 32-bit FIFO, and not a 64-bit FIFO.  So I can either do some sort of translation inside the FPGA, or hope to get lucky by configuring the FIFO to be 64 bits wide.  Note: by FIFO, I am referring to an AXI-Stream FIFO.
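
The translation option is simple in principle.  Here is a small Python model of it, purely as a sketch: each 64-bit word becomes two 32-bit words and is reassembled on the other side.  Sending the upper half first is an assumption, and the function names are made up for this example.

```python
from typing import List

MASK32 = 0xFFFFFFFF


def split_u64_words(words64: List[int]) -> List[int]:
    """Turn each 64-bit word into two 32-bit words, upper half first (assumed order)."""
    words32 = []
    for word in words64:
        words32.append((word >> 32) & MASK32)
        words32.append(word & MASK32)
    return words32


def join_u32_words(words32: List[int]) -> List[int]:
    """Inverse of split_u64_words: rebuild 64-bit words from pairs of 32-bit words."""
    if len(words32) % 2 != 0:
        raise ValueError("expected an even number of 32-bit words")
    return [(hi << 32) | lo for hi, lo in zip(words32[0::2], words32[1::2])]
```

Inside the FPGA this would be a small state machine between the two FIFOs rather than a loop, but the data movement is the same.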

Screen Shot Generator for LabVIEW

I finished writing an application that exercises the first port of the 10 Gigabit Ethernet interface provided by the National Instruments PXIe-6592R board.  As I started taking manual screenshots via the LabVIEW “File->Print” option, I began to ponder: can this be done more easily? Or, dare I say it, “programmatically”?

The LabVIEW Report Generation Palette has a VI named “Easy Print VI Panel and Documentation”.  Besides having a plethora of options, this VI is hard to use and proved to be unstable for my purposes.  If you want to try it in your application, see the documentation here:

http://zone.ni.com/reference/en-XX/help/371361H-01/lvreport/easy_print_panel_doc/

I ended up finding a way to manually save a PNG file with the Front Panel and the Block Diagram of a VI.  I then wrote a program that will recursively generate both a front panel and a block diagram screenshot for each vi it encounters.  This makes it easy for me to quickly create and update any vi images so that you can view the source code directly from github, without having to wait until you get home and open the code in LabVIEW.

See the github project here:

https://github.com/JohnStratoudakis/ScreenShotGen

Here is a screenshot of the top level vi of the application:

10 Gigabit FPGA-based Network Code Coming Soon

I am getting really close to finishing my proof-of-concept FPGA-based network card, which is based on the National Instruments PXIe-6592R board.  The board uses the Xilinx Kintex-7 410T FPGA and has 2 GB of DDR3 RAM.

Using the Arty Artix-7 board, I was able to make sure that the MicroBlaze code running the lwIP TCP/IP stack works fine, and I was able to use an NI example to build the 10 Gigabit Ethernet MAC part.  The only issue is that the NI code is quite complex and uses features and ideas that I have never seen before.

Nevertheless, I am iterating over some modifications to the example to allow for a LabVIEW Host network stack that uses the FPGA only for the sending and receiving of ethernet frames.  Once I get that working, I will just switch the connection from LabVIEW Host to the on-board MicroBlaze.

How to Multiply 64 bit Numbers in LabVIEW

What is the product of 0x9D0BF6FDAC70AB52 and 0x6408F6540A1384CB?  Well, according to LabVIEW for Windows, the answer is 0x2D90DE07C0C42206.  According to C++ on OSX (without any optimizations or use of Intel intrinsic functions), the answer is also 0x2D90DE07C0C42206.  Both of these are only the low 64 bits of the product; the upper 64 bits are silently dropped.

The real answer is…  0x3D5E2BF7DCBCA6622D90DE07C0C42206.

How do you get this number? You have to use compiler intrinsics, or calculate the value yourself.  LabVIEW does not make it easy to call an Intel compiler intrinsic, so I implemented the multiplication myself.  Here is a screenshot of the implementation in LabVIEW for Windows:

To download and use this code in your project, see:

https://github.com/JohnStratoudakis/CryptoCurrencies/blob/master/Monero/lv-monero/CryptoNight-Step-3/Host-Implementation/Step-3-Multiply-U64.vi
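
If you cannot open the VI, the general technique can be sketched in a few lines of Python.  This is the standard schoolbook method using 32-bit halves, which is my assumption of a reasonable approach; it is not a transcription of the VI above.  Every intermediate value fits in 64 bits, which is what makes the same steps usable in an environment that only has 64-bit integers.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mul_u64_to_u128(a: int, b: int) -> Tuple[int, int]:
    """Multiply two unsigned 64-bit values; return (high 64 bits, low 64 bits)."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Four 32x32 -> 64-bit partial products
    lo_lo = a_lo * b_lo
    hi_lo = a_hi * b_lo
    lo_hi = a_lo * b_hi
    hi_hi = a_hi * b_hi

    # Middle column: the carry out of lo_lo plus the low halves of the cross terms
    cross = (lo_lo >> 32) + (hi_lo & MASK32) + (lo_hi & MASK32)

    low = ((cross & MASK32) << 32) | (lo_lo & MASK32)
    high = (hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (cross >> 32)) & MASK64
    return high, low


high, low = mul_u64_to_u128(0x9D0BF6FDAC70AB52, 0x6408F6540A1384CB)
print(f"0x{high:016X}{low:016X}")
```

Python integers have arbitrary precision, so a plain a * b gives the full 128-bit product directly and is a convenient way to check the split version.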

Note: An FPGA version is coming soon, but I am busy working on something else right now.

Some Time with the Arty Artix-7 35T Digilent Board

So I wanted to implement a simple, stripped down version of the open-source lightweight IP stack “lwIP” (https://savannah.nongnu.org/projects/lwip/) inside my LabVIEW FPGA project so that I can handle TCP and UDP data streams.

I do not have a lot of experience with this, and I found that building such a project inside Vivado would take around 3 hours to simulate with all of the source code of the lwIP project embedded in the elf file.

I ended up purchasing a $99 board from Digilent that uses an Artix-7 35T FPGA: https://www.xilinx.com/products/boards-and-kits/arty.html.

On this board I was able to run and debug the lwIP source code so that I could figure out how to use it with my configuration.  I created a public github repository with this source code, so if you happen to be trying to learn how to use the MicroBlaze processor with this board, check out:

https://github.com/JohnStratoudakis/artix7-35t

Enjoy!  I will now be working on integrating this lwIP source code into my LabVIEW FPGA project.

A Diversion for CryptoCurrencies

I spent some time analyzing the Monero cryptocurrency source code to understand the algorithm and how it works, and to see if it is doable with an FPGA via LabVIEW FPGA, our secret weapon.

I learned that there are 4 steps to the Monero “CryptoNight” algorithm and that step 3 is the part that does the heavy lifting, with around 500k reads and writes to a small section of memory that is 2 megabytes in size.  This size was specifically selected to coincide with the size of most processor Level 3 caches.  This is supposed to be what makes the algorithm “memory-hard”.

Locks are meant to be broken, codes cracked… and secrets revealed.

I am thinking: what if I put step 3 inside an FPGA and have it use Block RAM?

  • Block RAM is limited on an FPGA, so this may not be worthwhile

Okay, what about DRAM?

  • My FPGA may have DDR3 RAM, but other FPGAs have faster RAM.  If my implementation works well on DDR3 RAM, then I can move it to another FPGA with faster RAM.
  • Will an FPGA using DRAM be faster than a CPU using its L3 cache, taking into account of course that the FPGA is the only user of its DRAM controller? What about an FPGA with multiple DRAM controllers?

Well, I know that DRAM is “slow” when compared to other types of memory, but the difference here is that the FPGA is the only user of the DRAM controller.  On any operating system, there are many users, i.e. programs, processes, kernel threads.  So would doing this from an FPGA make the cut?  Would it make that much of a difference?

Well, there is only one way to find out.  Try it out!

I have created a github repository with my work so far here:

I went into the Monero source code (https://github.com/monero-project/monero/blob/master/src/crypto/slow-hash.c#L581) and saved the following variables to a binary file around the loop with 500k iterations (as of this date, lines 591 and 600):

  • uint64_t a[2]
  • uint64_t b[2]
  • uint8_t *hp_state (<= this is the scratch pad of 2 megabytes of data)
  • uint8_t *hp_state_out (same scratch pad after CryptoNight Step 3 has run)
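
To make the dump concrete, here is a small Python sketch that reads such a capture back.  The layout is purely an assumption for illustration (a, then b, each as two little-endian 64-bit values, followed by the 2 MiB scratch pad before and after step 3); the actual files in my repository may be laid out differently, and parse_step3_dump is a made-up name.

```python
import struct

SCRATCHPAD_BYTES = 2 * 1024 * 1024  # the 2 megabyte scratch pad


def parse_step3_dump(blob: bytes) -> dict:
    """Split a capture into a, b, hp_state and hp_state_out (assumed layout)."""
    expected = 16 + 16 + 2 * SCRATCHPAD_BYTES
    if len(blob) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(blob)}")
    a = struct.unpack_from("<2Q", blob, 0)   # uint64_t a[2]
    b = struct.unpack_from("<2Q", blob, 16)  # uint64_t b[2]
    start = 32
    hp_state = blob[start:start + SCRATCHPAD_BYTES]
    hp_state_out = blob[start + SCRATCHPAD_BYTES:]
    return {"a": a, "b": b, "hp_state": hp_state, "hp_state_out": hp_state_out}
```

With the inputs and the expected output in one place, any reimplementation of step 3 can be checked by running it on hp_state and comparing the result with hp_state_out.
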
I implemented a sandboxed C++ version of this code that does CryptoNight Step 3 in an isolated program that runs with the same values each time.

This C++ program works on OSX and Windows (and probably Linux).  It uses Gradle as its build tool, and you can see the source code here:

I then implemented the same algorithm, based on the same source file, by using LabVIEW for Windows.  The values match, so we have a working C++ version, a working LabVIEW for Windows version, and now we can determine if an FPGA version will be worth it.

Please note that the LabVIEW version is not optimized code, and I am not a LabVIEW for Windows developer, and that is probably why it runs so slowly… for now.  Yes, it takes over an hour to create one hash.  However, I have consulted with some LabVIEW experts, and they have told me what I should do to make it faster.  I will start working on that, and in the meantime, you can take a look at the ever-changing source code to see what the algorithm involves.  Remember, LabVIEW code is very easy to understand, so this may be the “flow-chart” explanation of what a cryptocurrency miner looks like.

See the LabVIEW code here:

(Requires LabVIEW 2017 to view…)  I will add some png versions of the code soon, but first I want to do some cleaning…

Issues with LabVIEW and Lack of Relative Directory References

So I wanted to mention that I have all of my LabVIEW (and Vivado) code saved on a RAID-1 mirrored location on my network.  From each of my workstations, I map the same network location to my Z drive.  This way, any and all issues with LabVIEW referring to absolute paths go away.  I do not develop in “offline” mode; I am always connected to one of my machines, whether by sitting directly in front of the machine or via a Remote Desktop Connection.  If you use a laptop, you could always split a piece off of your normal root partition and make a Z drive for yourself.

To do this yourself, create a network share and open it from Windows Explorer, and then select “Map Network Drive”.  This option will either be an icon or a menu option, and this all depends on the version of Windows that you are using.

So in my case, I have:

\\192.168.0.x\RAID-1 mapped to Z:\

So I work from:

Z:\work\git\LabVIEW_Fpga

More Code Posted to Github

So I have figured out how to use the MicroBlaze Core with an AXI-Stream FIFO, and I have also figured out how to export a project from Vivado by using the Vivado “Write Project TCL” option.

See the following project:

https://github.com/JohnStratoudakis/LabVIEW_Fpga/tree/master/06_MicroBlaze/04_lwIP_Ex

You have to re-generate the Vivado Project and create a new SDK workspace in order to get this to work on your machine.

How to regenerate a Vivado project from a TCL script:

Step 1 – Start Vivado

Step 2 – Change directory to where tcl script is located

Make sure you escape all Windows backslashes with another backslash.

For example:

Z:\work\git\LabVIEW_Fpga\06_MicroBlaze\04_lwIP_Ex\lwIP_Ex

becomes

cd "Z:\\work\\git\\LabVIEW_Fpga\\06_MicroBlaze\\04_lwIP_Ex\\lwIP_Ex"

Step 3 – Source the tcl script

source init.tcl

That’s it!
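
If you do this often, the escaping in Step 2 is easy to automate.  Here is a tiny Python helper, offered only as a convenience sketch; tcl_cd_command is a made-up name and not part of Vivado or of my repositories.

```python
def tcl_cd_command(windows_path: str) -> str:
    """Return a Tcl 'cd' command with every backslash in the path doubled."""
    escaped = windows_path.replace("\\", "\\\\")
    return f'cd "{escaped}"'


print(tcl_cd_command(r"Z:\work\git\LabVIEW_Fpga\06_MicroBlaze\04_lwIP_Ex\lwIP_Ex"))
```

Paste the printed line into the Vivado Tcl console.  Tcl also accepts forward slashes in Windows paths, which avoids the escaping altogether.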

Note: I am still in the process of converting all of my projects to use this method.  If you want a quick taste, check out the project here:

https://github.com/JohnStratoudakis/LabVIEW_Fpga/tree/master/05_MicroBlaze_Mcs/01_MicroBlaze_Mcs_GPIO

New Code Added to GitHub – MicroBlaze MCS, IO Bus and LabVIEW

I just uploaded some code to GitHub that is a full demonstration of how to use LabVIEW FPGA 2017, the MicroBlaze MCS core and the IO Bus that is attached to the MicroBlaze MCS.

Clone the following repository:

https://github.com/JohnStratoudakis/LabVIEW_Fpga/tree/master/05_MicroBlaze_Mcs/02_MicroBlaze_Mcs_IO_Bus

and open the LabVIEW project:

https://github.com/JohnStratoudakis/LabVIEW_Fpga/blob/master/05_MicroBlaze_Mcs/02_MicroBlaze_Mcs_IO_Bus/02_MicroBlaze_Mcs_IO_Bus.lvproj

Look at the Vivado 2015.4 project:

https://github.com/JohnStratoudakis/LabVIEW_Fpga/blob/master/05_MicroBlaze_Mcs/02_MicroBlaze_Mcs_IO_Bus/MicroBlaze_Mcs_IO_Bus/MicroBlaze_Mcs_IO_Bus.xpr

And finally, using the Xilinx SDK set your workspace to:

https://github.com/JohnStratoudakis/LabVIEW_Fpga/tree/master/05_MicroBlaze_Mcs/02_MicroBlaze_Mcs_IO_Bus/MicroBlaze_Mcs_IO_Bus/MicroBlaze_Mcs_IO_Bus.sdk

Now if you do not have access to LabVIEW from your current machine, I have included a screen shot for each VI with the words “Front” or “Back” added to the filename, and in the case where there are many case structures, I have added the case structure element number.

The example has three features:

  • Send a packet of data over the IO Bus to the MicroBlaze MCS and read the same packet back over the IO Bus
  • Write a value to GPI channel #1 and read the value multiplied by 2 over GPO channel #2
  • Read the values of GPO channels 1, 2, and 3

Now I am continuing to work on integrating the 10 gigabit ports with the MicroBlaze MCS and on getting the lwIP TCP/IP stack working on this board, the NI PXIe-6592R.