So after some serious debugging, editing, and regenerating of the bitstream, I was able to send out an ARP response from the FPGA to my linux server, and for my linux server to send a UDP packet in to the MicroBlaze. This was all verified via UART debug statements.
Now while I work on cleaning this all up, you can actually use this code in your project, but only if you have enough knowledge of LabVIEW and Xilinx. My job is to help bridge that gap, but for now:
Note that the ‘fpganow/lwip’ repository is currently referenced as a sub-module of ‘fpganow/MicroBlaze_lwIP’. I will move it of course… But first I will try to re-create this project, and while I re-create it I will move and clean things up.
You want to open the ‘Tests/FPGANic/FPGANic-Tester.vi’ program as your entry. Start the vi, wait for the FPGA bitfile to be downloaded and then click on the ‘polling’ button to start receiving debug information about incoming and outgoing packets.
Working from scratch, I created a LabVIEW FPGA project that imports a MicroBlaze design that communicates with LabVIEW via a UART, and has the ability to change the elf file in a much shorter time frame than before.
I did this by adding the MicroBlaze to the project after it had been exported to Vivado, and not from within the imported CLIP as before. The only bad news is that I have to synthesize the FPGA project from Vivado, which currently is not connected to the NI FPGA Compile Cloud. This may be a feature that is coming soon, but it will only come if users start using the Project Export to Vivado feature in the first place. So please write me any comments if anything below is confusing or hard to follow!
We are creating a MicroBlaze design, setting all of our processor options, including adding an instance of the UARTlite IP core, and exporting this Block Design to a tcl script that we will later on import in to our LabVIEW FPGA generated Vivado Project. We will not export the hardware or create any elf files in this part.
Create a new project from within Vivado 2015.4 (this is important, it will likely not work from other versions of Vivado)
The first step is not that important, you can just click next
I selected the project location to be on my E drive, “E:/git/MicroBlaze_UART/xilinx_mb”, and the name of the project to be “mb_uart”.
This is an RTL project type, and we do not want to specify any sources at this time.
The PXIe-6592R board contains a Kintex-7 FPGA chip with the following parameters:
Part #: xc7k410tffg900-2
Speed Grade: -2
New Project Summary, just click finish
What an empty project looks like:
Now create a block design
I have been going with a naming scheme of “d_” (d underscore) followed by the design name, so this one is “d_microblaze”
Now look at the empty Block Design
Click on the “Add IP” button
Start typing in MicroBlaze, and make sure you select “MicroBlaze” and not “MicroBlaze MCS”. The MicroBlaze MCS is a stripped-down version of the MicroBlaze which is very easy to use, but hard to bring in to LabVIEW. Well, it is not hard to bring in to LabVIEW, I just have not figured out how to bring it in and get the UART to work!
Look at the MicroBlaze IP. I hear that in older versions of the Xilinx Tools – namely ISE – there was no such picture, but instead you were given a list of signals and ports…
Now click on “Run Block Automation”, this will bring up a wizard where you can set a bunch of parameters, such as how much memory should be used and what peripherals it can support
This is what the defaults look like:
I set the following parameters:
Local Memory: 128KB
Cache Configuration: 64KB
Debug Module: None (Can’t debug from LabVIEW at the moment)
Here is what it looks like after block automation. Notice the local memory block, the Processor System Reset icon and the Clocking Wizard
I want to remove the reset from the Clocking Wizard and to convert the input clock to a single-ended clock from a differential clock. A differential clock just means that there are 2 clock signals and they always have to be opposites of each other.
Here I switch the clock from “Differential Clock” to “Single-Ended Clock”
Here I get rid of the Reset signal, notice how the Reset Type gets grayed out automatically.
Now I add the “AXI Uartlite” IP. There is another UART IP that is available, but I have arbitrarily chosen to learn by using this one.
Now I want to customize the Uartlite IP, so just as before with the Clocking Wizard, I right-click (away from any terminals) and select “Customize Block”.
I set the Baud Rate to 128,000, I leave the number of data bits at 8, and I set even parity. Note that I chose to add parity because I want my UART connection to be more reliable and to receive less (if any) garbled text.
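To make the parity setting a little more concrete, here is a small Python sketch (not part of the project) of what even parity adds to each character. The function names are made up for illustration, and the 11-bit frame assumes one start bit and one stop bit around the 8 data bits and the parity bit.

```python
from typing import List


def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total number of 1 bits even."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return bin(byte).count("1") % 2


def uart_frame_bits(byte: int) -> List[int]:
    """Bits on the wire for one character.

    Assumed framing: start bit (0), 8 data bits LSB first,
    even parity bit, one stop bit (1).
    """
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [even_parity_bit(byte)] + [1]


def chars_per_second(baud: int, bits_per_frame: int = 11) -> float:
    """Upper bound on characters per second for a given baud rate."""
    return baud / bits_per_frame


if __name__ == "__main__":
    print(uart_frame_bits(ord("A")))
    print(round(chars_per_second(128_000)))
```

The receiver recomputes the parity over the 8 data bits and compares it with the received parity bit, so a single flipped bit is detected (not corrected). At 128,000 baud this framing tops out at roughly 11,600 characters per second.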
Now for the fun part… Click on “Run Connection Automation” and watch as Vivado wires up all of our IP and components together!
All of the default options are fine, but I have included screen shots so that you can see all the details yourself:
Now after this completes, the block design will look pretty messy, so click on the “Refresh” looking icon below to regenerate the layout:
Here is a cleaned up version of the Block Design. Notice that the “Run Block Automation” option is still there. Nothing has gone wrong, this text is there because we now have to wire up the Data and Instruction Caches.
Again, the default options are fine
And finally… our block design is ready.
Now we will generate an HDL Wrapper file. This is not required for the Block Design, but it will help us with the importing of this design in to LabVIEW later.
Now we will click on “Export->Export Block Design”, this will generate a tcl script that we can run or “source” from another Vivado project and this block design will be regenerated for us.
Note the location of the wrapper VHDL. Copy this file to your clipboard
Place it in to the root directory of your project, a location that you will commit to source control.
The Tcl file should also be in the same directory
Part 2 – (Optional) How to Recreate MicroBlaze Design from Source TCL Script
So Vivado is not like other programming languages where you create your gitignore file and commit the rest to source code control. In Vivado, you generate a TCL script that will re-generate your entire project – or in our case – a specific Block Design. This script will also import any other files such as VHDL files or constraints files that are required. In our case we have a very simple design that does not require any such helper files.
Click on Create New Project:
Same as before, first couple of steps just click “Next”
This time I will call the project “mb_uart.imp”, to differentiate it from the project that I created.
Again, RTL project, and do not specify any sources at this time
Same part as before. Not sure if you can import a block design to other FPGAs, perhaps if they have the same family or series, but I have not tried this out yet.
Click on “Tcl Console”
Change to the directory where the Tcl export script is located
Type dir if you like. Notice how commands that are unknown to Tcl are sent to the underlying OS for execution
And finally, “source” the tcl script
And here is the imported Block Design
And that’s it! Create an HDL Wrapper if you like
Part 3 – Bring Design in to LabVIEW FPGA
Now we have to create a CLIP (Component Level IP) Node in LabVIEW FPGA that will import this MicroBlaze Block Design. A CLIP node contains a top-level vhdl wrapper that usually instantiates the IP that we want to bring in to LabVIEW FPGA, but in this case I am creating a CLIP node that contains an empty wrapper for the MicroBlaze Block Design.
Launch LabVIEW 2017 (32-bit) from the start menu
Here is the screen that appears after you start LabVIEW
Click on “Create Project”, Blank Project should be fine.
Right-click on the “My Computer” icon and select “New->Targets and Devices…”
Select “New target or device” and select the PXIe-6592R FPGA board
Here is what the project looks like after adding the FPGA device/target:
Create a FIFO for communicating from the Host to the FPGA Target, aka “Host to Target – DMA” by clicking New->FIFO:
I follow a naming standard that ALE System Integration follows which is to prepend “HT” or “TH” to the name of each FIFO, where HT stands for Host to Target, and TH stands for Target to Host, so I name this FIFO “HT-UART_TX”:
I also set the Data Type of this FIFO to U8, because it will be used to receive characters
The same thing for the Receive FIFO, “TH” for Target to Host, and RX for Receive.
Data type, again is U8.
Now do you remember where the microblaze wrapper vhdl was located? Find it and copy it to the root of the LabVIEW project
The LabVIEW project is located in the MicroBlaze_UART/labview_fpga_uart directory:
Rename the file by prepending “UserRTL_” to the name of the file.
Edit the file, here is what it looks like before: (Sorry for the screen shot, I will provide source code links soon):
Change the name of the entity to match the file right now
Now we will create a CLIP, right-click on the FPGA target and select “Properties”
Add the VHDL file that we edited before – “UserRTL_d_microblaze_wrapper.vhd”
Then click Next to go to step 2 of 8. Here select the “Limited to families selected below” option:
Then click next to go to step 3, here you have to click on “Check Syntax” and a Xilinx application is run to check the syntax
Now for step 4, click on the “clock_rtl” signal and make sure it is set to “Signal type” of clock. Don’t worry about the reset signal, we want to control this ourselves.
For step 5, nothing is required, so skip and go to step 6
Same thing for Step 6, just click next
For step 7, just make sure that all signals are “Allowed” inside a Single-Cycle Timed Loop. You can probably also set this to required, but we will not be doing that today.
Step 8 – Click Finish.
We are finished, now a CLIP is available for us to use in our FPGA design, but it has not been instantiated. I believe you can have multiple instances of the same CLIP.
Before we go any further, we have to add a clock for the MicroBlaze. Our design will run at 100 MHz, so right-click on the 40 MHz clock and click on “New FPGA Derived Clock”
Type 100 in to the “Desired Derived Frequency” box, and everything else should automatically update. Then click OK
Now we also have to set our top-level clock to be 100MHz. So right-click on the FPGA Target and select properties, and in the following dialog select “Top-Level Clock” and select the new clock.
Now we will add an instance of the CLIP. Right-click on the FPGA Target and select “New->Component-Level IP”
Select the UserRTL_d_microblaze_wrapper
And on the “Clock Selection” page, select the 100MHz clock.
Now we will add some existing vi’s
And here is what our final project looks like:
Part 4 – Create LabVIEW Host Wrapper
LabVIEW Host applications run as native Windows processes
Now we will add the Host LabVIEW application VIs. A LabVIEW Host application is a native Windows executable and, thanks to a bunch of libraries written by National Instruments, it will handle all of our communications with the FPGA.
Add the existing files
Here is what the project looks like with all of the Host VIs.
Part 5 – Export Project to Vivado
LabVIEW has introduced a new feature, the ability to export an entire FPGA design to Vivado, which allows you to import any existing IP. All you have to do is define one or more CLIP IPs that define the interface with your design.
Right click on the Build Specification inside the FPGA Target and select “New->Project Export for Vivado”
Give the build specification a name, I recommend keeping it short and sweet. So change it from “My Project Export for Vivado Design Suite” to something like “FpgaUart”
I always set “Auto increment” to true.
Then go to the “Source Files” tab and select the “Fpga-Uart-Exercisor.vi” to be the top level
Now there is a new Build Specification. Click on Build
After it completes, you can launch Vivado by clicking on the button below.
Here is what the Vivado project looks like. All of the IP files are encrypted, except for the files we added for our CLIP. Since we added only one file, that is all we will see.
Now we will import our Block Design. Locate the “Tcl Console” in the bottom window.
This console supports Tcl commands as well as regular operating system commands. Since we are on Windows, we would like to change our current working directory to be where our d_microblaze.tcl file is located. Remember, Tcl uses the backslash ‘\’ as its escape character, so you will have to enter the backslash twice for each time you would like to use it.
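If you generate these console commands from a script, the escaping can be automated. Below is a tiny Python sketch with a made-up helper name that turns a Windows path in to a `cd` command for the Tcl console, either by doubling the backslashes or by switching to forward slashes, which Tcl also accepts on Windows. It assumes the path contains no spaces.

```python
def tcl_cd_command(windows_path: str, use_forward_slashes: bool = True) -> str:
    """Build a 'cd' command that is safe to paste in to a Tcl console.

    Assumption: the path has no spaces or other characters that
    would need quoting in Tcl.
    """
    if use_forward_slashes:
        return "cd " + windows_path.replace("\\", "/")
    # Double every backslash so Tcl does not treat it as an escape.
    return "cd " + windows_path.replace("\\", "\\\\")


if __name__ == "__main__":
    print(tcl_cd_command("E:\\git\\MicroBlaze_UART"))
    print(tcl_cd_command("E:\\git\\MicroBlaze_UART", use_forward_slashes=False))
```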
Now we have to source our file by issuing the “source d_microblaze.tcl” command:
This will take a couple of seconds, depending on the speed of your computer.
After it finishes importing/re-creating the design, this is what you should see:
Now we go back to our “UserRTL_d_microblaze_wrapper.vhd” vhdl wrapper file and remove the comments enabling the code that uses the MicroBlaze.
Now you will see that this new design appears under the VHDL wrapper file
Now we have to export this entire design to Xilinx SDK so we can generate an executable to run. Click “File->Export->Export Hardware”
We have not run “Generate Output Products” for the MicroBlaze, so we will be prompted to do so. Make sure you click “Generate Output Products”. If you are more experienced than me in Vivado, perhaps you know if this step is required.
The default directory is fine.
The directory will be named “FpgaUart.sdk”
Part 6 – Building a C Executable for Running on the MicroBlaze Soft-Core Processor
Run the executable by referencing the lvbitx file
Now open Xilinx SDK and select the “FpgaUart.sdk” directory as the workspace
Create a new “Hardware Platform Specification” project by clicking on “File->New->Other”
Select “Xilinx->Hardware Platform Specification”
Select the only file in the sdk directory, the .hdf file.
The Project name will be automatically populated
Here is what everything looks like after creating the Hardware Platform Specification
Now create an Application Project by selecting “File->New->Application Project”
Name the project “mb_uart_1” and click next.
Select “Empty Application”
Now we will add a new source file
We will call it main.c
Here is the application with an empty main.c file. Note that there are errors listed in the bottom window because this cannot build without a main function!
After I cut and paste the source code for a simple UART application, the errors go away.
Now I am creating a second project with the same name but with the number 2 instead.
I am going to use the same Board Support Package or “BSP” as before, and I am creating another Empty Application.
I add a new main.c as before.
I paste the same source code, but this time I replace all instances of “1.0” with “2.0”
I set the active configuration to Release for both projects. This isn’t really necessary for a small design such as this one, but it is a good habit to have.
Now we go back to Vivado and click “Tools->Associate ELF Files…”
We do not have to select an elf file for simulation, but you can if you wish to create a test bench for this project. I have done this in the past and it takes me about 3 hours to simulate 100 milliseconds. With the method described on this page, it becomes much easier to just swap out the elf file and to regenerate the bitstream.
I normally avoid shop words or “corporate speak” because I feel it dehumanizes us, but sometimes these phrases are necessary. So here is the “30,000 foot” view. And please pardon the appearance of my flow charts and diagrams, I am not a graphic designer…
All images open in a new tab, so just click on them with ease until I figure out how to make this WordPress theme wider.
Take a look at this:
There are four (4) 10 Gigabit ports available on this device, but I am using only one of them for this design.
The FPGA design contains a MicroBlaze soft-core Processor
This MicroBlaze soft-core Processor is connected to a LabVIEW for Windows Executable, which is the same thing as a standard .exe file.
Now some more details:
Any type of Ethernet Frame can enter the 10 Gigabit PHY, as shown in the diagram:
An Ethernet Frame with an ARP Request
An Ethernet Frame with an IPv4 Packet containing an ICMP packet, otherwise known as a “Ping” message
An Ethernet Frame with an IPv4 Packet containing a UDP packet. You may know this as the transport behind “multicast” or “broadcast” messages.
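To tell these three kinds of frames apart in software, you only need two header fields: the EtherType at byte offset 12 and, for IPv4, the protocol field at offset 9 of the IP header. Here is a minimal Python sketch, not taken from the project, that assumes untagged frames (no VLAN tag); the function name is made up for illustration.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
IPPROTO_ICMP = 1
IPPROTO_UDP = 17

ETH_HEADER_LEN = 14   # destination MAC, source MAC, EtherType
IPV4_MIN_HEADER = 20


def classify_frame(frame: bytes) -> str:
    """Classify an untagged Ethernet frame as ARP, ICMP, UDP or other."""
    if len(frame) < ETH_HEADER_LEN:
        return "too short"
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_ARP:
        return "ARP"
    if ethertype == ETHERTYPE_IPV4:
        if len(frame) < ETH_HEADER_LEN + IPV4_MIN_HEADER:
            return "truncated IPv4"
        protocol = frame[ETH_HEADER_LEN + 9]
        names = {IPPROTO_ICMP: "ICMP", IPPROTO_UDP: "UDP"}
        return names.get(protocol, "other IPv4")
    return "other"
```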
And finally, everything, the full shebang:
I have broken out the details of the hardware comprising the 10 Gigabit connection. Technically it consists of 4 SFP+ connectors going to a Multi-Gigabit Transceiver. Now normally I would think that there is some sort of chip in between the SFP+ and the FPGA, but I believe that National Instruments has used some Xilinx IP that handles this for us. See, I don’t even know what hardware is being used, but I am able to use it and make an FPGA-based Network Card!
The data from the Multi-Gigabit Transceiver then goes to the 10 GE MAC Core, which sends all received packets
First, a Definition
CLIP – Stands for Component Level IP and is a method of bringing non-LabVIEW FPGA code in to LabVIEW. Basically you take a synthesized design, wrap it up in some VHDL and import this VHDL file in to LabVIEW. In this case an instance of the OpenCores 10 GE MAC is being brought in to the design.
A – The reading of incoming frames is wrapped up in to a nice library by National Instruments, that even includes some IP to respond to ARP messages. I have stripped all of this usage out for this design because I wanted simplicity for learning. Anyway, on each clock cycle you have the following variables:
data valid [boolean]
data [64 bit WORD]
byte enables [array of 8 booleans]
End of Good Frame [boolean]
End of Bad Frame [boolean]
Here is a screenshot of the usage for this, it should be very easy to understand once you have an idea of what LabVIEW is and how it works.
B – Now the data coming from the 10 Gigabit PHY contains 64-bit WORDs, and 2 booleans, one for a good frame, and one for a bad frame. Now I do not know how to configure and properly use a 64-bit AXI-Data Stream FIFO with a MicroBlaze processor, so I had to convert this data manually myself. It did not take long, in fact I documented this in my log where it took me 1 hour and 15 minutes to do this following the LabVIEW FPGA State Machine paradigm. Think of the LabVIEW FPGA State Machine paradigm or pattern as the absolute best of both worlds in terms of VHDL/Verilog and LabVIEW.
So, we have data coming in what I call “AXI-64bit format” and we have to convert it/write it to a LabVIEW FIFO. Here is a close-up of this code:
The code above is running inside a loop clocked at 156.25 MHz, and on the left is how we get the data from the 10 GE MAC. If “data valid”, or “End of Good Frame” or “End of Bad Frame” are true, we enter the self explanatory Case Structure, which is the same thing as an if statement. Inside this case we package all the data in to a custom “Cluster type”, which is the same thing as a C structure and write it in to a LabVIEW FIFO.
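For readers who do not read LabVIEW diagrams, the same logic can be sketched in a few lines of Python. This is only a behavioral model under my own naming: `MacBeat` stands in for the custom Cluster, a `deque` stands in for the LabVIEW FIFO, and `on_clock` represents one iteration of the 156.25 MHz loop.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Tuple


@dataclass(frozen=True)
class MacBeat:
    """One clock cycle of output from the 10 GE MAC (the 'Cluster')."""
    data_valid: bool
    data: int                        # 64-bit WORD
    byte_enables: Tuple[bool, ...]   # 8 booleans, one per byte
    end_of_good_frame: bool
    end_of_bad_frame: bool


def on_clock(beat: MacBeat, fifo: Deque[MacBeat]) -> bool:
    """Write the beat to the FIFO only if it carries data or ends a frame."""
    if beat.data_valid or beat.end_of_good_frame or beat.end_of_bad_frame:
        fifo.append(beat)
        return True
    return False   # idle cycle, nothing is written
```

Idle cycles never reach the FIFO, so the consumer on the other side only sees elements that matter.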
C – Now we read one element on each clock cycle of the custom LabVIEW Cluster defined in step B, and convert this in to a 32-bit AXI Data Stream to be read by the MicroBlaze.
Here is a screenshot of the entire loop, which runs at 100MHz, because I clocked my MicroBlaze to that speed. I could probably increase my MicroBlaze to 156.25MHz, but that will decrease my productivity in terms of longer synthesis times.
I zoomed out a bit further for this screenshot and included the clock specifier, which is 100 MHz. Also notice how there is another “Case Structure” inside this loop, but the case is not “True”, but it is a State Machine with the “Read-Top” case showing as the default state. This state checks if the incoming data is valid, and if so writes the upper 32 bits of the 64-bit data WORD in to the AXI-Data Stream FIFO that is connected to the MicroBlaze.
Here is a close-up of the “Read-Top” state:
Here is the other state:
The “Read-Bottom” state. This state will write the lower half of the 64-bit WORD and will check if this is the final element in the Ethernet Frame. If this is the final element, it will enter the “Append-Size” state, which is incorrectly named, will fix that later “TODO: Rename Append-Size”. haha. Anyway, it will append some metadata indicating if this frame should be dropped or kept.
The final state – “Append-Size”:
This code is very simple. I set TKEEP to all ones, or 0b1111, and I set the first 2 bits of the 32-bit WORD to contain “End of Good Frame” and “End of Bad Frame”. Now why am I setting TKEEP to all 1s? Well, simple, because I haven’t implemented this part yet. However, by setting it to all 1s my code will still work because most TCP/IP stacks just ignore any padded 0s.
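The three states can be modeled in Python as follows. This is a sketch of the behavior described above, not a translation of the actual VI: the bit positions of the two flags (bit 0 for good, bit 1 for bad) are my assumption for illustration, and TLAST is left out.

```python
from typing import Iterable, List, Tuple

MASK32 = 0xFFFFFFFF
TKEEP_ALL = 0b1111

Beat64 = Tuple[int, bool, bool]   # (data64, end_of_good_frame, end_of_bad_frame)
Word32 = Tuple[int, int]          # (data32, tkeep)


def split_to_axi32(beats: Iterable[Beat64]) -> List[Word32]:
    """Turn 64-bit MAC words in to 32-bit stream words, one per 'clock'."""
    pending = list(beats)
    out: List[Word32] = []
    state = "Read-Top"
    index = 0
    data64, good, bad = 0, False, False
    while True:
        if state == "Read-Top":
            if index == len(pending):
                break                      # nothing left to read
            data64, good, bad = pending[index]
            index += 1
            out.append(((data64 >> 32) & MASK32, TKEEP_ALL))
            state = "Read-Bottom"
        elif state == "Read-Bottom":
            out.append((data64 & MASK32, TKEEP_ALL))
            last = good or bad
            state = "Append-Size" if last else "Read-Top"
        else:  # "Append-Size": one extra word of frame metadata
            flags = int(good) | (int(bad) << 1)   # assumed bit layout
            out.append((flags, TKEEP_ALL))
            state = "Read-Top"
    return out
```

A two-word good frame therefore becomes five 32-bit words: four data words followed by one metadata word with the value 1.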
D – Now the MicroBlaze has been programmed with a function that reads an incoming frame from AXI-Data Stream FIFO #0 and writes its contents to AXI-Data Stream FIFO #1. It also reads an incoming frame from AXI-Data Stream FIFO #1 and writes its contents to AXI-Data Stream FIFO #0. This is a simple passthrough that exercises my implementation of the FIFOs.
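The real passthrough is C code running on the MicroBlaze. Purely to show the data flow, here is a Python model of it in which each AXI-Data Stream FIFO has a receive side and a transmit side, and every element is a `(word, tlast)` pair. All names are invented for this sketch.

```python
from collections import deque
from typing import Deque, Tuple

Word = Tuple[int, bool]   # (32-bit data, tlast)


def forward_one_frame(rx: Deque[Word], tx: Deque[Word]) -> int:
    """Move one complete frame from rx to tx; return the words moved."""
    if not any(tlast for _, tlast in rx):
        return 0              # no complete frame has arrived yet
    moved = 0
    while rx:
        word, tlast = rx.popleft()
        tx.append((word, tlast))
        moved += 1
        if tlast:
            break
    return moved


def passthrough_step(fifo0_rx: Deque[Word], fifo0_tx: Deque[Word],
                     fifo1_rx: Deque[Word], fifo1_tx: Deque[Word]) -> int:
    """One pass of the loop: FIFO #0 in to FIFO #1, and FIFO #1 in to FIFO #0."""
    moved = forward_one_frame(fifo0_rx, fifo1_tx)
    moved += forward_one_frame(fifo1_rx, fifo0_tx)
    return moved
```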
A direct link to the source code of this C code that is running in the MicroBlaze:
E – Now what happens after the source code above executes? Well, data is read from the 10 Gigabit PHY and read on the first AXI-Data Stream FIFO and written out back to the rest of the FPGA via the second AXI-Data Stream FIFO. So we want to read this ethernet frame and write it up to the Host application running on normal/regular Windows. This is very simple, read data, write data to a Target-to-Host LabVIEW FIFO, and if tlast is equal to true, include this in the metadata, which for now is simply the upper half of the 64-bit WORD.
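As a sketch of that element layout in Python: the 32-bit data word goes in the lower half of the 64-bit FIFO element and the metadata goes in the upper half. Putting tlast in bit 32 is my assumption for illustration; the real bit position may differ.

```python
MASK32 = 0xFFFFFFFF
TLAST_BIT = 32   # assumed position of tlast inside the upper half


def pack_th_element(data32: int, tlast: bool) -> int:
    """Build one 64-bit Target-to-Host FIFO element."""
    return (int(tlast) << TLAST_BIT) | (data32 & MASK32)


def unpack_th_element(element: int):
    """Split a 64-bit element back in to (data32, tlast) on the host."""
    return element & MASK32, bool((element >> TLAST_BIT) & 1)
```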
F – Now how do you read this data on the host? Well, if you are familiar with LabVIEW, the code would look like this:
The green box contains a reference to the running FPGA. The first box on the left polls the FIFO to see if any elements are available, and if the number of elements available is greater than 0 it reads that number of elements.
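In text form, the polling pattern looks roughly like the Python below. `FakeTargetToHostFifo` is a stand-in I made up so the example runs anywhere; it is not the National Instruments API, only the same "ask how many, then read that many" idea.

```python
from collections import deque
from typing import List


class FakeTargetToHostFifo:
    """Hypothetical stand-in for a Target-to-Host DMA FIFO."""

    def __init__(self) -> None:
        self._queue = deque()

    def push(self, *elements: int) -> None:
        """Simulate the FPGA writing elements."""
        self._queue.extend(elements)

    def elements_available(self) -> int:
        return len(self._queue)

    def read(self, count: int) -> List[int]:
        if count > len(self._queue):
            raise ValueError("not enough elements in the FIFO")
        return [self._queue.popleft() for _ in range(count)]


def poll_once(fifo: FakeTargetToHostFifo) -> List[int]:
    """Read everything that is currently available, or nothing at all."""
    available = fifo.elements_available()
    if available > 0:
        return fifo.read(available)
    return []
```

Reading only what is already available means the host loop never blocks waiting on the FPGA.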
If you wanted to do this from C++, you could use LabWindows CVI to read from the FPGA interface as such:
Please note that you can also link to the LabWindows CVI library and use it from your existing C++ applications. Drivers for this specific board are only available for Windows, but if you are a big bank or financial firm with deep pockets, I’m sure you can set up some sort of agreement with National Instruments to port this code and drivers to <Operating System of your Choice>.
Okay, that is great, now what about writing data from the Host application back to the FPGA for sending out of the 10 Gigabit PHY? Well, you do the opposite, you enter the codes in reverse. (Spies Like Us).
Instead of a “Target-to-Host” FIFO, use a “Host-to-Target” FIFO, and in my case, I prepend the size in WORDs to the packet to be sent.
Again, the green box is a reference to the running FPGA. The square light green colored box is a function, also known as a “sub-VI” that I wrote that generates a UDP packet (or is it a UDP datagram? I forget). The output of this UDP packet is converted in to 32-bit WORDS by the box with a white background, and then the size is prepended to this array and written in to the “HT_WRITE” LabVIEW Host-to-Target FIFO.
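Here is the same packing step as a Python sketch. The big-endian byte order inside each 32-bit WORD and the zero padding of the last WORD are assumptions on my part, and the function names are invented.

```python
from typing import List


def bytes_to_words(payload: bytes) -> List[int]:
    """Convert a byte string in to 32-bit WORDs, zero-padding the tail."""
    padded = payload + b"\x00" * (-len(payload) % 4)
    return [int.from_bytes(padded[i:i + 4], "big")
            for i in range(0, len(padded), 4)]


def frame_for_ht_fifo(payload: bytes) -> List[int]:
    """Prepend the size in WORDs, ready for the Host-to-Target FIFO."""
    words = bytes_to_words(payload)
    return [len(words)] + words


if __name__ == "__main__":
    print([hex(w) for w in frame_for_ht_fifo(b"\x01\x02\x03\x04\x05")])
```

A 5-byte payload becomes a size of 2 followed by two WORDs, the second one padded with three zero bytes.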
G – So we are receiving data in the following format from the host:
We read the size and then the rest of the elements from the FIFO and write them to the 2nd AXI-Data Stream FIFO that is connected to the MicroBlaze. Again note that I have not yet fully implemented the proper usage of the TKEEP signals, so in this case the TKEEP signal can be dynamically set from the Host application for testing purposes.
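The FPGA side of this exchange is a length-prefixed reader. A Python model of it, with names of my own choosing and the TKEEP handling left out, might look like this:

```python
from itertools import islice
from typing import Iterable, List


def read_frames(stream: Iterable[int]) -> List[List[int]]:
    """Split a stream of [size, word, word, ...] records in to frames."""
    iterator = iter(stream)
    frames = []
    for size in iterator:                      # first element is the size
        frame = list(islice(iterator, size))   # then 'size' data WORDs
        if len(frame) != size:
            raise ValueError("stream ended in the middle of a frame")
        frames.append(frame)
    return frames
```

For example, `read_frames([2, 10, 11, 1, 99])` yields two frames, `[10, 11]` and `[99]`.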
H – Now that the MicroBlaze has read our outgoing ethernet frame on FIFO #1 and has written the same outgoing frame on FIFO #0, we have to convert this 32-bit AXI Data Stream in to 64-bit words that are suitable for our 10 Gigabit Ethernet PHY.
This time however, I used a proper state machine and named all of the states correctly.
The left-most box connects the signals from the MicroBlaze to LabVIEW and wires them in to the state machine. If the data is valid and it is not the last element, the top half is stored in to a shift-register and the next state is “Read-Bottom”. Here is a close-up of the “Read-Top” state:
And here is a close-up of the “Read-Bottom” state:
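The two states can be modeled in Python as below. This is a behavioral sketch only. What happens when the last element arrives while in “Read-Top” is not spelled out above, so emitting it with a zeroed lower half is an assumption of mine.

```python
from typing import Iterable, List, Tuple

MASK32 = 0xFFFFFFFF

Beat32 = Tuple[bool, int, bool]   # (valid, data32, tlast)
Word64 = Tuple[int, bool]         # (data64, last)


def pack_axi32_to_64(beats: Iterable[Beat32]) -> List[Word64]:
    """Combine pairs of 32-bit stream words in to 64-bit WORDs."""
    out: List[Word64] = []
    state = "Read-Top"
    top = 0                        # plays the role of the shift register
    for valid, data32, tlast in beats:
        if not valid:
            continue               # stay in the current state
        data32 &= MASK32
        if state == "Read-Top":
            if tlast:
                # Assumed behavior: flush a lone top half, bottom zeroed.
                out.append((data32 << 32, True))
            else:
                top = data32
                state = "Read-Bottom"
        else:  # "Read-Bottom"
            out.append(((top << 32) | data32, bool(tlast)))
            state = "Read-Top"
    return out
```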
I – Now we have written the 32-bit data coming from the MicroBlaze in to a LabVIEW FIFO with 64-bit data WORDS and we have to write this out using the CLIP, via the National Instruments provided wrapper.
Look at how simple and beautiful this code is!
You can look at the source code here: (you must clone the repository to your local machine to see it easier, just clone it and open the 2nd file – the html file)
So what now? Do I continue cleaning up the code and updating documentation? Do I make a youtube video demonstrating this? Do I modify the MicroBlaze code to no longer just be a “passthrough” but instead to send all data through the lwIP TCP/IP stack? If I do this, I will have to modify the elf file (compiled binary) that is embedded in to my design, breaking this design. I could make multiple Xilinx checkpoints and it would work, but that would confuse all of my readers… Man, decisions, decisions.
How about this, I finalize this project, make a new sub-directory in the source code and make a brand new LabVIEW FPGA project and this time I will use the lwIP version of the source code and I will make sure that everything is reproducible. It is raining now anyway and I want to stay inside and code…
The 10 Gigabit MAC/transceiver gives me 64 bit data words. I currently think I am giving and getting 64 bit data words, but I am really only using 32 bits. I came to this conclusion after I tried reading a 64 bit word and saw the data was simply two repeated 32 bit words. Additionally some random person on the internet said that the MicroBlaze data bus is 32-bit and you have to use some sort of data width converter ip.
Out of luck… I don’t know how to use the converter, but I am sure there is a way to properly convert this by using LabVIEW FPGA. So for starters, this means I can remove my AXI4 Stream Data FIFOs and keep the two 32-bit versions. I’ll also throw in support for TKEEP while I am at it.
So the “Receive Ethernet Frame” code from the 10 Gigabit transceiver/MAC looks like this:
I have to convert this 64-bit data stream in to a 32-bit data stream before I send it in to the MicroBlaze. Here is the current/erroneous implementation:
So what do I have to do? I have to read one element from the LabVIEW FIFO – the FIFO on the left – write the upper half of the 64 bit word in one cycle, and then not read from the LabVIEW FIFO for the next clock cycle while I write the lower half of the 64 bit word. Want to see the power of LabVIEW? It is 7:22 AM right now… [elevator music/jeopardy music starts playing in the background]
Now it is 8:07 AM and I have finished re-factoring this loop. I am writing the upper half of each 64 bit word in one cycle, and am writing the bottom half during the next clock cycle. I am also keeping the logic that appends an extra word which contains the “EndOfGoodFrame”, and “EndOfBadFrame” boolean values. Since I am writing 32-bit words now, I am only appending one word.
Here is the full loop:
And a close up of Case 0 of the innermost Case Structure:
And a close up of Case 1:
I now have to do this for the other direction – convert a LabVIEW FIFO packet to an AXI 32-bit stream. Here is the current implementation:
The signal on AXI_STR_TXD_data is a U32 and I have to collect 2 of these values and insert them in to the FIFO on the right side. I am going to have to think about this for a bit, but I have to get ready and go to work. So I may not finish this before leaving.
Thanks and have a nice day!
Update: Okay, this is not that pretty, but here is my first-cut “20 minutes” version:
Now I have to go and get ready! But I’ll be sure to set everything to synthesize before I leave…
I wired up the 10 gigabit ethernet MAC to my MicroBlaze instance to my host computer and compiled/synthesized everything. I then turned on my “quiet” PXIe-1062Q and fired up my tester application and it did not work… I opened up an isolated tester – “Fpga-Mac-Top.vi”, and it worked. I opened up the isolated MicroBlaze tester – “Fpga-MicroBlaze-Top.vi”, and nothing. Not even a read from the GPIO.
This is quite strange… why is it not working? I spent some time looking over everything, re-generating output products, synthesizing from Vivado, bringing the design back in to LabVIEW, and long story short I was not setting the MicroBlaze Reset to ACTIVE_LOW, whereas in all of my previous designs I was setting it to ACTIVE_HIGH. Anyway, while I wait for it to compile, I have something to say. Which do you prefer? Using an IP Integration Node or a CLIP (Component Level IP) for using a MicroBlaze Processor from LabVIEW?
Well, first off, let me link to some National Instruments documentation on both:
And now let me show you some screen shots. Here is a close up of what using an IP Integration Node looks like: (right-click to open in a new window for a larger version until I figure out how to modify this wordpress theme to be wider)
Here is a zoomed out version of this same VI:
And finally, what it looks like without an IP Integration node, but with a CLIP (Component Level-IP):
Can you see the difference? I can… for starters, I can read the full name of each signal when using a CLIP. Additionally, with a CLIP I can split up my nodes in to separate locations, so that I can organize my VI in a much cleaner way. And finally, since I can read the full signal name when using a CLIP node, I no longer have to hover over each signal to get the signal name, thus removing any reason for having comments as in the IP Integration Node version.
Anyway, a CLIP node is my recommended method of using LabVIEW FPGA to import Xilinx Vivado IP.
Also, this code was from a project that I implemented in order to learn how to use the AXI Stream FIFO inside of LabVIEW via a MicroBlaze. In other words, how to communicate with a MicroBlaze processor via an AXI Stream FIFO from LabVIEW FPGA.
Ever since I started developing this LabVIEW FPGA project that uses a MicroBlaze soft processor to process TCP streams, I have learned a lot and can comment on the pros and cons of using LabVIEW FPGA vs using a traditional Xilinx/Altera based FPGA development approach.
For starters, LabVIEW FPGA blows every single other FPGA development system out of the water when it comes to developing prototypes. I made a prototype for implementing a Monero miner in record time. I don’t remember how long it took, but you can see my commit history here: https://github.com/JohnStratoudakis/CryptoCurrencies
Then I decided that I wanted to make my orderbook support TCP/IP, which is what most Market Data Feeds are using, so I embarked on learning how to make LabVIEW FPGA play well with Xilinx Vivado. I did not realize it at the time, but the knowledge I have gained over the past year is enough to make one not have to live with any of the cons that LabVIEW FPGA comes with.
I have learned how to integrate basic VHDL/Verilog IP in to a LabVIEW FPGA project.
I have learned how to integrate more complex Xilinx IP such as Adder/Subtractors, Fast Fourier Transforms, and AXI Stream FIFOs.
I have learned how to integrate an entire soft-core processor based system in as well. Including both the simplified MicroBlaze MCS, and the more complex MicroBlaze processors developed by Xilinx.
Furthermore, I have been able to communicate between LabVIEW FPGA and the MicroBlaze processor via AXI Stream FIFOs, General Purpose Input/Output registers, and have implemented Interrupt handlers.
Using all of this together, I can very efficiently develop the perfect prototype that uses existing Xilinx IP, IP from opencores.org, or proprietary IP that can use a MicroBlaze soft-core processor, all from within LabVIEW FPGA. This serves as a great risk-mitigating factor in that one can tell if an FPGA will be a viable solution for a particular type of problem. Then, one can choose to keep the LabVIEW FPGA implementation and scale it out, or one can rewrite the portions written in LabVIEW in another language such as Verilog or VHDL.
Usually, the first product that works is what makes it to market and is successful, not because it is the best, but because it is the most adaptable to change. Think Evolution… think VHS, think DVDs, think about the iPod. These products were market leading because they got the job done right now, not later when all of the features were fully implemented. Additionally these products were easy to use.
Anyway, I have fully wired up the 10 Gigabit Transceiver to my MicroBlaze, and have wired the MicroBlaze to my host application, and I am anxiously waiting for FPGA synthesis to complete so I can test it out…
I spent some time analyzing the Monero CryptoCurrency source code to understand the algorithm, how it works and to see if it is doable with an FPGA via LabVIEW for FPGA, our secret weapon.
I learned that there are 4 steps to the Monero “CryptoNight” algorithm and that step 3 is the part that does the heavy lifting, with around 500k reads and writes to a small section of memory that is 2 megabytes in size. This section of memory was specifically selected to be a size that coincides with the size of most processor Level 3 caches. This is supposed to be what makes the algorithm “memory-hard”.
Locks are meant to be broken, codes cracked… and secrets revealed.
I am thinking – what if I put step 3 inside an FPGA and have it use Block RAM?
Block RAM is limited on an FPGA, so this may not be worthwhile.
Okay, what about DRAM?
My FPGA may have DDR3 RAM, but other FPGAs have faster RAM. If my implementation works well on DDR3 RAM, then I can move it to another FPGA with faster RAM.
Will an FPGA using DRAM be faster than a CPU using its L3 cache, taking into account of course that the FPGA is the only user of this DRAM controller? What about an FPGA with multiple DRAM controllers?
Well, I know that DRAM is “slow” when compared to other types of memory, but the difference here is that the FPGA is the only user of the DRAM controller. On any operating system, there are many users, i.e. programs, processes, kernel threads. So would doing this from an FPGA make the cut? Would it make that much of a difference?
Well, there is only one way to find out. Try it out!
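Before trying it out, the questions above can be framed with some back-of-the-envelope arithmetic. The Python sketch below is only that, a sketch under stated assumptions: it assumes a block RAM primitive with 32 Kb of usable data (as on Xilinx 7-series parts, where a 36 Kb block includes parity bits), it assumes every access in step 3 depends on the previous one so that accesses cannot overlap, and the latency values passed in are placeholders, not measurements. The function names are made up for this illustration.

```python
# Back-of-the-envelope numbers for CryptoNight step 3 (a sketch, not a benchmark).

SCRATCHPAD_BYTES = 2 * 1024 * 1024  # the 2 megabyte memory section from the text
ACCESSES = 500_000                  # "around 500k reads and writes" from the text


def bram_blocks_needed(total_bytes, usable_bits_per_block=32 * 1024):
    """Number of block RAMs needed to hold total_bytes (rounded up).

    Assumption: each block offers 32 Kb of data storage.
    """
    total_bits = total_bytes * 8
    return -(-total_bits // usable_bits_per_block)  # ceiling division


def step3_seconds(accesses, latency_ns):
    """Time for step 3 if every access waits for the previous one.

    Assumption: accesses are strictly serialized and each costs latency_ns.
    """
    return accesses * latency_ns * 1e-9


if __name__ == "__main__":
    print("Block RAMs for the scratchpad:", bram_blocks_needed(SCRATCHPAD_BYTES))
    # Placeholder latencies, chosen only to show how the estimate scales.
    for label, latency_ns in [("hypothetical fast memory", 10),
                              ("hypothetical slow memory", 100)]:
        seconds = step3_seconds(ACCESSES, latency_ns)
        print(f"{label}: {seconds * 1e3:.1f} ms per hash for step 3 alone")
```

Under these assumptions the scratchpad needs 512 block RAMs for a single instance, which is why Block RAM looks tight, and the time per hash scales linearly with whatever access latency the chosen memory actually delivers.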
I have created a github repository with my work so far here:
I then implemented the same algorithm, based on the same source file by using LabVIEW for Windows. The values match, so we have a working C++ version, a working LabVIEW for Windows version, and now we can determine if an FPGA version will be worth it.
Please note that the LabVIEW version is not optimized code, and I am not a LabVIEW for Windows Developer, which is probably why it runs so slowly… for now. Yes, it takes over an hour to create one hash. However, I have consulted with some LabVIEW experts, and they have told me what I should do to make it faster. I will start working on that, and in the meantime, you can take a look at the ever-changing source code to see what the algorithm involves. Remember, LabVIEW code is very easy to understand, so this may be the “flow-chart” explanation of what a cryptocurrency miner looks like.
Now if you do not have access to LabVIEW from your current machine, I have included a screen shot for each VI with the words “Front” or “Back” added to the filename, and in the case where there are many case structures, I have added the case structure element number.
The example has three features:
Send a packet of data over the IO Bus to the MicroBlaze MCS and read the same packet back over the IO Bus
Write a value to GPI channel #1 and read the value multiplied by 2 over GPO channel #2
Read the values of GPO channels 1, 2, and 3
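To make the expected behavior of these three features concrete, here is a small host-side reference model in Python. It is not the firmware (which runs as C on the MicroBlaze MCS) and it does not talk to any hardware. The class and method names are hypothetical, the doubled value is assumed to wrap at 32 bits, and GPO channels 1 and 3 are simply left at 0 because the text does not say what they carry.

```python
# Host-side reference model of the example's behavior (illustration only).

MASK32 = 0xFFFFFFFF  # assumption: GPI/GPO channels are 32 bits wide


class McsExampleModel:
    """Hypothetical model: IO Bus echo plus GPI1 -> GPO2 doubling."""

    def __init__(self):
        # GPO channels 1..3; channels 1 and 3 stay at 0 in this model
        # because their real contents are not described in the text.
        self.gpo = {1: 0, 2: 0, 3: 0}

    def io_bus_echo(self, packet):
        """Feature 1: the same packet comes back over the IO Bus."""
        return bytes(packet)

    def write_gpi1(self, value):
        """Feature 2: a value written to GPI channel 1 appears doubled on GPO channel 2."""
        self.gpo[2] = (value * 2) & MASK32

    def read_gpo(self):
        """Feature 3: read GPO channels 1, 2 and 3."""
        return (self.gpo[1], self.gpo[2], self.gpo[3])


if __name__ == "__main__":
    model = McsExampleModel()
    print(model.io_bus_echo(b"\x01\x02\x03"))
    model.write_gpi1(21)
    print(model.read_gpo())
```

A model like this can be handy when checking the values shown on the LabVIEW front panel against what the firmware is supposed to produce.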
Now I am continuing to work on integrating the 10 gigabit ports with the MicroBlaze MCS and on getting the lwIP TCP/IP stack working on this board, the NI PXIe-6592R.
This guide shows how to use the version of Xilinx Vivado that is bundled with the “LabVIEW 2017 FPGA Module Xilinx Compilation Tools” to create a Vivado FPGA design that uses a MicroBlaze MCS core, to create and overlay an executable on top of that core using Xilinx SDK 2015.4, and finally how to import this design and to run it on the National Instruments PXIe-6592R High-Speed Serial Instrument.
Step 4 – Just click next, I did not set anything here
Step 5 – Make sure the “Target language” is VHDL.
Step 6 – Just click next in the Add Existing IP page
Step 7 – Click next, we are not adding any constraints, nor do we have to in this project.
Step 8 – Select the appropriate FPGA part
We are using the PXIe-6592R board for this example and the FPGA has the following specifications:
Speed Grade: -2
Part #: xc7k410t
Step 9 – Click finish to create your new project
Step 10 – Here is what the project looks like after creation.
Step 11 – Click “Create Block Design”
Vivado makes it easy to create designs. By clicking on “Create Block Design”, you can make a design that uses several cores and that is easy to synthesize, package and export to an application such as LabVIEW.
Step 12 – Name the design
I like using the d_ prefix followed by a short description of my design. Since we are using the MicroBlaze MCS, I name my design “d_mcs”
Step 13 – Here is the blank design
Step 14 – Click the “Add IP” button to get a list of all Xilinx available cores
Step 15 – Make sure you select the “MicroBlaze MCS” and not the “MicroBlaze”
The MicroBlaze core is more customizable and supports more features. I will cover it in a future article.
Step 16 – After adding the MicroBlaze MCS core. Notice that no peripherals have been added, nor has the core been configured.
Step 17 – Right click (away from any terminals) and customize the block.
Step 18 – Configure the MicroBlaze MCS core
Step 19 – Set the memory size to 64 KB and enable the IO Bus if you like. I will use the IO Bus in a future article
Step 20 – Enable the General Purpose Output (GPO) channel 1, 32 bits is fine.
Step 21 – Enable the General Purpose Input (GPI) channel 1, 32 bits is fine.
Step 22 – Click on “Run Connection Automation” at the top of the window containing the design. Check the Clk box.
Step 23 – The defaults for GPIO2 should be fine as well.
Step 24 – Same for Reset. Note how the Reset Polarity is set to ACTIVE_HIGH. This will not matter for our design, but it will in other cases. For example, when I was working with the Arty Artix-7 board, I had to flip the reset polarity for that board to work.
Step 25 – Now if you enabled the IO Bus, you have to manually make it external. Make sure you right-click and that the cursor becomes pencil-like as shown below.
Step 26 – Click “Make External”
Step 27 – Here is what it looks like after clicking “Make External”
Step 28 – Now go back to the block design and right-click on the design containing the MicroBlaze MCS and select “Generate Output Products”
Step 29 – Global should be fine.
Step 30 – When Vivado is finished, you should see the following.
Step 31 – Now right-click on the design and select “Create HDL Wrapper”
Step 32 – Either option should be fine here, but I like having Vivado manage this for me automatically in case I make changes to my block design.
Step 33 – Notice how there is a new VHDL file that contains an instantiation of everything in our Block Design.
Step 34 – A preview of the contents of the VHDL design wrapper. From LabVIEW we will be importing this wrapper.
Section 2 – Xilinx SDK: Write a C program
Step 1 – Now we take a break from Vivado and launch the Xilinx SDK. You can normally export the hardware from Vivado and ask it to launch the SDK, but there is a bug in this version of Vivado (2015.4) that prevents us from doing so only in the case that our design is using the MicroBlaze MCS. Note that this does not apply to designs using the MicroBlaze core.
Step 2 – Create a directory named “MicroBlaze_Mcs_GPIO.sdk” as a sub-directory inside the Vivado project directory and set this to be your workspace.
Normally, if you select “File->Export->Hardware”, this directory will automatically be created for you, but remember that the hdf file that will exist in the root directory will not work due to a bug in Vivado 2015.4, so I typically create the directory myself.
Step 3 – Click “File->New->Other” to get to the New Project Wizard.
Step 4 – Select Hardware Specification.
Step 5 – Click the “Browse” button and add the following file. Make sure you select the file ending with “_sdk.xml”.
The full path from the root of the Vivado project is as follows:
Step 4 – Click OK. The new elf file should automatically be selected.
Step 5 – Now run the Synthesis.
Step 6 – You can verify that synthesis is running by looking at the top-right of the Vivado Window.
Step 7 – After a couple of minutes when Synthesis finishes, click Cancel, because we do not want to run the implementation. LabVIEW will handle that!
Step 8 – Write a checkpoint by clicking “File->Write Checkpoint”
Note that you must write a Synthesized Checkpoint, which means that you have to have followed the steps above and not have run Implementation. If you run implementation, the checkpoint file will be larger than if you only ran synthesis. In case you made a mistake and implemented your design, simply open the Synthesized design and write a new checkpoint.
Section 4 – LabVIEW 2017
Step 1 – LabVIEW 2017 splash screen. I like the new look.
Step 2 – Create a new Project.
Step 3 – Blank Project is fine.
Step 4 – Here is what an empty project looks like before you save it.
Step 5 – Add a new target, for this tutorial we are going to be using the PXIe-6592R High-Speed Serial Instrument.
Step 6 – If you do not have live hardware plugged in, select “New target or device”. If you do, it should show up automatically.
Step 7 – After adding the PXIe-6592R
Step 8 – Click File->Save and save the project.
Step 9 – Add a new FPGA-scoped VI. Make sure you right-click on the FPGA-target for this.
Step 10 – I opened the VI and cleaned up the windows and show the block diagram here.
Step 11 – Now click “File->Save” to save the FPGA-scoped VI
Step 12 – I usually save FPGA-scoped VIs in a sub-directory named after the FPGA target. In this case “Fpga-6592”
Step 13 – Right-click anywhere in the blank white space and select “Timed Loop” to add a Single-Cycle Timed Loop.
Step 14 – Add a new FPGA clock since the default of 40 MHz is not suitable for our 100 MHz MicroBlaze MCS.
Step 15 – Just type 100 in the “Desired Derived Frequency” box and click OK.
Step 16 – Now right-click on the clock input to the Single-Cycle Timed Loop and select Create->Constant
Step 17 – Here is what the loop looks like before selecting the clock type.
Step 18 – The dropdown should reveal 2 clocks. Select 100 MHz.
Step 19 – After selecting 100 MHz
Step 20 – Now to configure the CLIP node. CLIP stands for Component Level IP.
Step 21 – Click Component-Level IP on the left, and then click on the Create File icon on the right.
Step 22 – Add the checkpoint dcp file, and the wrapper vhdl file.
Step 23 – Depending on your target, you may have to limit the device families.
Step 24 – Click Check Syntax. This requires the Vivado Compilation Tools to be installed in order to work.
Step 25 – Set the reset_rtl Signal type to be reset, the clock_rtl signal type to be clock, and set the data type for the gpio_rtl_tri_i and gpio_rtl_tri_o to be U32.
Step 26 – Nothing to do here, just click next.
Step 27 – Nothing to do here, just click next.
Step 28 – Use the shift key and the mouse to select all signals on the left, then set them all to require the clock_rtl clock domain and to be used inside a Single-Cycle Timed Loop.
Step 29 – Click Finish
Step 30 – And now you have a CLIP available to your project.
Step 31 – Now create an instance of this CLIP by clicking New->Component Level IP
Step 32 – Select the IP from the drop down. I usually name the instance to match what is in the wrapper vhdl file. In older versions of LabVIEW this was required, but I am not sure if that is still the case.
Step 33 – Select the appropriate clock
Step 34 – What the project looks like with the added CLIP after expanding it.
Step 35 – Inside the Single-Cycle Timed Loop, right-click and select an “I/O Node”
Step 36 – From here select the gpio_rtl_i, and gpio_rtl_o signals.
Step 37 – Add a control to the input, and an indicator to the output.
Step 38 – Create a Build Specification
Step 39 – And build it!
You can now run the top-level VI. Video demonstration to come shortly.
This post will cover the next iteration of implementing an OrderBook inside an FPGA that is based on a NASDAQ ITCH 4.1 market data feed.
Some time has passed and I have finally found enough time to finish all the code changes required for the two (2) components listed below, along with the requisite test harnesses to validate.
Starting off, here are the components of an FPGA-based OrderBook
FPGA loop that listens for incoming data from a Network Interface Card, then parses, filters and translates each incoming message and sends the appropriate message/command to the OrderBook loop.
FPGA loop that reads and writes Orders to memory using an insertion sort algorithm. The OrderBook is currently able to support only one instrument and one side. Its capacity is 1,000 elements, which through the power of LabVIEW for FPGA can be easily adjusted, but that is not important right now. The OrderBook currently supports two commands: add order and get all orders. The get all orders command is meant to be used by a user or client application for trading and other purposes.
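For readers who think better in text than in block diagrams, the OrderBook loop described above can be sketched in Python. This is a software illustration of the idea, not a translation of the LabVIEW code. The class and field names are hypothetical, and the sort direction is an assumption: the text only says one side is supported, so this sketch pretends it is the buy side and keeps the highest price first.

```python
# Insertion-sorted, fixed-capacity order book for one instrument and one side.
# Illustration only; names and sort direction are assumptions.

from dataclasses import dataclass

CAPACITY = 1000  # capacity stated in the text


@dataclass(frozen=True)
class Order:
    order_ref: int
    price: int   # fixed-point integer price
    shares: int


class OrderBook:
    def __init__(self, capacity=CAPACITY):
        self.capacity = capacity
        self._orders = []

    def add_order(self, order):
        """Insert one order, keeping the list sorted. Returns False when full."""
        if len(self._orders) >= self.capacity:
            return False
        self._orders.append(order)
        i = len(self._orders) - 1
        # Shift lower-priced orders one slot to the right until the new
        # order's position is found (one step of insertion sort).
        # The strict '<' keeps earlier orders ahead at equal prices.
        while i > 0 and self._orders[i - 1].price < order.price:
            self._orders[i] = self._orders[i - 1]
            i -= 1
        self._orders[i] = order
        return True

    def get_all_orders(self):
        """Return a copy of all orders, best price first."""
        return list(self._orders)


if __name__ == "__main__":
    book = OrderBook()
    for ref, price in [(1, 100), (2, 105), (3, 100), (4, 103)]:
        book.add_order(Order(order_ref=ref, price=price, shares=10))
    for order in book.get_all_orders():
        print(order)
```

Because each add only shifts elements until the right slot is found, the book is always sorted and the get all orders command never has to sort anything, which maps naturally onto a loop that reads and writes a fixed block of memory.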
Using Test Driven Development, Here Are the Test Harnesses
ITCH Parser Test Harness
Input: A file containing raw ITCH 4.1 market data messages (generated using createItch.py)
Output: Array of OrderBook operations
OrderBook Test Harness
Input: An array of OrderBook operations
Output: A sorted array of Orders
What Does a Test Harness Look Like?
Here is a screenshot of the Front Panel diagram for the ITCH Parser Test Harness:
and here is a flow chart of what is going on:
What exactly is going on? The VI Host-ItchParser-TestHarness.vi reads the file containing raw NASDAQ ITCH Market Data messages and sends them into the FPGA Test Harness via the Host-to-Target DMA FIFO “HT-TEST_IN”. The FPGA test harness is located in the “Tests” folder and is named “Fpga-ItchParser-TestHarness.vi”. This Test Harness passes the raw ITCH data as is to Fpga-ItchParser.vi, which parses, normalizes and filters each message, keeping Add Order message types only, and only for symbol AAPL. It then sends an OrderBook command for each appropriate message back out to the FPGA test harness, which sends the results up to the host.
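The parse-and-filter step can also be sketched on the host in Python, which is useful as a reference when checking the FPGA output. Several things here are assumptions and should be checked against the official ITCH 4.1 specification and against createItch.py: the 30-byte big-endian Add Order layout used below, the 2-byte length prefix in front of each message, and the shape of the resulting OrderBook command. The function names are made up for this illustration.

```python
# Host-side sketch: build and parse ITCH-style Add Order messages.
# Field layout and framing are assumptions, not taken from createItch.py.

import struct

# Assumed Add Order layout (big-endian, 30 bytes):
# type(1) timestamp_ns(4) order_ref(8) side(1) shares(4) stock(8) price(4)
ADD_ORDER = struct.Struct(">cIQcI8sI")
LENGTH_PREFIX = struct.Struct(">H")  # assumed 2-byte length before each message


def make_add_order(ts_ns, order_ref, side, shares, stock, price):
    """Build one length-prefixed Add Order message (side is b'B' or b'S')."""
    body = ADD_ORDER.pack(b"A", ts_ns, order_ref, side, shares,
                          stock.ljust(8).encode("ascii"), price)
    return LENGTH_PREFIX.pack(len(body)) + body


def parse_add_orders(stream, symbol="AAPL"):
    """Return OrderBook 'add' commands for Add Order messages on one symbol."""
    commands = []
    offset = 0
    while offset + LENGTH_PREFIX.size <= len(stream):
        (length,) = LENGTH_PREFIX.unpack_from(stream, offset)
        offset += LENGTH_PREFIX.size
        body = stream[offset:offset + length]
        offset += length
        # Skip every message type except a well-formed Add Order.
        if len(body) != ADD_ORDER.size or body[:1] != b"A":
            continue
        _, ts_ns, order_ref, side, shares, stock, price = ADD_ORDER.unpack(body)
        if stock.decode("ascii").rstrip() != symbol:
            continue
        commands.append({"op": "add", "order_ref": order_ref,
                         "side": side.decode("ascii"),
                         "shares": shares, "price": price})
    return commands


if __name__ == "__main__":
    data = (make_add_order(1, 10, b"B", 100, "AAPL", 1500000)
            + make_add_order(2, 11, b"S", 50, "MSFT", 300000))
    print(parse_add_orders(data))
```

The output of a function like parse_add_orders is the same kind of array that the ITCH Parser test harness returns, so the two can be compared element by element.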
And for the OrderBook Test Harness
Here is what it would look like in a production system
Why is this so important?
Well, normally to create an FPGA based anything, one needs to use Verilog, VHDL or any of numerous “high-level” design languages. Here you can accomplish the same thing with a really great programming interface that matches the Verilog programming model, only graphical.
This means you can create a custom FPGA based solution, reduce your datacenter power usage, increase your application’s performance, and reap the rest of the great benefits of FPGA-based computing.
I encourage you to download the source code for this and to see for yourself what LabVIEW for FPGA can do for you and to then try it in one of your own applications.
Stay Tuned… What is next?
Hook the ITCH Parser up to an actual Network Interface Card, preferably a 10 Gigabit one, since I already own the hardware to do so.
Hook up either a MicroBlaze processor or the host computer to the OrderBook so that something can be done with the OrderBook data itself.