Difference between revisions of "Tahoe-LAFS"

From the Linux and Unix Users Group at Virginia Tech Wiki
 

Revision as of 05:47, 30 December 2014

Tahoe-LAFS is a distributed filesystem which provides redundancy and security for files.

Connecting to VTLUUG's Tahoe Grid

VTLUUG now operates an onion grid, a grid of Tor hidden services. Every node must be run through Tor using torify, and storage nodes must also advertise a Tor hidden service.

Storage nodes

To connect a storage node, do the following:

  • Install the latest version of Tor and torsocks. Enable tor to start at boot.
    • RPMs for RHEL-based distros: https://www.torproject.org/docs/rpms.html.en
    • DEBs for Debian-based distros: https://www.torproject.org/docs/debian.html.en
    • Available in Arch's community repo
  • Install various dependencies.
    • On CentOS 6, you'll need to yum -y install libffi libffi-devel python-devel openssl-devel
  • Get the latest version of Tahoe-LAFS. Until mhazinsk's patch for Torsocks 2.x gets merged, clone it from his GitHub fork (https://github.com/matthazinski/tahoe-lafs).
  • Create a hidden service by editing the torrc file, usually found at /etc/tor/torrc. Add the following:
 HiddenServiceDir /var/lib/tor/tahoe_storage/ 
 HiddenServicePort 4456 127.0.0.1:4456
  • Get the hostname for the hidden service by restarting tor and running cat /var/lib/tor/tahoe_storage/hostname
  • cd to where you cloned the Tahoe-LAFS repo and do the following:
    • python setup.py build to build the necessary binaries
    • bin/tahoe create-node path to create a Tahoe directory in the given path. Note that this directory will be used for both configuration data and encrypted blob storage.
    • vim path/tahoe.cfg and make it look like the following:
 [node]
 # Nicknames are optional but useful
 nickname = mhazinsk-2  
 # Optional web interface. 
 web.port = tcp:3456:interface=127.0.0.1  
 web.static = public_html
 # This must match the port you defined in torrc.
 tub.port = tcp:4456:interface=127.0.0.1 
 tub.location = yourhiddenservicehostname.onion:4456
 
 [client]
 introducer.furl = pb://getthisstringfromanofficer@hiddenservice.onion:37204/otherstuff
 
 [storage]
 enabled = true
 # reserved_space is the free disk space Tahoe leaves untouched, so lower
 # it on a small disk; offering less than a few tens of GB is not useful
 reserved_space = 100G 
 expire.enabled = false
 
 # Read tahoe's docs if you want to use the other options
 [helper]
 enabled = false
 
 [drop_upload]
 enabled = false 
  • Finally, run torify bin/tahoe start path. This will daemonize.
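The tahoe.cfg from the steps above can also be generated rather than typed by hand. The following is only a sketch: render_tahoe_cfg is a hypothetical helper (not part of Tahoe-LAFS), the option values are copied from the sample config on this page, and the hostname and introducer furl are placeholders you would replace with your own.

```python
import configparser
import io


def render_tahoe_cfg(onion_hostname, introducer_furl, nickname, tub_port=4456):
    """Return tahoe.cfg text for a Tor-only storage node (hypothetical helper)."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep option names exactly as written
    cfg["node"] = {
        "nickname": nickname,
        "web.port": "tcp:3456:interface=127.0.0.1",
        "web.static": "public_html",
        # Must match HiddenServicePort in torrc.
        "tub.port": f"tcp:{tub_port}:interface=127.0.0.1",
        "tub.location": f"{onion_hostname}:{tub_port}",
    }
    cfg["client"] = {"introducer.furl": introducer_furl}
    cfg["storage"] = {
        "enabled": "true",
        "reserved_space": "100G",
        "expire.enabled": "false",
    }
    cfg["helper"] = {"enabled": "false"}
    cfg["drop_upload"] = {"enabled": "false"}
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()
```

Calling render_tahoe_cfg with the contents of /var/lib/tor/tahoe_storage/hostname and the furl an officer gave you returns text that can be written to path/tahoe.cfg.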

Troubleshooting

This is a list of various problems I've encountered. --Mjh (talk) 00:47, 30 December 2014 (EST)

tahoe daemonized and then terminated immediately

This can be caused by several factors when running with torsocks.

  • You're trying to bind to an IP other than localhost and torsocks blocked this
    • For the introducer only, it doesn't appear to be possible to restrict the interfaces it binds to. Instead, modify /etc/tor/torsocks.conf and add AllowInbound 1. Then use iptables to deny inbound connections to non-localhost on that port.
    • For all other nodes (including storage nodes), modify the tub.port or web.port lines in tahoe.cfg
  • Tahoe is attempting to establish a UDP connection to identify its local IP address. Torsocks restricts UDP connections, causing Tahoe to throw exceptions and terminate. Use mhazinsk's fork until this is merged upstream.
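For the first cause, a quick way to spot a misconfigured node is to look for port lines that lack the localhost interface restriction. This is a rough sketch, assuming the tahoe.cfg layout shown on this page; non_local_listeners is a made-up name, and the check is a plain substring match, not a real parser for Tahoe's port syntax.

```python
import configparser


def non_local_listeners(cfg_text):
    """Return the [node] port options that are not restricted to 127.0.0.1."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(cfg_text)
    offenders = []
    for option in ("tub.port", "web.port"):
        value = cfg.get("node", option, fallback=None)
        # An option that is set but lacks the interface restriction is
        # the kind of bind that torsocks may block.
        if value is not None and "interface=127.0.0.1" not in value:
            offenders.append(option)
    return offenders
```

An empty list means both ports are bound to localhost, so the bind problem is probably not the cause.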

FAQs

Technical documentation on tahoe can be found at its website. However, for the prospective user, here's a simple explanation in Q&A format:

What does it do?

You set up a node with a few hundred gigs of free space and connect it to the tahoe grid. Then, you put files in it. It encrypts each file and stores a piece of it on each of ten nodes on the grid, in such a way that the entire file can be recovered even if up to 7 of those nodes are unavailable.
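The numbers above (ten pieces, up to 7 may be missing) imply that any 3 pieces are enough. The arithmetic can be sketched as follows; the function names are made up for illustration, and the size estimate assumes each piece is about one third of the file, ignoring whatever per-share overhead Tahoe adds.

```python
import math


def tolerated_failures(needed, total):
    """Number of nodes that can be unavailable while the file stays recoverable."""
    return total - needed


def stored_bytes(file_size, needed, total):
    """Rough grid-wide space used by one file, ignoring per-share overhead."""
    share_size = math.ceil(file_size / needed)
    return share_size * total
```

With needed=3 and total=10, seven nodes can vanish, and the price is that the grid stores roughly ten thirds of the original file size.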

But what if I don't want people seeing my files?

They're encrypted, remember? Each file has an automatically generated key which also tells where the file is located. You can share this "filecap" with anyone else you'd like to see the file.

So the nodes aren't trusted?

No. Files stored on them are encrypted, signed, split into pieces, and distributed among the nodes. The only way to get the file back without the filecap is by finding the storage index, retrieving the pieces from the nodes, breaking the encryption, and reassembling the pieces. This is designed to be difficult.

And all these nodes are hooked together?

No. Groups of users set up grids, often arranged by geographical location for improved bandwidth and latency.

How do I access files?

When a file is uploaded (using tahoe put or tahoe cp), tahoe gives you a filecap. A filecap looks something like this:

URI:CHK:7fdtkb3smrcczbduzkg6nxex44:rvg2fwo7poziydflo5jmjmbejczunqe5emhcisxx6uefosw4in3q:3:10:102015

This string carries everything needed to find and read the file: the location, or storage index, of the file (in this case 2zmhsnky3x34wz2c523vzery6e, which is cryptographically encoded rather than appearing verbatim), the keys used to decrypt the file and verify its integrity, the encoding of the file (in this case 3-of-10), and the file's size in bytes (in this case 102015, just over 102 kB). 3-of-10 encoding means that the file is stored across 10 nodes and that at least 3 of these are required to recover the file.
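To make the layout concrete, here is a small sketch that splits a filecap like the one above into its colon-separated fields. The field names are labels chosen for this example, and the order (key, integrity hash, needed, total, size) is an assumption read off the example string; this is not Tahoe's own parser.

```python
from typing import NamedTuple


class ChkCap(NamedTuple):
    key: str             # used to decrypt the file
    integrity_hash: str  # used to verify what the nodes return
    needed: int          # shares required to recover the file
    total: int           # shares produced at upload
    size: int            # file size in bytes


def parse_chk_cap(cap):
    """Split a 'URI:CHK:...' filecap into labelled fields (illustrative only)."""
    parts = cap.strip().split(":")
    if len(parts) != 7 or parts[:2] != ["URI", "CHK"]:
        raise ValueError("not an immutable CHK filecap")
    _, _, key, integrity_hash, needed, total, size = parts
    return ChkCap(key, integrity_hash, int(needed), int(total), int(size))
```

Applied to the example filecap, this yields needed=3, total=10 and size=102015, matching the description above.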

The filecap does NOT include the name of the file or its type. Types may be found using the unix file utility. To retrieve the file, use tahoe get [filecap] [filename]. This will cause tahoe to get the filecap's shares from the nodes, reassemble them, decrypt them, verify the integrity of the file, and write it to filename.

How do I delete files?

You can't. The nodes are not trusted and therefore cannot be relied upon to remove the file's shares when asked. To render a file inaccessible, destroy all copies of the filecap. After 31 days, the file's lease will expire and its shares will be automatically garbage collected, or deleted, by the nodes.

Wait, files expire? But I thought...

Don't panic. To stop a file from being deleted after 1 month, simply renew its lease. The recommended way of doing this is setting up an alias using tahoe create-alias tahoe, adding the filecap to the alias, and setting up a weekly cronjob to run tahoe deep-check --renew tahoe. This will renew the leases on all the files in the alias, which is similar to a directory.

(Note: expiration was true of the old Tahoe grid. The new one (established in Dec 2014) has storage nodes that should be configured to never expire files.)
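On a grid where leases do expire, the timing works out as sketched below. The 31-day figure comes from the text above; the function names are made up, and the sketch assumes a renewal simply restarts the 31-day clock.

```python
from datetime import date, timedelta

LEASE_DAYS = 31  # lease length quoted above


def lease_expiry(last_renewed):
    """Date on which shares become eligible for garbage collection."""
    return last_renewed + timedelta(days=LEASE_DAYS)


def renewals_before_expiry(interval_days):
    """How many scheduled renewal runs fall strictly before the lease runs out."""
    return (LEASE_DAYS - 1) // interval_days
```

A weekly cronjob therefore gets four chances to renew before a lease lapses, so a few failed runs in a row do not lose the file.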

Directory?

Yeah, you can have directories. They are implemented basically as lists of filecaps with associated filenames. They are referenced using dircaps, which come in read-write and read-only forms. Because the filecaps of the contained files are stored inside the dircap's shares (a dircap, remember, is treated similarly to a file with regards to storage), anyone who knows the dircap can read every file in the directory. This does not, however, work in reverse. If you give another user the filecap of a file (or the dircap of a subdirectory) in a directory, they cannot find the names or contents of the other files in the directory containing it.
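The one-way nature of dircaps can be shown with a toy model. Everything here is invented for illustration: the "grid" is a plain dictionary, the cap strings are made-up labels, and nothing is encrypted. The only point is that a directory points at its children while a child holds no pointer back.

```python
# Toy in-memory "grid": capability string -> stored object.
# A directory is stored like a file whose content maps names to caps.
grid = {
    "cap-notes": b"meeting notes",
    "cap-cat": b"cat picture",
    "dircap-pics": {"cat.jpg": "cap-cat"},
    "dircap-home": {"notes.txt": "cap-notes", "pics": "dircap-pics"},
}


def walk(grid, dircap, prefix=""):
    """Return {path: content} for everything reachable from dircap."""
    found = {}
    for name, cap in grid[dircap].items():
        obj = grid[cap]
        if isinstance(obj, dict):
            # Subdirectory: follow its dircap downwards.
            found.update(walk(grid, cap, prefix + name + "/"))
        else:
            found[prefix + name] = obj
    return found
```

Walking from dircap-home reaches both files, while walking from dircap-pics reaches only cat.jpg: holding a child's cap reveals nothing about its siblings or its parent.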

How come there are no mountable filesystem frontends?

There are; they just aren't built-in. Tahoe's high latency makes it rather unwieldy for use as part of a conventional filesystem. Append operations in particular are extremely inefficient. It is recommended that you use the web and CLI interfaces to manage files stored in tahoe.