[FX.php List] Security Concerns with FileMaker Website
Edward L. Ford
elford at cs.bu.edu
Thu Jan 25 12:04:23 MST 2007
In the case of the website I started this thread about, it's hosted
by a major academic institution (see my email address for a clue), so
bandwidth isn't really a problem -- in August 2006, the number of
pages on our servers was between 400k and 500k (not an exaggeration!)
-- I don't know the stats, but I can only imagine our monthly data
transfer is on the order of tens of terabytes, if not more.
My main concern is information confidentiality. Search engines can be
told where and where not to go, but the nasty bots aren't so kind.
Since the information on this site is valid contact info, it's
sellable info that can easily be picked up by automated bots.
The purpose of my site is essentially to be a "Musician White Pages",
so that the musicians listed can be contacted by anyone within or
outside the University for music jobs. The site lists contact info
such as phone numbers, email addresses, personal websites, and bios,
depending on how much info people provide when they sign up to be
listed.
Now, when someone signs up for the service, they must agree to our
privacy policy, which includes a statement:
"By signing up for this service, you agree to release your contact
information publicly through this website. Steps have been taken to
prevent your contact information from showing up in public search
engines, but the information is still listed on a public website,
accessible to anyone in the world. By signing up for this service,
you acknowledge the risks associated with listing contact information
publicly."
I have a robots.txt file that lets search engines crawl the pages
describing the service, but disallows the pages with people's info so
that those pages are not indexed.
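As a rough illustration of that kind of rule set, here is a hypothetical robots.txt checked with Python's standard-library `urllib.robotparser`. The `/listing/` and `/about/` paths are assumptions made up for this sketch, not the real layout of the site, and robots.txt only constrains crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths are illustrative assumptions, not the
# actual layout of the white pages site. Python's parser applies the first
# matching rule, so the Disallow line comes before the catch-all Allow.
ROBOTS_TXT = """\
User-agent: *
Disallow: /listing/
Allow: /
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for path in ("/about/index.php", "/listing/person.php?id=16"):
        print(path, "allowed" if rp.can_fetch("Googlebot", path) else "disallowed")
```

A well-behaved crawler would fetch the service description but skip the listing pages; a harvester that ignores robots.txt is not affected at all.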
How nasty are these bots? I got an eye-opener this past week. A new
entry on these white pages had been public for only 6 hours before
its email address received a spam message -- and this address was
brand new, never published anywhere before. The spam message was sent
to about 20 people, most of them addresses listed on this site, so I
know the new address was harvested from it.
I also have a new concern about phone numbers. TXT spam
isn't big yet, but let's face it, it's coming -- and so by similar
harvesting methods, spammers can harvest phone numbers and start
sending spam TXTs to them -- it costs them nothing, but it costs the
users, and the cell companies have been slow so far to implement
controls since it's more money in their pocket.
So while these folks have agreed to the "dangers" of listing their info
publicly online, and I could stop there and let things take their
course, I feel it is my responsibility to do what I can to prevent
abuse of the information in these white pages and to protect the
people using our service, while keeping the system user-friendly for
those who have legitimate needs for the contact information listed.
I could go on about my position, but I think this is long enough :-)
--Ed
------------------------------------
http://www.edwardford.net
On Jan 25, 2007, at 1:26 PM, Joel Shapiro wrote:
> Can anyone briefly explain the risks of these bots and how
> "nasty" (a term used in this thread) they can really be?
>
> I've looked at the link Gjermund posted, and it looks like the
> biggest real problems are email harvesting and use of bandwidth.
> Ed's original post was because he has many emails and websites on
> his site, but if there are just one or two contact email addresses
> on a site, what's the big risk?
>
> Thanks for enlightening me.
>
> -Joel
>
>
> On Jan 25, 2007, at 10:11 AM, Edward L. Ford wrote:
>
>> I didn't think about checking user agents. I just did a cursory
>> investigation of this, and I may personally implement this
>> system. For the bots that say they're a real browser, I'll set up
>> other roadblocks to keep the info on my site (relatively)
>> protected from harvesting.
>>
>> For others interested in checking user agents, I found the
>> following to investigate:
>> o) Look in the $_SERVER['HTTP_USER_AGENT'] string. There's also a
>>    built-in get_browser() function in PHP, which I never knew about.
>>    More info: http://us3.php.net/reserved.variables
>>
>> o) PEAR also has some tools: check out
>>    http://pear.php.net/package/Net_UserAgent_Detect
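For comparison, a Python CGI script receives the same header that PHP exposes as `$_SERVER['HTTP_USER_AGENT']` (and Perl as `$ENV{HTTP_USER_AGENT}`) through the `HTTP_USER_AGENT` environment variable. The sketch below reads it and does a naive substring check. The crawler tokens are illustrative assumptions, and, as noted in this thread, the header is chosen by the client, so this only sorts out bots that identify themselves honestly.

```python
import os
from typing import Mapping

# Illustrative substrings that self-identifying crawlers put in their
# User-Agent header (Google, Live/MSN, Yahoo). An assumption, not a complete
# or current list.
KNOWN_CRAWLER_TOKENS = ("googlebot", "msnbot", "slurp")

def get_user_agent(environ: Mapping[str, str] = os.environ) -> str:
    """Return the User-Agent header as a CGI server passes it (empty if absent)."""
    return environ.get("HTTP_USER_AGENT", "")

def classify_user_agent(user_agent: str) -> str:
    """Naively label a User-Agent string as 'crawler', 'missing', or 'browser'.

    A bot that claims to be a browser will be labeled 'browser'.
    """
    ua = user_agent.lower()
    if not ua:
        return "missing"
    if any(token in ua for token in KNOWN_CRAWLER_TOKENS):
        return "crawler"
    return "browser"

if __name__ == "__main__":
    print(classify_user_agent(get_user_agent()))
```

Passing `environ` explicitly makes the function easy to exercise outside a web server.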
>>
>> --Ed
>>
>> -----------------------------------
>> http://www.edwardford.net
>>
>> On Jan 25, 2007, at 11:41 AM, Gjermund Gusland Thorsen wrote:
>>
>>> Alternative 3 works in theory but still leaves some bots crawling,
>>> as the worst bots tell your web server that they are the most
>>> popular browser.
>>>
>>> ggt667
>>>
>>> On 1/24/07, Jason LEWIS <jasonlewis at weber.edu> wrote:
>>>> Ed,
>>>>
>>>> There are a few things you can do:
>>>>
>>>> 1. Sue to get them to stop. (I don't like this option because it
>>>>    makes enemies and takes a VERY long time.)
>>>> 2. Use a robots.txt file. (This option is good for bots that
>>>>    actually honor it. Google honors robots.txt.)
>>>> 3. Detect the type of browser before you give them anything, and
>>>>    ship out bad information to unwanted browser types. (This is
>>>>    only good if the bot owner does not imitate browser variables.)
>>>>
>>>> As for my options, I prefer #3, as I can give the bots something
>>>> they are looking for: bad information. In Perl, I use
>>>> $ENV{HTTP_USER_AGENT}, but I am not sure how to call this in PHP.
>>>> Anyone else know? A quick search returned nothing on this. Could
>>>> this involve $HTTP_ENV_VARS?
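Option 3 could look roughly like the following Python sketch. The record layout, the decoy values, and the crude `looks_like_browser` test are all hypothetical. As pointed out elsewhere in the thread, a bot that imitates a browser's User-Agent still gets the real data.

```python
from typing import Dict

# Hypothetical decoy record handed to clients we don't want to serve.
# ".invalid" is a reserved top-level domain, so the address can never be
# delivered; the phone number is made up.
DECOY_CONTACT = {"email": "nobody@example.invalid", "phone": "555-0100"}

def looks_like_browser(user_agent: str) -> bool:
    """Crude, assumed heuristic: mainstream browsers send a 'Mozilla/' prefix."""
    return user_agent.startswith("Mozilla/")

def contact_for(user_agent: str, record: Dict[str, str]) -> Dict[str, str]:
    """Return the real contact record for browser-like clients, a decoy otherwise."""
    if looks_like_browser(user_agent):
        return record
    return dict(DECOY_CONTACT)  # copy so callers can't mutate the template
```

Simple scripted clients such as `libwww-perl` get the decoy; anything sending a browser-style string gets the real record.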
>>>>
>>>> Jason
>>>>
>>>> >>> elford at cs.bu.edu 01/23/2007 10:18 PM >>>
>>>> Hello everyone,
>>>> In the past hour, I've done some analysis of various logs and
>>>> emails, and I've come to a chilling realization I've never had
>>>> before about bots harvesting information from websites -- I knew
>>>> it happened, but I never knew the scope of the problem until
>>>> tonight -- and this is a low-traffic website!
>>>>
>>>> So, I have a website that contains a public listing of email
>>>> addresses and websites from a FileMaker database. I want to stop
>>>> unknown bots from crawling the site. All of the data comes out of
>>>> FileMaker, nicely formatted as links for the end user's clicking
>>>> convenience. I have a solution to keep email addresses from being
>>>> harvested, but I was wondering if anyone knows of a way to keep
>>>> website addresses from being harvested while keeping them
>>>> clickable as hyperlinks.
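The email fix isn't described here. One widely used (and admittedly weak) approach, not necessarily the one used on this site, is to write the address as HTML numeric character references: a browser renders it normally, but a naive harvester scanning raw HTML for a literal "@" won't find one. A minimal Python sketch with hypothetical function names:

```python
import html

def obfuscate_email(address: str) -> str:
    """Encode every character as an HTML decimal character reference (&#NN;).

    Browsers render the result as the original text; a harvester that
    decodes entities before scanning is not stopped by this.
    """
    return "".join(f"&#{ord(ch)};" for ch in address)

def mailto_link(address: str) -> str:
    """Build a clickable mailto: link with both the href and the link text encoded."""
    href = obfuscate_email("mailto:" + address)
    text = obfuscate_email(address)
    return f'<a href="{href}">{text}</a>'

if __name__ == "__main__":
    link = mailto_link("someone@example.edu")
    print(link)
    # html.unescape shows what a browser effectively sees.
    print(html.unescape(link))
```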
>>>>
>>>> I thought maybe a PHP redirect link, like redirect.php?id=16, where
>>>> redirect.php sends the user to the website listed in record 16, but
>>>> once the PHP is all said and done, we're still at the linked
>>>> website, so that doesn't really prevent anything from being
>>>> harvested.
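The redirect idea above could look something like this minimal Python WSGI sketch. The `SITES` dictionary is a hypothetical stand-in for the FileMaker lookup. It also shows the limitation described above: the page no longer contains the target URL, but the `Location` header hands it to any client that follows the link.

```python
from urllib.parse import parse_qs

# Hypothetical stand-in for the FileMaker lookup: record id -> website URL.
SITES = {"16": "http://www.example.org/"}

def redirect_app(environ, start_response):
    """WSGI app for URLs like /redirect?id=16: answer with a 302 to that record's site."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    record_id = params.get("id", [""])[0]
    target = SITES.get(record_id)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown record\n"]
    # The Location header still reveals the target to anyone who requests it,
    # so a crawler that follows redirects learns the URL anyway.
    start_response("302 Found", [("Location", target)])
    return [b""]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever()` would serve it on port 8000.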
>>>>
>>>> Is there maybe a way to detect if a link was actually clicked by a
>>>> person, and not just followed by an automated bot? PHP is
>>>> preferable for such a solution -- JavaScript is too easy to turn
>>>> off. Or, is there a way to specify that only bots from places like
>>>> Google, Live, and Yahoo are allowed to crawl the site?
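For that last question, one technique the major search engines document for verifying their crawlers is a forward-confirmed reverse DNS check. Look up the host name for the client IP, check that it ends in the engine's crawler domain, then resolve that name forward and confirm it maps back to the same IP. Unlike the User-Agent header, this can't be faked by the client alone. PHP offers `gethostbyaddr()` and `gethostbynamel()` for the same lookups. Below is a Python sketch; the domain suffixes are assumptions to be checked against each engine's current documentation, and the forward lookup covers IPv4 only.

```python
import socket
from typing import Callable, Iterable, List

# Assumed crawler host-name suffixes; verify against each search engine's
# current documentation before relying on them.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com", ".crawl.yahoo.net")

def reverse_lookup(ip: str) -> str:
    """Return the PTR host name for ip (raises OSError if there is none)."""
    return socket.gethostbyaddr(ip)[0]

def forward_lookup(host: str) -> List[str]:
    """Return the IPv4 addresses a host name resolves to."""
    return socket.gethostbyname_ex(host)[2]

def is_verified_crawler(
    ip: str,
    suffixes: Iterable[str] = TRUSTED_SUFFIXES,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must be in a trusted domain
    and must resolve back to the same IP."""
    try:
        host = reverse(ip).lower().rstrip(".")
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    if not host.endswith(tuple(suffixes)):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

The resolver functions are parameters so the logic can be tested without network access; in production the defaults perform real DNS lookups, which are slow enough that results are worth caching per IP.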
>>>>
>>>> Hopefully my predicament is clear. I need to solve this ASAP...
>>>>
>>>> --Ed
>>>> ---------------------
>>>> http://www.edwardford.net
>>>>
>>>>
>>>> _______________________________________________
>>>> FX.php_List mailing list
>>>> FX.php_List at mail.iviking.org
>>>> http://www.iviking.org/mailman/listinfo/fx.php_list